Friday, January 29, 2010

what is PATSTAT

OK,
I have just realized that I started this blog assuming everybody knows what PATSTAT is.
That is unfair: for many years I was unaware of PATSTAT and lived very happily...

Anyway: PATSTAT is the EPO Worldwide Patent Statistical Database, created by the EPO (European Patent Office) for use by government and intergovernmental organizations and academic institutions.

It contains a snapshot of the EPO master documentation database (DOCDB), which holds data from about 90 national and international patent offices, with varying degrees of coverage.

The data include bibliographic data, citations and family links. The database is designed for statistical research and requires the data to be loaded into the customer's own database. The data are basically the same as in DOCDB, but some fields have been added to make statistical retrieval easier, such as extra address information extracted from the US and EP registers and standardized names.

Detailed coverage information: 
http://documents.epo.org/projects/babylon/eponet.nsf/0/2464E1CD907399E0C12572D50031B5DD/$File/global_patent_data_coverage_0709.pdf

http://documents.epo.org/projects/babylon/rawdata.nsf/0/74B501F17B1004BAC125765D004AA5B9/$File/PFS_0944.xls

Patent information received at the EPO from national patent offices is made available. For many countries data are received on a weekly basis; for other countries they are delayed. The PDF document above sheds some light on these delays.

The database has a relational structure with 20 different tables. The compressed raw data is around 10 GB (distributed on 3 DVDs) and, once loaded into a database, it can grow to as much as 100 GB (indexes included).

Tuesday, January 19, 2010

Tweaking MySQL for PATSTAT

When dealing with PATSTAT under MySQL you should expect slow queries and slow table access, because of the huge number of records in some tables.

So we need to be more careful in setting up the MySQL DB engine and in designing the tables...

Let's start with one MySQL environment setting: buffer sizes.

From MySQL Administrator choose STARTUP VARIABLES; under GENERAL PARAMETERS you will find the MEMORY USAGE section.
(Or, if you prefer, edit the file MY.INI under programs\MYSQL\MySql server 5.1\)

Larger buffers let MySQL execute operations in memory instead of on disk; performance may differ by a factor of 100 to 1,000, depending on the OS, the table structure and so on.

On the other hand, increasing the buffer size will decrease performance for other concurrent applications (and if you make it too big, your system may start to page and become extremely slow...).


A key buffer of 512 MB and a sort buffer of 1 MB may be a good compromise (but I use double that for both when building heavy tables like citations).
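
As a rough sketch, the same values can also be set from a SQL prompt instead of the GUI. This assumes an account with the SUPER privilege and that the two buffers meant here are the server variables key_buffer_size and sort_buffer_size; values set this way are lost when the server restarts, so for a permanent change the equivalent lines (key_buffer_size=512M, sort_buffer_size=1M) would go in the [mysqld] section of MY.INI.

```sql
-- Sketch only: runtime equivalent of the MEMORY USAGE settings above.
-- Assumes SUPER privilege; the values do not survive a server restart.
SET GLOBAL key_buffer_size  = 512 * 1024 * 1024;  -- 512 MB key buffer
SET GLOBAL sort_buffer_size = 1 * 1024 * 1024;    -- 1 MB sort buffer (applies to new connections)

-- Check what the server is using now.
SHOW GLOBAL VARIABLES LIKE 'key_buffer_size';
SHOW GLOBAL VARIABLES LIKE 'sort_buffer_size';
```

Doubling both numbers gives the heavier setting mentioned above for building tables like citations.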

Also, monitoring memory health (MYSQL ADMINISTRATOR --> HEALTH --> MEMORY HEALTH) may be a good way to tune the buffer settings more precisely.
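
If you prefer a SQL prompt to the GUI, a similar check can be sketched with the server status counters. This is an assumption about what is useful to watch, not necessarily what the MEMORY HEALTH page displays.

```sql
-- Sketch: key buffer counters since server start.
-- Key_read_requests = index block reads requested from the key buffer
-- Key_reads         = requests that had to go to disk (cache misses)
SHOW GLOBAL STATUS LIKE 'Key_read%';

-- Number of key buffer blocks still unused.
SHOW GLOBAL STATUS LIKE 'Key_blocks_unused';
```

A high Key_reads / Key_read_requests ratio suggests the key buffer is too small for the indexes in use, while many unused blocks suggest it could be reduced.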



Some further issues that I'll discuss in more detail in later posts are the type of DB engine (InnoDB vs MyISAM) and the type of index (HASH vs BTREE).

One last thing: when making joins where one of the tables is very small, it may help to load it into memory:

CREATE TABLE new_table ENGINE=MEMORY AS SELECT * FROM small_table;
Then recreate the index (using HASH), since CREATE TABLE ... AS SELECT does not copy indexes, and run the join...
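
A minimal sketch of the whole sequence follows. The names big_table, idx_id and the join column id are hypothetical placeholders; replace them with your own tables and key.

```sql
-- Sketch with hypothetical names: small_table, big_table, join column id.
-- 1. Copy the small table into a MEMORY table (indexes are not copied).
CREATE TABLE new_table ENGINE=MEMORY AS SELECT * FROM small_table;

-- 2. Recreate the index on the join column as a HASH index.
ALTER TABLE new_table ADD INDEX idx_id USING HASH (id);

-- 3. Join the big table against the in-memory copy.
SELECT b.*, s.*
FROM big_table AS b
JOIN new_table AS s ON s.id = b.id;

-- 4. Free the memory when done.
DROP TABLE new_table;
```

Keep in mind that MEMORY tables cannot hold BLOB or TEXT columns, that their size is capped by max_heap_table_size, and that their contents disappear when the server restarts. A HASH index helps equality joins like the one above but not range comparisons.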

Monday, January 11, 2010

breaking my disk

OK,
this week's special is about bad luck!
What if my file system gets corrupted and I forgot (or, even worse, never scheduled) a backup of my beloved data?
A good tool in such a case may be TestDisk, a free program able to recover lost partitions or file systems.
After downloading and running the program, choose Analyze to select the partition that is no longer recognized by the file system; then, with the commands Proceed, Intel, Advanced, Boot and BackUp BS, you will be able to recover the backup boot sector.
If the backup boot sector is corrupted too, you may try the Rebuild BS command.
If this also fails, well, this other site offers a good choice of talismans against bad luck!

Other uses of TestDisk are described at this link: http://www.cgsecurity.org/wiki/Data_Recovery_Examples

Friday, January 8, 2010

Losing my password

Raise your hand if you have never forgotten a password.
If the password is the one for your PC admin account or for your own user, this is going to become a big problem...
There is also a chance that, if your machine has no UPS, a power cut may lock your user.

If you do not want to 'refresh' your PC/server by reinstalling the OS, a solution may be Offline NT Password & Registry Editor, a tool that can be downloaded from http://pogostick.net/~pnh/ntpasswd/.
There you can download the image of a bootable CD that, inserted in the machine where you lost the password or have a locked user, allows you to reset the password or unlock the user.
Obviously it works only on local accounts, not on network accounts or Active Directory.

A tool to keep at hand, hopefully without ever needing it.