Tuesday, January 31, 2012

Amended version of TLS206 ascii released by EPO

As announced in a previous post, the version of TLS206 ASCII distributed with PATSTAT was corrupted. EPO has fixed the issue, and the amended version can be downloaded from this web page:

https://publication.epo.org/raw-data/product?productId=1 

What applicant name is in Espacenet?

Espacenet is the online database from EPO that provides free access to more than 70 million patent documents worldwide, containing information about inventions and technical developments from 1836 to today.

Among the data it contains, it is interesting to understand which applicant name this online resource displays.

Let's take an example: US PATENT 6646858

The web page lists as applicants:

Applicant(s): FILTEC GMBH [DE] +

And by pressing the + you will also get the names:

DINGENOTTO MEINOLF, ; KUHLE JORG, ; FILTEC FILTERTECHNOLOGIE FUER DIE ELECTRONIKINDUSTRIE GMBH

Let's see what these names are.

The first one, shown before pressing the +, is the standard name (as in table TLS208 in PATSTAT) taken at the moment of publication of the document.

Pressing the + shows the applicant names as they are written in DOCDB (or in TLS206 in PATSTAT).

Why does pressing + show more names than the one displayed at first? Because (as for most US patents) between first publication and grant there was a change in legal status that introduced the real owner and removed the inventors from the applicant list.


PRS Date : 2003/09/22
  PRS Code : AS
  Code Expl.:   ASSIGNMENT
     NEW OWNER : FILTEC FILTERTECHNOLOGIE FUR DIE ELEKTRONIKINDUSTR
     EFFECTIVE DATE : 20021120

This is confirmed by checking the first publication of the patent, US2003090856, which shows as applicants:

Applicant(s): DINGENOTTO MEINOLF, ; KUHLE JORG, ; FILTEC FILTERTECHNOLOGIE FUER DIE ELECTRONIKINDUSTRIE GMBH

So, to sum up what we discovered:

Espacenet shows the standard name (standardized by EPO) of the applicants at the moment of publication of the document.
Pressing + shows the original names as written in the document.
The names displayed are not sensitive to legal status changes such as a change of ownership or a correction of names.

Tuesday, January 10, 2012

EEE-PPAT: Released version oct 2011



EEE-PPAT table is an extension of the PERSON TABLE produced by ECOOM (Catholic University of Leuven) and Eurostat. The extension concerns sector allocation and name harmonization of applicants.


It can be requested at no cost by contacting technoinfo@ecoom.be; some documentation is available at this link.


It contains the following columns with original PATSTAT person_id:
        - PERSON_ID
        - HRM_LEVEL1: harmonized name level1
        - SECTOR : Sector of assignee name 


The file is coded in UTF-8, tab-delimited.
The table definition is as follows:
        - PERSON_ID                       number(9)
        - HRM_LEVEL1                   char  (400)
        - SECTOR                             char  (50)


The file, tab delimited, contains 12488647 records.
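
As a rough illustration of the layout described above, here is a minimal Python sketch that reads a UTF-8, tab-delimited stream with the three columns PERSON_ID, HRM_LEVEL1 and SECTOR and counts the records per sector. The function names and the sample rows are made up for the example; whether the real file has a header row is not stated here, so rows whose first field is not numeric are simply skipped.

```python
import csv
import io
from collections import Counter


def read_eee_ppat(stream):
    """Yield (person_id, hrm_level1, sector) from a tab-delimited text stream."""
    # QUOTE_NONE: treat quote characters inside names as ordinary text.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        # Skip blank lines, rows without exactly three fields, and any
        # header-like row whose first field is not a number (assumption).
        if len(row) != 3 or not row[0].strip().isdigit():
            continue
        person_id, hrm_level1, sector = row
        yield int(person_id), hrm_level1.strip(), sector.strip()


def sector_counts(records):
    """Count how many records fall into each sector."""
    return Counter(sector for _, _, sector in records)


# Tiny in-memory sample standing in for the real file (values are made up).
sample = io.StringIO(
    "1\tACME GMBH\tCOMPANY\n"
    "2\tDOE JOHN\tINDIVIDUAL\n"
    "3\tACME GMBH\tCOMPANY\n"
)
counts = sector_counts(read_eee_ppat(sample))
print(counts)
```

For the real file, the stream would come from something like open(path, encoding="utf-8", newline=""), where path is wherever you saved the download.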


The 'compression rate' is good: the 10.324.068 distinct applicant names contained in TLS206 are reduced by EEE-PPAT to 8.227.328, and the sector allocation is distributed as follows:


COMPANY                        2173055
COMPANY GOV HOSPITAL                 1
COMPANY GOV NON-PROFIT           35307
COMPANY GOV UNIVERSITY             176
COMPANY HOSPITAL                  1601
COMPANY UNIVERSITY                1544
GOV NON-PR0FIT                       4
GOV NON-PROFIT                  105750
GOV NON-PROFIT UNIVERSITY          669
HOSPITAL                          5028
INDIVIDUAL                     4774071
UNIVERSITY                       45506
UNKNOWN                        1252219
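
To put a number on the 'compression rate', the two name counts quoted above can be turned into a ratio. This is just arithmetic on the figures in this post; the variable names are mine.

```python
# Figures quoted in this post.
distinct_names = 10_324_068    # distinct applicant names in TLS206
harmonized_names = 8_227_328   # distinct names after EEE-PPAT harmonization

remaining = harmonized_names / distinct_names   # share of names left
reduction = 1 - remaining                       # share of names merged away

print(f"harmonized/original: {remaining:.1%}, reduction: {reduction:.1%}")
# prints: harmonized/original: 79.7%, reduction: 20.3%
```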

[The lines below are cancelled, since the data have been corrected: updated data were put on the FTP on 2012 Jan. 10, 13:51 CET.]


(Some sector values look like small mistakes, such as GOV NON-PR0FIT.)



While loading the data you may get an error for person_id 4264883, 9883343 and 8758108, which have a null sector: the harmonized name ends with a slash, so the tab field delimiter is not recognized and the text COMPANY is absorbed into the name in these three records.
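
A quick way to spot rows of this kind before loading is to scan the raw lines. The sketch below flags any row that does not split into exactly three tab-separated fields, has an empty sector, or has a name ending in a slash or backslash right before the tab. Checking for both characters is my assumption, since loaders that use the backslash as an escape character would swallow the following tab; the function name and sample rows are made up.

```python
def suspicious_rows(lines):
    """Return the person_id (as text) of rows likely to confuse a tab loader."""
    flagged = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        person_id = fields[0]
        name = fields[1] if len(fields) > 1 else ""
        sector = fields[2] if len(fields) > 2 else ""
        wrong_field_count = len(fields) != 3
        empty_sector = not sector.strip()
        # Assumption: a trailing "/" or "\" in the name is what breaks the loader.
        trailing_slash = name.rstrip().endswith(("/", "\\"))
        if wrong_field_count or empty_sector or trailing_slash:
            flagged.append(person_id)
    return flagged


# Made-up sample rows, not taken from the real file.
sample = [
    "1\tACME GMBH\tCOMPANY\n",
    "2\tFOO BAR\\\tCOMPANY\n",  # name ends with a backslash right before the tab
    "3\tBAZ COMPANY\n",         # sector missing: only two fields
]
print(suspicious_rows(sample))  # prints: ['2', '3']
```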


At this link you can download a script for loading the EEE-PPAT table into MySQL; it also includes a patch for the 3 wrong records.
[Do not use the patch with the data of 2012 Jan. 10, 13:51 CET.]


Thursday, January 5, 2012

Some useful links about NUTS

As the Eurostat website says, the NUTS classification is a hierarchical system for dividing up the economic territory of the EU for the purpose of:


The current NUTS classification valid from 1 January 2008 until 31 December 2011 lists 97 regions at NUTS 1, 271 regions at NUTS 2 and 1303 regions at NUTS 3 level.
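
The hierarchy is visible in the codes themselves: a NUTS code starts with a two-letter country code, and each extra character adds one level, so a NUTS 3 code contains its NUTS 2 and NUTS 1 parents as prefixes. Here is a minimal sketch of that prefix rule; the function name is mine, and the sample code ITC11 is used only to show the structure.

```python
def nuts_ancestors(code):
    """Split a NUTS code into its country and NUTS 1/2/3 prefixes."""
    code = code.strip().upper()
    # Two country letters plus one character per level: 3 to 5 characters.
    if not 3 <= len(code) <= 5:
        raise ValueError(f"not a NUTS 1-3 code: {code!r}")
    levels = {"country": code[:2]}
    for level in range(1, len(code) - 1):
        levels[f"NUTS{level}"] = code[: 2 + level]
    return levels


print(nuts_ancestors("ITC11"))
# prints: {'country': 'IT', 'NUTS1': 'ITC', 'NUTS2': 'ITC1', 'NUTS3': 'ITC11'}
```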
 
The regulation establishing NUTS can be downloaded in PDF from this link.

Here is the NUTS3 classification from 2003, taken from "NUTS_2003_03M.mdb", table "NUTS_AT_2003" (part of the zip archive "NUTS_RG_03M_2003_0.zip", available at Eurostat).

Finally, a correspondence table between zip codes and NUTS for most EU countries can be downloaded here.








Monday, January 2, 2012

Error in TLS206 ASCII version sept. 2011


By comparing the 2 versions of TLS206 I realized that there is an error in the Sept. 2011 version that makes TLS206 ASCII unusable.

Starting from person_id 1340952 the 2 files are misaligned, since the associated names are:

 'ARANAGA TOMOYUKI' for TLS206

 while
'ARANAGA YASUNORI' is in TLS206 ASCII

Checking TLS207, I find that the corresponding appln_id 36165645 has publication number JP 11158733,
and in Espacenet I see that the TLS206 inventor is the right one.

After a while (260K records) the error disappears with this record:

1544759, 'ASAHINA', 'ASAHINA AKI', '', ''

but the bad thing is that it also happens elsewhere.
For instance:

20000000, 'LESLIE * SMITH' <> 'LESLIE * STEELE', '', ''
25000000, 'NUECHTER,PATRICK' <> 'Nuechter', 'Peter', ''
30000001, 'Schaevitz; Lester P.' <>  'SCHAEVITZ, SAM', '', ''

So the bad news is that, since person_id is built by appending the names in alphabetical order, there are ranges of person_id in TLS206 ASCII where the name associated with the person_id is wrong.
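
For anyone who wants to locate the affected ranges themselves, here is a minimal sketch of the comparison: given two mappings from person_id to name (one per file), it reports the runs of consecutive person_id values whose names disagree. It is only an illustration, not the check I actually ran. The normalization step (strip accents, uppercase, collapse spaces) is an assumption about how the ASCII names relate to the originals, and the ids and names in the sample are made up.

```python
import unicodedata


def normalize(name):
    """Strip accents, uppercase and collapse whitespace (assumed ASCII mapping)."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.upper().split())


def mismatch_ranges(names_a, names_b):
    """Return (first_id, last_id) runs of shared person_ids whose names differ."""
    ranges = []
    start = prev = None
    for pid in sorted(names_a.keys() & names_b.keys()):
        if normalize(names_a[pid]) != normalize(names_b[pid]):
            if start is None:
                start = pid          # a new run of mismatches begins
            prev = pid
        elif start is not None:
            ranges.append((start, prev))  # run ended at the previous mismatch
            start = None
    if start is not None:
        ranges.append((start, prev))      # run reaches the end of the data
    return ranges


# Made-up sample: ids 11 and 12 are shifted in the ASCII file.
tls206 = {10: "René Dupont", 11: "ARANAGA TOMOYUKI", 12: "ARANAGA YASUNORI", 13: "ASAHINA AKI"}
tls206_ascii = {10: "RENE DUPONT", 11: "ARANAGA YASUNORI", 12: "ARANAGA YUKIO", 13: "ASAHINA AKI"}
print(mismatch_ranges(tls206, tls206_ascii))  # prints: [(11, 12)]
```

With the real tables the two dictionaries would be filled from the person_id and name columns of each file; since the name formats can differ between the two versions, the normalization would probably need tuning before the ranges are meaningful.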

EPO is aware of the problem and is working on a fix.