Monday, July 19, 2010

PATSTAT: an assessment on inventors' addresses quality

When trying to consolidate inventor data in PATSTAT, we need some location data in addition to names.
An inventor named "J. Smith" would be hard to match with either "John Smith" or "James Smith" unless we also have further data such as address and city.
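To make the point concrete, here is a toy sketch in Python. The records, the city values and the initial-matching rule are all hypothetical (this is not a PATSTAT feature or the method used in this post): on names alone "J. Smith" matches both candidates, and only a location field narrows it down.

```python
# Hypothetical helpers: decide whether two inventor records could be the same person.

def name_compatible(a: str, b: str) -> bool:
    """True if surnames match and first names agree (an abbreviated first
    name only has to agree on its initial)."""
    first_a, last_a = a.replace(".", "").rsplit(" ", 1)
    first_b, last_b = b.replace(".", "").rsplit(" ", 1)
    if last_a.lower() != last_b.lower():
        return False
    if len(first_a) == 1 or len(first_b) == 1:
        return first_a[0].lower() == first_b[0].lower()
    return first_a.lower() == first_b.lower()


def candidates(target: dict, records: list, use_city: bool = False) -> list:
    """Names of the records that could match `target`; optionally require the same city."""
    out = []
    for rec in records:
        if not name_compatible(target["name"], rec["name"]):
            continue
        if use_city and target.get("city") != rec.get("city"):
            continue
        out.append(rec["name"])
    return out


records = [
    {"name": "John Smith", "city": "Milano"},
    {"name": "James Smith", "city": "Torino"},
]
target = {"name": "J. Smith", "city": "Torino"}

print(candidates(target, records))                 # ['John Smith', 'James Smith']
print(candidates(target, records, use_city=True))  # ['James Smith']
```

With names only the match is ambiguous; adding the city resolves it, which is why the completeness of the address fields matters.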

So I ran some queries on the 10/2009 edition of PATSTAT to find out, by patent office, what percentage of inventors have the address, city, state, zip code and country fields empty.

I used table TLS201_APPLN to get the application authority, and TLS206_ASCII (the ASCII/parsed version of TLS206, included in PATSTAT) for person IDs and person data.

To count distinct inventor person IDs (A_I_FLAG = 'I') per application authority, I used this SQL:

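Select
  T1.APPLN_AUTH,
  Count(Distinct t6.PERSON_ID) As INVENTORS   -- distinct inventors per office
From
  tls201_appln T1
  Inner Join tls206_ascii t6
    On t6.APPLN_ID = T1.APPLN_ID
Where t6.A_I_FLAG = 'I'                       -- inventors only
Group By T1.APPLN_AUTH                        -- needed because of the Count()
Order By INVENTORS Desc;                      -- largest offices first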

Below is the resulting table for the top 20 offices by inventor count; the full table can be downloaded at this link.


APPLN_AUTH  Inventors  % no state  % no zip  % no country  % no address  % no city
US            5960856         86%       98%           21%           97%        25%
EP            3705123        100%      100%            0%            1%         1%
DE            2750079        100%      100%           33%          100%       100%
JP            1798271        100%      100%           98%           99%       100%
CN            1537587        100%      100%            2%          100%       100%
CA            1120490        100%      100%           45%          100%       100%
AU            1087573        100%      100%           98%          100%       100%
SU             968915        100%      100%           41%          100%       100%
AT             653048        100%      100%           29%          100%       100%
KR             637296        100%      100%           14%          100%       100%
FR             565254        100%      100%           98%           99%       100%
GB             531087        100%      100%           70%           65%       100%
RU             394691        100%      100%           29%          100%       100%
CH             338739        100%      100%           11%          100%       100%
BR             292047        100%      100%           89%          100%       100%
SE             256248        100%      100%           85%           98%       100%
FI             212722        100%      100%           11%           43%       100%
IT             192460        100%      100%           74%          100%       100%
ES             133471        100%      100%           17%          100%       100%
DD             129845        100%      100%            7%           97%       100%
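The SQL above returns only the inventor counts; the "% no ..." columns need one more aggregation. Here is a minimal sketch, runnable with Python's built-in sqlite3 on made-up data. It assumes a field counts as missing when it is NULL or blank, and it uses hypothetical column names (street, city, state, zip_code, ctry_code) in place of the real TLS206_ASCII ones.

```python
import sqlite3

# Toy in-memory stand-ins for TLS201_APPLN and TLS206_ASCII (hypothetical rows).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tls201_appln (appln_id INTEGER, appln_auth TEXT);
CREATE TABLE tls206_ascii (
    person_id INTEGER, appln_id INTEGER, a_i_flag TEXT,
    street TEXT, city TEXT, state TEXT, zip_code TEXT, ctry_code TEXT);
INSERT INTO tls201_appln VALUES (1, 'US'), (2, 'US'), (3, 'EP');
INSERT INTO tls206_ascii VALUES
    (10, 1, 'I', '1 Main St', 'Boston', 'MA', '02110', 'US'),
    (10, 2, 'I', '1 Main St', 'Boston', 'MA', '02110', 'US'),
    (11, 2, 'I', NULL, 'Austin', '', NULL, 'US'),
    (12, 3, 'I', NULL, NULL, NULL, NULL, 'DE'),
    (13, 3, 'A', 'x', 'y', 'z', 'w', 'DE');
""")

# Same column order as the table: state, zip, country, address, city.
FIELDS = ["state", "zip_code", "ctry_code", "street", "city"]


def pct_missing(col: str) -> str:
    # Share of inventors whose field is NULL or blank (assumed definition of "missing").
    return (f"ROUND(100.0 * SUM(COALESCE(TRIM({col}), '') = '') / COUNT(*), 1)"
            f" AS no_{col}")


query = f"""
SELECT appln_auth, COUNT(*) AS inventors, {', '.join(pct_missing(c) for c in FIELDS)}
FROM (
    -- one row per (office, inventor): an inventor on several applications counts once
    SELECT DISTINCT t1.appln_auth, t6.person_id, {', '.join('t6.' + c for c in FIELDS)}
    FROM tls201_appln t1
    JOIN tls206_ascii t6 ON t6.appln_id = t1.appln_id
    WHERE t6.a_i_flag = 'I'
)
GROUP BY appln_auth
ORDER BY inventors DESC
"""

rows = con.execute(query).fetchall()
for row in rows:
    print(row)
# ('US', 2, 50.0, 50.0, 0.0, 50.0, 0.0)
# ('EP', 1, 100.0, 100.0, 0.0, 100.0, 100.0)
```

The inner SELECT DISTINCT plays the role of Count(Distinct PERSON_ID); it assumes each person_id carries a single address, otherwise the same inventor would appear once per address variant.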

One could argue that this count would make more sense by publication authority rather than by application authority, since the data are taken from the search report: FI data filed at the EPO and published by WIPO (PCT), for instance, differ in quality from those filed and published at the EPO.
That might be the topic of a future post.

Wednesday, July 14, 2010

KITeS Patstat based reports

I've been away from this blog for most of June because I was implementing a set of reports based on the EPO's PATSTAT data.

At db.kites.unibocconi.it you can find the following reports:


Patent count by inventor country / priority year
Patent count by applicant country / priority year
Patent count by inventor region / priority year
Patent count by applicant region / priority year
Patent count by inventor nuts3 / priority year
Patent count by applicant nuts3 / priority year
Patent count by applicant name / priority year
Patent count by main IPC - first 4 digits
Patent count by main IPC class reclassified on OST30
Patent count by applicant, priority year, OST30 IPC reclassification
Patent count by applicant country, county, region, OST30 IPC reclassification, priority year
Citations count by applicant name, priority year
Citations count by applicant country, priority year
Citations count by inventor country, priority year
Copatenting by inventor country, priority year
Copatenting by applicant country, priority year
Applicants by IPC - first 4 digits
Inventors by IPC - first 4 digits
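The post does not document how these reports count patents. As a minimal sketch of one possible reading, assuming whole counting (an application counts once for each distinct inventor country) and defining a co-patent as an application with inventors from two or more countries, here is a Python version on made-up rows:

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy input: one row per inventor of an application,
# as (appln_id, priority_year, inventor_country).
inventor_rows = [
    (1, 2005, "IT"), (1, 2005, "IT"), (1, 2005, "DE"),
    (2, 2005, "IT"),
    (3, 2006, "FR"),
]


def count_by_country_year(rows):
    """Whole counting: an application counts once for each distinct inventor country."""
    unique = {(appln, year, ctry) for appln, year, ctry in rows}  # drop same-country repeats
    return Counter((ctry, year) for _appln, year, ctry in unique)


def copatent_pairs(rows):
    """Count applications shared by each pair of inventor countries."""
    countries_by_appln = {}
    for appln, _year, ctry in rows:
        countries_by_appln.setdefault(appln, set()).add(ctry)
    pairs = Counter()
    for countries in countries_by_appln.values():
        for a, b in combinations(sorted(countries), 2):
            pairs[(a, b)] += 1
    return pairs


print(sorted(count_by_country_year(inventor_rows).items()))
# [(('DE', 2005), 1), (('FR', 2006), 1), (('IT', 2005), 2)]
print(copatent_pairs(inventor_rows))
# Counter({('DE', 'IT'): 1})
```

Fractional counting (splitting each application across its countries) would give different totals, so the reports' figures should not be read through this sketch.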

Please remember that this is still a beta release; feel free to report any problems or suggestions.

You may also find some password-protected reports: they contain detailed data and cannot be distributed, since PATSTAT data are the property of the EPO.

Wednesday, July 7, 2010

Inventor data in PATSTAT: EPO vs USPTO

PATSTAT stores data about applicants and inventors in a table with the prefix TLS206, keyed by a field named PERSON_ID; persons are linked to applications through table TLS207_PERS_APPLN, which pairs each person_id with an application id (appln_id).
The PATSTAT DVD provides two versions of TLS206. The first, TLS206_PERSON, is a comma-separated values file containing mainly the name, address and country code fields.
TLS206_ASCII instead contains the same information already parsed (plus a few extra fields; see the field list below): the name is split into last, first and middle names, and the address into street, city, state and zip code.
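A minimal sketch of the person → TLS207_PERS_APPLN → application link, using Python's built-in sqlite3 with toy tables. Only column names mentioned in this blog are used; the rows (including the "LAST, FIRST" name format) are invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE tls206_person (person_id INTEGER PRIMARY KEY, person_name TEXT,
                            person_address TEXT, person_ctry_code TEXT);
CREATE TABLE tls207_pers_appln (person_id INTEGER, appln_id INTEGER);
CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, appln_auth TEXT);
INSERT INTO tls206_person VALUES
    (1, 'SMITH, JOHN', '1 Main St, Boston, MA 02110', 'US'),
    (2, 'ROSSI, MARIO', 'Via Roma 1, 20100 Milano', 'IT');
INSERT INTO tls207_pers_appln VALUES (1, 100), (2, 100), (2, 200);
INSERT INTO tls201_appln VALUES (100, 'EP'), (200, 'US');
""")

# person -> link table -> application: one output row per (person, application) pair
rows = con.execute("""
    SELECT p.person_name, a.appln_id, a.appln_auth
    FROM tls206_person p
    JOIN tls207_pers_appln pa ON pa.person_id = p.person_id
    JOIN tls201_appln a ON a.appln_id = pa.appln_id
    ORDER BY p.person_id, a.appln_id
""").fetchall()
for r in rows:
    print(r)
# ('SMITH, JOHN', 100, 'EP')
# ('ROSSI, MARIO', 100, 'EP')
# ('ROSSI, MARIO', 200, 'US')
```

Because the link table is many-to-many, a person appears once per application, which is why per-person counts need Count(Distinct ...).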

Data origins for person names and addresses:


1) EPO Register for EP patent applications; details are the most recent ones in the EP Register at the time the data were extracted.
2) OECD patents database for US data from 1976-01-01 up to and including November 15th, 2005, for Published Grants.
3) PATSTAT weekly file extracts from the USPTO website for Published Grants from November 22nd, 2005 until today, and for Published Applications from September 29th, 2005 to today inclusive.
4) Inventor & Applicant names for USPTO Published Applications from March 1st, 2001 to September 22nd, 2005: from DOCDB, data-format="docdba".
5) All other names: from DOCDB, data-format="docdba" (US names and addresses for patents published before 1976-01-01 are also taken from the EPO's DOCDB database).
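The rules in the list above can be encoded as a small lookup. This is a sketch under my own assumptions: the kind labels 'grant' and 'application' are hypothetical, the date is the publication date, and dates the list does not cover (e.g. US grants between November 16th and 21st, 2005) return None.

```python
from datetime import date
from typing import Optional


def name_address_source(office: str, kind: str, publ_date: date) -> Optional[str]:
    """Origin of person names/addresses according to the five rules listed above."""
    if office == "EP":
        return "EPO Register"                      # rule 1
    if office == "US":
        if publ_date < date(1976, 1, 1):
            return "DOCDB"                         # rule 5, pre-1976 US patents
        if kind == "grant":
            if publ_date <= date(2005, 11, 15):
                return "OECD patents database"     # rule 2
            if publ_date >= date(2005, 11, 22):
                return "USPTO weekly extracts"     # rule 3
            return None                            # not covered by the list
        if kind == "application":
            if date(2001, 3, 1) <= publ_date <= date(2005, 9, 22):
                return "DOCDB (names)"             # rule 4
            if publ_date >= date(2005, 9, 29):
                return "USPTO weekly extracts"     # rule 3
        return None
    return "DOCDB"                                 # rule 5, all other offices


print(name_address_source("US", "grant", date(1999, 6, 1)))  # OECD patents database
```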

TLS206_ASCII field list

prof-last-name
prof-first-name
prof-middle-names
prof-street
prof-city
prof-state
prof-zip-code

TLS206_PERSON field list

Person_id
Person_name
Person_address
Person_ctry_code
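To illustrate how the flat Person_name relates to the parsed prof-* fields, here is a hypothetical Python parser. It assumes names look like "LAST, FIRST MIDDLE", which is my assumption and not a documented PATSTAT rule; the real TLS206_ASCII parsing is done by the data provider and may differ.

```python
from typing import NamedTuple


class ParsedName(NamedTuple):
    last: str     # plays the role of prof-last-name
    first: str    # prof-first-name
    middle: str   # prof-middle-names


def parse_person_name(person_name: str) -> ParsedName:
    """Split a flat name assumed to look like 'LAST, FIRST MIDDLE...'.
    Names without a comma (e.g. companies) are kept whole as the last name."""
    last, sep, rest = person_name.partition(",")
    if not sep:
        return ParsedName(last.strip(), "", "")
    given = rest.split()
    first = given[0] if given else ""
    middle = " ".join(given[1:])
    return ParsedName(last.strip(), first, middle)


print(parse_person_name("SMITH, JOHN ALBERT"))
# ParsedName(last='SMITH', first='JOHN', middle='ALBERT')
```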