Monday, July 19, 2010

PATSTAT: an assessment of the quality of inventors' addresses

When trying to consolidate data on inventors in PATSTAT, we need some toponymic (place) data in addition to names.
An inventor named "J. Smith" would be difficult to match with either "John Smith" or "James Smith" unless we have further data such as address or city.
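
A minimal sketch of why place data matters here, assuming a simplified and purely hypothetical matching rule: surnames must match, full first names must match exactly, an initial only has to agree with the first letter, and when both records carry a city, the city decides. The Inventor class, the rule and the sample cities are illustrative assumptions, not a PATSTAT feature.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Inventor:
    name: str  # e.g. "J. Smith" or "John Smith"
    city: str  # "" when the city field is not filled


def _split(name: str) -> Tuple[str, str]:
    """Split "J. Smith" into ("j.", "smith"): everything before the last space is the first name."""
    first, _, last = name.strip().rpartition(" ")
    return first.lower(), last.lower()


def _is_initial(first: str) -> bool:
    return len(first.rstrip(".")) <= 1


def names_compatible(a: str, b: str) -> bool:
    first_a, last_a = _split(a)
    first_b, last_b = _split(b)
    if last_a != last_b:
        return False
    if _is_initial(first_a) or _is_initial(first_b):
        # "J." cannot be told apart from "John" or "James": compare initials only.
        return first_a[:1] == first_b[:1]
    return first_a == first_b


def same_person(a: Inventor, b: Inventor) -> Optional[bool]:
    """True or False when the data decide the match, None when they cannot."""
    if not names_compatible(a.name, b.name):
        return False
    if a.city and b.city:
        # Both cities filled: the place confirms or rejects the name match.
        return a.city.strip().lower() == b.city.strip().lower()
    return None  # names alone leave the case ambiguous


# Hypothetical example records.
j_boston = Inventor("J. Smith", "Boston")
print(same_person(j_boston, Inventor("John Smith", "Boston")))   # True
print(same_person(j_boston, Inventor("James Smith", "Denver")))  # False
print(same_person(Inventor("J. Smith", ""), Inventor("John Smith", "Boston")))  # None
```

The last call is the case the post is about: with the city field empty, the name alone cannot resolve "J. Smith".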

So I ran some queries on PATSTAT edition 10/2009 to measure, by patent office, what percentage of inventors have the address, city, zip code, state and country fields filled in.

I used table TLS201_APPLN to get the application authority, and TLS206_ASCII (the ASCII/parsed version of TLS206 included in PATSTAT) for person IDs and data.

To count distinct inventor person IDs (A_I_FLAG = 'I') per application authority, I used this SQL:

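Select
  T1.APPLN_AUTH, Count(Distinct t6.PERSON_ID) As INVENTORS
From
  tls201_appln T1 Inner Join tls206_ascii t6
    On t6.APPLN_ID = T1.APPLN_ID
Where t6.A_I_FLAG = 'I'     -- inventors only
Group By T1.APPLN_AUTH      -- one row per application authority
Order By INVENTORS Desc;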
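
The query above returns only the inventor counts; the percentages in the table below need one extra aggregate per address field. A minimal sketch of how that could be done, run here on toy in-memory SQLite tables: the column names ADDRESS, CITY, ZIP, STATE and COUNTRY are hypothetical placeholders (check the actual TLS206_ASCII field names in your edition), and counting both NULL and blank strings as "empty" is an assumption.

```python
import sqlite3

# Toy, in-memory stand-ins for the two PATSTAT tables used above.
# ADDRESS, CITY, ZIP, STATE and COUNTRY are hypothetical column names.
SCHEMA = """
CREATE TABLE tls201_appln (APPLN_ID INTEGER PRIMARY KEY, APPLN_AUTH TEXT);
CREATE TABLE tls206_ascii (
    PERSON_ID INTEGER, APPLN_ID INTEGER, A_I_FLAG TEXT,
    ADDRESS TEXT, CITY TEXT, ZIP TEXT, STATE TEXT, COUNTRY TEXT
);
"""

FIELDS = ["STATE", "ZIP", "COUNTRY", "ADDRESS", "CITY"]
select_fields = ", ".join("t6." + f for f in FIELDS)

# One no_<field> column per field: share (%) of distinct inventors whose field is NULL or blank.
missing_cols = ",\n  ".join(
    f"ROUND(100.0 * SUM(CASE WHEN COALESCE(TRIM({f}), '') = '' THEN 1 ELSE 0 END)"
    f" / COUNT(*)) AS no_{f.lower()}"
    for f in FIELDS
)

QUERY = f"""
WITH inv AS (  -- one row per (application authority, inventor)
  SELECT DISTINCT t1.APPLN_AUTH, t6.PERSON_ID, {select_fields}
  FROM tls201_appln t1
  JOIN tls206_ascii t6 ON t6.APPLN_ID = t1.APPLN_ID
  WHERE t6.A_I_FLAG = 'I'
)
SELECT APPLN_AUTH, COUNT(*) AS inventors,
  {missing_cols}
FROM inv
GROUP BY APPLN_AUTH
ORDER BY inventors DESC
LIMIT 20;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO tls201_appln VALUES (?, ?)",
                 [(1, "EP"), (2, "EP"), (3, "DE")])
conn.executemany(
    "INSERT INTO tls206_ascii VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        (10, 1, "I", "Main St 1", "Berlin", "10115", "", "DE"),
        (10, 2, "I", "Main St 1", "Berlin", "10115", "", "DE"),  # same inventor: counted once
        (11, 2, "I", "", "Paris", None, None, "FR"),
        (12, 3, "I", None, None, None, None, "DE"),
        (13, 3, "A", "Some Rd 2", "Munich", "80331", None, "DE"),  # applicant: excluded
    ],
)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The DISTINCT step matters: an inventor named on several applications at the same office would otherwise be counted once per application and skew the percentages.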

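Below is the resulting table for the top 20 application authorities by number of inventors; the full table can be downloaded at this link. "inventors" is the number of distinct inventor person IDs, and each of the "no state", "no zip", "no country", "no address" and "no city" columns gives the percentage of those inventors for whom that field is empty.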


APPLN_AUTH  inventors  no state  no zip  no country  no address  no city
US            5960856       86%     98%         21%         97%      25%
EP            3705123      100%    100%          0%          1%       1%
DE            2750079      100%    100%         33%        100%     100%
JP            1798271      100%    100%         98%         99%     100%
CN            1537587      100%    100%          2%        100%     100%
CA            1120490      100%    100%         45%        100%     100%
AU            1087573      100%    100%         98%        100%     100%
SU             968915      100%    100%         41%        100%     100%
AT             653048      100%    100%         29%        100%     100%
KR             637296      100%    100%         14%        100%     100%
FR             565254      100%    100%         98%         99%     100%
GB             531087      100%    100%         70%         65%     100%
RU             394691      100%    100%         29%        100%     100%
CH             338739      100%    100%         11%        100%     100%
BR             292047      100%    100%         89%        100%     100%
SE             256248      100%    100%         85%         98%     100%
FI             212722      100%    100%         11%         43%     100%
IT             192460      100%    100%         74%        100%     100%
ES             133471      100%    100%         17%        100%     100%
DD             129845      100%    100%          7%         97%     100%

It might be argued that such a count would make more sense by publication authority than by application authority, since the data are taken from the search report; indeed, FI data for applications filed at the EPO and published by WIPO (PCT) differ in quality from those filed and published at the EPO.
That could be the topic of a further post...
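
A sketch of that publication-authority variant, assuming the PATSTAT publication table TLS211_PAT_PUBLN is available, linked to applications through APPLN_ID and with the publishing office in PUBLN_AUTH. The toy rows and the CITY column name are hypothetical. Note that an application published by several offices (for example at the EPO and as a WO/PCT publication) contributes its inventors to each of them.

```python
import sqlite3

# Toy stand-ins: TLS211_PAT_PUBLN holds one row per publication; CITY is a hypothetical column name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls211_pat_publn (PAT_PUBLN_ID INTEGER, PUBLN_AUTH TEXT, APPLN_ID INTEGER);
CREATE TABLE tls206_ascii (PERSON_ID INTEGER, APPLN_ID INTEGER, A_I_FLAG TEXT, CITY TEXT);
""")
conn.executemany("INSERT INTO tls211_pat_publn VALUES (?, ?, ?)",
                 [(100, "EP", 1), (101, "WO", 1), (102, "EP", 2)])
conn.executemany("INSERT INTO tls206_ascii VALUES (?, ?, ?, ?)",
                 [(10, 1, "I", "Espoo"), (11, 2, "I", ""), (12, 2, "A", "Oulu")])

# Distinct inventors and share (%) of empty CITY fields, grouped by the publishing office.
QUERY = """
WITH inv AS (  -- one row per (publication authority, inventor)
  SELECT DISTINCT p.PUBLN_AUTH, t6.PERSON_ID, t6.CITY
  FROM tls211_pat_publn p
  JOIN tls206_ascii t6 ON t6.APPLN_ID = p.APPLN_ID
  WHERE t6.A_I_FLAG = 'I'
)
SELECT PUBLN_AUTH, COUNT(*) AS inventors,
  ROUND(100.0 * SUM(CASE WHEN COALESCE(TRIM(CITY), '') = '' THEN 1 ELSE 0 END)
        / COUNT(*)) AS no_city
FROM inv
GROUP BY PUBLN_AUTH
ORDER BY inventors DESC;
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Here inventor 10 is counted under both EP and WO because application 1 has a publication at each office; the applicant (A_I_FLAG = 'A') is left out.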
