Thursday, October 28, 2010

USPTO persons data quality

In a previous post I analyzed the quality of the geographic data on inventors in PATSTAT, across all the data it contains.
The results for the USPTO were not so good...


APPLN_AUTH  inventors  no state  no zip  no country  no address  no city
US          5960856    86%       98%     21%         97%         25%

To understand the trend across years (whether data quality is improving or the noise is spread all over the data), I selected the applicants/inventors of applications of type “A” (patents of invention) filed after 31/12/1999;
the result was that about 1% have no country and about 2% have no city (so it may be impossible to assign them a region/NUTS2 code).



distinct appl/inv  2774408
ctry code blank    29858    1,08%
city blank         56119    2,02%
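
As a quick check of the arithmetic, the two shares can be recomputed from the counts above. This is only a minimal sketch: the helper name `pct` is made up for illustration, and the numbers are the ones reported in this post.

```python
def pct(part, total):
    """Return part/total as a percentage rounded to two decimals."""
    return round(100 * part / total, 2)

# Counts reported in this post.
total_appl_inv = 2774408   # distinct applicants/inventors
blank_country = 29858      # country code blank
blank_city = 56119         # city blank

country_share = pct(blank_country, total_appl_inv)
city_share = pct(blank_city, total_appl_inv)

print("country blank:", country_share, "%")
print("city blank:", city_share, "%")
```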

If we check, by inventor/applicant country, whether some nations are biased (only for countries with more than 100 inventors/applicants), we see that the clearest outliers (more than 10% empty cities: BB, BM, CR, KY and VG) are countries outside Europe, so they are not covered by regional classifications like NUTS2 anyway.


CTRY_CODE COUNT EMPTY_CITY %
'' 29858 29071 97,40%
'AE' 166 2 1,20%
'AR' 1265 21 1,70%
'AT' 10330 117 1,10%
'AU' 22688 499 2,20%
'BB' 181 21 11,60%
'BE' 12980 230 1,80%
'BG' 428 5 1,20%
'BM' 249 39 15,70%
'BR' 4137 54 1,30%
'BS' 137 3 2,20%
'BY' 163 2 1,20%
'CA' 70324 800 1,10%
'CH' 23982 489 2,00%
'CL' 558 14 2,50%
'CN' 29490 180 0,60%
'CO' 251 5 2,00%
'CR' 148 20 13,50%
'CU' 547 7 1,30%
'CY' 156 5 3,20%
'CZ' 1297 34 2,60%
'DE' 178725 2477 1,40%
'DK' 10973 204 1,90%
'EE' 225 0 0,00%
'EG' 228 0 0,00%
'ES' 10369 151 1,50%
'FI' 13552 123 0,90%
'FR' 67931 881 1,30%
'GB' 76473 1480 1,90%
'GR' 801 14 1,70%
'HK' 5289 128 2,40%
'HR' 378 6 1,60%
'HU' 2316 14 0,60%
'ID' 219 8 3,70%
'IE' 5163 60 1,20%
'IL' 25108 273 1,10%
'IN' 18600 189 1,00%
'IR' 236 5 2,10%
'IS' 414 9 2,20%
'IT' 31398 516 1,60%
'JP' 479365 3346 0,70%
'KR' 115483 978 0,80%
'KW' 162 0 0,00%
'KY' 255 39 15,30%
'LI' 376 21 5,60%
'LT' 149 2 1,30%
'LU' 815 55 6,70%
'LV' 115 0 0,00%
'MC' 113 6 5,30%
'MX' 2341 25 1,10%
'MY' 2666 24 0,90%
'NL' 28215 972 3,40%
'NO' 5912 106 1,80%
'NZ' 3875 141 3,60%
'PH' 742 3 0,40%
'PK' 135 0 0,00%
'PL' 1139 5 0,40%
'PT' 735 8 1,10%
'RO' 328 0 0,00%
'RU' 6956 40 0,60%
'SA' 516 3 0,60%
'SE' 23806 343 1,40%
'SG' 8106 98 1,20%
'SI' 567 4 0,70%
'SK' 277 0 0,00%
'TH' 797 18 2,30%
'TR' 618 3 0,50%
'TW' 113480 939 0,80%
'UA' 844 1 0,10%
'US' 1311956 10561 0,80%
'VE' 456 3 0,70%
'VG' 430 48 11,20%
'YU' 103 0 0,00%
'ZA' 2572 68 2,60%
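
The per-country comparison can be sketched as follows. This is a hedged illustration, not the query actually used: the names `ROWS`, `empty_city_share` and `countries_above` are hypothetical, and only a handful of rows are copied from the table above.

```python
# (count, empty city) pairs copied from a few rows of the table above.
ROWS = {
    "BB": (181, 21),
    "BM": (249, 39),
    "CR": (148, 20),
    "DE": (178725, 2477),
    "JP": (479365, 3346),
    "KY": (255, 39),
    "NL": (28215, 972),
    "US": (1311956, 10561),
    "VG": (430, 48),
}

# Overall share of blank cities: 56119 out of 2774408 (about 2,02%).
OVERALL_SHARE = 100 * 56119 / 2774408


def empty_city_share(count, empty):
    """Percentage of inventors/applicants with an empty city."""
    return 100 * empty / count


def countries_above(threshold, min_count=100):
    """Country codes with more than min_count persons and a share above threshold."""
    return sorted(
        code
        for code, (count, empty) in ROWS.items()
        if count > min_count and empty_city_share(count, empty) > threshold
    )


above_average = countries_above(OVERALL_SHARE)
above_ten_percent = countries_above(10.0)
print("above the overall average:", above_average)
print("above 10%:", above_ten_percent)
```

On this sample, NL is also above the overall average, while the shares above 10% all belong to countries outside Europe.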

Friday, October 22, 2010

WIPO and PATSTAT (III): patents entering national and regional phase

As stated in part (I) of this topic, WO patents pass through two phases: international and national (or regional, in the EP case).

In this post I'll tackle the issue of how to identify the PCT applications that entered the latter phase.


I'll refer to the case of Euro-PCT applications, but the same approach may also be valid for any other patent office.

We already know that by selecting Appln_auth = ‘EP’, Appln_kind = ‘W’, Publn_auth = ‘WO’, we gather the PCT applications in the international phase that were filed at the EPO.

PCT applications that have entered the European regional phase will have one or more corresponding EP applications whose internat_appln_id refers to the original PCT application.
So, you have to select all EPO applications (Appln_auth = ‘EP’) that have an Internat_appln_id filled in (the PATSTAT team says it’s exhaustive for the EPO).

Mind that the application kind will be ‘A’, not ‘W’, since these have been validated as EPO applications (entry into the EPO regional phase).
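
The rules above can be sketched as a small classification function. This is a minimal sketch under stated assumptions: the function name `pct_phase` and the constant `REGIONAL_OFFICES` are hypothetical, an unfilled internat_appln_id is assumed to be stored as 0 or None, and only the EP case discussed here is treated as "regional".

```python
# Assumption: only the EPO is treated as a regional office here,
# because that is the only case this post covers.
REGIONAL_OFFICES = {"EP"}


def pct_phase(appln_auth, appln_kind, internat_appln_id):
    """Classify an application row by PCT phase.

    Assumption: an unfilled internat_appln_id is stored as 0 or None.
    """
    if appln_kind == "W":
        # The PCT application itself, in the international phase.
        return "international"
    if internat_appln_id:
        # The row points back to a PCT application: it entered a later phase.
        return "regional" if appln_auth in REGIONAL_OFFICES else "national"
    return "not PCT-derived"


print(pct_phase("EP", "W", 0))         # PCT application filed at the EPO
print(pct_phase("EP", "A", 15808675))  # EP application derived from a PCT
print(pct_phase("EP", "A", 0))         # direct EP filing
```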


Let’s consider this Euro-PCT application:


APPLN_ID  APPLN_AUTH  APPLN_NR           APPLN_KIND  APPLN_FILING_DATE  IPR_TYPE
15808675  'EP'        '        9901446'  'W'         '1999-03-05'       'PI'


It gives rise to this publication:

PAT_PUBLN_ID  PUBLN_AUTH  PUBLN_NR           PUBLN_KIND  APPLN_ID  PUBLN_DATE
17714675      'WO'        '        9944650'  'A1'        15808675  '1999-09-10'
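
The international-phase selection (Appln_auth = ‘EP’, Appln_kind = ‘W’, Publn_auth = ‘WO’) can be tried on this single example with an in-memory SQLite database. This is only a sketch: the table names `tls201_appln` and `tls211_pat_publn` and the reduced column lists are assumptions made for illustration, and the padding spaces in the number fields are dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Assumed, simplified versions of the application and publication tables.
conn.executescript("""
CREATE TABLE tls201_appln (
    appln_id   INTEGER PRIMARY KEY,
    appln_auth TEXT,
    appln_nr   TEXT,
    appln_kind TEXT
);
CREATE TABLE tls211_pat_publn (
    pat_publn_id INTEGER PRIMARY KEY,
    publn_auth   TEXT,
    publn_nr     TEXT,
    publn_kind   TEXT,
    appln_id     INTEGER,
    publn_date   TEXT
);
""")

# The application and its publication, as shown in this post.
conn.execute(
    "INSERT INTO tls201_appln VALUES (15808675, 'EP', '9901446', 'W')"
)
conn.execute(
    "INSERT INTO tls211_pat_publn "
    "VALUES (17714675, 'WO', '9944650', 'A1', 15808675, '1999-09-10')"
)

# PCT applications filed at the EPO that are in the international phase.
international_phase = conn.execute("""
    SELECT a.appln_id, p.publn_auth, p.publn_nr, p.publn_kind
    FROM tls201_appln AS a
    JOIN tls211_pat_publn AS p ON p.appln_id = a.appln_id
    WHERE a.appln_auth = 'EP'
      AND a.appln_kind = 'W'
      AND p.publn_auth = 'WO'
""").fetchall()

print(international_phase)
```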

Now we look up this appln_id in the internat_appln_id column of TLS201 and we get this:


APPLN_ID  APPLN_AUTH  APPLN_NR           APPLN_KIND  APPLN_FILING_DATE  IPR_TYPE  INTERNAT_APPLN_ID
1289254   'AU'        '        3328799'  'D'         '1999-03-05'       'PI'      15808675
15821876  'EP'        '       00104638'  'A'         '2000-03-03'       'PI'      15808675
52820607  'US'        '       51888700'  'A'         '2000-03-06'       'PI'      15808675

This means that the international application gave rise to 2 national phase applications (AU and US) and 1 regional phase application (EP).
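
The cross check can be reproduced with a self-join on an in-memory SQLite copy of these four rows. Again, this is a sketch under assumptions: the table name `tls201_appln` is assumed, the internat_appln_id of the PCT application itself is assumed to be 0, and the padding spaces in appln_nr are dropped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tls201_appln (
        appln_id          INTEGER PRIMARY KEY,
        appln_auth        TEXT,
        appln_nr          TEXT,
        appln_kind        TEXT,
        appln_filing_date TEXT,
        internat_appln_id INTEGER
    )
""")

# Rows taken from the tables in this post; the 0 in the first row is an assumption.
rows = [
    (15808675, "EP", "9901446", "W", "1999-03-05", 0),
    (1289254, "AU", "3328799", "D", "1999-03-05", 15808675),
    (15821876, "EP", "00104638", "A", "2000-03-03", 15808675),
    (52820607, "US", "51888700", "A", "2000-03-06", 15808675),
]
conn.executemany("INSERT INTO tls201_appln VALUES (?, ?, ?, ?, ?, ?)", rows)

# Applications whose internat_appln_id points back to the PCT application.
followups = conn.execute(
    """
    SELECT n.appln_auth, n.appln_kind, n.appln_id
    FROM tls201_appln AS w
    JOIN tls201_appln AS n ON n.internat_appln_id = w.appln_id
    WHERE w.appln_kind = 'W' AND w.appln_id = ?
    ORDER BY n.appln_auth
    """,
    (15808675,),
).fetchall()

regional = [row for row in followups if row[0] == "EP"]
national = [row for row in followups if row[0] != "EP"]
print("regional phase:", regional)
print("national phase:", national)
```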

See also this patent on espacenet along with its patent family.

Thanks to Catalina Martinez, Helene Dernis and Geert Boedt for clarifying the issue for me.
NOTE: I know I should have posted part II before part III but I swear next week I'll post it...