Thursday, October 28, 2010

USPTO persons data quality

In a previous post I analyzed the geographic data quality of inventors in PATSTAT, across all the data it contains.
The results for USPTO were not so good...

APPLN_AUTH  inventors  no state  no zip  no country  no address  no city
US          5960856    86%       98%     21%         97%         25%

To understand the trend over the years (whether data quality is improving or the noise is spread all over the data), I selected the applicants/inventors who filed applications of type “A” (patents of invention) after 31/12/1999;
the result was that about 1% have no country and about 2% have no city (so it may be impossible to assign them a region/NUTS2 code).

distinct appl/inv 2774408
ctry code blank 29858 1,08%
city blank 56119 2,02%
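
As a quick check of the arithmetic, the two percentages above can be recomputed from the raw counts. This is only a minimal sketch: the helper name blank_share is hypothetical and not part of PATSTAT, and the counts are the ones reported in this post.

```python
def blank_share(blank: int, total: int) -> float:
    """Share of records with a blank field, as a percentage rounded to 2 decimals."""
    return round(100.0 * blank / total, 2)


# Counts taken from the table above.
total_appl_inv = 2774408
ctry_blank = 29858
city_blank = 56119

ctry_share = blank_share(ctry_blank, total_appl_inv)
city_share = blank_share(city_blank, total_appl_inv)

print(f"ctry code blank: {ctry_share}%")
print(f"city blank: {city_share}%")
```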

If we break the blank-city figures down by inventor/applicant country, to check whether some nations show a bias (only for countries with more than 100 inventors/applicants), we see that the countries with values well above the average (BB, BM, CR, KY and VG, all above 10%) are outside Europe, and so are not covered by regional classifications such as NUTS2.

Columns: ctry code, distinct appl/inv, city blank, % city blank
'' 29858 29071 97,40%
'AE' 166 2 1,20%
'AR' 1265 21 1,70%
'AT' 10330 117 1,10%
'AU' 22688 499 2,20%
'BB' 181 21 11,60%
'BE' 12980 230 1,80%
'BG' 428 5 1,20%
'BM' 249 39 15,70%
'BR' 4137 54 1,30%
'BS' 137 3 2,20%
'BY' 163 2 1,20%
'CA' 70324 800 1,10%
'CH' 23982 489 2,00%
'CL' 558 14 2,50%
'CN' 29490 180 0,60%
'CO' 251 5 2,00%
'CR' 148 20 13,50%
'CU' 547 7 1,30%
'CY' 156 5 3,20%
'CZ' 1297 34 2,60%
'DE' 178725 2477 1,40%
'DK' 10973 204 1,90%
'EE' 225 0 0,00%
'EG' 228 0 0,00%
'ES' 10369 151 1,50%
'FI' 13552 123 0,90%
'FR' 67931 881 1,30%
'GB' 76473 1480 1,90%
'GR' 801 14 1,70%
'HK' 5289 128 2,40%
'HR' 378 6 1,60%
'HU' 2316 14 0,60%
'ID' 219 8 3,70%
'IE' 5163 60 1,20%
'IL' 25108 273 1,10%
'IN' 18600 189 1,00%
'IR' 236 5 2,10%
'IS' 414 9 2,20%
'IT' 31398 516 1,60%
'JP' 479365 3346 0,70%
'KR' 115483 978 0,80%
'KW' 162 0 0,00%
'KY' 255 39 15,30%
'LI' 376 21 5,60%
'LT' 149 2 1,30%
'LU' 815 55 6,70%
'LV' 115 0 0,00%
'MC' 113 6 5,30%
'MX' 2341 25 1,10%
'MY' 2666 24 0,90%
'NL' 28215 972 3,40%
'NO' 5912 106 1,80%
'NZ' 3875 141 3,60%
'PH' 742 3 0,40%
'PK' 135 0 0,00%
'PL' 1139 5 0,40%
'PT' 735 8 1,10%
'RO' 328 0 0,00%
'RU' 6956 40 0,60%
'SA' 516 3 0,60%
'SE' 23806 343 1,40%
'SG' 8106 98 1,20%
'SI' 567 4 0,70%
'SK' 277 0 0,00%
'TH' 797 18 2,30%
'TR' 618 3 0,50%
'TW' 113480 939 0,80%
'UA' 844 1 0,10%
'US' 1311956 10561 0,80%
'VE' 456 3 0,70%
'VG' 430 48 11,20%
'YU' 103 0 0,00%
'ZA' 2572 68 2,60%
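
The comparison against the average can be sketched as follows. This is an illustrative example, not the query actually used for this post: it takes a handful of rows copied from the table above, uses the overall 2,02% blank-city share as the threshold, and the function name flag_above_average is hypothetical.

```python
# (country code) -> (distinct appl/inv, city blank), copied from the table above.
rows = {
    "BM": (249, 39),
    "KY": (255, 39),
    "LU": (815, 55),
    "NL": (28215, 972),
    "DE": (178725, 2477),
    "US": (1311956, 10561),
}

OVERALL_SHARE = 2.02  # overall % of appl/inv with a blank city, from this post
MIN_PERSONS = 100     # only countries with more than 100 appl/inv are considered


def flag_above_average(data, threshold=OVERALL_SHARE, min_persons=MIN_PERSONS):
    """Return (code, share %) pairs above the threshold, highest share first."""
    flagged = []
    for code, (persons, blank) in data.items():
        if persons <= min_persons:
            continue  # too few records to say anything about a bias
        share = 100.0 * blank / persons
        if share > threshold:
            flagged.append((code, share))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


flagged = flag_above_average(rows)
for code, share in flagged:
    print(f"{code}: {share:.2f}%")
```

With these sample rows, DE and US fall below the threshold, while the small offshore jurisdictions come out on top.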
