Friday, April 23, 2010

Investigating patstat through application numbers

When investigating patstat by application number / application office, we soon run into a problem, because this pair is not a unique identifier.

For instance, if we look up application number 143781 in espacenet, we get the following 7 results:

1    Molding for a front glass
    Publication info: US5553428 (A) - 1996-09-10
2    Display tab for overwrapped package
    Publication info: US4779733 (A) - 1988-10-25
3    Welding socket for thermoplastic materials, method and apparatus for its manufacture.
    Publication info: DK143781 (A) - 1981-10-01
4    WIRBELBETTVERFAHREN ZUR DURCHFUEHRUNG VON ENDOTHERMEN CHEMISCHEN UND/ODER PHYSIKALISCHEN PROZESSEN
    Publication info: BE814393 (A1) - 1974-08-16
5    Information handling apparatus
    Publication info: US3248560 (A) - 1966-04-26
6    Magnetic unit
    Publication info: US2163161 (A) - 1939-06-20
7    System of gas control
    Publication info: US1348379 (A) - 1920-08-03

So far nothing bad, apart from the fact that espacenet application numbers in many cases have the application year (and occasionally a 0) added; so if you look for patstat BE 143781 (that is BE814393 in the example above), you'll find on the espacenet web page

Application number: BE19740143781 19740430  

But looking more closely at the results, we discover that we have 5 hits for the USPTO, which (using the same mechanism) correspond to

US19930143781 19931101
US19880143781 19880114 
US19610143781 19611009
US19370143781 19370520 
US19170143781 19170122
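
The mechanism above can be sketched in a few lines of Python. This is only an illustration inferred from the examples in this post: the function name `espacenet_style` and the padding width of 7 digits are assumptions, and espacenet does not necessarily apply the same rule to every office or period.

```python
def espacenet_style(auth: str, filing_date: str, appln_nr: str, width: int = 7) -> str:
    """Build an espacenet-like application number from patstat-like parts.

    Assumption: office code + filing year + number left-padded with zeros
    to `width` digits, followed by the filing date (YYYYMMDD).
    """
    year = filing_date[:4]
    return f"{auth}{year}{appln_nr.zfill(width)} {filing_date}"


# The Belgian application and one of the US hits from the examples above
print(espacenet_style("BE", "19740430", "143781"))  # BE19740143781 19740430
print(espacenet_style("US", "19931101", "143781"))  # US19930143781 19931101
```

Once the year is included, the five US applications become distinguishable again, which is exactly the information the bare authority/number pair lacks.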

So how widespread is the duplication of application numbers?
Running a simple query like the one below gives some results...

-- Application numbers that occur more than once within the same office
Select
  a.APPLN_AUTH,
  a.APPLN_NR,
  Count(a.APPLN_ID) As appcount
From
  patstat.tls201_appln a
Group By
  a.APPLN_AUTH, a.APPLN_NR
Having
  Count(a.APPLN_ID) > 1
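
To try the query without a patstat installation, it can be run against a tiny in-memory SQLite table. This is just a sketch: the rows are invented for illustration, and the table has only the three columns the query needs.

```python
import sqlite3

# Toy stand-in for patstat.tls201_appln; all rows are invented.
rows = [
    (1, "US", "143781"),
    (2, "US", "143781"),
    (3, "US", "143781"),
    (4, "BE", "143781"),
    (5, "DK", "143781"),
    (6, "DK", "143781"),
    (7, "EP", "000001"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls201_appln ("
    "appln_id INTEGER PRIMARY KEY, appln_auth TEXT, appln_nr TEXT)"
)
conn.executemany("INSERT INTO tls201_appln VALUES (?, ?, ?)", rows)

# Same grouping as the query above: one row per (office, number) pair
# that is shared by more than one application.
duplicates = conn.execute(
    """
    SELECT appln_auth, appln_nr, COUNT(appln_id) AS appcount
    FROM tls201_appln
    GROUP BY appln_auth, appln_nr
    HAVING COUNT(appln_id) > 1
    ORDER BY appln_auth
    """
).fetchall()

for auth, nr, appcount in duplicates:
    print(auth, nr, appcount)
```

With these toy rows only the DK and US pairs are reported; BE and EP each occur once and are filtered out by the Having clause.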


What we get is that, out of a table counting 63.595.305 applications, 5.805.769 application numbers (authority / number pairs) occur more than once.
How many times? Let's aggregate the results a bit...



count of duplications    count of appl #
2                        5721284
3                        83472
4                        979
5                        29
6                        1
7                        3
9                        1
Total count:             5805769
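
One way to obtain such a distribution is to wrap the duplicate query in an outer grouping. The sketch below does this on an in-memory SQLite table with invented rows; it is not necessarily how the figures above were produced.

```python
import sqlite3

# Invented toy rows: (appln_id, appln_auth, appln_nr)
toy_rows = [
    (1, "US", "143781"), (2, "US", "143781"), (3, "US", "143781"),
    (4, "DK", "143781"), (5, "DK", "143781"),
    (6, "TW", "85219019"), (7, "TW", "85219019"),
    (8, "BE", "143781"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls201_appln ("
    "appln_id INTEGER PRIMARY KEY, appln_auth TEXT, appln_nr TEXT)"
)
conn.executemany("INSERT INTO tls201_appln VALUES (?, ?, ?)", toy_rows)

# Inner query: occurrences per (office, number) pair, duplicates only.
# Outer query: how many pairs share each occurrence count.
histogram = conn.execute(
    """
    SELECT appcount, COUNT(*) AS numbers
    FROM (
        SELECT COUNT(appln_id) AS appcount
        FROM tls201_appln
        GROUP BY appln_auth, appln_nr
        HAVING COUNT(appln_id) > 1
    ) AS per_number
    GROUP BY appcount
    ORDER BY appcount
    """
).fetchall()

for appcount, numbers in histogram:
    print(appcount, numbers)
```

Here two pairs occur twice and one pair occurs three times, so the output has the same two-column shape as the table above.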


(the lucky application number with 9 occurrences is TW85219019)

Obviously, by multiplying each count of duplications by the corresponding count of application numbers and summing up, we get
 11.697.081 applications involved in the issue...
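
The arithmetic can be checked quickly in Python using the figures from the distribution table; the variable names are just for illustration.

```python
# occurrences per application number -> count of application numbers
distribution = {2: 5721284, 3: 83472, 4: 979, 5: 29, 6: 1, 7: 3, 9: 1}

# Number of (office, number) pairs that occur more than once
duplicated_numbers = sum(distribution.values())

# Number of applications hiding behind those pairs
applications_involved = sum(
    occurrences * count for occurrences, count in distribution.items()
)

total_applications = 63595305
share = applications_involved / total_applications

print(duplicated_numbers)     # 5805769
print(applications_involved)  # 11697081
print(f"{share:.2%}")
```

In other words, roughly 18% of all applications in the table share their authority / number pair with at least one other application.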

Splitting by country and comparing with the total number of applications per country, we see that the share varies a lot from country to country: from 0,03% for CN or 0,06% for EP up to 50,73% for SE


appln_auth  applns_with_duplicated_nr  total_applns  %
AM 2 147 1,36%
AP 12 4742 0,25%
AR 408 79581 0,51%
AT 6334 1028408 0,62%
AU 683801 1553990 44,00%
BA 44 344 12,79%
BE 4639 642989 0,72%
BG 11611 53602 21,66%
BR 58551 496326 11,80%
BY 20 761 2,63%
CA 2390 2580140 0,09%
CH 8672 1055204 0,82%
CL 10 4453 0,22%
CN 826 2803867 0,03%
CS 1765 166312 1,06%
CU 26 2843 0,91%
CY 2 2621 0,08%
CZ 5617 66921 8,39%
DD 8937 266209 3,36%
DE 1506427 6792342 22,18%
DK 12118 429570 2,82%
EA 32 13114 0,24%
EC 2 4949 0,04%
EE 1298 6251 20,76%
EG 160 11450 1,40%
EM 4 4254 0,09%
EP 1441 2388584 0,06%
ES 195371 902689 21,64%
FI 20051 271371 7,39%
FR 104899 2917043 3,60%
GB 133141 3319126 4,01%
GR 194 97444 0,20%
GT 4 1301 0,31%
HK 34 69450 0,05%
HR 154 11297 1,36%
HU 17796 137061 12,98%
IB 8 65369 0,01%
ID 4561 14768 30,88%
IE 773 91248 0,85%
IL 1155 160970 0,72%
IN 358 66238 0,54%
IS 57 7797 0,73%
IT 161988 708724 22,86%
JP 7567660 16282860 46,48%
KE 28 1392 2,01%
KR 279459 1659443 16,84%
KZ 4 477 0,84%
LT 22 3651 0,60%
LU 34 68491 0,05%
LV 26 4835 0,54%
MA 4 10092 0,04%
MC 682 2789 24,45%
MD 554 4586 12,08%
MK 2 87 2,30%
MN 12 246 4,88%
MX 1262 162007 0,78%
MY 26 11106 0,23%
NL 17313 607873 2,85%
NO 1610 226939 0,71%
NZ 30 109638 0,03%
OA 100 12934 0,77%
PE 8 431 1,86%
PH 339 23261 1,46%
PL 2263 233730 0,97%
PT 112 81034 0,14%
RO 472 61095 0,77%
RU 2117 394084 0,54%
SE 421795 831377 50,73%
SG 595 51063 1,17%
SI 235 17619 1,33%
SK 614 23261 2,64%
SU 3977 1249050 0,32%
TR 837 42800 1,96%
TT 2 52 3,85%
TW 2956 369739 0,80%
UA 748 50213 1,49%
US 434521 11376401 3,82%
UY 66 6573 1,00%
VN 8 240 3,33%
WO 39 5065 0,77%
XH 2 1242 0,16%
YU 91 33687 0,27%
ZA 749 256542 0,29%
ZM 4 2742 0,15%
ZW 10 2909 0,34%


The table excludes 9779 applications from 86 smaller patent offices that have no duplications.
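
The percentage column is simply the first figure divided by the second. A minimal sketch that recomputes three rows of the table (the helper name `duplication_share` is made up for this example; the comma is used as decimal separator to match the table):

```python
def duplication_share(duplicated: int, total: int) -> str:
    """Return duplicated / total as a percentage with a decimal comma."""
    pct = 100 * duplicated / total
    return f"{pct:.2f}".replace(".", ",") + "%"


# (applications with a duplicated number, total applications), from the table
sample = {
    "CN": (826, 2803867),
    "EP": (1441, 2388584),
    "SE": (421795, 831377),
}

for auth, (duplicated, total) in sample.items():
    print(auth, duplication_share(duplicated, total))
```

This reproduces the 0,03%, 0,06% and 50,73% quoted for CN, EP and SE.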


Thanks also to Elena Verdolini for raising this issue and helping to investigate it.
