Tuesday, March 29, 2011

Disambiguation of inventors' names and addresses from patent data workshop

A workshop on “Disambiguation of inventors' names and addresses from patent data” will take place on May 30. It is organized by "Academic Patenting in Europe" (APE‐INV), a project funded by the European Science Foundation.

The meeting will be hosted by the Department of Informatics, Systems and Communication (DISCO) of the University of Milano-Bicocca.
It is meant to give computer scientists the chance to meet lead users of inventors’ data from the fields of economics and management of technology, and to exchange ideas on the methodologies and tractability of disambiguation problems.

For any query or information, or to confirm your attendance, please contact Dr. Monica Coffano (monica.coffano@unibocconi.it).

The program will be published soon on this web page.

Wednesday, March 9, 2011

How to get missing country code from homonyms in patstat applicants

In the October 2010 patstat edition, table TLS206 contains 37.428.107 distinct person ids (applicants or inventors). We would expect (or hope) them to have some geographic data apart from the name but, as stated in some previous posts, many of them lack all information except the name, which makes data quality improvement a little harder.
Exactly 13.032.871 persons (about 35% of the total) have no country code (and obviously, in most cases, no city, address, etc.).

This is the first of a few posts about how to find clusters of persons for which the quality of country data can be improved.

The first case I'll investigate is applicants with no country code.

Take, for instance, patent AP273, invented by BRUCE HOWARD DIXON [US] and applied for by HOWARD DIXON BRUCE (with no country).
If we look at the PDF of the patent, we see that Bruce, as applicant, is listed as living in Florida, US, and as inventor is listed as "SEE ABOVE". So we may presume that many applicants with no country, when they have a homonym among the inventors of the same application, can inherit the country code from that homonymous inventor.

We must, however, remove ambiguous cases, like this pair of US patents, A & B, where the same applicant (and person id), Abate Riccardo, appears as inventor first with country US and later with country IT.
So when creating the correction table we must remove persons that would receive more than one country code.

Applying this procedure in a simple way (I mean no standardization of names, just a plain string match), 472.668 persons with no country code (about 3,6% of those missing) can be assigned a country code.
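
The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions: the record layout (a dictionary of persons and a list of application links with a role), the function name and the person and application ids are all made up for the example and are not the actual patstat schema or the query used for the figures in this post. Names are compared by exact string match, and a person who would receive more than one country is dropped.

```python
from collections import defaultdict

def infer_missing_countries(persons, links):
    """persons: {person_id: (name, country or None)}  (hypothetical layout)
    links: iterable of (appln_id, person_id, role), role is "applicant" or "inventor".
    Returns {person_id: country} for applicants that had no country code."""
    # For each application, map inventor name -> set of known country codes.
    inventor_ctry = defaultdict(lambda: defaultdict(set))
    for appln_id, person_id, role in links:
        name, ctry = persons[person_id]
        if role == "inventor" and ctry:
            inventor_ctry[appln_id][name].add(ctry)

    # Collect every country an applicant without country could inherit
    # from a homonymous inventor of the same application.
    candidates = defaultdict(set)
    for appln_id, person_id, role in links:
        name, ctry = persons[person_id]
        if role == "applicant" and not ctry:
            candidates[person_id] |= inventor_ctry[appln_id].get(name, set())

    # Keep only unambiguous persons: exactly one candidate country.
    return {pid: next(iter(c)) for pid, c in candidates.items() if len(c) == 1}

# Toy records; all ids are invented for illustration.
persons = {
    1: ("DIXON BRUCE HOWARD", None),
    2: ("DIXON BRUCE HOWARD", "US"),
    3: ("ABATE RICCARDO", None),
    4: ("ABATE RICCARDO", "US"),
    5: ("ABATE RICCARDO", "IT"),
}
links = [
    (100, 1, "applicant"), (100, 2, "inventor"),
    (200, 3, "applicant"), (200, 4, "inventor"),
    (201, 3, "applicant"), (201, 5, "inventor"),
]
corrections = infer_missing_countries(persons, links)
print(corrections)
```

In this toy data person 1 inherits a country from the homonymous inventor, while person 3 is left out because the two applications point to different countries.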

Monday, March 7, 2011

Hidden non patent references

This is no more than a note, born from an email exchange with Éric Archambault from Science Metrix.
He raised a question: does patstat count ALL the non-patent references listed in a patent?
 
Let's take as an example USPTO patent 6365360:
Table TLS214 lists 38 NPL references, those listed on the first pages of the patent.
It does not count the ones 'hidden' in the body of the patent. Here are two of them:

Page 15:

These synthetic mRNAs are injected into Xenopus oocytes (stage 5-6) by standard procedures [Gurdon, J. B. and Wickens, M. D. Methods in Enzymol. 101: 370-386, (1983)]. Oocytes are harvested and analyzed for IP expression as described below.

Page 16:
and 9016 or other CMV promoter vectors, co-transfected with pDLAT-3 containing the thymidine kinase gene [Colbere and Garopin, F., Proc. Natl. Acad. Sci. 76: 3755 (1979)] in APRT and TK deficient L cells, selected in APRT (0.05 mM azaserine, 0.1 mM adenine, 4 ug/ml adenosine) and amplified with HAT (100 uM hypoxanthine, 0.4 uM aminopterin, 16 uM thymidine).

So the answer is no: patstat table TLS214 contains only the references listed in the 'references cited' section.
This means that a reference made by the applicant in the body of the patent is listed only if the examiner believes it is relevant; otherwise it is not.
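
As a rough illustration of what such 'hidden' references look like to a program, here is a small Python sketch that pulls bracketed passages ending in a parenthesised year out of a piece of body text. The pattern and the function name are my own assumptions, tuned only to the two excerpts quoted above; real patents cite literature in many other styles, so this is a sketch and not a reliable extractor.

```python
import re

# A bracketed passage that ends with a four-digit year in parentheses,
# e.g. "[... Methods in Enzymol. 101: 370-386, (1983)]".
IN_TEXT_REF = re.compile(r"\[([^\[\]]+?\(\d{4}\))\]")

def find_in_text_references(body):
    """Return the bracketed in-text references found in the body text."""
    return [ref.strip() for ref in IN_TEXT_REF.findall(body)]

body = (
    "These synthetic mRNAs are injected into Xenopus oocytes (stage 5-6) "
    "by standard procedures [Gurdon, J. B. and Wickens, M. D. Methods in "
    "Enzymol. 101: 370-386, (1983)]. Oocytes are harvested and analyzed. "
    "co-transfected with pDLAT-3 containing the thymidine kinase gene "
    "[Colbere and Garopin, F., Proc. Natl. Acad. Sci. 76: 3755 (1979)] in "
    "APRT and TK deficient L cells"
)
refs = find_in_text_references(body)
for ref in refs:
    print(ref)
```

Comparing the strings found this way with the entries of TLS214 for the same patent would be the next step, but that matching is a separate problem.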

Thursday, March 3, 2011

ECLA coverage in Patstat

As previously stated, a new table (TLS217) has recently been changed to give users access to ECLA (European classification system) and ICO data. (Be aware that Espacenet shows only the label ECLA, because EPO application management decided that using the term ICO would only create confusion.)

I calculated a time series of the ECLA coverage rate for some selected application authorities (coverage never reaches 100%, due to D2 applications and unpublished priorities).


ctry  year    app#  ecla app#   rate
AU    1990   36251      30301  83,6%
AU    1991   34788      27117  77,9%
AU    1992   33847      27308  80,7%
AU    1993   35066      28061  80,0%
AU    1994   47278      34999  74,0%
AU    1995   51287      38489  75,0%
AU    1996   58047      45081  77,7%
AU    1997   65698      52721  80,2%
AU    1998   74505      59884  80,4%
AU    1999   82800      66252  80,0%
AU    2000  100118      82115  82,0%
AU    2001  108967      91214  83,7%
AU    2002   61498      47992  78,0%
AU    2003  108648      97395  89,6%
AU    2004   30908      22849  73,9%
AU    2005   32393      24428  75,4%
AU    2006   32508      24819  76,3%
AU    2007   31329      24315  77,6%
AU    2008   28592      23256  81,3%
AU    2009    8106       6756  83,3%
AU    2010    1707       1364  79,9%
BR    1990   10851       5385  49,6%
BR    1991   10122       4740  46,8%
BR    1992    9102       4683  51,5%
BR    1993   10272       5240  51,0%
BR    1994   10992       6160  56,0%
BR    1995   13554       7637  56,3%
BR    1996   15569       9701  62,3%
BR    1997   18562      11974  64,5%
BR    1998   19018      13194  69,4%
BR    1999   20987      14292  68,1%
BR    2000   20698      14020  67,7%
BR    2001   20588      13730  66,7%
BR    2002   19239      12435  64,6%
BR    2003   20878      13580  65,0%
BR    2004   22811      15196  66,6%
BR    2005   23922      16625  69,5%
BR    2006   13414       6627  49,4%
BR    2007    9197       2639  28,7%
BR    2008    7340       2201  30,0%
BR    2009    1712       1141  66,6%
BR    2010      11          2  18,2%
CA    1990   32986      28984  87,9%
CA    1991   31437      28248  89,9%
CA    1992   30629      28291  92,4%
CA    1993   28982      27070  93,4%
CA    1994   30365      27363  90,1%
CA    1995   30908      27382  88,6%
CA    1996   33304      29800  89,5%
CA    1997   36219      33416  92,3%
CA    1998   40014      37617  94,0%
CA    1999   41987      39584  94,3%
CA    2000   44197      41784  94,5%
CA    2001   43170      41082  95,2%
CA    2002   41726      39841  95,5%
CA    2003   42094      40245  95,6%
CA    2004   41836      39672  94,8%
CA    2005   43828      41964  95,7%
CA    2006   43777      42042  96,0%
CA    2007   40618      38841  95,6%
CA    2008   32917      31095  94,5%
CA    2009    7582       5679  74,9%
CA    2010     286         79  27,6%
CN    1990   32090       3993  12,4%
CN    1991   37717       3709   9,8%
CN    1992   48833       4368   8,9%
CN    1993   54913       7473  13,6%
CN    1994   57316      12462  21,7%
CN    1995   60234      15948  26,5%
CN    1996   67849      18995  28,0%
CN    1997   73307      22322  30,5%
CN    1998   79968      26654  33,3%
CN    1999   91154      28843  31,6%
CN    2000  112362      34774  30,9%
CN    2001  133048      41360  31,1%
CN    2002  165720      50178  30,3%
CN    2003  205557      62185  30,3%
CN    2004  235189      72010  30,6%
CN    2005  287662      81866  28,5%
CN    2006  341493      88081  25,8%
CN    2007  382948      86849  22,7%
CN    2008  404476      57082  14,1%
CN    2009  217326      15741   7,2%
CN    2010     168          9   5,4%
DE    1990  120133     109406  91,1%
DE    1991  119845     110519  92,2%
DE    1992  123828     113161  91,4%
DE    1993  127957     115675  90,4%
DE    1994  132197     119448  90,4%
DE    1995  135744     122253  90,1%
DE    1996  146343     131170  89,6%
DE    1997  154198     137603  89,2%
DE    1998  160431     142636  88,9%
DE    1999  164069     146394  89,2%
DE    2000  167977     152133  90,6%
DE    2001  160504     148403  92,5%
DE    2002  145041     134575  92,8%
DE    2003  134623     126893  94,3%
DE    2004  111554     103906  93,1%
DE    2005  105002      96516  91,9%
DE    2006   95404      89233  93,5%
DE    2007   83663      77257  92,3%
DE    2008   73819      61864  83,8%
DE    2009   25269      18843  74,6%
DE    2010    2800        894  31,9%
EP    1990   66699      66125  99,1%
EP    1991   62947      61982  98,5%
EP    1992   64930      63558  97,9%
EP    1993   65083      63563  97,7%
EP    1994   67463      65647  97,3%
EP    1995   71797      69406  96,7%
EP    1996   78950      76017  96,3%
EP    1997   89734      86132  96,0%
EP    1998  101724      97197  95,5%
EP    1999  111765     106112  94,9%
EP    2000  125923     119752  95,1%
EP    2001  134235     127466  95,0%
EP    2002  132311     124389  94,0%
EP    2003  137230     127106  92,6%
EP    2004  145312     135039  92,9%
EP    2005  154398     143605  93,0%
EP    2006  160288     149421  93,2%
EP    2007  160275     152546  95,2%
EP    2008  139610     130010  93,1%
EP    2009   54378      49541  91,1%
EP    2010    1537       1132  73,6%
FR    1990   15926      15858  99,6%
FR    1991   15857      15768  99,4%
FR    1992   15393      15316  99,5%
FR    1993   15505      15408  99,4%
FR    1994   15835      15725  99,3%
FR    1995   15955      15840  99,3%
FR    1996   16781      16592  98,9%
FR    1997   17782      17527  98,6%
FR    1998   18132      17842  98,4%
FR    1999   18586      18190  97,9%
FR    2000   19508      19095  97,9%
FR    2001   19654      19283  98,1%
FR    2002   19643      19208  97,8%
FR    2003   19308      18902  97,9%
FR    2004   19587      19032  97,2%
FR    2005   19777      19125  96,7%
FR    2006   19666      19111  97,2%
FR    2007   19399      18952  97,7%
FR    2008   19001      17569  92,5%
FR    2009    5571       4769  85,6%
FR    2010     166         95  57,2%
GB    1990   30053      14328  47,7%
GB    1991   30028      14125  47,0%
GB    1992   30290      14229  47,0%
GB    1993   30275      14332  47,3%
GB    1994   29560      14127  47,8%
GB    1995   29909      14785  49,4%
GB    1996   30448      15322  50,3%
GB    1997   31218      16147  51,7%
GB    1998   32828      16272  49,6%
GB    1999   35225      17287  49,1%
GB    2000   36994      17977  48,6%
GB    2001   36881      18722  50,8%
GB    2002   36315      18265  50,3%
GB    2003   35451      17954  50,6%
GB    2004   33788      17439  51,6%
GB    2005   30984      16693  53,9%
GB    2006   30029      15875  52,9%
GB    2007   30069      16144  53,7%
GB    2008   27638      14671  53,1%
GB    2009   25665       8049  31,4%
GB    2010    8461        724   8,6%
JP    1990  504479      71179  14,1%
JP    1991  482500      70490  14,6%
JP    1992  465127      70228  15,1%
JP    1993  443212      70985  16,0%
JP    1994  358575      71132  19,8%
JP    1995  375762      77305  20,6%
JP    1996  385899      84277  21,8%
JP    1997  398658      88601  22,2%
JP    1998  410097      89572  21,8%
JP    1999  414393      94508  22,8%
JP    2000  443598     103576  23,3%
JP    2001  448569     105439  23,5%
JP    2002  434335     107597  24,8%
JP    2003  432789     113556  26,2%
JP    2004  443034     121599  27,4%
JP    2005  447845     130070  29,0%
JP    2006  428966     133705  31,2%
JP    2007  405234     123480  30,5%
JP    2008  356748      80185  22,5%
JP    2009   66685      34792  52,2%
JP    2010    3361        884  26,3%
US    1990  132065     107893  81,7%
US    1991  138309     110782  80,1%
US    1992  145817     115837  79,4%
US    1993  152252     121461  79,8%
US    1994  167353     138544  82,8%
US    1995  192446     161648  84,0%
US    1996  194519     165358  85,0%
US    1997  226188     192533  85,1%
US    1998  235994     195596  82,9%
US    1999  267241     211016  79,0%
US    2000  320254     235039  73,4%
US    2001  374760     301396  80,4%
US    2002  384385     305251  79,4%
US    2003  400703     318661  79,5%
US    2004  450492     355179  78,8%
US    2005  490460     393708  80,3%
US    2006  478678     380253  79,4%
US    2007  473942     367942  77,6%
US    2008  423425     295259  69,7%
US    2009  223980     157793  70,4%
US    2010   17724      12686  71,6%

Note that coverage changes over time and differs among patent authorities (see, for instance, Brazil or Japan in the chart below), and that it drops in the last two years, which may indicate a delay in loading this information into patstat.
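
For clarity, the rate column is simply the second count divided by the first. The short Python sketch below recomputes it for the last four German rows of the table; the function name is my own, and reading "ecla app#" as the number of applications with at least one ECLA class is my interpretation of the column header.

```python
def coverage_rate(applications, with_ecla):
    """ECLA coverage in percent, rounded to one decimal; None if there are no applications."""
    if applications == 0:
        return None
    return round(100.0 * with_ecla / applications, 1)

# (authority, year, app#, ecla app#) copied from the table above.
rows = [
    ("DE", 2007, 83663, 77257),
    ("DE", 2008, 73819, 61864),
    ("DE", 2009, 25269, 18843),
    ("DE", 2010, 2800, 894),
]
rates = {(ctry, year): coverage_rate(apps, ecla) for ctry, year, apps, ecla in rows}
for key, rate in rates.items():
    print(key, rate)
```

The same four rows also show the drop in the most recent years that is mentioned above.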