Monday, March 26, 2012

Some statistics about IPC coverage in national offices patent data from patstat

Patstat data have been mainlly used for statistic on EP, WO, USdata.
Recently interest on such data is reised about othen national patent autorities contained in the db.


But how good is data coverage for such patent offices? We look here how good is IPC coverage also related to the chance of  reclassifying applications IPCs.

Here I report # and % of applications with no IPC and no reclassification (IPC35, nace, TP stands for EC thematic priorities); timeframe 2000-2010; selection for all appln auth above 1000 applications
Aside from TP, that cannot catch all IPCS, we notice that data from UK, BE, FI, IL, IT cannot be reliable cause 50% of IPCs is missing.


APPS
IPC
no ipc35
no nace
no ipc
no TP
appln auth

#
%
#
%
#
%
#
%
#
%
AP
2914
2809
96,40%
105
3,60%
105
3,60%
105
3,60%
308
10,57%
AR
45726
19213
42,02%
26518
57,99%
26513
57,98%
26513
57,98%
28991
63,40%
AT
13357
12368
92,60%
990
7,41%
989
7,40%
989
7,40%
4938
36,97%
AU
358767
336850
93,89%
21939
6,12%
21917
6,11%
21917
6,11%
73570
20,51%
BE
5237
2748
52,47%
2489
47,53%
2489
47,53%
2489
47,53%
3387
64,67%
BG
1887
1879
99,58%
9
0,48%
8
0,42%
8
0,42%
326
17,28%
BR
129982
127711
98,25%
2286
1,76%
2271
1,75%
2271
1,75%
30178
23,22%
CA
361499
361355
99,96%
153
0,04%
144
0,04%
144
0,04%
57076
15,79%
CH
8015
8014
99,99%
1
0,01%
1
0,01%
1
0,01%
2717
33,90%
CL
3280
3201
97,59%
79
2,41%
79
2,41%
79
2,41%
407
12,41%
CN
1801197
1796750
99,75%
4499
0,25%
4447
0,25%
4447
0,25%
326649
18,14%
CO
5566
5456
98,02%
113
2,03%
110
1,98%
110
1,98%
872
15,67%
CZ
12774
12773
99,99%
1
0,01%
1
0,01%
1
0,01%
2828
22,14%
DE
639075
638924
99,98%
153
0,02%
151
0,02%
151
0,02%
169150
26,47%
DK
3399
3390
99,74%
9
0,26%
9
0,26%
9
0,26%
967
28,45%
DO
1424
552
38,76%
872
61,24%
872
61,24%
872
61,24%
907
63,69%
EA
16632
16623
99,95%
9
0,05%
9
0,05%
9
0,05%
1735
10,43%
EC
5178
5022
96,99%
163
3,15%
156
3,01%
156
3,01%
375
7,24%
EE
1397
1388
99,36%
9
0,64%
9
0,64%
9
0,64%
134
9,59%
EG
1990
1901
95,53%
89
4,47%
89
4,47%
89
4,47%
445
22,36%
EL
4595
4507
98,08%
88
1,92%
88
1,92%
88
1,92%
1096
23,85%
ES
22083
22044
99,82%
42
0,19%
39
0,18%
39
0,18%
5231
23,69%
FI
24154
12079
50,01%
12076
50,00%
12075
49,99%
12075
49,99%
15287
63,29%
FR
151527
151122
99,73%
408
0,27%
405
0,27%
405
0,27%
39387
25,99%
HK
28393
20252
71,33%
8147
28,69%
8141
28,67%
8141
28,67%
12347
43,49%
HR
4675
4659
99,66%
16
0,34%
16
0,34%
16
0,34%
806
17,24%
HU
15514
12073
77,82%
3441
22,18%
3441
22,18%
3441
22,18%
5193
33,47%
IE
4283
4242
99,04%
41
0,96%
41
0,96%
41
0,96%
784
18,30%
IL
57131
27315
47,81%
29822
52,20%
29816
52,19%
29816
52,19%
32023
56,05%
IS
2106
1987
94,35%
119
5,65%
119
5,65%
119
5,65%
185
8,78%
IT
96898
19883
20,52%
77016
79,48%
77015
79,48%
77015
79,48%
84551
87,26%
JP
3311535
3311283
99,99%
258
0,01%
252
0,01%
252
0,01%
829783
25,06%
KR
1272062
1253347
98,53%
18717
1,47%
18715
1,47%
18715
1,47%
317267
24,94%
LV
1422
616
43,32%
806
56,68%
806
56,68%
806
56,68%
920
64,70%
MA
6235
2759
44,25%
3476
55,75%
3476
55,75%
3476
55,75%
3836
61,52%
MD
2691
2627
97,62%
64
2,38%
64
2,38%
64
2,38%
375
13,94%
MX
104502
103820
99,35%
699
0,67%
682
0,65%
682
0,65%
19280
18,45%
MY
1131
1071
94,69%
65
5,75%
60
5,31%
60
5,31%
268
23,70%
NL
22183
22177
99,97%
6
0,03%
6
0,03%
6
0,03%
6357
28,66%
NO
43466
40429
93,01%
3037
6,99%
3037
6,99%
3037
6,99%
7994
18,39%
NZ
29009
25558
88,10%
3451
11,90%
3451
11,90%
3451
11,90%
6829
23,54%
OA
1251
1148
91,77%
103
8,23%
103
8,23%
103
8,23%
211
16,87%
PA
1722
1713
99,48%
9
0,52%
9
0,52%
9
0,52%
116
6,74%
PE
7733
7688
99,42%
45
0,58%
45
0,58%
45
0,58%
554
7,16%
PL
38432
38321
99,71%
111
0,29%
111
0,29%
111
0,29%
7355
19,14%
PT
1952
1950
99,90%
2
0,10%
2
0,10%
2
0,10%
470
24,08%
RO
3620
3620
100,00%
2
0,06%
0
0,00%
0
0,00%
804
22,21%
RS
3500
3434
98,11%
66
1,89%
66
1,89%
66
1,89%
553
15,80%
RU
279348
278938
99,85%
430
0,15%
410
0,15%
410
0,15%
47900
17,15%
SE
15817
15793
99,85%
24
0,15%
24
0,15%
24
0,15%
4551
28,77%
SG
19544
10416
53,30%
9128
46,70%
9128
46,70%
9128
46,70%
10260
52,50%
SI
2666
2662
99,85%
4
0,15%
4
0,15%
4
0,15%
919
34,47%
SK
4427
4425
99,95%
2
0,05%
2
0,05%
2
0,05%
632
14,28%
TR
4802
4398
91,59%
410
8,54%
404
8,41%
404
8,41%
1868
38,90%
TW
114054
112012
98,21%
2044
1,79%
2042
1,79%
2042
1,79%
25083
21,99%
UA
20200
20190
99,95%
12
0,06%
10
0,05%
10
0,05%
3059
15,14%
UK
292968
146057
49,85%
146912
50,15%
146911
50,15%
146911
50,15%
179753
61,36%
UY
6368
5466
85,84%
902
14,16%
902
14,16%
902
14,16%
1173
18,42%
ZA
50077
22879
45,69%
27235
54,39%
27198
54,31%
27198
54,31%
30177
60,26%

Thursday, March 22, 2012

Converting old appln_id to current version in patstat


Even if now Application id in Patstat is stable, it could be needed to link data from edictions previous to 2011 to current one.

So I built a table named APPLN_ID_BRIDGE with the goal of providing a mean to link data from PatStat editions published before april 2011 (that is, before a stable APPLN_ID was introduced) to current data. In particular, they link the available (with unstable APPLN_ID) of  the October 2010 ediction to data from the  October 2011 data (with stable APPLN_ID)

The table is structured with following fields:

Appln_id1    application id of first PATSTAT version
Appln_id2    application id of second PATSTAT version
Edition1       PATSTAT edition of first version (YYYYMM)
Edition2       PATSTAT edition of second version (YYYYMM)

where Edition1 < Edition2 in order to avoid duplication of records (and decrease table size).

Due to the fact that APPLN_ID is unique by application authority, application number and application kind, we can build a bridge table by matching the appln_id that share into different addiction same values for previously listed fields.

Here is an example of SQL building the table for October 2010, October 2011.

ALTER TABLE `PATSTAT`.`tls201_appln` ADD INDEX `Index_3`(`APPLN_AUTH`),
 ADD INDEX `Index_4`(`APPLN_NR`),
 ADD INDEX `Index_5`(`APPLN_KIND`);

ALTER TABLE `PATSTATold`.`tls201_appln` ADD INDEX `Index_3`(`APPLN_AUTH`),
 ADD INDEX `Index_4`(`APPLN_NR`),
 ADD INDEX `Index_5`(`APPLN_KIND`);

create table PATSTAT.appln_id_bridge
Select
  t1old.APPLN_ID As appln_id1,
  t1new.APPLN_ID As appln_id2,
  "201010" As edition1,
  "201110" As edition2
From
  PATSTATold.tls201_appln t1old Inner Join
  PATSTAT.tls201_appln t1new On t1old.APPLN_AUTH = t1new.APPLN_AUTH And
    t1old.APPLN_NR = t1new.APPLN_NR And t1old.APPLN_KIND = t1new.APPLN_KIND
Group By
  t1old.APPLN_ID, t1new.APPLN_ID, "201010", "201110"

ALTER TABLE `PATSTAT`.`tls201_appln` drop INDEX `Index_3`, drop INDEX `Index_4`, drop INDEX `Index_5`;
ALTER TABLE `PATSTATold`.`tls201_appln` drop INDEX `Index_3`, drop INDEX `Index_4`, drop INDEX `Index_5`;

Be aware not all appln_id from a previous version will be matched in following one. The main reason seem to be corrections. For example application kind codes. Corrections create a new appln_id and delete the old one.

Table witrh data can be downloaded from this link