Saturday, July 9, 2016

Mysql upload scripts for patstat 2016a

Following the release of patstat 2016a, at this link

https://dl.dropboxusercontent.com/u/3004945/rawpatentdata/patstat2016a.rar

it's possible to download my scripts for mysql allowing to upload the majority of tables from ediction 2016a of patstat;

Please note I uploaded the new TLS906 in TLS206 in order to make smoother the db steps that follow the import of patstat data.

Also changed CHARACTER SET to utf8mb4 COLLATE utf8mb4_general_ci following EPO suggestions.

Monday, May 30, 2016

Citation generating authority in Patstat

Table TLS212 in Patstat includes a filed named CITN_GENER_AUTH that is ment to contain, for those offices who have no examiners, the reference to the authority who genereate the citation.

Patstat data handbook describes it in this way:

Name: Identification of International Search Authority (ISA) for PCT search reports (incl. supplementary search reports)
Also Known As: n/a
Description: Country code identifying the patent authority performing the International Search Report.

DOCDB-XML contains the generating authority for examiner citations in WO publications. This field in DOCDB will be better populated using the data file provided by WIPO and shown in the usage example above. These fields will be loaded into column CITN_GENER_AUTH in PATSTAT table TLS212_CITATION.
The column CITN_GENER_AUTH will not be populated for other citations, only ISA ones.
If a WO publication has no citations by examiners, then the ISA will not be traceable. This is not a problem, as it only affects a small percentage of the total.

In reality , if we run a count of distinct CITN_GENER_AUTH over all patent origins we find:

APL 1
APP 1
CH2 1
EXA 1
FOP 1
ISR 19
OPP 1
PRS 1
SEA 20
SUP 3
TPO 1

Thus also SEA origin can have multiple generating autorities (for WO only)

SEA  'WO'  'AT' 40305
SEA  'WO'  'AU' 226899
SEA  'WO'  'BR' 7803
SEA  'WO'  'CA' 68975
SEA  'WO'  'CN' 336595
SEA  'WO'  'EG' 25
SEA  'WO'  'EP' 7042821
SEA  'WO'  'ES' 57180
SEA  'WO'  'FI' 21923
SEA  'WO'  'IL' 4657
SEA  'WO'  'IN' 344
SEA  'WO'  'JP' 1453032
SEA  'WO'  'KR' 551079
SEA  'WO'  'RU' 65639
SEA  'WO'  'SE' 315066
SEA  'WO'  'SU' 6100
SEA  'WO'  'US' 1629423
SEA  'WO'  'XN' 4336

So the field is meaningless for all but WO, where it contains both for IPR and SEA origin, data.

it is also worthy recall all citation origins here:

APP citations introduced by the applicant
SEA citations introduced during search (from Search Report)
ISR citations from the International Search Report
SUP citations from the Supplementary Search Report
PRS "PRe-Search" citations (available before official publication)
EXA citations introduced during examination
OPP citations introduced during opposition (citations by opponent published with a European Patent Specification (EP-B2))
APL citations introduced when filed for appeal by applicant / proprietor / patentee
FOP citations introduced when filed opposition by any third party after the publication of a European Patent Specification (EP- B1)
TPO citations introduced because of Third Party Observations (Art 115 EPC)
CH2 citations introduced during the Chapter 2 phase of the PCT



Thursday, May 12, 2016

Number of claims coverage in patstat - 2015b update

This is an update to a previous post issued in 2012 (see here)


EP
US
1975

0,981
1976

0,9856
1977

0,9962
1978
1
0,9969
1979
0,9995
0,997
1980
0,9224
0,9978
1981
0,885
0,997
1982
0,8428
0,9965
1983
0,8436
0,9956
1984
0,8116
0,9959
1985
0,7952
0,9959
1986
0,7545
0,9937
1987
0,7094
0,9941
1988
0,7166
0,992
1989
0,7169
0,992
1990
0,7006
0,9942
1991
0,6621
0,9939
1992
0,6767
0,9947
1993
0,733
0,9935
1994
0,7644
0,9934
1995
0,7275
0,9948
1996
0,7342
0,9956
1997
0,6783
0,9955
1998
0,6522
0,9954
1999
0,6209
0,9964
2000
0,5486
0,9956
2001
0,5827
0,7706
2002
0,5867
0,5061
2003
0,5828
0,4662
2004
0,5562
0,4202
2005
0,5646
0,3626
2006
0,5732
0,4103
2007
0,5609
0,3863
2008
0,55
0,3725
2009
0,5262
0,3743
2010
0,5404
0,4295
2011
0,583
0,4437
2012
0,5606
0,4657
2013
0,5594
0,4803
2014
0,5323
0,4788
2015
0,5023
0,4683






this is the sql query to get all coverage by publication authority
(this version has @ line 18 a filter for EP and US: you can remove it if you like to double check whether other patent offices data are loaded)

Select
  t11.PUBLN_AUTH,
  Year(t11.PUBLN_DATE) As pubyear,
  ifnull(a.n_pat_c,0)/Count(Distinct t11.APPLN_ID) As claim_perc
From
  patstat.tls211_pat_publn t11 left join
(Select
  t11.PUBLN_AUTH,
  Year(t11.PUBLN_DATE) As pubyear,
  Count(Distinct t11.APPLN_ID) As n_pat_c
From
  patstat.tls211_pat_publn t11
Where
  t11.PUBLN_CLAIMS > 0
Group By
  t11.PUBLN_AUTH, Year(t11.PUBLN_DATE)) as a
on a.publn_auth=t11.PUBLN_AUTH and a.pubyear=Year(t11.PUBLN_DATE)
where t11.PUBLN_AUTH in ("EP","US")
Group By
  t11.PUBLN_AUTH, Year(t11.PUBLN_DATE)


Saturday, April 30, 2016

About citations category

Patstat table tsl215_citn_categ stores the category of the citation, as mentioned in serch report.

Among these categories, the most relevant are:


X - particularly relevant if taken alone
Y - particularly relevant if combined with another document of the same category
A - technological background
O - non-written disclosure
P - intermediate document
T - theory or principle underlying the invention
E - earlier patent document, but published on, or after the filing date
D - document cited in the application
L - document cited for other reasons

However, the coverage of this information is limited :  if we do a scan by generating autority, we get that over 77% of it come from EPO (both for EP and pct applications); the rest from other patent offices




 A
 D
 E
 F
 I
 L
 O
 P
 T
 X
 Y
total
%
AT
24089
69
127

5
3

720
30
11973
3931
40947
0,16%
AU
79480
82
488

1
679
145
11660
280
131733
25675
250223
0,98%
BR
4832
38
16


10
4
218
9
2860
2322
10309
0,04%
CA
42347
30
239


167
38
4007
138
26258
16969
90193
0,35%
CH
1837
75
13




3
1
978
990
3897
0,02%
CL
25






2

9
19
55
0,00%
CN
260486
131
4741

11
128
2
40001
63
124913
69241
499717
1,96%
EG
10






1

21
10
42
0,00%
EP
8818087
1261537
142562
1
397060
19971
2221
537680
43334
5735892
2744789
19703134
77,43%
ES
41571
100
101


15
8
1179
3
16579
8263
67819
0,27%
FI
14889
32
137


50
4
876
60
11336
3278
30662
0,12%
IL
3201

6


7
1
205
18
2721
2679
8838
0,03%
IN
371
1
1




39
1
454
987
1854
0,01%
JP
769632
37
12077


107
58
55516
974
323363
595207
1756971
6,90%
KR
515906
44
985

2
19
8
5925
417
85944
111552
720802
2,83%
RU
49714
451
81

15

5
275
16
8424
19953
78934
0,31%
SE
207835
413
1927


168
21
11091
177
76705
37430
335767
1,32%
SU
5105

19




30
4
42
335
5535
0,02%
US
637559
276
21001


555
250
94045
3963
342685
692237
1792571
7,04%
XN
4044
107
35


2

143
6
1533
707
6577
0,03%
blank
15721
1780
1108

2854
101
1
5374
501
10579
5090
43109
0,17%

11496741
1265203
185664
1
399948
21982
2766
768990
49995
6915002
4341664
25447956




 Thus the coverage for this info can be considered complete only for EP patents.