Showing posts with label citations. Show all posts

Thursday, July 18, 2019

Patent citations to PubMed scientific publications



A recently released dataset from Matt Marx and Aaron Fuegi contains citations from USPTO patents granted 1947-2018 to articles captured by the Microsoft Academic Graph (MAG) from 1800-2018.

The tab-separated files are available at https://zenodo.org/record/3338601

The main file, pcs.tsv, contains the resolved citations: the matching patent number, the MAG ID, the original citation from the patent, an indicator for whether the citation was supplied by the applicant, examiner, or unknown, and a confidence score (1-10) indicating how likely the match is to be correct.
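As a minimal sketch of how the main file could be filtered by confidence score, the snippet below reads tab-separated rows with Python's csv module. The column names (patent, magid, confscore), the threshold of 8, and the sample rows are assumptions made for illustration only; check the header of the actual pcs.tsv before relying on them.

```python
import csv
import io


def load_confident_citations(handle, min_score=8, score_col="confscore"):
    """Return the rows whose confidence score is at least min_score.

    The column name is an assumption, not taken from the dataset docs.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if int(row[score_col]) >= min_score]


# Made-up sample rows standing in for pcs.tsv (hypothetical header)
sample = io.StringIO(
    "patent\tmagid\tconfscore\n"
    "us-1234567\t111\t10\n"
    "us-7654321\t222\t4\n"
)
confident = load_confident_citations(sample)
print(confident)
```

For the real file, the same function could be called on an open file handle, for example inside `with open("pcs.tsv", newline="") as handle:`.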

There is also a PubMed-specific match in pcs-pubmed.tsv.

The authors also made the source code available: the code for generating the patent citations to science in pcs.tsv is at https://github.com/mattmarx/reliance_on_science, and the code for generating jif.zip and jcif.zip (Journal Impact Factor and Journal Commercial Impact Factor) is at https://github.com/mattmarx/jcif.

The scripts and programs are mainly for Stata and Linux (DO files and sh scripts).

 

Monday, May 30, 2016

Citation generating authority in Patstat

Table TLS212 in Patstat includes a field named CITN_GENER_AUTH that is meant to contain, for those offices that have no examiners, a reference to the authority that generated the citation.

The Patstat data handbook describes it this way:

Name: Identification of International Search Authority (ISA) for PCT search reports (incl. supplementary search reports)
Also Known As: n/a
Description: Country code identifying the patent authority performing the International Search Report.

DOCDB-XML contains the generating authority for examiner citations in WO publications. This field in DOCDB will be better populated using the data file provided by WIPO and shown in the usage example above. These fields will be loaded into column CITN_GENER_AUTH in PATSTAT table TLS212_CITATION.
The column CITN_GENER_AUTH will not be populated for other citations, only ISA ones.
If a WO publication has no citations by examiners, then the ISA will not be traceable. This is not a problem, as it only affects a small percentage of the total.

In reality, if we count the distinct CITN_GENER_AUTH values for each citation origin, we find:

APL 1
APP 1
CH2 1
EXA 1
FOP 1
ISR 19
OPP 1
PRS 1
SEA 20
SUP 3
TPO 1
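The count above could be reproduced with a query along the following lines. This is only a sketch: it runs against an in-memory SQLite table with a handful of made-up rows, and it assumes the column names CITN_ORIGIN and CITN_GENER_AUTH in TLS212_CITATION. It also assumes that a missing generating authority is stored as an empty string rather than NULL, which would explain a count of 1 for origins that never carry one (COUNT(DISTINCT ...) ignores NULLs but counts the empty string as a value).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls212_citation ("
    "pat_publn_id INTEGER, citn_origin TEXT, citn_gener_auth TEXT)"
)
# Made-up rows for illustration only; '' stands for "no generating authority"
conn.executemany(
    "INSERT INTO tls212_citation VALUES (?, ?, ?)",
    [
        (1, "APP", ""),
        (2, "ISR", "EP"),
        (3, "ISR", "US"),
        (4, "SEA", ""),
        (5, "SEA", "JP"),
    ],
)

# Number of distinct generating authorities seen for each citation origin
query = """
    SELECT citn_origin, COUNT(DISTINCT citn_gener_auth) AS n_auth
    FROM tls212_citation
    GROUP BY citn_origin
    ORDER BY citn_origin
"""
distinct_counts = conn.execute(query).fetchall()
for origin, n_auth in distinct_counts:
    print(origin, n_auth)
```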

Thus the SEA origin can also have multiple generating authorities (for WO only):

SEA  'WO'  'AT' 40305
SEA  'WO'  'AU' 226899
SEA  'WO'  'BR' 7803
SEA  'WO'  'CA' 68975
SEA  'WO'  'CN' 336595
SEA  'WO'  'EG' 25
SEA  'WO'  'EP' 7042821
SEA  'WO'  'ES' 57180
SEA  'WO'  'FI' 21923
SEA  'WO'  'IL' 4657
SEA  'WO'  'IN' 344
SEA  'WO'  'JP' 1453032
SEA  'WO'  'KR' 551079
SEA  'WO'  'RU' 65639
SEA  'WO'  'SE' 315066
SEA  'WO'  'SU' 6100
SEA  'WO'  'US' 1629423
SEA  'WO'  'XN' 4336
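A breakdown like the one above could come from joining the citation table to the publication table. The sketch below assumes the names TLS211_PAT_PUBLN.PUBLN_AUTH for the publishing authority and PAT_PUBLN_ID as the join key, treats an empty string as "no generating authority", and uses a few made-up rows in an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tls211_pat_publn (pat_publn_id INTEGER, publn_auth TEXT);
    CREATE TABLE tls212_citation (
        pat_publn_id INTEGER, citn_origin TEXT, citn_gener_auth TEXT);
""")
# Made-up rows for illustration only
conn.executemany(
    "INSERT INTO tls211_pat_publn VALUES (?, ?)",
    [(1, "WO"), (2, "WO"), (3, "EP")],
)
conn.executemany(
    "INSERT INTO tls212_citation VALUES (?, ?, ?)",
    [(1, "SEA", "EP"), (1, "SEA", "EP"), (2, "SEA", "US"), (3, "SEA", "")],
)

# Citations per origin, publishing authority and generating authority,
# keeping only rows where a generating authority is present
query = """
    SELECT c.citn_origin, p.publn_auth, c.citn_gener_auth, COUNT(*) AS n
    FROM tls212_citation AS c
    JOIN tls211_pat_publn AS p ON p.pat_publn_id = c.pat_publn_id
    WHERE c.citn_gener_auth <> ''
    GROUP BY c.citn_origin, p.publn_auth, c.citn_gener_auth
    ORDER BY c.citn_origin, p.publn_auth, c.citn_gener_auth
"""
breakdown = conn.execute(query).fetchall()
for row in breakdown:
    print(*row)
```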

So the field is meaningful only for WO publications, where it holds data for both the ISR and SEA origins.

It is also worth recalling all the citation origins here:

APP citations introduced by the applicant
SEA citations introduced during search (from Search Report)
ISR citations from the International Search Report
SUP citations from the Supplementary Search Report
PRS "PRe-Search" citations (available before official publication)
EXA citations introduced during examination
OPP citations introduced during opposition (citations by opponent published with a European Patent Specification (EP-B2))
APL citations introduced when filed for appeal by applicant / proprietor / patentee
FOP citations introduced when filed opposition by any third party after the publication of a European Patent Specification (EP-B1)
TPO citations introduced because of Third Party Observations (Art 115 EPC)
CH2 citations introduced during the Chapter 2 phase of the PCT



Saturday, April 30, 2016

About citations category

Patstat table tls215_citn_categ stores the category of the citation, as mentioned in the search report.

Among these categories, the most relevant are:


X - particularly relevant if taken alone
Y - particularly relevant if combined with another document of the same category
A - technological background
O - non-written disclosure
P - intermediate document
T - theory or principle underlying the invention
E - earlier patent document, but published on, or after the filing date
D - document cited in the application
L - document cited for other reasons

However, the coverage of this information is limited: if we scan by generating authority, over 77% of the category records come from the EPO (for both EP and PCT applications); the rest come from other patent offices.




auth          A        D       E  F       I      L     O       P      T        X        Y     total       %
AT        24089       69     127  -       5      3     -     720     30    11973     3931     40947   0.16%
AU        79480       82     488  -       1    679   145   11660    280   131733    25675    250223   0.98%
BR         4832       38      16  -       -     10     4     218      9     2860     2322     10309   0.04%
CA        42347       30     239  -       -    167    38    4007    138    26258    16969     90193   0.35%
CH         1837       75      13  -       -      -     -       3      1      978      990      3897   0.02%
CL           25        -       -  -       -      -     -       2      -        9       19        55   0.00%
CN       260486      131    4741  -      11    128     2   40001     63   124913    69241    499717   1.96%
EG           10        -       -  -       -      -     -       1      -       21       10        42   0.00%
EP      8818087  1261537  142562  1  397060  19971  2221  537680  43334  5735892  2744789  19703134  77.43%
ES        41571      100     101  -       -     15     8    1179      3    16579     8263     67819   0.27%
FI        14889       32     137  -       -     50     4     876     60    11336     3278     30662   0.12%
IL         3201        -       6  -       -      7     1     205     18     2721     2679      8838   0.03%
IN          371        1       1  -       -      -     -      39      1      454      987      1854   0.01%
JP       769632       37   12077  -       -    107    58   55516    974   323363   595207   1756971   6.90%
KR       515906       44     985  -       2     19     8    5925    417    85944   111552    720802   2.83%
RU        49714      451      81  -      15      -     5     275     16     8424    19953     78934   0.31%
SE       207835      413    1927  -       -    168    21   11091    177    76705    37430    335767   1.32%
SU         5105        -      19  -       -      -     -      30      4       42      335      5535   0.02%
US       637559      276   21001  -       -    555   250   94045   3963   342685   692237   1792571   7.04%
XN         4044      107      35  -       -      2     -     143      6     1533      707      6577   0.03%
blank     15721     1780    1108  -    2854    101     1    5374    501    10579     5090     43109   0.17%
total  11496741  1265203  185664  1  399948  21982  2766  768990  49995  6915002  4341664  25447956




Thus the coverage of this information can be considered complete only for EP patents.
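A cross-tabulation of this kind, with row totals and percentage shares, can be sketched in a few lines of Python. The (authority, category) pairs below are made up for illustration; in practice they would come from a query that pairs each tls215_citn_categ record with its generating authority, and how that pairing is done is an assumption left out here. The last lines apply the same share calculation to the EP and grand totals reported in the table.

```python
from collections import Counter

# Made-up (generating authority, category) pairs for illustration only
rows = [("EP", "A"), ("EP", "X"), ("EP", "X"), ("US", "A"), ("JP", "Y")]

cell_counts = Counter(rows)                      # one count per table cell
auth_totals = Counter(auth for auth, _ in rows)  # row totals per authority
grand_total = sum(auth_totals.values())

# Share of all category records contributed by each authority, in percent
shares = {auth: 100 * n / grand_total for auth, n in auth_totals.items()}

for auth in sorted(auth_totals):
    print(auth, auth_totals[auth], f"{shares[auth]:.2f}%")

# Same calculation with the EP total and grand total from the table
ep_share = round(100 * 19703134 / 25447956, 2)
print(ep_share)
```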