Table TLS214 contains the non-patent literature (NPL) that patents cite as backward citations, mainly scientific articles.
Until recently, the NPL data was mainly a raw batch of text, with all bibliographic elements stored together in a single full-text field.
Only one tag, npl_type, distinguished the following types:
For articles:
b: Book citation
c: Chemical abstracts citation
i: Biological abstract citation
j: Patent Abstracts of Japan citation
s: Serial / Journal / Periodical citation
For online citations:
d: Derwent citation
e: Database citation
w: World Wide Web / Internet search citation
For poor NPL citations (no rich NPL structure):
a: Abstract citation of no specific kind
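If you need these groups in an analysis script, a small lookup table is enough. The following is a minimal sketch: the codes and descriptions come from the list above, while the group labels ("article", "online", "poor") and the helper name npl_group are hypothetical choices made for illustration.

```python
# Hypothetical lookup: codes and descriptions are taken from the npl_type list;
# the group labels "article", "online" and "poor" are illustrative only.
NPL_TYPES = {
    "b": ("article", "Book citation"),
    "c": ("article", "Chemical abstracts citation"),
    "i": ("article", "Biological abstract citation"),
    "j": ("article", "Patent Abstracts of Japan citation"),
    "s": ("article", "Serial / Journal / Periodical citation"),
    "d": ("online", "Derwent citation"),
    "e": ("online", "Database citation"),
    "w": ("online", "World Wide Web / Internet search citation"),
    "a": ("poor", "Abstract citation of no specific kind"),
}


def npl_group(npl_type: str) -> str:
    """Return the group of an npl_type code, or 'unknown' for unlisted codes."""
    group, _description = NPL_TYPES.get(npl_type.strip().lower(), ("unknown", ""))
    return group


if __name__ == "__main__":
    for code in ["s", "w", "a", "x"]:
        print(code, npl_group(code))
```

Unlisted codes fall back to "unknown" rather than raising, so unexpected values in the data show up in the counts instead of stopping the script.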
Since the 2017a edition, the EPO has parsed part of the data into the following fields:
NPL_DOI
NPL_EDITOR
NPL_ISBN
NPL_ISSN
NPL_ISSUE
NPL_PAGE_FIRST
NPL_PAGE_LAST
NPL_PUBLISHER
NPL_PUBLN_DATE
NPL_PUBLN_END_DATE
NPL_TITLE1
NPL_TITLE2
NPL_VOLUME
ONLINE_AVAILABILITY
ONLINE_CLASSIFICATION
ONLINE_SEARCH_DATE
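To see at a glance which of these fields a given record actually fills, one option is a small record type. This is a sketch under assumptions: the class name ParsedNpl is hypothetical, attribute names are simply the field names above in lower case, and every field is treated as a string where an empty value means "not parsed".

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ParsedNpl:
    """Hypothetical view of one TLS214 row, limited to the parsed fields.

    All values are assumed to be strings; "" means the field was not filled.
    """
    npl_doi: str = ""
    npl_editor: str = ""
    npl_isbn: str = ""
    npl_issn: str = ""
    npl_issue: str = ""
    npl_page_first: str = ""
    npl_page_last: str = ""
    npl_publisher: str = ""
    npl_publn_date: str = ""
    npl_publn_end_date: str = ""
    npl_title1: str = ""
    npl_title2: str = ""
    npl_volume: str = ""
    online_availability: str = ""
    online_classification: str = ""
    online_search_date: str = ""

    def filled_fields(self) -> List[str]:
        """Names of the parsed fields that contain non-blank text, in declaration order."""
        return [f.name for f in fields(self) if getattr(self, f.name).strip()]
```

For example, ParsedNpl(npl_title1="Some title", npl_volume="12").filled_fields() lists just those two fields.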
In addition, NPL citations that contain only strings such as “none” or “See also references of WOxxxxxx” have been removed so that they do not distort citation counts.
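The EPO does not describe its exact cleaning rules here, so if you work with an older edition and want a similar filter, the following is only a rough sketch. The two regular expressions are assumptions modelled on the examples quoted above and will certainly miss other placeholder variants.

```python
import re

# Assumed placeholder patterns, modelled only on the two examples in the text.
PLACEHOLDER_PATTERNS = [
    re.compile(r"^\s*none\s*\.?\s*$", re.IGNORECASE),
    re.compile(r"^\s*see also references of wo\s*\S+\s*$", re.IGNORECASE),
]


def is_placeholder(npl_text: str) -> bool:
    """True if the raw NPL text is only a placeholder such as 'none'."""
    return any(p.match(npl_text) for p in PLACEHOLDER_PATTERNS)


def drop_placeholders(npl_texts):
    """Keep only citations that carry real bibliographic content."""
    return [t for t in npl_texts if not is_placeholder(t)]
```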
The 2017a table contains 32.802.796 records, but duplicates still exist.
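One crude way to spot candidate duplicates is to normalise the raw text and group the npl_publn_id values that share the same normalised string. The sketch below makes that assumption (lower-casing, dropping punctuation, collapsing whitespace). It catches only near-identical strings, not the same article cited in different styles.

```python
import re
from collections import defaultdict


def normalise(npl_text: str) -> str:
    """Crude normalisation: lower-case, replace punctuation by spaces, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", npl_text.lower())
    return " ".join(text.split())


def find_duplicates(rows):
    """Group npl_publn_id values whose normalised text is identical.

    rows: iterable of (npl_publn_id, raw_text) pairs.
    Returns only the groups that contain more than one id.
    """
    groups = defaultdict(list)
    for npl_id, text in rows:
        groups[normalise(text)].append(npl_id)
    return [ids for ids in groups.values() if len(ids) > 1]
```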
To check the coverage of the parsing by publication authority, we can write a query (we consider an NPL document 'parsed' when at least npl_title1 is not blank):
SELECT
t11.PUBLN_AUTH,
Count(DISTINCT t14.npl_publn_id) AS Count_npl_publn_id,
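-- COUNT ignores NULLs, so only NPL with a non-blank npl_title1 are counted
Count(DISTINCT CASE WHEN t14.npl_title1 <> '' THEN t14.npl_publn_id END) AS parsed_npl_publn_id,
Count(DISTINCT CASE WHEN t14.npl_title1 <> '' THEN t14.npl_publn_id END)/Count(DISTINCT t14.npl_publn_id) AS ratio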
FROM
tls214_npl_publn t14
INNER JOIN tls212_citation t12 ON t12.CITED_NPL_PUBLN_ID = t14.npl_publn_id
INNER JOIN tls211_pat_publn t11 ON t11.PAT_PUBLN_ID = t12.PAT_PUBLN_ID
GROUP BY
t11.PUBLN_AUTH
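To try the coverage logic without a full PATSTAT installation, here is a minimal sketch that uses Python's built-in sqlite3 module on a tiny, invented dataset. Table and column names follow the query, but the rows are made up. It counts parsed ids with COUNT(DISTINCT CASE WHEN ... END), which needs no "-1" correction. The IF(..., id, 0) - 1 variant works only when at least one NPL of an office is unparsed. If all of them are parsed, no 0 placeholder appears among the distinct values, and the count comes out one too low. SQLite divides integers as integers, hence the 1.0 * factor.

```python
import sqlite3

# Toy schema with only the columns the query needs.
SCHEMA = """
CREATE TABLE tls214_npl_publn (npl_publn_id INTEGER PRIMARY KEY, npl_title1 TEXT);
CREATE TABLE tls212_citation (pat_publn_id INTEGER, cited_npl_publn_id INTEGER);
CREATE TABLE tls211_pat_publn (pat_publn_id INTEGER PRIMARY KEY, publn_auth TEXT);
"""

# COUNT ignores NULLs, so the CASE expression counts only parsed NPL ids.
COVERAGE_SQL = """
SELECT t11.publn_auth,
       COUNT(DISTINCT t14.npl_publn_id) AS count_npl_publn_id,
       COUNT(DISTINCT CASE WHEN t14.npl_title1 <> '' THEN t14.npl_publn_id END)
           AS parsed_npl_publn_id,
       1.0 * COUNT(DISTINCT CASE WHEN t14.npl_title1 <> '' THEN t14.npl_publn_id END)
           / COUNT(DISTINCT t14.npl_publn_id) AS ratio
FROM tls214_npl_publn t14
INNER JOIN tls212_citation t12 ON t12.cited_npl_publn_id = t14.npl_publn_id
INNER JOIN tls211_pat_publn t11 ON t11.pat_publn_id = t12.pat_publn_id
GROUP BY t11.publn_auth
ORDER BY t11.publn_auth
"""


def coverage(conn):
    """Run the coverage query and return (office, total, parsed, ratio) rows."""
    return conn.execute(COVERAGE_SQL).fetchall()


def build_demo():
    """In-memory database with invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO tls214_npl_publn VALUES (?, ?)",
                     [(1, "A parsed title"), (2, ""), (3, "Another title")])
    conn.executemany("INSERT INTO tls211_pat_publn VALUES (?, ?)",
                     [(10, "EP"), (20, "US")])
    # EP cites NPL 1 (parsed) and 2 (not parsed); US cites only NPL 3 (parsed).
    conn.executemany("INSERT INTO tls212_citation VALUES (?, ?)",
                     [(10, 1), (10, 2), (20, 3)])
    return conn


if __name__ == "__main__":
    for row in coverage(build_demo()):
        print(row)
```

In this toy data every NPL cited by the invented US patent is parsed. That is exactly the case where the "-1" form would report 0 parsed documents instead of 1.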
Publication office | Cited NPL | Parsed NPL | Ratio
AP | 1175 | 108 | 9,2%
AT | 4280 | 21 | 0,5%
AU | 77366 | 1544 | 2,0%
BE | 6120 | 1835 | 30,0%
BG | 2 | 0 | 0,0%
CH | 1743 | 400 | 23,0%
CN | 1359528 | 1186981 | 87,3%
CY | 5 | 2 | 40,0%
CZ | 3228 | 65 | 2,0%
DE | 703827 | 80 | 0,0%
DK | 23 | 15 | 65,2%
EA | 14061 | 8380 | 59,6%
EP | 2668773 | 915907 | 34,3%
ES | 27263 | 137 | 0,5%
FI | 1 | 0 | 0,0%
FR | 275439 | 87877 | 31,9%
GB | 96775 | 2984 | 3,1%
GR | 4867 | 1008 | 20,7%
HR | 119 | 50 | 42,0%
HU | 3 | 2 | 66,7%
IT | 11971 | 10747 | 89,8%
JP | 429099 | 0 | 0,0%
KR | 58641 | 49 | 0,1%
KZ | 1 | 0 | 0,0%
LU | 493 | 283 | 57,4%
MY | 12 | 1 | 8,3%
NL | 13287 | 6388 | 48,1%
NO | 5 | 3 | 60,0%
RO | 1 | 0 | 0,0%
RU | 13894 | 30 | 0,2%
SG | 788 | 91 | 11,6%
TR | 2621 | 278 | 10,6%
TW | 4704 | 18 | 0,4%
US | 24098473 | 116687 | 0,5%
WO | 3405095 | 1589524 | 46,7%
Total | 33283683 | 3931495 | 11,8%
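The total row is the sum of the per-office counts, so an NPL cited by patents of several offices is counted once per office. The overall figure is therefore a volume-weighted ratio. A quick calculation with the US and total rows copied from the table shows how much the US, with a very low parsing ratio, weighs on it. This is plain arithmetic on the published figures and makes no further assumption about the data.

```python
# Totals row and US row copied from the table.
TOTAL_NPL, TOTAL_PARSED = 33_283_683, 3_931_495
US_NPL, US_PARSED = 24_098_473, 116_687


def ratio(parsed: int, total: int) -> float:
    """Share of parsed NPL records."""
    return parsed / total


overall = ratio(TOTAL_PARSED, TOTAL_NPL)                          # volume-weighted ratio
us_share = US_NPL / TOTAL_NPL                                     # weight of US rows
without_us = ratio(TOTAL_PARSED - US_PARSED, TOTAL_NPL - US_NPL)  # all other offices

if __name__ == "__main__":
    print(f"overall {overall:.1%}, US share {us_share:.1%}, without US {without_us:.1%}")
```

US rows make up about 72,4% of the per-office counts. Without them, the parsed share of the remaining offices rises to about 41,5%.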
The overall ratio of 11,8% shows that the parsing is still a work in progress.