Thursday, May 4, 2017

Non-Patent Literature improvements in PATSTAT

Table TLS214 contains backward citations from patents to other documents (mainly scientific articles).
Until now each record was essentially a raw string, with all bibliographic details packed into a single full-text field.
Only one tag, npl_type, distinguished the following types:

For articles:
  b  Book citation
  c  Chemical abstracts citation
  i  Biological abstract citation
  j  Patent Abstracts of Japan citation
  s  Serial / Journal / Periodical citation

For online citations:
  d  Derwent citation
  e  Database citation
  w  World Wide Web / Internet search citation

For poor NPL citations (no rich NPL structure):
  a  Abstract citation of no specific kind
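As a small illustration, the codes listed above can be kept in a lookup table when post-processing an export of TLS214. This is only a sketch: the names NPL_TYPE_LABELS, NPL_TYPE_GROUPS and npl_group are hypothetical helpers, not part of PATSTAT.

```python
# Hypothetical lookup tables built from the npl_type codes listed in the post.
NPL_TYPE_LABELS = {
    "b": "Book citation",
    "c": "Chemical abstracts citation",
    "i": "Biological abstract citation",
    "j": "Patent Abstracts of Japan citation",
    "s": "Serial / Journal / Periodical citation",
    "d": "Derwent citation",
    "e": "Database citation",
    "w": "World Wide Web / Internet search citation",
    "a": "Abstract citation of no specific kind",
}

# Grouping used in the post: articles, online citations, poor citations.
NPL_TYPE_GROUPS = {
    "article": set("bcijs"),
    "online": set("dew"),
    "poor": {"a"},
}


def npl_group(npl_type: str) -> str:
    """Return the group name for an npl_type code, or 'unknown'."""
    code = npl_type.strip().lower()
    for group, codes in NPL_TYPE_GROUPS.items():
        if code in codes:
            return group
    return "unknown"


if __name__ == "__main__":
    for code in "bsw a".split() + ["x"]:
        print(code, npl_group(code), NPL_TYPE_LABELS.get(code, "?"))
```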

From the 2017a edition, the EPO has parsed part of the data into the following fields:

NPL_DOI
NPL_EDITOR
NPL_ISBN
NPL_ISSN
NPL_ISSUE
NPL_PAGE_FIRST
NPL_PAGE_LAST
NPL_PUBLISHER
NPL_PUBLN_DATE
NPL_PUBLN_END_DATE
NPL_TITLE1
NPL_TITLE2
NPL_VOLUME
ONLINE_AVAILABILITY
ONLINE_CLASSIFICATION
ONLINE_SEARCH_DATE

In addition, NPL citations that contain only strings like “none” or “See also references of WOxxxxxx” have been removed so that they do not distort citation counts.
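The exact cleaning rules used by the EPO are not described here, so the following is only a sketch of how such placeholder strings could be recognised in an older edition. The two patterns and the function name is_placeholder are assumptions made for illustration.

```python
import re

# Assumed patterns, modelled on the two examples quoted in the post.
# They are NOT the EPO's actual rules.
PLACEHOLDER_PATTERNS = [
    re.compile(r"^\s*none\.?\s*$", re.IGNORECASE),
    re.compile(r"^\s*see also references of\s+[A-Z]{2}\s*\d+.*$", re.IGNORECASE),
]


def is_placeholder(citation_text: str) -> bool:
    """True if the raw citation text carries no real bibliographic content."""
    return any(p.match(citation_text) for p in PLACEHOLDER_PATTERNS)


if __name__ == "__main__":
    samples = [
        "none",
        "See also references of WO2010123456",
        "SMITH J. ET AL: 'None of the above', NATURE, 2001",
    ]
    for text in samples:
        print(is_placeholder(text), text)
```

Anchoring both patterns to the whole string keeps genuine citations that merely contain the word "none" from being dropped.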

The 2017a table contains 32.802.796 records, but duplicates still exist.
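One rough way to get a feeling for the duplicates is to normalise the raw citation text and group records that collapse to the same string. This is a minimal sketch under stated assumptions: the input is a list of (npl_publn_id, raw text) pairs exported from TLS214, and the normalisation rule (lower case, punctuation stripped) is my own choice. It will miss duplicates that differ in wording.

```python
import re
from collections import defaultdict


def normalize(text: str) -> str:
    """Lower-case the text and collapse every non-alphanumeric run to one space."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def duplicate_groups(records):
    """Group npl_publn_id values whose normalised citation text is identical.

    `records` is an iterable of (npl_publn_id, raw_text) pairs (assumed layout).
    Returns only the groups that contain more than one id.
    """
    groups = defaultdict(list)
    for npl_id, raw_text in records:
        groups[normalize(raw_text)].append(npl_id)
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Invented sample rows, for illustration only.
    sample = [
        (1, "Smith J., Nature, vol. 5, 2001"),
        (2, "SMITH J, NATURE, VOL 5, 2001"),
        (3, "Doe A., Science, 1999"),
    ]
    print(duplicate_groups(sample))
```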

If we want to check the coverage of the parsing by publication authority, we can write a query (we treat an NPL document as 'parsed' when at least title1 is not blank):

SELECT
  t11.PUBLN_AUTH,
  -- all distinct NPL documents cited by publications of this authority
  COUNT(DISTINCT t14.npl_publn_id) AS Count_npl_publn_id,
  -- CASE yields NULL for unparsed rows, which COUNT(DISTINCT ...) ignores
  COUNT(DISTINCT CASE WHEN t14.npl_title1 <> '' THEN t14.npl_publn_id END) AS parsed_npl_publn_id,
  COUNT(DISTINCT CASE WHEN t14.npl_title1 <> '' THEN t14.npl_publn_id END)
    / COUNT(DISTINCT t14.npl_publn_id) AS ratio
FROM
  tls214_npl_publn t14
  INNER JOIN tls212_citation t12 ON t12.CITED_NPL_PUBLN_ID = t14.npl_publn_id
  INNER JOIN tls211_pat_publn t11 ON t11.PAT_PUBLN_ID = t12.PAT_PUBLN_ID
GROUP BY
  t11.PUBLN_AUTH
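To see how a conditional distinct count of this kind behaves, here is a toy run using Python's built-in sqlite3 module. It is only a sketch: the three tables are reduced to the columns the query needs, the rows are invented, and the parsed count is written with a CASE expression that returns NULL for unparsed rows, so COUNT(DISTINCT ...) skips them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tls214_npl_publn (npl_publn_id INTEGER PRIMARY KEY, npl_title1 TEXT);
CREATE TABLE tls212_citation (pat_publn_id INTEGER, cited_npl_publn_id INTEGER);
CREATE TABLE tls211_pat_publn (pat_publn_id INTEGER PRIMARY KEY, publn_auth TEXT);

-- Toy rows, invented for illustration only.
INSERT INTO tls214_npl_publn VALUES (1, 'Some title'), (2, ''), (3, 'Another title');
INSERT INTO tls211_pat_publn VALUES (10, 'EP'), (11, 'EP'), (12, 'US');
INSERT INTO tls212_citation VALUES (10, 1), (11, 1), (11, 2), (12, 3);
""")

rows = conn.execute("""
SELECT t11.publn_auth,
       COUNT(DISTINCT t14.npl_publn_id) AS num_npl,
       COUNT(DISTINCT CASE WHEN t14.npl_title1 <> ''
                           THEN t14.npl_publn_id END) AS parsed_npl
FROM tls214_npl_publn t14
JOIN tls212_citation t12 ON t12.cited_npl_publn_id = t14.npl_publn_id
JOIN tls211_pat_publn t11 ON t11.pat_publn_id = t12.pat_publn_id
GROUP BY t11.publn_auth
ORDER BY t11.publn_auth
""").fetchall()

# The ratio is computed in Python to avoid integer division in SQLite.
for auth, num_npl, parsed_npl in rows:
    print(auth, num_npl, parsed_npl, f"{parsed_npl / num_npl:.1%}")
```

In the toy US group every cited document is parsed. An approach that maps unparsed rows to a placeholder id such as 0 and then subtracts 1 would report 0 parsed documents there, because no placeholder is present to be subtracted.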





publication office     num npl  parsed npl   ratio
AP                        1175         108    9,2%
AT                        4280          21    0,5%
AU                       77366        1544    2,0%
BE                        6120        1835   30,0%
BG                           2           0    0,0%
CH                        1743         400   23,0%
CN                     1359528     1186981   87,3%
CY                           5           2   40,0%
CZ                        3228          65    2,0%
DE                      703827          80    0,0%
DK                          23          15   65,2%
EA                       14061        8380   59,6%
EP                     2668773      915907   34,3%
ES                       27263         137    0,5%
FI                           1           0    0,0%
FR                      275439       87877   31,9%
GB                       96775        2984    3,1%
GR                        4867        1008   20,7%
HR                         119          50   42,0%
HU                           3           2   66,7%
IT                       11971       10747   89,8%
JP                      429099           0    0,0%
KR                       58641          49    0,1%
KZ                           1           0    0,0%
LU                         493         283   57,4%
MY                          12           1    8,3%
NL                       13287        6388   48,1%
NO                           5           3   60,0%
RO                           1           0    0,0%
RU                       13894          30    0,2%
SG                         788          91   11,6%
TR                        2621         278   10,6%
TW                        4704          18    0,4%
US                    24098473      116687    0,5%
WO                     3405095     1589524   46,7%

Total                 33283683     3931495   11,8%



The overall 11,8% tells us that the parsing is still a work in progress.





