Monday, July 4, 2011

Testing application/publication link (PART I)

Even though PATSTAT is organised around the application ID, it can be useful (and many people do this) to build a database keyed on publication number.
The publication number is easier to use, it can be bridged to other patent data sources, and so on.
In some previous posts I pointed out that the link between appln_id and punr is not painless.
Here I suggest some tests we can use to check whether our data implement the publication number/application ID linkage correctly and consistently.


Since this relation may be m x n, some examples are listed so that you can check whether your data behave correctly in the critical cases.
The cases examined are multiple publication numbers related to the same application ID and, conversely, multiple application IDs related to the same publication number.
At the end of each paragraph there is a test to run on your DB.

The appln_ids and data are from the PATSTAT 2010/10 edition.

1) double punr

 We have several cases where the publication number changes with the status of the application.
The following is a count by publication authority, excluding applications of kind D2 and applications with a 9999 filing date.


auth      appl #  # multiple punr    share
'AP'        5138             2032   39,55%
'AT'     1062556            87749    8,26%
'AU'     1533392            19363    1,26%
'BE'      366138                1    0,00%
'BG'       53556             1076    2,01%
'BR'      507674                2    0,00%
'CA'     1163197              308    0,03%
'CH'     1053680                5    0,00%
'CN'     3194875           593155   18,57%
'CS'      162230            32723   20,17%
'CU'        2997              100    3,34%
'CZ'       68576            21007   30,63%
'DD'      177131               78    0,04%
'DE'     6024317               11    0,00%
'DK'      378544            36042    9,52%
'EA'       18778             4074   21,70%
'EE'        6422             1913   29,79%
'ES'      919193            60551    6,59%
'FI'      230851            70950   30,73%
'GB'     3282728           343413   10,46%
'GE'         136                1    0,74%
'GR'       97851             2286    2,34%
'HR'       12015                1    0,01%
'HU'      136668            46219   33,82%
'IE'       91747            22197   24,19%
'IL'      190468               11    0,01%
'IN'       68199                2    0,00%
'IS'        7813             2438   31,20%
'IT'      674327           264070   39,16%
'JP'    16002805          4406789   27,54%
'KR'     1898155           189594    9,99%
'LT'        3773             2553   67,66%
'LV'        4968                2    0,04%
'MC'        2757                1    0,04%
'MD'        4685              691   14,75%
'MX'      185994              720    0,39%
'NL'      422670            53729   12,71%
'NO'      221330            69807   31,54%
'PL'      232387            91078   39,19%
'PT'       86822                1    0,00%
'RO'       65905               11    0,02%
'RU'      444798            69317   15,58%
'SE'      551022              397    0,07%
'SI'       19604               32    0,16%
'SK'       23432             8558   36,52%
'SM'         642               72   11,21%
'SU'     1232392               50    0,00%
'TJ'         374               84   22,46%
'TW'      373289                7    0,00%
'US'    10320935          1049026   10,16%
'YU'       33649            12316   36,60%
'ZA'      268184                1    0,00%


This issue affects both application counts and citation counts.
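A count like the one above could be reproduced roughly as follows. This is only a minimal sketch, not the query actually used for the table: it assumes the publication table is called tls211_pat_publn, loads a handful of rows quoted in this post into an in-memory SQLite database, and leaves out the D2 and 9999 filing date exclusions, which need the application table.

```python
import sqlite3

# Toy in-memory copy of the publication table (table name is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tls211_pat_publn (
           pat_publn_id INTEGER, publn_auth TEXT, publn_nr TEXT,
           publn_kind TEXT, appln_id INTEGER)"""
)
rows = [
    (61180019, "US", "7345336", "B2", 48363687),
    (61180020, "US", "2005133849", "A1", 48363687),
    (66070794, "US", "7470592", "B2", 52789971),
    (33698333, "JP", "62034618", "B2", 27213431),
    (33698334, "JP", "62034618", "T3", 27213431),
    (33698335, "JP", "58134863", "A", 27213431),
]
conn.executemany("INSERT INTO tls211_pat_publn VALUES (?,?,?,?,?)", rows)

# Inner query: distinct publication numbers per application.
# Outer query: applications per authority, and how many have more than one punr.
query = """
SELECT publn_auth,
       COUNT(*)        AS n_appl,
       SUM(n_punr > 1) AS n_multiple
FROM (SELECT publn_auth, appln_id,
             COUNT(DISTINCT TRIM(publn_nr)) AS n_punr
      FROM tls211_pat_publn
      GROUP BY publn_auth, appln_id)
GROUP BY publn_auth
ORDER BY publn_auth
"""
counts = conn.execute(query).fetchall()
for auth, n_appl, n_multiple in counts:
    print(auth, n_appl, n_multiple)
```

The TRIM is there because publication numbers in the raw data are padded with spaces, as the rows quoted in this post show.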

 1a) a case @ USPTO

US patents get a year-prefixed publication number (YYYY followed by a serial number) with the first publication, and a different publication number when granted.
For instance, appln_id 58139710 is published both as

'US', '        7285137', 'B2'
'US', '     2005166335', 'A1'

In our DB the two publications should somehow be treated as the same (maybe by keeping only granted patents).
This should also be verified on citations.

Another symptomatic case is appln_id 48363687


61180019, 'US', '        7345336', 'B2', 48363687, '2008-03-18', 'EN', 1
61180020, 'US', '     2005133849', 'A1', 48363687, '2005-06-23', 'EN', 0

It also has many records in the citation table TLS212:

PAT_PUBLN_ID  CITN_ID  CITED_PAT_PUBLN_ID  PAT_CITN_SEQ_NR  NPL_CITN_SEQ_NR  CITN_ORIGIN
17472674      2        61180020            2                0                '0    '
61180019      1        69488265            1                0                '0    '
61180019      2        68438301            2                0                '0    '
61180019      3        66754652            3                0                '0    '
61180019      4        70514475            4                0                '0    '
61180019      5        65067355            5                0                '0    '
61180019      6        66578342            6                0                '0    '
61180019      9        73815080            7                0                '1    '
61180019      10       73815081            8                0                '1    '
61180019      11       73815082            9                0                '1    '
61180019      12       73815083            10               0                '1    '
66070794      3        61180019            3                0                '0    '
66070794      4        61180020            4                0                '0    '

Here we see how strange it is that publication 66070794 cites both publication numbers of the same application!

66070794, 'US', '        7470592', 'B2', 52789971, '2008-12-30', 'EN', 1

TEST: check whether PUNR US 7470592 cites 7345336 and/or 2005133849 [it should cite only one].
Also check whether PUNR US 7345336 and/or 2005133849 exist in your DB [only one should exist].
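The first part of this test can be generalised: look for any citing publication that cites more than one publication of the same application. The sketch below is one possible way to do it, assuming table names tls211_pat_publn and tls212_citation and using only a few of the rows shown above in an in-memory SQLite database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Table names are assumptions; columns follow the listings in this post.
conn.executescript(
    """
    CREATE TABLE tls211_pat_publn (
        pat_publn_id INTEGER, publn_auth TEXT, publn_nr TEXT,
        publn_kind TEXT, appln_id INTEGER);
    CREATE TABLE tls212_citation (
        pat_publn_id INTEGER, citn_id INTEGER, cited_pat_publn_id INTEGER);
    """
)
conn.executemany(
    "INSERT INTO tls211_pat_publn VALUES (?,?,?,?,?)",
    [
        (61180019, "US", "7345336", "B2", 48363687),
        (61180020, "US", "2005133849", "A1", 48363687),
        (66070794, "US", "7470592", "B2", 52789971),
    ],
)
conn.executemany(
    "INSERT INTO tls212_citation VALUES (?,?,?)",
    [
        (17472674, 2, 61180020),
        (66070794, 3, 61180019),
        (66070794, 4, 61180020),
    ],
)

# For each (citing publication, cited application) pair, count how many
# different publications of that application are cited; keep pairs with > 1.
query = """
SELECT c.pat_publn_id, p.appln_id,
       COUNT(DISTINCT c.cited_pat_publn_id) AS n_cited
FROM tls212_citation AS c
JOIN tls211_pat_publn AS p ON p.pat_publn_id = c.cited_pat_publn_id
GROUP BY c.pat_publn_id, p.appln_id
HAVING COUNT(DISTINCT c.cited_pat_publn_id) > 1
"""
double_citations = conn.execute(query).fetchall()
print(double_citations)
```

On a publication number based DB that has merged the two US publications, a query of this kind should return no rows.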

 1b) same for Japan, but with 5 publications

appln_id 27213431

PAT_PUBLN_ID  PUBLN_AUTH  PUBLN_NR           PUBLN_KIND  APPLN_ID  PUBLN_DATE    PUBLN_FIRST_GRANT
33698332      'JP'        '        1427889'  'C'         27213431  '1988-02-25'  0
33698333      'JP'        '       62034618'  'B2'        27213431  '1987-07-28'  1
33698334      'JP'        '       62034618'  'T3'        27213431  '1987-07-28'  0
33698335      'JP'        '       58134863'  'A'         27213431  '1983-08-11'  0
33698336      'JP'        '       58134863'  'T1'        27213431  '1983-08-11'  0
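One way to collapse such groups, following the idea in 1a) of keeping only the granted publication, is to pick a single publication per appln_id. The function below is a hypothetical helper, not part of PATSTAT, and the preference rule (first-grant flag first, then earliest date) is just an assumption you may want to change.

```python
def canonical_publications(publications):
    """Map each appln_id to a single pat_publn_id.

    Preference order (an assumption, not a PATSTAT rule): the publication
    flagged as first grant, otherwise the earliest publication date.
    """
    best = {}
    for pat_publn_id, appln_id, publn_date, first_grant in publications:
        # Lower rank wins: granted (0) before not granted (1), then earlier date.
        rank = (0 if first_grant == 1 else 1, publn_date)
        if appln_id not in best or rank < best[appln_id][0]:
            best[appln_id] = (rank, pat_publn_id)
    return {appln_id: publn for appln_id, (_rank, publn) in best.items()}


# (pat_publn_id, appln_id, publn_date, publn_first_grant), taken from this post.
publications = [
    (61180019, 48363687, "2008-03-18", 1),
    (61180020, 48363687, "2005-06-23", 0),
    (33698332, 27213431, "1988-02-25", 0),
    (33698333, 27213431, "1987-07-28", 1),
    (33698334, 27213431, "1987-07-28", 0),
    (33698335, 27213431, "1983-08-11", 0),
    (33698336, 27213431, "1983-08-11", 0),
]
canonical = canonical_publications(publications)
print(canonical)
```

With this rule the US application keeps 61180019 (the B2) and the Japanese one keeps 33698333 (the B2 flagged as first grant).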


Turning to the citations, we see:

PAT_PUBLN_ID  CITN_ID  CITED_PAT_PUBLN_ID  PAT_CITN_SEQ_NR  CITN_ORIGIN
18796442      5        33698335            5                '4    '
18796442      6        33698336            6                '4    '
18796442      7        33698333            7                '4    '
18796442      8        33698334            8                '4    '
33698333      1        38471795            1                '0    '
33698333      2        42774760            2                '0    '
33698333      3        35800209            3                '0    '
48589428      3        33698335            3                '0    '
52762211      3        33698335            3                '0    '

The citing publication here is EP1182223.

But if we look at the citations in Espacenet we see only JP58134863 (A):

http://worldwide.espacenet.com/allCitations?compact=true&page=0&KC=A1&NR=1182223A1&DB=EPODOC&locale=en_EP&CC=EP&FT=D

TEST: check whether PUNR EP1182223 cites JP 62034618 and/or 58134863 [it should cite only one].
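A possible way to make a DB pass this kind of test is to count citations at application level, so that several cited publications of the same application collapse into one citation. The helper below is a hypothetical sketch on the rows listed above. Cited publications whose appln_id is not given in this post are simply skipped.

```python
def application_level_citations(citations, publn_to_appln):
    """Return distinct (citing pat_publn_id, cited appln_id) pairs.

    Citations whose cited publication is missing from publn_to_appln
    are ignored in this sketch.
    """
    pairs = set()
    for citing_publn, cited_publn in citations:
        cited_appln = publn_to_appln.get(cited_publn)
        if cited_appln is not None:
            pairs.add((citing_publn, cited_appln))
    return sorted(pairs)


# All five Japanese publications belong to appln_id 27213431.
publn_to_appln = {
    33698332: 27213431,
    33698333: 27213431,
    33698334: 27213431,
    33698335: 27213431,
    33698336: 27213431,
}
# (citing pat_publn_id, cited pat_publn_id) from the citation listing above.
citations = [
    (18796442, 33698335),
    (18796442, 33698336),
    (18796442, 33698333),
    (18796442, 33698334),
    (48589428, 33698335),
    (52762211, 33698335),
]
pairs = application_level_citations(citations, publn_to_appln)
print(len(citations), "publication-level rows ->", len(pairs), "application-level citations")
```

Here six publication-level rows reduce to three citations received by the application, one per citing publication.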


