Friday, December 12, 2014

Patstat 2014b mysql upload scripts

 Autumn 2014 patstat ediction (now onward 2014b) has just been released.
 Here you can download the mysql scripts I made for uploading the files from DVDs data.

remember to customize them for your own upload directory / target library
also note I did not upload TLS 203, 222 and 223 but scripts should work fine

Here below a comparison with expected records and october 2013 version
table
date of files
declared
patstat Y-1
imported
delta imp
delta y-1
tls201_appln
2014b
80025895
76594275
80025895
0
4.48%
tls202_appln_title
2014b
60127407
57335762
60127391
16
4.87%
tls203_appln_abstr
2014b
36607548
33870394
36607548
0
8.08%
tls204_appln_prior
2014b
35035840
33615459
35035840
0
4.23%
tls205_tech_rel
2014b
2149383
2138085
2149383
0
0.53%
tls206_person
2014b
47827725
44730405
47827690
35
6.92%
tls207_pers_appln
2014b
176679567
163534542
176679567
0
8.04%
tls208_doc_std_nms
2014b
22127059
19409591
22127059
0
14.00%
tls209_appln_ipc
2014b
195884529
176153334
195884529
0
11.20%
tls210_appln_n_cls
2014b
22016211
21884148
22016211
0
0.60%
tls211_pat_publn
2014b
90478383
85733781
90478383
0
5.53%
tls212_citation
2014b
160518077
142247157
160518077
0
12.84%
tls214_npl_publn
2014b
25448833
22095090
25448826
7
15.18%
tls215_citn_categ
2014b
23690308
22390889
23690308
0
5.80%
tls216_appln_contn
2014b
2459400
2272520
2459400
0
8.22%
tls218_docdb_fam
2014b
71081760
67766434
71081760
0
4.89%
tls219_inpadoc_fam
2014b
78312042
74865923
78312042
0
4.60%
tls221_inpadoc_prs
2014b
157857976
135555494
135555494
0
16.45%
tls222_appln_jp_class
2014b
304165357
294109156
294109156
0
3.42%
tls223_appln_docus
2014b
39008347
36668997
36668997
0
6.38%
tls224_appln_cpc
2014b
149340067
137478291
149340067
0
8.63%
tls226_person_orig
2014b
52524269
47699382
52524269
0
10.12%
tls227_pers_publn
2014b
210802117
193982180
210802117
0
8.67%

Monday, November 24, 2014

Using patstat in universities evaluation procedures

This work shows a methodology used to match PATSTAT inventor names to a full list of researchers working in Italian universities.
The goal is to have higher recall, leaving institutions/researchers to validate the data.
Focus will not be on results (evaluation still in progress) but on data processing, selection and match algorithm, highlighting some difficulties and relative workarounds.


Monday, October 13, 2014

Differences of inventors within the same docdb family (part II)

(Continues from previous post)

As previously stated, 95% of docdb families contain applications with the same number of inventors;
obviously it may also happen that part of inventors change among the applications. Thus we must check how deep is the variance of person_ids within the same docdb family.

Here we see the results (sql code appended in the end of this post):



range
count
%
<1 span="">
4620
0,01%
1
28708521
84,99%
1-2
1928802
5,71%
2-5
2447120
7,24%
5-20
637558
1,89%
>20
52859
0,16%



Two results are interesting:
1) almost 85% of families share the same inventors (in person_ids): if we wouls also count that some person_ids inside the family may refer to the same entity but only spelling is different (due to different data origin) this is validatign again our hypothesis;
2) we have 4620 odd families with more inventors than person_ids (but this may be explained either with duplications due to see applicant issue or with duplications in TLS207)


SQL CODE for counting number of inventors / persond_ids ratio:

Drop table if exists t1;

create TABLE T1
select a.DOCDB_FAMILY_ID, avg(ninv) as avginv from

    (Select   t18.DOCDB_FAMILY_ID, Max(t7.invt_seq_nr) as ninv, t18.APPLN_ID
      From
        patstat.tls218_docdb_fam t18 Inner Join
        patstat.tls207_pers_appln t7 On t18.APPLN_ID = t7.APPLN_ID
      Where t7.invt_seq_nr > 0
      Group By   t18.DOCDB_FAMILY_ID, t18.APPLN_ID) as a
  group by a.DOCDB_FAMILY_ID;

alter table t1 add index i1(DOCDB_FAMILY_ID);

select floor((totpers/avginv)*10)/10 as rate, count(c.DOCDB_FAMILY_ID) as cc
from
t1 as c
inner join
  (Select   t18.DOCDB_FAMILY_ID, Count(Distinct t7.person_id) As totpers
    From
      patstat.tls218_docdb_fam t18 Inner Join
      patstat.tls207_pers_appln t7 On t18.APPLN_ID = t7.APPLN_ID
    Where t7.invt_seq_nr > 0
    Group By   t18.DOCDB_FAMILY_ID) as b
  on c.DOCDB_FAMILY_ID = b.DOCDB_FAMILY_ID
group by floor((totpers/avginv)*10)/10;



Thursday, October 9, 2014

Differences of inventors within the same docdb family (part I)

To create the full list of inventors that participated to an innvoation is not an trivial task.
Especially because if we mean for innovation not a mere application but a patent family, to make an append of all the person_ids for all applications belonging to the family would surely lead to undetected duplication of names (ie due to different spelling or address in different application authorities).
Thus one way could be to take only the inventors related to one application (ie the older or the one where data are more likely to be complete fi EPO).
In this case we may instead have an uncomplete recall of inventors whether across different applications one or more inventors may change, be amended or added.

One way to validate this idea is to count what is the difference between min and max count of inventors in the applications within the family. This could validate the fact that in most cases the list of inventors remains the same.
The count is here below: over 95% of docdb families have the same number of inventors for all applications





delta
n families
%
0
36.048.365
95,523%
1
859.567
2,278%
2
413.670
1,096%
3
206.235
0,546%
4
101.529
0,269%
5
48.545
0,129%
6
25.400
0,067%
7
13.372
0,035%
8
7.775
0,021%
9
4.432
0,012%
10
2.972
0,008%
11
1.697
0,004%
12
1.122
0,003%
13
836
0,002%
14
580
0,002%
15 or more
1.661
0,004%


The higher difference within a familis (98 inventors) is for family_id 39324928, containing 74 distinct patent applications where is patent  WO2008051495 has 98 inventors, while  JP2010520959 counts 0 inventors.