Monday, October 13, 2014

Differences of inventors within the same docdb family (part II)

(Continued from the previous post)

As previously stated, 95% of docdb families contain applications with the same number of inventors. However, the same number of inventors does not necessarily mean the same inventors: some of them may change from one application to another.
We therefore need to check how much the inventor person_ids vary within the same docdb family.

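For each family we computed the ratio between the number of distinct inventor person_ids across all its applications and the average number of inventors per application, truncated to one decimal place. A ratio of 1 means that every application in the family lists the same inventors. Here are the results (the SQL code is appended at the end of this post):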

ratio         count       %
<1            4,620   0.01%
1        28,708,521  84.99%
1-2       1,928,802   5.71%
2-5       2,447,120   7.24%
5-20        637,558   1.89%
>20          52,859   0.16%

Two results are interesting:
1) almost 85% of families share exactly the same inventors (in terms of person_ids). Since some distinct person_ids inside a family may actually refer to the same person, only spelled differently because the data come from different sources, the real share is probably even higher, which further supports our hypothesis;
2) there are 4,620 odd families with more inventors than person_ids (this may be explained either by duplicates due to the "see applicant" issue or by duplicates in TLS207).
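To make the ratio concrete, here is a minimal Python sketch that mirrors the logic of the SQL query at the end of this post on a handful of made-up rows (all family, application and person ids are hypothetical). Family 2 shows why an equal number of inventors does not guarantee the same inventors, and family 3 shows how a person_id listed twice inside one application pushes the ratio below 1.

```python
import math
from collections import defaultdict

# Hypothetical toy rows mimicking TLS218 joined with TLS207:
# (docdb_family_id, appln_id, person_id, invt_seq_nr)
ROWS = [
    # family 1: both applications list the same three person_ids
    (1, 10, 100, 1), (1, 10, 101, 2), (1, 10, 102, 3),
    (1, 11, 100, 1), (1, 11, 101, 2), (1, 11, 102, 3),
    # family 2: same number of inventors, but one person_id differs
    (2, 20, 200, 1), (2, 20, 201, 2), (2, 20, 202, 3),
    (2, 21, 200, 1), (2, 21, 201, 2), (2, 21, 299, 3),
    # family 3: the same person_id is listed twice in one application
    (3, 30, 300, 1), (3, 30, 300, 2),
    # an applicant-only row (invt_seq_nr = 0) that must be ignored
    (3, 30, 301, 0),
]


def inventor_ratio(rows):
    """Distinct inventor person_ids per family divided by the average number
    of inventors per application, the latter taken as max(invt_seq_nr)."""
    max_seq = defaultdict(int)   # (family, appln) -> highest invt_seq_nr
    persons = defaultdict(set)   # family -> distinct inventor person_ids
    for fam, appln, person, seq in rows:
        if seq > 0:  # inventor roles only, like WHERE invt_seq_nr > 0
            max_seq[(fam, appln)] = max(max_seq[(fam, appln)], seq)
            persons[fam].add(person)
    per_family = defaultdict(list)
    for (fam, _appln), n_inv in max_seq.items():
        per_family[fam].append(n_inv)
    return {fam: len(persons[fam]) / (sum(ns) / len(ns))
            for fam, ns in per_family.items()}


def truncate_rate(ratio):
    """Truncate to one decimal place, like FLOOR(ratio * 10) / 10."""
    return math.floor(ratio * 10) / 10


for fam, r in sorted(inventor_ratio(ROWS).items()):
    print(fam, round(r, 3), truncate_rate(r))
# 1 1.0 1.0
# 2 1.333 1.3
# 3 0.5 0.5
```

Note that, because the number of inventors is taken as the highest invt_seq_nr, any duplicated inventor row inflates the denominator, which is one mechanical way the "<1" bucket can arise.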


SQL code for computing the ratio between distinct inventor person_ids and the average number of inventors per application:

-- Step 1: for each docdb family, the average number of inventors per application.
-- The number of inventors of an application is taken as its highest invt_seq_nr.
DROP TABLE IF EXISTS t1;

CREATE TABLE t1
SELECT a.DOCDB_FAMILY_ID, AVG(a.ninv) AS avginv
FROM (
    SELECT t18.DOCDB_FAMILY_ID, t18.APPLN_ID, MAX(t7.invt_seq_nr) AS ninv
    FROM patstat.tls218_docdb_fam t18
    INNER JOIN patstat.tls207_pers_appln t7 ON t18.APPLN_ID = t7.APPLN_ID
    WHERE t7.invt_seq_nr > 0   -- inventor roles only
    GROUP BY t18.DOCDB_FAMILY_ID, t18.APPLN_ID
) AS a
GROUP BY a.DOCDB_FAMILY_ID;

-- Index on the join key used in step 2.
ALTER TABLE t1 ADD INDEX i1 (DOCDB_FAMILY_ID);

-- Step 2: distinct inventor person_ids per family divided by avginv,
-- truncated to one decimal place, and the number of families per value.
SELECT FLOOR((b.totpers / c.avginv) * 10) / 10 AS rate,
       COUNT(c.DOCDB_FAMILY_ID) AS cc
FROM t1 AS c
INNER JOIN (
    SELECT t18.DOCDB_FAMILY_ID, COUNT(DISTINCT t7.person_id) AS totpers
    FROM patstat.tls218_docdb_fam t18
    INNER JOIN patstat.tls207_pers_appln t7 ON t18.APPLN_ID = t7.APPLN_ID
    WHERE t7.invt_seq_nr > 0
    GROUP BY t18.DOCDB_FAMILY_ID
) AS b ON c.DOCDB_FAMILY_ID = b.DOCDB_FAMILY_ID
GROUP BY FLOOR((b.totpers / c.avginv) * 10) / 10
ORDER BY rate;
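For readers without a PATSTAT instance at hand, the sketch below replays the same two steps with Python's built-in sqlite3 module on a tiny in-memory database. Assumptions: the two tables are reduced to the few columns the query uses, all values are invented, SQLite needs CREATE TABLE ... AS, and FLOOR is replaced by CAST(... AS INTEGER) because SQLite builds do not always ship math functions (the two agree for the non-negative ratios involved here).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, heavily reduced stand-ins for the two PATSTAT tables.
cur.executescript("""
CREATE TABLE tls218_docdb_fam (appln_id INTEGER, docdb_family_id INTEGER);
CREATE TABLE tls207_pers_appln (person_id INTEGER, appln_id INTEGER,
                                applt_seq_nr INTEGER, invt_seq_nr INTEGER);
""")
cur.executemany("INSERT INTO tls218_docdb_fam VALUES (?, ?)",
                [(10, 1), (11, 1), (20, 2), (21, 2)])
cur.executemany("INSERT INTO tls207_pers_appln VALUES (?, ?, ?, ?)", [
    (100, 10, 0, 1), (101, 10, 0, 2),   # family 1, application 10
    (100, 11, 0, 1), (101, 11, 0, 2),   # family 1, same inventors
    (200, 20, 1, 0),                    # applicant only: filtered out
    (201, 20, 0, 1), (202, 21, 0, 1),   # family 2: the inventor changes
])

# Step 1: average number of inventors per application in each family.
cur.executescript("""
DROP TABLE IF EXISTS t1;
CREATE TABLE t1 AS
SELECT a.docdb_family_id AS docdb_family_id, AVG(a.ninv) AS avginv
FROM (SELECT t18.docdb_family_id AS docdb_family_id, t18.appln_id AS appln_id,
             MAX(t7.invt_seq_nr) AS ninv
      FROM tls218_docdb_fam t18
      JOIN tls207_pers_appln t7 ON t18.appln_id = t7.appln_id
      WHERE t7.invt_seq_nr > 0
      GROUP BY t18.docdb_family_id, t18.appln_id) AS a
GROUP BY a.docdb_family_id;
CREATE INDEX i1 ON t1 (docdb_family_id);
""")

# Step 2: CAST(... AS INTEGER) truncates like FLOOR for non-negative ratios.
RATE_QUERY = """
SELECT CAST(b.totpers / c.avginv * 10 AS INTEGER) / 10.0 AS rate,
       COUNT(c.docdb_family_id) AS cc
FROM t1 AS c
JOIN (SELECT t18.docdb_family_id AS docdb_family_id,
             COUNT(DISTINCT t7.person_id) AS totpers
      FROM tls218_docdb_fam t18
      JOIN tls207_pers_appln t7 ON t18.appln_id = t7.appln_id
      WHERE t7.invt_seq_nr > 0
      GROUP BY t18.docdb_family_id) AS b
  ON c.docdb_family_id = b.docdb_family_id
GROUP BY CAST(b.totpers / c.avginv * 10 AS INTEGER)
ORDER BY rate
"""
rows = cur.execute(RATE_QUERY).fetchall()
print(rows)  # [(1.0, 1), (2.0, 1)]
```

Family 1 keeps the same two inventors (rate 1.0), while in family 2 each application has one inventor but a different person_id (rate 2.0).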



Thursday, October 9, 2014

Differences of inventors within the same docdb family (part I)

Creating the full list of inventors who took part in an innovation is not a trivial task.
This is especially true if by innovation we mean not a single application but a whole patent family: simply appending the person_ids of all the applications in the family would surely lead to undetected duplicate names (e.g. the same inventor spelled differently, or with a different address, at different patent authorities).
One alternative is to take only the inventors of a single application (e.g. the oldest one, or the one whose data are most likely to be complete, for instance the EPO application).
In that case, however, we may miss some inventors if one or more of them are changed, amended or added across the different applications of the family.
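As an illustration of the single-application approach and of its recall problem, here is a small Python sketch. The selection rule (prefer the EP application, otherwise the oldest filing) is only one way to read the suggestion above, and the Application record with its fields is a hypothetical simplification, not a PATSTAT table.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Application:
    """Hypothetical, simplified application record (not a PATSTAT table)."""
    appln_id: int
    appln_auth: str                 # filing authority, e.g. "EP", "US", "JP"
    appln_filing_date: date
    inventors: List[int] = field(default_factory=list)  # inventor person_ids


def reference_application(family):
    """Pick one application per family: the EP one if present (assumed to
    carry the most complete inventor data), otherwise the oldest filing."""
    ep_apps = [a for a in family if a.appln_auth == "EP"]
    candidates = ep_apps or family
    # Oldest filing date first; appln_id breaks ties deterministically.
    return min(candidates, key=lambda a: (a.appln_filing_date, a.appln_id))


family = [
    Application(1, "JP", date(2007, 3, 1), [100, 101]),
    Application(2, "EP", date(2008, 2, 1), [100, 101]),
    Application(3, "US", date(2008, 9, 1), [100, 101, 102]),  # inventor added
]
ref = reference_application(family)
all_inventors = set().union(*(a.inventors for a in family))
missed = all_inventors - set(ref.inventors)
print(ref.appln_id, sorted(missed))  # 2 [102]
```

The inventor added only in the later US filing is lost when we keep just the reference application, which is exactly the risk the check below tries to quantify.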

One way to test this idea is to compute, for each family, the difference (delta) between the maximum and the minimum number of inventors across its applications. If the delta is zero for most families, this is a first sign that in most cases the list of inventors remains the same.
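The check just described can be sketched in a few lines of Python. The input format (one row per application with its number of inventors) and the data are hypothetical. One assumption worth stating: applications with no inventor at all must be kept with a count of 0 (in SQL this would presumably require a LEFT JOIN from the family table to TLS207), otherwise a case like the WO/JP pair discussed at the end of this post could not show up. Deltas of 15 or more are pooled, as in the table.

```python
from collections import Counter, defaultdict

# Hypothetical (docdb_family_id, appln_id, n_inventors) rows; applications
# without any inventor must appear with 0, not be dropped.
APPS = [
    (1, 10, 3), (1, 11, 3), (1, 12, 3),
    (2, 20, 2), (2, 21, 3),
    (3, 30, 98), (3, 31, 0),
    (4, 40, 1),
]


def inventor_deltas(apps):
    """Per family: max minus min number of inventors over its applications."""
    counts = defaultdict(list)
    for fam, _appln, n_inv in apps:
        counts[fam].append(n_inv)
    return {fam: max(ns) - min(ns) for fam, ns in counts.items()}


def delta_distribution(deltas, cap=15):
    """Families and percentage per delta; deltas >= cap are pooled at cap."""
    hist = Counter(min(d, cap) for d in deltas.values())
    total = sum(hist.values())
    return {d: (n, 100.0 * n / total) for d, n in sorted(hist.items())}


deltas = inventor_deltas(APPS)
print(delta_distribution(deltas))   # {0: (2, 50.0), 1: (1, 25.0), 15: (1, 25.0)}
print(max(deltas, key=deltas.get))  # 3, the family with the largest delta
```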
The results are below: over 95% of docdb families have the same number of inventors in all their applications.

delta (max-min)   n families         %
0                 36,048,365   95.523%
1                    859,567    2.278%
2                    413,670    1.096%
3                    206,235    0.546%
4                    101,529    0.269%
5                     48,545    0.129%
6                     25,400    0.067%
7                     13,372    0.035%
8                      7,775    0.021%
9                      4,432    0.012%
10                     2,972    0.008%
11                     1,697    0.004%
12                     1,122    0.003%
13                       836    0.002%
14                       580    0.002%
15 or more             1,661    0.004%

The largest difference within a family (98 inventors) is found in family_id 39324928, which contains 74 distinct patent applications: WO2008051495 lists 98 inventors, while JP2010520959 lists none.