(Continues from previous post)
As previously stated, 95% of docdb families contain applications with the same number of inventors;
obviously it may also happen that part of inventors change among the applications. Thus we must check how deep is the variance of person_ids within the same docdb family.
Here we see the results (sql code appended in the end of this post):
Two results are interesting:
1) almost 85% of families share the same inventors (in person_ids): if we wouls also count that some person_ids inside the family may refer to the same entity but only spelling is different (due to different data origin) this is validatign again our hypothesis;
2) we have 4620 odd families with more inventors than person_ids (but this may be explained either with duplications due to see applicant issue or with duplications in TLS207)
SQL CODE for counting number of inventors / persond_ids ratio:
As previously stated, 95% of docdb families contain applications with the same number of inventors;
obviously it may also happen that part of inventors change among the applications. Thus we must check how deep is the variance of person_ids within the same docdb family.
Here we see the results (sql code appended in the end of this post):
range
|
count
|
%
|
<1 span="">1>
|
4620
|
0,01%
|
1
|
28708521
|
84,99%
|
1-2
|
1928802
|
5,71%
|
2-5
|
2447120
|
7,24%
|
5-20
|
637558
|
1,89%
|
>20
|
52859
|
0,16%
|
Two results are interesting:
1) almost 85% of families share the same inventors (in person_ids): if we wouls also count that some person_ids inside the family may refer to the same entity but only spelling is different (due to different data origin) this is validatign again our hypothesis;
2) we have 4620 odd families with more inventors than person_ids (but this may be explained either with duplications due to see applicant issue or with duplications in TLS207)
SQL CODE for counting number of inventors / persond_ids ratio:
Drop table if exists t1;
create TABLE T1
select a.DOCDB_FAMILY_ID, avg(ninv) as avginv from
(Select t18.DOCDB_FAMILY_ID, Max(t7.invt_seq_nr) as ninv, t18.APPLN_ID
From
patstat.tls218_docdb_fam t18 Inner Join
patstat.tls207_pers_appln t7 On t18.APPLN_ID = t7.APPLN_ID
Where t7.invt_seq_nr > 0
Group By t18.DOCDB_FAMILY_ID, t18.APPLN_ID) as a
group by a.DOCDB_FAMILY_ID;
alter table t1 add index i1(DOCDB_FAMILY_ID);
select floor((totpers/avginv)*10)/10 as rate, count(c.DOCDB_FAMILY_ID) as cc
from
t1 as c
inner join
(Select t18.DOCDB_FAMILY_ID, Count(Distinct t7.person_id) As totpers
From
patstat.tls218_docdb_fam t18 Inner Join
patstat.tls207_pers_appln t7 On t18.APPLN_ID = t7.APPLN_ID
Where t7.invt_seq_nr > 0
Group By t18.DOCDB_FAMILY_ID) as b
on c.DOCDB_FAMILY_ID = b.DOCDB_FAMILY_ID
group by floor((totpers/avginv)*10)/10;