As previously stated, PATSTAT contains two family tables: tls218 (docdb family) and tls219 (inpadoc family).
In this post you may find the differences in their definitions.
Here we are going to run some analysis on the data contained in the two tables, based on the October 2010 edition.
The first thing that stands out is the difference in the number of records:
tls218_docdb_fam | 58713013 |
tls219_inpadoc_fam | 66226956 |
The next step is to extract the number of distinct families from the two tables:
inpadoc: 39.301.955
docdb: 40.677.058
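Counts like these can be obtained with a COUNT / COUNT(DISTINCT ...) query per table. The following is only a minimal sketch on an in-memory SQLite database with a handful of made-up rows; the column names appln_id, docdb_family_id and inpadoc_family_id are assumptions about the table layout, and the helper count_records_and_families is hypothetical.

```python
import sqlite3

# Toy stand-in for the two PATSTAT family tables. The column names
# (appln_id, docdb_family_id, inpadoc_family_id) are assumed, and the
# rows are invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tls218_docdb_fam (appln_id INTEGER, docdb_family_id INTEGER);
    CREATE TABLE tls219_inpadoc_fam (appln_id INTEGER, inpadoc_family_id INTEGER);
    INSERT INTO tls218_docdb_fam VALUES (1, 10), (2, 10), (3, 11), (4, 12);
    INSERT INTO tls219_inpadoc_fam VALUES (1, 100), (2, 100), (3, 100), (4, 101);
""")

def count_records_and_families(table, family_column):
    """Return (number of rows, number of distinct family ids) for one table."""
    query = f"SELECT COUNT(*), COUNT(DISTINCT {family_column}) FROM {table}"
    return conn.execute(query).fetchone()

docdb_counts = count_records_and_families("tls218_docdb_fam", "docdb_family_id")
inpadoc_counts = count_records_and_families("tls219_inpadoc_fam", "inpadoc_family_id")
print("docdb (records, families):", docdb_counts)
print("inpadoc (records, families):", inpadoc_counts)
```

On the real tables the same query shape would apply, but it should be checked against the schema of the edition in use.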
Since docdb spreads fewer records over more families, we expect the average docdb family to be smaller, in agreement with its definition.
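As a quick check of this expectation, the average family size can be estimated by dividing the record counts by the family counts reported above. This sketch assumes that each table row is one application-to-family link, so records divided by families gives applications per family.

```python
# Record and distinct-family counts as reported above (October 2010 edition).
records = {"docdb": 58_713_013, "inpadoc": 66_226_956}
families = {"docdb": 40_677_058, "inpadoc": 39_301_955}

# Assumption: one row per application-family link, so this ratio is the
# average number of applications per family.
avg_size = {name: records[name] / families[name] for name in records}

for name, value in avg_size.items():
    print(f"{name}: {value:.2f} applications per family")
# docdb: 1.44 applications per family
# inpadoc: 1.69 applications per family
```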
Then we can calculate the composition of the families by number of applications:
#apps | inpadoc # | inpadoc % | docdb # | docdb % |
1 | 29923455 | 76,14% | 35468031 | 87,19% |
2 | 5374436 | 13,67% | 1863423 | 4,58% |
3 | 1048185 | 2,67% | 898137 | 2,21% |
4 | 718316 | 1,83% | 656255 | 1,61% |
5 | 552198 | 1,41% | 514797 | 1,27% |
6 | 424765 | 1,08% | 359279 | 0,88% |
7 | 308614 | 0,79% | 243965 | 0,60% |
8 | 222540 | 0,57% | 170751 | 0,42% |
9 | 161787 | 0,41% | 122107 | 0,30% |
>=10 | 567659 | 1,44% | 380313 | 0,93% |
TOT | 39301955 | 100% | 40677058 | 100% |
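A breakdown of this kind can be sketched in two counting passes: first count applications per family, then count families per size, merging everything from 10 upwards into one bucket. The rows below are invented, and family_size_distribution is a hypothetical helper, not the query actually used for the table.

```python
from collections import Counter

# Invented (appln_id, family_id) rows: families A, B, C plus one family D
# with 12 applications, so that the ">=10" bucket is exercised.
rows = [(1, "A"), (2, "A"), (3, "B"), (4, "C"), (5, "C"), (6, "C")]
rows += [(100 + i, "D") for i in range(12)]

def family_size_distribution(rows, cap=10):
    """Map family size (capped at `cap`) to (family count, percentage)."""
    # Pass 1: number of applications in each family.
    apps_per_family = Counter(family_id for _, family_id in rows)
    # Pass 2: number of families of each size; sizes >= cap share one bucket.
    families_per_size = Counter(min(size, cap) for size in apps_per_family.values())
    total = len(apps_per_family)
    return {
        size: (count, 100 * count / total)
        for size, count in sorted(families_per_size.items())
    }

distribution = family_size_distribution(rows)
for size, (count, pct) in distribution.items():
    label = f">={size}" if size == 10 else str(size)
    print(f"{label} | {count} | {pct:.2f}%")
```

The key 10 in the result stands for the ">=10" row of the table.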
Looking at absolute numbers, the largest inpadoc family counts 4927 applications (20% of which are unpublished priorities), whereas the largest docdb 'clan' has 'only' 329 applications.