In the September 2009 edition we find 57988 multiples (mostly doubles; only 3311 occur three or more times).
These are the application authorities with more than 100 duplications:
'JP' | 31055 |
'US' | 11214 |
'DE' | 7376 |
'AU' | 3694 |
'IT' | 2393 |
'DD' | 720 |
'NO' | 235 |
'SE' | 207 |
'SU' | 186 |
'NL' | 125 |
'ES' | 107 |
'KR' | 104 |
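A count like the one above could be obtained with a grouped query. The following is only a minimal sketch: it assumes the multiples are groups of records sharing application authority and number, it assumes the table and column names (`tls201_appln`, `appln_auth`, `appln_nr`, `appln_kind`), and it runs on a handful of invented rows in an in-memory SQLite database rather than on the real data.

```python
import sqlite3

# Toy stand-in for the application table. Table and column names are
# assumptions, and every row below is invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls201_appln ("
    " appln_id INTEGER PRIMARY KEY,"
    " appln_auth TEXT, appln_nr TEXT, appln_kind TEXT)"
)
rows = [
    (1, "JP", "1000", "A"),
    (2, "JP", "1000", "U"),  # same authority and number, different kind
    (3, "JP", "2000", "A"),
    (4, "US", "3000", "A"),
    (5, "US", "3000", "P"),
    (6, "US", "3000", "T"),  # a triple
    (7, "DE", "4000", "A"),
]
conn.executemany("INSERT INTO tls201_appln VALUES (?, ?, ?, ?)", rows)

# Groups of records that share authority and number.
multiples = conn.execute(
    "SELECT appln_auth, appln_nr, COUNT(*) AS n"
    " FROM tls201_appln"
    " GROUP BY appln_auth, appln_nr"
    " HAVING COUNT(*) > 1"
).fetchall()

# Number of such groups per authority, largest first.
per_authority = conn.execute(
    "SELECT g.appln_auth, COUNT(*) AS multiples FROM ("
    "  SELECT appln_auth, appln_nr FROM tls201_appln"
    "  GROUP BY appln_auth, appln_nr HAVING COUNT(*) > 1"
    ") AS g GROUP BY g.appln_auth ORDER BY multiples DESC, g.appln_auth"
).fetchall()

for auth, count in per_authority:
    print(f"'{auth}' | {count} |")
```

Adding a further column (the filing date, say) to both `GROUP BY` clauses would give the stricter count.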
If we also add the filing date, the number of duplications remains almost the same (57985), and in all but 10 of these cases the filing date is 31/12/9999.
Out of 63595000 records (again, in the September 2009 edition), this is less than 0.1%.
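As a quick check of that share, using only the two figures quoted in this post:

```python
multiples = 57_988     # multiples found in the September 2009 edition
records = 63_595_000   # total records in the same edition

share = multiples / records * 100  # percentage of records
print(f"{share:.3f} %")
```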
As a matter of fact, the application_id (the main key of Patstat) is defined as a unique combination of application number, application authority and application kind.
On the other hand, we should be aware that different triples of application authority, number and kind may address the same document: for instance, kinds A (application) and T (translation) identify the same document, given the same application authority and number.
So we need to understand how to filter the application authority / application kind pairs in order to consider only those documents that are applications.
The problem is that we have 762 application authority / application kind pairs.
We would need to find a concordance table (for instance, like the one Delphion provides for publication kinds at http://www.delphion.com/help/kindcodes) and reclassify the pairs into macro-classes, allowing us to drop 'accessory' documents and retain only the documents we want to keep.
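To make the idea concrete, here is a rough sketch of how such a concordance could be applied once it exists. Everything in it is hypothetical: the placeholder authority "XX", the two macro-class names, the `Application` type and the `keep_applications` helper are only there for illustration, and the real table would need an entry for each of the 762 pairs.

```python
from typing import NamedTuple


class Application(NamedTuple):
    authority: str
    number: str
    kind: str


# Hypothetical concordance: (authority, kind) -> macro-class.
# "XX" is a placeholder authority, not a real one.
CONCORDANCE = {
    ("XX", "A"): "application",
    ("XX", "T"): "accessory",  # translation of the same document
}


def keep_applications(records, concordance):
    """Return only the records whose (authority, kind) pair is classed
    as 'application'.

    Pairs missing from the concordance are dropped; that is a choice
    made for this sketch, not something the data prescribes.
    """
    return [
        r for r in records
        if concordance.get((r.authority, r.kind)) == "application"
    ]


records = [
    Application("XX", "12345", "A"),
    Application("XX", "12345", "T"),  # same document as the line above
    Application("XX", "67890", "A"),
]
kept = keep_applications(records, CONCORDANCE)
print(kept)
```

Dropping unmapped pairs is the conservative option; flagging them for manual review might be preferable while the concordance is still incomplete.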
To do so, we must also look more closely at the relationship between applications and publications since, according to the data model, every application has 0 to N publications, while every publication belongs to exactly 1 application.
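The cardinality just described can be sketched with two toy tables. The table names `application` and `publication` and their columns are simplified stand-ins chosen for this example, not the real schema, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE application (appln_id INTEGER PRIMARY KEY);
CREATE TABLE publication (
    publn_id INTEGER PRIMARY KEY,
    -- NOT NULL + REFERENCES: every publication belongs to exactly 1 application
    appln_id INTEGER NOT NULL REFERENCES application(appln_id)
);
INSERT INTO application VALUES (1), (2), (3);
INSERT INTO publication VALUES (10, 1), (11, 1), (12, 2);
""")

# LEFT JOIN keeps applications with no publication (the "0" in "0 to N").
counts = conn.execute(
    "SELECT a.appln_id, COUNT(p.publn_id)"
    " FROM application AS a"
    " LEFT JOIN publication AS p ON p.appln_id = a.appln_id"
    " GROUP BY a.appln_id ORDER BY a.appln_id"
).fetchall()
print(counts)

# A publication pointing at a non-existent application is rejected.
try:
    conn.execute("INSERT INTO publication VALUES (13, 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```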
[to be continued]