As described in product web page, Patstat contains data from more than 80 patent offices.
Bytheway how is the quality of such data? Especially when trying to make some meaningful analisys we need to know the coverage of data.
Most of all patstat is told to have a good coverage on european patent authorities, but to be less raliable on the rest. Is it?
Here I develop the full procedure for checking EU27 inventors countries coverage, based on october 2011 data.
Query run:
First query creates a
table named NATIONALCOUNTS where we have a fractional count by applications of
inventors having a country code (CYDATA = “Y”) or not.
We select application
with filing date >= 1/1/2000 to approximate the priority date as much as
possible without complicationg data extractions.
Q1: totals by appln authorities in
EU27
Create table
nationalcounts
Select distinct
c.EU27sort,
c.Countrycode,
t01.APPLN_AUTH,
t01.APPLN_KIND,
t07.APPLN_ID,
t07.person_id,
if(t06.PERSON_CTRY_CODE = '', "N",
"Y") as CYDATA,
a.weight
From
patstat.tls201_appln t01 Inner Join
patstat.tls207_pers_appln t07 On t01.APPLN_ID
= t07.APPLN_ID Inner Join
patstat.tls206_person
t06 On t07.PERSON_ID = t06.PERSON_ID Inner Join
rdk.countries c On c.ISO3166 = t01.APPLN_AUTH
inner join
(select 1/count(b.person_id)
as weight
From patstat.tls207_pers_appln
as b
Where b.INVT_SEQ_NR
> 0
Group by appln_id)
as a
on a.appln_id =
t01.appln_id
Where
c.EU27sort > 0 And
t07.INVT_SEQ_NR > 0
And
year(t01.appln_filing_date) >=2000;
where table COUNTRIES
contains following information (some records are shown up):
Countrycode
|
countryname
|
ISO3166
|
GEOAREA
|
EU27sort
|
'AD'
|
'ANDORRA'
|
'AD'
|
''
|
|
'AE'
|
'UNITED ARAB EMIRATES'
|
'AE'
|
''
|
|
'AF'
|
'AFGHANISTAN'
|
'AF'
|
''
|
|
'AG'
|
'ANTIGUA AND BARBUDA'
|
'AG'
|
''
|
|
'AI'
|
'ANGUILLA'
|
'AI'
|
''
|
|
'AL'
|
'ALBANIA'
|
'AL'
|
''
|
|
'AM'
|
'ARMENIA'
|
'AM'
|
''
|
|
'AO'
|
'ANGOLA'
|
'AO'
|
''
|
|
'AQ'
|
'ANTARCTICA'
|
'AQ'
|
''
|
|
'AR'
|
'ARGENTINA'
|
'AR'
|
''
|
|
'AS'
|
'AMERICAN SAMOA'
|
'AS'
|
''
|
|
'AT'
|
'AUSTRIA'
|
'AT'
|
'EU27'
|
1
|
'AU'
|
'AUSTRALIA'
|
'AU'
|
''
|
|
'AW'
|
'ARUBA'
|
'AW'
|
''
|
|
'AX'
|
'Ã…LAND ISLANDS'
|
'AX'
|
''
|
|
'AZ'
|
'AZERBAIJAN'
|
'AZ'
|
''
|
|
'BA'
|
'BOSNIA AND HERZEGOVINA'
|
'BA'
|
''
|
|
'BB'
|
'BARBADOS'
|
'BB'
|
''
|
|
'BD'
|
'BANGLADESH'
|
'BD'
|
''
|
|
'BE'
|
'BELGIUM'
|
'BE'
|
'EU27'
|
2
|
Thus we can
link it to patstat data using ISO3166 field as application authority, filtering
where EU27sort is > 0 (also selecting geoarea EU27 would work as well).
We may also
need is the number of applications among
EU27 that have no inventor at all; such records will be appended to nationalcounts
table with CYDATA = “V” (void) and weight = 1
Q2: EU27 applications with no
inventor
Insert into nationalcounts
Select distinct
c.EU27sort,
c.Countrycode,
t01.APPLN_AUTH,
t01.APPLN_KIND,
t01.APPLN_ID,
0 as person_id,
“V” as CYDATA,
1 as weight
From
patstat.tls201_appln t01 left Join
(select * from patstat.tls207_pers_appln
where
INVT_SEQ_NR > 0 ) as t07
On
t01.APPLN_ID = t07.APPLN_ID Inner Join
rdk.countries c On c.ISO3166 =
t01.APPLN_AUTH
Where
c.EU27sort > 0
and year(t01.appln_filing_date)
>=2000
and t07.APPLN_ID is
null;
just to check our population has been chosen
correctly we compare results of two queries:
Q3: count of applications in EU27
buy appln auth, filing year >= 2000 source - patstat
Select
t01.APPLN_AUTH,
count(distinct t01.APPLN_ID)
From
patstat.tls201_appln t01 Inner Join
rdk.countries c On c.ISO3166 = t01.APPLN_AUTH
Where
c.EU27sort > 0
And
year(t01.appln_filing_date) >=2000
Group by
t01.appln_auth;
Q4: count of applications in EU27
buy appln auth, filing year >= 2000 source – table created
Select
a.APPLN_AUTH,
Sum(a.weight) As total
From
Nationalcounts as a
Group By
a.APPLN_AUTH;
Thus we get these results:
|
patstat
|
nationalcounts
|
% error
|
'AT'
|
263598
|
263548.5942
|
0.02%
|
'BE'
|
284520
|
284461.5883
|
0.02%
|
'BG'
|
7589
|
7587.6561
|
0.02%
|
'CY'
|
490
|
489.9922
|
0.00%
|
'CZ'
|
30678
|
30664.7612
|
0.04%
|
'DE'
|
2066227
|
2066157.965
|
0.00%
|
'DK'
|
119126
|
119114.213
|
0.01%
|
'EE'
|
3079
|
3078.5031
|
0.02%
|
'ES'
|
182663
|
182651.6436
|
0.01%
|
'FI'
|
86397
|
86396.5538
|
0.00%
|
'FR'
|
306000
|
305990.4801
|
0.00%
|
'GB'
|
412625
|
412295.7409
|
0.08%
|
'GR'
|
10524
|
10523.6455
|
0.00%
|
'HU'
|
25328
|
25320.2408
|
0.03%
|
'IE'
|
8303
|
8302.9573
|
0.00%
|
'IT'
|
170324
|
170215.7062
|
0.06%
|
'LT'
|
1177
|
1176.7488
|
0.02%
|
'LU'
|
11581
|
11580.9929
|
0.00%
|
'LV'
|
1755
|
1754.9937
|
0.00%
|
'MT'
|
18
|
18
|
0.00%
|
'NL'
|
225667
|
225664.0008
|
0.00%
|
'PL'
|
54386
|
54385.7814
|
0.00%
|
'PT'
|
28764
|
28758.0775
|
0.02%
|
'RO'
|
5040
|
5039.7973
|
0.00%
|
'SE'
|
755366
|
755343.1424
|
0.00%
|
'SI'
|
12502
|
12501.7939
|
0.00%
|
'SK'
|
7564
|
7558.2712
|
0.08%
|
Where all
errors are <<0,1% and mainly due to roundings;
Eventually
in next post we will develop some analisys of data.
No comments:
Post a Comment