Monday, June 4, 2012

EU27 patent authorities inventors country analisys (1/3)


As described in product web page, Patstat contains data from more than 80 patent offices.
Bytheway how is the quality of such data? Especially when trying to make some meaningful analisys we need to know the coverage of data.

Most of all patstat is told to have a good coverage on european patent authorities, but to be less raliable on the rest. Is it?

Here I develop the full procedure for checking EU27 inventors countries coverage, based on october 2011 data.
  
Query run:

First query creates a table named NATIONALCOUNTS where we have a fractional count by applications of inventors having a country code (CYDATA = “Y”) or not.
We select application with filing date >= 1/1/2000 to approximate the priority date as much as possible without complicationg data extractions.

Q1: totals by appln authorities in EU27

Create table nationalcounts
Select distinct
  c.EU27sort,
  c.Countrycode,
  t01.APPLN_AUTH,
  t01.APPLN_KIND,
  t07.APPLN_ID,
 t07.person_id,
  if(t06.PERSON_CTRY_CODE = '', "N", "Y") as CYDATA,
  a.weight
From
  patstat.tls201_appln t01 Inner Join
  patstat.tls207_pers_appln t07 On t01.APPLN_ID = t07.APPLN_ID Inner Join
  patstat.tls206_person t06 On t07.PERSON_ID = t06.PERSON_ID Inner Join
  rdk.countries c On c.ISO3166 = t01.APPLN_AUTH inner join
(select 1/count(b.person_id) as weight
From patstat.tls207_pers_appln as b
Where b.INVT_SEQ_NR > 0
Group by appln_id) as a
on a.appln_id = t01.appln_id
Where
  c.EU27sort > 0 And
  t07.INVT_SEQ_NR > 0
And year(t01.appln_filing_date) >=2000;
               


where table COUNTRIES contains following information (some records are shown up):

Countrycode
 countryname
 ISO3166
 GEOAREA
 EU27sort
'AD'
 'ANDORRA'
 'AD'
 ''

'AE'
 'UNITED ARAB EMIRATES'
 'AE'
 ''

'AF'
 'AFGHANISTAN'
 'AF'
 ''

'AG'
 'ANTIGUA AND BARBUDA'
 'AG'
 ''

'AI'
 'ANGUILLA'
 'AI'
 ''

'AL'
 'ALBANIA'
 'AL'
 ''

'AM'
 'ARMENIA'
 'AM'
 ''

'AO'
 'ANGOLA'
 'AO'
 ''

'AQ'
 'ANTARCTICA'
 'AQ'
 ''

'AR'
 'ARGENTINA'
 'AR'
 ''

'AS'
 'AMERICAN SAMOA'
 'AS'
 ''

'AT'
 'AUSTRIA'
 'AT'
 'EU27'
1
'AU'
 'AUSTRALIA'
 'AU'
 ''

'AW'
 'ARUBA'
 'AW'
 ''

'AX'
 'Ã…LAND ISLANDS'
 'AX'
 ''

'AZ'
 'AZERBAIJAN'
 'AZ'
 ''

'BA'
 'BOSNIA AND HERZEGOVINA'
 'BA'
 ''

'BB'
 'BARBADOS'
 'BB'
 ''

'BD'
 'BANGLADESH'
 'BD'
 ''

'BE'
 'BELGIUM'
 'BE'
 'EU27'
2


























Thus we can link it to patstat data using ISO3166 field as application authority, filtering where EU27sort is > 0 (also selecting geoarea EU27 would work as well).

We may also need is the number of applications among EU27 that have no inventor at all; such records will be appended to nationalcounts table with CYDATA = “V” (void) and weight = 1

Q2: EU27 applications with no inventor

Insert into nationalcounts
Select distinct
  c.EU27sort,
  c.Countrycode,
  t01.APPLN_AUTH,
  t01.APPLN_KIND,
  t01.APPLN_ID,
0 as person_id,
“V”  as CYDATA,
 1 as weight
From
  patstat.tls201_appln t01 left Join
  (select * from patstat.tls207_pers_appln
   where  INVT_SEQ_NR > 0 ) as t07
  On t01.APPLN_ID = t07.APPLN_ID  Inner Join
  rdk.countries c On c.ISO3166 = t01.APPLN_AUTH
Where
  c.EU27sort > 0
and year(t01.appln_filing_date) >=2000
and t07.APPLN_ID is null;


just to check our population has been chosen correctly we compare results of two queries:

Q3: count of applications in EU27 buy appln auth, filing year >= 2000 source - patstat

Select
t01.APPLN_AUTH, count(distinct t01.APPLN_ID)
From
  patstat.tls201_appln t01 Inner Join
  rdk.countries c On c.ISO3166 = t01.APPLN_AUTH
Where
  c.EU27sort > 0
And year(t01.appln_filing_date) >=2000
Group by t01.appln_auth;
Q4: count of applications in EU27 buy appln auth, filing year >= 2000 source – table created

Select
  a.APPLN_AUTH,
  Sum(a.weight) As total
From
Nationalcounts as a
Group By
  a.APPLN_AUTH;

Thus we get these results:

patstat
nationalcounts
% error
'AT'
263598
263548.5942
0.02%
'BE'
284520
284461.5883
0.02%
'BG'
7589
7587.6561
0.02%
'CY'
490
489.9922
0.00%
'CZ'
30678
30664.7612
0.04%
'DE'
2066227
2066157.965
0.00%
'DK'
119126
119114.213
0.01%
'EE'
3079
3078.5031
0.02%
'ES'
182663
182651.6436
0.01%
'FI'
86397
86396.5538
0.00%
'FR'
306000
305990.4801
0.00%
'GB'
412625
412295.7409
0.08%
'GR'
10524
10523.6455
0.00%
'HU'
25328
25320.2408
0.03%
'IE'
8303
8302.9573
0.00%
'IT'
170324
170215.7062
0.06%
'LT'
1177
1176.7488
0.02%
'LU'
11581
11580.9929
0.00%
'LV'
1755
1754.9937
0.00%
'MT'
18
18
0.00%
'NL'
225667
225664.0008
0.00%
'PL'
54386
54385.7814
0.00%
'PT'
28764
28758.0775
0.02%
'RO'
5040
5039.7973
0.00%
'SE'
755366
755343.1424
0.00%
'SI'
12502
12501.7939
0.00%
'SK'
7564
7558.2712
0.08%

Where all errors are <<0,1% and mainly due to roundings;

Eventually in next post we will develop some analisys of data.

No comments:

Post a Comment