Showing posts with label applicants. Show all posts

Monday, September 30, 2019

Patent applicants: how to create the full time series

I'm sharing a presentation I gave at the EPO & KUL summer school in Vienna last September!
I hope it's useful to someone else.

Patent applicant data change over time;
Main reasons for change are ownership transfers, name/address changes, M&A …
Applicant names contained in TLS206 are the ‘last available’ data;
PATSTAT Global + EP Register provide several sources to build a chain of names
and a timeline for the patents they contain;
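A minimal sketch of how such a chain could be turned into a timeline, assuming the name events for one application have already been extracted (for example the first-publication applicants followed by EP Register assignment records) into hypothetical (effective_date, names) pairs. The structure and the sample names are illustrative, not PATSTAT columns.

```python
from datetime import date
from typing import List, Optional, Tuple

# Hypothetical event: (effective date, applicant names valid from that date).
Event = Tuple[date, Tuple[str, ...]]
# Output interval: (valid_from, valid_to or None if still current, names).
Interval = Tuple[date, Optional[date], Tuple[str, ...]]


def build_name_timeline(events: List[Event]) -> List[Interval]:
    """Turn dated applicant-name events into consecutive validity intervals."""
    ordered = sorted(events, key=lambda e: e[0])
    timeline: List[Interval] = []
    for i, (start, names) in enumerate(ordered):
        # An interval ends where the next event starts; the last one stays open.
        end = ordered[i + 1][0] if i + 1 < len(ordered) else None
        if timeline and timeline[-1][2] == names:
            # Same names as before: extend the previous interval instead.
            prev_start, _, prev_names = timeline[-1]
            timeline[-1] = (prev_start, end, prev_names)
            continue
        timeline.append((start, end, names))
    return timeline


if __name__ == "__main__":
    events = [
        (date(2020, 5, 1), ("NEWCO AG",)),
        (date(2015, 3, 10), ("OLDCO GMBH",)),
        (date(2018, 1, 1), ("OLDCO GMBH",)),  # event that changes nothing
    ]
    for interval in build_name_timeline(events):
        print(interval)
```

Merging consecutive identical name sets keeps the series compact when a source repeats the same owner, e.g. after an address-only change.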




Wednesday, September 30, 2015

About institution and company identifiers

Disentangling the unique identity of company/institution entities has always been a hard task when dealing with bibliometric and patent data.

Some attempts have been made, for example by Thomson Reuters in WOS, which introduced a field named ORGANIZATION ENHANCED: searching for a preferred organization name returns all records that contain the preferred name and all records that contain its name variants, based on the self-reported name variants available in the WOS database.
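The idea behind such a field can be illustrated as a lookup from a preferred name to its known variants, so that one query expands into all of them. This is only a conceptual sketch with a hypothetical, hand-written variant table and record list; it is not how WOS implements the feature.

```python
from typing import Dict, List, Set

# Hypothetical variant table: preferred name -> self-reported name variants.
VARIANTS: Dict[str, Set[str]] = {
    "KU Leuven": {"KATHOLIEKE UNIV LEUVEN", "KUL", "CATHOLIC UNIV LEUVEN"},
}


def search_preferred(preferred: str, records: List[dict]) -> List[dict]:
    """Return records whose affiliation is the preferred name or one of its variants."""
    names = {preferred.upper()} | {v.upper() for v in VARIANTS.get(preferred, set())}
    # Case-insensitive exact match against the expanded name set.
    return [r for r in records if r["affiliation"].upper() in names]


if __name__ == "__main__":
    records = [
        {"id": 1, "affiliation": "KU Leuven"},
        {"id": 2, "affiliation": "Katholieke Univ Leuven"},
        {"id": 3, "affiliation": "Univ Gent"},
    ]
    print([r["id"] for r in search_preferred("KU Leuven", records)])  # [1, 2]
```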

In PATSTAT, harmonized names from KU Leuven have recently been added to the person tables, and many attempts have been made at matching ORBIS and PATSTAT names.

Two more data sources that will become more relevant, also because they are included in ORCID, are the ISNI and Ringgold identifiers.

ISNI is an ISO standard, in use by numerous libraries, publishers, databases, and rights management organizations around the world. As an open standard, ISNI is not a proprietary "walled garden" - it is diffused widely on the open web, and is a critical component in Linked Data and Semantic Web applications.

The ISNI is not intended to provide direct access to comprehensive information about a "Public Identity" - instead, it is designed to act as a 'bridge identifier' to link systems where comprehensive information is held, such as Ringgold’s Identify Database.

The latter is built upon ISNI and also adds other information, such as hierarchy and institution data.

Coming back to ORCID: users may include employment and education affiliations in their ORCID Records. Affiliations must include an institution's name and address but should also include a unique identifier to disambiguate the institution. Currently, ORCID uses Ringgold organization identifiers to generate a pick-list of organizations.


Tuesday, February 4, 2014

Missing addresses in PATSTAT edition Oct. 2013

Aside from the lack of data in the IPC table already announced by the EPO (see here), there is another issue, this time involving address data.

The check was run against EPO data (thanks to Xiaoyan Song from ECOOM for raising the issue):

Select
  patstat.tls207_pers_appln.PERSON_ID,
  patstat.tls207_pers_appln.APPLN_ID,
  patstat.tls207_pers_appln.APPLT_SEQ_NR,
  patstat.tls207_pers_appln.INVT_SEQ_NR,
  patstat.tls211_pat_publn.PUBLN_AUTH,
  patstat.tls206_person.PERSON_NAME,
  patstat.tls206_person.PERSON_ADDRESS,
  patstat.tls206_person.DOC_STD_NAME_ID
From
  patstat.tls206_person Inner Join
  patstat.tls207_pers_appln On patstat.tls207_pers_appln.PERSON_ID =
    patstat.tls206_person.PERSON_ID Inner Join
  patstat.tls211_pat_publn On patstat.tls211_pat_publn.APPLN_ID =
    patstat.tls207_pers_appln.APPLN_ID
Where
  patstat.tls211_pat_publn.PUBLN_AUTH = 'EP' And
  -- empty-string literal needs single quotes ("" is an identifier in standard SQL)
  patstat.tls206_person.PERSON_ADDRESS = ''
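To try the query logic without a full PATSTAT install, here is a small sketch using Python's built-in sqlite3 with toy versions of the three tables, reduced to the columns involved (the patstat. schema prefix is dropped and all rows are invented). It counts distinct application/person pairs, because an application with several EP publications (e.g. A1 and B1) would otherwise be counted once per publication; whether the figures in this post were computed with DISTINCT is an assumption.

```python
import sqlite3

# Toy schema: only the columns needed for the check.
SCHEMA = """
CREATE TABLE tls206_person (PERSON_ID INTEGER, PERSON_NAME TEXT, PERSON_ADDRESS TEXT);
CREATE TABLE tls207_pers_appln (PERSON_ID INTEGER, APPLN_ID INTEGER,
                                APPLT_SEQ_NR INTEGER, INVT_SEQ_NR INTEGER);
CREATE TABLE tls211_pat_publn (APPLN_ID INTEGER, PUBLN_AUTH TEXT, PUBLN_KIND TEXT);
"""

# Distinct application/person pairs on EP publications with an empty address.
QUERY = """
SELECT COUNT(*) FROM (
  SELECT DISTINCT pa.APPLN_ID, pa.PERSON_ID
  FROM tls206_person p
  JOIN tls207_pers_appln pa ON pa.PERSON_ID = p.PERSON_ID
  JOIN tls211_pat_publn pub ON pub.APPLN_ID = pa.APPLN_ID
  WHERE pub.PUBLN_AUTH = 'EP' AND p.PERSON_ADDRESS = ''
)
"""


def build_toy_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO tls206_person VALUES (?, ?, ?)",
                     [(1, "ACME", ""), (2, "BETA", "1 Main St")])
    conn.executemany("INSERT INTO tls207_pers_appln VALUES (?, ?, ?, ?)",
                     [(1, 100, 1, 0), (2, 101, 1, 0)])
    # Application 100 has two EP publications (A1 and B1).
    conn.executemany("INSERT INTO tls211_pat_publn VALUES (?, ?, ?)",
                     [(100, "EP", "A1"), (100, "EP", "B1"), (101, "EP", "A1")])
    return conn


def count_missing_addresses(conn: sqlite3.Connection) -> int:
    return conn.execute(QUERY).fetchone()[0]


if __name__ == "__main__":
    print(count_missing_addresses(build_toy_db()))  # 1
```

Without DISTINCT the same toy data would give 2, one row per EP publication of application 100.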

We find that 292110 application/person pairs have no address, out of a total of 16.866.000 (about 1,7%).

This means that many persons who had an address in previous editions have now lost it, in some cases because the application now points to a different person_id.

For instance, in the Oct. 2012 edition appln_id 57746196 had as applicant person_id 4473356, Canon Kabushiki Kaisha, with address 30-2 Shimomaruko 3-chome, Ohta-ku, Tokyo.

Now appln_id 57746196 has as applicant person_id 5216598, which is still Canon Kabushiki Kaisha but has no address.
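Cases like this one can be spotted by comparing two editions application by application: for each appln_id, look for an applicant name that had an address in the old edition and has none in the new one, whatever its person_id. A minimal sketch, assuming each edition has been exported as hypothetical (appln_id, person_id, name, address) tuples; matching on the exact name is a simplification.

```python
from typing import Iterable, List, Tuple

Row = Tuple[int, int, str, str]  # (appln_id, person_id, name, address)


def lost_addresses(old: Iterable[Row], new: Iterable[Row]) -> List[Tuple[int, int, int]]:
    """Return (appln_id, old_person_id, new_person_id) where the address disappeared."""
    # Index the old edition by (application, person name).
    old_index = {(a, name): (pid, addr) for a, pid, name, addr in old}
    lost = []
    for appln_id, new_pid, name, addr in new:
        match = old_index.get((appln_id, name))
        if match is None:
            continue
        old_pid, old_addr = match
        # Flag only if the old record had an address and the new one has none.
        if old_addr.strip() and not addr.strip():
            lost.append((appln_id, old_pid, new_pid))
    return lost
```

With the Canon example: old = [(57746196, 4473356, "Canon Kabushiki Kaisha", "30-2 Shimomaruko 3-chome, Ohta-ku, Tokyo")] and new = [(57746196, 5216598, "Canon Kabushiki Kaisha", "")] give [(57746196, 4473356, 5216598)].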

In total, 188107 person_ids suffer from this issue.

Some of them, counting by appln_id, are big players:


person_id   count(appln_id)
263         7208
5355192     1547
5238080     1333
5235181     1014
5217811     612
5818520     459
5213673     386
6156118     344
5262071     288
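The ranking above is a count of distinct applications per affected person_id. A minimal sketch of that tally in plain Python, assuming the (appln_id, person_id) pairs with an empty address have already been extracted (e.g. with the query above); the sample pairs in the usage note are invented.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def top_persons(pairs: Iterable[Tuple[int, int]], n: int = 10) -> List[Tuple[int, int]]:
    """Rank person_ids by number of distinct appln_ids, like count(appln_id)."""
    apps: Dict[int, Set[int]] = defaultdict(set)
    for appln_id, person_id in pairs:
        apps[person_id].add(appln_id)  # the set ignores repeated pairs
    # Most applications first; ties broken by person_id.
    ranked = sorted(apps.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(pid, len(a)) for pid, a in ranked[:n]]
```

For example, top_persons([(1, 263), (2, 263), (2, 263), (3, 5355192)]) returns [(263, 2), (5355192, 1)].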

Person_id 263 (empty name and address) has in fact also been added in many cases where previous versions of PATSTAT reported no applicant/inventor for a given application.

Monday, September 23, 2013

UK Business Structure Database (BSD)

The Business Structure Database (BSD) is a dataset made available free of charge for research purposes, which contains a number of variables for almost all business organisations in the UK. The BSD is derived primarily from the Inter-Departmental Business Register (IDBR), which is a live register of data collected by HM Revenue and Customs via VAT and Pay As You Earn (PAYE) records. The IDBR data are complemented with data from ONS business surveys. The available time frame is 1997-2011.

The following variables are available for enterprises and local units:

    employment (and employees)
    turnover
    Standard Industrial Classification (1992, 2003 and 2007 classifications are available)
    legal status (e.g. sole proprietor, partnership, public corporation, non-profit organisation etc)
    foreign ownership
    birth (company start date)
    death (termination date of trading)

'Employment' includes business owners, whereas 'employees' measures the number of staff, excluding owners.

The full list of variables is available here:

http://www.esds.ac.uk/doc/6697/mrdoc/excel/variables_in_idbr_1997_2005_with_ent_code_generator.xls

Registration is required and standard conditions of use apply.

http://discover.ukdataservice.ac.uk/catalogue?sn=6697

Friday, May 24, 2013

About persons' country code in PATSTAT

One of the most important pieces of information about inventors and applicants is the country code.

This code is copied from the 'standard' DOCDB table and added to the 'bypass' data, matching on the application identifier (authority, number and kind code) and on the inventor or applicant sequence number.
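The matching described above amounts to a join on a composite key. A minimal sketch, assuming hypothetical dict records for the DOCDB side and the 'bypass' side; the field names (auth, number, kind, role, seq_nr, ctry_code) are illustrative, not the actual PATSTAT or DOCDB column names.

```python
from typing import Dict, List, Optional, Tuple

# (authority, number, kind code, role, sequence number); role is "APPLT" or "INVT".
Key = Tuple[str, str, str, str, int]


def fill_country_codes(docdb: List[dict], bypass: List[dict]) -> List[dict]:
    """Copy ctry_code from DOCDB persons onto bypass persons with the same key."""

    def key(r: dict) -> Key:
        return (r["auth"], r["number"], r["kind"], r["role"], r["seq_nr"])

    codes: Dict[Key, Optional[str]] = {key(r): r.get("ctry_code") for r in docdb}
    # Records with no DOCDB counterpart get None as country code.
    return [{**r, "ctry_code": codes.get(key(r))} for r in bypass]
```

Keeping the role in the key prevents inventor 1 and applicant 1 of the same application from being mixed up.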

In the October 2012 edition, only 50% of country codes are present. The EPO promises to fill in the missing codes in future versions.

Note that the EPO does not receive the Country Code value with the Japanese data which is loaded into DOCDB; for this reason there are no PERSON_CTRY_CODEs in PATSTAT for Japanese documents.



In any case, the EPO states that 'country code does not necessarily indicate the "Nationality" of inventor or applicant'.


Looking into the TLS206_ASCII table, a separate table for inventor and applicant data, we also find two pieces of information that are not provided in the standard data: nationality and residence.

Unfortunately, out of 41.478.407 persons in the Oct. 2012 edition, only 3.248.663 (about 7,8%) have a non-blank nationality or residence (most of them from the USPTO as application authority).
Even worse, only 350 of the records with a non-blank nationality or residence show a value different from PERSON_CTRY_CODE, so that piece of information adds almost nothing beyond the country code.
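Both figures can be reproduced with a single pass over the TLS206_ASCII records. A minimal sketch, assuming records are available as dicts and that NATIONALITY and RESIDENCE are the column names (an assumption); "different" is taken to mean that at least one non-blank value differs from PERSON_CTRY_CODE, since the exact rule behind the 350 figure is not stated.

```python
from typing import Iterable, Tuple


def nationality_stats(persons: Iterable[dict]) -> Tuple[int, int, int]:
    """Return (total, with nationality/residence, differing from PERSON_CTRY_CODE)."""
    total = with_info = differing = 0
    for p in persons:
        total += 1
        ctry = (p.get("PERSON_CTRY_CODE") or "").strip()
        values = [(p.get(k) or "").strip() for k in ("NATIONALITY", "RESIDENCE")]
        extra = [v for v in values if v]  # keep only non-blank values
        if extra:
            with_info += 1
            if any(v != ctry for v in extra):
                differing += 1
    return total, with_info, differing
```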

Tuesday, January 31, 2012

Amended version of TLS206 ascii released by EPO

As announced in a previous post, the version of TLS206 ascii distributed with PATSTAT was corrupted; the EPO has fixed this issue, and the amended version can be downloaded from this web page:

https://publication.epo.org/raw-data/product?productId=1 

What applicant name is in Espacenet?

Espacenet is the online database from the EPO that provides free access to more than 70 million patent documents worldwide, containing information about inventions and technical developments from 1836 to today.

Among the data it contains, it is very interesting to understand which applicant name this online resource displays.

Let's take an example: US PATENT 6646858

The web page lists as applicants:

Applicant(s): FILTEC GMBH [DE] +

And by pressing the + you will also get the names:

DINGENOTTO MEINOLF, ; KUHLE JORG, ; FILTEC FILTERTECHNOLOGIE FUER DIE ELECTRONIKINDUSTRIE GMBH

Let's see what these names are.

The first one, shown before pressing the +, is the standard name (as from table TLS208 in patstat) taken at the moment of publication of the document.

By pressing the + key we get the applicant names as they are written in DOCDB (or in TLS206 in PATSTAT).

Why do we see more names after pressing + than the one shown at first? Because (as for most US patents) between first publication and grant there was a change in legal status that introduced the real owner and removed the inventors from the applicant list.


PRS Date : 2003/09/22
  PRS Code : AS
  Code Expl.:   ASSIGNMENT
     NEW OWNER : FILTEC FILTERTECHNOLOGIE FUR DIE ELEKTRONIKINDUSTR
     EFFECTIVE DATE : 20021120

We can confirm this by checking the first publication of the patent, US2003090856, which shows as applicants:

Applicant(s): DINGENOTTO MEINOLF, ; KUHLE JORG, ; FILTEC FILTERTECHNOLOGIE FUER DIE ELECTRONIKINDUSTRIE GMBH

So to sum up what we discovered:

Espacenet shows the standard name (standardized by EPO) of applicants at the moment of publication of document.
Pressing + we get the original names as written in the document.
Names displayed are not sensitive to legal status changes like change of ownership or correction of names.

Tuesday, January 10, 2012

EEE-PPAT: released version Oct. 2011



The EEE-PPAT table, produced by ECOOM (Catholic University of Leuven) and Eurostat, is an extension of the PATSTAT PERSON TABLE. The extension concerns sector allocation and name harmonization of applicants.


It can be requested at no cost by contacting technoinfo@ecoom.be; some documentation is available at this link.


It contains the following columns (PERSON_ID is the original PATSTAT person_id):
        - PERSON_ID
        - HRM_LEVEL1: harmonized name level1
        - SECTOR : Sector of assignee name 


The file is encoded in UTF-8 and tab-delimited.
The table definition is as follows:
        - PERSON_ID     number(9)
        - HRM_LEVEL1    char(400)
        - SECTOR        char(50)
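A minimal sketch for reading the file in Python, assuming exactly three tab-separated columns per line in the order above and no header row (both assumptions worth checking on the actual file). csv.QUOTE_NONE keeps quote characters as plain text, and the csv module gives backslashes no special meaning unless an escapechar is set. Rows with an unexpected column count or an empty SECTOR are set aside for inspection instead of being loaded.

```python
import csv
import io
from typing import List, TextIO, Tuple

Row = Tuple[int, str, str]  # (PERSON_ID, HRM_LEVEL1, SECTOR)


def read_eee_ppat(stream: TextIO) -> Tuple[List[Row], List[List[str]]]:
    """Split a tab-delimited EEE-PPAT stream into good rows and suspicious ones."""
    good: List[Row] = []
    bad: List[List[str]] = []
    for fields in csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(fields) != 3 or not fields[2].strip():
            bad.append(fields)
            continue
        good.append((int(fields[0]), fields[1], fields[2]))
    return good, bad


if __name__ == "__main__":
    # Toy data; third line: name ending in a backslash and an empty SECTOR.
    sample = "1\tACME CORP\tCOMPANY\n2\tJOHN SMITH\tINDIVIDUAL\n3\tODD NAME\\\t\n"
    # For the real file: open(path, encoding="utf-8", newline="")
    rows, suspicious = read_eee_ppat(io.StringIO(sample))
    print(rows)
    print(suspicious)
```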


The file contains 12488647 records.


The 'compression rate' is good: EEE-PPAT reduces the 10.324.068 distinct applicant names contained in TLS206 to 8.227.328 (about 20% fewer), and the sector allocation is distributed as follows:


SECTOR                      COUNT
COMPANY                     2173055
COMPANY GOV HOSPITAL        1
COMPANY GOV NON-PROFIT      35307
COMPANY GOV UNIVERSITY      176
COMPANY HOSPITAL            1601
COMPANY UNIVERSITY          1544
GOV NON-PR0FIT              4
GOV NON-PROFIT              105750
GOV NON-PROFIT UNIVERSITY   669
HOSPITAL                    5028
INDIVIDUAL                  4774071
UNIVERSITY                  45506
UNKNOWN                     1252219

[The lines below are cancelled, since the data were corrected with an update on the FTP on 2012 Jan. 10, 13:51 CET.]


(some look like small mistakes, such as GOV NON-PR0FIT)



While loading the data you may get an error with person_ids 4264883, 9883343 and 8758108, which end up with a null sector: their harmonized names end with a slash, which breaks the recognition of the tab field delimiter, so in these three records the text COMPANY is 'taken' into the harmonized name.


At this link you can download a script for loading the EEE-PPAT table into MySQL; it also includes a patch for the 3 wrong records.
[do not use the patch with the 2012 Jan. 10, 13:51 CET data]


Wednesday, March 9, 2011

How to get missing country codes from homonyms among PATSTAT applicants

In the October 2010 PATSTAT edition, table TLS206 contains 37.428.107 distinct person_ids (applicants or inventors). We would expect (or hope) them to have some geographic data apart from the name, but, as stated in some previous posts, many of them lack any information apart from the name, which makes improving data quality a little harder.
Exactly 13.032.871 persons (about 35% of the total) have no country code (and, obviously, in most cases no city, address etc.).

With this post I start a series about how to find clusters where it is possible to improve the quality of country data.

In the first case I'll investigate applicants with no country code.

Let's take FI patent AP273, invented by BRUCE HOWARD DIXON [US] and applied for by HOWARD DIXON BRUCE (with no country).
If we look at the PDF of the patent, we see that Bruce, as applicant, is listed as living in Florida, US, while as inventor he is listed as "SEE ABOVE". So we may presume that many applicants with no country, when they have a homonym in the same application, can inherit the country code from the homonymous inventor.

We must, however, remove possible doubles, like this pair of US patents, A & B, where the same applicant (with the same person_id), Abate Riccardo, appears as inventor first with country US and later with IT.
So when creating the correction table we must drop persons that would receive more than one country code.

Applying this procedure in a simple way (i.e. no name standardization, just exact string matching), 472.668 persons with no country code (3,7% of the missing ones) can be assigned a country code.
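The procedure can be sketched as follows, assuming hypothetical (appln_id, person_id, name, ctry_code, role) tuples where role is "APPLT" or "INVT". As in the post, names are compared by exact string match (a reordered name such as BRUCE HOWARD DIXON vs HOWARD DIXON BRUCE would need some normalization first), and persons collecting more than one candidate country are dropped.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# (appln_id, person_id, name, ctry_code, role); role is "APPLT" or "INVT".
Person = Tuple[int, int, str, str, str]


def inherit_country(rows: Iterable[Person]) -> Dict[int, str]:
    """Map applicant person_id -> country code taken from a homonymous inventor."""
    data: List[Person] = list(rows)
    # Inventor countries per (application, exact name).
    inv_ctry: Dict[Tuple[int, str], Set[str]] = defaultdict(set)
    for appln_id, _, name, ctry, role in data:
        if role == "INVT" and ctry:
            inv_ctry[(appln_id, name)].add(ctry)
    # Candidate countries for applicants with no country code.
    candidates: Dict[int, Set[str]] = defaultdict(set)
    for appln_id, pid, name, ctry, role in data:
        if role == "APPLT" and not ctry:
            candidates[pid] |= inv_ctry.get((appln_id, name), set())
    # Keep only persons with exactly one candidate country (drop doubles).
    return {pid: next(iter(c)) for pid, c in candidates.items() if len(c) == 1}
```

Working per person_id rather than per application is what removes cases like the Abate Riccardo one, where the same person would get US from one patent and IT from another.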

Thursday, October 28, 2010

USPTO persons data quality

In a previous post I analyzed the geographic data quality of inventors in PATSTAT, across all the data it contains.
The results for the USPTO were not so good:


APPLN_AUTH  inventors  no state  no zip  no country  no address  no city
US          5960856    86%       98%     21%         97%         25%

In order to understand the trend across years (whether data quality is improving or noise is spread all over the data), I selected the applicants/inventors of type “A” applications (patents for invention) filed after 31/12/1999;
the result was that about 1% have no country and about 2% have no city (so for them it may be impossible to assign a region/NUTS2 code).



distinct appl/inv   2774408
ctry code blank     29858    1,08%
city blank          56119    2,02%
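A minimal sketch of the selection and of the two percentages, assuming hypothetical dict records (appln_kind, filing_date, ctry_code, city, not the actual PATSTAT column names), one per distinct applicant/inventor. Each share is blanks divided by the number of selected persons, e.g. 29858 / 2774408 ≈ 1,08%.

```python
from datetime import date
from typing import Iterable, Tuple

CUTOFF = date(1999, 12, 31)  # keep filings strictly after this date


def blank_shares(persons: Iterable[dict]) -> Tuple[int, float, float]:
    """Return (selected, % blank country, % blank city) for type A filings after CUTOFF."""
    selected = blank_ctry = blank_city = 0
    for p in persons:
        if p["appln_kind"] != "A" or p["filing_date"] <= CUTOFF:
            continue
        selected += 1
        if not p["ctry_code"].strip():
            blank_ctry += 1
        if not p["city"].strip():
            blank_city += 1
    if selected == 0:
        return 0, 0.0, 0.0
    return selected, 100 * blank_ctry / selected, 100 * blank_city / selected
```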

If we check, by inventor/applicant country, whether some nation shows a bias (only for countries with more than 100 inventors/applicants), we see that the countries with the highest shares of blank cities (above 10%: BB, BM, CR, KY, VG) are outside Europe, so they are not covered by regional classifications like NUTS2.


CTRY_CODE COUNT EMPTY_CITY %_EMPTY_CITY
'' 29858 29071 97,40%
'AE' 166 2 1,20%
'AR' 1265 21 1,70%
'AT' 10330 117 1,10%
'AU' 22688 499 2,20%
'BB' 181 21 11,60%
'BE' 12980 230 1,80%
'BG' 428 5 1,20%
'BM' 249 39 15,70%
'BR' 4137 54 1,30%
'BS' 137 3 2,20%
'BY' 163 2 1,20%
'CA' 70324 800 1,10%
'CH' 23982 489 2,00%
'CL' 558 14 2,50%
'CN' 29490 180 0,60%
'CO' 251 5 2,00%
'CR' 148 20 13,50%
'CU' 547 7 1,30%
'CY' 156 5 3,20%
'CZ' 1297 34 2,60%
'DE' 178725 2477 1,40%
'DK' 10973 204 1,90%
'EE' 225 0 0,00%
'EG' 228 0 0,00%
'ES' 10369 151 1,50%
'FI' 13552 123 0,90%
'FR' 67931 881 1,30%
'GB' 76473 1480 1,90%
'GR' 801 14 1,70%
'HK' 5289 128 2,40%
'HR' 378 6 1,60%
'HU' 2316 14 0,60%
'ID' 219 8 3,70%
'IE' 5163 60 1,20%
'IL' 25108 273 1,10%
'IN' 18600 189 1,00%
'IR' 236 5 2,10%
'IS' 414 9 2,20%
'IT' 31398 516 1,60%
'JP' 479365 3346 0,70%
'KR' 115483 978 0,80%
'KW' 162 0 0,00%
'KY' 255 39 15,30%
'LI' 376 21 5,60%
'LT' 149 2 1,30%
'LU' 815 55 6,70%
'LV' 115 0 0,00%
'MC' 113 6 5,30%
'MX' 2341 25 1,10%
'MY' 2666 24 0,90%
'NL' 28215 972 3,40%
'NO' 5912 106 1,80%
'NZ' 3875 141 3,60%
'PH' 742 3 0,40%
'PK' 135 0 0,00%
'PL' 1139 5 0,40%
'PT' 735 8 1,10%
'RO' 328 0 0,00%
'RU' 6956 40 0,60%
'SA' 516 3 0,60%
'SE' 23806 343 1,40%
'SG' 8106 98 1,20%
'SI' 567 4 0,70%
'SK' 277 0 0,00%
'TH' 797 18 2,30%
'TR' 618 3 0,50%
'TW' 113480 939 0,80%
'UA' 844 1 0,10%
'US' 1311956 10561 0,80%
'VE' 456 3 0,70%
'VG' 430 48 11,20%
'YU' 103 0 0,00%
'ZA' 2572 68 2,60%
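The per-country breakdown above can be sketched as a group-by with a minimum-size filter. A minimal sketch, assuming (ctry_code, city) pairs for the selected persons; the threshold of more than 100 persons is the one used in the post, while the reference average used for flagging is left as a parameter (for instance the overall 2,02% blank-city share).

```python
from collections import Counter
from typing import Iterable, List, Tuple

Stat = Tuple[str, int, int, float]  # (ctry_code, count, empty_city, % empty city)


def empty_city_by_country(pairs: Iterable[Tuple[str, str]], min_count: int = 100) -> List[Stat]:
    """Per-country empty-city share for countries with more than min_count persons."""
    totals: Counter = Counter()
    empty: Counter = Counter()
    for ctry, city in pairs:
        totals[ctry] += 1
        if not city.strip():
            empty[ctry] += 1
    return [
        (c, n, empty[c], round(100 * empty[c] / n, 1))
        for c, n in sorted(totals.items())
        if n > min_count
    ]


def above_average(rows: List[Stat], average_pct: float) -> List[str]:
    """Country codes whose empty-city share exceeds the given average."""
    return [c for c, _, _, pct in rows if pct > average_pct]
```

For instance, 249 BM persons of which 39 have no city give ("BM", 249, 39, 15.7), matching the BM row of the table.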