Rawpatentdata: inventors


Sunday, June 7, 2015

PatentsView Inventor Disambiguation Technical Workshop

On behalf of the US Patent & Trademark Office, the American Institutes for Research (AIR) is hosting an inventor disambiguation technical workshop. USPTO is seeking creative new approaches to get better information on innovators and the new technologies they develop by disambiguating inventor names.

AIR invites individual researchers or research teams to develop inventor disambiguation algorithms using US patent data. The top fifteen teams will be invited to present their results at the final workshop, which will be held at USPTO headquarters September 23 and 24. The researcher or team that contributes the most effective algorithm will receive a $25,000 stipend.

July 1st is the deadline for prospective participants to submit a 1-page “intent to participate” document. The document should include any requests to incorporate additional data, as well as any software or hardware requirements. All teams whose proposals the judges’ panel deems reasonable will be invited to participate.

Additional information is posted at www.dev.patentsview.org/workshop, together with the training datasets.

Wednesday, March 11, 2015

Refilling patent inventors country in patstat

When producing statistics by country of inventor with Patstat, we always have to face the missing-data issue that I already highlighted in some previous posts (for example here http://rawpatentdata.blogspot.it/2013/05/about-persons-country-code-in-patstat.html). A way to refill the missing country codes is described in:

de Rassenfosse, G., Dernis, H., Guellec, D., Picci, L. & van Pottelsberghe de la Potterie, B., 2013. "The worldwide count of priority patents: A new indicator of inventive activity". Research Policy 42(3), 720-737.

The indicator proposed in this paper counts priority patent applications filed by inventors from a given country, regardless of the patent office of application, eliminating the geographic bias (but at the cost of introducing an institutional bias).

This methodology therefore needs to fill the wide gaps in the Patstat country assignment in table TLS206.

REFILLING of ctry codes:

The algorithm first selects all the priority filings of a given patent office in a given year. Then, for each filing that has missing information on the inventor’s country of residence, the algorithm looks, in order, into six potential sources of information, with the country of the priority office as a last resort (Source 7):

• Source 1: the priority document itself, when the information is available.

• Source 2: Retrieves information on inventors from the earliest direct equivalent in which the information is available.

• Source 3: If no information is available in the direct equivalents, the other second filings of the same family are browsed.

• Source 4: the country of residence of the applicant, as indicated in the priority document, is used to proxy the country of the inventor.

• Source 5: If the country of the applicant is missing, it is searched for in the direct equivalents.

• Source 6: If no information on the applicant’s country was found, it is tracked in all the other second filings of the same family.

• Source 7: Finally, if the information is still missing, the country of the priority office is used for the country of residence of the inventor.
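The cascade above can be read as "take the first source that is not missing". The following is only a minimal sketch of that idea, not the MySQL code from the paper: the function name `refill_country`, the `SOURCES` labels and the toy input are assumptions made for illustration, and the lookups themselves (finding direct equivalents, browsing the family) are assumed to have been done already.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical labels for the lookups listed above, in priority order.
SOURCES = (
    "inventor country, priority document",
    "inventor country, earliest direct equivalent",
    "inventor country, other second filings of the family",
    "applicant country, priority document",
    "applicant country, direct equivalents",
    "applicant country, other second filings of the family",
    "country of the priority office",
)


def refill_country(candidates: Sequence[Optional[str]]) -> Tuple[Optional[str], Optional[int]]:
    """Return (country, source_number) for the first non-missing candidate.

    `candidates` holds one value per source, in the order of SOURCES;
    None or a blank string means the information is missing.
    """
    for number, value in enumerate(candidates, start=1):
        if value is not None and value.strip():
            return value.strip().upper(), number
    return None, None


# Toy filing: no inventor country anywhere, applicant country found in a direct equivalent.
country, source = refill_country([None, "", None, None, "de", None, "JP"])
print(country, source, SOURCES[source - 1])
# prints: DE 5 applicant country, direct equivalents
```

Keeping the source number next to the refilled code makes it possible to check afterwards how many countries come from the weaker proxies (applicant or priority office).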

MySQL code and explanation can be downloaded from:

http://gder.phpnet.org/rassenfosse/paper_The_worldwide_count_of_priority_patents.html

Monday, November 24, 2014

Using patstat in universities evaluation procedures

This work shows a methodology used to match PATSTAT inventor names to a full list of researchers working in Italian universities.
The goal is to have higher recall, leaving institutions/researchers to validate the data.
The focus will not be on results (the evaluation is still in progress) but on data processing, selection and the match algorithm, highlighting some difficulties and the related workarounds.
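To give an idea of what a recall-oriented match can look like, here is a minimal sketch. It is not the procedure used in the work, which is not detailed in this post: the functions `name_key` and `candidate_matches` and the sample names are hypothetical. The idea is to compare names on a loose key (no accents, no case, no token order) and keep every candidate for later validation.

```python
import unicodedata
from collections import defaultdict


def name_key(full_name: str) -> str:
    """Loose key: accent-free, lower-case, order-independent name tokens."""
    decomposed = unicodedata.normalize("NFKD", full_name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = "".join(ch if ch.isalnum() else " " for ch in no_accents.lower())
    return " ".join(sorted(cleaned.split()))


def candidate_matches(inventors, researchers):
    """Map each inventor name to all researchers sharing the same loose key."""
    by_key = defaultdict(list)
    for researcher in researchers:
        by_key[name_key(researcher)].append(researcher)
    return {inventor: by_key.get(name_key(inventor), []) for inventor in inventors}


# Made-up names, only to show the behaviour of the key.
inventors = ["ROSSI, Niccolò", "Bianchi Maria"]
researchers = ["Niccolo Rossi", "Maria Bianchi", "Mario Bianchi"]
print(candidate_matches(inventors, researchers))
# prints: {'ROSSI, Niccolò': ['Niccolo Rossi'], 'Bianchi Maria': ['Maria Bianchi']}
```

A key this loose will also pair homonyms, which is acceptable only because the candidates are meant to be validated by institutions and researchers.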

Monday, October 13, 2014

Differences of inventors within the same docdb family (part II)

(Continues from previous post)

As previously stated, 95% of docdb families contain applications with the same number of inventors;
obviously, some of the inventors may still change between applications. We must therefore check how much the person_ids vary within the same docdb family.

Here are the results, expressed as the ratio between the distinct person_ids of a family and the average number of inventors per application (SQL code appended at the end of this post):

range	count	%
<1	4620	0,01%
1	28708521	84,99%
1-2	1928802	5,71%
2-5	2447120	7,24%
5-20	637558	1,89%
>20	52859	0,16%

Two results are interesting:
1) almost 85% of families share the same inventors (in terms of person_ids): if we also consider that some person_ids inside a family may refer to the same entity with only a different spelling (due to different data origins), this again supports our hypothesis;
2) we have 4620 odd families with more inventors than person_ids (but this may be explained either by duplications (see the applicant issue) or by duplications in TLS207).

SQL CODE for computing the person_ids / number of inventors ratio:

-- Step 1: average number of inventors per application in each docdb family
drop table if exists t1;

create table t1
select a.DOCDB_FAMILY_ID, avg(a.ninv) as avginv
from
  (select t18.DOCDB_FAMILY_ID, t18.APPLN_ID, max(t7.invt_seq_nr) as ninv
   from patstat.tls218_docdb_fam t18
   inner join patstat.tls207_pers_appln t7 on t18.APPLN_ID = t7.APPLN_ID
   where t7.invt_seq_nr > 0   -- inventors only
   group by t18.DOCDB_FAMILY_ID, t18.APPLN_ID) as a
group by a.DOCDB_FAMILY_ID;

alter table t1 add index i1(DOCDB_FAMILY_ID);

-- Step 2: distinct person_ids per family divided by avginv, truncated to one decimal
select floor((b.totpers/c.avginv)*10)/10 as rate, count(c.DOCDB_FAMILY_ID) as cc
from t1 as c
inner join
  (select t18.DOCDB_FAMILY_ID, count(distinct t7.person_id) as totpers
   from patstat.tls218_docdb_fam t18
   inner join patstat.tls207_pers_appln t7 on t18.APPLN_ID = t7.APPLN_ID
   where t7.invt_seq_nr > 0   -- inventors only
   group by t18.DOCDB_FAMILY_ID) as b
on c.DOCDB_FAMILY_ID = b.DOCDB_FAMILY_ID
group by floor((b.totpers/c.avginv)*10)/10;
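To make the ratio easier to read, here is a small worked example on made-up rows. It is a sketch that mirrors the logic of the query above in Python; the function name `family_rates` and all ids are invented for illustration. Family 1 reuses the same person_ids in both applications (rate 1), while family 2 has the same number of inventors but new person_ids in the second application (rate 2).

```python
import math
from collections import defaultdict

# Toy rows (docdb_family_id, appln_id, person_id, invt_seq_nr); all values are made up.
rows = [
    (1, 10, 100, 1), (1, 10, 101, 2),   # family 1, first application
    (1, 11, 100, 1), (1, 11, 101, 2),   # same person_ids reused
    (2, 20, 200, 1), (2, 20, 201, 2),   # family 2, first application
    (2, 21, 300, 1), (2, 21, 301, 2),   # same number of inventors, new person_ids
]


def family_rates(rows):
    """Distinct person_ids / average inventors per application, truncated to one decimal."""
    max_seq = defaultdict(int)    # (family, application) -> highest invt_seq_nr
    persons = defaultdict(set)    # family -> distinct person_ids
    for family, appln, person, seq in rows:
        if seq > 0:  # keep inventors only, as in the WHERE clause
            max_seq[(family, appln)] = max(max_seq[(family, appln)], seq)
            persons[family].add(person)
    per_family = defaultdict(list)
    for (family, _appln), ninv in max_seq.items():
        per_family[family].append(ninv)
    rates = {}
    for family, counts in per_family.items():
        avginv = sum(counts) / len(counts)
        rates[family] = math.floor(len(persons[family]) / avginv * 10) / 10
    return rates


print(family_rates(rows))
# prints: {1: 1.0, 2: 2.0}
```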

Thursday, October 9, 2014

Differences of inventors within the same docdb family (part I)

Creating the full list of inventors who participated in an innovation is not a trivial task.
This is especially true if by innovation we mean not a single application but a patent family: appending all the person_ids of all the applications belonging to the family would surely lead to undetected duplication of names (e.g. due to different spellings or addresses across application authorities).
One option is therefore to take only the inventors related to one application (e.g. the oldest one, or the one where data are more likely to be complete, for instance an EPO application).
In this case, however, we may get an incomplete recall of inventors if one or more inventors change, are amended or are added across the different applications.

One way to validate this idea is to count the difference between the minimum and maximum number of inventors across the applications within a family. This can support the claim that in most cases the list of inventors remains the same.
The count is shown below: over 95% of docdb families have the same number of inventors in all applications.

delta	n families	%
0	36.048.365	95,523%
1	859.567	2,278%
2	413.670	1,096%
3	206.235	0,546%
4	101.529	0,269%
5	48.545	0,129%
6	25.400	0,067%
7	13.372	0,035%
8	7.775	0,021%
9	4.432	0,012%
10	2.972	0,008%
11	1.697	0,004%
12	1.122	0,003%
13	836	0,002%
14	580	0,002%
15 or more	1.661	0,004%

The largest difference within a family (98 inventors) is for family_id 39324928, which contains 74 distinct patent applications: patent WO2008051495 has 98 inventors, while JP2010520959 counts 0 inventors.
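The query behind the delta table is not included in this post. The sketch below shows one way such a count could be written, run here on an in-memory SQLite database with made-up rows so that it can be tried without a Patstat installation. Two assumptions are worth stating: the number of inventors per application is taken as the highest invt_seq_nr, and a left join is used so that applications without any inventor row count as 0 inventors (as for JP2010520959 above).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Toy copies of the two Patstat tables; all ids are made up.
conn.executescript("""
CREATE TABLE tls218_docdb_fam (appln_id INTEGER, docdb_family_id INTEGER);
CREATE TABLE tls207_pers_appln (person_id INTEGER, appln_id INTEGER, invt_seq_nr INTEGER);
INSERT INTO tls218_docdb_fam VALUES (10, 1), (11, 1), (20, 2), (21, 2), (30, 3);
INSERT INTO tls207_pers_appln VALUES
    (100, 10, 1), (101, 10, 2),
    (100, 11, 1), (101, 11, 2),
    (200, 20, 1), (201, 20, 2), (202, 20, 3),
    (900, 21, 0),
    (300, 30, 1);
""")

DELTA_QUERY = """
SELECT delta, COUNT(*) AS n_families
FROM (
    SELECT docdb_family_id, MAX(ninv) - MIN(ninv) AS delta
    FROM (
        -- inventors per application; 0 when no inventor row exists
        SELECT f.docdb_family_id, f.appln_id,
               COALESCE(MAX(p.invt_seq_nr), 0) AS ninv
        FROM tls218_docdb_fam AS f
        LEFT JOIN tls207_pers_appln AS p
               ON p.appln_id = f.appln_id AND p.invt_seq_nr > 0
        GROUP BY f.docdb_family_id, f.appln_id
    ) AS per_appln
    GROUP BY docdb_family_id
) AS per_family
GROUP BY delta
ORDER BY delta
"""

delta_counts = conn.execute(DELTA_QUERY).fetchall()
print(delta_counts)
# prints: [(0, 2), (3, 1)]
```

In the toy data, families 1 and 3 have the same number of inventors in every application (delta 0), while family 2 has one application with three inventors and one with only an applicant row (delta 3).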