Wednesday, November 22, 2017

Using ORCID for migration studies

ORCID provides a persistent digital identifier that distinguishes every researcher and, through integration in key research workflows such as manuscript and grant submission, supports automated linkages between professional activities, ensuring that one's work is recognized.

ORCID wasn't intended as a massive longitudinal survey of the global population of scientists, but with 3 million profiles and growing, it is becoming just that. So far a quarter of those researchers have voluntarily added personal information to their public ORCID profiles, including the years, locations, and descriptions of their education and employment histories. As this voluntary sample grows, the demographic and migration patterns of the scientific workforce are coming into focus. The biases are also apparent: ORCID users skew young, and certain countries are over- and underrepresented.

J. Bohannon and K. Doran prepared a set of tools and datasets that make these data easier to use:

A file contains 2.8 million public profiles from ORCID in both XML and JSON format. You will need 300 GB of free space to decompress the data and work with it.

An IPython Notebook provides code for processing the data from public_profiles.tar into manageable data files for analysis.

Other examples of the use of such data in migration studies:

Friday, November 17, 2017

alternative coordinate systems

Geohash is a public domain geocoding system invented by Gustavo Niemeyer; it encodes a geographic location into a short string of letters and digits. It is hierarchical, so codes can have arbitrary precision, and characters can be removed from the end of a code to shorten it (gradually losing precision). As a consequence of this gradual loss of precision, nearby places will often (but not always) share a prefix. The longer the shared prefix, the closer the two places are.
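To make the prefix property concrete, here is a minimal sketch of a Geohash encoder in Python. It follows the commonly described scheme (alternately bisecting the longitude and latitude ranges and packing every 5 bits into one base-32 character); the function name geohash_encode and its precision parameter are my own choices for illustration, not part of any library.

```python
# Geohash base-32 alphabet (digits plus letters, without a, i, l, o)
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=8):
    """Encode a latitude/longitude pair into a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0        # bits accumulated for the current character
    n_bits = 0      # how many bits are in `bits`
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        n_bits += 1
        if n_bits == 5:  # 5 bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            n_bits = 0
    return "".join(chars)


short = geohash_encode(57.64911, 10.40744, precision=5)
longer = geohash_encode(57.64911, 10.40744, precision=8)
print(short)                     # u4pru
print(longer.startswith(short))  # True: truncating a code only loses precision
```

Because each extra character only refines the same cell, a shorter code is always a prefix of the longer code for the same point.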

query example:,-121.7&format=url&redirect=0

What3words is a geocoding system that encodes geographic coordinates into three dictionary words with a resolution of three metres. For example, the torch of the Statue of Liberty is located at "toned.melt.ship".
The idea behind it is that a triple of words is easier to remember than the long sequence of digits used in the usual lat-long coordinate system.
What3words has a website, apps for iOS and Android, and an API that enables bidirectional conversion between what3words addresses and latitude/longitude coordinates.
The grid is two-dimensional, so the addressing scheme does not distinguish between floors in a building. The system supports 14 languages, and each language version covers the world's entire land area.

This website lets you test the conversion between lat-long and w3w.

Wednesday, October 25, 2017

How to calculate the number of renewals in PATSTAT

One piece of information worth collecting as a proxy for patent life or patent value is the number of renewals (yearly fees paid) after grant.

Table TLS231 in PATSTAT contains the renewal information under the event code PGFP.

Unfortunately, this table does not contain all renewals, only the last one for each country.
Below is a procedure that calculates the maximum number of renewals (as years elapsed from grant to last renewal), the grant year, and the number of countries in which the patent was renewed at the last valid renewal.

The starting point is a table applnidlist with a field appln_id listing the applications we want to enrich (the table should be indexed on appln_id).

-- grant year of each application
alter table applnidlist add column grantyear int(4) default null;

update applnidlist a
  INNER JOIN patstat.tls211_pat_publn t11 ON a.appln_id = t11.APPLN_ID
SET a.grantyear = Year(t11.PUBLN_DATE)
-- assumption: the filter on the grant publication is not in the post as it survives;
-- publn_first_grant is 1 in this PATSTAT edition (later editions use 'Y')
WHERE t11.publn_first_grant = 1;

-- last renewal data: highest renewal year paid for each application
drop table if exists t1;

create table t1 as
SELECT
  a.appln_id,
  Max(b.fee_renewal_year) AS Max_fee_renewal_year
FROM
  tazza.applnidlist a
  INNER JOIN patstat.tls231_inpadoc_legal_event b ON a.appln_id = b.appln_id
WHERE
  b.event_code = 'PGFP'
GROUP BY
  a.appln_id;

alter table applnidlist add column nrenew int(4) default null;
alter table applnidlist add column nrenewcy int(4) default null;
alter table applnidlist add column maxrenyear int(4) default null;
alter table t1 add index i1(appln_id);

update applnidlist a inner join t1 b
on a.appln_id=b.appln_id set maxrenyear=Max_fee_renewal_year;

alter table applnidlist add index i6(maxrenyear);

-- years from grant to last renewal, and number of countries in that renewal
update applnidlist a
INNER JOIN
(SELECT b.appln_id, b.fee_renewal_year,
  Count(b.fee_country) AS Count_fee_country,
  Year(b.fee_payment_date) AS fee_payment_date
FROM patstat.tls231_inpadoc_legal_event b
WHERE b.event_code = 'PGFP'
GROUP BY Year(b.fee_payment_date), b.fee_renewal_year, b.appln_id) z
  ON a.appln_id = z.appln_id AND a.maxrenyear = z.fee_renewal_year
SET
  a.nrenew = z.fee_payment_date - a.grantyear,
  a.nrenewcy = z.Count_fee_country;
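
For readers who want to check the logic on a handful of rows before running it on PATSTAT, the same calculation can be sketched in Python on toy data. This is only an illustration under assumptions: the function renewal_summary and the tuple layout (fee_country, fee_renewal_year, payment_year) are hypothetical. Unlike the SQL, it takes the latest payment year among the last renewals and counts distinct countries.

```python
def renewal_summary(grant_year, events):
    """Summarise the PGFP events of one application.

    events: list of (fee_country, fee_renewal_year, payment_year) tuples
    (hypothetical layout, one tuple per renewal fee payment).
    """
    if not events:
        return None
    # highest renewal year paid in any country
    max_ren_year = max(renewal_year for _, renewal_year, _ in events)
    # keep only the payments belonging to that last renewal
    last = [e for e in events if e[1] == max_ren_year]
    last_payment_year = max(payment_year for _, _, payment_year in last)
    return {
        "maxrenyear": max_ren_year,
        "nrenew": last_payment_year - grant_year,
        "nrenewcy": len({country for country, _, _ in last}),
    }


# toy example: patent granted in 2005, last renewed (10th year) in DE and FR
events = [("DE", 10, 2012), ("FR", 10, 2012), ("GB", 8, 2010)]
print(renewal_summary(2005, events))
# {'maxrenyear': 10, 'nrenew': 7, 'nrenewcy': 2}
```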

Wednesday, October 11, 2017

MySQL upload scripts for EEE-PPAT 2017a

The EEE-PPAT table is an extension of the PERSON table (TLS206) produced by ECOOM (Catholic University of Leuven) and Eurostat. The extension concerns sector allocation and name harmonization of applicants.

The 2017a version contains 54430027 records and, compared to the PATSTAT edition, it has improved standardized names and sector allocation.

Here is how the sector allocation changes (note that sector '' is for inventors/individuals with no allocation):



It can be requested at no cost by contacting

The MySQL script I created for uploading the table can be downloaded here.

Friday, September 15, 2017

PatentsView MySQL upload scripts

In the past few days a new version of the PatentsView database has been released.

The PatentsView initiative was established in 2012 and is a collaboration between USPTO, the US Department of Agriculture (USDA), the Center for the Science of Science and Innovation Policy, New York University, the University of California at Berkeley, Twin Arch Technologies, and Periscopic.
The PatentsView platform is built on a newly developed database that longitudinally links inventors, their organizations, locations, and overall patenting activity. The platform uses data derived from USPTO bulk data files. These data are provided for research purposes and do not constitute the official USPTO record.

From this link you can download my MySQL scripts to upload the new data.

Tuesday, August 22, 2017

2017 PatentsView Workshop on Engaging User Communities

The US Patent and Trademark Office’s Chief Economist has opened invitations to the 2017 PatentsView Workshop on Engaging User Communities.

The workshop is open to the public and will be held on Friday October 6, from 8:30am – 12:30pm on the USPTO campus in Alexandria, VA.

At the workshop, USPTO will officially launch the PatentsView Community Site, updated data visualization with export functionality, and new data fields that can be accessed across PatentsView tools. This year’s updates are based on feedback gathered from the user community at the 2016 PatentsView workshop. The new Community Site includes a moderated forum for user inquiries and a Data in Action page for sharing analyses, visualizations, and publications.

The goals of the workshop are:

(1) to launch the new Community Site and Data Visualization features;

(2) to present newly parsed and available patent data fields; and

(3) to gather feedback from patent data and analytics user communities in order to set priorities for future PatentsView open data products.  

More information here.

The agenda is still being prepared and will be published in full in the coming weeks.

Thursday, July 27, 2017

patent citations: a summary

In patent data, a citation is a reference to previous work that is relevant to the current patent application. Citations can be provided by the applicant, the patent examiner or sometimes third parties (typically during the opposition phase).
The list of relevant citations for a patent is included in the search report.
These are the documents the examiner looks at closely to establish the novelty of the invention.
The cited document may be a patent or a non-patent literature (NPL) publication (e.g. a journal article).
Documents listed in the search report are also called backward citations, since they were made public before the given patent.
Later applications that cite the given patent are called forward citations.
Thus, if Patent Z contains in its search report a reference to Patent A, A is a backward citation for Z and Z is a forward citation for A.
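The backward/forward relationship is just the same set of links read in two directions. A minimal sketch in Python, assuming a toy dictionary that maps each citing patent to the documents in its search report (the function name forward_citations and the sample data are illustrative only):

```python
from collections import defaultdict


def forward_citations(backward):
    """Invert a {citing: [cited, ...]} mapping into {cited: [citing, ...]}."""
    forward = defaultdict(set)
    for citing, cited_docs in backward.items():
        for cited in cited_docs:
            forward[cited].add(citing)
    # sorted lists make the output deterministic
    return {doc: sorted(citers) for doc, citers in forward.items()}


# Z cites A and B in its search report, Y cites A
backward = {"Z": ["A", "B"], "Y": ["A"]}
forward = forward_citations(backward)
print(forward["A"])  # ['Y', 'Z']
print(forward["B"])  # ['Z']
```

Counting the entries of each list gives the number of forward citations received by a patent.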
Other information that can be retrieved from citations:
Citation origin: the phase of the patent’s life in which the reference was introduced.

APP citations introduced by the applicant
SEA citations introduced during search (from Search Report)
ISR citations from the International Search Report
SUP citations from the Supplementary Search Report
PRS "PRe-Search" citations (available before official publication; only for US applications)
EXA citations introduced during examination
OPP citations introduced during opposition (citations by opponent published with a European Patent Specification (EP-B2))
APL citations introduced when filed for appeal by applicant / proprietor / patentee
FOP citations introduced when filed opposition by any third party after the publication of a European Patent Specification (EP-B1)
TPO citations introduced because of Third Party Observations (Art 115 EPC)
CH2 citations introduced during the Chapter 2 phase of the PCT

Citation origin often suffers from an institutional bias: at offices such as the USPTO, the applicant is required by the duty of candor to file all known relevant prior art.

Application Category: this flag shows how relevant the citation is with respect to the novelty of the invention or its claims.

X   Particularly relevant if taken alone.
Y   Particularly relevant if combined with another document of the same category.
A   Defining the state of the art and not prejudicing novelty or inventive step.
O   Non-written disclosure.
P   Intermediate document.
T   Theory or principle underlying the invention.
E   Earlier patent application, but published after the filing date of the application searched (potentially conflicting patent documents).
D   Document cited in the application.
L   Document cited for other reasons.