Thursday, July 18, 2019

Patent citations to PubMed scientific publications

A recently released dataset from Matt Marx and Aaron Fuegi contains citations from USPTO patents granted 1947-2018 to articles captured by the Microsoft Academic Graph (MAG) from 1800-2018.

The tab-separated files are available at link :

The main file, pcs.tsv, contains the resolved citations: the patent number, the MAG ID, the original citation from the patent, an indicator of whether the citation was supplied by the applicant, the examiner, or an unknown source, and a confidence score (1-10) indicating how likely the match is to be correct.
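As a minimal sketch of how such a file might be filtered by confidence score, the snippet below reads tab-separated rows and keeps only the high-confidence matches. The column names (patent_number, mag_id, source, confidence), the threshold of 8, and the sample rows are all assumptions made for illustration; check the real header of pcs.tsv before adapting it.

```python
import csv
import io

# Made-up sample rows. The column names are hypothetical and are NOT taken
# from the real pcs.tsv header.
SAMPLE_TSV = (
    "patent_number\tmag_id\tsource\tconfidence\n"
    "US-1\t1001\tapplicant\t10\n"
    "US-1\t1002\texaminer\t4\n"
    "US-2\t1003\tunknown\t8\n"
)


def read_confident_citations(handle, min_confidence=8):
    """Return the rows whose confidence score is at least min_confidence."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if int(row["confidence"]) >= min_confidence]


# With a real file, replace io.StringIO(...) with open(path, newline="").
rows = read_confident_citations(io.StringIO(SAMPLE_TSV))
for row in rows:
    print(row["patent_number"], row["mag_id"], row["source"], row["confidence"])
```

Reading row by row, rather than loading everything at once, keeps memory use low if the file is large.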

There is also a PubMed-specific match in pcs-pubmed.tsv.

The authors also made available the source code for generating the patent citations to science in pcs.tsv, as well as the source code for generating the Journal Impact Factor and Journal Commercial Impact Factor.

The scripts and programs are mainly for Stata and Linux (DO files and sh scripts).


Monday, July 8, 2019

PATSTAT - PatentsView concordance update 2019

PatentsView is a platform built on data derived from USPTO bulk data files.

This dataset complements PATSTAT well: PatentsView has native disambiguation of inventors and applicants and a geocoding system applied to both, while PATSTAT links US data to other patent offices, making it possible to calculate knowledge spillovers, family data, etc.

At this link it is possible to download a concordance table between patent_id (the PatentsView main key) and appln_id (from PATSTAT).

The overlap between the two datasets is not perfect, since PatentsView contains only patents granted after 1975, whereas PATSTAT also includes (ungranted) applications and covers pre-1975 data.
On the other hand, PATSTAT is missing design patents before 2001, plant patents before 2001, and patents of the 'statutory invention registration' type.
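A concordance like this is typically used as a lookup from patent_id to appln_id. The sketch below attaches appln_id to PatentsView-style rows and sets aside the rows with no PATSTAT counterpart. Only the key names patent_id and appln_id come from the text; the tab-separated layout, the inventor_city column, and all values are made up for illustration.

```python
import csv
import io

# Made-up rows. The file layout and values are assumptions; only the key
# names patent_id and appln_id come from the text.
CONCORDANCE_TSV = (
    "patent_id\tappln_id\n"
    "7000001\t500\n"
    "7000002\t501\n"
)
PATENTSVIEW_TSV = (
    "patent_id\tinventor_city\n"
    "7000001\tBoston\n"
    "7000002\tAustin\n"
    "H0001234\tDenver\n"
)


def load_concordance(handle):
    """Build a patent_id -> appln_id lookup from the concordance table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["patent_id"]: row["appln_id"] for row in reader}


def attach_appln_id(handle, concordance):
    """Split rows into those with a PATSTAT appln_id and those without."""
    matched, unmatched = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        appln_id = concordance.get(row["patent_id"])
        if appln_id is None:
            unmatched.append(row)  # no PATSTAT counterpart in the concordance
        else:
            matched.append({**row, "appln_id": appln_id})
    return matched, unmatched


concordance = load_concordance(io.StringIO(CONCORDANCE_TSV))
matched, unmatched = attach_appln_id(io.StringIO(PATENTSVIEW_TSV), concordance)
print(len(matched), "matched;", len(unmatched), "without a PATSTAT appln_id")
```

Keeping the unmatched rows, rather than silently dropping them, makes it easy to check how much of the gap comes from the patent types that PATSTAT does not cover.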

The data are from PATSTAT Spring 2019 and PatentsView March 2019, so some of the 2019 data in PatentsView are also missing from PATSTAT.