Wednesday, October 17, 2018

PATSTAT autumn 2018 MySQL upload scripts

At this link you can download a batch of MySQL scripts that let you upload the new PATSTAT autumn 2018 edition.

This release includes several changes:

* Tables TLS201_APPLN and TLS211: the attribute granted changed from 0/1 to Y/N.

* Table TLS212_CITATION: Euro-PCT applications did not have the citations from the international search report linked to the respective application (and publication). These are the so-called A0 publications. To fix this, the EPO duplicated the citations from the international search report and linked them to the respective EP publications.

* Table TLS803_LEGAL_EVENT_CODE: the table has been redesigned to match WIPO ST.27.
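
Code written against earlier editions may still compare granted with 0/1. Here is a minimal Python sketch of a helper that accepts both encodings; the name normalize_granted is my own invention for illustration and is not part of PATSTAT or of the upload scripts.

```python
def normalize_granted(value):
    """Map a 'granted' flag in either encoding (0/1 or Y/N) to a bool."""
    text = str(value).strip().upper()
    if text in ("1", "Y"):
        return True
    if text in ("0", "N"):
        return False
    # Anything else is unexpected, so fail loudly instead of guessing.
    raise ValueError(f"unexpected granted value: {value!r}")
```

This way the same downstream code can read rows loaded from an older edition and from autumn 2018.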

Tuesday, October 16, 2018

PATSTAT projects on github

Refilling PATSTAT addresses

This project contains a Docker container with Python and MySQL to refill person records whose addresses are missing.

Classify Legal Entities And Individuals From Patent Applicants

A batch of MySQL scripts to discriminate the type of applicant

Add official name of patent office

Building descriptions for the International Patent Classification

An API embedded into a VM to get the full description of IPC codes

PATSTAT loader

Python library and associated code for preparing PATSTAT inventor-patent data for disambiguation with either the Torvik-Smalheiser or Open City Dedupe algorithms.

fuzzygeo provides a fuzzy geocoding routine for geocoding at the named entity (city or similar) level

A simple supervised learning algorithm to classify PATSTAT records into two categories:
  • person names
  • not person names


Friday, September 21, 2018

Google dataset search

Google recently launched a new service that aims to index local, public and national data repositories: Google Dataset Search.

Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher's site, a digital library, or an author's personal web page.

Google also developed guidelines for dataset providers to describe their data in a way that search engines can better understand the content of their pages.

The approach is based on an open standard for describing this information, and anybody who publishes data can describe their dataset this way.

Where possible, the engine also links each dataset to the Google Scholar articles that use it.

Full story @ link

Wednesday, September 19, 2018

How to build indicators from PATSTAT step by step

A presentation I prepared on how to build a patent-based indicator (patent originality) step by step from EPO PATSTAT while avoiding the most common pitfalls. It includes commented SQL code.
I hope you will find it useful.
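
The presentation itself is not reproduced here. As a rough illustration only, this Python sketch assumes the commonly used definition of originality: one minus the sum of squared shares of the technology classes of the patents a given patent cites. The function name and its input list are hypothetical; in practice the list of classes would come from a query over the citation and classification tables.

```python
from collections import Counter


def originality(cited_classes):
    """Return 1 - sum(share_j ** 2) over the classes of the cited patents.

    cited_classes holds one technology class per backward citation.
    Returns None when there are no citations, since the index is undefined.
    """
    if not cited_classes:
        return None
    counts = Counter(cited_classes)
    total = sum(counts.values())
    return 1.0 - sum((n / total) ** 2 for n in counts.values())
```

A patent that cites only one class scores 0, and the score rises as citations spread over more classes. How to treat patents without backward citations is one of the choices to make explicit; here they are simply returned as None.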

Thursday, September 13, 2018

MySQL upload scripts for EEE-PPAT 2018a

The EEE-PPAT table is an extension of the person table (TLS206) produced by ECOOM (Catholic University of Leuven) and Eurostat. The extension concerns sector allocation and name harmonization of applicants.

The 2018a version contains 56,611,335 records and improves the standard names and sector allocation compared to the PATSTAT edition.

It can be requested at no cost by contacting

The MySQL script I created for uploading the table can be downloaded here

Thursday, August 30, 2018

Online regular expression validators

A quick note on two useful online tools for testing and validating regular expressions.

The first one is

that allows tests in multiple dialects (PHP, JavaScript, Python...) and gives a verbose but useful explanation of the steps.

Last but not least, it lets you share links to prepared tests, like this one:

Also useful is:

that comes with the comforting subtitle
You thought you only had two problems…
and allows creating diagrams of how a given regexp works.
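
Once a pattern looks right in one of these tools, it is worth repeating the check locally. Below is a minimal Python sketch using the standard re module in verbose mode, which keeps the explanation next to each part of the pattern. The pattern is a made-up example (a two-letter office code followed by 6 to 12 digits) and is not an official number format.

```python
import re

# Hypothetical pattern, for illustration only.
APPLN_PATTERN = re.compile(
    r"""
    ^(?P<office>[A-Z]{2})   # two uppercase letters, e.g. EP
    (?P<number>\d{6,12})$   # then 6 to 12 digits and nothing else
    """,
    re.VERBOSE,
)


def check(samples):
    """Return a dict mapping each sample to its named groups, or None if no match."""
    results = {}
    for sample in samples:
        match = APPLN_PATTERN.match(sample)
        results[sample] = match.groupdict() if match else None
    return results
```

Calling check(["EP12345678", "ep123"]) returns the named groups for the first string and None for the second, which is the same pass/fail view the online tools give.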


Monday, June 18, 2018

Linked Open EP data

At URL
a linked open data version of EP patent data is now available.

Linked Open EP data is a data product provided by the EPO. It contains EP publications with their bibliographic and family information.
It also contains some basic information on non-EP patents that are related to EP patents, e.g. because they are a priority of an EP document or they are in the same family as an EP application.

The product also comes with a simple application programming interface (API), allowing you to consult reference data, explore that data and try out ideas on a small scale.
A SPARQL interface enables you to analyse the data.

Linked open EP data uses Uniform Resource Identifiers (URIs) to identify patent applications, publications and other resources present in patent data. This allows data in one dataset to be linked to data in another dataset. Given its URI, data about a resource can be retrieved in a variety of formats over the web. For occasional use there is a simple data browser, an API and a query interface. For heavier use, bulk data is available for download.

Each application has a unique identifier which looks like a URL and has this structure:

cc stands for the Office code.
nnnnnnnn stands for the application number.