Rawpatentdata: 2018

Wednesday, October 17, 2018

PATSTAT autumn 2018 MySQL upload scripts

At this link you can download a batch of MySQL scripts for uploading the new PATSTAT Autumn 2018 edition.

This release includes some improvements:

* Tables TLS201_APPLN and TLS211: the granted attribute changed from 0/1 to Y/N.

* Table TLS212_CITATION: Euro-PCT applications (the so-called A0 publications) did not have the citations from the international search report linked to the respective application (and publication). To fix this, the EPO duplicated the citations from the international search report and linked them to the respective EP publications.

* Table TLS803_LEGAL_EVENT_CODE has been redesigned to match WIPO Standard ST.27.
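Code written for earlier editions that filters on granted = 1 will no longer select the intended rows now that the attribute holds Y/N. A minimal Python sketch that maps both encodings to a boolean, assuming older editions store 0/1 (as integers or strings) and the new one stores 'Y'/'N'; the function name is hypothetical.

```python
def normalize_granted(value):
    """Return True/False for a granted value from either PATSTAT encoding.

    Assumption: earlier editions use 0/1 (int or str), Autumn 2018 uses 'Y'/'N'.
    """
    if isinstance(value, str):
        value = value.strip().upper()
        if value in ("Y", "1"):
            return True
        if value in ("N", "0"):
            return False
    elif value in (0, 1):
        return bool(value)
    # Anything else (None, 'X', 2...) is treated as a data error, not silently mapped.
    raise ValueError(f"unexpected granted value: {value!r}")
```

In SQL terms, a filter such as granted = 1 written for an earlier edition becomes granted = 'Y' for this one.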

Tuesday, October 16, 2018

PATSTAT projects on GitHub

Refilling PATSTAT addresses

This project contains a Docker container (Python and MySQL) to fill in addresses for persons whose address is missing.

https://github.com/cortext/patstat/tree/master/parsed%20addresses

Classify Legal Entities And Individuals From Patent Applicants

A batch of MySQL scripts to distinguish the type of applicant (legal entity or individual).

https://github.com/cortext/patstat/tree/master/applicants%20classification
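The cortext scripts are linked above and are not reproduced here. To illustrate the simplest form of the idea, a hypothetical Python sketch flags an applicant as a legal entity when its name contains a legal-form token (INC, GMBH, SPA...). The token list is an assumption and far from exhaustive; real classifiers need country-specific rules and exceptions.

```python
import re

# Hypothetical, incomplete list of legal-form and institution tokens.
LEGAL_FORM_TOKENS = {
    "INC", "LTD", "LLC", "GMBH", "AG", "SA", "SPA", "SRL", "BV", "NV",
    "CORP", "CORPORATION", "COMPANY", "CO", "KK", "AB", "OY", "PLC",
    "UNIVERSITY", "UNIV", "INSTITUTE", "FOUNDATION",
}

def classify_applicant(name):
    """Return 'legal_entity' if the name contains a legal-form token, else 'individual'."""
    # Drop punctuation so "S.p.A." becomes "SPA" and "Co.," becomes "CO".
    cleaned = re.sub(r"[^\w\s]", "", name.upper())
    tokens = set(cleaned.split())
    return "legal_entity" if tokens & LEGAL_FORM_TOKENS else "individual"
```

For example, classify_applicant("Fiat S.p.A.") gives 'legal_entity', while classify_applicant("Rossi, Mario") gives 'individual'.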

Add official name of patent office

https://github.com/cortext/patstat/tree/master/nomenclatures/offices_classification

Building descriptions for the International Patent Classification

An API embedded into a VM to get the full description of IPC codes

https://github.com/cortext/patstat/tree/master/nomenclatures/ipc_descriptions
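Full descriptions are typically assembled by walking up the IPC hierarchy (section, class, subclass, main group, subgroup). A minimal Python sketch, not the cortext code, that splits a full IPC symbol such as those stored in PATSTAT (for example "A01B   1/00") into its levels; the helper names are hypothetical.

```python
import re
from typing import NamedTuple

class IpcSymbol(NamedTuple):
    section: str     # one letter, A-H
    ipc_class: str   # two digits
    subclass: str    # one letter
    main_group: str  # 1-4 digits
    subgroup: str    # 2+ digits

# Whitespace between subclass and group varies (PATSTAT pads it), so allow any.
IPC_RE = re.compile(r"^([A-H])(\d{2})([A-Z])\s*(\d{1,4})/(\d{2,})$")

def parse_ipc(symbol):
    m = IPC_RE.match(symbol.strip().upper())
    if m is None:
        raise ValueError(f"not a full IPC symbol: {symbol!r}")
    return IpcSymbol(*m.groups())

def hierarchy(symbol):
    """Return the codes from section down to the full group, e.g. for description lookups."""
    s = parse_ipc(symbol)
    subclass = s.section + s.ipc_class + s.subclass
    return [
        s.section,
        s.section + s.ipc_class,
        subclass,
        f"{subclass} {s.main_group}/00",           # main group
        f"{subclass} {s.main_group}/{s.subgroup}",  # full symbol
    ]
```

For example, hierarchy("H04L 29/06") returns ["H", "H04", "H04L", "H04L 29/00", "H04L 29/06"], and each element can be looked up in turn to build the concatenated description.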

PATSTAT loader

https://github.com/simonemainardi/load_patstat

psClean

Python library and associated code for preparing PATSTAT inventor-patent data for disambiguation with either the Torvik-Smalheiser or Open City Dedupe algorithms.

https://github.com/markhuberty/psClean

fuzzygeo

fuzzygeo provides a fuzzy geocoding routine for geocoding at the named entity (city or similar) level.

https://github.com/markhuberty/fuzzygeo
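The general idea behind fuzzy geocoding at city level is to match a messy city string against a reference list of place names. A stdlib sketch of that idea using difflib; this is not fuzzygeo's algorithm, and the gazetteer and cutoff value are assumptions for illustration.

```python
import difflib
import unicodedata

def normalize(name):
    # Strip accents, case and extra spaces so "München" and "MUNCHEN" compare equal.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = decomposed.encode("ascii", "ignore").decode("ascii")
    return " ".join(ascii_only.upper().split())

def fuzzy_city_match(raw_city, gazetteer, cutoff=0.8):
    """Return the gazetteer entry closest to raw_city, or None below the cutoff."""
    lookup = {normalize(c): c for c in gazetteer}
    hits = difflib.get_close_matches(normalize(raw_city), list(lookup), n=1, cutoff=cutoff)
    return lookup[hits[0]] if hits else None

# Hypothetical reference list; a real one would come from a geographic database.
GAZETTEER = ["München", "Milano", "Paris", "Eindhoven"]
```

The matched name can then be joined to coordinates from the same reference source; raising the cutoff trades recall for fewer wrong matches.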

psClassify

A simple supervised learning algorithm to classify PATSTAT records into two categories:

* person names
* not person names

https://github.com/mkln/psClassify

Friday, September 21, 2018

Google dataset search

Google recently launched a new service aiming to index local, public and national data repositories: Google Dataset Search.

Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher's site, a digital library, or an author's personal web page.

Google has also published guidelines for dataset providers on how to describe their data so that search engines can better understand the content of their pages.

The approach is based on an open standard for describing this information (schema.org) and anybody who publishes data can describe their dataset this way.
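For illustration, a minimal Python sketch that builds such a description as JSON-LD, one of the formats schema.org markup can be published in, ready to embed in a script tag of type application/ld+json. The function name and all field values are placeholders, not taken from Google's guidelines; check those guidelines for the properties Dataset Search actually requires.

```python
import json

def dataset_jsonld(name, description, url, license_url, keywords):
    """Build a schema.org Dataset description as a JSON-LD string."""
    return json.dumps({
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
        "keywords": keywords,
    }, indent=2)

# Placeholder values for a hypothetical dataset page.
example = dataset_jsonld(
    name="Example patent indicators",
    description="Hypothetical yearly patent counts by office.",
    url="https://example.org/datasets/patent-indicators",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    keywords=["patents", "PATSTAT"],
)
```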

Where possible, the engine also links each dataset to the Google Scholar articles that use it.

Full story at this link:
https://www.blog.google/products/search/making-it-easier-discover-datasets/

Wednesday, September 19, 2018

How to build indicators from PATSTAT step by step

A presentation I prepared on how to build a patent-based indicator (patent originality) step by step from EPO PATSTAT, avoiding the most common pitfalls, including commented SQL code.
I hope you will find it useful.

Patstat indicators step by step from Gianluca Tarasconi
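The presentation has the full SQL. As a compact reference for the formula only: originality is commonly computed, following Trajtenberg, Henderson and Jaffe, as 1 minus the sum of squared shares of a patent's backward citations falling in each technology class. A minimal Python sketch, assuming the class of each cited patent has already been extracted (for example one IPC subclass per citation); the presentation may use a different class level, citation window or bias correction.

```python
from collections import Counter

def originality(cited_classes):
    """Originality = 1 - sum_j s_j**2, where s_j is the share of cited patents in class j.

    cited_classes: one technology class per backward citation.
    """
    if not cited_classes:
        return None  # undefined for patents without backward citations
    counts = Counter(cited_classes)
    total = len(cited_classes)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())
```

A patent citing only one class scores 0; citations spread evenly over two classes score 0.5, and the score approaches 1 as citations spread over many classes.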

Thursday, September 13, 2018

MySQL upload scripts for EEE-PPAT 2018a

The EEE-PPAT table, produced by ECOOM (Catholic University of Leuven) and Eurostat, is an extension of the person table (TLS206). The extension covers sector allocation and name harmonization of applicants.

The 2018a version contains 56,611,335 records and improves the standardized names and sector allocation compared with the PATSTAT edition.

It can be requested free of charge by contacting technoinfo@ecoom.be.

The MySQL script I created for uploading the table can be downloaded here.

Thursday, August 30, 2018

Online regular expression validators

A quick note on two useful online tools for testing and validating regular expressions.

The first one is

https://regex101.com

It supports several regex dialects (PHP, JavaScript, Python...) and gives a verbose but useful explanation of each step.

Last but not least, it lets you share links to prepared tests, like this one:

https://regex101.com/r/9y9n85/20

Also useful is:
https://regexper.com/

which comes with the comforting subtitle
"You thought you only had two problems…"
and draws diagrams showing how a given regexp works.

Example:

https://regexper.com/#Results%20%3D%20%28%5Cd%2B%29
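Once a pattern behaves as expected in these tools, it can be checked in code as well. A minimal Python sketch using the pattern from the regexper example above (Results = (\d+), URL-decoded); Python's re module is one more dialect, so select the matching flavor on regex101 when testing.

```python
import re

# Same pattern as the regexper example: literal "Results = " then one or more digits.
RESULTS_RE = re.compile(r"Results = (\d+)")

def extract_results(text):
    """Return every number that follows 'Results = ' in text, as ints."""
    return [int(n) for n in RESULTS_RE.findall(text)]
```

Because the pattern has one capture group, findall returns only the captured digits, not the whole match.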

Monday, June 18, 2018

Linked Open EP data

A linked open data version of EP patent data is now available at:
https://www.epo.org/searching-for-patents/data/linked-open-data.html

Linked Open EP data is a data product provided by the EPO. It contains EP publications with their bibliographic and family information.
It also contains some basic information on non-EP patents that are related to EP patents, e.g. because they are a priority of an EP document or they are in the same family as an EP application.

The product also comes with a simple application programming interface (API), allowing you to consult reference data, explore that data and try out ideas on a small scale.
A SPARQL interface enables you to analyse the data.

Linked open EP data uses Uniform Resource Identifiers (URIs) to identify patent applications, publications and other resources present in patent data. This allows data in one dataset to be linked to data in another dataset. Given its URI, data about a resource can be retrieved in a variety of formats over the web. For occasional use there is a simple data browser, an API and a query interface. For heavier use, bulk data is available for download.

Each application has a unique identifier which looks like a URL and has this structure:

https://data.epo.org/linked-data/doc/application/cc/nnnnnnnn

where
cc stands for the Office code.
nnnnnnnn stands for the application number.
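Following the URI structure described above, identifiers can be built and split programmatically. A minimal Python sketch; the helper names are hypothetical, and whether application numbers need a particular format (padding, check digits) should be confirmed in the EPO documentation before relying on it.

```python
from urllib.parse import urlparse

BASE = "https://data.epo.org/linked-data/doc/application"

def application_uri(office_code, appln_nr):
    """Build an application URI as <BASE>/<cc>/<nnnnnnnn>."""
    return f"{BASE}/{office_code.strip().upper()}/{appln_nr.strip()}"

def parse_application_uri(uri):
    """Split an application URI back into (office code, application number)."""
    # Path looks like /linked-data/doc/application/<cc>/<nnnnnnnn>
    parts = urlparse(uri).path.rstrip("/").split("/")
    if len(parts) < 3 or parts[-3] != "application":
        raise ValueError(f"not an application URI: {uri!r}")
    return parts[-2], parts[-1]
```

The number 12345678 used in the checks is a made-up placeholder, not a real application.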

Friday, June 8, 2018

Applications disappearing across PATSTAT editions

An interesting fact, especially if you have to run periodic reports: new PATSTAT editions do not only add new applications; a small number of existing applications also disappear from one edition to the next.

The query below counts the applications that disappear between editions. The first table lists the results for office / application kind / earliest publication year combinations with more than 1000 disappearing applications.

APPLN_AUTH	APPLN_KIND	EARLIEST_PUBLN_year	Count_APPLN_ID
'AU'	'A'	9999	1142
'AU'	'D'	1968	1067
'AU'	'D'	1969	1351
'AU'	'D'	1970	1458
'AU'	'D'	1971	1491
'AU'	'D'	1972	1389
'AU'	'D'	1973	7322
'AU'	'D'	1974	7474
'AU'	'D'	1975	4478
'AU'	'D'	1976	2554
'AU'	'D'	1977	4680
'AU'	'D'	1978	7522
'AU'	'D'	1979	11628
'AU'	'D'	1980	9765
'AU'	'D'	1981	7680
'AU'	'D'	1982	7652
'AU'	'D'	1983	7804
'AU'	'D'	1984	7995
'AU'	'D'	1985	8747
'AU'	'D'	1986	9701
'AU'	'D'	1987	9809
'AU'	'D'	1988	10245
'AU'	'D'	1989	6807
'AU'	'D'	1990	11410
'AU'	'D'	1991	10272
'AU'	'D'	1992	8128
'AU'	'D'	1993	5961
'AU'	'D'	1994	7735
'AU'	'D'	1995	9861
'AU'	'D'	1996	12173
'AU'	'D'	1997	11543
'AU'	'D'	1998	15028
'AU'	'D'	1999	15274
'AU'	'D'	2000	15162
'AU'	'D'	2001	11687
'AU'	'D'	2002	3623
'KR'	'A'	2005	1956
'KR'	'A'	2006	2954

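Some offices also show a systematic decrease in the number of applications between editions, for example Italian (IT) applications in 2018a compared with 2016b:

APPLN_AUTH	APPLN_KIND	EARLIEST_PUBLN_year	n apps 2018a	n apps 2016b	change (%)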
'IT'	'A'	2010	9040	9257	-2%
'IT'	'A'	2011	9411	9681	-3%
'IT'	'A'	2012	8913	9152	-3%
'IT'	'A'	2013	8858	9180	-4%
'IT'	'A'	2014	8234	8494	-3%
'IT'	'A'	2015	3833	3865	-1%
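The last column is the relative change from 2016b to 2018a, (n apps 2018a - n apps 2016b) / n apps 2016b. A short Python check, using the IT rows above, reproduces the rounded percentages:

```python
def pct_change(new, old):
    """Relative change from edition `old` to edition `new`, in percent."""
    return (new - old) / old * 100

# (earliest publication year, n apps 2018a, n apps 2016b) for IT / kind A
it_rows = [(2010, 9040, 9257), (2011, 9411, 9681), (2012, 8913, 9152),
           (2013, 8858, 9180), (2014, 8234, 8494), (2015, 3833, 3865)]

for year, n_2018a, n_2016b in it_rows:
    print(year, f"{pct_change(n_2018a, n_2016b):.0f}%")
```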

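-- Applications present in PATSTAT 2016b but missing from the newer edition (schema patstat)
SELECT
  a.APPLN_AUTH,
  a.APPLN_KIND,
  a.EARLIEST_PUBLN_year,
  COUNT(DISTINCT a.APPLN_ID) AS Count_APPLN_ID
FROM
  patstat2016b.tls201_appln a
  -- anti-join: keep old applications with no match in the new edition
  LEFT JOIN patstat.tls201_appln b ON b.APPLN_ID = a.APPLN_ID
WHERE
  b.APPLN_ID IS NULL
GROUP BY
  a.APPLN_AUTH,
  a.APPLN_KIND,
  a.EARLIEST_PUBLN_year
-- keep only combinations with more than 1000 disappearing applications
HAVING
  COUNT(DISTINCT a.APPLN_ID) > 1000;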
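The same anti-join can be tried without a full PATSTAT install. A minimal sketch using Python's built-in sqlite3 module, where two small in-memory tables stand in for two editions; the table names old_appln and new_appln and the sample rows are invented for illustration, and both tables live in one database instead of two schemas.

```python
import sqlite3

def disappeared_counts(conn, min_count=0):
    """Count applications in old_appln with no matching APPLN_ID in new_appln."""
    sql = """
        SELECT a.appln_auth, a.appln_kind, a.earliest_publn_year,
               COUNT(DISTINCT a.appln_id) AS n
        FROM old_appln a
        LEFT JOIN new_appln b ON b.appln_id = a.appln_id
        WHERE b.appln_id IS NULL            -- no match: dropped in the new edition
        GROUP BY a.appln_auth, a.appln_kind, a.earliest_publn_year
        HAVING COUNT(DISTINCT a.appln_id) > ?
        ORDER BY a.appln_auth, a.appln_kind, a.earliest_publn_year
    """
    return conn.execute(sql, (min_count,)).fetchall()

# Toy data: two in-memory tables standing in for two PATSTAT editions.
conn = sqlite3.connect(":memory:")
for table in ("old_appln", "new_appln"):
    conn.execute(f"CREATE TABLE {table} (appln_id INTEGER PRIMARY KEY, "
                 "appln_auth TEXT, appln_kind TEXT, earliest_publn_year INTEGER)")
old_rows = [(1, "AU", "D", 1979), (2, "AU", "D", 1979),
            (3, "KR", "A", 2005), (4, "IT", "A", 2010)]
conn.executemany("INSERT INTO old_appln VALUES (?, ?, ?, ?)", old_rows)
# Applications 2 and 3 are missing from the "new" edition.
conn.executemany("INSERT INTO new_appln VALUES (?, ?, ?, ?)", [old_rows[0], old_rows[3]])

print(disappeared_counts(conn))
```

Calling disappeared_counts(conn, min_count=1000) on real data corresponds to the threshold used for the first table above.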