Friday, September 21, 2018

Google dataset search

Google recently launched a new service that aims to index local, public and national data repositories: Google Dataset Search.

Dataset Search lets you find datasets wherever they’re hosted, whether it’s a publisher's site, a digital library, or an author's personal web page.

Google also developed guidelines for dataset providers to describe their data in a way that search engines can better understand the content of their pages.

The approach is based on an open standard for describing this information (schema.org) and anybody who publishes data can describe their dataset this way.
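As a rough illustration of what such a description can look like, here is a minimal Python sketch that builds a schema.org `Dataset` record as JSON-LD. The function name `dataset_jsonld`, the example.org URLs and the dataset itself are made up for the example; only the schema.org types and properties (`Dataset`, `DataDownload`, `name`, `description`, `url`, `distribution`, `encodingFormat`, `contentUrl`) come from the standard.

```python
import json


def dataset_jsonld(name, description, url, download_url, encoding_format="text/csv"):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper: the name and signature are illustrative only.
    """
    doc = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        # One downloadable file for this dataset
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": download_url,
            }
        ],
    }
    return json.dumps(doc, indent=2)


# All values below are placeholders, not a real dataset.
example = dataset_jsonld(
    name="Example patent indicators",
    description="Hypothetical table of patent-based indicators by year.",
    url="https://example.org/datasets/patent-indicators",
    download_url="https://example.org/datasets/patent-indicators.csv",
)
print(example)
```

The resulting JSON is typically embedded in the dataset's web page inside a `<script type="application/ld+json">` element.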

Where possible, the engine also links each dataset to the Google Scholar articles that use it.


Full story @ link
https://www.blog.google/products/search/making-it-easier-discover-datasets/

Wednesday, September 19, 2018

How to build indicators from PATSTAT step by step

A presentation I prepared on how to build a patent-based indicator (patent originality) step by step from EPO PATSTAT while avoiding the most common pitfalls. Commented SQL code is included.
I hope you will find it useful.

Thursday, September 13, 2018

MySQL upload scripts for EEE-PPAT 2018a

The EEE-PPAT table is an extension of the PERSON table (TLS206) produced by ECOOM (Catholic University of Leuven) and Eurostat. The extension concerns sector allocation and name harmonization of applicants.

The 2018a version contains 56611335 records and improves standard names and sector allocation compared to the PATSTAT edition.


It can be requested at no cost by contacting technoinfo@ecoom.be

The MySQL script I created for uploading the table can be downloaded here

Thursday, August 30, 2018

Online regular expression validators

A quick note on two useful online tools for testing and validating regular expressions.

The first one is

https://regex101.com

which allows tests in multiple dialects (PHP, JavaScript, Python...) and gives a verbose but useful explanation of the steps.

Last but not least, it lets you share links to the tests you prepared, like this one:

https://regex101.com/r/9y9n85/20

Also useful is:
https://regexper.com/


which comes with the comforting subtitle
You thought you only had two problems…
and lets you create diagrams showing how a given regexp works.

Example:

https://regexper.com/#Results%20%3D%20%28%5Cd%2B%29
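The expression encoded in the link above is `Results = (\d+)`. As a small sketch of how it behaves, here it is in Python's `re` module; the helper name `extract_result_count` and the sample strings are invented for the illustration.

```python
import re

# The pattern from the regexper link: literal "Results = " followed by
# one or more digits, captured in group 1.
PATTERN = re.compile(r"Results = (\d+)")


def extract_result_count(text):
    """Return the captured number as an int, or None if there is no match."""
    match = PATTERN.search(text)
    return int(match.group(1)) if match else None


print(extract_result_count("Results = 42"))   # 42
print(extract_result_count("no match here"))  # None
```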

Monday, June 18, 2018

Linked Open EP data

A linked open data version of EP patent data is now available at
https://www.epo.org/searching-for-patents/data/linked-open-data.html

Linked Open EP data is a data product provided by the EPO. It contains EP publications with their bibliographic and family information.
It also contains some basic information on non-EP patents that are related to EP patents, e.g. because they are a priority of an EP document or they are in the same family as an EP application.

The product also comes with a simple application programming interface (API), allowing you to consult reference data, explore that data and try out ideas on a small scale.
A SPARQL interface enables you to analyse the data.

Linked open EP data uses Uniform Resource Identifiers (URIs) to identify patent applications, publications and other resources present in patent data. This allows data in one dataset to be linked to data in another dataset. Given its URI, data about a resource can be retrieved in a variety of formats over the web. For occasional use there is a simple data browser, an API and a query interface. For heavier use, bulk data is available for download.

Each application has a unique identifier which looks like a URL and has this structure:

https://data.epo.org/linked-data/doc/application/cc/nnnnnnnn

where
cc stands for the Office code.
nnnnnnnn stands for the application number.
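
As a minimal sketch of that structure, the following Python helper assembles such a URI from its two parts. The function name `application_uri` and the application number in the example are made up for the illustration; the number is a placeholder, not a real application, and the two-letter check on the office code is an assumption of this sketch.

```python
BASE = "https://data.epo.org/linked-data/doc/application"


def application_uri(office_code, application_number):
    """Build an application URI of the form BASE/cc/nnnnnnnn.

    Hypothetical helper: it only concatenates the parts and does not
    check that the application actually exists.
    """
    office = office_code.strip().upper()
    number = str(application_number).strip()
    # Assumption: office codes are two letters, such as EP
    if len(office) != 2 or not office.isalpha():
        raise ValueError("office code must be two letters")
    if not number:
        raise ValueError("application number must not be empty")
    return f"{BASE}/{office}/{number}"


# Placeholder number, used only to show the shape of the result
print(application_uri("ep", "12345678"))
```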

Friday, June 8, 2018

Applications disappearing across PATSTAT editions

An interesting fact, especially if you have to run periodic reports: new PATSTAT editions not only add new applications, but a small number of existing applications also disappear from one edition to the next.

The query at the end of this post counts the disappearing applications. The table below lists the results for each office, application kind and earliest publication year with more than 1000 disappearing applications.

APPLN_AUTH  APPLN_KIND  EARLIEST_PUBLN_year  Count_APPLN_ID
'AU'        'A'         9999                 1142
'AU'        'D'         1968                 1067
'AU'        'D'         1969                 1351
'AU'        'D'         1970                 1458
'AU'        'D'         1971                 1491
'AU'        'D'         1972                 1389
'AU'        'D'         1973                 7322
'AU'        'D'         1974                 7474
'AU'        'D'         1975                 4478
'AU'        'D'         1976                 2554
'AU'        'D'         1977                 4680
'AU'        'D'         1978                 7522
'AU'        'D'         1979                 11628
'AU'        'D'         1980                 9765
'AU'        'D'         1981                 7680
'AU'        'D'         1982                 7652
'AU'        'D'         1983                 7804
'AU'        'D'         1984                 7995
'AU'        'D'         1985                 8747
'AU'        'D'         1986                 9701
'AU'        'D'         1987                 9809
'AU'        'D'         1988                 10245
'AU'        'D'         1989                 6807
'AU'        'D'         1990                 11410
'AU'        'D'         1991                 10272
'AU'        'D'         1992                 8128
'AU'        'D'         1993                 5961
'AU'        'D'         1994                 7735
'AU'        'D'         1995                 9861
'AU'        'D'         1996                 12173
'AU'        'D'         1997                 11543
'AU'        'D'         1998                 15028
'AU'        'D'         1999                 15274
'AU'        'D'         2000                 15162
'AU'        'D'         2001                 11687
'AU'        'D'         2002                 3623
'KR'        'A'         2005                 1956
'KR'        'A'         2006                 2954


Some offices show a systematic decrease in the number of applications:

office  app kind  earl pub date  n apps 2018a  n apps 2016b  change
'IT'    'A'       2010           9040          9257          -2%
'IT'    'A'       2011           9411          9681          -3%
'IT'    'A'       2012           8913          9152          -3%
'IT'    'A'       2013           8858          9180          -4%
'IT'    'A'       2014           8234          8494          -3%
'IT'    'A'       2015           3833          3865          -1%


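-- Count applications that are present in the 2016b edition
-- but missing from the newer edition.
-- Assumption: the schema `patstat` holds the newer (2018a) edition.
SELECT
  a.APPLN_AUTH,
  a.APPLN_KIND,
  a.EARLIEST_PUBLN_year,
  Count(DISTINCT a.APPLN_ID) AS Count_APPLN_ID
FROM
  patstat2016b.tls201_appln a
  -- anti-join: rows of a without a match in b get b.APPLN_ID = NULL
  LEFT JOIN patstat.tls201_appln b ON b.APPLN_ID = a.APPLN_ID
WHERE
  b.APPLN_ID IS NULL
GROUP BY
  -- b.APPLN_ID is always NULL here, so it is not needed as a grouping key
  a.APPLN_AUTH,
  a.APPLN_KIND,
  a.EARLIEST_PUBLN_year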
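
The LEFT JOIN ... IS NULL pattern used in the query above (an anti-join) can be tried on toy data without a PATSTAT installation. The sketch below uses Python's built-in `sqlite3` module; the table names `old_appln` and `new_appln` and all the rows are invented for the example and only mimic the relevant columns of tls201_appln.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two toy "editions" of the application table (made-up rows).
conn.executescript("""
CREATE TABLE old_appln (
  appln_id INTEGER PRIMARY KEY,
  appln_auth TEXT,
  appln_kind TEXT,
  earliest_publn_year INTEGER
);
CREATE TABLE new_appln (
  appln_id INTEGER PRIMARY KEY,
  appln_auth TEXT,
  appln_kind TEXT,
  earliest_publn_year INTEGER
);
INSERT INTO old_appln VALUES
  (1, 'AU', 'D', 1979),
  (2, 'AU', 'D', 1979),
  (3, 'KR', 'A', 2005),
  (4, 'IT', 'A', 2010);
INSERT INTO new_appln VALUES
  (1, 'AU', 'D', 1979),
  (4, 'IT', 'A', 2010),
  (5, 'IT', 'A', 2011);
""")

# Applications in the old edition with no counterpart in the new one.
query = """
SELECT a.appln_auth, a.appln_kind, a.earliest_publn_year,
       COUNT(DISTINCT a.appln_id) AS n_missing
FROM old_appln a
LEFT JOIN new_appln b ON b.appln_id = a.appln_id
WHERE b.appln_id IS NULL
GROUP BY a.appln_auth, a.appln_kind, a.earliest_publn_year
ORDER BY a.appln_auth, a.appln_kind, a.earliest_publn_year
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('AU', 'D', 1979, 1), ('KR', 'A', 2005, 1)]
```

Applications 2 and 3 exist only in the old toy table, so each of their groups is reported with a count of 1. Application 5, which is new, does not appear because the join starts from the old edition.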