Friday, November 20, 2015

What's new in PATSTAT and EP Register 2015b

The EPO has recently released new versions of its patent datasets. They contain many minor changes, but taken together these add up to a big batch of novelties. Let's list them all:

a) TLS201_APPLN: many aggregated attributes representing various date formats have been removed. For the sake of coherence, a number of attributes have been renamed. (See the Data Catalog for details.)
One extra attribute, EARLIEST_FILING_ID (in other terms, the first priority), has been added.
This attribute links to the application from which the EARLIEST_FILING_DATE was taken. Analysts use this date as an indicator of the “closest date to the moment of invention”, but they often also want to know where the invention was filed or who the applicant was. EARLIEST_FILING_ID provides the link to those “earliest” applications without the need for further aggregations.
Language fields have been removed from this table and moved to the related tables (TLS202 and TLS203).
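A minimal sketch of how EARLIEST_FILING_ID can be used: a self-join on TLS201_APPLN that fetches the filing office of each application's earliest filing. It uses Python's built-in sqlite3 module as a stand-in for a real PATSTAT installation. Only a few columns are modelled, APPLN_AUTH is assumed from the PATSTAT schema, the rows are invented, and it assumes that an application which is itself the earliest filing points to its own APPLN_ID.

```python
import sqlite3

# Toy TLS201_APPLN with only the columns needed here; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tls201_appln (
        appln_id             INTEGER PRIMARY KEY,
        appln_auth           TEXT,
        earliest_filing_date TEXT,
        earliest_filing_id   INTEGER
    )
""")
conn.executemany(
    "INSERT INTO tls201_appln VALUES (?, ?, ?, ?)",
    [
        (1, "US", "2010-03-01", 1),  # assumed: the earliest filing points to itself
        (2, "EP", "2010-03-01", 1),  # later filing claiming the US application
        (3, "JP", "2010-03-01", 1),
    ],
)

# Self-join: for each application, look up the office of its earliest filing.
rows = conn.execute("""
    SELECT a.appln_id, a.appln_auth, e.appln_auth AS earliest_auth
    FROM tls201_appln AS a
    JOIN tls201_appln AS e ON e.appln_id = a.earliest_filing_id
    ORDER BY a.appln_id
""").fetchall()

for row in rows:
    print(row)
```

The same join can be extended to the person tables to retrieve the applicant of the earliest filing instead of its office.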

In short, the following fields have also been added:

GRANTED int(4) NOT NULL default '0',
DOCDB_FAMILY_SIZE int(4) NOT NULL default '0',
NB_CITING_DOCDB_FAM int(4) NOT NULL default '0',
NB_APPLICANTS int(4) NOT NULL default '0',
NB_INVENTORS int(4) NOT NULL default '0',
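The new counters mean that common filters no longer need joins or sub-queries. A minimal sketch in SQLite, assuming GRANTED holds 1 for granted applications and 0 otherwise; the rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tls201_appln (
        appln_id            INTEGER PRIMARY KEY,
        granted             INTEGER NOT NULL DEFAULT 0,
        docdb_family_size   INTEGER NOT NULL DEFAULT 0,
        nb_citing_docdb_fam INTEGER NOT NULL DEFAULT 0,
        nb_applicants       INTEGER NOT NULL DEFAULT 0,
        nb_inventors        INTEGER NOT NULL DEFAULT 0
    )
""")
# Hypothetical rows: (appln_id, granted, family size, citing families, applicants, inventors)
conn.executemany(
    "INSERT INTO tls201_appln VALUES (?, ?, ?, ?, ?, ?)",
    [
        (10, 1, 5, 12, 1, 3),
        (11, 0, 1, 0, 2, 1),
        (12, 1, 2, 4, 1, 2),
    ],
)

# Granted applications cited by at least one DOCDB family:
# a single-table query thanks to the pre-computed counters.
cited_grants = conn.execute("""
    SELECT appln_id, docdb_family_size, nb_citing_docdb_fam
    FROM tls201_appln
    WHERE granted = 1 AND nb_citing_docdb_fam > 0
    ORDER BY nb_citing_docdb_fam DESC
""").fetchall()
print(cited_grants)
```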

b) Table TLS202_APPLN_TITLE: the attribute APPLN_TITLE_LG has been moved from TLS201_APPLN into TLS202_APPLN_TITLE.

c) Table TLS203_APPLN_ABSTRACT: as in (b), the language attribute has been moved from TLS201_APPLN into the abstracts table.
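Because the language now sits next to the text, a language filter requires a join with TLS202 (or TLS203). A minimal SQLite sketch for titles; the APPLN_TITLE column name and the two-letter language codes are assumptions, and the rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tls201_appln (appln_id INTEGER PRIMARY KEY, appln_auth TEXT);
    CREATE TABLE tls202_appln_title (
        appln_id       INTEGER PRIMARY KEY,
        appln_title_lg TEXT,
        appln_title    TEXT
    );
    INSERT INTO tls201_appln VALUES (1, 'EP'), (2, 'DE');
    INSERT INTO tls202_appln_title VALUES
        (1, 'en', 'Widget with improved hinge'),
        (2, 'de', 'Scharnier');
""")

# The title language is no longer in TLS201, so filter on the title table.
english_titles = conn.execute("""
    SELECT a.appln_id, t.appln_title
    FROM tls201_appln AS a
    JOIN tls202_appln_title AS t ON t.appln_id = a.appln_id
    WHERE t.appln_title_lg = 'en'
""").fetchall()
print(english_titles)
```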

d) IPC_SUBCLASS_SYMBOL and TECHN_FIELD_NR have been removed from TLS209_APPLN_IPC. They have been replaced by the new table TLS230_APPLN_TECHN_FIELD.

e) Table TLS212_CITATION: in the past, the attribute CITN_ORIGIN could take the value “115”, which sounded very mysterious but was in fact simply a reference to Article 115 of the European Patent Convention, “Observations by third parties”. The value has therefore been changed to “TPO” (Third Party Observations).
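If you compare citation data loaded from editions before and after this change, a CASE expression can map the legacy code onto the new one. A minimal SQLite sketch; the PAT_PUBLN_ID and CITN_ID columns and the other origin codes are assumptions, and the rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls212_citation (pat_publn_id INTEGER, citn_id INTEGER, citn_origin TEXT)"
)
# Hypothetical rows mixing the new 'TPO' code and the legacy '115' code.
conn.executemany(
    "INSERT INTO tls212_citation VALUES (?, ?, ?)",
    [(100, 1, "SEA"), (100, 2, "TPO"), (101, 1, "115"), (101, 2, "APP")],
)

# Count third-party observations; the CASE maps legacy '115' to 'TPO'
# so the same query works on older and newer data.
tpo_count = conn.execute("""
    SELECT COUNT(*)
    FROM tls212_citation
    WHERE CASE citn_origin WHEN '115' THEN 'TPO' ELSE citn_origin END = 'TPO'
""").fetchone()[0]
print(tpo_count)
```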

f) Tables TLS218_DOCDB_FAM and TLS219_INPADOC_FAM have been removed altogether and integrated into TLS201_APPLN. This is possible because every application (except some replenished applications) belongs to exactly one DOCDB and one INPADOC patent family. The result is fewer joins when working with patent families.
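With the family identifiers stored on each application, family-level statistics become plain GROUP BY queries on TLS201. A minimal SQLite sketch; the DOCDB_FAMILY_ID and INPADOC_FAMILY_ID column names are assumed from the PATSTAT schema, and the rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tls201_appln (
        appln_id          INTEGER PRIMARY KEY,
        appln_auth        TEXT,
        docdb_family_id   INTEGER,
        inpadoc_family_id INTEGER
    )
""")
conn.executemany(
    "INSERT INTO tls201_appln VALUES (?, ?, ?, ?)",
    [(1, "US", 500, 900), (2, "EP", 500, 900), (3, "JP", 501, 900), (4, "CN", 502, 901)],
)

# Number of distinct DOCDB families inside each INPADOC family,
# computed directly from TLS201 without joining any family table.
per_inpadoc = conn.execute("""
    SELECT inpadoc_family_id, COUNT(DISTINCT docdb_family_id)
    FROM tls201_appln
    GROUP BY inpadoc_family_id
    ORDER BY inpadoc_family_id
""").fetchall()
print(per_inpadoc)
```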

g) Table TLS224_APPLN_CPC: the CPC_MAINGROUP_SYMBOL attribute has been removed. If you need it, the SQL expression LEFT(CPC_CLASS_SYMBOL,4) gives exactly the same value.
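Not every SQL dialect has LEFT(). SQLite, for example, does not, but substr(x, 1, 4) returns the same first four characters. A minimal sketch with invented symbols; the padding of the symbols is only illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tls224_appln_cpc (appln_id INTEGER, cpc_class_symbol TEXT)")
conn.executemany(
    "INSERT INTO tls224_appln_cpc VALUES (?, ?)",
    [(1, "H01L  21/02"), (1, "H01L  29/78"), (2, "G06F   3/01")],
)

# SQLite equivalent of LEFT(cpc_class_symbol, 4): the first four characters.
prefixes = conn.execute("""
    SELECT DISTINCT appln_id, substr(cpc_class_symbol, 1, 4) AS prefix
    FROM tls224_appln_cpc
    ORDER BY appln_id
""").fetchall()
print(prefixes)
```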

h) TLS226_PERSON_ORIGIN: extra attributes have been added to store the five address lines that are now provided for EP applications. (Keep in mind that address information is removed from PATSTAT Online, but not from PATSTAT Raw Data, for inventors and applicants whose SECTOR attribute is “INDIVIDUAL”, “UNKNOWN” or empty.)

i) TLS230_APPLN_TECHN_FIELD: new table, obtained by combining the IPCs of each application with the information in table TLS901_TECHN_FIELD_IPC and applying a weighting. Consequently, applications without IPCs cannot be assigned to technical fields.
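A typical use of the weights is fractional counting: each application contributes its weight to each of its technical fields instead of being counted once per field. A minimal SQLite sketch, assuming the weight column is called WEIGHT and that the weights of one application sum to 1; the rows are invented:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls230_appln_techn_field (appln_id INTEGER, techn_field_nr INTEGER, weight REAL)"
)
# Hypothetical rows: applications 1 and 3 are split across two fields each.
conn.executemany(
    "INSERT INTO tls230_appln_techn_field VALUES (?, ?, ?)",
    [(1, 6, 0.5), (1, 7, 0.5), (2, 6, 1.0), (3, 13, 0.25), (3, 6, 0.75)],
)

# Fractional count per field: sum the weights, so an application spread
# over several fields contributes a total of 1 rather than 1 per field.
fractional = dict(conn.execute("""
    SELECT techn_field_nr, SUM(weight)
    FROM tls230_appln_techn_field
    GROUP BY techn_field_nr
""").fetchall())
print(fractional)
```

Summing the fractional counts over all fields gives back the number of applications, which is a handy sanity check.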

EPO Register for PATSTAT
Numerous changes have been implemented in the Register database, in fact too many to describe each one in detail.
As with the PATSTAT database, the logical model has been optimised, and many tables have been removed and/or regrouped to make the database more compact and easier to work with. As a result, many tables have different primary keys, which you need to be aware of when joining tables in your queries.

a) Table REG114_DATES is a re-grouping of REG114 together with REG122, REG123, REG126 and REG134.

b) Table REG128_LIMITATION regroups REG128 with REG129 and REG130.

c) Table REG130_OPPONENT replaces REG131, REG132, REG133 and REG137.

d) Tables REG120, REG121 and REG124 (all containing dates) have been removed because their data duplicated information contained in other REG20X tables.

e) Table REG117_RELATION which establishes the links between parent and child applications has been completely remodelled.

f) Various attributes such as TYPE_LICENCE, SEARCH_TYPE and CAUSE_INTERUPTION now have mnemonic codes instead of numerical ones.

g) The biggest change is probably that tables REG115_NPL_CITATION and REG116_PAT_CITATION have been removed completely. The same, and even more complete, citation data is already available through the various citation tables in the PATSTAT database (TLS212, TLS214, TLS215, ...). The “EPO Register for PATSTAT” database can be linked to the PATSTAT database via the APPLN_ID in REG101_APPLN and TLS201_APPLN.
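Since the Register citation tables are gone, citations for Register applications now come from PATSTAT through the shared APPLN_ID. A minimal SQLite sketch of that route; the path through TLS211_PAT_PUBLN and the PAT_PUBLN_ID and CITN_ID columns are assumed from the PATSTAT schema, and only key columns are modelled with invented rows:

```python
import sqlite3

# Toy schemas with key columns only; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reg101_appln     (appln_id INTEGER PRIMARY KEY);
    CREATE TABLE tls201_appln     (appln_id INTEGER PRIMARY KEY, appln_auth TEXT);
    CREATE TABLE tls211_pat_publn (pat_publn_id INTEGER PRIMARY KEY, appln_id INTEGER);
    CREATE TABLE tls212_citation  (pat_publn_id INTEGER, citn_id INTEGER, citn_origin TEXT);

    INSERT INTO reg101_appln     VALUES (7), (8);
    INSERT INTO tls201_appln     VALUES (7, 'EP'), (8, 'EP'), (9, 'US');
    INSERT INTO tls211_pat_publn VALUES (70, 7), (71, 7), (80, 8), (90, 9);
    INSERT INTO tls212_citation  VALUES
        (70, 1, 'SEA'), (70, 2, 'SEA'), (71, 1, 'EXA'), (90, 1, 'APP');
""")

# Register application -> PATSTAT application (same APPLN_ID)
# -> its publications -> citations made by those publications.
# LEFT JOINs keep Register applications that have no citations (count 0).
citations_per_appln = conn.execute("""
    SELECT r.appln_id, COUNT(c.citn_id)
    FROM reg101_appln AS r
    JOIN tls201_appln AS a ON a.appln_id = r.appln_id
    LEFT JOIN tls211_pat_publn AS p ON p.appln_id = a.appln_id
    LEFT JOIN tls212_citation AS c ON c.pat_publn_id = p.pat_publn_id
    GROUP BY r.appln_id
    ORDER BY r.appln_id
""").fetchall()
print(citations_per_appln)
```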

Recently, the data model of the input data for the PATSTAT database was considerably changed.
In this context, a minor inconsistency of the citation data was observed when the PATSTAT Autumn 2015 edition was compiled.

The observed inconsistency may only affect detailed analyses of non-patent literature (NPL) citations. Analyses limited to counting non-patent citations, as well as other analyses, are not affected. The EPO will make extra efforts to eliminate the inconsistency in due course.
In light of the limited impact of the observed inconsistency, the EPO decided to deliver the PATSTAT Autumn 2015 edition before the extra quality check of the PATSTAT citation data is completed. This pragmatic approach will enable the majority of PATSTAT customers to use the new edition now and to tap the benefits of the updated data.
When the extra quality check is completed, the amended citation data will be made available to PATSTAT customers. The reviewed citation data is expected to be available in November, and PATSTAT customers will be informed as soon as possible.