Thursday, March 22, 2012

Converting old appln_id to current version in patstat


Even if now Application id in Patstat is stable, it could be needed to link data from edictions previous to 2011 to current one.

So I built a table named APPLN_ID_BRIDGE with the goal of providing a mean to link data from PatStat editions published before april 2011 (that is, before a stable APPLN_ID was introduced) to current data. In particular, they link the available (with unstable APPLN_ID) of  the October 2010 ediction to data from the  October 2011 data (with stable APPLN_ID)

The table is structured with following fields:

Appln_id1    application id of first PATSTAT version
Appln_id2    application id of second PATSTAT version
Edition1       PATSTAT edition of first version (YYYYMM)
Edition2       PATSTAT edition of second version (YYYYMM)

where Edition1 < Edition2 in order to avoid duplication of records (and decrease table size).

Due to the fact that APPLN_ID is unique by application authority, application number and application kind, we can build a bridge table by matching the appln_id that share into different addiction same values for previously listed fields.

Here is an example of SQL building the table for October 2010, October 2011.

ALTER TABLE `PATSTAT`.`tls201_appln` ADD INDEX `Index_3`(`APPLN_AUTH`),
 ADD INDEX `Index_4`(`APPLN_NR`),
 ADD INDEX `Index_5`(`APPLN_KIND`);

ALTER TABLE `PATSTATold`.`tls201_appln` ADD INDEX `Index_3`(`APPLN_AUTH`),
 ADD INDEX `Index_4`(`APPLN_NR`),
 ADD INDEX `Index_5`(`APPLN_KIND`);

create table PATSTAT.appln_id_bridge
Select
  t1old.APPLN_ID As appln_id1,
  t1new.APPLN_ID As appln_id2,
  "201010" As edition1,
  "201110" As edition2
From
  PATSTATold.tls201_appln t1old Inner Join
  PATSTAT.tls201_appln t1new On t1old.APPLN_AUTH = t1new.APPLN_AUTH And
    t1old.APPLN_NR = t1new.APPLN_NR And t1old.APPLN_KIND = t1new.APPLN_KIND
Group By
  t1old.APPLN_ID, t1new.APPLN_ID, "201010", "201110"

ALTER TABLE `PATSTAT`.`tls201_appln` drop INDEX `Index_3`, drop INDEX `Index_4`, drop INDEX `Index_5`;
ALTER TABLE `PATSTATold`.`tls201_appln` drop INDEX `Index_3`, drop INDEX `Index_4`, drop INDEX `Index_5`;

Be aware not all appln_id from a previous version will be matched in following one. The main reason seem to be corrections. For example application kind codes. Corrections create a new appln_id and delete the old one.

Table witrh data can be downloaded from this link

No comments:

Post a Comment