Monday, September 8, 2014

Creating a stable id for docdb families

When trying to work across different patstat edictions one issue that remains (even if now appln_id and person_id are stable) is the change of family ids.

In this post I offer a possibile solution and SQL code (mysql dialect) for a workaround for this issue.

The basic idea underlying is the fact that a docdb family can be appended with younger applications but not with older patents.

So I created a conversion table where for each family id the younger member's appln_id is listed; where two member were applied the same day I list the smaller appln_id.

This conversion table, tested across oct 2013 and april 2014 was working for 99.9% of cases, leftovers were families where probably some error was corrected in original data or families with some oddities in them.

I also take the chance to thank Geert Boedt from EPO for the suggestions he gave.


-- SQL CODE aiming to create table docdbfam_stableid

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;

create table t1
Select
  t18.DOCDB_FAMILY_ID,
  Min(t01.APPLN_FILING_DATE) As fmin
From
  patstat.tls201_appln t01 Inner Join
  patstat.tls218_docdb_fam t18 On t18.APPLN_ID = t01.APPLN_ID
Group By
  t18.DOCDB_FAMILY_ID, t01.APPLN_ID;
 
create table t2
Select
  t18.DOCDB_FAMILY_ID,
  t01.APPLN_ID,
  t01.APPLN_FILING_DATE
From
  patstat.tls201_appln t01 Inner Join
  patstat.tls218_docdb_fam t18 On t18.APPLN_ID = t01.APPLN_ID; 
 
 
alter table t1 add index i1(fmin), add index i2(DOCDB_FAMILY_ID); 
alter table t2 add index i1(APPLN_FILING_DATE), add index i2(DOCDB_FAMILY_ID); 

create table t3
select distinct   t2.DOCDB_FAMILY_ID,  t2.APPLN_ID
from t1 inner join t2 on t1.fmin = t2.APPLN_FILING_DATE and t1.DOCDB_FAMILY_ID = t2.DOCDB_FAMILY_ID;

drop table if exists docdbfam_stableid;

create table docdbfam_stableid
select DOCDB_FAMILY_ID, min(appln_id) as stable_DOCDB_FAMILY_ID
from t3 group by DOCDB_FAMILY_ID;

-- exit

drop table if exists t1;
drop table if exists t2;
drop table if exists t3;

No comments:

Post a Comment