Monday, November 8, 2010

Linkage between application and publication tables in PATSTAT

We could expect (from EPO documents) that every application has 0 to N publications, where every publication belongs to exactly 1 application.
In reality, it isn't so.

Let's take a closer look at the table containing applications (TLS201) and the one containing publications (TLS211).

The two can be linked via the APPLN_ID field, which is the main key in PATSTAT.
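
To make the linkage concrete, here is a minimal sketch in Python with an in-memory SQLite database. The table names tls201_appln and tls211_pat_publn, the column names and all the rows are assumptions made up for illustration, not real PATSTAT data. It counts the publications attached to each application through appln_id, so that an application with no publication shows up with 0.

```python
import sqlite3

# Toy in-memory database; table names, column names and rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls201_appln ("
    "appln_id INTEGER PRIMARY KEY, appln_auth TEXT, appln_nr TEXT)"
)
conn.execute(
    "CREATE TABLE tls211_pat_publn ("
    "pat_publn_id INTEGER PRIMARY KEY, appln_id INTEGER, "
    "publn_auth TEXT, publn_nr TEXT)"
)
conn.executemany(
    "INSERT INTO tls201_appln VALUES (?, ?, ?)",
    [(1, "XX", "1000"), (2, "XX", "2000"), (3, "XX", "3000")],
)
conn.executemany(
    "INSERT INTO tls211_pat_publn VALUES (?, ?, ?, ?)",
    [(10, 1, "XX", "500"), (11, 1, "XX", "501"), (12, 2, "XX", "600")],
)


def publications_per_application(conn):
    """Count publications per application, keeping applications with none."""
    sql = """
        SELECT a.appln_id, COUNT(p.pat_publn_id) AS n_publications
        FROM tls201_appln AS a
        LEFT JOIN tls211_pat_publn AS p ON p.appln_id = a.appln_id
        GROUP BY a.appln_id
        ORDER BY a.appln_id
    """
    return conn.execute(sql).fetchall()


print(publications_per_application(conn))
```

The LEFT JOIN is what keeps application 3, which has no publication at all, in the result.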

Referring to the September 2009 edition, we start by checking the application table TLS201.
The first check asks whether the same application id can refer to distinct application authority / application number pairs. The result is reassuring (it would have been real trouble otherwise!).

If we check, on the same table, for application authority / application number pairs having more than one application id, we find 9% duplication.
Nothing strange: we already stated that the disambiguating data for applications are application authority, application number and APPLICATION KIND, so this percentage refers to offices where the same application number may be used, for instance, for a patent of invention and for a utility model, each referring to a different application filed.
So this case, too, fits our data model.
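
The check on TLS201 can be sketched as follows, again with Python and an in-memory SQLite table. The table name tls201_appln, the column names, the kind codes and the rows are assumptions for illustration only. The same query is run twice: once on authority / number pairs, once adding the application kind to the key.

```python
import sqlite3

# Toy table; names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls201_appln ("
    "appln_id INTEGER PRIMARY KEY, appln_auth TEXT, "
    "appln_nr TEXT, appln_kind TEXT)"
)
conn.executemany(
    "INSERT INTO tls201_appln VALUES (?, ?, ?, ?)",
    [
        (1, "XX", "1000", "A"),  # invented: patent of invention
        (2, "XX", "1000", "U"),  # invented: utility model, same number
        (3, "XX", "2000", "A"),
    ],
)


def duplicated_application_keys(conn, with_kind=False):
    """Return the keys that map to more than one appln_id."""
    cols = "appln_auth, appln_nr" + (", appln_kind" if with_kind else "")
    sql = f"""
        SELECT {cols}, COUNT(DISTINCT appln_id) AS n_ids
        FROM tls201_appln
        GROUP BY {cols}
        HAVING COUNT(DISTINCT appln_id) > 1
        ORDER BY {cols}
    """
    return conn.execute(sql).fetchall()


print(duplicated_application_keys(conn))                  # key without kind
print(duplicated_application_keys(conn, with_kind=True))  # key with kind
```

On these toy rows the pair ("XX", "1000") is duplicated, and the duplication disappears once the kind is part of the key.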

Let's see instead what happens with the publications table (TLS211).
Using the same criteria as for TLS201, we start by checking whether the same application id can refer to distinct publication authority / publication number pairs.

We get about 10% double counting; the top 5 publication offices are:
'JP'    4202644
'US'    867657
'CN'    513398
'GB'    333662
'IT'    264060
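
A breakdown by office of this kind could be obtained with a query along these lines. This is only one possible reading of the check (applications having more than one distinct publication number at the same office), and the table name tls211_pat_publn, the column names and the rows are invented for illustration.

```python
import sqlite3

# Toy table; names and rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls211_pat_publn ("
    "pat_publn_id INTEGER PRIMARY KEY, appln_id INTEGER, "
    "publn_auth TEXT, publn_nr TEXT, publn_kind TEXT)"
)
conn.executemany(
    "INSERT INTO tls211_pat_publn VALUES (?, ?, ?, ?, ?)",
    [
        (1, 10, "JP", "100", "A"), (2, 10, "JP", "101", "B"),
        (3, 11, "JP", "200", "A"), (4, 11, "JP", "201", "B"),
        (5, 12, "US", "300", "A1"), (6, 12, "US", "301", "B2"),
        # Same number, two kinds: not a duplicate under this reading.
        (7, 13, "EP", "400", "A1"), (8, 13, "EP", "400", "B1"),
    ],
)


def multi_number_applications_by_office(conn):
    """Per office, count applications with several distinct publication numbers."""
    sql = """
        SELECT publn_auth, COUNT(*) AS n_applications
        FROM (
            SELECT appln_id, publn_auth
            FROM tls211_pat_publn
            GROUP BY appln_id, publn_auth
            HAVING COUNT(DISTINCT publn_nr) > 1
        )
        GROUP BY publn_auth
        ORDER BY n_applications DESC, publn_auth
    """
    return conn.execute(sql).fetchall()


print(multi_number_applications_by_office(conn))
```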


The bad news here is that this duplication is due to issues closely tied to the application authority!

In the case of Japan, the duplication is due to different publication numbers originating from the same application.
See this example for publication number JP53123578.

The same happens for China as for Japan (see example: CN 1274133).

At the USPTO, a change of law took place in 2000 (the eighteen-month publication provisions of the American Inventors Protection Act of 1999), which duplicated the publications related to the same application: one publication number is issued 18 months after filing and relates to the application only, then a final publication number is issued later, when the application is granted. An example here.

At the Italian patent office (I'm so proud that we finally made a top 5!!!) we meet a lot of publications of kind D0, which refer to non-existent publications or, better, to publications we cannot find, for instance, in espacenet.
For instance, publication number IT1243259, kind B, shares its application id with IT9001711, kind D0, which is listed in the PATSTAT documentation as 'Filing application'.

The same issue with the D0 kind is found at the GB patent office, together with other duplicated kinds (the British always overdo it... see GB2358979).

So we cannot really find a general rule, and each country should be investigated on its own; we can probably drop the D0 kind everywhere, and we should be aware that counting by publication number runs the risk of overestimating the number of patents.
Good news: EPO data are not affected by this issue.
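
The effect on counts can be sketched like this, comparing a count by publication number with a count by application id, with and without the D0 kind. The table and column names and the application ids are assumptions; the Italian row pair reuses the example quoted above, and the US rows are invented.

```python
import sqlite3

# Toy table; names, ids and the US rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls211_pat_publn ("
    "pat_publn_id INTEGER PRIMARY KEY, appln_id INTEGER, "
    "publn_auth TEXT, publn_nr TEXT, publn_kind TEXT)"
)
conn.executemany(
    "INSERT INTO tls211_pat_publn VALUES (?, ?, ?, ?, ?)",
    [
        (1, 20, "IT", "9001711", "D0"),  # filing application
        (2, 20, "IT", "1243259", "B"),
        (3, 21, "US", "300", "A1"),      # pre-grant publication
        (4, 21, "US", "301", "B2"),      # grant, with a different number
    ],
)


def patent_counts(conn, drop_d0=True):
    """Per office: distinct publication numbers vs distinct application ids."""
    where = "WHERE publn_kind <> 'D0'" if drop_d0 else ""
    sql = f"""
        SELECT publn_auth,
               COUNT(DISTINCT publn_nr) AS by_publication_number,
               COUNT(DISTINCT appln_id) AS by_application_id
        FROM tls211_pat_publn
        {where}
        GROUP BY publn_auth
        ORDER BY publn_auth
    """
    return conn.execute(sql).fetchall()


print(patent_counts(conn, drop_d0=False))
print(patent_counts(conn, drop_d0=True))
```

On these rows, dropping D0 fixes the Italian count, but the US count by publication number stays twice the count by application id, which is why each country needs its own look.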

Finally, if we check, on the same table, for publication authority / publication number pairs having more than one application id, we find 11% duplication.

Here we meet patents in different states and, in some cases, also a link to an unpublished priority application (look for appln_id > 58.000.000 and date = 9999-12-31).

An example from the EPO is patent EP122624, which has 4 application ids: one for each of the following states: A2, A3, B1, and also one for D1, which is an unpublished priority with date 31/12/9999.

Another example, from the JP patent office, is JP53123578 (we met this patent before).
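
This last check can be sketched as a query that lists publication authority / publication number pairs carrying more than one application id, and counts how many of their rows have the placeholder date. The table and column names, the application ids and the real-looking dates are invented; only the kinds and the 9999-12-31 date follow the EP122624 example.

```python
import sqlite3

# Toy table; names, ids and the non-placeholder dates are invented.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tls211_pat_publn ("
    "pat_publn_id INTEGER PRIMARY KEY, appln_id INTEGER, "
    "publn_auth TEXT, publn_nr TEXT, publn_kind TEXT, publn_date TEXT)"
)
conn.executemany(
    "INSERT INTO tls211_pat_publn VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "EP", "122624", "A2", "1984-10-24"),
        (2, 2, "EP", "122624", "A3", "1986-01-01"),
        (3, 3, "EP", "122624", "B1", "1988-01-01"),
        (4, 58000001, "EP", "122624", "D1", "9999-12-31"),
        (5, 4, "EP", "999999", "A1", "1990-01-01"),  # control: one id only
    ],
)


def shared_publication_numbers(conn):
    """List publication keys with several appln_ids and their placeholder dates."""
    sql = """
        SELECT publn_auth, publn_nr,
               COUNT(DISTINCT appln_id) AS n_ids,
               SUM(publn_date = '9999-12-31') AS n_placeholder_dates
        FROM tls211_pat_publn
        GROUP BY publn_auth, publn_nr
        HAVING COUNT(DISTINCT appln_id) > 1
        ORDER BY publn_auth, publn_nr
    """
    return conn.execute(sql).fetchall()


print(shared_publication_numbers(conn))
```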
