Monday, November 23, 2009

PATSTAT and non-patent literature (NPL) citations

For lovers of bibliometrics: along with patent citations, PATSTAT also contains the non-patent literature (NPL) referenced by patents.

You can get those references from the citations table (TLS212_CITATION) by joining it to the table containing the full text of each NPL citation (TLS214_NPL_PUBLN) on NPL_PUBLN_ID, keeping only the rows where NPL_CITN_SEQ_NR (the sequence number for NPL citations) is different from 0.
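As a rough sketch, the join could look like the query below. The table names, NPL_PUBLN_ID, NPL_CITN_SEQ_NR and NPL_BIBLIO are the ones mentioned in this post; PAT_PUBLN_ID as the key of the citing publication is my assumption, so check it against the schema of your PATSTAT release.

```sql
-- Sketch: list the NPL references cited by each patent publication.
-- Assumption: PAT_PUBLN_ID identifies the citing publication in TLS212_CITATION.
SELECT c.PAT_PUBLN_ID,
       c.NPL_CITN_SEQ_NR,
       n.NPL_BIBLIO
FROM TLS212_CITATION c
JOIN TLS214_NPL_PUBLN n
  ON n.NPL_PUBLN_ID = c.NPL_PUBLN_ID
WHERE c.NPL_CITN_SEQ_NR <> 0          -- 0 means the row is not an NPL citation
ORDER BY c.PAT_PUBLN_ID, c.NPL_CITN_SEQ_NR;
```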

One issue with these tables is that TLS214 contains a lot of duplicates!

With a simple

select distinct trim(NPL_BIBLIO) from TLS214_NPL_PUBLN
the figure drops from 12,139,696 to 9,449,779 (about 22% fewer).
[I added the trim because many records start with a space]
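To get both figures in one pass, something like the following should work on most SQL engines. It is only a sketch: the aliases total_rows and distinct_biblio are names I made up, and COUNT(DISTINCT ...) skips NULL values, so rows with an empty NPL_BIBLIO are not counted in the second column.

```sql
-- Sketch: compare the raw row count with the number of distinct trimmed citations.
SELECT COUNT(*)                         AS total_rows,      -- every row in the table
       COUNT(DISTINCT TRIM(NPL_BIBLIO)) AS distinct_biblio  -- unique texts, NULLs ignored
FROM TLS214_NPL_PUBLN;
```

This only catches exact duplicates after trimming; citations that differ in case, punctuation or spacing inside the text still count as distinct.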

Maybe later on I'll post some SQL to deduplicate the data...
