When the import procedure in NAVYCAT has finished, first check the log to see whether any records were skipped because of import errors.
Regardless of what the documentation says, some records will contain non-ASCII characters in the L510EP field (about a dozen in the 2010 files).
You can use the log, after correcting the wrong characters, to queue the skipped records for insertion via SQL.
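As a rough aid for the correction step, a sketch like the following could locate the non-ASCII characters in an L510EP value and propose a plain-ASCII replacement. It is written in Python purely for illustration; the sample string and the function names are made up, and stripping accents is only one possible way of correcting the characters.

```python
import unicodedata


def non_ascii_positions(text):
    """Return (index, character) pairs for every non-ASCII character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) > 127]


def to_ascii(text):
    """Replace accented letters with their base letter.

    Characters that have no ASCII decomposition are dropped silently,
    so the result should be reviewed by hand before re-importing.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Hypothetical L510EP value, invented for this example.
sample = "CHANGE OF OWNER: M\u00dcLLER GMBH"

print(non_ascii_positions(sample))
print(to_ascii(sample))
```

Running the check on each rejected line from the log shows exactly which characters need attention before the record is inserted again.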
We should also cross-check the data in the T12BFYYMM___STAT file against the content of our table, for T12BF, using the following SQL:
select L001EP, count(L002EP) as c from test.patlegal
group by L001EP;
If the figures match, we can assume the import was OK.
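The comparison itself can be automated. The sketch below is only an illustration under several assumptions: the per-country counts from the STAT file are already available as a dictionary (the layout of that file is not described here, so no parsing is attempted), the table is reproduced in an in-memory SQLite database rather than the real one, and the function name `count_mismatches` is hypothetical.

```python
import sqlite3


def count_mismatches(conn, expected):
    """Compare per-country counts in patlegal with the expected counts.

    Returns {country: (expected, found)} for every country that differs.
    Rows with a NULL country (header and trailer records) are ignored.
    """
    rows = conn.execute(
        "SELECT L001EP, COUNT(L002EP) FROM patlegal "
        "WHERE L001EP IS NOT NULL GROUP BY L001EP"
    ).fetchall()
    found = dict(rows)
    countries = sorted(set(expected) | set(found))
    return {
        c: (expected.get(c, 0), found.get(c, 0))
        for c in countries
        if expected.get(c, 0) != found.get(c, 0)
    }


# Demo data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patlegal (L001EP TEXT, L002EP TEXT)")
conn.executemany(
    "INSERT INTO patlegal VALUES (?, ?)",
    [("EP", "P"), ("EP", "P"), ("US", "P"), (None, None)],
)

# Assumed to come from the STAT file; the values here are made up.
expected = {"EP": 2, "US": 2}

mismatches = count_mismatches(conn, expected)
print(mismatches)  # {'US': (2, 1)}
```

An empty result means every country has the expected number of records.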
The import procedure so far has also imported the header and trailer records, so we need to run a cleanup SQL statement:
delete from test.patlegal
where L001EP is null and L002EP is null and L003EP is null;
To make better use of the data, these two handbooks are very useful…
A vocabulary of PRS codes can also be downloaded here:
4) RESTRUCTURING DATA
To build a more consistent dataset, we need to look more closely at the field contents.
We will deal here with T12BF data. XLEV will be the subject of a further post.
4a) drop empty fields
Fields L006EP, L009EP, L010EP, L011EP, L015EP, L016EP, L019EP, L020EP and L514EP are empty and may be dropped.
Georg Huber from the EPO very kindly explained the reason to me:
"During the implementation of the new PRS system (finished 2003) it was not evident that some of the data in the old system have been superfluous (L006EP, L009EP).
For deletion of the data special information have not been implemented (=reasons for deletion of data) (L010EP, L011EP, L014EP).
The application numbers and the publication numbers had in the old PS system another number format (INPADOC number format) as in the new system (DocDB number format). These old number format was given to our customers until 2007 in the tags (L015EP, L016EP)
Tags L019EP, L020EP are foreseen as future possibilities that are not implemented yet.
We have not included any new legal events, the publication language is important or supplied by the national offices. therefore L015EP is always empty."
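Before dropping the fields listed in 4a, it may be worth confirming that they really are empty in your own import. The following is a minimal sketch, assuming an in-memory SQLite copy of the table with only a few columns; the helper name `non_empty_counts` and the demo rows are hypothetical, and the column names are interpolated into the SQL, so they must come from a trusted list.

```python
import sqlite3


def non_empty_counts(conn, table, columns):
    """Return {column: number of rows holding a non-NULL, non-blank value}."""
    counts = {}
    for col in columns:
        # Identifiers cannot be bound as parameters, hence the f-string.
        sql = (
            f"SELECT COUNT(*) FROM {table} "
            f"WHERE {col} IS NOT NULL AND TRIM({col}) <> ''"
        )
        counts[col] = conn.execute(sql).fetchone()[0]
    return counts


# Demo data, invented for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patlegal (L001EP TEXT, L006EP TEXT, L007EP TEXT)")
conn.executemany(
    "INSERT INTO patlegal VALUES (?, ?, ?)",
    [("EP", None, "20100315"), ("US", "", "20100316"), ("WO", "  ", None)],
)

counts = non_empty_counts(conn, "patlegal", ["L006EP", "L007EP"])
droppable = [col for col, n in counts.items() if n == 0]
print(counts)     # {'L006EP': 0, 'L007EP': 2}
print(droppable)  # ['L006EP']
```

Only the columns reported with a count of zero are safe to drop.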
Also be aware that in a few hundred cases the COUNTRY/PRSCODE1 pair in the L001EP/L008EP fields will not match the vocabulary linked above.
These are some examples from the 2010 table:
Again quoting Georg Huber, these are the reasons:
"For WO, I can see three reasons that these errors occurred.
1. There is not yet a publication number in the main database for these data and therefore the filing number was given instead of WO publication number (these is the reasons we send for WO publication numbers as in the main database the application numbers have the country code of the patent authority the PCT application has been filed).
2. A former existing publication number disappeared from the bibliographic main database
For SU and CS are have been similar problems, as these are applications of RU and CZ patents.
I assume that in the main bibliographic database the application number is still the former country. "
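The COUNTRY/PRSCODE1 pairs that do not match the vocabulary can be listed with an anti-join. The sketch below assumes the vocabulary has been loaded into a lookup table; the table name `prscode1`, its columns `country` and `code`, and all the sample codes are invented for this example and are not taken from the real vocabulary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE patlegal (L001EP TEXT, L008EP TEXT);
    CREATE TABLE prscode1 (country TEXT, code TEXT);  -- hypothetical lookup table
    INSERT INTO patlegal VALUES
        ('EP', 'AAAA'), ('WO', 'ZZZZ'), ('WO', 'ZZZZ'), ('SU', 'YYYY');
    INSERT INTO prscode1 VALUES ('EP', 'AAAA');
    """
)

# Keep only the pairs for which the LEFT JOIN finds no vocabulary entry.
UNMATCHED_SQL = """
    SELECT p.L001EP, p.L008EP, COUNT(*) AS n
    FROM patlegal AS p
    LEFT JOIN prscode1 AS v
           ON v.country = p.L001EP AND v.code = p.L008EP
    WHERE v.code IS NULL
    GROUP BY p.L001EP, p.L008EP
    ORDER BY p.L001EP, p.L008EP
"""

unmatched = conn.execute(UNMATCHED_SQL).fetchall()
print(unmatched)  # [('SU', 'YYYY', 1), ('WO', 'ZZZZ', 2)]
```

Grouping by pair keeps the output short and shows how many records each unmatched pair affects.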
So, eventually, this may be the semi-final version of our data:
|Tag||Format||Description|
|L002EP||$1||Format of document number following rules for either (F)iling applications or (P)ublications|
|L004EP||$2||Kind code for document number (if provided)|
|L005EP||$2||IPR type (PI Patent of Invention / UM Utility Model)|
|L007EP||DATE8||PRS date; DATE_GAZETTE; date of notification to the public|
|L008EP||$4||4 bytes Legal Event code 1 (lookup on table PRSCODE1)|
|L014EP||DATE8||Publication or filing date (if provided) of DOCDB document in tags L001EP, L003EP, L004EP|
|L017EP||$171||DOCDB publication ID; relates to the first publication level found in DOCDB|
|L018EP||$8||DATE this event was last exchanged to subscribers|
|L501EP||$2||Corresponding country code for PRS code "EP REG"|
|L502EP||$4||Corresponding EP code 1 for PRS code "EP REG"|
|L503EP||20||Corresponding patent document|
|L504EP||$2||Country code of corresponding patent document|
|L505EP||DATE8||Publication date of corresponding patent|
|L506EP||$2||Kind of corresponding patent document|
|L507EP||$300||List of designated states|
|L509EP||$255||New owner name or address if name or address of owner changes; addresses are NOT stored in this tag|
|L510EP||$700||Free format text|
|L515EP||$255||Inventor name (separated by ;)|
|L516EP||$50||International Patent Classification (comma separated)|
|L520EP||2||Year of fee payment - contains the xxth year for which the payment was made|
|L521EP||$30||New kind of IPR, new number; e.g. Brazil utility model - code GA;"MI4601602-3"|
|L522EP||$50||Name of requester|
|L524EP||$100||List of countries concerned with an event; L507EP & L508EP have special significance.|
|L525EP||DATE8||Effective date; DATE_IN_FORCE|
|L526EP||DATE8||Date of withdrawal|
|L527EP||$1||Indicator for format of attribute list document number following rules for either (F)iling applications or (P)ublications. If not known, this tag will not be present; refers to the document given in L503EP and L504EP|
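When the data is processed outside the database, the DATE8 fields need converting to real dates. The sketch below assumes that DATE8 means an eight-digit YYYYMMDD string, which the table above does not actually state, so check it against your own data first; the function name `parse_date8` is hypothetical.

```python
from datetime import date, datetime


def parse_date8(value):
    """Parse an 8-digit YYYYMMDD string (assumed DATE8 layout).

    Returns a datetime.date, or None for blank or invalid input.
    """
    if value is None:
        return None
    value = value.strip()
    if len(value) != 8 or not value.isdigit():
        return None
    try:
        return datetime.strptime(value, "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a real calendar date.
        return None


print(parse_date8("20100315"))
print(parse_date8(""))
```

Returning None instead of raising keeps a bulk conversion running and leaves the bad values easy to count afterwards.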
Some issues are left out of this post and will be addressed soon, such as:
- identify the macro types of PRSCODE1 and link them to the correct fields;
- transpose fields containing multiple occurrences of the same information (e.g. L507EP with designated states);
- link to PATSTAT via document number.
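As a preview of the transposition step, the sketch below turns a multi-valued field into one row per occurrence. The `;` separator for L515EP comes from the field table; the whitespace separator for L507EP is an assumption, and the document identifier, the sample values and the function names are all invented for this example.

```python
def split_values(value, sep=None):
    """Split a multi-valued field; sep=None splits on any whitespace."""
    if not value:
        return []
    return [part.strip() for part in value.split(sep) if part.strip()]


def transpose(rows, field, sep=None):
    """Turn (doc_id, record) pairs into one (doc_id, value) row per occurrence."""
    return [
        (doc_id, item)
        for doc_id, record in rows
        for item in split_values(record.get(field), sep)
    ]


# Hypothetical record, invented for this example.
rows = [
    ("DOC1", {"L507EP": "AT BE CH", "L515EP": "SMITH, JOHN; DOE, JANE"}),
]

states = transpose(rows, "L507EP")         # whitespace separator is assumed
inventors = transpose(rows, "L515EP", ";")

print(states)     # [('DOC1', 'AT'), ('DOC1', 'BE'), ('DOC1', 'CH')]
print(inventors)  # [('DOC1', 'SMITH, JOHN'), ('DOC1', 'DOE, JANE')]
```

The separator is passed per field because a single rule would not work: inventor names contain commas and spaces themselves.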