Ok.
Let's imagine we have a list of addresses coming from a database (PATSTAT FI).
You can bet that the same city will be written in a number of different ways, and the longer the city name, the higher that number!
Apart from the already cited Levenshtein distance, we may also need to get rid of all the punctuation marks, slashes and other wonderful characters that the keyboard gives us.
Let's go to Germany, for instance.
If you have
Frankfurt/Main
FRANKFURT MAIN
FRANKFURT-MAIN
How can you tell they are the same city?
We would need a function (let's call it ALFANUM) that removes all characters except A-Z and 0-9 and puts the string in uppercase, returning
FRANKFURTMAIN
for all.
Here is my version, which works with strings up to 500 chars (but you may change it...)
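DELIMITER $$
CREATE DEFINER=`root`@`localhost` FUNCTION `alfanum`(in1 VARCHAR(500)) RETURNS VARCHAR(500) CHARSET latin1
DETERMINISTIC
BEGIN
  DECLARE b VARCHAR(500);  -- output built so far
  DECLARE c VARCHAR(500);  -- part of the input still to be scanned
  DECLARE d CHAR(1);       -- character being examined
  SET b = '';
  SET c = in1;
  ciclo: WHILE c <> '' DO
    -- take the first character, uppercased, and drop it from c
    SET d = UCASE(LEFT(c, 1));
    -- CHAR_LENGTH counts characters rather than bytes, so the loop also ends on multi-byte input
    SET c = RIGHT(c, CHAR_LENGTH(c) - 1);
    -- keep only A-Z and 0-9
    IF (d >= 'A' AND d <= 'Z') OR (d >= '0' AND d <= '9') THEN
      SET b = CONCAT(b, d);
    END IF;
  END WHILE ciclo;
  RETURN b;
END$$
DELIMITER ;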
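To check the function against the Frankfurt example, a query along these lines could be used. It is only a sketch: it assumes alfanum() has already been created as above, and my_addresses and city are made-up names standing in for whatever table and column hold your addresses.

```sql
-- Assumes the alfanum() function from this post already exists.
-- The three spellings are inlined, so this first query needs no real data.
SELECT city, alfanum(city) AS city_key
FROM (
    SELECT 'Frankfurt/Main' AS city
    UNION ALL SELECT 'FRANKFURT MAIN'
    UNION ALL SELECT 'FRANKFURT-MAIN'
) AS variants;
-- city_key is FRANKFURTMAIN on every row

-- Hypothetical table and column names (my_addresses, city):
-- count how many raw spellings collapse onto each cleaned key.
SELECT alfanum(city)        AS city_key,
       COUNT(DISTINCT city) AS spellings,
       COUNT(*)             AS n_rows
FROM my_addresses
GROUP BY alfanum(city)
ORDER BY spellings DESC;
```

Keys with many spellings at the top of the second result are the first candidates for a manual look, before moving on to a fuzzier comparison such as the Levenshtein distance.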