Ok.
Let's imagine we have a list of addresses coming from a database (PATSTAT FI).
You can bet that the same city will be written in a number of different ways: the longer the city name, the higher the number!
Apart from the already cited Levenshtein distance, we may also need to get rid of all the punctuation marks, slashes and other wonderful characters that the keyboard gives us.
Let's go to Germany, for instance.
If you have
Frankfurt/Main
FRANKFURT MAIN
FRANKFURT-MAIN
How can you tell they are the same city?
We would need a function (let's call it ALFANUM) that removes all characters except A-Z and 0-9 and converts the string to uppercase, returning
FRANKFURTMAIN
for all three.
Here is my version, which works with strings of up to 500 characters (but you may change that...)
CREATE DEFINER=`root`@`localhost` FUNCTION `alfanum`(in1 VARCHAR(500)) RETURNS VARCHAR(500) CHARSET latin1
BEGIN
    DECLARE b VARCHAR(500);  -- result built so far
    DECLARE c VARCHAR(500);  -- part of the input still to be scanned
    DECLARE d CHAR(1);       -- current character
    SET b = "";
    SET c = in1;
    ciclo: WHILE c <> "" DO
        -- take the first character, uppercased, and drop it from c
        SET d = UCASE(LEFT(c, 1));
        SET c = RIGHT(c, LENGTH(c) - 1);
        -- keep only letters A-Z and digits 0-9
        IF (d >= "A" AND d <= "Z") OR (d >= "0" AND d <= "9") THEN
            SET b = CONCAT(b, d);
        END IF;
    END WHILE ciclo;
    RETURN b;
END
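For readers who preprocess the addresses outside the database, the same idea can be sketched in Python. This is only an illustration and not the MySQL function above: the name alfanum is reused for convenience, and the handling of accented letters may differ from what MySQL does under a latin1 collation (here every non-ASCII character is simply dropped).

```python
import re


def alfanum(text: str) -> str:
    """Uppercase the text and keep only the characters A-Z and 0-9."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


# The three spellings from the example collapse to a single key.
variants = ["Frankfurt/Main", "FRANKFURT MAIN", "FRANKFURT-MAIN"]
keys = {alfanum(v) for v in variants}
print(keys)  # {'FRANKFURTMAIN'}
```

The normalized string works as a grouping key: addresses sharing the same key are candidates for being the same city, and a distance such as Levenshtein can then be applied to the keys that still differ.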
Wednesday, November 18, 2009
Tuesday, November 17, 2009
Excel vs. in-cell line breaks
Recently, while working on an EU tender (Using performance indicators in monitoring the implementation of ICT research in FP6 and FP7), I had to load into a DB a batch of reports (listing publication titles, magazines and so on) whose original format was MS Excel (may hell swallow them!!!).
One (out of dozens) of the problems was that many cells contained a line break (due to cut & paste from the web, or added to give them a pretty look), and I found a quick solution for removing them all from a given worksheet:
STEP 1 select the area where the substitution should apply
STEP 2 press ALT+F11 (opens the VB window)
STEP 3 press CTRL+G (opens the IMMEDIATE frame)
STEP 4 paste the following: selection.replace chr(10)," " and press enter in the IMMEDIATE frame
Obviously, by substituting chr(10) or " " with something else, it works with anything you need to replace.
As a matter of fact I also had to replace double spaces...!!!
STEP 5 selection.replace "  "," "
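If the cell values have already been exported as plain strings, the same two replacements can be sketched in Python. This is a minimal sketch and not part of the Excel procedure above: clean_cell is a hypothetical helper name, and the loop is there because, in Python at least, a single pass that replaces two spaces with one still leaves a double space wherever three or more spaces were in a row.

```python
def clean_cell(value: str) -> str:
    """Replace in-cell line breaks with spaces, then collapse runs of spaces."""
    # chr(10) is the line feed; also cover the Windows-style "\r\n" pair.
    value = value.replace("\r\n", " ").replace("\n", " ")
    # Repeat until no double space is left.
    while "  " in value:
        value = value.replace("  ", " ")
    return value


print(clean_cell("Journal of\nTesting   Things"))  # Journal of Testing Things
```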