Monday, November 27, 2017

PATSTAT 2017b MySQL upload scripts

From link:

it's possible to download my MySQL scripts for loading the majority of the tables from the 2017b edition of PATSTAT.

Changes from the previous edition:
Some attributes are now populated for more NPL types.
Attribute ONLINE_CLASSIFICATION may hold more than one Derwent class.
Attribute ONLINE_AVAILABILITY can now hold up to 500 characters.
Attribute NPL_AUTHOR can now hold up to 1 000 characters.


Note also that the OECD harmonized names (cf. attributes HAN_ID, HAN_NAME, HAN_HARMONIZED) have not been updated since the 2016 Autumn Edition. As a consequence, persons that have been added since then will have default values in these attributes.

Wednesday, November 22, 2017

Using ORCID for migration studies

ORCID provides a persistent digital identifier that distinguishes every researcher and, through integration in key research workflows such as manuscript and grant submission, supports automated linkages between professional activities, ensuring that one's work is recognized.

ORCID wasn't intended as a massive longitudinal survey of the global population of scientists, but with 3 million profiles and growing, it is becoming just that. So far a quarter of those researchers have voluntarily added personal information to their public ORCID profiles, including the years, locations, and descriptions of their education and employment histories. As this voluntary sampling grows, the demographic and migration patterns of the scientific workforce are coming into focus. The biases are also apparent: ORCID users skew young, and certain countries are over- and underrepresented.

Bohannon J and Doran K prepared a set of tools and datasets that make such data easier to use:

A file contains 2.8 million public profiles from ORCID in both XML and JSON format. You will need 300 GB of free space to decompress the data and work with it.

An IPython Notebook provides code for processing the data from public_profiles.tar into manageable data files for analysis.
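To give an idea of how education and employment histories can be turned into migration flows, here is a minimal sketch in Python. It is not the code from the notebook mentioned above. It assumes each profile has already been reduced to a list of (start_year, country_code) tuples, which is a hypothetical format chosen for illustration, and the sample profiles are invented toy data, not real ORCID records.

```python
from collections import Counter


def country_moves(affiliations):
    """Return (origin, destination, year) for each change of country.

    `affiliations` is a list of (start_year, country_code) tuples.
    This format is an assumption made for the sketch, not the ORCID schema.
    """
    ordered = sorted(affiliations)  # chronological order by start year
    moves = []
    for (_, previous), (year, current) in zip(ordered, ordered[1:]):
        if current != previous:
            moves.append((previous, current, year))
    return moves


def flow_counts(profiles):
    """Count origin -> destination moves over a collection of profiles."""
    flows = Counter()
    for affiliations in profiles:
        for origin, destination, _ in country_moves(affiliations):
            flows[(origin, destination)] += 1
    return flows


# Invented toy data: three researchers with education/employment entries.
profiles = [
    [(2001, "IT"), (2005, "IT"), (2009, "US")],
    [(1998, "IN"), (2003, "US"), (2010, "IN")],
    [(2012, "DE")],
]

for (origin, destination), count in flow_counts(profiles).most_common():
    print(origin, "->", destination, count)
```

A profile with a single entry contributes no moves, which is one reason the biases mentioned above matter: sparse profiles make researchers look less mobile than they may be.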

Other examples of the use of such data in migration studies:

Friday, November 17, 2017

Alternative coordinate systems

Geohash is a public domain geocoding system invented by Gustavo Niemeyer; it encodes a geographic location into a short string of letters and digits. It is hierarchical, so it allows arbitrary precision, and characters can be removed one at a time from the end of the code to reduce its size (gradually losing precision). As a consequence of this gradual loss of precision, nearby places will often (but not always) share similar prefixes. The longer a shared prefix is, the closer the two places are.
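The encoding can be sketched in a few lines of Python. This is a minimal illustration of the commonly described geohash scheme (bits alternate between longitude and latitude, each bit halves the remaining interval, and every 5 bits become one base-32 character), not a replacement for a tested library. The function name and the default precision are my own choices.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # bits collected for the current character
    bit_count = 0
    use_lon = True    # the first bit refines longitude, then they alternate
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # 5 bits make one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


full = geohash_encode(57.64911, 10.40744, precision=11)
short = geohash_encode(57.64911, 10.40744, precision=5)
print(full, short, full.startswith(short))
```

Because each extra character only refines the cell described by the previous ones, the short code is always a prefix of the long one for the same point, which is the truncation property described above.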

query example:,-121.7&format=url&redirect=0

What3words is a geocoding system that encodes geographic coordinates into three dictionary words with a resolution of three metres. For example, the torch of the Statue of Liberty is located at "toned.melt.ship".
The idea behind it is that a triple of words is easier to remember than the long sequence of digits used in the usual lat-long coordinate system.
What3words has a website, apps for iOS and Android, and an API that enables bidirectional conversion between what3words addresses and latitude/longitude coordinates.
The grid is two-dimensional, so the addressing scheme does not distinguish between floors in a building. The system supports 14 languages, and each language covers the world's entire land area.
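A quick back-of-the-envelope calculation shows why three words are enough. The sketch below is only arithmetic under my own assumptions: the Earth is treated as a sphere of mean radius 6 371 km, the whole surface (sea included) is tiled with 3 m x 3 m cells, and every ordered triple of words is usable. It says nothing about the actual what3words algorithm or word lists.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean radius, sphere approximation (assumption)
CELL_SIDE_M = 3             # resolution quoted in the text


def cells_on_sphere(radius_m=EARTH_RADIUS_M, side_m=CELL_SIDE_M):
    """Approximate number of side_m x side_m cells needed to tile a sphere."""
    return 4 * math.pi * radius_m ** 2 / side_m ** 2


def min_vocabulary(cells, words_per_address=3):
    """Smallest word-list size n such that n ** words_per_address >= cells."""
    n = math.floor(cells ** (1 / words_per_address))
    while n ** words_per_address < cells:  # correct for floating-point rounding
        n += 1
    return n


cells = cells_on_sphere()
print(f"cells to address: {cells:.3e}")
print("minimum words needed:", min_vocabulary(cells))
```

Under these assumptions, a word list of a few tens of thousands of words yields enough triples to give every cell its own address.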

This website
lets you test the conversion between lat-long and w3w.