Sometimes, it can be hard to find the data we want. We spend hours looking in all our normal places. We cruise the Census Bureau, hit the EPA’s data portal, and browse UW’s collection of geospatial resources. If you have a topic or a storyline in mind, it can be really frustrating when you can’t find the right data. In most cases, the data you seek is actually out there; it might just be a little harder to find than you might hope. I’ll discuss a couple of techniques I use from time to time when I find myself in this situation. This is my first blog post, so stay with me.
Reverse engineer someone else’s work
Chances are, if you’re exploring a topic, you’ve seen a map that shows some data that you’d like to use. If you haven’t seen a map like this, either (a) you’re doing a topic that shouldn’t be mapped/isn’t spatial, (b) you already have your data in hand and don’t need to reverse engineer someone else’s work, or (c) you haven’t looked hard enough. In many cases, these maps will show the exact data that you want to map, just presented in a different way than what you’re thinking. If the map you’ve found is made in-browser (slippy map, SVG illustration, etc.), chances are you can find the data that was used to create it. If you find a PDF, PNG, or other more traditional desktop format, you’d be hard pressed to reverse engineer the data. I wouldn’t bother, since the spatial information has been lost.
I recently wanted to make a map of Syrian refugee routes to Europe. I found that UNHCR has a great data portal, which supports some nice visualizations. It’s hard, though, to pull the data out of the visualizations, since the specific numbers are only shown when elements are moused over. I will use this case as an example and walk through the steps I took before eventually finding an Excel file that contained all the data I wanted and more! The data portal can be found [here](http://data.unhcr.org/mediterranean/regional.php).
Now, with the page source open (for example, via view-source:), I can see all of the HTML written on the page.
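Once the source is visible, the pieces usually worth a closer look are the script files and iframes the page pulls in. Here is a minimal sketch of how you might list them instead of scrolling for them. The helper name `listEmbeddedSources` and the sample page are made up for illustration, and the regex is only a rough stand-in for a real HTML parser, so treat it as a starting point rather than a robust tool.

```javascript
// Hypothetical helper: collect the src attributes of <script> and <iframe> tags.
// Assumption: a regex is good enough for a quick look; it will miss unusual markup.
function listEmbeddedSources(html) {
  const pattern = /<(script|iframe)\b[^>]*?\bsrc\s*=\s*["']([^"']+)["']/gi;
  const found = [];
  for (const match of html.matchAll(pattern)) {
    found.push({ tag: match[1].toLowerCase(), src: match[2] });
  }
  return found;
}

// Made-up page source, for illustration only (not the real UNHCR page).
const samplePage = `
<html>
  <head><script src="js/main.js"></script></head>
  <body>
    <iframe width="100%" src="map/embed.html"></iframe>
    <p>More text below the map.</p>
  </body>
</html>`;

console.log(listEmbeddedSources(samplePage));
```

For the sample page, this lists one script (`js/main.js`) and one iframe (`map/embed.html`), which are the two kinds of leads to follow up on by hand.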
In the UNHCR page example, I see both a main.js file and an iframe. I see a lot more written in the page, but it’s all below the map in the rendered page. I browse the main.js file, but it’s not quite what I’m looking for. I don’t see anything that looks like it’s drawing a visualization, so I skip it for now; maybe I’ll need to come back later. Next, I examine the iframe, click the link, and see the map in its own web page, without the extra information provided in the portal entry point. One step closer! Let’s take a look at the source of the iframe. The iframe is the link:
Minification is a way to reduce file size, and thus improve performance of the application, because less data has to be transferred across the network. Minification is done by removing extra characters, comments, and whitespace, resulting in a compact, but unreadable file. One solution when looking at other people’s sources is to use a tool like http://unminify.com/, which will expand the code, add whitespace, and make it much more readable.
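To get a feel for what "expanding" minified code means, here is a deliberately naive sketch. It is not how unminify.com or any real beautifier works; it just breaks lines after `;`, `{`, and `}` and indents by brace depth. The function name `roughUnminify` and the sample input are assumptions for illustration, and because it ignores strings, comments, and `for (;;)` headers, it will mangle some real code.

```javascript
// Naive illustration of "unminifying": new line after ; { } and indent by brace depth.
// Assumption: the input has no braces or semicolons inside strings, comments, or loop headers.
function roughUnminify(code) {
  let depth = 0;
  let out = "";
  const indent = () => "\n" + "  ".repeat(depth);
  for (const ch of code) {
    if (ch === "{") {
      depth += 1;
      out += "{" + indent();
    } else if (ch === "}") {
      depth = Math.max(0, depth - 1);
      // Drop the indentation left by the previous statement, then close the block.
      out = out.trimEnd() + indent() + "}" + indent();
    } else if (ch === ";") {
      out += ";" + indent();
    } else {
      out += ch;
    }
  }
  return out.trim();
}

// Made-up minified snippet.
const minified = "function draw(d){var n=d.length;plot(n);}";
console.log(roughUnminify(minified));
// function draw(d){
//   var n=d.length;
//   plot(n);
// }
```

Even this crude version turns one long line into something you can scan, which is all you need when you are hunting for a file name or URL.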
dataSrc = "http://data.unhcr.org/data_sources/mediterranean/data.xls"
Seems promising! I copy and paste the link into my browser. Jackpot! An Excel file containing all of the data that the original visualization was built on. In fact, in this case, there’s much more data than just what is shown in the UN visualization. I am free to build new and creative visualizations from the data, or just to look at and inspect it.
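Spotting a line like `dataSrc` by eye works, but in a big script it can help to search for it. Below is a small sketch that scans a chunk of source for quoted URLs ending in a common data-file extension. The helper name `findDataUrls` and the extension list are my assumptions; the code around `dataSrc` in the sample string is invented, and only the URL itself comes from the example above.

```javascript
// Hypothetical helper: pull out quoted URLs that end in a common data-file extension.
// Assumption: the extension list below is just a guess; extend it for the formats you expect.
function findDataUrls(source) {
  const pattern = /["'](https?:\/\/[^"'\s]+\.(?:xlsx?|csv|json|geojson|kml)(?:\?[^"'\s]*)?)["']/gi;
  return Array.from(source.matchAll(pattern), (m) => m[1]);
}

// Invented surrounding code; only the dataSrc URL is taken from the walkthrough.
const snippet =
  'var w=960,h=500;dataSrc = "http://data.unhcr.org/data_sources/mediterranean/data.xls";init();';

console.log(findDataUrls(snippet));
```

Run against the sample string, this returns a one-item list containing the data.xls URL. It will not catch URLs that are built up from pieces at runtime, so an empty result does not mean the data isn't there.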
- Use view-source: to look at the webpage source.
- Make use of tools like unminify.com to make minified code more readable.
- Prioritize main.js files over other files.
- Practice! You might not always find what you want and you might waste some time, but in any case, you will learn how other developers structure their web pages and develop web visualizations.
Disclaimer: Use the advice here at your own discretion. Sometimes, it may not be appropriate to use other people’s data, so make sure that you’re ethical in what you’re doing.