Data can be beautiful. The tools we have at our disposal mean that we can visualise and explore data in ways that simply were not possible a decade ago.
Hans Rosling, the Swedish statistician who founded Gapminder, demonstrates some of the possibilities in this video:
The Gapminder world statistics cover a huge range of indicators, which can be analysed using the site’s own tools or exported for use elsewhere. As part of an online course (Data To Insight from the University of Auckland, via FutureLearn), I have been learning how to explore this data set using a piece of free software called iNZight.
The difficulty with data, especially when there is a huge amount of it, is that the sheer volume can obscure trends. This is where visualisation can help.
In the visualisation below, I used the Gapminder statistics to look at changes in the number of children born per woman, for each leap year between 1952 and 2012, split by region.
A view of the summary data (on the left), for just two regions, shows how difficult it is to see clear patterns from the numbers alone.
The iNZight software allows you to view the data as a series of dot plots (one dot representing each country for which there is data), with a box plot underneath showing the spread (this is the dark area under each of the coloured sections).
By stacking the plots for each leap year on top of each other, changes are easy to spot and trends easy to identify. By placing the plots for each region next to each other, comparisons are easy to make.
Even at a glance, the key trend – that the number of children per woman has dropped since 1952 – is visible, both at a regional level and between regions.
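For anyone curious about what sits behind a view like this, the grouping can be sketched in a few lines of Python. This is only a rough illustration and not how iNZight itself works: the rows are invented placeholders rather than real Gapminder figures, the function names are my own, and the minimum, median and maximum stand in for the fuller summary a box plot gives.

```python
import calendar
from collections import defaultdict
from statistics import median

# Hypothetical rows: (country, region, year, children per woman).
# The values are invented placeholders, not real Gapminder figures.
ROWS = [
    ("A", "Region 1", 1952, 6.0),
    ("B", "Region 1", 1952, 7.0),
    ("C", "Region 1", 1952, 5.0),
    ("A", "Region 1", 1953, 5.9),  # not a leap year, so it is filtered out
    ("A", "Region 1", 2012, 3.0),
    ("B", "Region 1", 2012, 4.0),
    ("C", "Region 1", 2012, 2.0),
    ("D", "Region 2", 1952, 3.0),
    ("E", "Region 2", 1952, 2.5),
    ("D", "Region 2", 2012, 1.5),
    ("E", "Region 2", 2012, 2.0),
]


def group_by_region_and_year(rows, first=1952, last=2012):
    """Collect one list of values per (region, leap year) panel."""
    panels = defaultdict(list)
    for _country, region, year, value in rows:
        if first <= year <= last and calendar.isleap(year):
            panels[(region, year)].append(value)
    return panels


def summarise(values):
    """Minimum, median and maximum: a rough stand-in for a box plot."""
    return min(values), median(values), max(values)


panels = group_by_region_and_year(ROWS)
for (region, year), values in sorted(panels.items()):
    low, mid, high = summarise(values)
    print(f"{region} {year}: min={low} median={mid} max={high}")
```

Each printed line corresponds to one stacked panel in the plot: one region, one leap year, and the spread of countries within it.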
However, this is not the full picture. To explore the data further, and perhaps start drawing some conclusions, we need to dig deeper. For example, if we suspect that changes in infant mortality may be part of the cause, we can colour the countries by these rates.
This shows a gradual move in many regions from the pinks and blues, which indicate high infant mortality, to the greens and browns, which indicate lower rates.
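The idea of colouring by a second variable can also be sketched in Python. Everything here is an assumption made for illustration: the equal-width bands, the order of colours within each pair, the `colour_for` function and the rates themselves are all mine, and I do not know how iNZight actually chooses its bands.

```python
# Hypothetical colour bands, ordered from lowest to highest infant mortality.
# The article only says greens and browns are lower, pinks and blues higher;
# the exact order and the equal-width banding are assumptions.
PALETTE = ["green", "brown", "blue", "pink"]


def colour_for(rate, lowest, highest, palette=PALETTE):
    """Map a rate onto one of len(palette) equal-width bands."""
    if highest == lowest:
        return palette[0]
    fraction = (rate - lowest) / (highest - lowest)
    # The top value would land one past the last band, so clamp it.
    index = min(int(fraction * len(palette)), len(palette) - 1)
    return palette[index]


# Invented infant mortality rates (deaths per 1,000 live births).
rates = {"A": 150.0, "B": 90.0, "C": 50.0, "D": 10.0}
lowest, highest = min(rates.values()), max(rates.values())
colours = {
    country: colour_for(rate, lowest, highest)
    for country, rate in rates.items()
}
print(colours)
```

With a mapping like this, a country whose infant mortality falls over time drifts from the pink end of the palette towards the green end, which is the movement visible in the plot.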
The iNZight software is designed to be easy to use; in fact, it is taught in schools in New Zealand. The same analysis can be carried out using the animated tools on the Gapminder World site – this is pretty impressive and has the added benefit of being interactive, so individual data points can be clicked to gain more information.
This type of analysis could be done by looking closely at the numerical data. However, the tools available mean that we can test our hypotheses at the click of a button, allowing time for more in-depth exploration, or for the exploration of larger quantities of data.
The increased use of free and interactive online tools means that almost anyone with an internet connection can carry out detailed analyses. Tools are no longer limited to those with the funding or the programming expertise necessary to create them. This change is one of the core benefits of increased digitisation: more democratic access to analytical tools. However, for the tools to be of any use, and for this democratisation to truly take place, access to statistical information is also necessary, and in some fields this is the front line between open access and proprietary control.