From the OKCupid post “The Big Lies People Tell In Online Dating”:
Just keeping for later…
Update: Protovis has been succeeded by d3.js, which is found here: http://mbostock.github.com/d3/
A quick follow-up to the previous post on the power of data reduction and presentation… here is another example showing how rounding, ordering and thoughtful presentation can turn an incomprehensible grid of numbers into something most people can grok.
It is from the same article (Ehrenberg, Feb 1992, The Problem of Numeracy, AdMap), but this time relates to television programme viewership. The first table presents detailed correlations for responses to the question ‘I really like to watch programme x’ across a range of programmes and two channels (ITV and BBC).
Apart from an obvious diagonal line of 1.000 in the table (of course each programme’s rating correlates perfectly with itself), there isn’t much else you can take away from it. The next table makes the data a little more readable by rounding to one decimal place, discarding the redundant leading zeros and disposing of the meaningless 1.000 diagonal.
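The three clean-up steps just described (round to one decimal place, drop the leading zero, blank out the diagonal) can be sketched in a few lines of Python. This is only an illustration: the function name `format_correlations` and the programme labels and values in the usage example are made up, not taken from Ehrenberg's table.

```python
def format_correlations(labels, matrix):
    """Return (label, cells) rows with display-ready correlation strings.

    Each off-diagonal value is rounded to one decimal place and its
    leading zero is dropped; diagonal cells are left blank because a
    programme always correlates perfectly with itself.
    """
    rows = []
    for i, row in enumerate(matrix):
        cells = []
        for j, value in enumerate(row):
            if i == j:
                cells.append("")  # the 1.000 self-correlation adds nothing
                continue
            text = f"{value:.1f}"  # e.g. 0.5812 -> "0.6"
            if text.startswith("0."):
                text = text[1:]  # "0.6" -> ".6"
            elif text.startswith("-0."):
                text = "-" + text[2:]  # "-0.3" -> "-.3"
            cells.append(text)
        rows.append((labels[i], cells))
    return rows


# Hypothetical three-programme example (illustrative values only).
labels = ["Programme A", "Programme B", "Programme C"]
matrix = [
    [1.0000, 0.5812, 0.1234],
    [0.5812, 1.0000, 0.3377],
    [0.1234, 0.3377, 1.0000],
]
for label, cells in format_correlations(labels, matrix):
    print(f"{label:<12}", "  ".join(f"{c:>3}" for c in cells))
```

The point of returning strings rather than numbers is that this is a presentation step: the full-precision values stay untouched for any further analysis.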
And with a little more thought to row order, spacing and the key data for presentation (i.e., do we really need channel?), we get to the following:
Those familiar with television in the UK will now see that people who like to watch one sports programme also like to watch other sports programmes, particularly if they are ‘round up’ type shows. They don’t, however, like news or current events programmes so much. A similar pattern occurs for current events watchers, but the programmes within that cluster have slightly lower correlations, meaning viewership is less likely to be homogeneous amongst that group. If you are an advertiser or producer, this is useful stuff to know because it will give you an idea of the reach of, and competition around, a certain programme. And you are more likely to understand this if the data is presented in a clear and concise way.
ShortURL for this post: http://wp.me/pnqr9-67
I was talking to a friend last night about data presentation. We were looking at an iPad app that allows users to thumb through and drill down into their sales data for different geographic regions. Among other things, the app displayed charts with smoothed trend-lines to help users get a feel for what the future might hold. Yet, in the relatively brief time I spent looking at the data it was hard to get any real sense of what the key take-outs might be.
This will have been partly due to my lack of familiarity with the dataset; the person responsible for sales for the organisation would have brought a wealth of historic knowledge to the data that may have enabled them to quickly see discrepancies or commonalities in the charts. However, there was also an element of ‘too much’ information. There is only so much we humans can hold in our short-term memory before we become overwhelmed and our ability to do mental calculations or comparisons is compromised. This is why it is critical for anyone presenting data to consider not only the level of detail required, but also how the information should be delivered for quick and clear consumption.
Marketing scientist Andrew Ehrenberg spent a fair amount of time on these issues and was a strong advocate of data reduction (which relates to the idea that much success in research relies on the discovery of patterns in data, and that this process is aided by its presentation in simple tables). In fact, Ehrenberg wrote a book on the subject that is freely downloadable from the EmpGens Journal.
Here is an example of Ehrenberg’s approach. I’ve reproduced the tables from a four-page article of his in Admap from 1992 titled ‘The Problem of Numeracy’. First up is a table not optimised for human consumption. Try to pick out some noteworthy patterns.
Now try again, using a modified presentation of the same data:
The rounding, averages and different row ordering (population size, rather than alphabet) all make it easier to quickly understand the data. We can now see, for instance, that most regions saw a dip in Q3, that Leeds and Edinburgh have seen strong growth in Q4, and that Leeds is consistently punching above its weight in per capita sales. We can also easily answer comparative questions like ‘how much larger was Edinburgh than Swansea over the year’ (about 2.5x), which were much harder to answer from the first table.
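For anyone who wants to apply the same treatment to their own figures, here is a minimal Python sketch of the three changes: round the values, add a row average, and order the rows by population rather than alphabetically. The function name `reduce_table`, the row layout and the two regions in the usage example are all hypothetical, and rounding to whole numbers is an assumption made for the sketch rather than the exact rule used in the article.

```python
def reduce_table(rows):
    """Round quarterly figures, add a row average, sort by population.

    `rows` is a list of dicts with the keys 'region', 'population' and
    'quarters' (a list of numbers). A new list is returned; the input
    is not modified.
    """
    reduced = []
    for row in rows:
        quarters = row["quarters"]
        reduced.append({
            "region": row["region"],
            "population": row["population"],
            # Whole numbers are easier to compare at a glance.
            "quarters": [round(q) for q in quarters],
            # Average is taken from the unrounded figures, then rounded.
            "average": round(sum(quarters) / len(quarters)),
        })
    # Largest population first, so that size-related patterns line up.
    reduced.sort(key=lambda r: r["population"], reverse=True)
    return reduced


# Hypothetical data (not the figures from the Admap article).
sales = [
    {"region": "Region X", "population": 100,
     "quarters": [10.4, 12.6, 9.2, 14.9]},
    {"region": "Region Y", "population": 500,
     "quarters": [98.37, 92.64, 85.12, 101.8]},
]
for row in reduce_table(sales):
    print(row["region"], row["quarters"], "avg", row["average"])
```

Sorting by population is what lets a reader spot a region that sells more than its size would suggest, because its figures stand out against its neighbours in the table.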
People don’t often think of treating tables like other design elements in a user interface. Yet as the example shows, they can fairly easily be tweaked to great effect. And, when presented clearly, a table can convey more information in a short space of time than a series of charts.