How ‘Information’ is made out of Thin Air

Have you heard that a weekday edition of The New York Times contains more information than the average person was likely to come across in a lifetime in 17th century England?

If so, did you also know that this statement probably has no basis in fact?

I’ve come across the tidbit more than once over the last couple of months, in conversations and online.  It struck me as being in the same general vein as another popular myth (that we only use 10% of our brains), so I did a little digging.  Here’s what I found:

  • This viral YouTube video contains the snippet (at 3:20) and may be the key source of its popularity at the moment.  No sources are cited for any of the information presented.  Not even in fine print.   The video is one of a number of versions of this particular montage of ‘facts’ which, together, have been seen over 10 million times.
  • The snippet has popped up in a number of places around the web, some more reputable than others.  It also appears in numerous books.  Like the YouTube vid, most don’t make any effort to verify the assertion.  The details also change from retelling to retelling; sometimes it is the Sunday edition and 19th-century citizens, sometimes it is a day’s exposure to information and ‘our ancestors’.  So, that lesson you learnt playing Chinese whispers as a kid still holds: people are prone to error when retelling a story.  It’s worth being wary of this when retelling something yourself or hearing a startling ‘truth’ from someone else.
  • Where sources are cited, they generally lead to a statement made by Richard Saul Wurman in his 1989 book Information Anxiety (page 32).  Unfortunately, I don’t have access to the book, so I can’t see whether he presents any evidence to support the claim.  Thankfully, Geoffrey Nunberg at UC Berkeley managed to find a copy and took a look for himself… apparently the book “asserts the fact without offering a source or explanation” (page 9 of a PDF by Nunberg that, in part, examines the snippet’s likely veracity and lack of meaning).

So, after making a reasonable effort to find evidence supporting the claim, I’ve come up empty-handed.  Others have had the same experience.  My conclusion is that it is probably nothing more than a statement without foundation that has made its way into popular consciousness by virtue of being superficially plausible and repeated often enough without critical thought.  Fascinating stuff.

This particular example is innocuous, but sometimes these ‘facts from thin air’ can make their way into places where they might have more impact on policy or business decisions.  For instance, in Damned Lies and Statistics, Joel Best recounts that a published journal article he once read contained the statement “Every year since 1950, the number of American children gunned down has doubled”.  Go here if you are interested in seeing why the statement is so absurd, along with the history of how this ‘mutant statistic’ came to be.

_____

Short URL for this post: http://wp.me/pnqr9-4j

The Five-Point Rating House of Cards

The web is awash with 5-point rating schemes.  Netflix, Amazon, YouTube, WordPress (via PollDaddy ratings), Apple’s App Store, the Android Market and countless blogs use them to gauge people’s experience with various items.  It’s not hard to see why 5-point schemes are so popular; they are really simple to implement, familiar to most people, and can be made to look all kinds of pretty using icons for stars, hearts or smileys.

Unfortunately, they often don’t gather very useful data.  And that’s a big problem for sites that intend to use ratings as the backbone of their recommendation systems.

The 5-point schemes common on the web suffer from two core problems: nonresponse and measurement bias.  First, many people choose not to rate many items, and those that do tend to have had a positive experience.  Data from YouTube supports this, as does that from Netflix.  Second, the scales are usually not labelled, meaning people answer under a wide variety of interpretations as to what each ‘point’ means.  This comment from a YouTube user suggests ambiguity in the scale can also exacerbate the nonresponse issue…

Ratings on YouTube have always been somewhat confusing for me: should I rate the content of the video or the quality? There are some wonderfully shot videos on YouTube that really don’t have any meaningful content, and there are also a lot of videos that have wonderful content but are shot very poorly. I think a dual content vs. quality rating would add too much complexity to the system, but I often don’t rate a video for that very reason.

I don’t find ratings all that helpful, probably due to the fact that there are millions of people using YouTube, each with a different opinion. It doesn’t influence whether I watch a video, but then again, I usually find videos from friends or other channels I respect. [comment found here]
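To make the nonresponse problem concrete, here is a minimal sketch (Python, with entirely made-up numbers rather than YouTube’s or Netflix’s actual data) of how an average star rating drifts upward when satisfied viewers are far more likely to bother rating than dissatisfied ones:

```python
import random

# Hypothetical illustration only: 10,000 viewers whose 'true' satisfaction
# is spread evenly across 1-5 stars, but where happier viewers are assumed
# to be much more likely to leave a rating at all.
random.seed(1)

rate_probability = {1: 0.02, 2: 0.03, 3: 0.05, 4: 0.10, 5: 0.30}  # assumed response rates

true_ratings = [random.randint(1, 5) for _ in range(10_000)]
submitted = [r for r in true_ratings if random.random() < rate_probability[r]]

print(f"True mean satisfaction:    {sum(true_ratings) / len(true_ratings):.2f}")
print(f"Mean of submitted ratings: {sum(submitted) / len(submitted):.2f}")
print(f"Overall response rate:     {len(submitted) / len(true_ratings):.1%}")
```

With those assumed response rates, the submitted average sits well above the true average of roughly 3, even before any scale-interpretation problems come into play.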

Probably the best way to get around these problems is to measure a person’s preferences indirectly by recording their behaviour: how much of the video did they watch?  did they share the content? did they look for related items?  It is fairly well established that what people say and what they do can be very different things, so users’ actions may be much more useful than their words.  Certainly, the ‘popular’ and ‘most viewed’ categories in YouTube appear to rely on behavioural metrics, so perhaps their rating metric is redundant.
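As a rough sketch of what ‘measure behaviour’ might look like in practice (the event fields and weights below are entirely hypothetical, not any site’s actual schema), a handful of on-site signals can be blended into a single implicit preference score:

```python
from dataclasses import dataclass

@dataclass
class ViewEvent:
    watched_fraction: float  # share of the video actually watched, 0.0-1.0
    shared: bool             # did the viewer share it?
    clicked_related: bool    # did they go looking for related items?

def implicit_score(event: ViewEvent) -> float:
    """Blend behavioural signals into a rough 0-1 preference score.

    The weights are illustrative guesses; in practice they would be tuned
    against whatever outcome the recommender is meant to optimise.
    """
    score = 0.6 * event.watched_fraction
    score += 0.25 if event.shared else 0.0
    score += 0.15 if event.clicked_related else 0.0
    return min(score, 1.0)

# Example: watched 90% of a video, didn't share it, browsed related items.
print(implicit_score(ViewEvent(watched_fraction=0.9, shared=False, clicked_related=True)))
```

The appeal of this approach is that every viewer generates a signal, not just the minority who feel strongly enough to click a star.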

However, the ‘measure behaviour’ solution is best suited to organisations that deliver interactive material consumed on-site (YouTube, StumbleUpon).   So, what can you do if you are dealing with items that aren’t consumed on-site? Collapsing the scale to “liked it”/”didn’t like it” won’t solve the core issues – if anything it will just mean you give up what little discriminative power the 5-point scale might have had.  Another suggestion is to expand the scale to 10 points. While this may increase the discriminative power of the scale and is a format people are familiar with, it won’t solve the ambiguity problem.  For that you need to construct clear labels that are likely to be interpreted in much the same way by most people.  Ideally, the scale will also relate as directly as possible to whatever it is you want to use the data for. This is much easier said than done, but here is an example that might work for a site recommending local restaurants:

0 – I will definitely not (0%) eat there again soon
1 – It is unlikely (20% chance) I will eat there again soon
2 – There is some chance (40%) I will eat there again soon
3 – There is a good chance (60%) I will eat there again soon
4 – It is quite likely (80%) I will eat there again soon
5 – I will definitely (100%) eat there again soon

This is actually a heavily butchered version of a probability-based predictive instrument called the Juster Scale.  It would have to be tested, but it at least serves to demonstrate the qualities I outlined above.  The scale could also easily be extended to more points (in fact, the Juster Scale is an 11-point scale).
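One handy property of a probability-labelled scale like this is that responses map directly onto the quantity you actually care about. As a hypothetical sketch (the mapping simply mirrors the labels above), averaging the implied probabilities gives an expected repeat-visit rate for a restaurant:

```python
# Map each scale point to the probability implied by its label.
SCALE_TO_PROBABILITY = {0: 0.0, 1: 0.2, 2: 0.4, 3: 0.6, 4: 0.8, 5: 1.0}

def expected_repeat_rate(responses: list[int]) -> float:
    """Average implied probability of a repeat visit across respondents."""
    probabilities = [SCALE_TO_PROBABILITY[r] for r in responses]
    return sum(probabilities) / len(probabilities)

# Made-up ratings for one restaurant from seven diners.
ratings = [5, 4, 4, 3, 2, 4, 5]
print(f"Expected repeat-visit rate: {expected_repeat_rate(ratings):.0%}")  # about 77%
```

An unlabelled star average gives you no such interpretation; a 3.9-star restaurant tells you nothing about how many diners will actually return.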

Finally, there is the issue of nonresponse.  A good scale will help with this, but ultimately you need to follow up with users to increase rating participation.  TradeMe and TravelBug are two local examples that do this well.  You’ll never get every user to rate the products they’ve tried, but at least you’ll bump the proportion up, which will provide a more solid foundation for any recommendation or imputation algorithms you want to run over the data.

So, if you are at the early stages of developing a rating function for your site, give some careful thought to how your scheme will work.  Test it out before you commit to it longer term.  Doing so will give you much better data to work with down the track.

One final point: you can probably forget all this if your core reason for implementing ratings is to generate reassuring sales cues to prospective buyers (i.e., in the same way sites put testimonials up to reassure users).  In that case, you are likely to be better off with an unlabelled 5-point scale.  As the folks at YouTube found, most of the ratings you will get with such a scale will be positive!

_____

Short URL for this post: http://wp.me/pnqr9-2u

Proof that People Appreciate Good Survey Design

I’m a huge advocate of simplicity in survey design, especially when a survey is to be delivered online. Yet, when I talk to people about cutting out questions, simplifying response tasks, and minimizing the use of various presentational options (e.g., AJAX), I sometimes get the sense I’m viewed as a spoil-sport. Fortunately, people don’t have to take my word for it. A wealth of methodological research shows that completions and data quality suffer as you stray from following the Keep It Simple, Stupid (K.I.S.S.) principle in survey design (see the links at the bottom of this page).

Some respondents will also tell you how well (or badly) you’ve structured your questionnaire, although waiting for that feedback before doing anything is probably leaving it a little too late! Respondents can give feedback on a survey in a couple of ways: by dropping out if they are having difficulty with it, or by mentioning their experience at the end (assuming you give them an opportunity to do so). Here are some examples pulled from three surveys I’ve been involved with over the past 12 months. These went out to general population samples provided by a well-known consumer panel. The topics differed, but the surveys were similar in length – about 35 questions over 15 pages. Two of them involved presenting choice sets as part of a stated choice modelling experiment.

My intent here is not to take the glory for the results I’m about to present; I took care of the online delivery in these surveys, but the questions and structure were mainly developed by others. I’m using them because I do think the questionnaires were generally well designed. Questions were kept to a minimum, pre-testing was done, and the technology used was as simple as possible.

First, some selected respondent comments taken from across the three surveys. Many other comments echoed the same general sentiment:

“Very good survey, was easy to follow and understand.”
“Thoroughly enjoyed that survey is all I can say.”
“Clear and simple, well worded – well done, whomever designed it.”
“It was more interesting than the usual surveys :o)”
“It was a very simple, well put together survey that was easy to understand. Well done.”
“I enjoyed doing it :)”
“This was a great survey, thank you!”
“I really enjoyed this survey. Very easy to follow.”
“Great Survey, easy to do and no dumb questions.”
“Wish they were all this easy to complete.”

My key take-out points are that a) it is actually possible for people to enjoy completing a questionnaire and b) many of the online surveys people are sent appear to be complex, hard to follow, and sprinkled with “dumb questions”.  Although I’m speculating, I think the “dumb question” comment refers to those that are ambiguous, overly complicated or repetitive (e.g., matrix-style questions) or attempt to psychoanalyze the respondent (e.g., brand ‘personality’ items).

However, most people won’t take the time to leave a comment.  In fact, if your survey suffers from particularly bad design, many won’t even stick around to get to the last page.  So, you should pay attention to the second (silent) respondent feedback mechanism: completion rates.  Here are the completion rates for the three surveys mentioned above. These show the proportion of people who started the survey who went on to complete it. A low completion rate is a key signal of problems with your survey design because it means many people dropped out.

Survey 1: 79%
Survey 2: 71%
Survey 3: 81%

These are pretty good completion rates, but you’ll notice that one is about 10 percentage points lower than the others (Survey 2).  We knew download times were going to be an issue for that survey because it contained several large images and used a JavaScript library.  The images were large because they contained complex backgrounds (i.e., they weren’t simple!). Despite doing all we could to optimize and pre-load the images, keep the JavaScript to the bare minimum and warn respondents, this issue clearly led to increased drop-out.  It is also worth noting that, although 1,100 people started the survey, only 8 of the 800 who completed it mentioned the slow load times at the end.
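If your survey platform logs the last page each respondent reached, a quick tally of where people abandoned the questionnaire shows whether drop-out is spread evenly or concentrated on a problem page. The numbers below are made up purely for illustration:

```python
from collections import Counter

# Hypothetical log: last page reached by each respondent who abandoned
# a 15-page survey (people who finished are excluded from this list).
last_page_seen = [2, 2, 3, 3, 7, 7, 7, 7, 7, 7, 7, 7, 9, 12, 14]

drop_outs = Counter(last_page_seen)
total_drop_outs = len(last_page_seen)

for page in sorted(drop_outs):
    share = drop_outs[page] / total_drop_outs
    print(f"Page {page:>2}: {drop_outs[page]:>2} drop-outs ({share:.0%})")

# A spike on a single page (page 7 in this made-up data) points to a specific
# design problem, such as a slow-loading image or an overly complex question.
```

A flat spread, by contrast, suggests general fatigue rather than a single offending page.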

So, it really is worth keeping things as simple as possible in your survey design.  Respondents can tell when you are asking them flaky, ill-prepared questions and many won’t stick around if your questionnaire causes them frustration.

_____

Short URL for this post: http://wp.me/pnqr9-1K