<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	xmlns:georss="http://www.georss.org/georss" xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#" xmlns:media="http://search.yahoo.com/mrss/"
	>

<channel>
	<title>Ben Healey &#187; Microsoft</title>
	<atom:link href="http://benhealey.info/tag/microsoft/feed/" rel="self" type="application/rss+xml" />
	<link>http://benhealey.info</link>
	<description>Data Aficionado  &#124;  Wellington, New Zealand</description>
	<lastBuildDate>Sat, 14 Jan 2012 20:18:17 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.com/</generator>
<cloud domain='benhealey.info' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
<image>
		<url>http://s2.wp.com/i/buttonw-com.png</url>
		<title>Ben Healey &#187; Microsoft</title>
		<link>http://benhealey.info</link>
	</image>
	<atom:link rel="search" type="application/opensearchdescription+xml" href="http://benhealey.info/osd.xml" title="Ben Healey" />
	<atom:link rel='hub' href='http://benhealey.info/?pushpress=hub'/>
		<item>
		<title>Have You Fallen Prey to Simpson&#8217;s Paradox?</title>
		<link>http://benhealey.info/2009/12/06/have-you-fallen-prey-to-simpsons-paradox/</link>
		<comments>http://benhealey.info/2009/12/06/have-you-fallen-prey-to-simpsons-paradox/#comments</comments>
		<pubDate>Sat, 05 Dec 2009 22:15:14 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Experimental Design]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Simpson's Paradox]]></category>
		<category><![CDATA[Split Testing]]></category>

		<guid isPermaLink="false">http://benhealey.info/?p=245</guid>
		<description><![CDATA[In a previous post on experimentation at Microsoft, I linked to a recent presentation by Ron Kohavi (general manager of their experimentation platform).  One point he raised was that split tests can actually give you the wrong answer because of a phenomenon called Simpson&#8217;s Paradox.  You read that right: your test might tell you that version A is [...]]]></description>
			<content:encoded><![CDATA[<p>In a previous post on <a href="http://benhealey.info/2009/11/28/online-experimentation-at-microsoft/">experimentation at Microsoft</a>, I linked to a recent presentation by <a href="http://robotics.stanford.edu/~ronnyk/">Ron Kohavi</a> (general manager of their experimentation platform).  One point he raised was that split tests can actually give you the wrong answer because of a phenomenon called Simpson&#8217;s Paradox.  You read that right: your test might tell you that version A is the best bet when in reality version B performs better.</p>
<p>That should send a shiver down the spine of anyone tasked with improving a website&#8217;s ROI.</p>
<p>Simpson&#8217;s Paradox can occur in any setting where the proportion of people allocated to each split group (e.g., control and test) varies across levels of some attribute that also affects the outcome being measured.  It is easiest to understand by example.  Thankfully, the Wall Street Journal presented one a couple of days ago in an article on the <a href="http://online.wsj.com/article/SB125970744553071829.html">Flaw of Averages</a>.  Essentially, it showed that although the current aggregate US unemployment rate (expressed as % jobless) doesn&#8217;t look as bad as it did during the 1980s recession, it is consistently worse when the figures are examined by educational subgroup.  This is because the share of people in each educational subgroup has shifted between the 1980s and now, and each subgroup has a different susceptibility to unemployment.</p>
<p>The WSJ article also presents two other examples (gender bias in UC Berkeley admissions and the efficacy of kidney stone treatments).  If you are still scratching your head after reading through the narrative explanations, try the data-based explanations of the same examples in this <a href="http://en.wikipedia.org/wiki/Simpson's_paradox">Wikipedia</a> entry.</p>
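<p>To see the reversal in numbers, here is a minimal Python sketch of the kidney stone example.  The (successes, patients) counts are the figures reproduced in the Wikipedia entry, and the treatment labels A and B follow that entry.  The function names are just illustrative helpers.</p>

```python
"""Kidney stone example of Simpson's Paradox.

Counts are (successes, patients) as reproduced in the Wikipedia entry on
Simpson's paradox; treatment labels A and B follow that entry.
"""

KIDNEY_STONES = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def success_rate(successes, patients):
    """Share of patients whose treatment succeeded."""
    return successes / patients


def overall_rate(groups):
    """Pool every stone-size subgroup by summing successes and patients."""
    successes = sum(s for s, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    return successes / patients


if __name__ == "__main__":
    for treatment, groups in KIDNEY_STONES.items():
        by_size = {size: f"{success_rate(*sn):.1%}" for size, sn in groups.items()}
        print(treatment, by_size, "overall:", f"{overall_rate(groups):.1%}")
```

<p>Treatment A has the higher success rate for both small and large stones.  Yet B looks better overall, because A was mostly given to the harder large-stone cases (263 of its 350 patients, versus 80 of 350 for B).</p>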
<p>Turning to a web-based scenario, in a recent paper outlining <a href="http://exp-platform.com/ExPpitfalls.aspx">pitfalls to avoid in online experimentation</a>, the folks at Microsoft showed how Simpson&#8217;s Paradox can occur when a test is &#8216;ramped up&#8217; over time.  Their example involves a page design test run over two days, with a 1% sample of users assigned to the test group on the first day (Friday) and then a 50% sample assigned to the test group on the second day (Saturday).  Here is the data from the paper:</p>
<p style="text-align:center;"><img class="aligncenter size-full wp-image-253" title="simpsons_paradox" src="http://benhealey.files.wordpress.com/2009/12/simpsons_paradox.gif?w=632" alt=""   /></p>
<p>(Note: the percentage in the version B &#8216;total&#8217; cell has been corrected here; the original paper contains an error.)</p>
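<p>On both test days, version B was the winner.  In the aggregated totals, however, the result is reversed and version A wins.  This happens because both the test split allocation and the response levels varied by day.  B received only 1% of Friday&#8217;s traffic but 50% of Saturday&#8217;s, so Friday makes up a much smaller share of B&#8217;s total than of A&#8217;s, and the two totals end up averaging over different mixes of days.</p>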
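<p>The mechanism can be written out as a weighted average.  The notation here is my own, not the paper&#8217;s.  Let n<sub>v,d</sub> be the number of visitors who saw version v on day d, and c<sub>v,d</sub> the number of those visitors who converted.  Each version&#8217;s aggregate rate is then its daily rates weighted by that version&#8217;s own traffic on each day:</p>

```latex
R_v \;=\; \frac{\sum_d c_{v,d}}{\sum_d n_{v,d}}
    \;=\; \sum_d w_{v,d}\, r_{v,d},
\qquad
r_{v,d} = \frac{c_{v,d}}{n_{v,d}},
\qquad
w_{v,d} = \frac{n_{v,d}}{\sum_{d'} n_{v,d'}}
```

<p>Suppose Friday responds better than Saturday.  A&#8217;s total is then pulled up by its larger Friday weight w<sub>A,Fri</sub>, so R<sub>A</sub> can exceed R<sub>B</sub> even though r<sub>B,d</sub> exceeds r<sub>A,d</sub> on both days.  Giving both versions the same set of day weights removes this distortion.</p>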
<p>Test ramp-ups are quite common.  It is good practice to pilot a test on a small sample to make sure everything works before exposing a larger sample to it, so the potential for Simpson&#8217;s Paradox is very real.  If you are analysing split test results, you can avoid the problem in one of two ways.  You can re-weight the results from periods with different allocations, so that both versions are compared on the same mix of periods.  Or you can simply discard the results from the pilot phase.</p>
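<p>Here is a minimal Python sketch of those two remedies.  The visitor and conversion counts are hypothetical, not the figures from the Microsoft paper.  They were chosen only to mimic a 1% pilot on Friday followed by a 50/50 split on Saturday.  Re-weighting by each day&#8217;s share of all traffic is one reasonable choice of common weights, assumed here for illustration.  The function names are also illustrative.</p>

```python
"""Hypothetical ramp-up test showing Simpson's Paradox and two remedies.

All counts are made up for illustration: a 1% pilot of version B on Friday,
then a 50/50 split on Saturday. They are NOT the Microsoft paper's figures.
"""

# day -> version -> (visitors, conversions)
RAMP_UP = {
    "Fri": {"A": (99_000, 2_000), "B": (1_000, 25)},   # pilot: 1% of traffic to B
    "Sat": {"A": (50_000, 500), "B": (50_000, 600)},   # ramped up: 50% to B
}


def rate(visitors, conversions):
    """Conversion rate for one version on one day."""
    return conversions / visitors


def pooled_rate(data, version):
    """Naive aggregate: sum counts across days (vulnerable to the paradox)."""
    visitors = sum(days[version][0] for days in data.values())
    conversions = sum(days[version][1] for days in data.values())
    return conversions / visitors


def reweighted_rate(data, version):
    """Weight each day's rate by that day's share of ALL traffic, so A and B
    are averaged over the same mix of days (assumed weighting scheme)."""
    all_traffic = sum(n for days in data.values() for n, _ in days.values())
    weighted = 0.0
    for days in data.values():
        day_share = sum(n for n, _ in days.values()) / all_traffic
        weighted += day_share * rate(*days[version])
    return weighted


def post_pilot_rate(data, version, pilot_days=("Fri",)):
    """Pooled rate after discarding the pilot day(s) entirely."""
    kept = {day: days for day, days in data.items() if day not in pilot_days}
    return pooled_rate(kept, version)


if __name__ == "__main__":
    for day, versions in RAMP_UP.items():
        print(day, {v: f"{rate(*counts):.2%}" for v, counts in versions.items()})
    for label, fn in [("pooled", pooled_rate),
                      ("re-weighted", reweighted_rate),
                      ("pilot discarded", post_pilot_rate)]:
        print(label, {v: f"{fn(RAMP_UP, v):.2%}" for v in "AB"})
```

<p>With these made-up counts, B wins on each day and under both remedies.  The naive pooled totals, however, favour A, which mirrors the reversal in the paper&#8217;s table.</p>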
<p>_____</p>
<p>ShortURL for this post: <a href="http://wp.me/pnqr9-3X">http://wp.me/pnqr9-3X</a></p><br />Posted in Thoughts Tagged: Experimental Design, Microsoft, Simpson's Paradox, Split Testing]]></content:encoded>
			<wfw:commentRss>http://benhealey.info/2009/12/06/have-you-fallen-prey-to-simpsons-paradox/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/7242c6f38f9056b8d9a96695535fe428?s=96&#38;d=identicon&#38;r=PG" medium="image">
			<media:title type="html">Ben</media:title>
		</media:content>

		<media:content url="http://benhealey.files.wordpress.com/2009/12/simpsons_paradox.gif" medium="image">
			<media:title type="html">simpsons_paradox</media:title>
		</media:content>
	</item>
		<item>
		<title>Online Experimentation at Microsoft</title>
		<link>http://benhealey.info/2009/11/28/online-experimentation-at-microsoft/</link>
		<comments>http://benhealey.info/2009/11/28/online-experimentation-at-microsoft/#comments</comments>
		<pubDate>Sat, 28 Nov 2009 05:17:53 +0000</pubDate>
		<dc:creator>Ben</dc:creator>
				<category><![CDATA[Thoughts]]></category>
		<category><![CDATA[Data-Driven Design]]></category>
		<category><![CDATA[Microsoft]]></category>
		<category><![CDATA[Split Testing]]></category>

		<guid isPermaLink="false">http://benhealey.info/?p=219</guid>
		<description><![CDATA[Over the last three years Microsoft has embraced experimentation as a mechanism for testing changes to their various online products.  That they have only recently formally adopted a data-driven approach to design was a little surprising to me, but it is certainly better late than never! As part of the process of making the shift [...]]]></description>
			<content:encoded><![CDATA[<p>Over the last three years Microsoft has embraced experimentation as a mechanism for testing changes to their various online products.  That they have only recently formally adopted a data-driven approach to design was a little surprising to me, but it is certainly better late than never!</p>
<p>As part of the shift away from simply following the Highest Paid Person&#8217;s Opinion (HiPPO) and towards actually testing the ROI of different ideas, the team in charge of experimentation has been sharing some of its experiences.  You can see a recent talk on the topic, presented at a September meeting of Seattle Tech Startups, at the URL below.  Sorry, the quality isn&#8217;t great, and WordPress.com restrictions prevent me from embedding it.  Alternatively, go to the <a href="http://exp-platform.com/default.aspx">Microsoft experimentation</a> portal to see other work from this group.</p>
<p><a href="http://www.ustream.tv/flash/video/2134721">http://www.ustream.tv/flash/video/2134721</a></p>
<p>The talk presents a number of interesting insights.  They range from the results of some tests (winning versions often differ from what you&#8217;d expect) to the cultural hurdles that arise from relying more heavily on data for decision making (e.g., people with strong opinions get their egos bruised).</p>
<p>Amazon.com is also mentioned a couple of times.  I think a few of the current Microsoft team originally cut their teeth there, so those of you interested in this topic might also like to see this <a href="http://robotics.stanford.edu/~ronnyk/emetricsAmazon.pdf">eMetrics Summit 2004 presentation</a> (pdf).  It showcases the Amazonian approach to deciding on site changes and to resolving bitter political disputes over whose pet area should get the highly coveted slots on the home page.  It is interesting stuff, and more and more organisations are going to have to grapple with it as their products and services become increasingly digitised.</p><br />Posted in Thoughts Tagged: Data-Driven Design, Microsoft, Split Testing]]></content:encoded>
			<wfw:commentRss>http://benhealey.info/2009/11/28/online-experimentation-at-microsoft/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
	
		<media:content url="http://1.gravatar.com/avatar/7242c6f38f9056b8d9a96695535fe428?s=96&#38;d=identicon&#38;r=PG" medium="image">
			<media:title type="html">Ben</media:title>
		</media:content>
	</item>
	</channel>
</rss>
