
  • Ben 9:16 on Sunday, January 15, 2012 Permalink | Reply

    Just keeping for later: Public datasets hosted on Amazon AWS. https://aws.amazon.com/datasets

     
  • Ben 13:03 on Thursday, January 12, 2012 Permalink | Reply  

    A few select pics from a recent trip. By a fluke of nature we managed to catch 11 days of sun out of the 13 we were away. The rest of the country wasn’t so lucky. It was great to get out and see more of the homeland. Like many New Zealanders, prior to this road trip I’d seen more foreign soil than I had of my own.

    The Marina at Picton.

     

    A seal playing.  Royal Albatross centre, Otago Peninsula.

     

    Mark of the seagull. Royal Albatross centre, Otago Peninsula.

     

    New Year Rodeo, Wanaka.

     

    View over Wanaka from Mt. Iron.

     

    An earnest Dork impression. Fox Glacier.

     

    Franz Josef Glacier.

     

    Inside an abandoned Gold Mine. Near Greymouth.

     

    Steps carved into the Pancake Rocks. Punakaiki.

     

    Sea-spray through a blow hole. Punakaiki.

     

    There are no photos from the ferry crossing at the end of the trip, but it was eventful enough to remember without them. We crossed in 50-55 knot gales, so at least half of the passengers got seasick, myself included.

     
  • Ben 11:54 on Wednesday, December 21, 2011 Permalink | Reply  

    Confidence bias in action 

    I’ve dabbled a little with crowdsourcing for my own projects, but never used it as a primary research tool.  It isn’t hard to see how the major crowdsourcing platforms like Mechanical Turk could be used to undertake quick and cost-effective behavioural research (potential for bias notwithstanding!).  So the following study by crowdsourcing firm CrowdFlower on its own worker base was interesting in itself.  That it related to another interest of mine, human bias, made it even more intriguing :)

    Confidence Bias: Evidence from Crowdsourcing

    The key take-out: over 75% of contributors overestimated their ability to answer multiple choice questions correctly.  The Dunning-Kruger effect is alive and well!

     
  • Ben 10:18 on Sunday, November 20, 2011 Permalink | Reply  

    Do not therefore consider this life as an object of any moment. Look back on the immense gulf of time already past; and forwards, to that infinite duration yet to come, and you will find how trifling the difference is between a life of three days and of three ages. Let us then employ properly this moment of time allotted us by fate, and leave the world contentedly; like a ripe olive dropping from its stalk, speaking well of the soil that produced it, and of the tree that bore it.

    Marcus Aurelius, Meditations
     
  • Ben 8:23 on Friday, November 11, 2011 Permalink | Reply
    Tags: Survey Walls

    Looks like Google is getting into the Survey Business 

    From Nieman Journalism Lab:

    Google appears to be experimenting with a new paywall-esque content roadblock for publishers, and it’s not One Pass. For lack of a better name, let’s call it a “survey wall,” because instead of dollars the system asks readers a question before they can move on to continue reading what they like.

    This could get interesting.  Instead of a standard paywall, people may be able to ‘pay’ for content by answering survey questions.  The publisher gets valuable information it can on-sell to advertisers, and Google dulls the old-media knives that are increasingly aimed at its vital organs. A natural extension of this would be for the publisher to become a survey panel provider of sorts.  Survey companies would be able to buy access to the survey wall to ask their own questions for a fee per answer.  There is also no reason why independent panel companies could not attempt to step into the role Google appears to be playing as the third-party technology provider.

    Of course, there are big questions about the quality of data that may come from these distributed surveys.

    • Would people answer honestly?
    • What can reasonably be done with one or two answers from each visitor? (e.g., it would be difficult to examine relationships between more than a couple of variables)
    • Why would we expect people who visit survey-wall sites to be representative of a given population?
    These, and other questions, will keep survey methodologists in business for a while :)
     
    • davidwallacefleming 9:00 on Friday, November 11, 2011 Permalink

      Valuable information to stay apprised of. Thank you. I hope this does not get implemented.

  • Ben 20:01 on Wednesday, October 19, 2011 Permalink | Reply
    Tags: Brand Loyalty, Double Jeopardy

    Double Jeopardy in Hotel Ratings 

    A well-established, and surprisingly general, empirical pattern in markets is that brands with lower market share have buyers that also exhibit less loyalty toward the brand.  This pattern has a name, Double Jeopardy, and it undermines the logic of niche marketing strategies focussed on appealing to a small group in a larger market in the hope that doing so will garner greater loyalty.

    Another feature of the pattern is that the average small brand buyer tends to be less favourable towards that brand than the average large brand buyer.   Indeed, it appears that Double Jeopardy even applies to hotel rating data presented in a recent post in the Data Miners blog by Michael Berry.  Here is the key quote:

    It is hardly surprising that the Bellagio in Las Vegas has about 250 times more reviews than say, the Cambridge Gateway Inn, an unloved motel in Cambridge, Massachusetts. It may or may not be surprising that these oft-reviewed properties tend to be well-liked by our reviewers. Surprising or not, it’s true: the hotels with the most reviews have a higher average rating than the long tail of hotels, motels, B&Bs, and Inns with only a handful of reviews each.

     
  • Ben 8:38 on Wednesday, September 21, 2011 Permalink | Reply

    A great write-up on determining sample sizes for, and avoiding common traps in, split testing. Yet another good testing post from the folks at 37signals. R code and discussion of power calcs included. http://37signals.com/svn/posts/3004-ab-testing-tech-note-determining-sample-size
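
    For my own reference, here is a rough Python sketch of the kind of power calculation the post discusses. It is not the R code from the linked write-up. It uses the textbook normal-approximation formula for comparing two proportions, and the function name, the default alpha of 0.05 and the default power of 0.80 are my own assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_group(p_control, p_variant, alpha=0.05, power=0.80):
    """Approximate visitors needed in EACH arm of a two-sided split test.

    Hypothetical helper: normal approximation for two proportions,
    assuming equal group sizes.
    """
    if p_control == p_variant:
        raise ValueError("The two conversion rates must differ.")

    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value, two-sided test
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power

    # Pooled rate under the null hypothesis of no difference
    p_bar = (p_control + p_variant) / 2

    null_term = z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
    alt_term = z_beta * math.sqrt(
        p_control * (1 - p_control) + p_variant * (1 - p_variant)
    )

    n = (null_term + alt_term) ** 2 / (p_control - p_variant) ** 2
    return math.ceil(n)


# Example: detecting a lift from a 10% to a 12% conversion rate
print(sample_size_per_group(0.10, 0.12))
```

    The thing to notice is how quickly the required sample grows as the difference you want to detect shrinks, since the squared difference sits in the denominator.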

     
  • Ben 12:17 on Thursday, September 8, 2011 Permalink | Reply
    Tags: Creative Commons   

    The easy way to find creative commons content: http://search.creativecommons.org

     
  • Ben 18:46 on Friday, August 26, 2011 Permalink | Reply
    Tags: Document Classification

    KiwiPycon 2011: Document Classification with the Natural Language Toolkit 

    I’m heading to KiwiPycon in Welly this weekend to meet some fellow Python fans and give a presentation on using the Python-based Natural Language Toolkit (NLTK) to classify documents.  I’ll be using the Enron emails as an example document set.

    If you’ve travelled here from the future because you saw the presentation and want the files I referred to, here they are.

    There is a missing link between the two code files: changes I made to the dataset to enable training of the classifier and analysis of the results. If you are interested in getting the final dataset, just get in touch.
    ______
    Update: Here is the slideshare version of the presentation with audio.  And here is a text-to-speech video version, with some extra content.
     
  • Ben 8:51 on Wednesday, August 24, 2011 Permalink | Reply
    Tags: SQL

    Links: Using SQL ‘With’ statements, and a great example of A/B Testing 

    Two links worth keeping:

     