Updates from August, 2010

  • Ben 13:03 on Thursday, January 12, 2012 Permalink | Reply  

    A few select pics from a recent trip.  By fluke of nature we managed to catch 11 days of sun from the 13 we were away. The rest of the country wasn’t so lucky.  It was great to get out and see more of the homeland. Like many New Zealanders, prior to this road trip I’d seen more foreign soil than I had of my own.

    The Marina at Picton.

     

    A seal playing.  Royal Albatross centre, Otago Peninsula.

     

    Mark of the seagull. Royal Albatross centre, Otago Peninsula.

     

    New Year Rodeo, Wanaka.

     

    View over Wanaka from Mt. Iron.

     

    An earnest Dork impression. Fox Glacier.

     

    Franz Josef Glacier.

     

    Inside an abandoned Gold Mine. Near Greymouth.

     

    Steps carved into the Pancake Rocks. Punakaiki.

     

    Sea-spray through a blow hole. Punakaiki.

     

    There are no photos from the ferry crossing at the end of the trip, but it was eventful enough to remember without them.  We crossed in 50-55 knot gales, so at least half of the passengers got seasick, myself included.

     
  • Ben 8:23 on Friday, November 11, 2011 Permalink | Reply
    Tags: Survey Walls

    Looks like Google is getting into the Survey Business 

    From Nieman Journalism Lab:

    Google appears to be experimenting with a new paywall-esque content roadblock for publishers, and it’s not One Pass. For lack of a better name, let’s call it a “survey wall,” because instead of dollars the system asks readers a question before they can move on to continue reading what they like.

    This could get interesting.  Instead of a standard paywall, people may be able to ‘pay’ for content by answering survey questions.  The publisher gets valuable information it can on-sell to advertisers, and Google dulls the old-media knives that are increasingly aimed at its vital organs. A natural extension of this would be that the publisher would become a survey panel provider of sorts.  Survey companies would be able to buy access to the survey-wall to ask their own questions for a fee-per-answer.  There is also no reason why independent panel companies could not attempt to step into the role Google appears to be playing as the third-party technology provider.

    Of course, there are big questions about the quality of data that may come from these distributed surveys.

    • Would people answer honestly?
    • What can reasonably be done with one or two answers from each visitor? (e.g., it would be difficult to examine relationships between more than a couple of variables)
    • Why would we expect people who visit survey-wall sites to be representative of a given population?
    These, and other questions, will keep survey methodologists in business for a while :)
     
    • davidwallacefleming 9:00 on Friday, November 11, 2011 Permalink

      Valuable information to stay apprised of. Thank you. I hope this does not get implemented.

  • Ben 10:24 on Monday, May 9, 2011 Permalink | Reply  

    Some ideas are timeless 

    Came across this Roosevelt quote from 1910 the other day.  Couldn’t agree more.

    It is not the critic who counts; not the man who points out how the strong man stumbles, or where the doer of deeds could have done them better. The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood; who strives valiantly; who errs, who comes short again and again, because there is no effort without error and shortcoming; but who does actually strive to do the deeds; who knows great enthusiasms, the great devotions; who spends himself in a worthy cause; who at the best knows in the end the triumph of high achievement, and who at the worst, if he fails, at least fails while daring greatly, so that his place shall never be with those cold and timid souls who neither know victory nor defeat.

    Theodore Roosevelt
    Excerpt from the speech “Citizenship In A Republic”, delivered at the Sorbonne, in Paris, France on 23 April, 1910

     
  • Ben 20:04 on Tuesday, February 15, 2011 Permalink | Reply
    Tags: Fun   

    Gordon at the Mission 

    We went to see Sting (Gordon Sumner) at the annual Mission Estate concert on the weekend; it was a stunning event with a mix of old stuff from his time with The Police through to his early and then later solo singles.  The weather held up admirably the entire time we were in the Hawke’s Bay, and since we were staying in Havelock North we managed to have a bit of a tour around the area.  A couple of shots from my mobile below…

    A segment of around 25,000 fellow concert-goers

     

    A large crowd descends on the Hastings farmers market the following day

     

    Havelock North's town centre

     

    They clearly take pride in their village

     
  • Ben 13:47 on Monday, January 24, 2011 Permalink | Reply
    Tags: AppEngine, Random Selection

    Selecting Randomly from the Appengine Datastore… 

    I’ve been working with Google Appengine a little lately and thinking about how I might go about randomly selecting records (entities) from a larger logical group of related records (a table in RDBMS terminology, or a set of entities of the same kind in datastore terminology).  The datastore is not really structured to easily or efficiently enable this out-of-the-box.  To be fair, neither are RDBMS systems.  Yet, there are a range of reasons why you might want to get records at random from a stored set of data.  For instance, you might want to take a representative sample out to:

    • perform some statistical modelling;
    • allocate records to some testing groups (e.g., for split testing); or
    • process changes to the set in chunks that fit within some processing or quota limit.

    One of the mantras of Appengine data modelling is to ‘stop worrying about disk space and denormalise’ (yes, there are other reasons to worry about denormalisation, but you are also forced to get over those if you are developing on BigTable).

    So, rather than attempt to deal with random selection down the line when I actually need the random records, my approach is to support this functionality up-front in the design of my data models.  How?  By allocating a set of random numbers to every entity created.  Specifically, I’m setting up the following properties on all models (entity kinds) I might conceivably want to sample from in future:

    randomnum = db.FloatProperty()
    randomnum1000 = db.IntegerProperty() # entities will be randomly allocated to 1 of 1000 bins in this set
    randomnum10000 = db.IntegerProperty() # entities will be randomly allocated to 1 of 10000 bins in this set

    …And then allocating the random number and associated bins when the entity is first saved to the datastore. Note, you’ll need to import floor from the standard math module, and import the random module.

    random.seed()
    self.randomnum = random.random()
    self.randomnum1000 = int(floor(self.randomnum*1000))
    self.randomnum10000 = int(floor(self.randomnum*10000))

    This will provide for simple 1/1000 or 1/10000 random selection from entities of the same kind in my datastore; for 1/1000, pick a random number between 0 and 999 and select all records that have that number in the randomnum1000 property.  It should also scale fine, and will be tolerant of the deletion of entities since deletes will be at random with respect to the random groups.  This means each random bin will stay roughly the same size relative to the other bins in each set over time.  I’ve kept the full random number in case I ever want to create more random bin sets, but other sample sizes could also be accommodated within the bin sets I have.  For instance, if I want to select one entity at random I could first select a bin from the 1/10000 bin set and, once I have those back from the datastore, randomly select an entity from the returned bin.
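    The bin lookup described above can be sketched without the datastore at all. This is a minimal illustration only: plain dictionaries stand in for entities, the helper names (make_entity, sample_bin) are made up for the example, and the fixed seed is there purely so the run is repeatable.

```python
import random
from math import floor

NUM_BINS = 1000  # mirrors the randomnum1000 bin set


def make_entity(name, rng):
    """Build a stand-in record carrying the random number and its bin."""
    r = rng.random()
    return {"name": name, "randomnum": r, "randomnum1000": int(floor(r * NUM_BINS))}


def sample_bin(entities, rng):
    """Pick one bin at random and return every entity allocated to it."""
    chosen = rng.randrange(NUM_BINS)  # an integer from 0 to 999
    return chosen, [e for e in entities if e["randomnum1000"] == chosen]


rng = random.Random(42)  # fixed seed: an assumption made for repeatability
entities = [make_entity("entity-%d" % i, rng) for i in range(50000)]
chosen_bin, sample = sample_bin(entities, rng)
print("bin %d holds %d of %d entities" % (chosen_bin, len(sample), len(entities)))
```

    On the datastore itself the list comprehension would presumably become an equality filter on randomnum1000, so only the chosen bin is read back rather than the whole kind.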

    Of course, this technique won’t generate perfectly random selections because the random number generator is only pseudo-random and the bin that an entity is initially allocated to affects its chances of being individually selected from the whole. Nevertheless, it will be close enough for what I can imagine I might want to do with the data.

    If anyone reading has an alternative solution to random selection from the datastore I’d be really interested to hear it.

    _____

    ShortURL for this post: http://wp.me/pnqr9-7m

     
  • Ben 19:07 on Sunday, January 2, 2011 Permalink | Reply  

    Tricksy Chinese Spammers and your Gmail Account 

    Hot on the heels of my last post about strong passwords comes news of my partner’s Gmail account being hacked over Christmas.  No, she doesn’t read this blog (neither does my mother), so the password was not particularly strong.

    I’m sure such a fate will not befall you, dear reader, since you are armed to the hilt with good password generation advice. Nevertheless, you may find it interesting to hear what the digital deviant got up to once they gained access to her account.

    The first sign something was up came when I received an email message titled ‘Merry Christmas’ from my better half gushing about the brilliant MacBook Pro we’d purchased from website X.  Her knowledge of the machine’s features was quite good, especially since we hadn’t bought any computing equipment from anywhere in recent memory.  I fired back a six-word reply: “You need to change your password”.  It turns out a few people from her address book were more forgiving and simply queried whether she had intended to send the message.

    Thankfully, Gmail notified her when she logged in with a nice big red banner message saying the account had been accessed from China.  A quick look at the ‘last account activity’ log (see the link in the footer of your Gmail screen) confirmed that the account had been opened via a couple of different Chinese IP addresses around the 27th and 28th of December.  The glaring red ‘this seems odd’ notification instructed her to change her password if the access was unexpected, which she dutifully did.

    Funnily enough my message, and those from others, never hit her inbox.  After a little digging we found that they were instead automatically archived and forwarded to a random Yahoo! email account – a setting that remained active even after the password had been changed.  Had we not been living together it may have been a little while before she realised the account was compromised in this way.  The spam messages sent from her account had also been deleted, along with everything in her trash folder.  Tricksy.

    To cut the story short, we had to trawl through her account settings with a fine-toothed comb to make sure all filters, forwards and add-ons were legit, in addition to the password change.  Your loved ones should do the same if they get the red message from Google.

    At this point we don’t know if her email history has been downloaded.  There would have been nothing stopping the spammer from doing so.  Thankfully the password was only used on one other (non-critical) site, so the damage should be limited.  Still, it is a nice reminder of the down-sides of cloud-based storage.

    _____

    ShortURL to this Post: http://wp.me/pnqr9-7d

     
  • Ben 9:58 on Sunday, December 5, 2010 Permalink | Reply
    Tags: DDoS, Security, Strong Passwords

    Avoiding your own Private Cablegate 

    Distributed Denial of Service (DDoS) attacks have hit the mainstream news again recently in stories about attempts by [insert shadowy US government organisation here]  to bring down the Wikileaks website after its release of the initial cablegate records.

    Reports rightly focus on the attacks themselves and their effect on the Wikileaks site, rather than the BotNets behind many DDoS events or the other malicious ends to which they can be put.  However, it is interesting  that the technology being used to hinder the Wikileaks distribution is the same used by criminals to gather private information on a massive scale, for release to the highest bidder.

    In early 2009  some researchers at the University of California, Santa Barbara managed to take over part of a BotNet being used to steal private information such as passwords and credit card numbers.  You can see Richard A. Kemmerer, a member of the research group, explaining the 10-day exploit in this Google talk.  As part of the experiment, the group analysed encrypted passwords stolen by the BotNet to see how easily each user’s data could be cracked.  They found that of 173,686 unique passwords discovered, just under 58% could be cracked within 24 hours (56k  -  about 32%  -  were able to be cracked within 65 minutes).  Further analysis also revealed that 28% of people reused the same password on multiple domains.  So, there are some relatively easy pickings for BotNet creators to harvest and on-sell.

    The results aren’t particularly surprising; we are a pretty lazy bunch in general and there are so many points online and offline at which passwords are required to access content.  The effort required to generate different memorable, but secure, passwords is high.  Yet, the risks associated with not having strong passwords are rising as we move more of our digital lives to the cloud.  So, here are a couple of ideas I’ve gathered for generating memorable, strong passwords with minimal effort.

    • Create an acronym from a phrase.  For instance, “Please Let Me In To Twitter So I Can See Some Tweets” would translate to “plmittsicsst”.  You can vary the phrase easily enough for different sites.  When combined with symbol or number replacement and a sprinkling of upper-case letters, this can generate strong passwords quickly.  So, our string above could become “Plm!tt%!c%%T” if we capitalise the first and last letters and replace i and s with the shift symbols for 1 and 5 (which look like i and s).
    • Create a password base that you use everywhere, then mix in a site-specific password with that.  For example, you could take the first and last two characters of the street you grew up on along with the last two digits of your old student ID to get a base (e.g., “adde32”) then append the reverse of the consonants from the site name to this (e.g., “rttwt” for twitter).  Sprinkle with symbol replacement and you get “$dd#32rttwt”.  For good measure you can add a prefix and suffix symbol for extra security: “#$dd#32rttwt#”.
    • Use a password manager that generates random passwords :)
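    The first two recipes are mechanical enough to sketch in a few lines of Python. The function names here are invented for the illustration, and the sketch treats ‘y’ as a consonant, which is an assumption the bullets above don’t address. It only reproduces the worked examples; it is not a recommendation to script your real passwords.

```python
def acronym(phrase):
    """First letter of each word, lower-cased."""
    return "".join(word[0].lower() for word in phrase.split())


def harden(acro, swaps):
    """Swap in look-alike symbols, then capitalise the first and last letters."""
    swapped = "".join(swaps.get(ch, ch) for ch in acro)
    if len(swapped) < 2:
        return swapped.upper()
    return swapped[0].upper() + swapped[1:-1] + swapped[-1].upper()


def site_suffix(site):
    """Consonants of the site name, in reverse order."""
    consonants = [ch for ch in site.lower() if ch.isalpha() and ch not in "aeiou"]
    return "".join(reversed(consonants))


SWAPS = {"i": "!", "s": "%"}  # the shift symbols for 1 and 5

acro = acronym("Please Let Me In To Twitter So I Can See Some Tweets")
print(acro)                               # plmittsicsst
print(harden(acro, SWAPS))                # Plm!tt%!c%%T
print("adde32" + site_suffix("twitter"))  # adde32rttwt
```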

    You can test out different approaches to get a feel for how secure the passwords you generate are at this site: http://howsecureismypassword.net/.  Obviously, you shouldn’t put any of your real passwords in, but you can use it to test out ideas using fake details or phrases.

    _____

    ShortURL to this post: http://wp.me/pnqr9-4D

     
  • Ben 18:59 on Sunday, November 7, 2010 Permalink | Reply  

    20 GOTO 10 

    Thankfully I only had to sit through one or two educational videos like this in my school years.   They made just as much sense.

    And the reference to BASIC in the first 20 seconds brings back waaay too many geeky memories.

    _____

    ShortURL to this post: http://wp.me/pnqr9-6r

     
    • minofgeek 11:27 on Tuesday, November 9, 2010 Permalink

      The BBC Acorn Model B shown in the early moments of that segment is a classic. Every child should have one; forget about OLPC.

  • Ben 11:42 on Sunday, September 26, 2010 Permalink | Reply  

    Google Font API: Why didn’t I think of that? 

    A few months back Google released a font API to give designers access to a wider range of royalty-free native typographical options in their web pages and apps.  The font library is still fairly sparse, which will hamper widespread developer uptake quite a bit in the short term.  So, my initial thoughts were that this was a half-arsed response to Apple’s prior foray into the area around the time the iPad was released.

    However, just this week I saw a Google blog post announcing inclusion of the fonts across the Google Docs suite.   The fonts will no doubt also be made available to Android developers in native form.  Now the API makes a lot more sense to me: it is necessary for Google to compete with Microsoft in the productivity suite area and with Apple in the mobile device area.  The API and related font set will probably be given a lot more internal support than I’d originally anticipated.  Nice.

    _____

    ShortURL to this post: http://wp.me/pnqr9-5w

     
  • Ben 11:09 on Sunday, August 29, 2010 Permalink | Reply

    Old School Data Visualisation (Part 2) 

    A quick follow-up to the previous post on the power of data reduction and presentation… here is another example showing how rounding, ordering and thoughtful presentation can turn an incomprehensible grid of numbers into something most people can grok.

    It is from the same article (Ehrenberg, Feb 1992, The Problem of Numeracy, AdMap), but this time relates to television programme viewership.  The first table presents detailed correlations for responses to the question ‘I really like to watch programme x’ across a range of programmes and two channels (ITV and BBC).

    Apart from an obvious diagonal line of 1.000 in the table (of course each programme’s rating correlates perfectly with itself), there isn’t much else you can take out from it.  The next table renders the data a little more readable by introducing rounding to one decimal place, discarding the redundant leading zeros and disposing of the meaningless 1.000 diagonal.

    And with a little more thought to row order, spacing and the key data for presentation (i.e., do we really need channel?), we get to the following:

    Those familiar with television in the UK will now see that people who like to watch one sport programme also like to watch other sports programmes, particularly if they are ‘round-up’ type shows.  They don’t, however, like news or current events programmes so much.  A similar pattern occurs for current event watchers, but the programmes within that cluster have slightly lower correlations, meaning viewership is less likely to be homogeneous amongst that group.  If you are an advertiser or producer, this is useful stuff to know because it will give you an idea of the reach of, and competition around, a certain programme.  And you are more likely to understand this if the data is presented in a clear and concise way.
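    For anyone wanting to try the rounding steps on their own numbers, here is a rough Python sketch of the mechanical part: round to one decimal place, drop the leading zero and blank out the diagonal. The programme labels and correlation values are placeholders invented for the example, not figures from Ehrenberg’s article, and the helper names are hypothetical. Deciding on row order is still a job for a human.

```python
def fmt(value):
    """Round to one decimal place and drop the redundant leading zero."""
    text = "%.1f" % value
    if text.startswith("0."):
        return text[1:]
    if text.startswith("-0."):
        return "-" + text[2:]
    return text


def render(labels, matrix):
    """Full table with one-decimal cells and the diagonal left blank."""
    width = max(len(label) for label in labels) + 2
    lines = []
    for i, label in enumerate(labels):
        cells = ["" if i == j else fmt(v) for j, v in enumerate(matrix[i])]
        lines.append(label.ljust(width) + "".join(c.rjust(5) for c in cells))
    return "\n".join(lines)


# Placeholder data only: three hypothetical programmes.
labels = ["Prog A", "Prog B", "Prog C"]
matrix = [
    [1.000, 0.5814, 0.1213],
    [0.5814, 1.000, 0.0932],
    [0.1213, 0.0932, 1.000],
]
table = render(labels, matrix)
print(table)
```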

    _____

    ShortURL for this post: http://wp.me/pnqr9-67

     