Tagged: Google

  • Ben 10:03 on Saturday, September 1, 2012 Permalink | Reply
    Tags: Google

    Optimization: Even the oldies are doing it 

    Two things struck me when I saw this AdSense optimization ad…

    First, the age profile of the first two cases was toward the high end.  Google is probably trying to target older website owners, but it is still worth  remembering that the urge to test empirically does not necessarily discriminate by age.  They did manage to find two cases for their ad, after all :)

    Second, the fact that Google is running ads focussed on getting people to try out different ad formats suggests they are pretty sure a large proportion of current placements are sub-optimal.  More testing and optimisation of AdSense placements by website owners means more revenue for both parties.  Unless, of course, the website owner has other revenue streams that may be cannibalised by changes to the ad formats they have on their site…

     
  • Ben 19:47 on Tuesday, March 13, 2012 Permalink | Reply
    Tags: Google

    Data driven decision making 

    Some real-world examples from one of the exemplars:

    Originally found at:


    http://insidesearch.blogspot.co.nz/2012/03/video-search-quality-meeting-uncut.html?m=1


    http://insidesearch.blogspot.co.nz/2011/08/another-look-under-hood-of-search.html

    See also, Google’s User Interface Design and Decision Process.

     

     
  • Ben 8:23 on Friday, November 11, 2011 Permalink | Reply
    Tags: Google, Survey Walls

    Looks like Google is getting into the Survey Business 

    From Nieman Journalism Lab:

    Google appears to be experimenting with a new paywall-esque content roadblock for publishers, and it’s not One Pass. For lack of a better name, let’s call it a “survey wall,” because instead of dollars the system asks readers a question before they can move on to continue reading what they like.

    This could get interesting.  Instead of a standard paywall, people may be able to ‘pay’ for content by answering survey questions.  The publisher gets valuable information it can on-sell to advertisers, and Google dulls the old-media knives that are increasingly aimed at its vital organs. A natural extension of this would be for the publisher to become a survey panel provider of sorts.  Survey companies would be able to buy access to the survey-wall to ask their own questions for a fee-per-answer.  There is also no reason why independent panel companies could not attempt to step into the role Google appears to be playing as the third-party technology provider.

    Of course, there are big questions about the quality of data that may come from these distributed surveys.

    • Would people answer honestly?
    • What can reasonably be done with one or two answers from each visitor? (e.g., it would be difficult to examine relationships between more than a couple of variables)
    • Why would we expect people who visit survey-wall sites to be representative of a given population?
    These, and other questions, will keep survey methodologists in business for a while :)
     
    • davidwallacefleming 9:00 on Friday, November 11, 2011 Permalink

      Valuable information to stay apprised of. Thank you. I hope this does not get implemented.

  • Ben 19:39 on Saturday, April 16, 2011 Permalink | Reply
    Tags: App Engine, Full Text Search, Google, PySolr, Solr

    Implementing Full Text Search on Google App Engine 

    [Update May 2012: App Engine now has a native full text search API!  I've left this post here in case people are still interested in how to get something set up without using that API.]

    Despite being a product of search giant Google, App Engine doesn’t yet provide in-built support for full-text searching of substantial strings in your datastore entities.  There are a few approaches to building your own, which involve using equality filters to search on the start of a string or ListProperties to hold lists of terms garnered from your text (as long as you stay within the limits of allowable indexes for a given entity).  However, if you want to be able to run an index on larger documents or support more advanced search features like faceting and scoring you may find yourself scratching your head.  Unfortunately, the sandbox environment of GAE also restricts your ability to employ third-party open source search solutions like Lucene.
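
    As a rough illustration of the term-list approach mentioned above, here is a minimal sketch in plain Python 3 (App Engine itself ran an older Python at the time of writing).  The function names and the stop-word list are my own assumptions, not part of any App Engine API; the idea is simply that the list returned by extract_terms is what you would store in a list property on the entity.

```python
import re

# Hypothetical stop-word list; tune it for your own content.
STOP_WORDS = frozenset(["a", "an", "and", "of", "on", "the", "to"])


def extract_terms(text, min_length=2):
    """Return the sorted, de-duplicated search terms found in `text`."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return sorted(set(w for w in words
                      if len(w) >= min_length and w not in STOP_WORDS))


def matches(stored_terms, query):
    """Local stand-in for the datastore lookup: True if every query term
    appears in the stored term list (one equality filter per term)."""
    return set(extract_terms(query)).issubset(stored_terms)


terms = extract_terms("Full Text Search on the App Engine")
print(terms)
print(matches(terms, "text search"))
```

    This only supports exact whole-word matches, with no stemming, scoring or faceting, which is the limitation that pushed me towards Solr.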

    Native full text search functionality will no doubt come to App Engine in due course.  But in the meantime my solution has been to use a remote-hosted Solr instance from WebSolr and a slightly modified version of PySolr to get the job done.  Why PySolr rather than other Python-based interface packages like Haystack or Sunburnt?  The simple reason is that none of these will work out of the box on App Engine and PySolr was the simplest of them all to modify for my (relatively modest) needs.  You can grab a copy of the PySolr code modified for App Engine if you want it.

    Here’s a quick overview of my setup in case you are looking to do something similar.  I use Django as my framework, so your specifics may vary.

    1. Put a copy of PySolrGAE in your app directory so you’ll be able to import the module into your views as needed.
    2. Add the following variables in your settings:
      SOLR_PATH = 'http://index.websolr.com/solr/[yourkey]/'
      SOLR_BATCH_SIZE = 100
      MAX_RESULT_SIZE = 100
      (obviously, your values will differ!)
    3. Set up a schema document (XML) and put it up on your Solr instance so it knows what particular fields you will be passing to it and how it should tokenise, stem and otherwise work its magic on the text within them.  The Solr documentation is pretty good, so it is easy to pick up.
    4. Import the module (e.g. from apps.search.pysolrGAE import Solr) into your views and use it to interface with your Solr instance.  The ‘readme’ included with the modified PySolr code gives an overview of the syntax for adding, deleting and modifying entries in your index.  I’ve managed to set up views to delete the index, re-create it, and return results which are then passed to a template.  You can also set up a hook in the ‘save’ method of your models to incrementally add/modify or delete items depending on what you’ve done to a particular entity.
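
    To make the moving parts a little more concrete, the sketch below builds the two kinds of request a Solr client ends up sending: an XML add message (posted to the update handler) and a select URL.  It does not use the PySolrGAE interface, so treat it as an illustration of Solr's plain HTTP conventions rather than the module's API.  The function names are hypothetical, it is written for Python 3, and reading SOLR_BATCH_SIZE as the number of documents per update request is my assumption.

```python
from urllib.parse import urlencode
from xml.etree import ElementTree as ET

# Values mirroring the settings above; [yourkey] is a placeholder.
SOLR_PATH = "http://index.websolr.com/solr/[yourkey]/"
SOLR_BATCH_SIZE = 100
MAX_RESULT_SIZE = 100


def batches(items, size=SOLR_BATCH_SIZE):
    """Yield successive slices of `items`, each holding at most `size` entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_add_message(docs):
    """Build a Solr XML <add> message from a list of field-name -> value dicts."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in sorted(doc.items()):
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def build_select_url(query, rows=MAX_RESULT_SIZE):
    """Build the URL for a query against the select handler, asking for JSON."""
    params = urlencode([("q", query), ("rows", rows), ("wt", "json")])
    return SOLR_PATH + "select?" + params


print(build_add_message([{"id": "1", "title": "Hello"}]))
print(build_select_url("title:hello world"))
```

    On App Engine the actual HTTP calls would go through whatever fetch mechanism your client wraps; nothing here touches the network.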

    One of the nice things about Solr is that you can pass it a field which will not be indexed but is stored alongside an entry.  You can get this field returned as part of a query response.  Hence, you can set up an HTML rendered version of the search result snippet for a particular entry and pass it to Solr at the time you add the entry to the index.  Then, when you run a query you can get that field back and simply pass it through to your template.  This saves you a round trip to the datastore to get a copy of the entity for presentation.  Sweet!
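
    A minimal sketch of that pre-rendered snippet idea follows.  The field name snippet_html and the helper names are hypothetical, and the field would need to be declared as stored but not indexed in your schema.  The response handling assumes Solr's standard JSON response layout, with the matching documents under response.docs.

```python
import json
from html import escape


def render_snippet(title, url, summary):
    """Render the HTML shown for one search result, escaping all values."""
    return '<a href="%s">%s</a><p>%s</p>' % (
        escape(url, quote=True), escape(title), escape(summary))


def to_solr_doc(entity_id, title, url, body, summary):
    """Build the document sent to Solr, with the rendered snippet attached."""
    return {
        "id": entity_id,
        "title": title,
        "body": body,
        # Hypothetical field name: stored alongside the entry, not indexed.
        "snippet_html": render_snippet(title, url, summary),
    }


def snippets_from_response(raw_json):
    """Pull the stored snippets out of a Solr JSON query response."""
    payload = json.loads(raw_json)
    return [doc["snippet_html"]
            for doc in payload["response"]["docs"]
            if "snippet_html" in doc]


doc = to_solr_doc("1", "A & B", "http://example.com/?a=1&b=2", "full text", "x<y")
print(doc["snippet_html"])
```

    The list returned by snippets_from_response can be handed straight to the template, so no datastore fetch is needed to display the results.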

     
  • Ben 13:47 on Monday, January 24, 2011 Permalink | Reply
    Tags: AppEngine, Google, Random Selection   

    Selecting Randomly from the Appengine Datastore… 

    I’ve been working with Google Appengine a little lately and thinking about how I might go about randomly selecting records (entities) from a larger logical group of related records (a table in RDBMS terminology, or a set of entities of the same kind in datastore terminology).  The datastore is not really structured to easily or efficiently enable this out-of-the-box.  To be fair, neither are relational databases.  Yet, there is a range of reasons why you might want to get records at random from a stored set of data.  For instance, you might want to take a representative sample out to:

    • perform some statistical modelling;
    • allocate records to some testing groups (e.g., for split testing); or
    • process changes to the set in chunks that fit within some processing or quota limit.

    One of the mantras of Appengine data modelling is to ‘stop worrying about disk space and denormalise’ (yes, there are other reasons to worry about denormalisation, but you are also forced to get over those if you are developing on BigTable).

    So, rather than attempt to deal with random selection down the line when I actually need the random records, my approach is to support this functionality up-front in the design of my data models.  How?  By allocating a set of random numbers to every entity created.  Specifically, I’m setting up the following properties on all models (entity kinds) I might conceivably want to sample from in future:

    randomnum = db.FloatProperty()
    randomnum1000 = db.IntegerProperty() # entities will be randomly allocated to 1 of 1000 bins in this set
    randomnum10000 = db.IntegerProperty() # entities will be randomly allocated to 1 of 10000 bins in this set

    …And then allocating the random number and associated bins when the entity is first saved to the datastore. Note, you’ll need to import floor from the standard math module, and import the random module.

    random.seed()
    self.randomnum = random.random()
    self.randomnum1000 = int(floor(self.randomnum*1000))
    self.randomnum10000 = int(floor(self.randomnum*10000))

    This will provide for simple 1/1000 or 1/10000 random selection from entities of the same kind in my datastore; for 1/1000, pick a random integer between 0 and 999 and select all records that have that number in the randomnum1000 property.  It should also scale fine, and will be tolerant of the deletion of entities since deletes will be at random with respect to the random groups.  This means each random bin will stay roughly the same size relative to the other bins in each set over time.  I’ve kept the full random number in case I ever want to create more random bin sets, but other sample sizes could also be accommodated within the bin sets I have.  For instance, if I want to select one entity at random I could first select a bin from the 1/10000 bin set and, once I have the entities in that bin back from the datastore, randomly select one of them.
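
    Here is a small simulation of the scheme that runs outside App Engine, with plain dictionaries standing in for entities.  The helper names are hypothetical and it is written for Python 3.  On the datastore itself, the list comprehension in sample_bin would be replaced by an equality filter on the bin property.

```python
import random
from math import floor


def allocate_bins(rng=random):
    """Return the random properties stored on an entity when it is first saved."""
    randomnum = rng.random()
    return {
        "randomnum": randomnum,
        "randomnum1000": int(floor(randomnum * 1000)),    # bin 0..999
        "randomnum10000": int(floor(randomnum * 10000)),  # bin 0..9999
    }


def sample_bin(entities, bin_property="randomnum1000", bin_count=1000, rng=random):
    """Pick one bin at random and return every entity allocated to it."""
    chosen = rng.randrange(bin_count)
    return [e for e in entities if e[bin_property] == chosen]


def pick_one(entities, rng=random):
    """Select a single entity: draw bins until one is non-empty, then choose within it."""
    if not entities:
        return None
    while True:
        candidates = sample_bin(entities, "randomnum10000", 10000, rng)
        if candidates:
            return rng.choice(candidates)


rng = random.Random(42)  # fixed seed only so the simulation is repeatable
entities = [allocate_bins(rng) for _ in range(5000)]
sample = sample_bin(entities, rng=rng)
print(len(sample), "entities in the sampled 1/1000 bin")
```

    With few entities relative to the number of bins, many bins will be empty, which is why pick_one keeps drawing until it finds an occupied one.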

    Of course, this technique won’t generate perfectly random selections because the random number generator is only pseudo-random and the bin that an entity is initially allocated to affects its chances of being individually selected from the whole. Nevertheless, it will be close enough for what I can imagine I might want to do with the data.
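
    The bin effect is easy to quantify.  In the two-stage selection (pick an occupied bin uniformly, then pick an entity uniformly within it), an entity's chance of selection is 1 / (number of occupied bins × size of its own bin), so entities that land in sparsely populated bins are favoured.  The sketch below, with a hypothetical helper name, works this through using exact fractions.

```python
from fractions import Fraction


def selection_probabilities(bin_sizes):
    """Map each occupied bin index to the chance that any ONE given entity
    in that bin is chosen by the two-stage selection."""
    occupied = [size for size in bin_sizes if size > 0]
    return {
        index: Fraction(1, len(occupied) * size)
        for index, size in enumerate(bin_sizes)
        if size > 0
    }


# Three bins holding 2, 4 and 0 entities respectively.
probs = selection_probabilities([2, 4, 0])
for index, p in sorted(probs.items()):
    print("bin", index, "per-entity probability", p)
```

    In this toy case an entity in the two-entity bin is twice as likely to be picked as one in the four-entity bin; with many entities per bin the sizes even out and the difference shrinks.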

    If anyone reading has an alternative solution to random selection from the datastore I’d be really interested to hear it.

    _____

    ShortURL for this post: 
    http://wp.me/pnqr9-7m

     
  • Ben 19:43 on Saturday, May 22, 2010 Permalink | Reply
    Tags: Google

    Two More Google-Related Links 

    • Get access to Google’s predictive modelling capabilities with their Prediction API
    • And use their developer storage to hold your data (essentially an Amazon AWS competitor)
     
  • Ben 11:53 on Saturday, May 8, 2010 Permalink | Reply
    Tags: Google

    Google’s User Interface Design and Decision Process 

    Here is a link worth keeping.  Google recently updated the look and feel of its search user interface.  This article describes the behind-the-scenes process Googlers followed to get to the end point we are all seeing today.  Unsurprisingly, they followed a thorough research process, incorporating extensive qualitative and quantitative feedback before settling on an optimal solution.

    How Google got its New Look.

     
  • Ben 9:46 on Saturday, October 31, 2009 Permalink | Reply
    Tags: Evidence-Based Policy, Fraud Detection, Google, GPS, Mobile

    Link Post: Google GPS, Fraud Detection and PolitiScience 

    A number of interesting links came through the Twitterverse this morning, so I’m putting them here to share/remember.

    Enjoy!

    _____

    ShortURL for this post:
    http://wp.me/pnqr9-3g

     