Tagged: Django

  • Ben 19:39 on Saturday, April 16, 2011 Permalink | Reply
    Tags: App Engine, Django, Full Text Search, PySolr, Solr

    Implementing Full Text Search on Google App Engine 

    Despite being a product of search giant Google, App Engine doesn’t yet provide built-in support for full-text searching of substantial strings in your datastore entities.  There are a few approaches to building your own: you can use inequality filters to match on the start of a string, or ListProperties to hold lists of terms garnered from your text, which you then query with equality filters (as long as you stay within the limits of allowable indexes for a given entity).  However, if you want to index larger documents or support more advanced search features like faceting and scoring, you may find yourself scratching your head.  Unfortunately, the sandbox environment of GAE also restricts your ability to employ third-party open source search solutions like Lucene.

    Native full text search functionality will no doubt come to App Engine in due course.  But in the meantime my solution has been to use a remote-hosted Solr instance from WebSolr and a slightly modified version of PySolr to get the job done.  Why PySolr rather than other Python-based interface packages like Haystack or Sunburnt?  The simple reason is that none of these will work out of the box on App Engine, and PySolr was the simplest of them all to modify for my (relatively modest) needs.  You can grab a copy of the PySolr code modified for App Engine if you want it.

    Here’s a quick overview of my setup in case you are looking to do something similar.  I use Django as my framework, so your specifics may vary.

    1. Put a copy of PySolrGAE in your app directory so you’ll be able to import the module into your views as needed.
    2. Add the following variables in your settings:
      SOLR_PATH = 'http://index.websolr.com/solr/yourkey/'
      SOLR_BATCH_SIZE = 100
      MAX_RESULT_SIZE = 100
      (obviously, your values will differ!)
    3. Set up a schema document (XML) and put it up on your Solr instance so it knows what particular fields you will be passing to it and how it should tokenise, stem and otherwise work its magic on the text within them.  The Solr documentation is pretty good, so it is easy to pick up.
    4. Import the module (e.g. from apps.search.pysolrGAE import Solr) into your views and use it to interface with your Solr instance.  The ‘readme’ included with the modified PySolr code gives an overview of the syntax for adding, deleting and modifying entries in your index.  I’ve managed to set up views to delete the index, re-create it, and return results which are then passed to a template.  You can also set up a hook in the ‘save’ method of your models to incrementally add, modify or delete index entries depending on what you’ve done to a particular entity.
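
As a rough sketch of the indexing side of step 4, the helper below feeds documents to Solr in chunks of SOLR_BATCH_SIZE. It assumes the modified module keeps upstream PySolr's `add(docs)` method, which takes a list of dicts. The `index_in_batches` helper, the document fields and the stand-in `RecordingClient` are hypothetical names used for illustration only; they are not part of PySolr.

```python
SOLR_BATCH_SIZE = 100  # same value as in the settings above


def index_in_batches(solr, docs, batch_size=SOLR_BATCH_SIZE):
    """Send docs to Solr in chunks of at most batch_size.

    Returns the number of batches sent.
    """
    batch = []
    sent = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            solr.add(batch)  # assumed to match upstream PySolr's add(docs)
            sent += 1
            batch = []
    if batch:
        # Flush whatever is left over after the last full batch.
        solr.add(batch)
        sent += 1
    return sent


class RecordingClient(object):
    """Hypothetical stand-in for a Solr client; it only records what it is given."""

    def __init__(self):
        self.batches = []

    def add(self, docs):
        self.batches.append(list(docs))
```

In a real view you would pass in the client built from your settings (something like Solr(settings.SOLR_PATH), if the modified module keeps the upstream constructor) instead of the recording stand-in.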

    One of the nice things about Solr is that you can pass it a field which will not be indexed but is stored alongside an entry.  You can get this field returned as part of a query response.  Hence, you can set up an HTML rendered version of the search result snippet for a particular entry and pass it to Solr at the time you add the entry to the index.  Then, when you run a query you can get that field back and simply pass it through to your template.  This saves you a round trip to the datastore to get a copy of the entity for presentation.  Sweet!
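
A minimal sketch of that idea follows. The field names (`snippet_html` and friends) are hypothetical and would need to match your own schema, with the snippet field declared there as stored but not indexed. It also assumes that search results can be iterated as dictionaries, as they can in upstream PySolr.

```python
def build_solr_doc(entity_id, title, body, snippet_html):
    """Build the dict sent to Solr when an entity is saved."""
    return {
        "id": entity_id,
        "title": title,                # indexed for searching
        "body": body,                  # indexed for searching
        "snippet_html": snippet_html,  # stored only, handed back with results
    }


def snippets_from_results(results):
    """Collect the pre-rendered snippets, skipping any hit that lacks one."""
    return [hit["snippet_html"] for hit in results if "snippet_html" in hit]
```

The list returned by snippets_from_results can go straight into the template context, so the view never touches the datastore when rendering results.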

     
  • Ben 16:53 on Tuesday, July 7, 2009 Permalink | Reply
    Tags: Django, Flatpages   

    Getting a ‘No FlatPage matches the given query’ error? 

    This may be useful if you are a Django newb.  All others… this will probably be gibberish.

    Are you working through James Bennett’s ‘Practical Django Projects’ (second edition) and getting the above error when trying to view your first flatpage?  This could be because you did not put leading and trailing forward slashes in the flatpage URL field when setting up your flatpage in the admin panel.

    The book actually tells you to do this in the example it gives, but in my speed-reading I ignored the slashes and treated the URL field as though I was entering a WordPress page slug.  Four hours, a Django reinstall, and much angst later, I have relearned the lesson that it always pays to read the instructions carefully.

    Perhaps you can avoid my mistake.
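
To make the slash requirement concrete, here is a small sketch of the normalisation I should have done by hand. As far as I can tell the flatpages view looks the page up by an exact match on the stored URL, so a slug-style value like first-page never matches a request for /first-page/. The `normalize_flatpage_url` helper is hypothetical and not part of Django.

```python
def normalize_flatpage_url(url):
    """Return url with exactly one leading and one trailing slash."""
    stripped = url.strip().strip("/")
    if not stripped:
        # An empty value or a bare slash maps to the site root.
        return "/"
    return "/" + stripped + "/"
```

Running the value you typed into the admin URL field through a check like this shows what it ought to look like: first-page becomes /first-page/, and a value that already has both slashes comes back unchanged.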
    _______
    Short URL for this post: http://wp.me/pnqr9-P

     
    • Paul 23:43 on Saturday, August 15, 2009 Permalink

      I have put the slashes but still throwing the same error!!
      /first-page/

    • Ben 16:24 on Monday, August 17, 2009 Permalink

      Hmmm. When I was hunting around I saw a couple of other possible causes for this, but I suspect you’ve already come across those. From memory I eventually found the source of my problem by playing around in the admin panel – after changing the page title a few times and trying to view the page from the admin panel I realised Django wasn’t rendering the URL correctly. That at least narrowed it down to a pattern/URL problem rather than there being no communication with the flatpages app at all. Maybe a similar approach would help you narrow down the source of your issue.

    • Raj 13:14 on Thursday, May 27, 2010 Permalink

      That was totally it! Thanks!

    • Scott Crosby 2:57 on Saturday, June 12, 2010 Permalink

      Thanks! You saved me hours :-)

    • Michael 16:06 on Thursday, July 15, 2010 Permalink

      I just came across your post and found a different problem with the same symptoms, so I wanted to post a comment to help out anyone in the future who stumbles across this: I received the same error because I had added “127.0.0.1:8000” as a separate site, rather than editing “example.com”, so my site ID was 2, rather than 1. Instead of modifying my settings.py file to have SITE_ID = 1, I went to the shell and changed the localhost site to have an id of 1, and then it worked.

    • Duy 19:29 on Tuesday, October 26, 2010 Permalink

      Ha! Thanks a lot. This is what I’m looking for!

    • Srinivasa 1:30 on Friday, December 10, 2010 Permalink

      The solutions provided by Michael and Ben work in different scenarios, but both are right. Thank you for the solutions. This will help people who overlook a few things while reading the ‘Practical Django Projects’ second edition.

    • Patrick 22:05 on Wednesday, February 9, 2011 Permalink

      After days of debugging and trying to figure out why “get_object_or_404()” in django/contrib/flatpages/views.py was returning “No FlatPage matches the given query” I stumbled across this page. Thank you! Why is flatpages not more forgiving! (why does it not just append a slash if it doesn’t exist!!)

    • Ben 7:28 on Thursday, February 10, 2011 Permalink

      Glad it helped Patrick. I’ve not looked at what it does for this sort of error, but you can add APPEND_SLASH = True to your settings.py to auto-append trailing slashes to incoming urls. This will only work if you have django.middleware.common.CommonMiddleware installed in your middleware settings, but it might sort out the slash-sensitive flatpage issue.

    • Nai 21:49 on Friday, March 4, 2011 Permalink

      Had the same problem, different solution. I had to add ‘django.contrib.flatpages.middleware.FlatpageFallbackMiddleware’, to my middleware in settings.py

      Hope this helps someone too

    • Mark Andrews 10:57 on Saturday, July 23, 2011 Permalink

      thanks, man! that was driving me nuts!

    • Vitalii 23:57 on Sunday, August 28, 2011 Permalink

      Very, very, very big THANKS to Michael!!!!!

    • Nima 6:41 on Saturday, December 3, 2011 Permalink

      Hi

      I have same problem

      I set SITE_ID to 1
      and also delete example.com
      and I am sure about slashes :)

      but still same problem :(

    • mert ozcan 3:48 on Sunday, January 15, 2012 Permalink

      I realized the book is not clear about the directory where we should place the default.html file.
      It’s basically like this: ..cms/first-page/flatpages/default.html
