Just keeping this for later: public datasets hosted on Amazon Web Services (AWS). https://aws.amazon.com/datasets
Good real-world datasets used to be quite hard to come by for anyone interested in trying out different modelling approaches. However, sites like Kaggle, which extend the crowdsourced approach to model improvement popularised by competitions such as the Netflix Prize and the KDD Cup, are opening up more datasets for statistical modellers to use. Worth keeping an eye on.