Parsing Digg

A reader just let me know that my repackaged Digg feed recently stopped working.

It turns out the HTML on the Digg website is now so malformed that even Beautiful Soup can’t parse it.

I’ve previously written about making Beautiful Soup even more tolerant of bad markup. Shortly after I posted that, Leonard Richardson explained why Beautiful Soup 3.1 was failing on malformed pages. Rather than hack around with the parsing infrastructure as I had done before, I’ve simply taken his advice and downgraded to Beautiful Soup 3.0.7a.

The digg-direct feed should now be working again. Apologies for the outage.