Lemonodor: Scraping DARPA

March 16, 2005

Scraping DARPA

DARPA's official Grand Challenge site has a web-based forum for discussion among people working on entries. I've made it easier to find and read new posts by scraping the site and creating an RSS feed.

The feed is updated hourly. DARPA's forum software does not provide permanent (or even temporary) links to individual posts, so each RSS item is linked to the discussion topic page from which it came.
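As an illustration of the linking scheme described above, here is a minimal sketch in Python of how a feed like this might be assembled. It is not the code behind the actual feed. The function name `build_feed`, the post dictionary keys (`subject`, `author`, `body`, `posted`, `topic_url`), and the example.com URLs are all hypothetical. Since the forum offers no per-post links, each item links to its topic page and carries a synthesized, non-permalink `guid` so that feed readers can still tell posts apart.

```python
import hashlib
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_feed(posts, feed_title, feed_link):
    """Return an RSS 2.0 document (as a string) for a list of post dicts.

    Each post dict is assumed (hypothetically) to have the keys
    'subject', 'author', 'body', 'posted' (a datetime) and 'topic_url'.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Posts scraped from a forum"

    for post in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["subject"]
        # No per-post permalink is available, so link to the topic page.
        ET.SubElement(item, "link").text = post["topic_url"]
        ET.SubElement(item, "description").text = post["body"]
        ET.SubElement(item, "pubDate").text = format_datetime(post["posted"])
        # Synthesize a stable identifier from fields that should not change.
        key = "|".join(
            [post["topic_url"], post["author"], post["posted"].isoformat()]
        )
        guid = ET.SubElement(item, "guid", isPermaLink="false")
        guid.text = hashlib.sha1(key.encode("utf-8")).hexdigest()

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample data: two posts from the same topic page.
    topic = "http://example.com/forum/topic?id=1"
    sample = [
        {"subject": "Sensors", "author": "alice", "body": "First post.",
         "posted": datetime(2005, 3, 16, 12, 0, tzinfo=timezone.utc),
         "topic_url": topic},
        {"subject": "Re: Sensors", "author": "bob", "body": "A reply.",
         "posted": datetime(2005, 3, 16, 13, 0, tzinfo=timezone.utc),
         "topic_url": topic},
    ]
    print(build_feed(sample, "Forum posts", "http://example.com/forum"))
```

Both items in the sample share the same link, because they come from the same topic, but their `guid` values differ. Running something like this from an hourly scheduled job would match the update interval mentioned above.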

I was toying with the idea of scraping every team website I could find (there are currently 136 teams in the pipeline), but after looking at one terrible, terrible website after another I'm not sure I have the stomach for it. Too bad they didn't go with a free weblog hosting service, and get automatically generated RSS/Atom feeds as part of the deal.

Posted by jjwiseman at March 16, 2005 02:05 PM

Comments

John,

We are trying to be light on the web thing, so we went for a blog type of site:

http://pegasusbridge.blogspot.com

I too find it cumbersome to go through DARPA's web site and its "leaving DARPA web site" page when you go to other teams' sites.

Igor.

Posted by: Igor on March 16, 2005 02:42 PM

Where can I get the code for this forum scraper? What license is it under? I assume it's written in Lisp.

Posted by: Andy B on July 17, 2007 09:29 AM

Whoa, a blast from my Chicago past.

Andy, the code is actually in Python. I'll send you a copy.

Posted by: John Wiseman on July 17, 2007 10:29 AM