Lemonodor: Morning Improv RSS Feed

September 06, 2005

Morning Improv RSS Feed

I'm generating an RSS feed for Scott McCloud's Morning Improv blog.

I've got Lisp code scraping his HTML to create the feed, just as I do for the DARPA Grand Challenge feed, so there are some limitations. In particular, as with that feed, there are no permalinks, but I'm hoping to talk to Scott and get that fixed. Or at least semi-fixed.
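The scraper itself isn't shown here. As a hedged sketch of the output side only, assuming the scraper has already produced a title, link, id and text for each entry, this is one way plain Common Lisp could emit RSS 2.0. All function names are hypothetical, and because there are no per-entry permalinks, each item carries a `guid` marked `isPermaLink="false"` so aggregators can still tell entries apart.

```lisp
;;; Hypothetical sketch: building an RSS 2.0 document from scraped entries.
;;; XML-ESCAPE, RSS-ITEM and RSS-FEED are illustrative names, not real code.

(defun xml-escape (string)
  "Return STRING with the XML special characters replaced by entities."
  (with-output-to-string (out)
    (loop for char across string
          do (case char
               (#\& (write-string "&amp;" out))
               (#\< (write-string "&lt;" out))
               (#\> (write-string "&gt;" out))
               (#\" (write-string "&quot;" out))
               (t (write-char char out))))))

(defun rss-item (title link guid description)
  "Return one RSS 2.0 <item> string.  GUID is marked isPermaLink=\"false\"
because the scraped page has no per-entry permalinks to point at."
  (format nil "<item><title>~A</title><link>~A</link><guid isPermaLink=\"false\">~A</guid><description>~A</description></item>"
          (xml-escape title) (xml-escape link)
          (xml-escape guid) (xml-escape description)))

(defun rss-feed (title link description items)
  "Return a complete RSS 2.0 document.  ITEMS is a list of
(title link guid description) lists, one per scraped entry."
  (format nil "<?xml version=\"1.0\" encoding=\"UTF-8\"?>~%<rss version=\"2.0\"><channel><title>~A</title><link>~A</link><description>~A</description>~{~A~}</channel></rss>~%"
          (xml-escape title) (xml-escape link) (xml-escape description)
          ;; Render each entry list as an <item> string for ~{~A~}.
          (mapcar (lambda (item) (apply #'rss-item item)) items)))
```

For example, `(rss-feed "Morning Improv" "http://example.com/" "Scraped feed" (list (list "Entry" "http://example.com/" "2005-09-06-1" "Text")))` would produce a one-item feed; the URL and the date-based guid scheme are placeholders, not what the real feed uses.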

Aaron: Scott has officially left the no-RSS ghetto.

The feed is already aggregated by LiveJournal, if you're into that.

Posted by jjwiseman at September 06, 2005 03:40 PM

Comments

Hi John,

I too wrote an RSS feed for someone's blog a couple of weeks ago. (I tried to tell you on the 6th and 7th, but I couldn't reach your blog from work; maybe it was my work's connection. Anyway ...) I used pxmlutils to read the HTML, but concluded I had to write my own "accessors" to get elements out of the parsed tree (effectively a DOM tree). That was not hard, but I thought pxmlutils (or some other package) would already have such routines. If you know of any I should have used, please let me know.

The DOM element accessor works by taking the path, through the DOM tree, of the element to retrieve (I'm sure you've seen this before). For instance, I learned where the blog title was located by parsing the blog with 'parse-html' into LHTML, reading that into the Emacs scratch buffer, and cruising it with the Emacs sexp navigation commands. This was *much* easier than reading even prettified HTML to locate the blog's content components. So, based on my scratch-buffer cruising, I defined the following path to the blog title:

(defparameter *blog-title-path*
  '(:html :head
    (:body :p
     (:div
      (:center
       (:table
        (:tbody
         (:tr :td :td
          (:td :p :p 'string :p)))))))))

Then to get at that element I said

(path->element *blog-title-path*)
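Rick's `path->element` itself isn't shown, so the following is only one guess at how such an accessor could interpret his path notation. It assumes Franz-style LHTML, where (as I understand it) a node is a string, a bare keyword, `(tag child ...)`, or `((tag attr value ...) child ...)`. Assumed semantics: the first keyword must match the current element's tag; each later keyword consumes the next child with that tag; a sublist descends into the next child with the sublist's tag; and the `'string` marker returns the next string child. Unlike the one-argument call above, this hypothetical version takes the parsed tree explicitly, and the helpers `lhtml-tag`, `lhtml-children` and `target-step-p` are illustrative names.

```lisp
;;; Hypothetical sketch of a path-based LHTML accessor.  The helper
;;; names and the two-argument PATH->ELEMENT are assumptions.

(defun lhtml-tag (node)
  "Return NODE's tag keyword, or NIL if NODE is a string."
  (cond ((stringp node) nil)
        ((symbolp node) node)             ; bare keyword element, e.g. :br
        ((consp (car node)) (caar node))  ; ((tag attr value ...) child ...)
        (t (car node))))                  ; (tag child ...)

(defun lhtml-children (node)
  "Return NODE's child list (empty for strings and bare keywords)."
  (if (consp node) (cdr node) nil))

(defun target-step-p (step)
  "True for the 'STRING marker, which the reader turns into (QUOTE STRING)."
  (and (consp step) (eq (car step) 'quote)))

(defun path->element (path node)
  "Follow PATH from NODE and return the matched string or element.
Return NIL if the root tag or some step finds no matching child."
  (unless (eq (lhtml-tag node) (car path))
    (return-from path->element nil))
  (let ((children (lhtml-children node)))
    ;; If every step is a plain keyword, the result is NODE itself.
    (dolist (step (cdr path) node)
      (if (target-step-p step)
          ;; Marker: return the next string child.
          (return (find-if #'stringp children))
          (let* ((tag (if (consp step) (car step) step))
                 (pos (position tag children :key #'lhtml-tag)))
            (cond ((null pos) (return nil))
                  ;; Sublist step: descend into the matching child.
                  ((consp step) (return (path->element step (nth pos children))))
                  ;; Keyword step: consume the matching child and move on.
                  (t (setf children (nthcdr (1+ pos) children)))))))))
```

Because each step searches forward with `position` instead of demanding the very next child, stray whitespace strings between elements don't break a path; the trade-off is that a slightly wrong path can silently match a later sibling.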

There are other details to talk about, such as text mastication routines. If you want, can we compare notes? I'm a relative newbie, so beware that you may not profit as much as I from such an exchange. :-)

Posted by: Rick Hanson on September 11, 2005 08:49 AM