I decided to spend a little time optimizing Montezuma indexing for speed, with the hope that I could get it to index the Planet Lisp archives within a reasonable amount of time.
Indexing 1000 pastes took about 100 seconds, as I mentioned the other day. Avoiding 6 million garbage-generating calls to subseq saved 10 seconds. Cutting down on the number of calls to string< and string> saved 4 seconds. Each was a significant speedup, but nothing dramatic.
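For anyone wondering what avoiding subseq can look like, here is a minimal sketch, not Montezuma's actual code. It assumes the comparisons were being made on freshly copied substrings; the function names below are made up for illustration. The standard string comparison functions accept :start1, :end1, :start2 and :end2 keywords, so a region of a larger string can be compared in place without allocating a copy.

```lisp
;; Illustrative sketch only. REGION<-CONSING and REGION<-IN-PLACE are
;; hypothetical names, not functions from Montezuma.

(defun region<-consing (buffer start end term)
  "Compare the region of BUFFER from START to END with TERM, copying it first."
  ;; SUBSEQ allocates a fresh string on every call, which becomes garbage
  ;; as soon as the comparison returns.
  (string< (subseq buffer start end) term))

(defun region<-in-place (buffer start end term)
  "Same comparison using STRING<'s bounding keywords, so no copy is made."
  (string< buffer term :start1 start :end1 end))

;; STRING< returns a mismatch index or NIL. The index is relative to the
;; start of BUFFER in the in-place version, so the two results should be
;; compared only as booleans.
(let ((buffer "lisp paste archive"))
  (list (and (region<-consing buffer 5 10 "planet") t)
        (and (region<-in-place buffer 5 10 "planet") t)))
;; => (T T)
```

Whether this matches the change that saved the 10 seconds is a guess; the point is only that the bounded form does the same comparison without generating garbage.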
After doing some of this benchmarking, the laptop I was using overheated and shut down. Since this time there were no cats lying on top and therefore no obvious scapegoats, I thought maybe the vents had just gotten clogged and so I blasted the hell out of it with canned air.
The comprehensive air blast cleaning cut 45 seconds off the benchmark time. It was like getting a processor upgrade.
Posted by jjwiseman at May 26, 2006 02:36 PM
All my old Lisp-hacker cronies are doing search now!
I liked this from the project page: "Montezuma is a Common Lisp port of Ferret. Ferret is a Ruby port of Lucene. Lucene is sort of Doug Cutting's Java version of Text Database (TDB), which he and Jan Pedersen developed at Xerox PARC, and which, to complete the circle, was written in Common Lisp (see "An Object-Oriented Architecture for Text Retrieval")."
ObYahooPlug: Doug Cutting is now a Yahoo! employee (though he's doing exactly the same open-source project work he was doing before), and Jan Pedersen runs the Relevance group at Yahoo! (he's my boss's boss).
Posted by: Tim Converse on May 26, 2006 03:43 PM