Lemonodor: Montezuma Benchmarking

November 20, 2006

I finally got around to doing some performance comparisons of Montezuma, Ferret and Lucene.

I'm glad that Montezuma is pretty close to Lucene, but wow, Ferret's performance is just awesome (helped by 65,000 or so lines of C code).

Posted by jjwiseman at November 20, 2006 04:28 PM

Comments

Can we "steal" those 65,000 lines of C? Assuming it's good-quality code with a public API, etc.

Posted by: paipas on November 20, 2006 04:59 PM

Montezuma 1.2? Is there any chance to get the code? I'd really like to be able to check out the source code from your VCS.

Posted by: R. Mattes on November 21, 2006 06:39 AM

Maybe this could help :)
http://jsnell.iki.fi/blog/archive/2006-11-19-sb-sprof.html

Posted by: dig on November 21, 2006 08:59 AM

I'm hoping that your goals stay the same after this benchmark: to get a 10x boost in performance! Incredible! :)

"I'm hoping Montezuma will have better performance than both Ferret and Lucene, not because I'm doing anything fancy, but because I'll be relying on native code-generating Common Lisp implementations."

Posted by: on November 21, 2006 11:01 AM

Do you have the suite of code and datasets you used for the benchmark handy? I'd like to play with it.

Posted by: on November 21, 2006 01:13 PM

First, you can download Montezuma manually[1] or just asdf-install it. Then follow the instructions at the Ferret wiki[2] to download the Reuters corpus and extract the article files (using Dave's little Ruby extraction program[3]). The Montezuma indexer for the Reuters corpus is available from the Montezuma site[4].

The "Montezuma 0.1.2" referred to in my tests is just a slight update to Montezuma 0.1.1; I'll release it officially soon.

[1] http://lemonodor.com/code/montezuma-0.1.1.tar.gz
[2] http://ferret.davebalmain.com/trac/wiki/FerretVsLucene
[3] http://ferret.davebalmain.com/trac/wiki/ReutersExtractionScript
[4] http://projects.heavymeta.org/montezuma/browser/trunk/montezuma/tests/corpora/reuters-21578/indexer.lisp

Posted by: John Wiseman on November 21, 2006 01:21 PM

I find those 65k lines of C code pretty unreadable. Every comment is like /* number */ or /* 65 */. I was going to make some Erlang bindings for cferret but was frightened away. Maybe I'll grow a pair and start wading through it.

Hey, I'm just now switching back to the Mac after a year-long journey on Ubuntu. What are the cool kids using for Lisp on these fancy Intel Macs?

Posted by: Steve Jenson on November 21, 2006 09:57 PM

Take a look at SBCL. They have wonderful documentation and prebuilt binaries, which are a breeze to install. Just download http://prdownloads.sourceforge.net/sbcl/sbcl-0.9.18-x86-darwin-binary.tar.bz2?download
and follow the included instructions.

Posted by: Justin Giancola on November 22, 2006 07:26 AM