November 20, 2006
Montezuma Benchmarking

[Image: Montezuma performance comparison chart]

I finally got around to doing some performance comparisons of Montezuma, Ferret and Lucene.

I'm glad that Montezuma is pretty close to Lucene, but wow, Ferret's performance is just awesome (helped by 65,000 or so lines of C code).

Posted by jjwiseman at November 20, 2006 04:28 PM
Comments

Can we "steal" those 65,000 lines of C? Assuming that it's good-quality code with a public API, etc.

Posted by: paipas on November 20, 2006 04:59 PM

Montezuma 1.2? Is there any chance to get the code? I'd really like to be able to check out the source code from your VCS.

Posted by: R. Mattes on November 21, 2006 06:39 AM

I'm hoping that your goals stay the same after this benchmark: to get a 10x boost in performance! Incredible! :)

"I'm hoping Montezuma will have better performance than both Ferret and Lucene, not because I'm doing anything fancy, but because I'll be relying on native code-generating Common Lisp implementations."

Posted by: on November 21, 2006 11:01 AM

Do you have the suite of code and datasets you used for the benchmark handy? I'd like to play with it.

Posted by: on November 21, 2006 01:13 PM

First, you can download Montezuma manually[1] or just asdf-install it. Then follow the instructions at the Ferret wiki[2] to download the Reuters corpus and extract the article files (using Dave's little Ruby extraction program[3]). The Montezuma indexer for the Reuters corpus is available from the Montezuma site[4].

The "Montezuma 0.1.2" referred to in my tests is just a slight update to Montezuma 0.1.1; I'll release it officially soon.

[1] http://lemonodor.com/code/montezuma-0.1.1.tar.gz
[2] http://ferret.davebalmain.com/trac/wiki/FerretVsLucene
[3] http://ferret.davebalmain.com/trac/wiki/ReutersExtractionScript
[4] http://projects.heavymeta.org/montezuma/browser/trunk/montezuma/tests/corpora/reuters-21578/indexer.lisp

Posted by: John Wiseman on November 21, 2006 01:21 PM

I find those 65k lines of C code to be pretty unreadable. Every comment is like /* number */ or /* 65 */. I was going to make some Erlang bindings for cferret but was frightened away. Maybe I'll grow a pair and start wading through it.

Hey, I'm just now switching back to the Mac after a year-long journey on Ubuntu. What are the cool kids using for Lisp on these fancy Intel Macs?

Posted by: Steve Jenson on November 21, 2006 09:57 PM

Take a look at SBCL. They have wonderful documentation and prebuilt binaries, which are a breeze to install. Just download http://prdownloads.sourceforge.net/sbcl/sbcl-0.9.18-x86-darwin-binary.tar.bz2?download and follow the included instructions.

Posted by: Justin Giancola on November 22, 2006 07:26 AM

A benchmark must include a test case and setup. Did you, for example, run the JVM in server mode or client mode? Did you run the Java code from a cold start, or did you run it several times?

This benchmark is of no value without that information. It should at least be reproducible.

Posted by: tjerk on December 10, 2007 07:35 AM

Yeah, being able to check out Montezuma would be a big help and would most probably bring a tidal wave of patches back to you... :)

It's very easy to set up a darcs repo if you have HTTP and SSH.

Posted by: attila on January 31, 2008 04:26 PM