I now have a pretty functional Spotlight indexer for Lisp files based on Jonathan Wight's Python indexer.
The heart of Jonathan's plugin is actually implemented in Python; it uses the parser module to parse the Python code being indexed and grovel through its abstract syntax trees. I began by fiddling with the Python code. You might expect that I would have started by trying to embed some Lisp environment into the plugin, but you would be wrong: that's too much pain even for me, and it offers no clear benefit that I could see. Coding in Python let me play around with parsing strategies more easily than going with straight Objective-C would have.
The performance of a Python-based plugin is, however, atrocious. The first version of my code, which just matched lines against a (precompiled) regular expression, indexed about 1.4 files/second (302 lines/second). At that rate, a rough estimate of the time it would take to index all the Lisp files on my PowerBook is an hour and a half. Unacceptable.
So I rewrote it in Objective-C. The resulting code runs about 16x faster than the Python version, even though it does more. It should be able to index every Lisp file I have on this machine in about five minutes.
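As a sanity check, the numbers above hang together. The sketch below is only back-of-the-envelope arithmetic on the figures quoted in this post; the total file count is implied by those figures rather than measured, and the variable names are made up for illustration.

```python
# Figures quoted in the post.
python_files_per_sec = 1.4   # measured rate of the Python version
python_lines_per_sec = 302   # same measurement, in lines
python_estimate_hours = 1.5  # "an hour and a half"
speedup = 16                 # rewritten version vs. Python version

# Implied (not measured) number of Lisp files on the machine.
implied_files = python_files_per_sec * python_estimate_hours * 3600

# Implied average file length, in lines.
lines_per_file = python_lines_per_sec / python_files_per_sec

# Time for the faster version to index the same number of files.
objc_minutes = implied_files / (python_files_per_sec * speedup) / 60

print(f"implied files:  {implied_files:.0f}")
print(f"lines per file: {lines_per_file:.0f}")
print(f"fast version:   {objc_minutes:.1f} minutes")
```

That works out to roughly 7,560 files and a bit under six minutes for the faster version, which is in line with the five-minute figure.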
Here's an example of the plugin in action. First, a test file, test.lisp:
(defun foo () 1 2 3)

;; Not handled very well yet.
(defun (setf foo) (whatever))

(defmacro oh-noe () (beep))

(defvar *oh-no* 1)
(defparameter *hee-ho* T)
(defconstant +thing+ :what-now)

(defclass i-dont-think-so () ())

(defstruct BRAIN-CELL owner alcohol-level)
(defstruct (RAT-BRAIN-CELL (:conc-name NIL)) color)

;; Haven't done these yet.
(defgeneric attack (self target))
(defmethod attack ((self giant-robot) (target puny-human)) (squish self target))
And here's the metadata that results from indexing it:
lem-airport:~/src/cm wiseman$ mdls test.lisp
test.lisp -------------
kMDItemAttributeChangeDate     = 2005-09-02 13:34:24 -0700
kMDItemContentCreationDate     = 2005-09-02 12:47:58 -0700
kMDItemContentModificationDate = 2005-09-02 12:51:07 -0700
[etc. Here's the good part:]
org_lisp_defclasses            = ("i-dont-think-so")
org_lisp_definitions           = (
    foo,
    "(setf",
    "oh-noe",
    "*oh-no*",
    "*hee-ho*",
    "+thing+",
    "i-dont-think-so",
    "BRAIN-CELL",
    "(RAT-BRAIN-CELL",
    attack,
    attack
)
org_lisp_defmacros             = ("oh-noe")
org_lisp_defstructs            = ("BRAIN-CELL", "RAT-BRAIN-CELL")
org_lisp_defuns                = (foo, "(setf")
org_lisp_defvars               = ("*oh-no*", "*hee-ho*", "+thing+")
I decided to use nothing fancier than regular expressions to parse files, and you can see they need some tweaking.
NSString *LispDef_pat       = @"(?i)^\\(def[^\\s]*[\\s\\']+([^\\s\\)]+)";
NSString *LispDefun_pat     = @"(?i)^\\(defun\\s+([^\\s\\)]+)";
NSString *LispDefmacro_pat  = @"(?i)^\\(defmacro\\s+([^\\s\\)]+)";
NSString *LispDefclass_pat  = @"(?i)^\\(defclass\\s+([^\\s\\)]+)";
NSString *LispDefstruct_pat = @"(?i)^\\(defstruct\\s+\\(?([^\\s\\)]+)";
NSString *LispDefvar_pat    = @"(?i)^\\((?:defvar|defparameter|defconstant)\\s+([^\\s\\)]+)";
Do those look reasonable? Is there anything I'm missing? (First person to give me a regex that correctly parses symbols escaped with the various methods allowed in Common Lisp gets... my pity.)
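For anyone who wants to poke at those patterns without building the plugin, here is a rough Python translation. It is a sketch, not the plugin's code: the function name `index_lisp` is made up, matching one line at a time is an assumption borrowed from the description of the first Python version, and `re.IGNORECASE` stands in for the inline `(?i)` flag. The attribute names are the ones shown in the mdls output.

```python
import re

# Python versions of the Objective-C patterns (backslashes un-doubled).
# Keys are the attribute names from the mdls output.
PATTERNS = {
    "org_lisp_definitions": r"^\(def[^\s]*[\s\']+([^\s\)]+)",
    "org_lisp_defuns": r"^\(defun\s+([^\s\)]+)",
    "org_lisp_defmacros": r"^\(defmacro\s+([^\s\)]+)",
    "org_lisp_defclasses": r"^\(defclass\s+([^\s\)]+)",
    "org_lisp_defstructs": r"^\(defstruct\s+\(?([^\s\)]+)",
    "org_lisp_defvars": r"^\((?:defvar|defparameter|defconstant)\s+([^\s\)]+)",
}
COMPILED = {attr: re.compile(p, re.IGNORECASE) for attr, p in PATTERNS.items()}


def index_lisp(source):
    """Return {attribute: [captured names]} for forms that start a line."""
    found = {attr: [] for attr in COMPILED}
    for line in source.splitlines():
        for attr, rx in COMPILED.items():
            m = rx.match(line)
            if m:
                found[attr].append(m.group(1))
    return found


TEST_FILE = """\
(defun foo () 1 2 3)
;; Not handled very well yet.
(defun (setf foo) (whatever))
(defmacro oh-noe () (beep))
(defvar *oh-no* 1)
(defparameter *hee-ho* T)
(defconstant +thing+ :what-now)
(defclass i-dont-think-so () ())
(defstruct BRAIN-CELL owner alcohol-level)
(defstruct (RAT-BRAIN-CELL (:conc-name NIL)) color)
;; Haven't done these yet.
(defgeneric attack (self target))
(defmethod attack ((self giant-robot) (target puny-human)) (squish self target))
"""

if __name__ == "__main__":
    for attr, names in sorted(index_lisp(TEST_FILE).items()):
        print(attr, "=", names)
```

Run against the test file, this reproduces the same warts as the mdls output: `(setf` shows up as a function name, and the catch-all definitions pattern keeps the leading paren on `(RAT-BRAIN-CELL` while the defstruct pattern strips it.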
There's really nothing very Lisp-specific about this plugin. It wouldn't be that hard to extend it into some sort of universal regex-based Spotlight indexer for other text-based formats, though you'd have to do some fancy on-the-fly modification of the schema that describes which file types it should be used for.
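To make the "universal" idea concrete, here is a minimal sketch of what the table-driven part might look like. Everything in it is hypothetical: the `RULES` table, the `index_text` function, and the `example_python_defs` attribute name are invented for illustration, and it says nothing about the hard part, which is getting Spotlight's schema to go along with it.

```python
import re
from pathlib import PurePath

# Hypothetical rule table: file extension -> {attribute name: pattern}.
# Each pattern has exactly one capture group, the name to record.
RULES = {
    ".lisp": {
        "org_lisp_defuns": re.compile(
            r"^\(defun\s+([^\s\)]+)", re.IGNORECASE | re.MULTILINE
        ),
    },
    ".py": {
        "example_python_defs": re.compile(r"^[ \t]*def\s+(\w+)", re.MULTILINE),
    },
}


def index_text(filename, text):
    """Apply whichever rules match the file's extension; {} if none do."""
    rules = RULES.get(PurePath(filename).suffix.lower(), {})
    return {attr: rx.findall(text) for attr, rx in rules.items()}


if __name__ == "__main__":
    print(index_text("a.lisp", "(defun foo () 1)\n(DEFUN bar ())\n"))
    print(index_text("b.py", "def f():\n    pass\n"))
```

Supporting a new format would then be a matter of adding an entry to the table rather than writing new matching code.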
I'd like to extend it to record defgeneric and defmethod forms, and then some simple who-calls functionality. Then you'll get the code and we'll see if this sort of indexing is actually useful to anyone.
Update: The importer is complete; see this post for details and availability.

Posted by jjwiseman at September 02, 2005 01:54 PM