September 02, 2005
Lisp Metadata Importer I

I now have a pretty functional Spotlight indexer for Lisp files based on Jonathan Wight's Python indexer.

The heart of Jonathan's plugin is actually implemented in Python, and uses the parser module to parse the Python code being indexed and grovel through its abstract syntax trees. I began by fiddling with the Python code. You might expect that I would have started by trying to embed some Lisp environment into the plugin, but you would be wrong; that's too much pain even for me, and it offers no clear benefit that I could see. Coding in Python let me play around with parsing strategies more easily than going with straight Objective-C would have.

The performance of a Python-based plugin is, however, atrocious. The first version of my code, which just matched lines against a (precompiled) regular expression, indexed about 1.4 files/second (302 lines/second). A rough estimate of the time it would take to index all the Lisp files on my PowerBook is an hour and a half. Unacceptable.

So I rewrote it in Objective-C. The resulting code runs about 16x faster than the Python version, even though it does more. It should be able to index every Lisp file I have on this machine in about five minutes.
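
As a rough sanity check, the arithmetic behind those estimates can be sketched in Python. The total file count is not given in the post; it is back-computed here from the quoted rate and the hour-and-a-half estimate, so treat it as an assumption rather than a measurement.

```python
# Back-of-the-envelope check of the indexing estimates.
# Assumption: the file count is inferred from the quoted numbers, not measured.
PYTHON_RATE = 1.4          # files/second, as quoted
PYTHON_ESTIMATE_MIN = 90   # "an hour and a half"
SPEEDUP = 16               # Objective-C version relative to the Python version

implied_files = PYTHON_RATE * PYTHON_ESTIMATE_MIN * 60
objc_estimate_min = PYTHON_ESTIMATE_MIN / SPEEDUP

print(f"implied file count: about {implied_files:.0f}")
print(f"Objective-C estimate: about {objc_estimate_min:.1f} minutes")
```

Under those assumptions the machine holds roughly 7,500 Lisp files, and a 16x speedup brings the estimate down to between five and six minutes.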

Here's an example of the plugin in action. First, a test file, test.lisp:

(defun foo ()
  1 2 3)

;; Not handled very well yet.
(defun (setf foo) (whatever))

(defmacro oh-noe () (beep))

(defvar *oh-no* 1)

(defparameter *hee-ho* T)

(defconstant +thing+ :what-now)

(defclass i-dont-think-so ()
  ())

(defstruct BRAIN-CELL
  owner
  alcohol-level)

(defstruct (RAT-BRAIN-CELL (:conc-name NIL))
  color)

;; Haven't done these yet.
(defgeneric attack (self target))

(defmethod attack ((self giant-robot) (target puny-human))
  (squish self target))

And here's the metadata that results from indexing it:

lem-airport:~/src/cm wiseman$ mdls test.lisp
test.lisp -------------
kMDItemAttributeChangeDate     = 2005-09-02 13:34:24 -0700
kMDItemContentCreationDate     = 2005-09-02 12:47:58 -0700
kMDItemContentModificationDate = 2005-09-02 12:51:07 -0700

[etc. Here's the good part:]

org_lisp_defclasses            = ("i-dont-think-so")
org_lisp_definitions           = (
    foo, 
    "(setf", 
    "oh-noe", 
    "*oh-no*", 
    "*hee-ho*", 
    "+thing+", 
    "i-dont-think-so", 
    "BRAIN-CELL", 
    "(RAT-BRAIN-CELL", 
    attack, 
    attack
)
org_lisp_defmacros             = ("oh-noe")
org_lisp_defstructs            = ("BRAIN-CELL", "RAT-BRAIN-CELL")
org_lisp_defuns                = (foo, "(setf")
org_lisp_defvars               = ("*oh-no*", "*hee-ho*", "+thing+")

I decided to use nothing fancier than regular expressions to parse files, and you can see they need some tweaking.

NSString *LispDef_pat = @"(?i)^\\(def[^\\s]*[\\s\\']+([^\\s\\)]+)";
NSString *LispDefun_pat = @"(?i)^\\(defun\\s+([^\\s\\)]+)";
NSString *LispDefmacro_pat = @"(?i)^\\(defmacro\\s+([^\\s\\)]+)";
NSString *LispDefclass_pat = @"(?i)^\\(defclass\\s+([^\\s\\)]+)";
NSString *LispDefstruct_pat = @"(?i)^\\(defstruct\\s+\\(?([^\\s\\)]+)";
NSString *LispDefvar_pat = @"(?i)^\\((?:defvar|defparameter|defconstant)\\s+([^\\s\\)]+)";
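
For readers who want to experiment without building the plugin, here is a minimal Python sketch of the same line-by-line matching. It assumes each pattern is tried against one line at a time, anchored at the start of the line; the attribute names are copied from the mdls output, and `index_lisp` is a hypothetical helper, not part of the plugin.

```python
import re

# Python translations of the Objective-C pattern strings.
# Assumption: patterns are applied per line, anchored at the line start.
PATTERNS = {
    "org_lisp_definitions": r"\(def[^\s]*[\s']+([^\s)]+)",
    "org_lisp_defuns": r"\(defun\s+([^\s)]+)",
    "org_lisp_defmacros": r"\(defmacro\s+([^\s)]+)",
    "org_lisp_defclasses": r"\(defclass\s+([^\s)]+)",
    "org_lisp_defstructs": r"\(defstruct\s+\(?([^\s)]+)",
    "org_lisp_defvars": r"\((?:defvar|defparameter|defconstant)\s+([^\s)]+)",
}
COMPILED = {name: re.compile(pat, re.IGNORECASE) for name, pat in PATTERNS.items()}


def index_lisp(text):
    """Return {attribute: [captured names]} for every matching line."""
    found = {name: [] for name in COMPILED}
    for line in text.splitlines():
        for name, regex in COMPILED.items():
            match = regex.match(line)  # match() anchors at the line start, like ^
            if match:
                found[name].append(match.group(1))
    return found


sample = """(defun foo ()
  1 2 3)
(defun (setf foo) (whatever))
(defstruct (RAT-BRAIN-CELL (:conc-name NIL))
  color)
"""
result = index_lisp(sample)
print(result["org_lisp_defuns"])       # ['foo', '(setf']
print(result["org_lisp_defstructs"])   # ['RAT-BRAIN-CELL']
print(result["org_lisp_definitions"])  # ['foo', '(setf', '(RAT-BRAIN-CELL']
```

The sketch reproduces the same rough edges seen in the metadata: the name capture stops at the first whitespace, so `(setf foo)` is truncated to `(setf`, and the catch-all pattern keeps the opening parenthesis of the defstruct name.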

Do those look reasonable? Is there anything I'm missing? (First person to give me a regex that correctly parses symbols escaped with the various methods allowed in Common Lisp gets... my pity.)
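
One possible tweak for the `(setf foo)` case, sketched in Python: let the captured name be either a parenthesized `(setf NAME)` form or a plain token. The pattern and the `defun_name` helper are hypothetical, and the sketch makes no attempt at escaped symbols (`|...|` or backslash escapes).

```python
import re

# Hypothetical replacement for the defun pattern: the name is either a
# two-element (setf NAME) list or a run of non-space, non-paren characters.
DEFUN_NAME = re.compile(
    r"\(defun\s+(\(setf\s+[^\s()]+\)|[^\s()]+)",
    re.IGNORECASE,
)


def defun_name(line):
    """Return the function name defined on this line, or None."""
    match = DEFUN_NAME.match(line)
    return match.group(1) if match else None


print(defun_name("(defun foo ()"))                  # foo
print(defun_name("(defun (setf foo) (whatever))"))  # (setf foo)
print(defun_name("  (defun nested ()"))             # None
```

Trying the `(setf ...)` alternative first matters: with the order reversed, the plain-token branch would fail on the opening parenthesis and the regex would still fall through correctly, but listing the more specific form first makes the intent obvious.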

There's really nothing very Lisp-specific about this plugin. It wouldn't be that hard to extend it to become some sort of universal regex-based Spotlight indexer for other text-based formats, though you'd have to do some fancy on-the-fly modification of the schema describing the file types for which it should be used.
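
The table-driven idea might look something like the following Python sketch. Everything here is hypothetical: the `RULES` table, the `extract_metadata` helper, and the `.py` attribute names are made up for illustration, and the sketch does not touch the harder schema problem mentioned above.

```python
import os
import re

# Hypothetical rule table: file extension -> {metadata attribute: line pattern}.
# The Lisp entries reuse attribute names from the mdls output; the ".py"
# entries and their attribute names are invented for this example.
RULES = {
    ".lisp": {
        "org_lisp_defuns": r"(?i)\(defun\s+([^\s)]+)",
        "org_lisp_defmacros": r"(?i)\(defmacro\s+([^\s)]+)",
    },
    ".py": {
        "example_python_functions": r"def\s+(\w+)",
        "example_python_classes": r"class\s+(\w+)",
    },
}
COMPILED_RULES = {
    ext: {attr: re.compile(pat) for attr, pat in attrs.items()}
    for ext, attrs in RULES.items()
}


def extract_metadata(filename, text):
    """Return {attribute: [names]} using the rules for the file's extension."""
    ext = os.path.splitext(filename)[1].lower()
    rules = COMPILED_RULES.get(ext, {})  # unknown extensions yield no metadata
    metadata = {attr: [] for attr in rules}
    for line in text.splitlines():
        for attr, regex in rules.items():
            match = regex.match(line)  # only top-level, unindented forms match
            if match:
                metadata[attr].append(match.group(1))
    return metadata


print(extract_metadata("test.lisp", "(defmacro oh-noe () (beep))\n"))
# {'org_lisp_defuns': [], 'org_lisp_defmacros': ['oh-noe']}
```

Adding a format then means adding a table entry rather than writing new matching code, which is the appeal of the approach.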

I'd like to extend it to record defgeneric and defmethod forms, and then some simple who-calls functionality. Then you'll get the code and we'll see if this sort of indexing is actually useful to anyone.

Update: The importer is complete; see this post for details and availability.

Posted by jjwiseman at September 02, 2005 01:54 PM
Comments

Nice work.

But did you work out why the Python implementation was so slow? One thing I did to improve performance was to make sure I included the compiled Python code with the importer (that way Python wouldn't recompile each time the importer imports a file). Made a big difference.

Posted by: Jonathan Wight on September 8, 2005 05:03 PM

Also, I think it would be good if we could share tags between all importers that import source code. I've spoken to a developer who worked on a Ruby importer and we're trying to coordinate our efforts (so that we share function and class tags, among others). Sound like a good idea?

Posted by: Jonathan Wight on September 8, 2005 05:05 PM

Here is what I use to re-parse CL code in Emacs:

(defun skip-to-next-sexp ()
  (interactive)
  (while (or
          (looking-at "\\([ \n\t\v\f\r]+\\)") ; spaces
          (looking-at "\\(;.*$\\)") ; ;xxx comment
          (looking-at "\\(#|\\([^|]\\||[^#]\\)*|#\\)")) ; #|xxx|# comment
    (goto-char (match-end 0))));;skip-to-next-sexp


(defun cl-looking-at-what ()
  (cond
   ((looking-at "[ \n\t\v\f\r]") :space)
   ((looking-at ";") :semicolon-comment) ; ;xxx
   ((looking-at "#|") :sharp-comment) ; #|xxx|#
   ((looking-at "\"") :string) ; "xx\"x"
   ((looking-at "(") :beginning-of-list)
   ((looking-at ")") :end-of-list)
   ((looking-at ",@") :comma-at)
   ((looking-at ",") :comma)
   ((looking-at "'") :quote)
   ((looking-at "`") :backquote)
   (t :atom)));;cl-looking-at-what


(defun cl-skip-over (&optional what)
  (setf what (or what (cl-looking-at-what)))
  (case what
    ((:space) (looking-at "[ \n\t\v\f\r]+"))
    ((:semicolon-comment) (looking-at ";.*$"))
    ((:sharp-comment) (looking-at "#|\\([^|]\\||[^#]\\)*|#"))
    ((:string) (looking-at "\"\\([^\\\\\"]\\|\\\\.\\|\\\\\n\\)*\""))
    ((:beginning-of-list) (looking-at "("))
    ((:end-of-list) (looking-at ")"))
    ((:quote) (looking-at "'"))
    ((:backquote) (looking-at "`"))
    ((:comma) (looking-at ","))
    ((:comma-at) (looking-at ",@"))
    ((:atom)
     (looking-at
      "\\(|[^|]*|\\|\\\\.\\|#[^|]\\|[^\"\\#|;()'`, \n\t\v\f\r\\]\\)+"))
    (otherwise (error "Cannot skip over %S" what)))
  (goto-char (match-end 0)));;cl-skip-over


(defun cl-forward (n)
  (interactive "p")
  (setf n (or n 1))
  (dotimes (i n)
    (cl-skip-over)));;cl-forward


(defun cl-what-is-at-point ()
  (interactive)
  (message "%s" (cl-looking-at-what)))


(defun case-lisp-region (start end transform)
"
DO: Applies transform on all subregions from start to end that are not
a quoted character, a quote symbol, a comment (;... or #|...|#),
or a string.
"
(save-excursion
(goto-char start)
(while ( (while (and ( (goto-char (match-end 0)))
(funcall transform start (min end (point)))
(cl-skip-over)
(setq start (point)))));;case-lisp-region

Posted by: Pascal Bourguignon on September 10, 2005 03:18 AM

Have you considered using Exuberant Ctags to generate your tags? I haven't done the comparison, but it supports a boatload of languages, and its regexps are probably pretty well-tweaked.

Posted by: sean on September 10, 2005 08:44 AM

Hm. I wonder how easy XLISP would be to embed in this plugin?

Posted by: John Wiseman on September 12, 2005 07:49 PM