January 24, 2006
SBCL Crashes Constantly (Not Really)

[Screenshot: Mac OS X dialog reading "The application sbcl quit unexpectedly."]

Every time I start SBCL in OS X, I get the dialog above.

This confused me for a long time. The message said that SBCL “quit unexpectedly”, yet I was always left with what seemed like a perfectly functioning SBCL prompt.

Poking around line 345 of runtime.c, as implicated in the backtrace, didn't help much:

Line   runtime.c
344    create_initial_thread(initial_function);
345    lose("CATS. CATS ARE NICE.");
346    return 0;

Great.

Nobody on #lisp seemed to be having this problem (they probably were, but they just didn't know it, as we'll see below), and since everything seemed to work I tried to forget about the mystery and not get annoyed by all the spam filling my CrashReporter logs from SBCL-based cron jobs. Building SBCL from source was quite a chore, though, as there would typically be three or four dozen crash dialogs I'd have to click through.

Jan 21 13:35:57 Johns-Powerbook crashdump[16974]: sbcl crashed
Jan 21 13:35:57 Johns-Powerbook crashdump[16974]: crash report written to: /Users/wiseman/Library/Logs/CrashReporter/sbcl.crash.log
Jan 21 13:35:59 Johns-Powerbook crashdump[16975]: sbcl crashed
Jan 21 13:35:59 Johns-Powerbook crashdump[16975]: crash report written to: /Users/wiseman/Library/Logs/CrashReporter/sbcl.crash.log
Jan 21 13:36:04 Johns-Powerbook crashdump[16986]: sbcl crashed
Jan 21 13:36:04 Johns-Powerbook crashdump[16986]: crash report written to: /Users/wiseman/Library/Logs/CrashReporter/sbcl.crash.log
Jan 21 13:36:08 Johns-Powerbook crashdump[16987]: sbcl crashed
Jan 21 13:36:08 Johns-Powerbook crashdump[16987]: crash report written to: /Users/wiseman/Library/Logs/CrashReporter/sbcl.crash.log
Jan 21 13:36:16 Johns-Powerbook crashdump[16993]: sbcl crashed
Jan 21 13:36:17 Johns-Powerbook crashdump[16993]: crash report written to: /Users/wiseman/Library/Logs/CrashReporter/sbcl.crash.log
Jan 21 13:36:23 Johns-Powerbook crashdump[16999]: sbcl crashed
Jan 21 13:36:23 Johns-Powerbook crashdump[16999]: crash report written to: /Users/wiseman/Library/Logs/CrashReporter/sbcl.crash.log
Jan 21 13:36:24 Johns-Powerbook crashdump[17000]: sbcl crashed
Jan 21 13:36:24 Johns-Powerbook crashdump[17000]: crash report written to: /Users/wiseman/Library/Logs/CrashReporter/sbcl.crash.log
# etc.

Finally, months later, I found some mail on the sbcl-devel mailing list in which Bruno Haible said he saw the same behavior, and Pascal Costanza explained what was happening.

It turns out that there's a bug in Apple's Crash Reporter whereby programs that generate Mach exceptions that are subsequently handled by a signal handler will cause CrashReporter to “erroneously generate a crash log for your program.” Fortunately for Apple, most programs don't expect to generate SIGSEGV signals and live; unfortunately for Lispers, catching segfaults is part of a relatively common way of doing Lisp-style memory management (this is the same thing that prevents OpenMCL from running under Apple's Rosetta emulator).
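For anyone who hasn't seen the trick, here is a minimal sketch in C of using a segfault as a write barrier. It assumes POSIX sigaction and mprotect, and a single read-only page stands in for the heap. The names (install_write_barrier, on_fault, barrier_demo) are made up for this post and are not SBCL's; this shows the general pattern only, not how SBCL's collector is actually written.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical one-page "heap"; a real runtime tracks many pages. */
static char *heap_page;
static size_t page_size;
static volatile sig_atomic_t barrier_hits;

/* Called on a fault. If the address is inside our protected page, note
   the write, unprotect the page, and return so the store is retried.
   (mprotect is not formally async-signal-safe, but this is the usual
   practice for synchronous faults.) Anything else is a real crash. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    char *addr = (char *)info->si_addr;
    (void)sig;
    (void)ctx;
    if (addr >= heap_page && addr < heap_page + page_size) {
        barrier_hits++;
        mprotect(heap_page, page_size, PROT_READ | PROT_WRITE);
        return;
    }
    _exit(1);
}

/* Map one read-only page and install the handler. Returns 0 on success. */
int install_write_barrier(void)
{
    struct sigaction sa;
    long ps = sysconf(_SC_PAGESIZE);
    void *p;

    if (ps <= 0) return -1;
    page_size = (size_t)ps;
    p = mmap(NULL, page_size, PROT_READ, MAP_PRIVATE | MAP_ANON, -1, 0);
    if (p == MAP_FAILED) return -1;
    heap_page = p;
    barrier_hits = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    /* Depending on the OS, a protection fault arrives as SIGSEGV or SIGBUS. */
    if (sigaction(SIGSEGV, &sa, NULL) != 0) return -1;
    if (sigaction(SIGBUS, &sa, NULL) != 0) return -1;
    return 0;
}

/* Write to the protected page once. Returns 1 if the store faulted
   exactly once and then succeeded, 0 if not, -1 on setup failure. */
int barrier_demo(void)
{
    volatile char *p;

    if (install_write_barrier() != 0) return -1;
    assert(heap_page != NULL);
    p = heap_page;
    p[0] = 42; /* faults, the handler unprotects, the store is retried */
    return barrier_hits == 1 && p[0] == 42;
}
```

The program never dies: the fault is expected, handled, and execution carries on. From the description above, that handled fault is exactly the kind of event CrashReporter mistakes for a crash.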

There are two workarounds. One is to turn off Crash Reporter dialogs or logs (I'm guessing people on #lisp didn't know what I was talking about because they had CrashReporter configured to log crashes but not pop up the sort of dialogs I was seeing). This solution is not my favorite: you can't change the behavior only for SBCL, because it involves changes to system-wide settings. And I like seeing the extra information when an app really does crash.

The other workaround is to implement Mach-level exception handlers as opposed to regular Unix signal handlers. This is the approach that OpenMCL and clisp take (and presumably ACL and LispWorks since I've never seen a crash dialog when using them).

One could take the position that this is a bug (documented, even) in Apple's software and SBCL shouldn't have to change anything, and people can just stop logging crashes. This, I think, would be a mistake. Do not spite your users to make a point, even if you're in the right.

The main point of this post is to document this issue so that those who come after me are less likely to think they're insane. A secondary motivation is to spur SBCL developers into adding Mach exception handlers.

I looked at trying to fix it myself, but as a first SBCL hacking project it was a little beyond me. But given what looks like a fundamental assumption in SBCL that signals are used for memory management (I wonder what the recent Windows port has to say about that assumption), it might be best to try the same strategy that OpenMCL uses (which is lucidly explained by Gary Byers in the code; search for “Mach's exception mechanism”): Install a Mach exception handler that catches exceptions, creates fake sigcontexts, and passes them to existing signal handling code. Sounds so simple! Code it up, mang!
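To show just the shape of that last step, here is a toy sketch in C. Everything in it is hypothetical: sketch_exception_info stands in for whatever a real Mach exception message and thread state would provide, and record_handler stands in for the runtime's existing signal handler. It makes no Mach calls at all and passes no fake context, so it covers only the "translate and forward" part of the idea.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the data a Mach exception would carry. */
enum sketch_exception { SKETCH_BAD_ACCESS = 1, SKETCH_ARITHMETIC = 3 };

struct sketch_exception_info {
    enum sketch_exception kind;
    void *fault_addr;
};

typedef void (*sigaction_fn)(int, siginfo_t *, void *);

/* Trivial stand-in for the existing signal handler: it only records
   what it was called with. */
static int seen_signal;
static void *seen_addr;

static void record_handler(int sig, siginfo_t *info, void *ctx)
{
    (void)ctx;
    seen_signal = sig;
    seen_addr = info->si_addr;
}

/* Map an exception kind to the signal the existing handlers expect.
   Returns 0 for kinds this sketch does not translate. */
int sketch_signal_for(enum sketch_exception kind)
{
    switch (kind) {
    case SKETCH_BAD_ACCESS: return SIGSEGV;
    case SKETCH_ARITHMETIC: return SIGFPE;
    }
    return 0;
}

/* Build a fake siginfo and call the existing handler directly.
   Returns 0 if forwarded, -1 if the exception should be passed on. */
int sketch_forward(const struct sketch_exception_info *e, sigaction_fn handler)
{
    siginfo_t info;
    int sig;

    assert(e != NULL);
    sig = sketch_signal_for(e->kind);
    if (sig == 0 || handler == NULL) return -1;

    memset(&info, 0, sizeof info);
    info.si_signo = sig;
    info.si_addr = e->fault_addr;
    /* A real version would also build a fake context from the faulting
       thread's register state; this sketch passes NULL instead. */
    handler(sig, &info, NULL);
    return 0;
}
```

Calling sketch_forward with a SKETCH_BAD_ACCESS record and record_handler leaves SIGSEGV in seen_signal and the fault address in seen_addr, which is all the existing handler would need to know in this toy version.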

Actually returning from the signal handler is another issue.

Posted by jjwiseman at January 24, 2006 04:47 PM
Comments

What's with all the lisp-related posts? Is this weblog about lisp or something?

Posted by: geoff on January 24, 2006 06:27 PM

Sorry! I'm slipping.

Posted by: John Wiseman on January 24, 2006 06:34 PM

It's not to spite all our users, John; it's just to spite you.

Posted by: Christophe Rhodes on January 25, 2006 03:14 AM

It seems that it does not matter for SBCL, since nobody really uses SBCL on Mac OS X or Darwin (which seems to be some kind of 'Open Source')? Or nobody looked deep enough to find that issue? Anyway the response from Christophe Rhodes shows the usual arrogance. There are things that Apple does not fix or change, still others find a way to work around that. If some 'free' Lisp is just roughly ported to a commercial OS and then left unsupported, I would just mark the implementation as unsupported and experimental on that platform.

The SBCL port to Mac OS X is simply buggy itself, since the problem is known, no action has been taken on either side and no workaround (workarounds that seem to be possible) has been explored.

Posted by: foo on January 25, 2006 06:30 AM

Thank you, Mr. foo, for explaining how SBCL is buggy because I haven't changed it to work around a bug in the operating system.

To quote John's post: "Nobody on #lisp seemed to be having this problem" - and that's right. SBCL will cause an entry to be written to a CrashReporter log on each triggered GC on a fresh-out-of-the-box OS X system. These logs are silent; while they might cause a slight performance degradation, changing how SBCL handles signals on OS X to be different from every other platform we support(*) to get around it is strictly an enhancement, and one that I don't particularly feel like working on.

The actual problem here is that John has configured his system to display this dialog. So if it bothers him that SBCL triggers it, then John should do some of the footwork to adapt SBCL to the mach exception handler strategy that OpenMCL uses. I don't feel any more motivated to do this work myself by John's post, but I'm not opposed to answering questions about SBCL internals and runtime on Darwin, and I'm sure that Gary would not be opposed to answering questions about Mach exception handlers either.

* Caveat: I have no idea how the win32 port works.

Posted by: Brian Mastenbrook on January 25, 2006 07:18 AM

Thanks for reminding us why lisp is dead, assholes! :D :D

Posted by: abbazabba on January 25, 2006 08:43 AM

"Thanks for reminding us why lisp is dead" - hmm, so why is FreeBSD still around, huh? If it bothers you, fix it. In other cases you just have to wait till someone does it for you.

But I must agree with John: leaving it like this is not a good idea in the long run. Performance, anyone?

Posted by: axquan on January 25, 2006 09:58 AM

I tend to agree with the SBCL guys on this. If this is a problem, someone who feels that pain should fix it. I use SBCL on OS X, and this problem doesn't bother me. Granted, the logs are being generated, but that doesn't bother me. There are lots of things in SBCL to be improved, and this seems to fall nearer the bottom of the list. My $0.02.

See also Christophe's post at: http://www.advogato.org/person/crhodes/diary.html?start=94

Posted by: Chad Harrington on January 25, 2006 11:28 AM

"lose("CATS. CATS ARE NICE.");" !!!???

/me bursts out laughing. I love that.

"Anyway the response from Christophe Rhodes shows the usual arrogance."

/me laughs some more.

What a humor-impaired in-duh-vidual.

Posted by: Larry Clapp on January 25, 2006 12:16 PM

I totally don't get your association between lisp and FreeBSD, axquan. FreeBSD is a thriving UNIX implementation, while sbcl is a language implementation that seems to be into marginalizing itself (and hurting lisp's reputation, to be sure) by not supporting platforms other than ones that the assholish implementors prefer.

Compare and contrast with the behaviour of persons writing and maintaining other open-source programming language implementations, who aren't completely hostile to people using a different OS.

Then, once you miss the point, go back to wondering why lisp isn't as popular as ruby.

(or maybe writing a clone of reddit? how's that going?)

Posted by: abbazabba on January 25, 2006 12:30 PM

Holy crap O_O ... Are you serious? "Arrogant" and "assholish"? The only arrogant assholes are the ones complaining like this!

These people do it of their own free will, and expect nothing in return for hours-and-hours of pain and hard work. How can you even begin to complain in this manner?

Spoiled brats like the ones posting here deserve a good spanking and should grow up!

No, I'm saying thumbs up for the SBCL-guys! I love your work, and of course I'm hoping it will be supported and reach out to as many platforms and people as possible. Still, this is not how to get it done.

(I'm still thinking this must be a joke, but I'm having a bad day anyways ..)

Posted by: Lars Rune Nøstdal on January 25, 2006 04:02 PM

"Compare and contrast with the behaviour of persons writing and maintaining other open-source programming language implementations, who aren't completely hostile to people using a different OS." - that would explain the many hours I've put into maintaining the SBCL port to OS X?

Whatever. You're a troll.

Posted by: Brian Mastenbrook on January 25, 2006 05:25 PM

I haven't tried to confirm this, but I suspect that CrashReporter is establishing a Mach task-level exception handler. (A Mach "task" is a very similar concept to a Unix "process".)

If an exception happens in a given thread (in a given task), the Mach kernel tries to invoke (via a message send) a thread-level exception handler; if that handler doesn't exist or if it returns a non-zero value, a task-level exception handler is called (if it exists), and if that fails, a POSIX signal is raised and we're back in terra cognita.

If I'm remembering all of this correctly, then it seems that a partial solution would be to install a task-level exception handler that simply fails unconditionally. That'd suppress all crash logging in the lisp (suboptimal, but probably better than crash logging of routine, handled exceptions), wouldn't require any changes to existing signal-handling code, and is much simpler than actually doing what OpenMCL does at the thread level.

Posted by: Gary Byers on January 26, 2006 04:32 AM

WARNING: This problem appears to be worse in the next pre-release version of OS X, Leopard. The performance degradation is significant, very significant. Those who think writing a file every few seconds "isn't really a problem" seem a bit daft to me to start with. Unfortunately I have neither the time nor the Lisp internals expertise to fix this myself.

Posted by: Sister Snape on August 26, 2006 06:39 PM