January 23, 2002
We are running into some of the limitations of cmucl's threading implementation. It's annoying, but mostly because I should have looked this up when we first started using cmucl instead of just checking that multiprocessing was mentioned somewhere in the EncyCMUCLopedia.
It turns out that cmucl has user-level, cooperative threading, which means that when, for example, I create a Lisp thread and inside it call a function to make a socket connection to some remote host that isn't online, Lisp is completely dead to the world until the connect system call decides to return.
No other Lisp threads will run until that happens. No http requests will be served, no nothing. The listener is dead. (Sometimes so dead that SIGINT isn't enough to wake it up.)
I'm not sure what the best solution to this problem is.
- Try to eliminate all of these sorts of delays: set socket options, etc. Far from foolproof, with lots of opportunity to miss something, or to have a delay inside code that is not under your control.
- Try ACL under Linux. It uses user-level threading under Unix, but maybe it is preemptive. Oops, I just checked, and foreign code, at least, will not be preempted. Never mind.
- Actually do some forking and run anything that could take a significant amount of time in a separate process.
- Check whether Douglas T. Crosher has finished his kernel-level threads package for cmucl, and how much it costs.
- Turn on cmucl's SIGALRM thread preemption mechanism. This feature is never mentioned by anyone without saying in the same breath "cmucl's code is not guaranteed to be thread-safe!"
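The forking option is easy to sketch outside of Lisp. What follows is a rough Python illustration, not cmucl code: the names run_with_deadline and slow_call are made up for this example, and it assumes a Unix system where fork is available. The potentially slow call runs in a child process, and the parent gives up on it after a deadline instead of hanging with it.

```python
import multiprocessing as mp
import time


def slow_call(seconds):
    """Hypothetical stand-in for a blocking call, e.g. connect() to a dead host."""
    time.sleep(seconds)
    return "done"


def _child(conn, func, args):
    # Runs in the forked child: do the work and ship the outcome back.
    try:
        conn.send(("ok", func(*args)))
    except Exception as exc:
        conn.send(("error", repr(exc)))
    finally:
        conn.close()


def run_with_deadline(func, args=(), timeout=3.0):
    """Run func(*args) in a forked child; give up after `timeout` seconds.

    Returns ("ok", result), ("error", description) or ("timeout", None).
    The result must be picklable, since it travels back over a pipe.
    """
    ctx = mp.get_context("fork")  # assumption: Unix only
    recv_end, send_end = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=_child, args=(send_end, func, args))
    proc.start()
    send_end.close()  # the parent keeps only the receiving end
    try:
        if recv_end.poll(timeout):
            try:
                return recv_end.recv()
            except EOFError:
                return ("error", "child exited without sending a result")
        return ("timeout", None)
    finally:
        if proc.is_alive():
            proc.terminate()  # kill a child that is still stuck
        proc.join()
        recv_end.close()
```

Here run_with_deadline(slow_call, (30,), timeout=0.5) comes back with ("timeout", None) after about half a second. The parent still waits in poll for up to the deadline, so under a cooperative scheduler you would presumably check the pipe between yields rather than sit in one long wait.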
Like I said, annoying.
Posted by jjwiseman at January 23, 2002 05:21 PM
...or convince the OS to let the connect call fail sooner. I wonder how dangerous this would be:
echo 0 > /proc/sys/net/ipv4/tcp_syn_retries
A tcp_syn_retries value of 0 results in the attempt failing after about three seconds; a value of 1 fails after about eight.
I can't see any way to set this on a per-socket basis.
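One per-socket alternative, sketched here in Python rather than Lisp, is to leave the retry count alone and put a deadline on the connect itself. This is a different technique from changing tcp_syn_retries, and the function name try_connect is made up for the example. (Linux's tcp(7) man page also documents a per-socket TCP_SYNCNT option, which this sketch does not use.)

```python
import socket


def try_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within
    `timeout` seconds, False if it is refused, unreachable or too slow.

    The timeout covers the connect only. Resolving a hostname can still
    block, so this sketch is best used with a numeric IP address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)  # applies to this socket only
    try:
        sock.connect((host, port))
        return True
    except OSError:  # includes timeouts and refused connections
        return False
    finally:
        sock.close()
```

Unlike the /proc setting, this affects only the one socket, so other connections on the machine keep the default retry behavior.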
How many 8-second-long delays in a row before someone thinks either that the web server isn't going to return a page or that the computer at the other end of the phone line is broken?
Heh. Today someone posted a message to a cmucl mailing list about their experiences with SIGALRM preemption enabled:
> eventually, CMUCL dies a horrible memory corruption death.