May 12, 2005
Exceptions

The other day in #lisp, there was some discussion of the trivial-http library and situations in which it might leak file descriptors.

<nyef>chandler: You realize that we're probably going to have to re-recode minion to not use trivial-http after all, right?
<chandler>to completely fix leaking fd problems?
<chandler>I'd rather fix trivial-http's interface

Here's some of the relevant code from trivial-http:

(defun http-get (url)
  (let* ((host (url-host url))
         (port (url-port url))
         (stream (open-stream host port)))
    (format stream "GET ~A HTTP/1.0~AHost: ~A~AUser-Agent: Trivial HTTP for Common Lisp~A~A"
        url +crlf+ host +crlf+ +crlf+ +crlf+)
    (force-output stream)
    (list
     (response-read-code stream)
     (response-read-headers stream)
     stream)))

I'm embarrassed to say that at first I didn't get it; I didn't see how anything could be leaked, and the interface looked so simple, how could one consider improving it?

Then, coincidentally, the next day I read Joel Spolsky's “Making Wrong Code Look Wrong” article. At the very end he says he doesn't like exceptions. Oh god, I thought, isn't it enough that I have to write tons of error-checking C++ code at work because we don't use exceptions? Now Joel Spolsky is telling everyone not to use them?

Here’s the thing with exceptions, in the context of this article. Your eyes learn to see wrong things, as long as there is something to see, and this prevents bugs. In order to make code really, really robust, when you code-review it, you need to have coding conventions that allow collocation. In other words, the more information about what code is doing is located right in front of your eyes, the better a job you’ll do at finding the mistakes. When you have code that says

dosomething();
cleanup();

… your eyes tell you, what’s wrong with that? We always clean up! But the possibility that dosomething might throw an exception means that cleanupmight not get called. And that’s easily fixable, using finally or whatnot, but that’s not my point: my point is that the only way to know that cleanup is definitely called is to investigate the entire call tree of dosomething to see if there’s anything in there, anywhere, which can throw an exception, and that’s ok, and there are things like checked exceptions to make it less painful, but the real point is that exceptions eliminate collocation. You have to look somewhere else to answer a question of whether code is doing the right thing, so you’re not able to take advantage of your eye’s built-in ability to learn to see wrong code, because there’s nothing to see.

OK, I guess the part about inspecting the entire call tree is true, but the way to save yourself from that tedium is to just always assume that an exception could be thrown (especially since you have to consider the fact that the code you're calling may change when you upgrade libraries, etc.--whoa, something that checked exceptions actually help with). Not a huge deal. But what this reminded me of was Richard Gabriel's and Paul Graham's complaints about object oriented code, which I agree with. Richard Gabriel describes code inheritance as “compression”, where to understand the meaning of any particular (in Lisp terms) generic function call you need to understand the meaning of every method on that function defined by superclasses (and in Lisp, subclasses). Paul Graham says that “Object-oriented programming offers a sustainable way to write spaghetti code.”

So Spolsky has the same complaint with exceptions that Gabriel and Graham have with object orientation, and now I realize that use of exceptions does indeed have a cost. I think I'll probably keep using them anyway, just as I often find object oriented code useful. Or, as Raymond Chen puts it, exceptions are “cleaner, more elegant and harder to recognize”, because he argues that evaluating the correctness of exception-based code is harder than evaluating the correctness of error-code-based code. “But that's okay. Like I said, just because something is hard doesn't mean it shouldn't be done. It's hard to write a device driver, but people do it, and that's a good thing.”

Once I was primed, it was easy to see the problem with trivial-http:http-get: if response-read-code or response-read-headers or format or any other line of code in the body of the let fails, the stream will never get closed. And please, do not try to argue that the garbage collector will close the stream (that's not how Lisp works), or should be made to do it using explicit finalizers/termination (it's unclean and nonportable).

So now I understand the problem (well, maybe--I still wonder how you would change the interface to avoid it without coming up with an ugly interface). What's the best way to fix it? The following code isn't extremely pretty:

(defun http-get (url)
  (let* ((host (url-host url))
         (port (url-port url))
         (stream (open-stream host port))
         (success-p NIL))
    (unwind-protect
        (progn
          (format stream "GET ~A HTTP/1.0~AHost: ~A~AUser-Agent: Trivial HTTP for Common Lisp~A~A"
                  url +crlf+ host +crlf+ +crlf+ +crlf+)
          (force-output stream)
          (let ((result (list (response-read-code stream)
                              (response-read-headers stream)
                              stream)))
            (setf success-p T)
            result))
      (unless success-p
        (close stream)))))

Can we do better than that? I'm not sure. We can at least macroize this idiom:

(defmacro with-non-local-cleanup (protected-form cleanup-form)
  "Like unwind-protect, but only runs the cleanup-form in the
   case of a non-local exit, not if protected-form terminates
   normally."
  (let ((success-var (gensym)))
    `(let ((,success-var NIL))
       (unwind-protect
            (multiple-value-prog1
                ,protected-form
              (setf ,success-var T))
         (unless ,success-var
           ,cleanup-form)))))

(defun http-get (url)
  (let* ((host (url-host url))
         (port (url-port url))
         (stream (open-stream host port)))
    (with-non-local-cleanup
        (progn
          (format stream "GET ~A HTTP/1.0~AHost: ~A~AUser-Agent: Trivial HTTP for Common Lisp~A~A"
                  url +crlf+ host +crlf+ +crlf+ +crlf+)
          (force-output stream)
          (list
           (response-read-code stream)
           (response-read-headers stream)
           stream))
      (close stream))))

All this leaking reminds me of an old bug in CMUCL's connect-to-inet-socket function.

;; This code is from 2002.
(defun connect-to-inet-socket (host port &optional (kind :stream))
  "The host may be an address string or an IP address in host order."
  (let ((socket (create-inet-socket kind))
        (hostent (or (lookup-host-entry host)
                     (error "Unknown host: ~S." host))))
    (with-alien ((sockaddr inet-sockaddr))
      (setf (slot sockaddr 'family) af-inet)
      (setf (slot sockaddr 'port) (htons port))
      (setf (slot sockaddr 'addr) (htonl (host-entry-addr hostent)))
      (when (minusp (unix:unix-connect socket
                                       (alien-sap sockaddr)
                                       (alien-size inet-sockaddr :bytes)))
        (unix:unix-close socket)
        (error "Error connecting socket to [~A:~A]: ~A"
               (host-entry-name hostent)
               port
               (unix:get-unix-error-msg)))
      socket)))

I find it a little funny that the part of that function that works with unix error codes correctly closes the socket when something goes wrong, while the code that uses more modern concepts of error handling (the (error "Unknown host...) bit) leaks a file descriptor when there's a problem (I also still find it funny that CMUCL wouldn't allow you to open a socket unless you could do a reverse DNS lookup of the target address). Maybe this is evidence for Raymond Chen's assertions, and whoever wrote this code was conscientious enough to feel weird about calling unix:unix-connect without checking the return value, but missed the problems with exceptions, even exceptions that the programmer explicitly raised himself.

Posted by jjwiseman at May 12, 2005 09:20 AM
Comments

I can better you on that interface. Compare the issues with those surrounding with-open-file.

It's still not quite pretty, but
(with-http-get ((response headers stream) url)
(format t "Got response ~A~%" response))
shouldn't leak a file descriptor. You still lose when it comes to serve-event delay and error-propagation exploits, though.

Posted by: nyef on May 12, 2005 10:16 AM

Yeah, that's what I was thinking of as 'ugly'. Maybe I'm wrong; it sure looks concise, and pretty safe.

There are a few things I wonder about, though. One is how inconvenient it might be to be unable to do an http GET without its lifetime being tied to some lexical scope. This doesn't seem like much of a restriction for an HTTP GET, but are you willing to force all other sorts of operations or objects with a multi-stage initialization/cleanup (which is where this problem seems to come from) to follow this model?

If your answer is that there is an escape hatch, in the same way that WITH-OPEN-FILE has OPEN, then you still have just as much of a problem: how to make sure that all uses of OPEN (or HTTP-GET or whatever) that aren't inside of WITH-OPEN-FILE are safe.

Posted by: John Wiseman on May 12, 2005 10:44 AM

If your program was only doing this one small job then you could wrap it all like:

(handler-bind ((error (lambda (e) (ext:quit)))) ...)

i.e. if there is an unhandled error then you kill off the process and let the operating system clean up. This takes care of your own bookkeeping (e.g. closing open files) and lets other programs detect your failure in a simple way (closed pipes and sockets). Very predictable, very internet, very unix, very erlang.

The nice thing is that this tends to take care of both the errors you anticipate and the ones you don't (bugs).

Posted by: Luke Gorrie on May 12, 2005 01:17 PM

Incidentally, barmar touched on this on c.l.l on Wednesday. Message ID barmar-529F0D.21395311052005@comcast.dca.giganews.com, google link http://groups-beta.google.com/group/comp.lang.lisp/msg/2fbbedf423f9ec8b?hl=en .

Posted by: Tim Daly Jr. on May 13, 2005 05:39 AM

1. (meta) when i click on anything here, and click back, not only does firefox make me wait, but it does not show me the page i started with. imho this is wrong, and it is wrong of firefox to not simply show the previous page. any idea how to fix?

2. why not with-open-stream?

Posted by: a on May 16, 2005 01:49 AM

Er... if handles don't get GCd, and explicit destruction handling is "nonportable" doesn't that just mean the language is broken?

Posted by: Andrew on June 20, 2006 11:48 PM
Post a comment
Name:


Email Address:


URL:




Unless you answer this question, your comment will be classified as spam and will not be posted.
(I'll give you a hint: the answer is “lisp”.)

Comments:


Remember info?