A nice little summary of the issues involved in writing and using data access layers (DALs) on top of databases, by Will Hartung (I think I saw this on del.icio.us):
In Days Of Yore, with ISAM style databases, this was the typical paradigm for processing:

```
use OrderCustomerIndex
Find Order 'CUST001'
Fetch Order
while (order.custcode = 'CUST001')
    use OrderDetailOrderIndex
    Find OrderDetail order.orderno
    Fetch OrderDetail
    while (orderdetail.orderno = order.orderno)
        use ItemItemCodeIndex
        Find Item orderdetail.itemcode
        Fetch Item
        ...
        Fetch Next OrderDetail
    end while
    Fetch Next Order
end while
```
So, if you had 10 orders with 5 line items each, this code would "hit" the DB library 110 times: for each of the 10 orders, one fetch for the order itself plus, for each of its 5 line items, one fetch for the OrderDetail and one for the Item.
When DB calls are "cheap", this kind of thing isn't that expensive (since every DBMS (Relational and Object) on the planet essentially has an ISAM (B-Tree) core, they do this kind of thing all day long).
So, if you wrote your code with a DAL, you could easily come up with something like this:

```lisp
(defun process-orders (cust-code)
  (let ((orders (find-orders cust-code)))
    (dolist (order orders)
      (let ((details (find-order-details (order-no order))))
        (dolist (detail details)
          (let ((item (find-item (detail-item-code detail))))
            (do-stuff order detail item)))))))
```
But this is a trap, and this is where THE problem occurs with almost any relational mapping process. Naively implemented with SQL, again for 10 orders with 5 line items each, you end up with SIXTY-ONE(!!!) SQL hits to the database: 1 query for the initial orders, 1 query per order for its details (10), and 1 query per detail for the item data (50). This is Bad. Round trips to the DB are SLLOOWWW, and this will KILL your performance.
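The multiplier is easy to see with a toy sketch. The in-memory "tables" and names below are hypothetical stand-ins, not real DAL code; the only point is to count round trips for the naive one-query-per-object pattern:

```python
# Hypothetical in-memory tables standing in for the database; the point is
# only to count round trips, not to model SQL faithfully.
query_count = 0

orders = [{"orderno": n, "custcode": "CUST001"} for n in range(10)]
details = [{"orderno": n, "itemcode": f"I{n}-{i}"}
           for n in range(10) for i in range(5)]
items = [{"itemcode": d["itemcode"]} for d in details]

def query(rows, **where):
    """Each call stands for one round trip to the database."""
    global query_count
    query_count += 1
    return [r for r in rows if all(r[k] == v for k, v in where.items())]

# The naive DAL pattern: one query for the orders, then one per order for
# its details, then one per detail for the item.
for order in query(orders, custcode="CUST001"):              # 1 query
    for detail in query(details, orderno=order["orderno"]):  # 10 queries
        item = query(items, itemcode=detail["itemcode"])[0]  # 50 queries

print(query_count)  # 1 + 10 + 50 = 61 round trips
```

Each object-level accessor looks innocent on its own; the damage only shows up when you multiply it by the size of the graph being walked.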
Instead, you want to suck in the entire graph with a single query:

```sql
SELECT orders.*, orderdetails.*, items.*
FROM orders, orderdetails, items
WHERE orders.custcode = ?
  AND orderdetails.orderno = orders.orderno
  AND items.itemcode = orderdetails.itemcode
```
Then, you use that to build your graph so you can do this:

```lisp
(defun process-orders (cust-code)
  (let ((orders (find-orders cust-code)))
    (dolist (order orders)
      (let ((details (order-details order)))
        (dolist (detail details)
          (let ((item (detail-item detail)))
            (do-stuff order detail item)))))))
```
And THAT's all well and good, but say you wanted to get the total of all canceled orders:

```lisp
(defun canceled-orders-total (cust-code)
  (let ((orders (find-orders cust-code))
        (total 0))
    (dolist (order orders)
      (when (equal (order-status order) "C")
        (setf total (+ total (order-total order)))))
    total))
```
This is fine, but you just sucked in ALL of the orders, including their detail and item information, to get at a simple header field.
Now, certainly this could have been done with a simple query as well:

```sql
SELECT sum(total)
FROM orders
WHERE status = 'C'
  AND custcode = ?
```
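To make the contrast concrete, here is a runnable version of the aggregate query using Python's built-in sqlite3 module. The schema and rows are invented for the example; the point is that the database returns a single number instead of the whole order graph:

```python
import sqlite3

# Made-up orders table; status "C" means canceled, as in the Lisp example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders "
             "(orderno INTEGER, custcode TEXT, status TEXT, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", [
    (1, "CUST001", "C", 10.0),
    (2, "CUST001", "O", 99.0),   # open order: excluded from the sum
    (3, "CUST001", "C", 5.5),
    (4, "CUST002", "C", 40.0),   # different customer: also excluded
])

# One round trip, one row, one column back -- no graph materialized.
(canceled_total,) = conn.execute(
    "SELECT sum(total) FROM orders WHERE status = 'C' AND custcode = ?",
    ("CUST001",)).fetchone()
print(canceled_total)  # 15.5
```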
The point is that balancing the granularity of what your DAL does, and how it does it, needs to be determined up front, at least in a rough-edged way. This is what I mean by being aware that an RDBMS is going to be your eventual persistent store. If you don't assume that, you may very well end up with a lot of code like the first example, which will immediately suck as soon as you upgrade to a SQL database.
In my own days of yore (1997ish), I wrote a pretty complicated data access layer for a big Win32 application—in Visual Basic. The basic idea, which was Jim Firby's, was pretty neat, in that it made databases look almost like associative arrays.
Data was addressed with tuples, so ["books_by_isbn" "0140125051" "title"] would be the address (we called them "IGLs", for Info Group Locator or something) of the "title" field of the record whose key field was "0140125051" in the "books_by_isbn" table. ["books_by_isbn" "0140125051" "*"] would be the collection of fields for book 0140125051, and ["books_by_isbn" "*" "*"] would be all books in the table.
There was a virtual translation layer, so if you wanted you could have a single "books" table that contained both ISBNs and ASINs, and then define virtual "books_by_isbn" and "books_by_asin" tables that would let you use IGLs that indexed by ISBN or ASIN respectively. It seems a little insane now, but actually every column of every table went through the virtual translation map, which was just another table in the database. What can I say, we were Lispers with an AI background, indirection was in our blood.
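A toy sketch of the idea, with invented names and made-up book data (this is not the actual Datalink API, just the shape of it): IGL tuples of (table, key, field) address the data, "*" is a wildcard, and a translation map resolves virtual table names to a real table plus the column to index by.

```python
# Hypothetical translation map: virtual table name -> (real table, key column).
TRANSLATION = {
    "books_by_isbn": ("books", "isbn"),
    "books_by_asin": ("books", "asin"),
}

# One real "books" table holding both ISBNs and ASINs (data invented).
TABLES = {
    "books": [
        {"isbn": "0140125051", "asin": "B00EXAMPLE",
         "title": "An Example Title"},
    ],
}

def igl_get(table, key, field):
    """Resolve an IGL-style (table, key, field) address; '*' is a wildcard."""
    real_table, key_col = TRANSLATION[table]
    rows = TABLES[real_table]
    if key == "*":
        # All records in the (virtual) table.
        return rows if field == "*" else [r[field] for r in rows]
    row = next(r for r in rows if r[key_col] == key)
    return row if field == "*" else row[field]

# Same record, reached through either virtual table:
print(igl_get("books_by_isbn", "0140125051", "title"))
print(igl_get("books_by_asin", "B00EXAMPLE", "*"))
```

Because every lookup goes through the translation map, the virtual tables cost nothing to define; the price, as the next paragraphs describe, is that the addressing syntax hides how expensive any given lookup actually is.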
There was so much to the Datalink, as my DAL was called: 14,000 lines of shared caches, events, rules engines and schema migration support. But the result for users of the code was that complex database access patterns seemed simple and clean.
But of course we ran into the problem that some combinations of simple, clean Datalink operations would be fast and others would be slow, and unless you knew a lot about the details of the Datalink implementation you couldn't easily predict which way it would go. Which is exactly the sort of trap Will mentions.
I ended up writing a series of tech notes on how to structure common operations with the Datalink to keep things fast. It was effective, in that users of the code could then get good performance, but it didn't feel like the most satisfying solution. But the problem seems to be that nobody knows how to come up with a satisfying solution.

Posted by jjwiseman at March 09, 2007 02:01 PM