201207.07

cl-mongo and multithreading

We're building a queuing system for Musio written in Common Lisp. To be accurate, we already built a queuing system in Common Lisp, and I recently needed to add a worker to it that communicates with MongoDB via cl-mongo. Each worker spawns four worker threads, each thread grabbing jobs from beanstalkd via cl-beanstalk. During my testing, each worker was updating a Mongo collection with some values scraped from our site. However, after a few seconds of processing jobs, the worker threads began to spit out USOCKET errors and eventually Clozure CL entered its debugger of death (i.e., Lisp's version of a segfault). SBCL didn't fare much better, either.

The way cl-mongo's connections work is that it has a global hash table that holds connections: cl-mongo::*mongo-registry*. When the threads are all running and communicating with MongoDB, they are using the same hash table without any inherent locking or synchronization. There are a few options to fix this. You can implement a connection pool that supports access from multiple threads (complicated), you can give each thread its own connection and force each thread to use its connection when communicating, or you can take advantage of special variables in Lisp (the easiest, simplest, and most elegant IMO). Let's check out the last option.

Although it's not in the CL spec, just about all implementations with threads treat dynamic bindings as thread-local. Both (defparameter) and (defvar) create special variables (read: dynamic variables, as opposed to lexical), and rebinding one with (let) inside a thread gives that thread its own value while the global value stays shared. Luckily, cl-mongo uses defvar to create *mongo-registry*. This means in our worker, we can re-bind this variable above the top level loop using (let) and all subsequent calls to MongoDB will use our new thread-local version of *mongo-registry* instead of the global one that all the threads were bumping into each other using:

;; Main worker loop, using global *mongo-registry* (broken)
(defun start-worker ()
  (loop
    (let ((job (get-job)))
      (let ((results (process-job job)))
        ;; this uses the global registry. not good if running multiple threads.
        (with-mongo-connection (:db "musio")
          (db.save "scraped" results))))))

New version:

;; Replace *mongo-registry* above worker loop, creating a local version of the
;; registry for this thread.
(defun start-worker ()
  ;; rebinding the variable via let gives this thread its own binding, hidden
  ;; from the other threads. nil will do just fine as the initial value.
  (let ((cl-mongo::*mongo-registry* nil))
    (loop
      (let ((job (get-job)))
        (let ((results (process-job job)))
          ;; with-mongo-connection now uses the local registry, which stops the
          ;; threads from touching each other.
          (with-mongo-connection (:db "musio")
            (db.save "scraped" results)))))))
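
To see the per-thread rebinding on its own, without MongoDB in the picture, here is a minimal sketch. It assumes Quicklisp is installed and uses bordeaux-threads for portable threading; *registry* and rebinding-worker are made-up names standing in for cl-mongo::*mongo-registry* and the worker loop, not anything cl-mongo provides.

```lisp
;; Assumption: Quicklisp is available to load bordeaux-threads.
(ql:quickload :bordeaux-threads)

(defvar *registry* :global
  "Hypothetical stand-in for cl-mongo::*mongo-registry*.")

(defun rebinding-worker (id)
  ;; LET on a special variable creates a new dynamic binding. On threaded
  ;; implementations such as SBCL and Clozure CL, that binding is only
  ;; visible to the thread that made it.
  (let ((*registry* nil))
    ;; This SETF changes the thread's own binding, not the global value.
    (setf *registry* (list :connection-for id))
    *registry*))

(defun demo ()
  (let* ((threads (loop for id from 1 to 4
                        ;; Capture a fresh ID per thread; LOOP may reuse
                        ;; a single binding for its iteration variable.
                        collect (let ((id id))
                                  (bt:make-thread
                                   (lambda () (rebinding-worker id))))))
         ;; JOIN-THREAD waits for each thread and returns its result.
         (results (mapcar #'bt:join-thread threads)))
    ;; Return what each thread saw, plus the untouched global value.
    (values results *registry*)))
```

Calling (demo) should return a list with one value per thread, and a second value showing the global *registry* is still :global, since each thread only modified its own binding.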

BOOM, everything works great after this change, and it was only a one-line change. It may not be as efficient as connection pooling, but that's a lot more prone to strange errors and synchronization issues than just segregating the connections from each other and calling it a day. One issue: *mongo-registry* is not exported by cl-mongo, which is why we access it via cl-mongo::*mongo-registry* (notice the double colon instead of single). This means that in future versions the variable name may change, breaking the code above. So, don't update cl-mongo without testing. Not hard.
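
If you'd rather have the worker fail loudly at startup than mysteriously at runtime when that internal name goes away, one option is to look the symbol up by name and bind it with progv. This is only a sketch: require-special and call-with-private-binding are hypothetical helpers, not part of cl-mongo, and the boundp check assumes the library gives the variable an initial value.

```lisp
(defun require-special (symbol-name package-name)
  "Return the symbol named SYMBOL-NAME in PACKAGE-NAME. Signal an error if
the package is not loaded, or the symbol is missing or unbound."
  (let ((package (find-package package-name)))
    (unless package
      (error "Package ~a is not loaded." package-name))
    (multiple-value-bind (symbol status) (find-symbol symbol-name package)
      ;; STATUS is NIL when the package has no symbol with that name.
      (unless (and status (boundp symbol))
        (error "~a::~a is missing or unbound; did the library change?"
               package-name symbol-name))
      symbol)))

(defun call-with-private-binding (symbol thunk)
  "Call THUNK with SYMBOL dynamically bound to NIL."
  ;; PROGV binds a special variable chosen at run time, the way LET does
  ;; for one named in the source.
  (progv (list symbol) (list nil)
    (funcall thunk)))
```

In the worker, that might look like (call-with-private-binding (require-special "*MONGO-REGISTRY*" "CL-MONGO") #'worker-loop), where worker-loop is whatever function holds your job loop. The behavior matches the (let) version, with a readable error if the variable gets renamed.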

Hopefully this helps a few people out. Let me know if you have better solutions to this issue!