Distribute or Die

Posted on 2008-07-29 23:38 JST by Curt Sampson

A couple of friends and I recently decided to write a mailing list server in Haskell. We’ve taken some derision for this[1], with people telling us that there are plenty of them out there already, but we also have a couple of fairly powerful forces motivating us to do this. One is, of course, to learn Haskell better, and nobody can really argue with that. But another is that there are certain features missing from other mailing list managers that I feel are important.

One of these is distributing the application across several hosts, with no single point of failure. The importance of this to me is probably due to my background. I’ve been doing systems administration for a long time now, first with DOS and Novell systems, and then with Unix when I co-founded an ISP in 1995. I’ve spent a fair amount of that time with a pager strapped to my belt, which is not a particularly fun thing to have to do for long periods of time.

This has engendered in me the opinion that, if your application isn’t distributed across several computers, in different locations, such that an entire data center can go down (this happens more often than you might think) without bringing down your application as well, it’s broken.

The idea of distributing services like this is not new. It was designed into the DNS protocol from the beginning, and it’s always been recommended practice to have at least two separate DNS servers on different networks. Being a little more careful than most when it comes to reliability, we run four DNS servers, in four physically separate locations, on four different ISPs, on two different continents.

The DNS protocol is designed such that, if a client can’t contact one of the servers, it fairly rapidly times out and goes to find another. So for us, a DNS server going down is a “yellow alert,” something that shouldn’t go to our pagers when we’re asleep, and that even during the day perhaps only needs to go to our regular e-mail.

SMTP, too, has protocol support for distribution: you can have multiple MX records for a domain, and if one is not responding, the mail transfer agent is supposed to try another one.
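To make the MX fallback concrete, here is a minimal sketch of what a sending mail transfer agent does, written in Python for brevity. The names `deliver_with_fallback`, `fake_deliver` and the example.org hosts are made up for illustration, and real MTAs also randomize among hosts of equal preference and retry later, which this sketch leaves out.

```python
from typing import Callable, Iterable, Optional, Tuple

# (preference, hostname); lower preference values are tried first.
MxRecord = Tuple[int, str]


def deliver_with_fallback(
    mx_records: Iterable[MxRecord],
    try_deliver: Callable[[str], bool],
) -> Optional[str]:
    """Try each MX host in preference order; return the first that accepts."""
    for _preference, host in sorted(mx_records, key=lambda r: r[0]):
        try:
            if try_deliver(host):
                return host
        except OSError:
            # Connection refused, timed out, etc.: move on to the next host.
            continue
    return None  # nobody accepted; a real MTA would queue and retry later


# Hypothetical example: the primary is down, so the secondary gets the mail.
records = [(20, "mx2.example.org"), (10, "mx1.example.org"), (30, "mx3.example.org")]
down = {"mx1.example.org"}


def fake_deliver(host: str) -> bool:
    """Stand-in for a real SMTP conversation (an assumption for this sketch)."""
    if host in down:
        raise ConnectionRefusedError(host)
    return True


print(deliver_with_fallback(records, fake_deliver))
```

The point is that the failover logic lives in the sender, so the receiving side only has to keep at least one of its listed hosts reachable.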

For quite a while we’ve wanted to run some mailing lists, and we’ve played around a bit with GNU Mailman. But one of the major issues we had with it was that it can’t be run in a distributed fashion, at least not without some fairly major surgery.

We have three different production servers at different locations, and we do what you might call a “fast failover” between them. When one goes down and isn’t obviously going to be back up again quickly, we change our DNS to point to one of the others, which are essentially acting as hot backups. This is far from perfect, but for protocols such as HTTP it’s difficult to do better because there’s no protocol support for failover. (Anycasting might be the solution here, but that requires some support from your ISP, which is not likely to be forthcoming for your average business.)

We tried to figure out how we could distribute Mailman across the hosts of this system, and came up with some ideas, but it turns out to be a pretty difficult job. There’s really no way to do it without rewriting some part of Mailman; clustering that works well has to be built into the application architecture, and in Mailman (at least in the current version) no thought appears to have been given to this.

Thus, we’re rolling our own, and I’ve come up with a plan that should make our clusters almost “set and forget”: even if all the servers bar one in your cluster go down, the system keeps running with no human intervention. (Though you probably want at some point soon after to go bring up your other servers to restore your redundancy!)

We’re lucky in that SMTP has two characteristics that support this very well. The first is MX record fallback: if a host trying to deliver mail to us can’t reach one server, it will try another. The second is that mail does get lost from time to time, and everybody knows this, so we can afford to lose a message once in a while and know that it will probably be resent. (This may sound bad, but given the rarity of the situation, and that messages can be lost before they get to us anyway, it’s actually a better, and much, much easier, option than stopping the system should we not be able to commit to keeping that message in stable storage across the whole cluster.)
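The trade-off in that last parenthesis can be sketched roughly as follows. This is only an illustration of the general idea, not our actual design: `store_on_cluster`, `required_acks`, `fake_store` and the node names are all assumptions made up for the example.

```python
from typing import Callable, Sequence


def store_on_cluster(
    message: bytes,
    nodes: Sequence[str],
    store: Callable[[str, bytes], bool],
    required_acks: int,
) -> bool:
    """Accept the message only if at least `required_acks` nodes stored it."""
    acks = 0
    for node in nodes:
        try:
            if store(node, message):
                acks += 1
        except OSError:
            pass  # node unreachable: counts as no acknowledgement
    return acks >= required_acks


# Hypothetical three-node cluster in which only one node is still up.
nodes = ["node-a", "node-b", "node-c"]
up = {"node-a"}


def fake_store(node: str, message: bytes) -> bool:
    """Stand-in for writing to a node's stable storage (an assumption)."""
    if node not in up:
        raise ConnectionRefusedError(node)
    return True


# Lenient policy: keep accepting mail as long as any one node has it.
lenient = store_on_cluster(b"hello", nodes, fake_store, required_acks=1)
# Strict policy: refuse mail unless the whole cluster has it.
strict = store_on_cluster(b"hello", nodes, fake_store, required_acks=len(nodes))
print(lenient, strict)
```

With the strict policy the cluster stops accepting mail as soon as any one node fails; with the lenient one it keeps running, at the cost of a small window in which a message lives on a single host.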

For more information on the trade-offs one has to make when making a clustered or distributed system (and there are always trade-offs—don’t believe anybody who says they’re not making them) I highly recommend the book In Search of Clusters by Gregory Pfister. It’s out of print now, but used copies of the second edition are fairly readily available. It’s quite technical, but if you’re actually implementing this sort of stuff, even if it’s just installing a system written by someone else, this is very useful stuff to know.

So why doesn’t everybody distribute their systems across multiple hosts for protection from failures? The reasons are the usual ones: people don’t know about it, don’t think about it, aren’t thinking about how to improve their (or someone else’s) work processes, and, perhaps most of all, it’s difficult to do right.

Well, we enjoy tough problems, and we like to do things better here. And when my pager tells me a server is down, it’s much nicer to know that my application is distributed instead of dead.

  1. One commentator went so far as to say that the features lacking in Mailman that we wanted were “[a]ll proposed Mailman 3. Betcha Barry can do two releases of Python and one of Mailman 3 before you get a scalable, production-ready MLM from scratch out the door.”
