Why I Hate CGI

Posted on 2009-02-21 21:15 by Curt Sampson :: comments enabled

In a change from things beautiful, here’s a post about something not-beautiful. A recent message on the Haskell web-devel list asking about the meaning of the word “gateway” triggered this rant.

CGI stands for “common gateway interface,” and is basically:

  • a web server invoking another program,

  • with a specifically defined set of environment variables set to various values related to the request and the program’s location in the filesystem,

  • passing any further information (such as the request body) to the program on its stdin, and

  • reading a response (headers, then body) from the program’s stdout, which the server may then process before returning it to the HTTP client.
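To make the list above concrete, here’s a rough sketch of the program side in Python. It’s an illustration only, not any particular server’s contract: handle_cgi is a name made up for this example, and I’m assuming the usual REQUEST_METHOD, PATH_INFO and CONTENT_LENGTH variables are set.

```python
import os
import sys


def handle_cgi(environ, stdin, stdout):
    """Answer one CGI request (hypothetical helper, for illustration only)."""
    # Request metadata arrives as environment variables.
    method = environ.get("REQUEST_METHOD", "GET")
    path_info = environ.get("PATH_INFO", "")

    # The request body, if any, arrives on stdin; CONTENT_LENGTH says how
    # many bytes to read.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if length else b""

    reply = "{} {} with {} body bytes\n".format(method, path_info, len(body))

    # The response goes to stdout: headers, a blank line, then the body.
    # The server is free to rewrite these headers before the client sees them.
    stdout.write(b"Status: 200 OK\r\n")
    stdout.write(b"Content-Type: text/plain\r\n")
    stdout.write(b"\r\n")
    stdout.write(reply.encode("utf-8"))


if __name__ == "__main__":
    handle_cgi(os.environ, sys.stdin.buffer, sys.stdout.buffer)
```

Taking the environment and the two streams as arguments is just a convenience so the function can be exercised without a web server in front of it.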

There are some further tricks here: if you want to keep the server from parsing and possibly modifying the headers you send back, you have to start your program’s name with “nph-” (for “non-parsed headers”).
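The difference shows up in what the program has to write. As a sketch (both function names are invented for this example, and it assumes a server that honours the “nph-” convention), the ordinary output leaves the status line to the server, while the non-parsed version has to be a complete HTTP response:

```python
def parsed_header_response(body: bytes) -> bytes:
    """Ordinary CGI output: no status line; the server builds the real one."""
    return (
        b"Status: 200 OK\r\n"
        b"Content-Type: text/plain\r\n"
        b"\r\n" + body
    )


def nph_response(body: bytes) -> bytes:
    """NPH output: a complete HTTP response that the server passes through."""
    return (
        b"HTTP/1.0 200 OK\r\n"
        b"Content-Type: text/plain\r\n"
        b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
        b"\r\n" + body
    )
```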

It’s generally not possible to reconstruct the original HTTP request that generated the CGI request because various things in the environment variables change based on the configuration of the server, which the client generally has no way of knowing. Even how the various bits of the request are converted to environment variables is not very well defined.
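One small, concrete case of this information loss is header names. The sketch below uses the conventional mapping (upper-case the name, turn hyphens into underscores, prepend HTTP_); header_to_cgi_var is a made-up name, and a given server may well do something different, which is rather the point.

```python
def header_to_cgi_var(name: str) -> str:
    """Map an HTTP header name to the CGI variable conventionally used for it.

    Content-Type and Content-Length are the usual exceptions: they become
    CONTENT_TYPE and CONTENT_LENGTH, and are not handled here.
    """
    return "HTTP_" + name.upper().replace("-", "_")


# Two distinct header names a client could send...
sent = ["X-Request-Id", "x_request_id"]

# ...collapse into a single variable name, so the program cannot tell
# which spelling was actually on the wire.
seen = {header_to_cgi_var(h) for h in sent}
```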

All these issues carry over to FastCGI and SCGI as well, with the addition that the “programs” don’t always have names, meaning that you can’t prefix the name with “nph-” to avoid the headers in your response being parsed and modified.

I’ve spent more time than I care to over the last four or five years dealing with FastCGI on the application (i.e., opposite of web server) side, both as a client of existing libraries and a writer of new ones, and it’s caused me much pain.

The one real advantage that FastCGI has is that it’s supported across a lot of web servers, and some of those servers have nice special features. For example, with lighttpd, you can avoid a lot of shoveling of data around by returning an “x-send-file: /some/path” header and no body; lighttpd will then use the efficient low-copy or zero-copy sendfile system call, or something similar, to send that file across the network, avoiding a lot of interprocess I/O for large files. (We use this in our QWeb framework for our “docroot” servlet, which just serves files from disk, and also in the framework’s caching system for servlets where we’re sending the cached copy of previously generated content.)
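From the application’s side, such a response is just headers and an empty body. Here’s a minimal sketch; sendfile_response is a hypothetical helper (this is not QWeb code), and the header name is a parameter because the exact spelling the server looks for depends on the server, its version and its configuration.

```python
def sendfile_response(path, content_type="application/octet-stream",
                      header_name="x-send-file"):
    """Build a response that asks the web server to send `path` itself."""
    lines = [
        "Content-Type: " + content_type,
        header_name + ": " + path,
        "",  # blank line ends the headers
        "",  # nothing follows: the body is left empty on purpose
    ]
    return "\r\n".join(lines).encode("utf-8")
```

The application never opens the file; it only names it, and the server does the rest.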

But overall, I’d be much happier with a protocol that just handed me the raw HTTP request, and allowed me to send back a raw HTTP response. The issue is, of course, getting the protocol implemented in various web servers. I could, I suppose, design a new protocol and write a C implementation for lighttpd. But then, given that I’d hope to be talking to a Haskell application server anyway, that GHC has great support for building efficient, multiprocessing, highly concurrent servers, and that Haskell is about a hundred times nicer to program in than C, I think I’d just write a high-performance web server in Haskell. (Except that it’s already been done, anyway.)
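For what it’s worth, the kind of interface I mean is about as small as it gets: a handler that takes the raw request bytes and returns raw response bytes. The following is only a toy sketch of that shape, with invented names (serve_one, echo_handler), and it reads just the request head, not a body:

```python
import socket


def read_request_head(conn: socket.socket) -> bytes:
    """Read raw bytes up to the blank line that ends the request headers."""
    data = b""
    while b"\r\n\r\n" not in data:
        chunk = conn.recv(4096)
        if not chunk:  # peer closed the connection early
            break
        data += chunk
    return data


def serve_one(conn: socket.socket, handler) -> None:
    """Hand the untouched request to `handler` and send back what it returns."""
    conn.sendall(handler(read_request_head(conn)))


def echo_handler(raw_request: bytes) -> bytes:
    """Example handler: sees the raw request, builds the raw response itself."""
    request_line = raw_request.split(b"\r\n", 1)[0]
    body = b"You sent: " + request_line + b"\n"
    return (
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/plain\r\n"
        b"Content-Length: " + str(len(body)).encode("ascii") + b"\r\n"
        b"Connection: close\r\n"
        b"\r\n" + body
    )
```

Nothing between the socket and the handler rewrites anything, which is exactly what CGI and its relatives don’t give you.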

Oh, right, where does “gateway” come in? Nowhere, really. Perhaps the word came from someone steeped in the days of MS-DOS BBSs (I ran one myself in the early ’90s), many of which had “doors,” which were programs to which control would be handed from the BBS software, with the BBS software doing some tricks to take care of the I/O over the modem, a la RCP/M. (That stood for “Remote CP/M,” a system so old that Wikipedia doesn’t even have an entry for it.)


By brian on 2009-05-11 18:37:58-0500:

But overall, I’d be much happier with a protocol that just handed me the raw HTTP request, and allowed me to send back a raw HTTP response.

Johan Tibell is working on a Haskell HTTP server with a pretty raw interface called Hyena: http://github.com/tibbe/hyena/tree/master
