Well, I disagree with Bryan’s previous post post that the parsing was a “simple problem.” But I’ve finally implemented a Haskell solution, and have done quite the writeup about it (more than three times the length of the solution itself!).
Of course, after making sure that it was correct, I had to check its performance. While it’s not as good as I’d hoped (still more than five times slower than the C implementations we looked at), it’s respectable. Here’s the chart again, with my implementation’s peformance figures added:
OS: Ubuntu 8.10 CPU: i7 920 @ 2.67GHz MEM: 3GB [yoann-c] runs: 0.008802 0.007679 0.007488 0.007645 0.007756 0.007639 0.007509 0.008596 0.007591 0.017954 average: 0.008866 s [travis-c-unrolled] runs: 0.013745 0.006565 0.006541 0.0063 0.00639 0.006344 0.00636 0.006961 0.006567 0.006396 average: 0.007217 s [travis-hs] runs: 0.178778 0.093547 0.092922 0.092819 0.092559 0.092501 0.092348 0.092178 0.092826 0.092133 average: 0.101261 s [cjs-hs] runs: 0.056725 0.035372 0.032205 0.04397 0.055906 0.05614 0.055988 0.062287 0.056014 0.065113 average: 0.051972 s
You’ll note that it’s twice as fast as Travis’ version, despite no attempts at optimization beyond using
ByteString instead of
String. (Well, that’s a pretty big optimization. :-) In fact, mine is quite possibly less efficient than Travis’ would be if he had used
ByteString as well.)
But it’s also, in my not so very humble opinion, rather more clear, and certainly the C code we saw was entirely incomprehensible.
I do have some thoughts on easy ways of running this in parallel, but even so, when you’re five times slower, running on an eight-core machine is only going to get you back in range.
Perhaps someone will get enough energy to compare the speed of the Python Regexp versions to these. The Python regular expression code just calls out to a C library, and regular expressions are generally fairly efficient state machines, so it might not be so unfair a comparison as you might think.