Parsing Roman Numerals Redux

Posted on 2009-03-26 by Curt Sampson :: comments enabled

Well, I disagree with Bryan’s previous post post that the parsing was a “simple problem.” But I’ve finally implemented a Haskell solution, and have done quite the writeup about it (more than three times the length of the solution itself!).

Of course, after making sure that it was correct, I had to check its performance. While it’s not as good as I’d hoped (still more than five times slower than the C implementations we looked at), it’s respectable. Here’s the chart again, with my implementation’s peformance figures added:

OS:  Ubuntu 8.10
CPU: i7 920 @ 2.67GHz

runs: 0.008802 0.007679 0.007488 0.007645 0.007756
  0.007639 0.007509 0.008596 0.007591 0.017954
average: 0.008866 s

runs: 0.013745 0.006565 0.006541 0.0063 0.00639
  0.006344 0.00636 0.006961 0.006567 0.006396
average: 0.007217 s

runs: 0.178778 0.093547 0.092922 0.092819 0.092559
  0.092501 0.092348 0.092178 0.092826 0.092133
average: 0.101261 s

runs: 0.056725 0.035372 0.032205 0.04397 0.055906
      0.05614 0.055988 0.062287 0.056014 0.065113
average: 0.051972 s

You’ll note that it’s twice as fast as Travis’ version, despite no attempts at optimization beyond using ByteString instead of String. (Well, that’s a pretty big optimization. :-) In fact, mine is quite possibly less efficient than Travis’ would be if he had used ByteString as well.)

But it’s also, in my not so very humble opinion, rather more clear, and certainly the C code we saw was entirely incomprehensible.

I do have some thoughts on easy ways of running this in parallel, but even so, when you’re five times slower, running on an eight-core machine is only going to get you back in range.

Perhaps someone will get enough energy to compare the speed of the Python Regexp versions to these. The Python regular expression code just calls out to a C library, and regular expressions are generally fairly efficient state machines, so it might not be so unfair a comparison as you might think.

Add a comment »
comments are moderated