Home

succ Java 1: Common Ground

Posted on 2009-09-27 by Curt Sampson :: comments enabled

Introduction

Stuart Halloway recently posted a series of articles collectively entitled Java.next, discussing some of the newer languages that compile to Java Virtual Machine bytecode (Clojure, Groovy, JRuby and Scala) and how they encourage a cleaner and more concise, as well as somewhat functional, programming style.

I thought I’d take the opportunity here to write a parallel series of articles comparing Haskell with the above languages, much in the way that Hamlet D’Arcy did with F#. You’ll want to read Halloway’s article first, if you haven’t already; while I repeat the code examples for comparison, I don’t repeat his explanations.

I call this series “succ Java” not because I think that Java sucks (though I rather think it does), but because succ is the function that produces the successor to a value: succ 0 = 1, and for me Haskell has been the successor to both Java and Ruby.

I’ll be pointing out not only the similarities between Haskell and these other languages, and demonstrating the cleaner syntax, but also discussing some of the differences and why I think that these make Haskell a better language than any of the above. So let’s start in on the first column, discussing mainly some of the similarities these languages share in having powerful features beyond Java.

Everything is an object.

Well, this is true for the JVM languages, of course, but not for Haskell, where everything is data and functions. Still, aside from some slight differences in name space management, this is not as a big a difference as it seems; Haskell offers other features, such as type classes, that give it equivalent (or better) power and convenience. Let’s look at the examples given in Halloway’s article:

C:  (. 1 floatValue)                ; clojure
G:  1.floatValue()                  // groovy
R:  1.to_f                          # ruby
S:  1.floatValue                    // scala

Haskell is pretty similar:

H:  fromIntegral 3 :: Double        -- Haskell

The only real difference we see here is that rather than using a function that specifies what we want to convert the number to, we the function specifies what it should be converted from, and use a type declaration to specify what we’d like to convert it to. fromIntegral is actually a polymorphic function with many different versions; we’ll have more to say about this in part three of this series.

But there’s another difference going on underneath. Data values in Haskell are normally “boxed,” meaning that they contain additional information beyond the value itself, such as whether this is a fully evaluated value or a “thunk,” a piece of not-yet-executed code that will evaluate to this value when run. However, the compiler has available to it unboxed values as well, such as standard 32- and 64-bit integer and floating point numbers, and understands the equivalences between the boxed and unboxed versions of these values. This allows it to optimize the code by using unboxed versions when appropriate, such as when storing arrays of values. In particular, it can also pass values in registers when free registers are available, avoiding expensive access to memory.

Note that unboxed values cannot include type information; the compiler is required to keep track of the types of various values and generate code that will never do the wrong thing. That it can safely do this, rather than having to do type checks at runtime, is one of the great advantages of having a good type system like Haskell’s.

Low-ceremony property definitions

Composite data items, accessors, and the like exist in Haskell just as they do in object-oriented languages:

C:  (defstruct person :first-name :last-name)
G:  class Person { def firstName; def lastName; }
R:  Person = Struct.new(:first_name, :last_name)
S:  case class Person(firstName: String, lastName: String) {}

H:  data Person = Person { lastName, firstName :: String }

What we’ve done here is defined a type named Person. (Types in Haskell always start with an upper-case letter). It has a value constructor, also Person, which can be used as a function that, given two parameters of type String will construct and return a new value of type Person. It’s safe to use the same name because type and value names live in different namespaces, and due to the syntax of the language cannot be confused. We also define two accessors/setters, lastName and firstName. These allow all of the usual operations:

p1 = Person "Smith" "Joe"           -- Create a new person

p2 = Person { firstName = "Jane"    -- And again, with explicit
            , lastName  = "Doe"     -- specification of the field
            }                       -- names.

p3 = p2 { firstName = "Amy" }       -- Another new person: Amy Doe.

lastName p2 == lastName p3          -- Evaluates to True.

You’ll note that function calls in Haskell do not use parentheses (which are optional in Ruby, as well), and also separate their arguments with spaces, not commas. Parentheses in Haskell are used only to control evaluation order, such as in (2+3)*6.

You might wonder why I reverse the last name and first name from the example in Halloway’s article. This is so that, should I derive Ord for it, it will compare and sort by lastName first, then by firstName after that, since the automatic derivation of the comparision function compares the fields in the order they were declared.

Let me explain further. Haskell has “type classes” that serve a role similar to that of interfaces in Java: any type that’s a member (or “instance”) of a given type class has implementations of the functions associated with that type class. So, for example, the String type being a member of the Eq class means that one may apply the functions (==) and (/=) (equal and not equal) to pairs of string arguments and get back the appropriate boolean result. The type class Ord offers ordering-related functions, such as (<), (<=), sort, and so on.

Unlike interfaces in Java, though, a Haskell compiler understands certain common type classes and their functions, and is capable of automatically generating the code for these when it’s “obvious” how to do this based on the data declaration. For Ord, it does comparisions in the order of the declaration of the fields, so that if I add deriving (Eq, Ord) to the end of the Person data declaration, applying the sort function to a list of Person values will sort them by lastName then firstName.

This offers much of the power of inheritance or Ruby’s mixins with less work. I won’t go into this too deeply in this post, however; I encourage you to consult a Haskell tutorial if you’re interested in pursuing this further.

Expressive collections

All of the examples given in Halloway’s article are quite functional in style and so, Haskell being a truly functional language, is quite similar:

C:  (filter (fn [x] (= 1 (rem x 2))) (map (fn [x] (* x x)) (range 10)))
G:  (1..10).collect{ it*it }.findAll { it%2 == 1}
R:  (1..10).collect{ |x| x*x }.select{ |x| x%2 == 1}
S:  (1 to 10).map(x => x*x).filter(x => x%2 == 1)

H:  filter odd (map (^2) [1..10])

We can see here already how Haskell’s minimal syntax and avoidance of punctuation produces particularly clean and readable code. I think that Haskell has some of the nicest syntax of any general purpose language out there because of this.

One point here: we’ve now encountered a commonly used technique in Haskell called “partial application,” where you give a function only some of its arguments to produce a new function taking the remaining ones. Where (3^2) evaluates to a simple value, 9, (^2) evaluates to a function that takes one parameter, the one that normally goes on the left side of the carat: this function can then be applied to a value to produce the final result.

square = (^2)

square 3    -- evaluates to 9
square 4    -- evaluates to 16

This is a convenient way of avoiding the extra syntax of an explicit lambda definition (declaring a function with no name), such as square = \x -> (x^2). Compare the brevity and clarity:

map (\x -> x^2) [1..10]
map        (^2) [1..10]

Haskell offers an alternative way of expressing the filter/map expression above using a list comprehension, which Python users will be familiar with (Python’s list comprehensions were taken from Haskell):

[ x^2 | x <- [1..10], odd (x^2) ]

This reads, the list of x2 where x is taken from the list [1..10] and only the values where x2 is odd are kept.

Functional programming

Haskell of course has first class functions and various ways of defining functions “on the fly,” as it were. The other languages all use explicitly named function definitions or, at best, lambdas:

C:  (defn adder [x] (fn [y] (+ x y)))
G:  adder = { add -> { val -> val + add } } 
R:  def adder(add) lambda { |x| x + add }; end 
S:  def sum(a: Int)(b: Int) = a + b

As we’ve seen above, we have lambdas in Haskell as well, can declare functions with explicit parameters in the normal way, and also use our partial application style (which is often referred to as “point free” style):

H:  adderLambda         = \x y -> x + y
H:  adderDefinition x y = x + y
H:  adderPointFree      = (+)

In this particular example we touch again on the polymorphism given to us by the type class system in Haskell: despite being statically type checked at compile time, the above Haskell definitions are all equivalent to the dynamic definitions in languages such as Ruby, though we’re actually declaring many instances of the adderPointFree function. Because (+) is a polymorphic function that works on any type in the “class” (or set) of types called Num, adderPointFree will automatically also do so, unless we explicitly specify otherwise. The type signature is, in this case:

(+) :: Num a => a -> a -> a

which says that (+) takes two values of type a and returns a third value also of type a, where a stands for any type that is in the Num class.

Overriding operators

Here, Haskell is the same as nearly everybody else, all of whom can overload operators to avoid the separate function names required in Java:

J:  balance.add(balance.multiply(interest));  // Java math
C:  (+ balance (* balance interest))
G:  balance + (balance * interest)
R:  balance + (balance * interest)
S:  balance + (balance * interest)

H:  balance + (balance * interest)

Again, this is Haskell’s type class system at work.

Maintainable exception handling

Haskell, as with the other languages, has no checked exceptions.

However, Haskell has one additional feature that JVM languages do not: the dreaded Null Pointer Exception has been banished!

Tony Hoare referred to his invention of the null reference in 1965 as his “Billion Dollar Mistake”; it’s caused no small number of bugs and amount of boilerplate code, because everybody now has to deal with an extra, odd value everywhere. In Haskell, on the other hand, there are no null references, though there are simple mechanisms to do something similar in areas where they are useful in other languages.

This is unfortunately something that’s just not fixable in Java or the JVM without major reworking that would either break backward compatibility for a huge amount of code.

Adding methods to existing types

Haskell has no objects, just functions and values. So, unlike OO programmers, when we need a new function we simply write a new function and we’re done with it. This turns out to be rather more convenient than it sounds to an OO programmer, actually.

Skipping over the 12 line Java example, let’s look at how we add this new isBlank method in the languages that Halloway demonstrates.

In Groovy:

String.metaClass.isBlank = {
  length() == 0 || every { Character.isWhitespace(it.charAt(0)) }
}

In Ruby:

class String 
  def blank? 
    empty? || strip.empty? 
  end 
end 

In Clojure:

(defmulti blank? class)
(defmethod blank? String [s] (every? #{\space} s))
(defmethod blank? nil [_] true)

(Note in this Clojure example the extra check for a null reference; apparently the author believes that null is the same as an empty string. Whether it should be or not is open to interpretation.)

In Scala:

class CharWrapper(ch: Char) {
  def isWhitespace = Character.isWhitespace(ch)
}
implicit def charWrapper(ch: Character) = new CharWrapper(ch)
class BlankWrapper(s: String) {
  def isBlank = s.isEmpty || s.forall(ch => ch.isWhitespace)
}
implicit def stringWrapper(s: String) = new BlankWrapper(s)

And in Haskell?

isBlank = all isSpace

It doesn’t get much cleaner and simpler than that!

Type classes in Haskell are not extensible, at least not without recompiling everything declared to be of that type class. There’s good reason for this: since the type safety of a function call is checked at compile time, if we allowed run-time changes at the type level, the program could crash when linked with another module that didn’t know about these changes. This is the price we pay for efficiency. (There are actually ways in Haskell of dealing with types at runtime, if you’re willing to pay the price of doing all of the checks at runtime, but I won’t get into those here.)

Roll-your-own constructs

Haskell offers particularly powerful abilities to add new things that would normally be language features in other languages. Monads, in particular, are used not only to keep track of state but also for control flow. This allows you to add new control-flow structures as libraries. I won’t get into all of the details of this here, as it can get complex, but it’s all there.

And of course you can do all of the usual things with first-class functions that you can do in any functional language.

A Note on Tail Call Optimization

In the comments on Hamlet D’Arcy’s blog entry on F# there’s a discussion of the JVM’s lack of tail call optimization. Though this can be mitigated to a limited extent (which Scala does), this can in some circumstances be detrimental when using a functional programming style, for example, when you’re iterating over a large data structure using mutually recursive functions. I would take Jon Harrop (commenting there as “Flying Frog Consultancy”) with a grain of salt though; there are plenty of cases where this isn’t a critical issue, and even if you’re careful to avoid iterating with tail calls altogether, that still leaves you with plenty of good functional programing techniques to improve your code.

Summary

So there’s where Haskell lies in relation to Clojure, Groovy, JRuby and Scala as discussed in Halloway’s article, and I think it comes out very, very well. This has only touched on Haskell’s many capabilities and strengths, of course. There’s more to come in future articles, but I encourage you to spend some time with Haskell yourself, if you find what I’ve said here interesting.

Add a comment »
comments are moderated