Monday, September 27, 2010

An unpleasant surprise from Scala 2.8

Generally speaking, I am a big fan of the "latest and greatest" when it comes to software versions. For example, when I have a dependency on library 2.3 and I notice library 2.4 is out, I generally update my dependency. I think this is a good strategy: your software stays current, can leverage the new features, and, let's face it, runs the latest set of bugs.

Well, Scala 2.8.0 was released in July. I know a lot of work went into the release. Naturally, I took my Scala 2.7.7 code and upgraded it to Scala 2.8.0.

After updating dependencies, the first hurdle was to make sure the code compiled. I found the new Scala 2.8 compiler to be more strict -- which is perfectly fine. A few things had been deprecated, but it was great how the compiler told me what to do. For example, it suggested using head instead of first. This makes sense because the Collections API had a serious overhaul in 2.8, rationalizing methods and making functions consistent across the different kinds of collections.

After the code compiled and the unit tests passed, I did a smoke test and felt pretty pleased with the results. I had not made any significant code change -- just made it 2.8 compliant. I naively thought I was done... But of course I was not...

A thorough regression test was required! Some very subtle differences were discovered that interfered with algorithms. For example:
  1. BufferedSource.getLines
    With Scala 2.7.7, the return value will include the end of line character(s), but Scala 2.8.x does not!
  2. XML processing behaves differently. Consider the following code:

    import scala.xml._
    
    val path : NodeSeq = <path kind="file">readme.txt</path>  // element name is illustrative
    val kind = path \ "@kind"
    val isFile = kind equals "file"
    

    The above code compiles in both Scala 2.7.7 and Scala 2.8.x. The path and kind variables are evaluated exactly the same, but the isFile result is different: Scala 2.7.7 returns true, while Scala 2.8.x returns false. Wow, that is enough to break an algorithm!

    FWIW, replacing the last line with the following will work in either 2.7.x or 2.8.x:

    val isFile = (kind mkString) equals "file"
    

It also made me think about how a new version could introduce implicit conversions that silently change the behavior of an algorithm.

Let me add that changing getLines to exclude the end of line character(s) was likely a smart change. If the Scala 2.7.7 documentation had not explicitly said that they were included, I probably would have assumed that they were not. So now the implementation matches what one would expect. Of course, I am a little surprised that in this case they were willing to change the definition of the function and its documentation rather than deprecate it and suggest an alternative function with slightly different semantics.

And while I have not fully traced the XML change, I am sure that there was method to the madness. Perhaps there was a loophole that my code was exploiting, and they closed it.

After these discoveries, I wanted to read the release notes again. What else did I miss? What else should I look out for? Alas, I do not see any mention of these changes in any release notes or even the issue tracker! (If someone knows where this was documented, please let me know, as I am clearly not looking in the right place(s).)

I will admit that some issues could have been caught sooner if there had been more unit tests. We could all strive to write more unit tests.
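
One kind of unit test that helps here is a characterization test: a test that pins down the behavior of a dependency you rely on, so an upgrade that changes it fails the build instead of the algorithm. The idea is language-independent; below is a minimal sketch in Java that pins the behavior of BufferedReader.readLine rather than Scala's getLines. The class and method names are hypothetical and only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

final class LineEndingCharacterization {
  // Hypothetical wrapper around the library call whose behavior we depend on.
  static List<String> readLines(final String text) {
    final List<String> lines = new ArrayList<String>();
    try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
      String line;
      while ((line = reader.readLine()) != null) {
        lines.add(line);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return lines;
  }

  // Throws if any returned line still carries a line terminator.
  static void assertNoLineTerminators(final List<String> lines) {
    for (final String line : lines) {
      if (line.endsWith("\n") || line.endsWith("\r")) {
        throw new AssertionError("line terminator leaked into: " + line);
      }
    }
  }

  public static void main(final String[] args) {
    // Mixed line endings on purpose: both styles must be stripped.
    final List<String> lines = readLines("alpha\r\nbeta\ngamma");
    assertNoLineTerminators(lines);
    System.out.println(lines); // [alpha, beta, gamma]
  }
}
```

A test like this says nothing about your own logic; its only job is to document an assumption about someone else's code and to fail when that assumption stops holding.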

In conclusion, be careful! I still like Scala, and this issue could have occurred with almost any dependency. You have been warned.

Thursday, September 9, 2010

Announcing the Maven Plugin for Project Lombok

My favorite programming language is Scala.  I've long argued that the number of bugs in code is directly proportional to the number of lines of code.  Scala has several features, like Case Classes, that help you avoid boring boilerplate code.

Case Classes allow you to concisely describe a data container class. For example:

case class Book(title: String, published: Date, author: String)

This is practically equivalent to the following Java code:

public class Book {
  private String title;
  private Date published;
  private String author;

  public Book (final String title, final Date published, final String author) {
    this.title = title;
    this.published = published;
    this.author = author;
  }

  public String getTitle() {
    return this.title;
  }

  public void setTitle(final String title) {
    this.title = title;
  }

  public Date getPublished() {
    return this.published;
  }

  public void setPublished(final Date published) {
    this.published = published;
  }

  public String getAuthor() {
    return this.author;
  }

  public void setAuthor(final String author) {
    this.author = author;
  }

  @Override
  public boolean equals(final Object o) {
    if (o == this) return true;
    if (o == null) return false;
    if (o.getClass() != this.getClass()) return false;
    final Book other = (Book)o;
    if (this.getTitle() == null ? other.getTitle() != null : !this.getTitle().equals(other.getTitle())) return false;
    if (this.getPublished() == null ? other.getPublished() != null : !this.getPublished().equals(other.getPublished())) return false;
    if (this.getAuthor() == null ? other.getAuthor() != null : !this.getAuthor().equals(other.getAuthor())) return false;
    return true;
  }

  @Override
  public int hashCode() {
    final int PRIME = 31;
    int result = 1;
    result = result * PRIME + (this.getTitle() == null ? 0 : this.getTitle().hashCode());
    result = result * PRIME + (this.getPublished() == null ? 0 : this.getPublished().hashCode());
    result = result * PRIME + (this.getAuthor() == null ? 0 : this.getAuthor().hashCode());
    return result;
  }

  @Override
  public String toString() {
    return "Book(title=" + this.getTitle() + ", published=" + this.getPublished() + ", author=" + this.getAuthor() + ")";
  }
}

Wow, that is a lot of Java code replaced by one line of Scala code!  Sure, Eclipse can generate the getters and setters, but maintenance of the above code can be error prone, especially for things like equals and hashCode.  Imagine that a new field is added, and you remember to add the getter and setter, but forget to add it to the equals and hashCode methods -- you may find out the hard way (i.e. at runtime) that there is a bug.
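
To make that failure mode concrete, here is a minimal sketch of how it could surface. StaleBook and its edition field are hypothetical, invented only for illustration: the field was added after equals and hashCode were written, and neither method was updated.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical class: "edition" was added later, but equals and hashCode were not updated.
final class StaleBook {
  private final String title;
  private final String author;
  private final int edition; // the new field

  StaleBook(final String title, final String author, final int edition) {
    this.title = title;
    this.author = author;
    this.edition = edition;
  }

  int getEdition() {
    return this.edition;
  }

  @Override
  public boolean equals(final Object o) {
    if (o == this) return true;
    if (o == null || o.getClass() != this.getClass()) return false;
    final StaleBook other = (StaleBook) o;
    // BUG: edition is never compared.
    return Objects.equals(this.title, other.title)
        && Objects.equals(this.author, other.author);
  }

  @Override
  public int hashCode() {
    // BUG: edition is never hashed.
    return Objects.hash(this.title, this.author);
  }
}

final class StaleEqualsDemo {
  // Returns how many of the given books a HashSet keeps as distinct elements.
  static int distinctCount(final StaleBook... books) {
    final Set<StaleBook> set = new HashSet<StaleBook>();
    for (final StaleBook book : books) {
      set.add(book);
    }
    return set.size();
  }

  public static void main(final String[] args) {
    final StaleBook first = new StaleBook("Some Title", "Some Author", 1);
    final StaleBook second = new StaleBook("Some Title", "Some Author", 2);
    System.out.println(first.equals(second));         // true, although the editions differ
    System.out.println(distinctCount(first, second)); // 1, the second edition is silently dropped
  }
}
```

Nothing fails at compile time here; the second edition simply disappears from the set at runtime.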

Well, thanks to Project Lombok, Java programmers can get the benefit of Scala Case Classes in Java.  The above code can be generated based on the following Java code:

public @lombok.Data class Book {
  private String title;
  private Date published;
  private String author;
}

It is not quite as concise as Scala, but a vast improvement.  Lombok will, during compilation, generate the getters, setters, equals, hashCode, and toString for a class annotated with @Data.  (Please read more about Project Lombok's features if you are not familiar with the project.)

One of the wrinkles with using Lombok is that the real code is generated magically during compilation.  Generally, you never see the resulting Java code.  This means that Javadoc, for example, will look incomplete: the getters and setters do not appear, yet they are logically there.  Cobertura Code Coverage reports will look incomplete or incorrect.  Static code analysis tools, like PMD, may incorrectly complain that there are private fields with no getter or setter.

The clever minds behind Project Lombok do have a solution to this problem: delombok.  They provide a tool that takes the lombok annotated Java code and expands it to the end result Java code.  By integrating this step into the build process, one can mitigate the aforementioned shortcomings.

The team initially provided a delombok Ant task, but did not yet have a Maven plugin. Rather than integrating the Ant task into a Maven project using the handy AntRun plugin, I figured that it would be worth wrapping the Ant task in a Maven Mojo.  After a couple of evenings, the maven-lombok-plugin was born.

The lombok annotated source code needs to be placed in a separate directory:  src/main/lombok (not src/main/java).  Then, during the generate-sources phase, the lombok source is delomboked and placed in target/generated-sources/delombok.  Finally, the compile phase will compile the delombok source along with the Java source.

There is a companion Sample Maven Lombok Project that demonstrates how the DataExample class with lombok annotations is transformed into detailed Java code that Javadoc, Cobertura, and JXR can read.

Wednesday, September 8, 2010

Hello World!

In true coder fashion, my first post is a simple, "Hello World!"