Monday, September 27, 2010

An unpleasant surprise from Scala 2.8

Generally speaking, I am a big fan of the "latest and greatest" when it comes to software versions. For example, when I depend on library 2.3 and notice that library 2.4 is out, I generally update my dependency. I think this is a good strategy: your software stays current, can leverage the new features, and, let's face it, runs the latest set of bugs.

Well, Scala 2.8.0 was released in July. I know a lot of work went into the release. Naturally, I took my Scala 2.7.7 code and upgraded it to Scala 2.8.0.

After updating dependencies, the first hurdle was to make sure the code compiled. I found the new Scala 2.8 compiler to be more strict -- which is perfectly fine. A few things had been deprecated, but it was great how the compiler told me what to do. For example, it told me to use head instead of first. This makes sense because the collections API had a serious overhaul in 2.8, rationalizing methods and making them consistent across the different kinds of collections.

After the code compiled and the unit tests passed, I did a smoke test and felt pretty pleased with the results. I had not made any significant code change -- just made it 2.8 compliant. I naively thought I was done... But of course I was not...

A thorough regression test was required! Some very subtle differences were discovered that interfered with algorithms. For example:
  1. BufferedSource.getLines
    With Scala 2.7.7, each returned line includes the end of line character(s), but with Scala 2.8.x it does not!
  2. XML processing behaves differently. Consider the following code:

    import scala.xml._
    
    val path : NodeSeq = <path kind="file">readme.txt</path>  // element name assumed; only the kind attribute and the text are known
    val kind = path \ "@kind"
    val isFile = kind equals "file"
    

    The above code compiles in both Scala 2.7.7 and Scala 2.8.x. The path and kind variables evaluate to exactly the same values, but the isFile result is different: Scala 2.7.7 returns true, while Scala 2.8.x returns false. Wow, that is enough to break an algorithm!

    FWIW, replacing the last line with the following will work in either 2.7.x or 2.8.x:

    val isFile = (kind mkString) equals "file"
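
Another way to sidestep the equality question is to compare the text of the attribute rather than the NodeSeq itself. The following is only a sketch under a few assumptions: it targets a current Scala release with the scala-xml module on the classpath, the element name `path` is made up for illustration, and `KindCheck` and `hasKind` are hypothetical names of my own.

```scala
import scala.xml.NodeSeq

object KindCheck {
  // Hypothetical helper: compares the attribute's text content with a plain
  // String, so the result does not depend on how NodeSeq defines equals.
  def hasKind(node: NodeSeq, expected: String): Boolean =
    (node \ "@kind").text == expected

  def main(args: Array[String]): Unit = {
    // The element name "path" is an assumption made for this example.
    val path: NodeSeq = <path kind="file">readme.txt</path>
    println(hasKind(path, "file"))
    println(hasKind(path, "directory"))
  }
}
```

Comparing String to String keeps the check independent of whatever the XML library decides about node equality in a given release.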
    

It also made me think about how a new Scala version could introduce implicit conversions that silently change the behavior of an algorithm.

Let me add that changing the behavior of getLines to exclude the end of line character(s) was likely a smart change. If the documentation in Scala 2.7.7 had not explicitly said that it included the end of line character(s), I probably would have assumed that it did not. So now the implementation matches what one would expect. Of course, I am a little surprised that in this case they were willing to change the definition of the function and its documentation rather than deprecate it and suggest an alternative function with slightly different semantics.
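
For code that should not care which behavior it gets, one option is to strip any trailing line terminator explicitly. This is a minimal sketch that assumes Scala 2.8 or later (where `getLines()` is called with parentheses); `LineReading` and `linesWithoutTerminators` are hypothetical names, not part of the standard library.

```scala
import scala.io.Source

object LineReading {
  // Hypothetical helper: returns the lines of a Source with any trailing
  // line terminator removed. stripLineEnd leaves a line untouched when it
  // has no terminator, so this is harmless when getLines() already drops them.
  def linesWithoutTerminators(source: Source): List[String] =
    source.getLines().map(_.stripLineEnd).toList

  def main(args: Array[String]): Unit = {
    val lines = linesWithoutTerminators(Source.fromString("alpha\nbeta\r\ngamma"))
    println(lines)
  }
}
```

The extra `stripLineEnd` call costs little and makes the assumption about line endings visible in the code instead of leaving it implicit in the library version.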

And while I have not fully traced the XML change, I am sure that there was method to the madness. Perhaps there was a loophole that my code was exploiting, and they closed it.

After these discoveries, I wanted to read the release notes again. What else did I miss? What else should I look out for? Alas, I do not see any mention of these changes in any release notes or even the issue tracker! (If someone knows where this was documented, please let me know, as I am clearly not looking in the right place(s).)

I will admit that some issues could have been caught sooner if there had been more unit tests. We could all strive to write more unit tests.
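
One kind of test that might have helped here is a check that pins down the library behavior my code relies on, so that an upgrade fails loudly instead of quietly changing results. The sketch below uses plain `assert` rather than any particular test framework and assumes a current Scala release; `DependencyBehaviorCheck` and `check` are hypothetical names.

```scala
import scala.io.Source

object DependencyBehaviorCheck {
  // Hypothetical check: records the getLines behavior the rest of the code
  // depends on. If a library upgrade changes it, the assertion fails.
  def check(): Unit = {
    val lines = Source.fromString("first\nsecond\n").getLines().toList
    assert(lines == List("first", "second"), "getLines behavior changed: " + lines)
  }

  def main(args: Array[String]): Unit = {
    check()
    println("dependency behavior unchanged")
  }
}
```

Such a test says nothing about my own logic; it only documents an assumption about a dependency, which is exactly what went unnoticed in this upgrade.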

In conclusion, be careful! I still like Scala, and this issue could have occurred with almost any dependency. You have been warned.

2 comments:

  1. The XML changes were a result of my putting my foot down about trying to swim against the tide of mathematics. See: http://github.com/scala/scala/blob/master/src/library/scala/xml/Equality.scala

    The method which implements the old logic is called "compareBlithely" and xml_== calls it.

    The getLines change was perhaps not entirely intentional. I say "perhaps" despite the fact that I did it, because I can't reconstruct the tapestry of factors. I had to change Source around at the very last minute to restore some other 2.7 behavior and that was one of those last minute changes.

    Anyway, I wasn't going out of my way to break things, but you have to realize how much of a rewrite 2.8.0 is. It's enormous. You might try -Xmigration to get some warnings about things, and I'm sure you have more to discover.

  2. Probably the worst thing about 2.8 is binary incompatibility with 2.7.
