Blatherberg: java

Monday, February 07, 2011

The Reddit Effect

Early this morning an old post of mine got voted up to the main page on Reddit. I may sift through the comments when my day settles down but I just want to show you the impact of the exposure, thanks to my Analytics page:

A 4147% growth. That's not 40%, that's over 4000%: roughly 42 times the usual traffic, as evidenced by this next graph:

Looks like the Reddit comments wound up on my post's page. Do I really have to delete all the chaff?

Thursday, January 27, 2011

guava-osgi project gets a fresh update!

Thanks to Mikaël Barbero, the guava-osgi project has been revived!

At the new update site you can find bundles for Guava versions 3, 4, 5, 6 and 7. This is just in time for the imminent release of r08, and we will publish a plug-in shortly after its release.

Between me and Mikaël, guava-osgi will henceforth be updated just days after Guava releases. This is because Mikaël wrote a script that automates most of the nasty work.

Please give your thanks to Mikaël!

Saturday, April 10, 2010

Guava as an OSGi bundle

I have published an update site that hosts Guava (which hosts some of Google's core Java libraries), as an OSGi bundle for you to use in your Eclipse projects.

Before you use the update site, keep in mind: There are some kinks that need to be worked out. For example, it seems I didn't get source attachment right, and I'm keeping the version numbers low until the process runs more smoothly. But more important, URLs are going to change, so it is not a reliable site yet. Only use it to test and provide feedback on the set-up.

Feedback is welcome, but patches and (OSGi/Subversion) guidance will get priority over requests.

The update site URL is at http://guava-osgi.googlecode.com/svn/trunk/com.google.guava.site/. It's got the r03 release, which contains the first binary distribution of Guava.

Tuesday, January 05, 2010

Township Jitney Schedule: Software Development History

This past weekend I wrote an app for my local town: it displays a route, and schedule information, for the town's jitney. (In local terms, a jitney is a township-sponsored shuttle service for resident commuters to and from the train station.) You can see it at http://myjitney.appspot.com.

I'd like to discuss the short development history of this web application.

The Original Plan

The original plan was to write a public, custom Google Map to be shared with the other residents using My Maps, a tool that allows users to create customized maps.

The problem with My Maps is that each marker had to be placed individually, and that's tedious: at least to me. What I needed was a programmatic way to feed in a set of addresses, determine each address' coordinates (you know, latitude, longitude) and place markers in a custom map.

For that I wanted the Google Maps API in Javascript, and if I had to use the Javascript API for determining coordinates, I might as well use it to build the map from scratch.

The Prototype

The prototype was written in pure Javascript, using the Google Maps API. It was really basic; routes were stored in JSON (natch), and the whole app lived inside a Google Maps widget.

To get started with the API I found an excellent Google Maps API Tutorial written by Mike Williams. The relevant entries were those on Markers and info windows, Polylines from XML and Geocoding Multiple Addresses.

In fact, I hacked a version of the geocoding example by adding all the jitney stop addresses, and from that I got the coordinates of most of the map markers. For those addresses that the API couldn't parse, I used a tedious trial and error process.

The prototype took five hours to write: one hour to parse and codify the data, one hour for figuring out how to get reasonable coordinates for each address, and three hours to get my head around Javascript. For me, writing Javascript is like this: trial, error, trial, error, google, trial, trial, error, trial, error, google, but by the end, I got a map that showed all the jitney stops and their paths.

There were two issues with writing the app in Javascript: my basic lack of comfort with the language, and the fact that the app didn't render on my Android phone. I dreaded debugging a web app on an Android phone.

The Rewrite

So I needed to rewrite the app, and by need, I mean want. 24 hours earlier I could have just manually built a damn custom map with My Maps, but now I was committed to code and more code. For the rewrite, I chose GWT. The Google Web Toolkit is a terrific piece of technology; you can write web applications using Java in Eclipse, my preferred IDE, and with a debugger. Since Google provides a GWT implementation of the Google Maps API, it was reasonable to port the existing prototype to GWT. The Google Plugin for Eclipse, a fabulous tool that combined GWT, AppEngine, and Eclipse, made it dead simple to deploy the app to appspot.com, which meant a permanent home for the app, along with a back-end infrastructure in case there was ever a need for servlets or a back-end data store.

(This is a great time to point out that I think that GWT is magic, and the GWT team are a bunch of magicians.)

It took four hours to write a feature-equivalent version of the application using GWT. Most of that time was spent familiarizing myself with the various APIs and getting reacquainted with GWT.

I didn't want to go through the effort of learning how to work with the AppEngine database, so the rewrite still shipped the route data as Java source compiled into Javascript. One of the great benefits of this turned out to be that the map loaded super-quick. So I moved the CSS to the HTML <style> tag, removing yet another server request.

Thanks to GWT the app ran perfectly fine on Android. But more important, thanks to GWT I could write code in a more familiar style, and easily manipulate the DOM outside the web page's map object.

So I did.

Spit and Polish

I spent two more days adding features and polish: a pretty display of the schedule. A list of the routes so each could be viewed independent of the others. A visual indicator of when a jitney will next stop at a certain route.

One of the features was to replace the straight lines from stop to stop with paths along the street. The township had a map that laid out the supposed bus paths, so the trick was finding the coordinates where the paths turn. That turned out to be surprisingly easy with this small piece of Java:

map.addMapClickHandler(new MapClickHandler() {
  @Override
  public void onClick(MapClickEvent event) {
    // Print the clicked coordinates so they can be pasted back into the route data.
    System.out.println(event.getLatLng());
  }
});

Then it was just a matter of running the web application and clicking each spot along the path, feeding the console output back to the web application as intermediate points.


Before	After

Besides adding features, I spent a ton of time dealing with things like positioning and formatting. I spent 30 minutes building a general purpose route building API. I spent 15 minutes on a general purpose algorithm for calculating the center of a group of points. 30 minutes went into making a widget that looks like, but isn't quite, an anchor tag. I spent endless time playing with different types of GWT panels, setting widths, heights and spans. I played with CSS. I failed at CSS, and then I played with it some more. It seems I have the same development cycle for CSS that I do with Javascript: trial, error, trial, error, google.

Done.

By Sunday night the app was done, and so was I. But it's still not done. Even with such a small one-off project, there are so many features that could be added. For instance, while the app runs on Android (and reportedly the iPhone) it's not really built for small phones. The individual links are too small to be useful.

But also, I'd like to use Street View to show each stop. Unfortunately, while a Street View API exists in the Javascript API, there's no equivalent in the GWT API. I probably spent about two hours before I recognized that it would involve another painful, endless cycle of trial, error, trial, error, google. Too bad.

Done?

Damn. While writing this post, someone provided feedback, requesting a feature that made too much sense not to implement. So instead of cleaning up this post I'm reading about geocoding again. I love writing software.

Monday, November 09, 2009

Final Thoughts: Java Puzzler: Splitting Hairs

This is the final in a series of posts about a puzzler [ post containing the question ] [ post containing the answer ]. In those two posts I highlighted some surprising behavior in String.split().

Why the surprising behavior?

Well, and here's why my job is damn cool: after discovering this issue, I dropped a note to Josh Bloch, who quickly replied: (edited summary)

Yes, this is a pain. FWIW, it was done for a very good reason: compatibility with Perl. The guy who did it is Mike "madbot" McCloskey, who now works with us at Google. Mike made sure that Java's regular expressions passed virtually every one of the 30K Perl regular expression tests (and ran faster).

I have no real issues with the way Madbot implemented regular expressions in Java, nor with the goal of Perl compatibility. Perl's regular expression language was very popular, and derivatives of it were implemented not only in Java, but also in JavaScript, PCRE, Python, Ruby, Microsoft's .NET Framework, and the W3C's XML Schema.[ref]

But I do have issue with String getting saddled with a method that quietly explodes the API's complexity.

So why does Perl work this way? I don't know, dude. This isn't a Perl Puzzler.

However, I tried to recreate the original puzzler in Perl as a way to validate the original puzzler's behavior. Unfortunately, I failed using perl v5.8.8 on my OSX machine. This script:

@first = split(/:/, "");
@second = split(/:/, ":");

print scalar @first . " [@first]\n";
print scalar @second . " [@second]\n";

Yielded

0 []
0 []

I'm not claiming there's either a bug or implementation change in either the Java or Perl implementations, but I sure am curious.

Possible Solutions

1. Hacking String.split

I don't attest to this, but it seems that you can ensure consistent behavior by appending a copy of the delimiter. So, if you plan to split a string by its colons you can do:

String[] result = (string + ":").split(":");

I'm sure you can find all sorts of issues with this example. Go for it. Point them out in the comments.

Besides, that's not much of a solution.
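
To see what the hack actually does, here is a small sketch that runs it against a few edge cases. The class and method names are made up for illustration; only String.split comes from the JDK.

```java
import java.util.Arrays;

final class AppendDelimiterHack {
  // Appends one extra delimiter before splitting, as in the one-liner above.
  static String[] hackedSplit(String s) {
    return (s + ":").split(":");
  }

  public static void main(String[] args) {
    // Compare plain split with the hacked version for a few inputs.
    for (String s : new String[] {"", ":", "a", "a:", ":a", "a:a"}) {
      System.out.printf("%-5s plain=%s hacked=%s%n",
          "\"" + s + "\"",
          Arrays.toString(s.split(":")),
          Arrays.toString(hackedSplit(s)));
    }
  }
}
```

With the hack, both "" and ":" come back as an empty array, because the appended delimiter guarantees a match and the trailing empty strings are then dropped. A leading empty string (as in ":a") still survives, so this only papers over one of the surprises.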

2. Get a String tokenizer

A second solution is to use StringTokenizer, which I had completely forgotten about until Wim Jongman mentioned it in a comment on the solution post.

Of course, even it has its own specific behavior.

import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

public class Main {
  public static void main(String[] args) {
    tokenize("");
    tokenize(":");
    tokenize("a:");
    tokenize(":a");
    tokenize("a:a");
    tokenize("::");
  }

  // Collects every token that StringTokenizer finds and prints the result.
  static void tokenize(String s) {
    StringTokenizer t = new StringTokenizer(s, ":");
    List<String> l = new ArrayList<String>();
    while (t.hasMoreTokens()) {
      l.add(t.nextToken());
    }
    System.out.printf("Tokenization of %s is %s\n", s, l);
  }
}

yields

Tokenization of  is []
Tokenization of : is []
Tokenization of a: is [a]
Tokenization of :a is [a]
Tokenization of a:a is [a, a]
Tokenization of :: is []

3. Get your serving of Guava.

Here's a nice one: Project Guava. Project Guava is a soon-to-be open sourced library of some of Google's core Java code. I've worked with these libraries for five years and I attest that they're wonderful to use. The only problem: it's not out yet. Kevin Bourrillion tells me, though, that an initial release will be available before Thanksgiving.

Note: You may already be aware of the open source project for Google's collections library. When Guava is released, the Google Collections library will go away.

The Guava libraries have a class called Splitter. Splitter's purpose is to alleviate some of the confusion that comes with String.split.

By default Splitter's behavior is very simplistic:

Splitter.on(',').split("foo,,bar, quux")

This returns an iterable containing ["foo", "", "bar", " quux"]. Notice that the splitter does not assume that you want empty strings removed, or that you wish to trim whitespace. If you want features like these, simply ask for them:

private static final Splitter MY_SPLITTER = Splitter.on(',')
       .trimResults()
       .omitEmptyStrings();

You can read more about Guava in Kevin Bourrillion's presentation slide deck from September of this year. Splitter is covered in slides 13 to 17.

Avoiding the real issue

There are two ways of looking back on the variety of votes: either people were assuming that String.split had confusing behavior, or they just expected it to work as they would hope. Some might want a parse of ":" to return two elements. Some might want it to return one. Or zero. Something as seemingly simple as string tokenization has behavior that just might not meet your expectations. I'd like to say that Guava's Splitter will do the trick for everyone (as it does for my case of parsing a classpath) but you need to evaluate it for yourself.
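
For the classpath case specifically, one way to pin down the behavior you want is to wrap the split in a tiny helper and write the edge cases down next to it. This is only a sketch: ClasspathParser is a hypothetical name, and the choice that an empty field means "no entries" is my assumption about what a classpath editor would want.

```java
final class ClasspathParser {
  // Hypothetical helper: an empty field means "no entries",
  // not an array holding one empty entry.
  static String[] parse(String classpath) {
    if (classpath.isEmpty()) {
      return new String[0];
    }
    return classpath.split(":");
  }
}
```

With this in place, parse("") and parse(":") both return an empty array, and parse("a.jar:b.jar") returns the two entries.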

This has been a rather long way of saying: test your edge cases. Thanks for reading.

Sunday, November 08, 2009

Answer to: Java Puzzler: Splitting Hairs

Update: I fixed some small errors, and also updated the charts one last time.

This blog post contains the results and answer to the previous post, Splitting Hairs.

Given the nature of the possible answers, this is actually two puzzlers in one:

How many elements come back from "".split(":"): Zero or one?
How many elements come back from ":".split(":"): Zero, one or two?

Darn, I could have gotten two separate puzzlers out of this.

My Guess

Here's the important point: this is a real world case that occurred to me just the other day. Specifically, I was writing some UI code to edit and parse a colon-separated classpath, so in fact the line in question looked like this:

String[] classpathEntries = classpathField.getText().split(":");

In this case, the UI field starts out empty, and classpathField.getText() returns an empty String. Effectively, my expectation was that with an empty classpath, I'd also get an empty array.

So my guess for "".split(":").length is Zero.

My guess for splitting a lone separator was that it would yield two empty strings, one on either side. So that guess is two. Put them together and you get c) 0/2.

Your Guesses

How did you folks do?

Let's look at the numbers:

An overwhelming preference for f) 1/2, and my guess, c) 0/2 in a distant second.

The Answer

If by now you haven't run the sample code, I'll tell you: the correct answer is d) 1/0. If you're shocked that you get more elements with less data, you're not alone. Hey, just look at those graphs.

Okay, let's start the analysis by reading the method signature for String.split():

public String[] split(String regex)

Ah, yes, split takes a regular expression as a delimiter, not a literal string.

Sidebar: I occasionally read about (or experience) puzzlers where someone tries splitting on a pipe character (|), which has a special meaning in regular expressions. To split on a pipe, you must escape it as \| in the regular expression (written "\\|" in Java source) so the pattern compiler sees it as a literal character. This is not one of those puzzlers: the colon does not have a special meaning for the pattern compiler.
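
As a minimal sketch of the sidebar, here are two ways to split on a literal pipe. The class and method names are illustrative; String.split and Pattern.quote are the real JDK calls.

```java
import java.util.regex.Pattern;

final class PipeSplit {
  // Escape the pipe by hand. The backslash is doubled because it is
  // also the escape character in Java string literals.
  static String[] splitEscaped(String s) {
    return s.split("\\|");
  }

  // Or let Pattern.quote build a literal pattern for the delimiter.
  static String[] splitQuoted(String s) {
    return s.split(Pattern.quote("|"));
  }
}
```

Both calls turn "a|b|c" into the three elements a, b and c.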

Part 1: "".split(":")

If you navigate through the source for String.split, you'll discover that "".split(":") is effectively the same as Pattern.compile(":").split("", 0).

Things get interesting when you read the javadoc for Pattern.split(CharSequence, int):

If this pattern does not match any subsequence of the input then the resulting array has just one element, namely the input sequence in string form.

Wha? So if the delimiter isn't found in the input, the original input is returned. In other words:

System.out.println(Arrays.deepToString("food".split(":")));
System.out.println(Arrays.deepToString("foo".split(":")));
System.out.println(Arrays.deepToString("fo".split(":")));
System.out.println(Arrays.deepToString("f".split(":")));
System.out.println(Arrays.deepToString("".split(":")));

yields

[food]
[foo]
[fo]
[f]
[]

OK, that explains the first one, what about the second one?

Part 2: ":".split(":")

To understand what's going on, let's look again at the javadoc for Pattern.split(). In this case n, by the nature of being called by String.split(String), is 0.

If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded. [ emphasis added. ]

This can be corroborated with this piece of code near the end of the method:

// Construct result
int resultSize = matchList.size();
if (limit == 0)
    while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
        resultSize--;

In other words, internally, it splits the input string ":" into [ "", "" ], and then removes the trailing empty elements before returning to the caller. But "".split(":") doesn't get this treatment because the delimiter was never found in the input.
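
The two rules can be modeled in a few lines. This is a simplified sketch and not the JDK source: it assumes a literal, non-empty delimiter instead of a regular expression, and it only covers the limit-of-zero case. SplitSketch is a made-up name.

```java
import java.util.ArrayList;
import java.util.List;

final class SplitSketch {
  // Simplified model of String.split(String) for a literal,
  // non-empty delimiter (an assumption; the real method takes a regex).
  static String[] split(String input, String delimiter) {
    List<String> pieces = new ArrayList<String>();
    int start = 0;
    int index;
    while ((index = input.indexOf(delimiter, start)) != -1) {
      pieces.add(input.substring(start, index));
      start = index + delimiter.length();
    }
    // Rule 1: the delimiter never matched, so hand back the input as-is.
    if (pieces.isEmpty()) {
      return new String[] {input};
    }
    pieces.add(input.substring(start));
    // Rule 2: with a limit of zero, trailing empty strings are discarded.
    int size = pieces.size();
    while (size > 0 && pieces.get(size - 1).isEmpty()) {
      size--;
    }
    return pieces.subList(0, size).toArray(new String[0]);
  }
}
```

Rule 1 fires for "" and returns one element. For ":" the delimiter matches, so rule 2 runs and strips both empty strings, leaving nothing.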

Does your head hurt? Mine sure hurts, and I've done the research. But here, if you're still hungry, or hate yourself, take a look at this little gem, almost worthy of its own puzzler.

System.out.println(Arrays.deepToString((String[]) ":".split(":", 2)));
System.out.println(Arrays.deepToString((String[]) ":".split(":", 1)));
System.out.println(Arrays.deepToString((String[]) ":".split(":", 0)));
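
If you would rather not run it yourself, here is the same experiment as a self-contained class with the output noted in comments. I added a negative limit as a fourth case; the class name is made up.

```java
import java.util.Arrays;

final class SplitLimits {
  public static void main(String[] args) {
    // Limit 2: at most one split, trailing empty strings are kept.
    System.out.println(Arrays.toString(":".split(":", 2)));  // prints [, ]
    // Limit 1: the pattern is never applied, so the input comes back whole.
    System.out.println(Arrays.toString(":".split(":", 1)));  // prints [:]
    // Limit 0: split everywhere, then discard trailing empty strings.
    System.out.println(Arrays.toString(":".split(":", 0)));  // prints []
    // Negative limit: split everywhere and keep trailing empty strings.
    System.out.println(Arrays.toString(":".split(":", -1))); // prints [, ]
  }
}
```

So the same one-character input yields two, one or zero elements depending only on the limit argument.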

Coming Soon

Like the last puzzler, this will be followed with an analysis of why this occurs, and some nice alternatives.

Friday, November 06, 2009

Java Puzzler: Splitting Hairs

Try to answer this question without running the code, or reading the class documentation.

Assume we're using Java 1.6, though it likely doesn't matter if you're using Java 1.5.

What does the following snippet print?

String[] nothing = "".split(":");
String[] bunchOfNothing = ":".split(":");

System.out.printf("%d/%d\n", nothing.length, bunchOfNothing.length);

I will post an answer on Sunday.

This puzzler comes with six possible answers. I would have liked to have given four choices, but then I realized this was just for fun.

Thursday, October 22, 2009

Final Thoughts On: A Symbolic Puzzler

This is the final in a series of posts about a puzzler [ post containing the question ] [ post containing the answer ]. In those two posts I highlighted some bizarre behavior in the java.io.File.getCanonicalPath method: canonicalized paths are cached, which means that if a symbolic link changes somewhere, the cached value becomes invalid.

This problem really only ever crops up when your code relies on calls to getCanonicalPath and the links change while the application is running. If you expect your application to run against a filesystem with shifting symbolic links, you have to do one of these three things:

1. Disable the cache by setting the system property sun.io.useCanonCaches to false. (I would also like to briefly nod to the pedantic point that the string useCanonCaches is icky.)

This seems like an easy out (particularly if your application is moderately complex and already deployed, and assuming it doesn't rely on the default behavior), but there's a reason Java comes with a canonicalization cache: performance. Reading symbolic links from disk can take time, especially if you do it a lot.

Also, you might be running your application in a Java EE container along with other applications, in which case, you can't isolate the cache behavior to a single application.

2. Stop using getCanonicalPath (and its sugary sibling, getCanonicalFile) in your application, and rely solely on the non-canonicalized path.

Changing your infrastructure to rely on the symlink paths themselves and not their canonicalized values sounds good, but you might not have control over that code: your application may rely on an application infrastructure that already relies on getCanonicalPath, and then you're kind of screwed. It's easy to say that the cache should be disabled in the name of correctness, but if you're repeatedly resolving symbolic links by the thousand, the time cost may be significant.

This leads to the other way to look at this problem, which is the lack of accessibility and flexible control over the cache. You might want to cache calls in some circumstances yet not others. The use cases for cache control can be complex, and by hiding the complexity you get, well, surprises like this.

3. Prevent the application's filesystem from redefining symbolic links.

If you've got that power, go for it.

Help From JSR 203

There's actually some hope for the future, and that's JSR 203: More New I/O APIs for the Java™ Platform ("NIO.2"), which is scheduled to be part of Java 7. Look back to the puzzler, which points out the use of the FileSystem and UnixFileSystem classes. In JSR 203, those ideas are explicit. The equivalent of java.io.File is java.nio.file.Path, which exposes a method getFileSystem. That's right, the file system is no longer hidden from the user, and you can read all about java.nio.file.FileSystem here. You can have a file system that represents a thin layer on top of your disk, or one that caches all sorts of metadata from your disk, or, heck, create an in-memory implementation for high-speed storage! But the real benefit is that these filesystem implementations can be injected into your classes: no more need for a single static filesystem. Whereas java.io.File objects are created through a constructor, java.nio.file.Path objects are constructed through the FileSystem's getPath method.
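
Here is a rough sketch of the puzzler's scenario on top of that API. It assumes the java.nio.file classes as they eventually shipped in Java 7 (FileSystems.getDefault, Files.createSymbolicLink, Path.toRealPath), which may differ from the drafts available at the time of writing, and it needs a platform and user that can create symbolic links. The class and method names are mine.

```java
import java.io.IOException;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

final class RealPathSketch {
  // Resolves a symlink, re-points it at a second file, and resolves it again.
  // Returns {firstResolution, secondResolution, realFirst, realSecond}.
  static Path[] resolveTwice() throws IOException {
    FileSystem fs = FileSystems.getDefault();
    Path dir = Files.createTempDirectory("symlink-sketch");
    Path first = Files.createFile(fs.getPath(dir.toString(), "first"));
    Path second = Files.createFile(fs.getPath(dir.toString(), "second"));
    Path link = fs.getPath(dir.toString(), "link");

    Files.createSymbolicLink(link, first);
    Path before = link.toRealPath();

    // Deleting a symlink removes the link itself, not its target.
    Files.delete(link);
    Files.createSymbolicLink(link, second);
    Path after = link.toRealPath();

    return new Path[] {before, after, first.toRealPath(), second.toRealPath()};
  }
}
```

Note that the Path objects come from an explicit FileSystem instance instead of a constructor, which is the injection point described above.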

This isn't disk i/o nirvana, unfortunately, because like the continuing transition from java.util.Date to java.util.Calendar to something more reasonable like org.joda.time.DateTime, there's still plenty of legacy code using the old and busted APIs. But it's a good start.

If you want some more information about JSR 203, here's a write-up by Alex Miller and a link to a JavaOne talk from 2008. The video is a bit out of date (for instance, it highlights the notion of Path.get, which seems to be gone, thank goodness), but it's got lots of great information about the JSR.

The Last Word

In the end, I want to highlight something underlying this entire journey: the choice to cache the values by default in the first place is just wrong. It reminds me of the saying (that seems to be attributed to Bill Harlan): "It's easier to optimize correct code than it is to correct optimized code."

Wednesday, October 21, 2009

Answer to: A Symbolic Puzzler

This blog post contains the results and answer to the previous post, A Symbolic Puzzler.

The answer will be covered here, and I'll follow this up with a fourth post that covers my thoughts on this issue.

What were your guesses?

Clearly, the most popular answer was that the test would fail.

What would have been my guess?

Look, nobody codes in puzzler fashion, so without going into details I'll explain what I expected to occur from my own production code, but in terms of this test. I wouldn't be confident that TESTDIR_SYMLINK.getCanonicalPath() returned the non-canonicalized location, but excepting that, I certainly would assume that once the symbolic link was created at the end, the second call to TESTDIR_SYMLINK.getCanonicalPath() would return the symlink's target. So my guess would have been a. It passes.

The Answer

The correct answer is: d. It depends. More specifically, it depends on the VM's arguments.

The Explanation

If you ran this test straight up without any special VM arguments, the test would fail (which might lead you to think the answer is b. It fails.)

junit.framework.ComparisonFailure: expected:</testdir/[file]> but was:</testdir/[symlink]>
 at junit.framework.Assert.assertEquals(Assert.java:81)
 at junit.framework.Assert.assertEquals(Assert.java:87)
 at ATest.testSymlink(ATest.java:26)
 ...

Why wouldn't the canonicalization return the updated value? It's because the return value from getCanonicalPath was cached from the previous call. Yes, calls to getCanonicalPath are cached.

Let's look at the code underneath getCanonicalPath. The magic lies in some package-private classes in the java.io package, specifically FileSystem and UnixFileSystem. The key operation occurs in UnixFileSystem.canonicalize:

class UnixFileSystem extends FileSystem {
  public String canonicalize(String path) throws IOException {
     if (!useCanonCaches) {
       return canonicalize0(path);
     } else {
       String res = cache.get(path);
       ... 
     }
  }
  private native String canonicalize0(String path) throws IOException;
}

In other words, the path canonicalization computations are cached when useCanonCaches is true. So just when is useCanonCaches true? For that let's look at the static initialization block for FileSystem, the superclass of UnixFileSystem:

// Flags for enabling/disabling performance optimizations for file
// name canonicalization
static boolean useCanonCaches      = true;
static boolean useCanonPrefixCache = true;
... 

static {
    useCanonCaches      = getBooleanProperty("sun.io.useCanonCaches",
                                             useCanonCaches);
    useCanonPrefixCache = getBooleanProperty("sun.io.useCanonPrefixCache",
                                             useCanonPrefixCache);
}

So by default, the canonicalization cache is on, but when you specify the VM arg -Dsun.io.useCanonCaches=false, the cache is never used.
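
The excerpt doesn't show getBooleanProperty itself, so the following is only a guess at its general shape, not the JDK source: read a system property and fall back to a default when it is unset. The class name is hypothetical.

```java
final class BooleanPropertySketch {
  // Hypothetical stand-in for the helper called in the static block above.
  static boolean getBooleanProperty(String name, boolean defaultValue) {
    String value = System.getProperty(name);
    if (value == null) {
      return defaultValue; // property not set: keep the default
    }
    return Boolean.parseBoolean(value); // true only for "true", ignoring case
  }
}
```

Because the real flags are read in a static initializer, the property has to be in place before the class is initialized, which is why a -D argument on the command line is the dependable way to set it.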

Getting back to the puzzler, the first call to TESTDIR_SYMLINK.getCanonicalPath() always returns the path to the symlink itself. The second call returns the up-to-date resolved path only when -Dsun.io.useCanonCaches=false is specified; otherwise it returns the cached value (the path to the symlink).
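
The effect can be reproduced without touching a disk. The following toy model is entirely made up for illustration: a map stands in for the file system's symlinks, another map stands in for the cache, and nothing ever expires.

```java
import java.util.HashMap;
import java.util.Map;

final class CanonCacheSimulation {
  private final Map<String, String> links = new HashMap<String, String>(); // symlink -> target
  private final Map<String, String> cache = new HashMap<String, String>(); // path -> cached result
  private final boolean useCache;

  CanonCacheSimulation(boolean useCache) {
    this.useCache = useCache;
  }

  // Simulates "ln -s to from".
  void link(String from, String to) {
    links.put(from, to);
  }

  // Resolves a path, consulting the cache first when it is enabled.
  String canonicalize(String path) {
    if (useCache) {
      String cached = cache.get(path);
      if (cached != null) {
        return cached;
      }
    }
    String resolved = links.containsKey(path) ? links.get(path) : path;
    if (useCache) {
      cache.put(path, resolved);
    }
    return resolved;
  }
}
```

Call canonicalize("/testdir/symlink"), then link it to "/testdir/file", then call canonicalize again. With the cache enabled the second call still answers "/testdir/symlink"; with it disabled the answer is "/testdir/file".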

If you don't believe me now, go try running the test twice, once without specifying VM arguments, and once while disabling the canonicalization cache, and you'll see that it fails once, and passes another time. Hence, d. It depends.

Replies to some of the comments

Here are some of the comments that accompanied the survey:

Guess: It fails.
Comment: The target of the symlink doesn't exist so I suspect exists() will return false

In fact, exists() will return true since the symlink exists. And in fact, the next call to getCanonicalPath returns itself, just as the comment suggested.

Guess: It depends.
Comment: Depending on where the root filesystem is mounted, the canonical path might be something else. For example, /etc on Mac is a symlink to /private/etc. In addition, the mountpoint for / might be a networked drive (e.g. netboot) which might have different semantics.

The bottom line is that you can't necessarily assume that the file you use to access a file is the canonical path for that file, without knowing the filesystem.

This is an interesting comment. I had hoped this was cleared up by this statement in the original post: "The path /testdir is not a symbolic link to another directory, and the user running the test also owns /testdir." If my explanation was insufficient, then so be it.

There were other comments to that effect: "Does /bin/rm do what /bin/rm should do?" "Does the user have write permissions?" These are good points, and would be better suited for a UNIX puzzler. I hope people who worried about those cases looked past them to focus on Java's behavior.

Guess: It throws an exception
Comment: I want be the first one to check "it throws an exception"!

Sorry, not the first.

Meta: Thoughts on writing a puzzler

There are at least two places where the puzzler's code could have been simplified without sacrificing its quality:

Replace explicit calls that delete the files /testdir/file and /testdir/symlink, with a communicated precondition that /testdir had no files.
This test has an interim call that computes the canonical path of a symlink that points to nothing, leading people to spend too much time worrying about that case. A better snippet would:

point the symlink to a file that exists.
compute the symlink's canonical location thereby (optionally) populating the internal cache.
point the symlink to a second file that also exists.

Tuesday, October 20, 2009

Interim: A Symbolic Puzzler

Last night I published a Java puzzler for your enjoyment. I will publish the answer tomorrow, but for now I thought you would enjoy an interim count of the guesses:

There's still plenty of time to participate in the puzzler. Don't forget to use the text box if you want to back up your reasoning.

Monday, October 19, 2009

A Symbolic Puzzler

Update: If you're reading this post in Google Reader, view the original page to participate in the survey.

Here's a little Java puzzler I encountered back in August. Essentially, the test below creates a symlink to /testdir/file, and validates that the symlink is in fact pointing to it.

This test was run on a MacBook Pro running OS X 10.5.8 using a 1.5.0 JVM, though I originally discovered this problem using Linux and a Java 1.6 VM. The path /testdir is not a symbolic link to another directory, and the user running the test also owns /testdir.

import java.io.*;
import junit.framework.TestCase;

public class ATest extends TestCase {
  public void testSymlink() throws Exception {
    // Setup
    run("/bin/rm", "/testdir/file", "/testdir/symlink");
    run("/usr/bin/touch", "/testdir/file");

    File TESTDIR_SYMLINK = new File("/testdir/symlink");

    // Symlink doesn't exist yet
    assertFalse(TESTDIR_SYMLINK.exists());

    // And so its canonical path points to itself.
    assertEquals("/testdir/symlink", TESTDIR_SYMLINK.getCanonicalPath());

    // Now point the symlink to the file.
    run("/bin/ln", "-s", "/testdir/file", "/testdir/symlink");

    // The symlink exists
    assertTrue(TESTDIR_SYMLINK.exists());

    // The canonical path should be up to date.
    assertEquals("/testdir/file", TESTDIR_SYMLINK.getCanonicalPath());
  }

  private static void run(String... args)
      throws IOException, InterruptedException {
    new ProcessBuilder(args).start().waitFor();
  }
}

These are your choices. Try to determine the answer by solely looking at the sample code. Vote for your preference, and I'll publish the results along with the correct answer.

Voting is open until some time early Wednesday morning, Oct 21.

Monday, October 12, 2009

Story Time with Google Collections

Note: this post in no way suggests that Google Collections is wasteful. Quite the opposite, it's spectacularly awesome. If you haven't used it yet, go get it, play with it, discover appropriate uses of its power, and go rock your project.

Several months ago there was a discussion on one of the mailing lists at work about preference of part of the Google Collections API over an imperative equivalent. Specifically, using Iterables.transform to translate a List<X> into List<Y> by creating a Function that translates X into Y. In other words, which was better?

This:

Function<X, Y> function = new Function<X, Y>() {
  public Y apply(X from) {
    return from.asY();
  }
};

List<Y> listOfY = Lists.newArrayList(
    Iterables.transform(listOfX, function));

Or this:

List<Y> listOfY = Lists.newArrayList();
for (X x : listOfX) {
  listOfY.add(x.asY());
}

There was this argument about the value of the functional style over the imperative style, and frankly, I found it all rather confusing. All things being equal, the former was just too damn much. And that's the key here: all things being equal. If we were using a language with proper closures, or if there was a host of Function instances available to substitute for function, I might have (and have had) a different opinion.
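
To make "proper closures" concrete, here is a sketch of how the same transformation could read, assuming a Java version with lambdas, method references and streams (Java 8 or later). X and Y are hypothetical stand-ins for the types in the discussion, and the stream API replaces Google Collections here.

```java
import java.util.List;
import java.util.stream.Collectors;

final class TransformSketch {
  // Hypothetical stand-ins for the X and Y types discussed above.
  static final class Y {
    final String value;
    Y(String value) { this.value = value; }
  }

  static final class X {
    final int id;
    X(int id) { this.id = id; }
    Y asY() { return new Y("y" + id); }
  }

  // The functional version, with a method reference instead of an anonymous class.
  static List<Y> transform(List<X> listOfX) {
    return listOfX.stream().map(X::asY).collect(Collectors.toList());
  }
}
```

With that much ceremony gone, the functional version is no longer "too damn much", which is the sense in which all things were not equal.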

To illustrate my point, I submitted this story to the mailing list.

Me: Kevin, tell me a story.

Kevin: Seriously. Go away.

Me: TELL!!

Kevin: OK. Once upon a time there was a programmer. He wanted a List of Strings. But unfortunately, he had a List of Another Type.

Me: What type?

Kevin: Doesn't matter.

Me: WHAT TYYYYYPE!

Kevin: DOESN'T. MATTER.

Me: Ice Cream?

Kevin: Ice Cream. Fine Ice Cream. So the programmer...

Me: How much ice cream?

Kevin: What?

Me: How MUCH?

Kevin: Well, let me put it this way: if it was a List of ice cream, it would be A LOT of ice cream.

Me: Eeeeeeee!

Kevin: (pause) So... this programmer decided to take the list and return a New List. He decided to transform his list by creating a magical function that converts Another Type ... I mean ... _ice cream_ into Strings. And that Function has a method called 'apply'. Apply takes things of type ... sigh ... _ice cream_ and converts it to strings. It does this by calling its callStringMethod. The End.

Me: Wha?

--- REWIND

Kevin: ... it would be A LOT of ice cream.

Me: Eeeeeeee!

Kevin: (pause) So... this programmer created a new, empty list. A place to store the Strings. And then he went through every _ice cream_ element in the old list, called its callString Method, and added it to the new list. And then he returned that list. The end.

Me: *sob* that is so awesome. What happened to the ice cream?

Kevin: I think Josh has it. Hurry up before he eats it.

Nobody here is saying that functional programming is bad or wrong. But sometimes it's not really all that awesome to be cutting edge.

Update: What I meant to say, and what my colleague said in far fewer words, is that the primary concern in coding should be making your intention as plain as possible.

Tuesday, September 29, 2009

The Java language feature I want: Exception Sugar

I've enjoyed observing all the discussion around the new features to make it into Java 7. All this time I've had an idea about a language feature, which I've occasionally mentioned to colleagues, and it concerns exceptions.

Before I discuss the problem, or my proposed solution, let me be frank: I'm no language expert. I have no expectations that a suggestion such as this has a chance of getting into Java 8 if they can't even get multi-line strings into Java 7. I don't even care about the impurities brought up by this proposal. What I do care about is discussing a more graceful way of dealing with exceptions.

I'll agree with some of the fundamental concerns Misko Hevery brings up around the domain of checked exceptions. People often don't know what to do with them. Sometimes people just wrap Exceptions in RuntimeExceptions and propagate them up the call stack. People also tend to overuse existing exception classes when one more specific to a class or API domain would do, against the well-documented advice of Effective Java (2ed) Item 61: Throw exceptions appropriate to the abstraction. However, following that advice all too often requires an undesirable amount of boilerplate code, and I do mean boilerplate. How many Exception classes have you written that look like this?

class MyException extends RuntimeException {
  public MyException() { super(); }
  public MyException(String message) { super(message); }
  public MyException(Throwable cause) { super(cause); }
  public MyException(String m, Throwable c) { super(m, c); }
}

Unless all your methods throw Exception, or you only throw RuntimeException, IllegalArgumentException, and AssertionError, the answer is: you've written them plenty of times. And unless you're building an API to be consumed by tens of thousands of engineers, you're probably not creating abstraction-appropriate exceptions often enough, directly against the advice of Effective Java.
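To make "abstraction-appropriate" concrete, here is a minimal sketch of exception translation: a low-level IOException is caught and rethrown as a domain exception. ConfigException, readRaw, and loadConfig are hypothetical names invented for this illustration, and readRaw only simulates a failing file read.

```java
import java.io.FileNotFoundException;
import java.io.IOException;

// Hypothetical demo; every name here is made up for illustration.
final class ExceptionTranslationDemo {
  // A domain-specific exception: the kind of class that is tedious to keep writing.
  static class ConfigException extends Exception {
    ConfigException(String message, Throwable cause) {
      super(message, cause);
    }
  }

  // Stand-in for real file access; it always fails so the translation path runs.
  static String readRaw(String path) throws IOException {
    throw new FileNotFoundException(path);
  }

  // Callers see a ConfigException, not the I/O detail, but the cause is preserved.
  static String loadConfig(String path) throws ConfigException {
    try {
      return readRaw(path);
    } catch (IOException e) {
      throw new ConfigException("Could not load config: " + path, e);
    }
  }
}
```

The translation itself is two lines; the cost is the exception class, which is exactly the part I'd like to shrink.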

What I propose is lowering the barrier for creating an exception class, which I'm calling Exception Sugar. Again, being no language maven, with no regard to syntax or grammar, I humbly propose:

exception className extends baseClass;

which serves as syntactic sugar for creating a subclass of baseClass named className that also, and here's the nasty part, "inherits" the base class's constructors. Yes, I know damn well that constructors aren't methods; they don't have instance scope, and there's really no such thing as constructor inheritance. I read Jeremy Manson's blog, and he even talks about the worthlessness of constructors. I get it. I couldn't care less. I just want to make it easier to write exception classes. To avoid the term inheritance I'll call it constructor propagation.

Maybe you don't like the proposed syntax. What about:

class className extends baseClass { propagate_ctors(); }

@ExceptionSugar className extends baseClass {}

Did you hear that? That, my friends, was the sound of a thousand shattering coffee cups from Java programmers whose grips loosened uncontrollably from the awful syntax. Relax, girls and boys. I'm not trying to sell syntax. I'm just trying to sell an idea. Get over the syntax, and mop up your coffee.

Q: Why can't you write your own exception class?
A: I might get it wrong. And I want to do it a lot. I want to make creating a domain-specific exception class even easier than creating a class for the domain.

Q: How could you use Exception Sugar to create subclasses that have additional attributes, or slightly different constructors?
A: I don't have any illusion of doing so. It seems to me that 99% of the time, people want to subclass Exception or RuntimeException, and even then, they only define just one constructor instead of all four.

Q: Why can't constructor propagation be applied to non-exception classes? Why should exceptions be the, ahem, exception?
A: Ha ha, nice one. They don't have to be, but I'm not interested in easy propagation for classes outside the exception hierarchy. If pressed, I'd say constructor propagation in non-exception classes is almost certainly a Bad Idea, but I'm not going down that path tonight.

Q: Why not focus on something more valuable, like properties?
A: Properties sounds like a good idea. Go for it. The reason I bring up this particular idea is because I don't see anyone else thinking about it. That isn't to say I haven't brought it up before. I even mentioned it to Alex Buckley at last year's EclipseCon. Poor Alex, it was 1:30AM, and he and I somehow managed to win the party that evening, and so he had to suffer listening to yet another armchair language designer. But I will say, when I mentioned exceptions, he thought I was referring to catching multiple exception types. At least he was surprised, which means the idea might be awful.

Or awfully brilliant!

To be fair, Alex described some of the details that demonstrate the difficulty of this idea, but the combination of many drinks and the late hour made it impossible to understand. (Sorry, Alex.)

But getting back to the value of such a proposal: if people can propose literals with underbars and the Elvis operator, I can talk about this, too. (Don't get me wrong, I very much want the Elvis operator.)

As a conclusion of sorts, I'm about 95% confident there's a technical reason for prohibiting Exception Sugar, and about 99% confident it would never make it into Java. I'd be honored if someone with an understanding greater than mine of Java and language design would be willing to comment on this idea, and provide some thoughts about the technical issues, if only to educate.

Thanks for reading. Thanks to David Mankin for his feedback.

Friday, July 03, 2009

Generic types are not required for covariance

Java 5.0 introduced Generics. It also introduced covariant return types. Wikipedia does a fine job describing covariant return types.

Since they were released simultaneously, I had considered them to be tightly coupled. For instance, here are simplified versions of an interface and implementation I recently wrote:

Note: I am having difficulty representing greater-than and less-than symbols in Blogger's editor, so you'll have to do with { and }.

Version 1: Java 5, Generics, Covariant return types

public interface Model{T extends Model{T}} {
T read(InputStream in);
T write(OutputStream out);
}

public class MyModel implements Model{MyModel} {
public MyModel read(InputStream in) {
...
}
public MyModel write(OutputStream out) {
...
}
public MyModel setName(String name) {
...
return this;
}
public String getName() { ... }
}

Thanks to the covariance, I can write a method chain like this:

new MyModel()
   .read(in)
   .setName("foo")
   .setStopAtMain(false)
   ...
   .write(out);

With Java 1.4, the code would have to look like this:

Version 2: Java 1.4

public interface Model {
Model read(InputStream in);
Model write(OutputStream out);
}

class MyModel implements Model {
public Model read(InputStream in) { ... }
public Model write(OutputStream out) { ... }
...
}

And the method chain would result in a compile error:

public static void foo() {
new MyModel()
      .read(in)
      .setName("foo")
      ^ The method setName(String) is undefined
        for the type Model.
      .write(out);
}

Which you could hack around with an ugly cast:

public static void foo() {
((MyModel) new MyModel()
      .read(in))
      .setName("foo")
      .write(out);
}

Back to the Java 5 example. My point is just this: covariant return types don't require generics. All that messy code in version 1 could look much simpler, because covariant return types exist on their own without generics:

Version 3: Java 5, Covariant return types

public interface Model~~{T extends Model{T}}~~ {
~~T~~ Model read(InputStream in);
~~T~~ Model write(OutputStream out);
}

public class MyModel implements Model~~{MyModel}~~ {
public MyModel read(InputStream in) {
    ...
}
public MyModel write(OutputStream out) {
...
}
public MyModel setName(String name) {
    ...
    return this;
}
public String getName() { ... }
}
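
Here is a minimal compilable sketch of version 3 with real angle brackets out of the way, since it needs none. The method bodies are placeholders I filled in so the chain runs, and CovarianceDemo is a made-up holder for the call site.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.OutputStream;

interface Model {
  Model read(InputStream in);
  Model write(OutputStream out);
}

final class MyModel implements Model {
  private String name = "";

  // Covariant return: the override narrows Model to MyModel, no generics needed.
  @Override public MyModel read(InputStream in) {
    return this; // placeholder body; a real model would parse the stream
  }

  @Override public MyModel write(OutputStream out) {
    return this; // placeholder body; a real model would serialize itself
  }

  public MyModel setName(String name) {
    this.name = name;
    return this;
  }

  public String getName() {
    return name;
  }
}

// Hypothetical call site showing that the chain compiles without a cast.
final class CovarianceDemo {
  static String chain() {
    return new MyModel()
        .read(new ByteArrayInputStream(new byte[0]))
        .setName("foo")
        .write(new ByteArrayOutputStream())
        .getName();
  }
}
```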

Lesson learned: I know generics fairly well, but there's a difference between knowing when it's useful and when it isn't. Said another way: when you have a Generic hammer everything looks like a generic nail.

Thanks to David Plass for pointing this out.

Sunday, March 01, 2009

Building a National Pet Identification System in Java

This week I dealt with a rather irritating little problem that wound up having a surprising source. Rather than try to explain it to my wife in technical terms, I went for the more prosaic form, but for those of you that are familiar with Java, it's all about this.

Congratulations! President Obama created a new Cabinet post, and you became the first Secretary of Pets! Your first task: managing and categorizing every pet in the United States: dogs, cats, budgies, snakes, it's your domain. You decide to identify every pet by giving them a unique nationally-assigned number. References to that number are the same as references to the pet and references to the pet are the same as references to the pet's identifying number. You create a huge central station just outside Lebanon, Kansas and process requests as they come in. Someone requests an identifying number for a Maltese cat in Michigan, they get 104558628. The next request is for a wire-haired terrier in Pasadena, CA and they get the next one: 104558629.

You've also been tasked with creating an abstract broad classification system. It's an odd classification requirement: you don't need to classify by species, number of paws, color, gender, geography or fangs, just something that lets you batch them up. So you create a classification based on the identifying number already given to the animals by declaring an animal's classification as the last two digits of the identifying number. So if your wire-haired terrier's identifying number is 104558629, its classification is 29.

So far so good, your central station is humming away, doling out identification numbers. One day your sister's cat goes missing, and she goes to the Lost Pet Center. She doesn't describe the cat, she just gives her cat's identification number 104558628, and a Lost Animal Ticket is stored in the National Lost Pet Database.

Remember when I said earlier that referencing a pet is virtually the same as referencing a pet by its identification number? Thanks to this you were able to build a Pet Identification Station in every town: put the animal up to a station scanner and, presto! It displays the animal's identification number.

Also in every town is a Lost Pet station. It's awesome: if you find an animal you don't even have to bring it to the station, you just supply its identification number. When you do, cogs turn, phones dial and the Lost Pet station connects to the National Lost Pet Database, and asks it: "Is this identification number in your Database?" If the response is no then it's not a lost pet, and if the answer is yes the cogs turn and phones dial. The Correct Authorities are notified and before you know it, your sister's cat is safely returned. In fact, she just told me so about an hour ago, and her cat's making all sorts of happy purrs. Do you like happy endings? I love 'em.

I bet you're a little sad that there's a National Lost Pet Database. Don't be sad - it's not a terribly large list, and it's certainly smaller than before you ever built your pet identification system.

But it's large enough that everyone wants it to respond as quickly as possible. Nobody wants the National Lost Pet Database to be the source of delaying a pet returning to its home, particularly not the politicians, at least, not the savvy ones. But right now, how it works is like this: every entry in the database is a Lost Animal Ticket. The Lost Animal Ticket contains the identifying number of the pet as well as a piece of information about the owner to facilitate fast contact. When trying to decide if any pet is missing given its identification number, the system creates a prototype Lost Animal Ticket, and attempts to match the prototype ticket against every entry in the lost pet database. If the identifying number of the first entry matches the identifying number on the prototype ticket, it's considered a match. If not, it moves to the second Lost Animal Ticket entry in the database, and performs the same test. This repeats until a match is found or until the entire database is exhausted. It's a bit like trying to find a playing card in a deck by taking a prototype card from yet another deck and attempting to match it against the first deck through visual inspection.

So to make it faster, you come up with a Great Idea: split the database into several miniature databases. In fact, thanks to the classification system you designed, you can just split the database into a hundred mini databases, and in each mini database you only put Lost Animal Tickets for pets with the same classification number. So now you have one mini database for all Lost Animal Tickets for pets classified as '00', one for all pets classified as '01', one for all pets classified as '02', and so on up to '99'. This really works well because every mini database is about as large as every other, and even better, instead of searching through every Lost Animal Ticket, you just search one of the mini databases in 1% of the time. Instead of searching through all the possible missing pets, you only have to search through the mini database for pets classified as '28', which as we know are the last two digits of your sister's cat's identifying number. In some ways this is like trying to quickly search a deck of cards by separating them based on their suit first.

All of this works and is predicated on three important things which may seem obvious: first, everything that tries to classify a pet must consistently perform the same step to get the classification number, which is to create a Lost Animal Ticket, read the identification number from it and use its last two digits; second, that whenever attempting to match one Lost Animal Ticket against another, this must be performed by matching the identification number on each of the two tickets; and third, those first two things agree that the identification number is what matters.

And here's the thing: what happens if any of those assumptions are wrong? For instance: one day you decide to categorize pets by the first two digits of the animal's identification number instead of the last. While your sister's cat's Lost Animal Ticket is sitting in mini database '28', everyone may look in mini database '10'. All of a sudden nobody can find your sister's cat, or it's registered in two separate mini databases.

Here's something else that can go wrong: instead of comparing Lost Animal Tickets using the identification number of the animal, the system uses the identification number of the pet's owner. If your sister lost three cats and one is found, it's quite possible for the wrong ticket to be matched against the prototype ticket. Also consider a possibility where the criterion for deciding whether tickets match changes to their being the same physical piece of paper. Doesn't make sense? If you and I both have a four of clubs, we could agree they were the same card. (That's basically how the National Lost Pet Database works.) But if instead of playing cards we had airline boarding passes, and we were standing on an airplane with our own boarding passes that referred to the same ticket, someone might call The Correct Authorities. If we applied that algorithm to the National Lost Pet Database, pets would practically never be found.

Finally, your system will fail from a poor combination of the first two requirements: if your system's categorization calculation is inconsistent with its test for equality, the system may lose tickets. For example, while the first example of possible failure still used the pet's identifying number, what if classification was something more arbitrary, like the time the ticket was issued? The prototype ticket will have been issued at a different time from the lost pet ticket, and as such will likely be classified in different mini databases, and never be found, or only found due to something repeating the same error (Like the time nobody at the bank could locate my account until someone mistyped my name the same way as the teller who registered it on my behalf. That's another story.)

Luckily for your sister, you didn't make these mistakes. Your National Pet Identification System is a huge success, and President Obama has now tapped you to address the problems with our banking system, just like Moist Von Lipwig.

Keep in mind: if anyone plans to write a National Pet Identification System in Java, read Items 8 and 9 of Effective Java (2ed). You may save someone both several hours of work and their wire-haired terrier.
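
For the Java-minded, the story maps onto equals and hashCode roughly like this. It's a minimal sketch: the class names come from the story, the fields and methods are my own guesses, and HashSet picks its own number of buckets rather than exactly one hundred.

```java
import java.util.HashSet;
import java.util.Set;

final class LostAnimalTicket {
  private final long petId;          // the nationally assigned number
  private final String ownerContact; // deliberately ignored by equals and hashCode

  LostAnimalTicket(long petId, String ownerContact) {
    this.petId = petId;
    this.ownerContact = ownerContact;
  }

  // Matching: two tickets are the same if they name the same pet.
  @Override public boolean equals(Object o) {
    if (this == o) return true;
    if (!(o instanceof LostAnimalTicket)) return false;
    return petId == ((LostAnimalTicket) o).petId;
  }

  // Classification: the last two digits of the same number equals() compares.
  @Override public int hashCode() {
    return (int) (petId % 100);
  }
}

final class LostPetDatabase {
  private final Set<LostAnimalTicket> tickets = new HashSet<>();

  void report(long petId, String ownerContact) {
    tickets.add(new LostAnimalTicket(petId, ownerContact));
  }

  // Builds a prototype ticket and lets the set find the right mini database.
  boolean isLost(long petId) {
    return tickets.contains(new LostAnimalTicket(petId, null));
  }
}
```

Remove the hashCode override and the prototype ticket is classified by object identity instead, so isLost will almost certainly report false for a pet that is sitting right there in the set.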

Sunday, August 10, 2008

Watch your return values

Preface - This post was going to be a one-sentence comment to a post by Jeremy Zawodny, but then I remembered my professor, and it went from there.

I had a professor for two semesters who required that all our assignments be written in C. This was his assignment submission policy:
1) Run lint on all your code.
2) All code must have zero warnings from lint, with no exceptions.
3) Any use of strcpy resulted in a zero for the assignment.

#3 was my first lesson about buffer overflows. (We all used strncpy.)

One of the lint warnings pertained to unused return values. For instance, the function signature for printf is

int printf(const char *format, ...);

How often is printf's returned value given attention? It represents the number of characters written to the stream, or a negative number on failure.

Our linter complained about unused return values for typical uses of printf like:

printf("--done.");

We were required to either accept and process the return value, or explicitly disregard it:

(void) printf("--done.");

This was one of my earliest impressions of studying defensive programming.

In Java, the method java.io.InputStream.skip has a contract that requires you to pay attention to the return value, and the reason may surprise you:

public long skip(long n) throws IOException
The skip method may, for a variety of reasons, end up skipping over some smaller number of bytes, possibly 0. This may result from any of a number of conditions; reaching end of file before n bytes have been skipped is only one possibility. The actual number of bytes skipped is returned. If n is negative, no bytes are skipped.

So, you supply a distance to skip, and you likely expect the same value back, at least most of the time. What amazes me is that the documentation even addresses the common expectation: sure, reaching EOF is one way to get skip(n) != n, but it then says, twice, that there may be a "variety of reasons" and a "number of conditions." The author is trying to make up for a troublesome API with special javadoc.

Set aside that oftentimes there's no documentation and hence no contract: you still can't expect that documenting unexpected behavior is going to result in proper use of the API.
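
For what proper use might look like, here is a minimal sketch that keeps calling skip until the requested distance is covered. SkipDemo and skipFully are names I made up for this example, and falling back to read() when skip makes no progress is my own choice for telling "end of file" apart from "not right now".

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper; the names are not from the JDK.
final class SkipDemo {
  // Skips exactly n bytes, or throws EOFException if the stream ends first.
  static void skipFully(InputStream in, long n) throws IOException {
    long remaining = n;
    while (remaining > 0) {
      long skipped = in.skip(remaining); // may legally return less than asked, even 0
      if (skipped > 0) {
        remaining -= skipped;
      } else if (in.read() == -1) {
        // No progress and nothing left to read: we really are at end of file.
        throw new EOFException(remaining + " bytes left to skip");
      } else {
        remaining--; // read() consumed one byte on our behalf
      }
    }
  }
}
```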

Present day, Java has FindBugs, The Lint Of Java (zero results, you saw it here first!). Java's classfile format and FindBugs' semantic analysis make it much more powerful than lint, and it can identify this issue with java.io.InputStream.skip(). From the description as FindBugs reports it:

This method ignores the return value of java.io.InputStream.skip() which can skip multiple bytes. If the return value is not checked, the caller will not be able to correctly handle the case where fewer bytes were skipped than the caller requested. This is a particularly insidious kind of bug, because in many programs, skips from input streams usually do skip the full amount of data requested, causing the program to fail only sporadically. With Buffered streams, however, skip() will only skip data in the buffer, and will routinely fail to skip the requested number of bytes.

That's great, but there's a problem, and that comes with expanding the API. If I build my own API with its own nuances, someone needs to write a FindBugs detector. (Hint: be careful writing a clever API!)

It seems draconian to enforce a policy of "always address return values" with Java outside academia since there's no easy way to mimic the explicit cast to void. Otherwise you wind up with unused local variables, which becomes yet another code smell.

In the end all tools like FindBugs and lint only augment the human analysis that accompanies development.

I hope students today are being told that they cannot turn in any Java assignments without running FindBugs.

Monday, May 26, 2008

Becoming a Java Master, cont.

Last night I read an interesting post by Steve McLeod, titled Becoming a Java Master. The article suggests these "books, techniques, and qualifications that help you become a Java Master." You can browse the article to see what he suggests: a good series of books and a Sun Certified Java Programmer certification. I think these things are helpful, but not nearly enough. Of course, everyone takes their own path to mastery; mine involved a job with some of the greatest engineers I've ever met.

Note: I don't consider myself a 'Master', and that's in part because it's hard to use that phrase when I see names like Bob Lee in my inbox almost every day.

All of Steve's recommendations are good. Here are some others:

Read

Read blogs of the real masters, for instance Neal Gafter, Danny Coward, Peter Ahé, Brian Goetz, Cliff Click, et cetera. I work for Google so I naturally lean toward Googlers' blogs, some of which are: Jeremy Manson, Bob Lee, Jesse Wilson, Kevin Bourrillion (two 'r's, two 'l's, I know I know.).

Bruce Eckel also discusses some interesting stuff.

Read Steve Yegge's blog, if only so you know what's going on when other people talk about it.

Participate in a reading group. From a personal perspective, the thing that worked best for me was to skip participating and go straight to leading the group. Occasionally, (and this depends on the group,) you don't need to be a subject matter expert to lead a reading group, just a desire to learn a topic, and a few people who will look to you to set the group's tone. I find that the fear of embarrassing myself as an unprepared leader is usually enough to get me through a difficult book. (Except that one time when we read The Haskell School Of Expression. I make no apologies for that one.)
Read Java Puzzlers. It doesn't feel like a mastery-type book, but only because it's so much fun.
Read the Core Java books. They are invaluable references.

Get Inside

Read some of the JDK source. Do you know how ArrayList got its name? Have you tried to understand ConcurrentHashMap from the inside? Go find out.
Try to write something meta, like a class file decompiler, a custom classloader, or a debugger. You don't have to finish it, and it doesn't even have to be revolutionary. You're doing it for yourself so you come away with a bona fide understanding of the insides. (Personally, I started writing a decompiler in 2004. When I got hired by Google, I stopped putting time aside for it.)

Be Like A Master

Regurgitate what you have learned. You choose the format: a blog, an online document, an open-source contribution, a presentation. Heck, you can even write a book. Regurgitating the information helps you become a subject master.
Participate in Java Ranch or any other online forum, and answer the questions you find there. Answering other people's questions is a great way to reinforce your own knowledge. Of course, also use it to ask questions about things you don't understand. A true master knows it's good to ask questions.

Friday, April 11, 2008

Integer.getInteger. Are you kidding me?

Photos and diary are on hiatus. Now a little technology.

I just discovered a method in the Java 5 API documentation: Integer.getInteger(String):

http://java.sun.com/j2se/1.5.0/docs/api/java/lang/Integer.html#getInteger(java.lang.String)

Determines the integer value of the system property with the specified name.
The first argument is treated as the name of a system property. System properties are accessible through the System.getProperty(java.lang.String) method. The string value of this property is then interpreted as an integer value and an Integer object representing this value is returned. Details of possible numeric formats can be found with the definition of getProperty.

So let me see if I understand:

Integer.valueOf(String) converts a String to a number by assuming the String is a numeric representation. In other words, Integer.valueOf("12345") yields the number 12345.
Integer.getInteger(String) converts a String to a number by assuming the String is the name of a system property whose value is a numeric representation. In other words, Integer.getInteger("12345") is likely to yield null.

Why would anybody consider this a sufficient distinction? How many bugs are people going to create by using getInteger when they meant valueOf and vice versa?

This type of overloading is called near-phrase overloading. I just made that term up right now. It's when people use very similar words to mean different things. Consider two words x and y, their general meanings gm(x) and gm(y), and their meanings in a given context, cm(x) and cm(y). If

distance(gm(x), gm(y)) < distance(cm(x), cm(y))

then it's a bad use of x and y! Go find another x and y for their contextual uses. Really, they could have called it getIntegerProperty.

This is the worst case of avoidable ambiguity I've seen in Java; I expect better coming out of them.

Update: it turns out there is something worse: Boolean.getBoolean("true") is usually equal to Boolean.FALSE.
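
If you want to see the difference for yourself, here is a minimal sketch. The property name blatherberg.demo.port is made up for the demonstration, and the null and false results assume nobody has defined system properties named "12345" or "true".

```java
// Hypothetical demo class; only the JDK calls themselves are real.
final class GetIntegerDemo {
  // Made-up property name, used only for this demonstration.
  static final String PORT_PROPERTY = "blatherberg.demo.port";

  public static void main(String[] args) {
    // Parses the text itself.
    System.out.println(Integer.valueOf("12345"));

    // Looks up a system property *named* "12345"; null unless someone set one.
    System.out.println(Integer.getInteger("12345"));

    // What getInteger is actually for: reading a numeric system property.
    System.setProperty(PORT_PROPERTY, "8080");
    System.out.println(Integer.getInteger(PORT_PROPERTY));

    // Same trap: this reads a system property named "true", not the text "true".
    System.out.println(Boolean.getBoolean("true"));
  }
}
```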
