Sunday, November 08, 2009

Answer to: Java Puzzler: Splitting Hairs

Update: I fixed some small errors, and also updated the charts one last time.

This blog post contains the results and answer to the previous post, Splitting Hairs.

Given the nature of the possible answers, this is actually two puzzlers in one:
  1. How many elements come back from "".split(":"): Zero or one?
  2. How many elements come back from ":".split(":"): Zero, one or two?
Darn, I could have gotten two separate puzzlers out of this.

My Guess

Here's the important point: this is a real-world case that happened to me just the other day. Specifically, I was writing some UI code to edit and parse a colon-separated classpath, so in fact the line in question looked like this:

String[] classpathEntries = classpathField.getText().split(":");

In this case, the UI field starts out empty, so classpathField.getText() returns an empty String. Effectively, my expectation was that with an empty classpath, I'd also get an empty array.

So my guess for "".split(":").length is zero.

My guess for ":".split(":") was that splitting a lone separator would yield two empty strings, one on either side. So that guess is two. Put them together and you get c) 0/2.

Your Guesses

How did you folks do?

Let's look at the numbers:

An overwhelming preference for f) 1/2, and my guess, c) 0/2 in a distant second.

The Answer

If by now you haven't run the sample code, I'll tell you: the correct answer is d) 1/0. If you're shocked that you get more elements with less data, you're not alone. Hey, just look at those graphs.

Okay, let's start the analysis by reading the method signature for String.split():
public String[] split(String regex)
Ah, yes, split takes a regular expression as a delimiter, not a literal string.

Sidebar: I occasionally read about (or experience) puzzlers where someone tries splitting on a pipe character (|), which has a special meaning in regular expressions. To split on a pipe, you must use "\\|" in Java source (the regex \|) so the pattern compiler sees it as a literal character. This is not one of those puzzlers: the colon does not have a special meaning for the pattern compiler.
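For the curious, here is a minimal sketch of the pipe case from the sidebar. The class and method names are made up for illustration; Pattern.quote is the standard library's way to turn any string into a literal pattern, which saves you from counting backslashes.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class; only String.split and Pattern.quote are real API.
class PipeSplitDemo {

    // Escape the pipe by hand: the Java literal "\\|" is the regex \|
    static String[] splitEscaped(String input) {
        return input.split("\\|");
    }

    // Let the library quote the delimiter so it is always treated literally
    static String[] splitQuoted(String input) {
        return input.split(Pattern.quote("|"));
    }

    public static void main(String[] args) {
        String input = "a|b|c";
        System.out.println(Arrays.toString(splitEscaped(input))); // [a, b, c]
        System.out.println(Arrays.toString(splitQuoted(input)));  // [a, b, c]
    }
}
```

Both variants treat the pipe as a plain character, so either one works for a fixed delimiter; quoting is the safer habit when the delimiter comes from somewhere else.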

Part 1: "".split(":")

If you navigate through the source for String.split, you'll discover that "".split(":") is effectively the same as Pattern.compile(":").split("", 0).

Things get interesting when you read the javadoc for Pattern.split(CharSequence, int):
If this pattern does not match any subsequence of the input then the resulting array has just one element, namely the input sequence in string form.
Wha? So if the delimiter isn't found in the input, the original input is returned. In other words, "".split(":") returns a one-element array whose only element is the empty string.
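If you want to see it for yourself, here is a small sketch (the class and method names are mine, not from the JDK). It prints the array length alongside the contents, because Arrays.toString renders both an empty array and an array holding one empty string as "[]".

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical demo class comparing the two equivalent calls.
class EmptyInputSplitDemo {

    // The call from the puzzler
    static String[] viaString(String input) {
        return input.split(":");
    }

    // What the javadoc says String.split(regex) is equivalent to
    static String[] viaPattern(String input) {
        return Pattern.compile(":").split(input, 0);
    }

    public static void main(String[] args) {
        String[] a = viaString("");
        String[] b = viaPattern("");
        // Length is printed too: toString alone cannot tell [""] from []
        System.out.println(a.length + " " + Arrays.toString(a)); // 1 []
        System.out.println(b.length + " " + Arrays.toString(b)); // 1 []
    }
}
```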




OK, that explains the first one, what about the second one?

Part 2: ":".split(":")

To understand what's going on, let's look again at the javadoc for Pattern.split(). In this case n, by virtue of being called from String.split(String), is 0.
If n is zero then the pattern will be applied as many times as possible, the array can have any length, and trailing empty strings will be discarded. [ emphasis added. ]
This can be corroborated with this piece of code near the end of the method:

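// Construct result
int resultSize = matchList.size();
if (limit == 0)
    // Drop trailing empty strings
    while (resultSize > 0 && matchList.get(resultSize-1).equals(""))
        resultSize--;
String[] result = new String[resultSize];
return matchList.subList(0, resultSize).toArray(result);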

In other words, internally, it splits the input string ":" into [ "", "" ], and then strips both of them as trailing empty strings before returning to the caller, leaving an empty array. But "".split(":") doesn't get this treatment, because the delimiter was never found in the input and the method returns early.
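Here is a rough sketch of the trailing-empty-string rule in action, again with made-up class and method names. It relies on one documented detail: a negative limit applies the pattern as many times as possible but keeps trailing empty strings.

```java
import java.util.Arrays;

// Hypothetical demo class; only String.split is real API.
class TrailingEmptyDemo {

    // Same as split(":", 0): trailing empty strings are discarded
    static String[] splitDefault(String input) {
        return input.split(":");
    }

    // A negative limit keeps trailing empty strings
    static String[] splitKeepAll(String input) {
        return input.split(":", -1);
    }

    public static void main(String[] args) {
        System.out.println(splitDefault(":").length);              // 0
        System.out.println(splitKeepAll(":").length);              // 2
        System.out.println(Arrays.toString(splitDefault("a::")));  // [a]
        System.out.println(Arrays.toString(splitDefault(":a")));   // [, a]
    }
}
```

Note that only trailing empty strings are dropped: the leading empty string in ":a" survives.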

Does your head hurt? Mine sure hurts, and I've done the research. But here, if you're still hungry, or hate yourself, take a look at this little gem, almost worthy of its own puzzler.

// Requires: import java.util.Arrays;
System.out.println(Arrays.toString(":".split(":", 2)));
System.out.println(Arrays.toString(":".split(":", 1)));
System.out.println(Arrays.toString(":".split(":", 0)));

Coming Soon

Like the last puzzler, this will be followed with an analysis of why this occurs, and some nice alternatives.


wimjongman said...

Hi Robert,

Nice, thanks. I guessed 0/2 as well. And all this time I was thinking that split() solved the quirks of StringTokenizer.

David Plass said...

Wait, can I change my answer?