More than you ever wanted to know about $$ and XPath

Posted in Articles, Development, JavaScript, Web

I took great interest in Sylvain Zimmer’s optimized version of Prototype’s $$ function, since I’ve been working on something very similar recently. Sylvain’s version uses more workmanlike string parsing; mine uses XPath. Each approach has its strengths and weaknesses.

The $$ function

For those unfamiliar with Prototype: the $$ function is used to perform DOM queries with CSS selector syntax. The concept is several years old — Simon Willison’s getElementsBySelector might be the first — and the JavaScript required to pull it off has typically been rather ugly.

Sam Stephenson’s version, $$, makes a trade‐off that some might not agree with: it sacrifices robustness and speed for elegance of code. It stuffs most of the ugliness into a single regular expression against which the CSS selector is tested; and for each atom in the selector, it adds a condition to a match function that’s built dynamically with the Function constructor.

But regex matching is slower than ordinary string matching, and the Function constructor is particularly slow — just like eval, it involves interpreting a string as JavaScript code, and the overhead of such an operation is pretty large.

Sylvain’s approach

Sylvain’s optimized version of $$ takes a different approach — one that’s simpler, albeit less elegant, than the method Prototype uses. It eschews regular expressions almost entirely, instead checking the start of the string for class or ID identifiers (. and #). It also branches a lot, weeding out the simpler cases so that it performs costly operations only when it absolutely has to. Also, in order to keep the code simple, it punts on attribute selectors, passing them off to the ordinary $$ function.

All in all, Sylvain’s version is damned good. It’s small in size, works in all browsers, and is optimized for the most common types of selectors.

The XPath approach

My version, on the other hand, uses XPath, which is supported in Firefox 1.5, Opera 9, and WebKit nightlies (though WebKit’s implementation is currently crappy, as we’ll see). It has the benefit of being scorchingly fast in Firefox and Opera — but the drawback of being totally useless to IE. On the other hand, it’s really cool, assuming you’re as much of a dork as I am.

XPath is a query language for selecting specific nodes in an XML document. In other words, it’s like CSS selector syntax, only far more robust. One of the glaring weaknesses of traditional DOM methods — the scarcity of useful ways to select nodes — will finally be addressed once the ECMAScript binding for DOM Level 3 XPath is implemented in all browsers. (Until very recently, Mozilla browsers were the only ones that supported it at all, which is why you’ve probably never seen any code that uses XPath outside of a Greasemonkey script.)

The idea of routing CSS selector queries through XPath was originally posited by Joe Hewitt (of Firebug fame), who wrote a function for converting a CSS selector to an XPath expression. Dean Edwards, picking up where Joe left off, used that function as part of an addon to Behaviour, illustrating several tactics that can be used to speed up DOM queries.

I wanted in on the sexy, bleeding‐edge, buzzword action! I, in turn, used Joe’s function as part of a patch to Prototype. The patch tests for the presence of document.evaluate and, if it is found, redefines $$ (and its cousin document.getElementsByClassName 1) to convert the CSS selector to an XPath expression and fetch the results that way. Even with the overhead of converting the selector, this approach is a hell of a lot faster than the one $$ uses.

The complicating factors

You might then ask: “Which approach is better: yours or Sylvain’s?” It depends. If you’re asking about speed alone, then you can try it yourself — I’ve adapted Sylvain’s test page to run a side‐by‐side‐by‐side comparison. (Note that if the document.evaluate check fails, $$ will not get redefined. This means that columns two and three should yield identical results in IE.)

The results vary wildly depending on your platform and/or browser, but here’s how it plays out on my machine (MacBook Pro, OS X 10.4.7, 1.83ghz Core Duo, 1.5GB RAM):

  • For the first test case, a simple ID selector, Sylvain’s version is way faster. In fact, the XPath version is the slowest of all three.
  • For the last test case, a very complex selector, the XPath version is about fifteen times faster than Sylvain’s, and about fifty times faster than Sam’s. 2
  • For the rest, it’s tough to say. In Firefox and Opera 3, the XPath version is faster in two out of the three cases. In WebKit, the XPath version gets trounced — it would appear that the JavaScriptCore implementation of XPath is horribly slow. 4

The two outliers in the results are the edge cases: one simple, one complex. There’s a case to be made for throwing them both out.

The first test case, #n1, is very fast in Sylvain’s version because it’s smart enough to realize, “Oh, I can just use document.getElementById.” IDs are privileged in the DOM API, but for XPath querying by ID is no different from querying by any other attribute. So Sylvain’s version, optimized for the simple case, wins.

It’s a bad example, though, because you probably wouldn’t describe the element that way — you’d use document.getElementById. Or, in other words: if you’re using $$("#n1") instead of $('n1'), you’re being silly.

The other outlier is a very complex selector: html body div .c2 div #n8.c4. It’s also heterogeneous — it queries by ID, by class, and by descendance. Sylvain’s version, while still much faster than Sam’s, is nonetheless sluggish. The XPath version leaves both in the dust.

But this is also a bad example. There are few situations when you’d need to query the DOM with such granularity.

So it depends.

Moral of the story

Sylvain’s version of $$ should, in my opinion, be used by anyone who’s using Prototype. Excluding selectors that match on attributes (which it can’t parse), it’s faster than Sam’s version in all cases and across all browsers.

Meanwhile, I’m convinced that XPath is going to be a major part of client‐side web apps very soon. (Assuming IE8 will support it. Gah.) The XPath version of $$ is also a lot faster than Sam’s, and in most cases is slightly faster than Sylvain’s. In addition, it handles attribute selectors with ease. But it won’t make DOM queries faster in IE. 5

(Also, consider document.getElementsByClassName, which gets a huge performance boost when XPath is used.)

I’ll probably end up using both, even though that means I’ll have three different versions of $$, because each has an upside that’s too good to pass up. I’ll be able to use $$ in my code and leave it to the browser to figure out which version it should use.

  1. Matching a class name in XPath is a little strange. You can’t simply test for equality (//*[@class='s1']): you’ll get false negatives, since that expression won’t match an element with more than one class. You can test for a substring match (//*[contains(@class, 's1')]), but you’ll get false positives, since you’ll also pick up elements with classes of ss1 and s1s. So you have to get a little hacky. Add leading and trailing spaces to the class name, then check for a substring with leading and trailing spaces: //*[contains(concat(' ', @class, ' '), ' s1 ')]. It’s nuts, but it works. Thank god for XPath’s string manipulation functions.
  2. The mixed performance results of Sylvain’s and my versions can be attributed to differences in computational complexity. For the simple case, a string parser is very fast. But as the selector gets more complex, the cost of parsing that string increases very quickly, whereas the cost of an XPath query hardly increases at all. In other words: for XPath, //*[@id='n1']//*[contains(concat(' ', @class, ' '), ' c2 ')][contains(concat(' ', @class, ' '), ' c4 ')] is only a little slower than //*[@id='n1'].
  3. Opera’s XPath parser is, shall we say, fickle. Open this test page in Opera 9 and you’ll see that three different expressions — equivalent within the scope of this page — return different results. This makes no sense. I filed a bug.
  4. Seriously — what the hell is going on here? Am I doing something wrong? Can anyone explain these results to me? I did a separate test for WebKit wherein I iterated an XPath query directly, just to make sure the CSS→XPath conversion wasn’t to blame, but it didn’t shave off much time. It’s so slow that it makes DOM methods look fast by comparison. I really hope they improve the performance before the next Safari release, or else I’ll have to filter WebKit out manually so it gets Sylvain’s version.
  5. But, as Dean explains, IE has its own trick for speeding up CSS selector queries, albeit only on page load. (Perhaps Justin Palmer can integrate these approaches into event:Selectors.)