XBRL provides a very effective way of capturing certain types of data. XBRL’s built-in dimensions, such as period and units, augmented by taxonomy-defined XBRL Dimensions (XDT), provide structured data that is ripe for automated processing and analysis.

The obvious next question is, “What technology should be used to do this processing?” and the obvious answer is, “XBRL is XML, so let’s use XPath.” Unfortunately, life’s not that simple. XPath is great at navigating a “traditional” XML hierarchy, but XBRL doesn’t use XML in the “traditional” way. The data in an XBRL instance document is represented by a very flat XML structure, and the relationships between facts are captured using XLink hierarchies (often spread across multiple files) rather than element nesting. Both of these make processing XBRL with XPath a real challenge. One approach to the problem is to convert XBRL into an XML structure that is better suited to processing with XPath. Mark Goodhand has discussed this approach before and it works very well in some environments, but in others we want to work with the XBRL directly.

To understand the challenges in doing this, let’s consider some really simple examples. Take the following assertion:

Income = Revenue – Costs

Let’s assume that the three things in this assertion map directly to three XBRL concepts. It’s very tempting to try to write an XPath expression for this assertion. Something like:

//Income = //Revenue - //Costs

Again, it isn’t that simple. XBRL instance documents often report the same concept more than once. For example, all three concepts might be reported for 2011 and again for 2010. To a business user, the sensible thing to do is obvious: apply the assertion to the three 2011 facts, and apply it separately to the three facts for 2010. In XPath, it’s rather less obvious. We need to rewrite the expression so that we only use Revenue and Costs facts with the same period as the Income fact we’re testing, and doing that in XPath turns out to be really complicated. In XBRL, the dates form part of the context, so you need to dereference the contextRef attribute and then do a date comparison on the start and end dates. To do it properly, you need to deal with different representations of the same dates (e.g. 2010-01-01T00:00 vs 2010-01-01) and with the different interpretations XBRL applies to date-only values: a start date means the beginning of that day, whereas an end date or instant means the end of it.
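To give a feel for how much work hides behind “use the same period”, here is a rough Python sketch of the dereferencing and date normalisation described above. It is only an illustration: the instance document, the ex: concepts and the function names are made up for this example, entities and units are left out, and time zones and “forever” periods are ignored.

```python
from datetime import datetime, timedelta
from decimal import Decimal
import xml.etree.ElementTree as ET

XBRLI = "{http://www.xbrl.org/2003/instance}"

# Hypothetical, cut-down instance (entity and unit omitted, so not valid XBRL).
# The same 2010 period is written two different ways.
INSTANCE = """
<xbrl xmlns="http://www.xbrl.org/2003/instance" xmlns:ex="http://example.com/taxonomy">
  <context id="c2011">
    <period><startDate>2011-01-01</startDate><endDate>2011-12-31</endDate></period>
  </context>
  <context id="c2010a">
    <period><startDate>2010-01-01</startDate><endDate>2010-12-31</endDate></period>
  </context>
  <context id="c2010b">
    <period><startDate>2010-01-01T00:00:00</startDate><endDate>2011-01-01T00:00:00</endDate></period>
  </context>
  <ex:Income contextRef="c2011">40</ex:Income>
  <ex:Revenue contextRef="c2011">100</ex:Revenue>
  <ex:Costs contextRef="c2011">60</ex:Costs>
  <ex:Income contextRef="c2010a">30</ex:Income>
  <ex:Revenue contextRef="c2010b">90</ex:Revenue>
  <ex:Costs contextRef="c2010b">60</ex:Costs>
</xbrl>
"""

def to_datetime(text, end_of_day):
    """Normalise an XBRL date or dateTime (time zones ignored in this sketch)."""
    text = text.strip()
    value = datetime.fromisoformat(text)
    if "T" not in text and end_of_day:
        # A date-only end date or instant means the end of that day.
        value += timedelta(days=1)
    return value

def period_key(context):
    """Build a comparable key for a context's period ('forever' not handled)."""
    period = context.find(XBRLI + "period")
    instant = period.find(XBRLI + "instant")
    if instant is not None:
        return ("instant", to_datetime(instant.text, end_of_day=True))
    start = period.find(XBRLI + "startDate")
    end = period.find(XBRLI + "endDate")
    return ("duration",
            to_datetime(start.text, end_of_day=False),
            to_datetime(end.text, end_of_day=True))

def facts_by_period(root):
    """Dereference each fact's contextRef and group values by normalised period."""
    contexts = {c.get("id"): period_key(c) for c in root.findall(XBRLI + "context")}
    grouped = {}
    for element in root:
        ref = element.get("contextRef")
        if ref is None:
            continue  # not a fact (e.g. a context)
        local_name = element.tag.rsplit("}", 1)[1]
        grouped.setdefault(contexts[ref], {})[local_name] = Decimal(element.text)
    return grouped

def check_income(root):
    """Apply Income = Revenue - Costs once per period that has all three facts."""
    needed = {"Income", "Revenue", "Costs"}
    return {
        period: values["Income"] == values["Revenue"] - values["Costs"]
        for period, values in facts_by_period(root).items()
        if needed <= values.keys()
    }

RESULTS = check_income(ET.fromstring(INSTANCE))
for period, ok in RESULTS.items():
    print(period, "passes" if ok else "fails")
```

Note that the 2010 Income fact uses a different context from the 2010 Revenue and Costs facts, so comparing contextRef values directly would fail to match them. And this handles only the period; entities, units and dimensions would each need the same treatment.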

That’s just the tip of the iceberg. What if the document contains facts for different entities? In different currencies? For different regions? At this point, you have to accept that you haven’t got the right tool for the job. You have three options:

  1. Augment XPath with some external mechanism for node selection, such as custom functions, or embed the XPath within other XML structures that influence the way the expressions are evaluated.
  2. Make unjustified assumptions about equality of contextRef attributes and create something that works with some XBRL documents but not others.
  3. Take the leap to a native XBRL expression language.

(I’d hope you’d join me in discounting the second option, although sadly I have seen software that takes this approach.)

We’ve investigated the approach of augmenting XPath, but the results are somewhat unsatisfactory, as you end up with much of the important information about your expression being captured outside of the XPath assertion itself, and you find yourself asking just what benefits you’re gaining from the use of XPath.

Building a new expression language from scratch is not something we undertook lightly, but we believe that it yields the best results, and thus Sphinx was born.

So what does our simple assertion above look like in Sphinx?

Income[] = Revenue[] - Costs[]

That’s it.

Sphinx’s default behaviour is to do the “lining up” that is obvious to a business user. This assertion will be applied for each period in which the facts occur. If there are multiple entities in the document, it will be applied for each of those. If your document contains a geographic breakdown of these facts, it will be applied within each region, as well as to the total.
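If it helps to picture the “lining up”, the following Python sketch mimics the idea on a handful of made-up facts. It is not how Sphinx is implemented, and the fact() helper, the aspect names and the figures are all assumptions for illustration: facts are grouped by every aspect they carry, and the assertion is evaluated once per group.

```python
from collections import defaultdict

def fact(concept, value, period, region=None):
    """Hypothetical fact: (concept, value, aspects). region=None means the total."""
    aspects = {"entity": "ACME", "period": period, "unit": "USD", "region": region}
    return (concept, value, aspects)

FACTS = [
    fact("Income", 40, "2011"), fact("Revenue", 100, "2011"), fact("Costs", 60, "2011"),
    # The 2010 figures are deliberately inconsistent.
    fact("Income", 30, "2010"), fact("Revenue", 90, "2010"), fact("Costs", 65, "2010"),
    fact("Income", 15, "2011", "UK"), fact("Revenue", 35, "2011", "UK"),
    fact("Costs", 20, "2011", "UK"),
]

def line_up(facts):
    """Group facts that agree on every aspect (entity, period, unit, region)."""
    groups = defaultdict(dict)
    for concept, value, aspects in facts:
        groups[frozenset(aspects.items())][concept] = value
    return groups

def check_income(facts):
    """Evaluate Income = Revenue - Costs once per group that has all three."""
    results = {}
    for key, values in line_up(facts).items():
        if {"Income", "Revenue", "Costs"} <= values.keys():
            aspects = dict(key)
            results[(aspects["period"], aspects["region"])] = (
                values["Income"] == values["Revenue"] - values["Costs"]
            )
    return results

RESULTS = check_income(FACTS)
for (period, region), ok in sorted(RESULTS.items(), key=str):
    print(period, region or "total", "passes" if ok else "fails")
```

The one assertion is evaluated three times here: for the 2011 total, the 2010 total and the 2011 UK figures, with only the 2010 total failing.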

Aside from the addition of some empty square brackets, that expression really is a pretty natural and accessible expression of the assertion we’re trying to apply. You might reasonably wonder what the square brackets are for. We describe Sphinx as a “native” XBRL expression language, because its view of the world is based on a logical, dimensional model of XBRL. The square brackets allow you to navigate that model. For example, suppose I only want Revenue in US dollars:

Revenue[unit = unit(iso4217:USD)]

Or for 2010:

Revenue[period = duration("2010-01-01", "2010-12-31")]

I also have direct access to XDT dimensions. Assuming I have a CountryAxis dimension with a UnitedKingdom member, I can get UK revenue:

Revenue[CountryAxis = UnitedKingdom]
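For readers who think more readily in general-purpose code, the square brackets behave rather like a filter over a table of facts. The Python below is only an analogy built on made-up data (the select() function and the dictionary keys are assumptions, not part of Sphinx): each constraint narrows one aspect, and any aspect that isn’t mentioned is left unconstrained.

```python
# Hypothetical facts, one dict per fact; CountryAxis=None means the default (total).
FACTS = [
    {"concept": "Revenue", "value": 100, "unit": "iso4217:USD",
     "period": ("2010-01-01", "2010-12-31"), "CountryAxis": None},
    {"concept": "Revenue", "value": 35, "unit": "iso4217:USD",
     "period": ("2010-01-01", "2010-12-31"), "CountryAxis": "UnitedKingdom"},
    {"concept": "Revenue", "value": 80, "unit": "iso4217:GBP",
     "period": ("2011-01-01", "2011-12-31"), "CountryAxis": None},
    {"concept": "Costs", "value": 60, "unit": "iso4217:USD",
     "period": ("2010-01-01", "2010-12-31"), "CountryAxis": None},
]

def select(facts, concept, **constraints):
    """Return facts for a concept that match every given aspect constraint."""
    return [
        f for f in facts
        if f["concept"] == concept
        and all(f.get(aspect) == wanted for aspect, wanted in constraints.items())
    ]

usd_revenue = select(FACTS, "Revenue", unit="iso4217:USD")
revenue_2010 = select(FACTS, "Revenue", period=("2010-01-01", "2010-12-31"))
uk_revenue = select(FACTS, "Revenue", CountryAxis="UnitedKingdom")

print([f["value"] for f in usd_revenue])
print([f["value"] for f in uk_revenue])
```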

So let’s take this one step further and look at another simple example, a roll-up by region:

Total Revenue = sum of Revenue for all Countries

In Sphinx this becomes:

Revenue[] = sum(Revenue[CountryAxis = *])

Again, all the other dimensions that aren’t mentioned explicitly, such as period, units or entity, will be lined up automatically, meaning that this expression gets applied separately for each combination of period, unit and entity.

As a point of detail, “CountryAxis = *” selects everything except the default, which is what we want, as the default will be the total. If for some reason I wanted to include the default, I could use “CountryAxis = **”. Once again, we can see that a native XBRL expression language gives us syntax that maps intuitively and concisely onto the data model that we’re working with.
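To see why excluding the default matters, here is a small Python sketch of the roll-up on made-up figures. The tuple layout, the DEFAULT marker and the check_rollup() function are assumptions for illustration only; the include_default flag plays the part of “**”.

```python
from collections import defaultdict

DEFAULT = None  # stands in for the CountryAxis default, i.e. the total

# Hypothetical Revenue facts: (value, period, country).
REVENUE = [
    (100, "2011", DEFAULT), (35, "2011", "UnitedKingdom"), (65, "2011", "France"),
    # The 2010 breakdown deliberately fails to add up to the total.
    (90, "2010", DEFAULT), (30, "2010", "UnitedKingdom"), (50, "2010", "France"),
]

def check_rollup(facts, include_default=False):
    """Compare each period's total with the sum over its country members."""
    totals = {}
    sums = defaultdict(int)
    for value, period, country in facts:
        if country is DEFAULT:
            totals[period] = value
        if country is not DEFAULT or include_default:
            sums[period] += value
    return {period: totals[period] == sums[period] for period in totals}

print(check_rollup(REVENUE))                        # members only, like "*"
print(check_rollup(REVENUE, include_default=True))  # total counted too, like "**"
```

With the default included, the total is added to its own breakdown, so even the consistent 2011 figures fail the check.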

Sphinx 1.0 was released on 22nd April 2011 after 18 months of intensive development. We built it because we had customers with real requirements that were unsolved by existing solutions, and because we believe that it’s the right answer. Simple expressions about XBRL documents should be simple to write, and complex ones should be only as difficult as they need to be.