Sphinx: making simple things simple

My last two blog posts have generated a lot of interest and discussion on the xbrl-dev mailing list. I thought it might be interesting to go back to the article that triggered the discussion of rules languages in the first place.

Charlie Hoffman posted some interesting observations on his blog about common patterns in the formulae that he was creating, noting that he could distill most of what he was doing into 10 patterns. Charlie then took the next step, which was to parameterise the patterns and write code that generates the XBRL Formulas from a simpler XML format, which he refers to as an info set.

For example, one of the patterns is a “roll forward” where a closing balance should be equal to an opening balance, plus the sum of a set of changes for the period. Charlie uses the following example with US GAAP concepts:

CashAndCashEquivalentsAtCarryingValue (end of period) = CashAndCashEquivalentsAtCarryingValue (start of period) + CashAndCashEquivalentsPeriodIncreaseDecrease (during period)

You can view the XBRL Formula for this example. As you can see, it’s quite involved. Charlie’s solution is to create a much simpler XML format that contains only the parameters for this constraint:

<BusinessRule number='11'>
  <Network href='abc-20101231.xsd#StatementOfCashFlows'>http://www.abc.com/role/StatementOfCashFlows</Network>
    <ChangeConcept operator='+'>us-gaap:CashAndCashEquivalentsPeriodIncreaseDecrease</ChangeConcept>

Charlie has code that takes the above XML and converts it to the necessary XBRL Formula, allowing users to work with a much simpler format. To quote Charlie:

Go look at the complexity of the XBRL Formula file. Then go look at the complexity of the business rules info set file. Calculate in your head how much effort it would take to teach someone to create that XBRL Formulas file. Then, think about how long it might take to explain how to create that business rules info set file.

I agree with this, and Charlie’s approach goes a long way towards making these problems easier to solve. It does, however, suffer from the same problem as all other code generation approaches: you can’t round-trip the resulting XBRL Formulas back to the simpler format. If you want to do something that isn’t covered by one of the patterns, you’re left editing XBRL Formula directly, and you have to make sure that your edits don’t get overwritten when you regenerate from the input file.

The underlying problem is that XBRL Formula doesn’t make the simple things simple enough for a business user to work with. It won’t surprise you to learn that I think that Sphinx can do a better job here.

For a start, I think the Sphinx implementation of this problem is much more accessible:

raise StatementOfCashFlowsRollForwardCheck
  d = foreach set(values us-gaap:CashAndCashEquivalentsPeriodIncreaseDecrease[]::period) 
  bop = $d::start-date
  eop = $d::end-date 
  us-gaap:CashAndCashEquivalentsAtCarryingValue[period = $eop] != 
  us-gaap:CashAndCashEquivalentsAtCarryingValue[period = $bop] + 
  us-gaap:CashAndCashEquivalentsPeriodIncreaseDecrease[period = $d]

See my previous post for an explanation of how this works. I think this sample is pretty readable as it is, but Charlie is quite right in observing that there’s a pattern here. If I were writing rules for a taxonomy like this, I’d be doing a lot of copying and pasting of code. Fortunately, Sphinx has the ability to define custom functions, so I can create a function for this pattern. Here’s what it would look like:

function roll-forward(balance, change) 
  d = foreach set(values [primary = $change]::period) 
  bop = $d::start-date
  eop = $d::end-date 
  [primary = $balance; period = $eop] != [primary = $balance; period = $bop] + [primary = $change; period = $d]

Having done this, writing the rule itself is reduced to a single function call, providing the two parameters to the pattern – the balance concept, and the change concept:

raise StatementOfCashFlowsRollForwardCheck
  roll-forward(us-gaap:CashAndCashEquivalentsAtCarryingValue, us-gaap:CashAndCashEquivalentsPeriodIncreaseDecrease)

I can create functions for other patterns in Charlie’s article, allowing authors to write rules by providing little more than the names of the concepts involved. If you find the definition of the function intimidating, that’s fine. It only has to be written once, and can then be squirreled away in a separate file in your Sphinx rulebase, so that your business users need only be concerned with the rules themselves. (And just in case you were worried, it’s trivial to extend the function to cope with multiple “change” concepts, rather than just one.)
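To make the roll-forward pattern with several “change” concepts concrete without guessing at Sphinx syntax, here is a minimal Python sketch of the same logic. The Fact tuple, the roll_forward_failures function and the short concept names are all hypothetical, and this toy model keys the opening balance by the duration’s start date, glossing over XBRL’s end-of-day date conventions.

```python
from collections import namedtuple
from datetime import date

# Hypothetical, simplified fact model (an assumption, not an XBRL API).
# An instant period is a single date; a duration is a (start, end) tuple.
Fact = namedtuple("Fact", ["concept", "period", "value"])

def roll_forward_failures(facts, balance, changes):
    """Return the durations for which closing != opening + sum(changes)."""
    lookup = {(f.concept, f.period): f.value for f in facts}
    durations = {f.period for f in facts
                 if f.concept in changes and isinstance(f.period, tuple)}
    failures = []
    for start, end in sorted(durations):
        opening = lookup.get((balance, start))
        closing = lookup.get((balance, end))
        if opening is None or closing is None:
            continue  # nothing to check without both balances
        # Change concepts that were not reported for this period count as zero.
        total = sum(lookup.get((c, (start, end)), 0) for c in changes)
        if closing != opening + total:
            failures.append((start, end))
    return failures

year = (date(2010, 1, 1), date(2010, 12, 31))
facts = [
    Fact("Cash", date(2010, 1, 1), 100),
    Fact("Cash", date(2010, 12, 31), 130),
    Fact("CashIncrease", year, 25),
    Fact("OtherIncrease", year, 5),
]
```

With both change concepts supplied the check passes for 2010; leaving one out makes the same period fail, which is the behaviour you would want from a rule parameterised only by concept names.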

Aside from the rules being syntactically more concise than even the simplified XML, this has a few advantages over the code generation approach:

  1. The rules I write are still Sphinx expressions, meaning that I don’t need to create an additional piece of software to edit them with. I can still benefit from SpiderMonkey’s rule editing environment which provides auto-completion of QNames, and on-the-fly syntax validation.
  2. If I need to write something that’s not covered by a pattern, I don’t have to switch to some other environment, or some lower level language.

In other words, Sphinx allows us to model these patterns in a similar way to the proposed infoset, making simple things even simpler while still keeping the more difficult problems solvable.

Sphinx: some interesting examples

In response to my previous post, Maciej Pichocki from the IFRS Foundation posted a question:

I would be really curious to see “trivial” EPS check in Sphinx syntax (just to get started with more realistic examples of business rules application).

EarningsPerShare(reported in currency per per share) equals ProfitLoss (duration in currency) div (NoOfShares (instant beginning of period reported in shares) + NoOfShares (instant end of period reported in shares) ) div 2

It goes without saying that in a single instance I got various periods and various units.

Maciej picks an interesting example.  To be fair, a “trivial” EPS check would look more like this:

us-gaap:EarningsPerShareBasic[] == 
  us-gaap:NetIncomeLoss[] / us-gaap:WeightedAverageNumberOfSharesOutstandingBasic[]

This is simpler (and, if you’ve got a “weighted average number of shares outstanding” concept, better) as it involves three concepts in the same period, so the normal lining up of periods does what we want.

The interesting bit is the units, as we’ve got three different units:

  • Currency per xbrli:shares
  • Currency
  • xbrli:shares

Sphinx’s “lining up” behaviour automatically takes into account the division operator and applies that same division to the units, so the above example really does just work: EPS in Euros/share will be calculated from Profit in Euros, and EPS in Dollars/share will be calculated from Profit in Dollars.  This is a nice benefit of the way that units are defined in XBRL, as it allows a processor to understand the relationship between the different units.
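As a rough illustration of why units defined this way can be combined mechanically, here is a small Python sketch of dividing one unit by another. It says nothing about how Sphinx is implemented; the make_unit and divide_units helpers and the tuple representation are assumptions made purely for the example.

```python
from collections import Counter

def make_unit(numerators, denominators=()):
    """Build a unit as (sorted numerator measures, sorted denominator measures)."""
    return (tuple(sorted(numerators)), tuple(sorted(denominators)))

def divide_units(a, b):
    """Return the unit of a / b, cancelling measures common to both sides."""
    num = Counter(a[0]) + Counter(b[1])
    den = Counter(a[1]) + Counter(b[0])
    common = num & den  # measures appearing above and below the line
    return make_unit((num - common).elements(), (den - common).elements())

EUR = make_unit(["iso4217:EUR"])
USD = make_unit(["iso4217:USD"])
SHARES = make_unit(["xbrli:shares"])

# Profit in Euros divided by a number of shares gives Euros per share.
EUR_PER_SHARE = divide_units(EUR, SHARES)
```

Because the result for Euros differs from the result for Dollars, an EPS fact in Euros per share can only ever be paired with a Profit fact in Euros, which is the matching described above.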

Anyway, that’s not what was asked for.  Maciej’s example uses the average of the opening and closing balances of the shares. Here it is in Sphinx:

  d = foreach set(values ProfitLoss[]::period)
  bop = $d::start-date  
  eop = $d::end-date
  EarningsPerShare[period = $d] := 
    ProfitLoss[period = $d] /
    ((NumberOfShares[period = $bop] + NumberOfShares[period = $eop]) / 2)

As you can see, the final expression maps very closely to how Maciej wrote the rule in English.

The first bit simply gets me a set of the periods for which “ProfitLoss” has been reported, and then assigns the dates at the beginning and end of that period to the variables $bop and $eop respectively.  Those variables aren’t strictly necessary; I just used them for clarity.

Maciej requested a rule that runs on an instance with multiple periods and multiple currencies:

The above screenshot shows the result of running the rule on some sample data in our Magnify review tool.  As you can see, it includes two different currencies and two different reporting periods.  The red crosses show where the rule has flagged a failure because the reported value does not match the calculated value.

Just to take this one step further, suppose I wanted to use WeightedAverageNumberOfShares if it’s reported, but fall back on the calculated unweighted average if not.  I can introduce a function to give me the “best” option for the average number of shares:

function ANS(d)
  if(exists(WeightedAverageNumberOfShares[period=$d])) then
    WeightedAverageNumberOfShares[period=$d]
  else
    ((NumberOfShares[period = $d::start-date] + NumberOfShares[period = $d::end-date]) / 2)

and then use that in my expression:

  d = foreach set(values ProfitLoss[]::period) 
  EarningsPerShare[period = $d] := ProfitLoss[period = $d] / ANS($d)
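For readers who want to trace the fallback logic step by step, the following Python sketch mirrors it with plain dictionaries. The function names, the dictionary layout and the rounding tolerance are all assumptions for illustration; none of them come from Sphinx.

```python
def average_shares(weighted, shares, period):
    """Prefer the reported weighted average; otherwise average the two balances."""
    if period in weighted:
        return weighted[period]
    start, end = period
    return (shares[start] + shares[end]) / 2

def eps_matches(eps, profit, weighted, shares, period, tolerance=0.005):
    """Compare reported EPS with profit / average shares (tolerance is assumed)."""
    expected = profit[period] / average_shares(weighted, shares, period)
    return abs(eps[period] - expected) <= tolerance

period = ("2010-01-01", "2010-12-31")
shares = {"2010-01-01": 1000, "2010-12-31": 3000}
profit = {period: 5000}
eps = {period: 2.5}
```

With no weighted average reported, the unweighted average of 2000 shares is used and the reported EPS of 2.5 matches. Supplying a weighted average of 2500 changes the expected EPS to 2.0, so the same reported value would be flagged.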

Sphinx: An XBRL Expression Language

XBRL provides a very effective way of capturing certain types of data. XBRL’s built-in dimensions, such as period and units, augmented by taxonomy-defined XBRL Dimensions (XDT), provide structured data that is ripe for automated processing and analysis.

The obvious next question is, “what technology should be used to do this processing?” and the obvious answer is “XBRL is XML so let’s use XPath.” Unfortunately, life’s not that simple. XPath is great at navigating a “traditional” XML hierarchy, but one of the features of XBRL is that it doesn’t use XML in the “traditional” way. The data in an XBRL instance document is represented by a very flat XML structure, and the relationships between facts are captured using XLink hierarchies (often spread across multiple files), rather than element nesting. Both of these make processing XBRL with XPath a real challenge. One approach to the problem is to convert XBRL into a structure of XML that is more suited to processing with XPath. Mark Goodhand has discussed this approach before and it works very well in some environments, but for others we want to work with the XBRL directly.

To understand the challenges in doing this, let’s consider some really simple examples. Take the following assertion:

Income = Revenue – Costs

Let’s assume that the three things in this assertion map directly to three XBRL concepts. It’s very tempting to try and write an XPath expression for this assertion. Something like:

//Income = //Revenue - //Costs

Unfortunately, life’s not that simple. XBRL instance documents often report the same concept more than once. For example, all three concepts might be reported for 2011 and again for 2010. To a business user, the sensible thing to do is obvious: apply the assertion to the three 2011 facts, and apply it separately to the three facts for 2010. In XPath, it’s rather less obvious. We need to rewrite the expression so that we only use Revenue and Costs facts with the same period as the Income fact we’re testing, and doing that in XPath turns out to be really complicated. In XBRL, the dates form part of the context, so you need to dereference the contextRef attribute, and then do a date comparison on the start and end dates. To do it properly, you need to deal with different representations of the same dates (e.g. 2010-01-01T00:00 vs 2010-01-01, or the different interpretations applied by XBRL to start dates and instant dates).
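To give a feel for the date handling involved, here is a minimal Python sketch of normalising XBRL period boundaries before comparing them. It relies on the XBRL 2.1 convention that a date with no time part means the start of that day when used as a start date, but the end of that day when used as an end date or instant. The parse_boundary function is hypothetical, and the sketch assumes values carry no time zone.

```python
from datetime import date, datetime, time, timedelta

def parse_boundary(text, is_end):
    """Convert an XBRL date or dateTime string into a comparable datetime.

    is_end should be True for end dates and instants, False for start dates.
    Assumes no time zone information is present.
    """
    if "T" in text:
        return datetime.fromisoformat(text)  # an explicit time is taken as-is
    moment = datetime.combine(date.fromisoformat(text), time.min)
    # A bare end date or instant means the end of that day, which is
    # midnight at the start of the following day.
    return moment + timedelta(days=1) if is_end else moment

same_start = parse_boundary("2010-01-01T00:00", False) == parse_boundary("2010-01-01", False)
year_end = parse_boundary("2010-12-31", True)
```

Under these rules an instant of 2009-12-31 and a start date of 2010-01-01 resolve to the same moment, which is exactly the kind of equivalence that a naive string comparison of context dates would miss.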

That’s just the tip of the iceberg. What if the document contains facts for different entities? In different currencies? For different regions? At this point, you have to accept that you haven’t got the right tool for the job. You have three options:

  1. Augment XPath with some external mechanism for node selection, such as custom functions, or embed the XPath within other XML structures that influence the way the expressions are evaluated.
  2. Make unjustified assumptions about equality of contextRef attributes and create something that works with some XBRL documents but not others.
  3. Take the leap to a native XBRL expression language.

(I’d hope you’d join me in discounting the second option, although sadly I have seen software that takes this approach).

We’ve investigated the approach of augmenting XPath, but the results are somewhat unsatisfactory, as you end up with much of the important information about your expression being captured outside of the XPath assertion itself, and you find yourself asking just what benefits you’re gaining from the use of XPath.

Building a new expression language from scratch is not something we undertook lightly, but we believe that it yields the best results, and thus Sphinx was born.

So what does our simple assertion above look like in Sphinx?

Income[] = Revenue[] - Costs[]

That’s it.

Sphinx’s default behaviour is to do the “lining up” that is obvious to a business user. This assertion will be applied for each period in which the facts occur. If there are multiple entities in the document, it will be applied for each of those. If your document contains a geographic breakdown of these facts, it will be applied within each region, as well as to the total.
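The “lining up” idea can be pictured as grouping facts by every dimension the expression does not mention, then evaluating the assertion once per group. The Python sketch below illustrates that idea only; it is not a description of Sphinx’s internals, and the Fact tuple, its fields and the check_income function are hypothetical.

```python
from collections import defaultdict, namedtuple

# Hypothetical fact model: every field other than concept and value is a dimension.
Fact = namedtuple("Fact", ["concept", "period", "entity", "region", "value"])

CONCEPTS = ("Income", "Revenue", "Costs")

def check_income(facts):
    """Evaluate Income == Revenue - Costs once per (period, entity, region)."""
    groups = defaultdict(dict)
    for f in facts:
        groups[(f.period, f.entity, f.region)][f.concept] = f.value
    results = {}
    for key, values in groups.items():
        if all(c in values for c in CONCEPTS):  # skip incomplete groups
            results[key] = values["Income"] == values["Revenue"] - values["Costs"]
    return results

facts = [
    Fact("Income", 2010, "ABC", None, 40),
    Fact("Revenue", 2010, "ABC", None, 100),
    Fact("Costs", 2010, "ABC", None, 60),
    Fact("Income", 2011, "ABC", None, 50),
    Fact("Revenue", 2011, "ABC", None, 120),
    Fact("Costs", 2011, "ABC", None, 80),
]
results = check_income(facts)
```

Here the 2010 facts satisfy the assertion and the 2011 facts do not, and neither year’s values leak into the other’s calculation.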

Aside from the addition of some empty square brackets, that expression really is a pretty natural and accessible expression of the assertion we’re trying to apply. You might reasonably wonder what the square brackets are for. We describe Sphinx as a “native” XBRL expression language, because its view of the world is based on a logical, dimensional model of XBRL. The square brackets allow you to navigate that model. For example, suppose I only want Revenue in US dollars:

Revenue[unit = unit(iso4217:USD)]

Or for 2010:

Revenue[period = duration("2010-01-01", "2010-12-31")]

I also have direct access to XDT dimensions. Assuming I have a CountryAxis dimension with a UnitedKingdom member, I can get UK revenue:

Revenue[CountryAxis = UnitedKingdom]

So let’s take this one step further and look at another simple example, a roll-up by region:

Total Revenue = sum of Revenue for all Countries

In Sphinx this becomes:

Revenue[] = sum(Revenue[CountryAxis = *])

Again, all the other dimensions that aren’t mentioned explicitly, such as period, units or entity, will be lined up automatically, meaning that this expression gets applied separately for each combination of period, unit and entity.

As a point of detail, “CountryAxis = *” selects everything except the default, which is what we want, as the default will be the total. If for some reason I wanted to include the default, I could use “CountryAxis = **”. Once again, we can see that a native XBRL expression language gives us syntax that maps intuitively and concisely onto the data model that we’re working with.
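The distinction between the default (the total) and the explicit members can be sketched in a few lines of Python. The dictionary layout, the use of None for the dimension default and the check_rollup function are assumptions for illustration, not part of Sphinx or XDT.

```python
from collections import defaultdict

def check_rollup(revenue):
    """revenue maps (period, country) to a value; country None is the default.

    Returns, per period, whether the total equals the sum of the
    non-default members.
    """
    totals = {}
    member_sums = defaultdict(int)
    for (period, country), value in revenue.items():
        if country is None:
            totals[period] = value          # the default member holds the total
        else:
            member_sums[period] += value    # every explicit member is summed
    return {p: totals[p] == member_sums[p] for p in totals if p in member_sums}

revenue = {
    (2010, None): 300, (2010, "UnitedKingdom"): 100, (2010, "France"): 200,
    (2011, None): 500, (2011, "UnitedKingdom"): 100, (2011, "France"): 300,
}
rollup = check_rollup(revenue)
```

In this sample the 2010 breakdown adds up to its total and the 2011 breakdown does not. Including the default in the sum would double-count the total, which is why excluding it is the sensible behaviour for a roll-up.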

Sphinx 1.0 was released on 22nd April 2011 after 18 months of intensive development. We built it because we had customers with real requirements that were unsolved by existing solutions, and because we believe that it’s the right answer. Simple expressions about XBRL documents should be simple to write, and complex ones should be only as difficult as they need to be.