Taxonomy Packages

Many of the standards that we deal with in the XBRL world are fearsomely complicated, take years to develop, and enable new and exciting ways of working.

This post is about a proposed standard that is very simple, took only a few hours to develop, and is just intended to make working with XBRL that little bit easier.

Taxonomies are a key part of XBRL. They typically consist of many files, hosted on a website somewhere, which are then referenced by the instance documents or extension taxonomies that use them. This creates two practical problems for people working with taxonomies.

Problem 1: Finding the Entry Points

Over time taxonomies have become increasingly complicated, and modular taxonomies consisting of tens, if not hundreds, of files have now become the norm. In such a modular taxonomy, only a handful of those files are typically considered to be “entry points”, that is, files from which you would start the DTS discovery process.

For example, the full 2009 UK GAAP, IFRS, Banking and Charities taxonomies ZIP file consists of 603 files, but contains just four primary entry points. These are described in Word documents included in the ZIP file, which means in order to start working with the taxonomy I need to:

  • Unpack the ZIP file
  • Open and read the Word documents to decide which file inside the ZIP I actually need
  • Open my XBRL software, browse to where I unpacked the ZIP file, and then browse to the right file inside the ZIP.

Wouldn’t life be just that little bit easier if I could just point my XBRL software at the ZIP, be presented with a list of the four entry points (with sensible, human-readable descriptions), and then just open the one I wanted?

Problem 2: Offline working

XBRL taxonomies are typically published on publicly available web servers, and then referenced by instance documents using an absolute URL. An XBRL processor consuming such a document will then follow the URL and download the files that make up the taxonomy as required. This creates two potential issues. Firstly, it means that you need an internet connection in order to process the document. Secondly, taxonomies are big (the UK taxonomies are made up of over 50MB of XML files) so you need a fast internet connection.

In order to support offline work, and to improve performance, you really want to be working with offline copies of taxonomies, rather than constantly downloading them from the web. Most XBRL software already provides some mechanism for working with offline copies of taxonomies.

At its simplest, software can just cache copies of taxonomies as it uses them, although that means that you’ve got to use a taxonomy once before it becomes available for offline use, and the cache may be subject to an expiry policy to limit its size. In many cases it’s desirable to control explicitly which taxonomies are going to be stored locally, but this is often cumbersome to configure, as you need to provide not only a copy of the taxonomy but also a “remapping” or “redirection” that specifies which public locations should be remapped to your local copy.

Wouldn’t life be just that little bit easier if I could just give my XBRL software a ZIP of the taxonomy, and it would configure itself for offline use, so that instance documents referencing that taxonomy would then “just work”?

The solution: Taxonomy Packages

Taxonomy Packages are a simple solution to the above problems, requiring only a minimal change to the way that taxonomies are currently distributed. Most taxonomies are already made available as ZIP files containing all the files that make up the taxonomy. A Taxonomy Package is simply a ZIP file with an extra XML file dropped into it.

The XML file, called .taxonomyPackage.xml, provides a list of the entry points within the taxonomy, along with names and descriptions. The .taxonomyPackage.xml file also contains generic name, description and version meta-data about the taxonomy as a whole, enabling taxonomy distributions to be self-documenting. All names and descriptions have support for multi-language alternatives.
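To make that concrete, here is a sketch of how a tool might read the entry points out of such a package. The descriptor’s element names (`taxonomyPackage`, `entryPoint`, `name`) and the file paths are illustrative guesses for this post, not the actual schema:

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Build a tiny in-memory package. The element and attribute names here
# are illustrative only -- the real .taxonomyPackage.xml schema may differ.
descriptor = """\
<taxonomyPackage>
  <entryPoint href="uk-gaap/2009-09-01/uk-gaap-full-2009-09-01.xsd">
    <name xml:lang="en">UK GAAP (full)</name>
  </entryPoint>
  <entryPoint href="uk-ifrs/2009-09-01/uk-ifrs-full-2009-09-01.xsd">
    <name xml:lang="en">UK IFRS (full)</name>
  </entryPoint>
</taxonomyPackage>
"""

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr(".taxonomyPackage.xml", descriptor)

# An XBRL tool would open the ZIP and present the entry points by name,
# rather than making the user hunt through hundreds of files.
with zipfile.ZipFile(buf) as z:
    root = ET.fromstring(z.read(".taxonomyPackage.xml"))
    entry_points = {
        ep.findtext("name"): ep.get("href") for ep in root.iter("entryPoint")
    }

print(entry_points)
```

The point is just how little machinery is needed: the descriptor rides along inside the ZIP, so no separate documentation has to be consulted.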

The other component of the .taxonomyPackage.xml is a set of remappings that allow the contents of the ZIP file to be treated as if they were hosted at an internet location. At its simplest, a remapping maps a URL prefix (the taxonomy’s public web location) to a directory within the ZIP file. This tells a processor that every time it encounters a URL starting with that prefix, it should try to resolve it to an equivalently named file within the taxonomy package ZIP.
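A minimal sketch of what such a remapping amounts to, with made-up URLs and directory names:

```python
import posixpath

def remap(url, prefix, zip_dir):
    """Resolve a public taxonomy URL to a path inside a taxonomy package ZIP.

    `prefix` is the public base URL named by the package's remapping entry,
    and `zip_dir` is the directory inside the ZIP that it maps to.
    Returns None when the URL falls outside the remapped prefix.
    """
    if not url.startswith(prefix):
        return None
    relative = url[len(prefix):].lstrip("/")
    return posixpath.join(zip_dir, relative)

# Hypothetical example: a schema published on the web resolves to a
# file inside the package, so no network access is needed.
print(remap("http://www.example.com/taxonomy/2009/uk-gaap.xsd",
            "http://www.example.com/taxonomy/",
            "uk-taxonomies-2009"))
```

A processor with a few of these prefix rules configured can satisfy every DTS reference from the local ZIP, which is all that offline working requires.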

The format of the .taxonomyPackage.xml file has been kept as simple as possible. We’ve published some samples and, of course, a schema for the file format.

We will be publishing a simple spec on the details of how these files are to be processed just as soon as we’ve had a chance to write them down.

Working with Taxonomy Packages

As you will see from the comment at the top of the schema, we’re making this available under a Creative Commons licence that allows free use of the format (including for commercial purposes). Our hope is that the XBRL community will agree that this is a simple solution to a simple problem, and if we adopt a common solution then XBRL will become that little bit easier to work with, and just a little bit less intimidating for end users.

We’re actively introducing support for taxonomy packages into our products. The recent Magnify and SpiderMonkey 1.27 releases have support for opening packages, and SpiderMonkey 1.27 also has support for creating them.

If you would like more information on taxonomy packages, or would like to see a taxonomy package sample for your taxonomy, please drop me an email.

Sphinx: making simple things simple

My last two blog posts have generated a lot of interest and discussion on the xbrl-dev mailing list. I thought it might be interesting to go back to the article that triggered the discussion of rules languages in the first place.

Charlie Hoffman posted some interesting observations on his blog about common patterns in the formulae that he was creating, noting that he could distill most of what he was doing into 10 patterns. Charlie then took the next step which was to parameterise the patterns, and create code to generate the XBRL Formulas from a simpler XML format, which he refers to as an info set.

For example, one of the patterns is a “roll forward” where a closing balance should be equal to an opening balance, plus the sum of a set of changes for the period. Charlie uses the following example with US GAAP concepts:

CashAndCashEquivalentsAtCarryingValue (end of period) = CashAndCashEquivalentsAtCarryingValue (start of period) + CashAndCashEquivalentsPeriodIncreaseDecrease (during period)

You can view the XBRL Formula for this example. As you can see, it’s quite involved. Charlie’s solution is to create a much simpler XML format that contains only the parameters for this constraint:

<BusinessRule number='11'>
  <Network href='abc-20101231.xsd#StatementOfCashFlows'></Network>
  <ChangeConcept operator='+'>us-gaap:CashAndCashEquivalentsPeriodIncreaseDecrease</ChangeConcept>
</BusinessRule>

Charlie has code that takes the above XML and converts it to the necessary XBRL Formula, allowing users to work with a much simpler format. To quote Charlie:

Go look at the complexity of the XBRL Formula file. Then go look at the complexity of the business rules info set file. Calculate in your head how much effort it would take to teach someone to create that XBRL Formulas file. Then, think about how long it might take to explain how to create that business rules info set file.

I certainly agree with this, and Charlie’s approach goes a long way to making these problems easier to solve, although it does suffer from the same problem as all other code generation approaches: you can’t round-trip the resulting XBRL Formulas back to the simpler format. If you want to do something that isn’t covered by one of the patterns, then you’re left editing XBRL Formula, and you have to make sure that your edits don’t get overwritten if you regenerate from the input file.

The underlying problem is that XBRL Formula doesn’t make the simple things simple enough for a business user to work with. It won’t surprise you to learn that I think that Sphinx can do a better job here.

Firstly, I think the Sphinx implementation of this problem is much more accessible in the first place:

raise StatementOfCashFlowsRollForwardCheck
  d = foreach set(values us-gaap:CashAndCashEquivalentsPeriodIncreaseDecrease[]::period) 
  bop = $d::start-date
  eop = $d::end-date 
  us-gaap:CashAndCashEquivalentsAtCarryingValue[period = $eop] != 
  us-gaap:CashAndCashEquivalentsAtCarryingValue[period = $bop] + 
  us-gaap:CashAndCashEquivalentsPeriodIncreaseDecrease[period = $d]

See my previous post for an explanation of how this works. I think this sample is pretty readable as it is, but Charlie is quite right in observing that there’s a pattern here. If I were writing rules for a taxonomy like this, I’d be doing a lot of copying and pasting of code. Fortunately, Sphinx has the ability to define custom functions, so I can create a function for this pattern. Here’s what it would look like:

function roll-forward(balance, change) 
  d = foreach set(values [primary = $change]::period) 
  bop = $d::start-date
  eop = $d::end-date 
  [primary = $balance; period = $eop] != [primary = $balance; period = $bop] + [primary = $change; period = $d]

Having done this, writing the rule itself is reduced to a single function call, providing the two parameters to the pattern – the balance concept, and the change concept:

raise StatementOfCashFlowsRollForwardCheck
  roll-forward(us-gaap:CashAndCashEquivalentsAtCarryingValue, us-gaap:CashAndCashEquivalentsPeriodIncreaseDecrease)
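For readers without Sphinx to hand, the logic behind the roll-forward pattern can be sketched in ordinary Python over a toy fact table. The tuple-based fact model and the concept names here are illustrative only, not Sphinx’s internal representation:

```python
from datetime import date, timedelta

# Toy fact table: (concept, period, value). Durations are (start, end)
# ISO-date pairs; instants are single ISO dates. All values hypothetical.
facts = [
    ("Cash", "2010-12-31", 100.0),
    ("Cash", "2011-12-31", 130.0),
    ("CashIncreaseDecrease", ("2011-01-01", "2011-12-31"), 30.0),
]

def day_before(iso_date):
    # XBRL treats a duration starting on 2011-01-01 as beginning at the
    # same moment as the instant 2010-12-31, so the opening balance
    # carries the date of the day before the duration's start date.
    y, m, d = map(int, iso_date.split("-"))
    return (date(y, m, d) - timedelta(days=1)).isoformat()

def value(facts, concept, period):
    return next(v for c, p, v in facts if c == concept and p == period)

def roll_forward(facts, balance, change):
    """Return the duration periods where closing != opening + change."""
    failures = []
    for concept, period, v in facts:
        if concept != change:
            continue
        start, end = period
        opening = value(facts, balance, day_before(start))
        closing = value(facts, balance, end)
        if abs(closing - (opening + v)) > 1e-6:
            failures.append(period)
    return failures

print(roll_forward(facts, "Cash", "CashIncreaseDecrease"))  # -> []
```

As in the Sphinx version, the pattern is parameterised by just two things: the balance concept and the change concept.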

I can create functions for other patterns in Charlie’s article, allowing authors to write rules by providing little more than the names of the concepts involved. If you find the definition of the function intimidating, that’s fine. It only has to be written once, and can then be squirreled away in a separate file in your Sphinx rulebase, so that your business users need only be concerned with the rules themselves. (And just in case you were worried, it’s trivial to extend the function to cope with multiple “change” concepts, rather than just one)

Aside from the rules being syntactically more concise than even the simplified XML, this has a few advantages over the code generation approach:

  1. The rules I write are still Sphinx expressions, meaning that I don’t need to create an additional piece of software to edit them with. I can still benefit from SpiderMonkey’s rule editing environment which provides auto-completion of QNames, and on-the-fly syntax validation.
  2. If I need to write something that’s not covered by a pattern, I don’t have to switch to some other environment, or some lower level language.

In other words, Sphinx allows us to model these patterns in a similar way to the proposed infoset, making simple things even simpler, while still keeping the more difficult problems solvable.

Sphinx: some interesting examples

In response to my previous post, Maciej Pichocki from the IFRS Foundation posted a question:

I would be really curious to see “trivial” EPS check in Sphinx syntax (just to get started with more realistic examples of business rules application).

EarningsPerShare (reported in currency per share) equals ProfitLoss (duration in currency) div (NoOfShares (instant beginning of period reported in shares) + NoOfShares (instant end of period reported in shares)) div 2

It goes without saying that in a single instance I got various periods and various units.

Maciej picks an interesting example.  To be fair, a “trivial” EPS check would look more like this:

us-gaap:EarningsPerShareBasic[] == 
  us-gaap:NetIncomeLoss[] / us-gaap:WeightedAverageNumberOfSharesOutstandingBasic[]

This is simpler (and, if you’ve got a “weighted number of shares outstanding” concept, better) as it involves three concepts in the same period, so the normal lining up of periods does what we want.

The interesting bit is the units, as we’ve got three different units:

  • Currency per xbrli:shares
  • Currency
  • xbrli:shares

Sphinx’s “lining up” behaviour automatically takes into account the division operator and applies that same division to the units, so the above example really does just work: EPS in Euros/share will be calculated from Profit in Euros, and EPS in Dollars/share will be calculated from Profit in Dollars.  This is a nice benefit of the way that units are defined in XBRL, as it allows a processor to understand the relationship between the different units.
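The unit arithmetic can be illustrated with a toy sketch. The pair-of-measure-tuples representation below is my own simplification for this post, not how Sphinx or XBRL processors actually model units:

```python
def quotient_unit(numerator, denominator):
    """Derive the unit of a quotient.

    A unit is modelled as a pair of measure tuples: (numerator measures,
    denominator measures). Dividing by a unit moves its numerator measures
    into the result's denominator and vice versa; cancelling of shared
    measures is omitted for brevity.
    """
    n_num, n_den = numerator
    d_num, d_den = denominator
    return (n_num + d_den, n_den + d_num)

euros = (("iso4217:EUR",), ())
shares = (("xbrli:shares",), ())

# Profit (EUR) divided by a share count (shares) yields EUR per share,
# which is exactly the unit an EPS fact would carry.
print(quotient_unit(euros, shares))
```

Because an XBRL unit declares its measures explicitly, a processor can derive the quotient unit like this and match it against the unit of the reported EPS fact, which is what makes the automatic lining up possible.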

Anyway, that’s not what was asked for.  Maciej’s example uses the average of the opening and closing balances of the shares. Here it is in Sphinx:

  d = foreach set(values ProfitLoss[]::period)
  bop = $d::start-date  
  eop = $d::end-date
  EarningsPerShare[period = $d] := 
    ProfitLoss[period = $d] /
    ((NumberOfShares[period = $bop] + NumberOfShares[period = $eop]) / 2)

As you can see, the final expression maps very closely to how Maciej wrote the rule in English.

The first bit simply gets me a set of the periods for which “ProfitLoss” has been reported, and then assigns the dates at the beginning and end of that period to the variables $bop and $eop respectively.  Those variables aren’t necessary; I just used them for clarity.

Maciej requested a rule that runs on an instance with multiple periods and multiple currencies. Running the rule over some sample data in our Magnify review tool produces results for two different currencies and two different reporting periods, with red crosses flagging failures where the reported value does not match the calculated value.

Just to take this one step further, suppose I wanted to use WeightedAverageNumberOfShares if it’s reported, but fall back on the calculated unweighted average if not.  I can introduce a function to give me the “best” option for the average number of shares:

function ANS(d) 
  if (exists(WeightedAverageNumberOfShares[period = $d])) then
    WeightedAverageNumberOfShares[period = $d]
  else
    ((NumberOfShares[period = $d::start-date] + NumberOfShares[period = $d::end-date]) / 2)

and then use that in my expression:

  d = foreach set(values ProfitLoss[]::period) 
  EarningsPerShare[period = $d] := ProfitLoss[period = $d] / ANS($d)
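The fallback logic itself is straightforward; here it is sketched in Python over a toy fact table (the dictionary fact model and concept names are illustrative only, not Sphinx internals):

```python
from datetime import date, timedelta

def day_before(iso_date):
    # An XBRL duration starting on 2011-01-01 opens at the same moment
    # as the instant 2010-12-31, hence the day-before adjustment.
    y, m, d = map(int, iso_date.split("-"))
    return (date(y, m, d) - timedelta(days=1)).isoformat()

def average_shares(facts, period):
    """Prefer a reported weighted average; otherwise fall back on the
    simple mean of the opening and closing share counts."""
    weighted = facts.get(("WeightedAverageNumberOfShares", period))
    if weighted is not None:
        return weighted
    start, end = period
    return (facts[("NumberOfShares", day_before(start))]
            + facts[("NumberOfShares", end)]) / 2

facts = {
    ("NumberOfShares", "2010-12-31"): 1000.0,
    ("NumberOfShares", "2011-12-31"): 1200.0,
}
period = ("2011-01-01", "2011-12-31")
print(average_shares(facts, period))            # unweighted mean: 1100.0

facts[("WeightedAverageNumberOfShares", period)] = 1150.0
print(average_shares(facts, period))            # reported value wins: 1150.0
```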

Sphinx: An XBRL Expression Language

XBRL provides a very effective way of capturing certain types of data. XBRL’s built-in dimensions, such as period and units, augmented by taxonomy-defined XBRL Dimensions (XDT), provide structured data that is ripe for automated processing and analysis.

The obvious next question is, “what technology should be used to do this processing?” and the obvious answer is “XBRL is XML so let’s use XPath.” Unfortunately, life’s not that simple. XPath is great at navigating a “traditional” XML hierarchy, but one of the features of XBRL is that it doesn’t use XML in the “traditional” way. The data in an XBRL instance document is represented by a very flat XML structure, and the relationships between facts are captured using XLink hierarchies (often spread across multiple files), rather than element nesting. Both of these make processing XBRL with XPath a real challenge. One approach to the problem is to convert XBRL into a structure of XML that is more suited to processing with XPath. Mark Goodhand has discussed this approach before and it works very well in some environments, but for others we want to work with the XBRL directly.

To understand the challenges in doing this, let’s consider some really simple examples. Take the following assertion:

Income = Revenue - Costs

Let’s assume that the three things in this assertion map directly to three XBRL concepts. It’s very tempting to try and write an XPath expression for this assertion. Something like:

//Income = //Revenue - //Costs

Unfortunately, life’s not that simple. XBRL instance documents often report the same concept more than once. For example, all three concepts might be reported for 2011 and again for 2010. To a business user, the sensible thing to do is obvious: apply the assertion to the three 2011 facts, and apply it separately to the three facts for 2010. Unfortunately, in XPath, it’s rather less obvious. We need to rewrite the expression so that we only use Revenue and Costs facts with the same period as the Income fact we’re testing, and doing that in XPath turns out to be really complicated. In XBRL, the dates form part of the context, so you need to dereference the contextRef attribute, and then do a date comparison on the start and end dates. To do it properly, you need to deal with different representations of the same dates (e.g. 2010-01-01T00:00 vs 2010-01-01, or the different interpretations applied by XBRL to start dates and instant dates).
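To see how much machinery even the period question needs, here is a sketch of “do these two facts line up?” against a drastically simplified instance. Namespaces, instant semantics, and the rest of the context structure are deliberately omitted; the concept names and values are made up:

```python
import xml.etree.ElementTree as ET

# A drastically simplified instance: two contexts describing the same
# period, but with different (equivalent) lexical forms for the start date.
instance = """\
<xbrl>
  <context id="c1"><period>
    <startDate>2010-01-01</startDate><endDate>2010-12-31</endDate>
  </period></context>
  <context id="c2"><period>
    <startDate>2010-01-01T00:00:00</startDate><endDate>2010-12-31</endDate>
  </period></context>
  <Revenue contextRef="c1">100</Revenue>
  <Costs contextRef="c2">40</Costs>
</xbrl>
"""

def normalise(lexical):
    # 2010-01-01T00:00:00 and 2010-01-01 denote the same moment.
    return lexical.split("T")[0]

root = ET.fromstring(instance)
periods = {}
for ctx in root.iter("context"):
    p = ctx.find("period")
    periods[ctx.get("id")] = (normalise(p.findtext("startDate")),
                              normalise(p.findtext("endDate")))

revenue = root.find("Revenue")
costs = root.find("Costs")
same_period = (periods[revenue.get("contextRef")]
               == periods[costs.get("contextRef")])
print(same_period)  # True: the facts line up despite differing contextRefs
```

All of this is just to compare one dimension of two facts; a pure XPath expression has to carry this dereferencing and normalisation around in-line, for every dimension, in every assertion.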

That’s just the tip of the iceberg. What if the document contains facts for different entities? In different currencies? For different regions? At this point, you have to accept that you haven’t got the right tool for the job. You have three options:

  1. Augment XPath with some external mechanism for node selection, such as custom functions, or embed the XPath within other XML structures that influence the way the expressions are evaluated.
  2. Make unjustified assumptions about equality of contextRef attributes and create something that works with some XBRL documents but not others.
  3. Take the leap to a native XBRL expression language.

(I’d hope you’d join me in discounting the second option, although sadly I have seen software that takes this approach).

We’ve investigated the approach of augmenting XPath, but the results are somewhat unsatisfactory, as you end up with much of the important information about your expression being captured outside of the XPath assertion itself, and you find yourself asking just what benefits you’re gaining from the use of XPath.

Building a new expression language from scratch is not something we undertook lightly, but we believe that it yields the best results, and thus Sphinx was born.

So what does our simple assertion above look like in Sphinx?

Income[] = Revenue[] - Costs[]

That’s it.

Sphinx’s default behaviour is to do the “lining up” that is obvious to a business user. This assertion will be applied for each period in which the facts occur. If there are multiple entities in the document, it will be applied for each of those. If your document contains a geographic breakdown of these facts, it will be applied within each region, as well as to the total.
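The lining-up behaviour amounts to grouping facts by the dimensions the expression doesn’t mention and evaluating the assertion once per group. A rough Python sketch of the idea, with a toy fact model of my own devising (not Sphinx’s):

```python
from collections import defaultdict

# Toy facts: (concept, dimensions, value); all values hypothetical.
facts = [
    ("Income",  {"period": "2011", "entity": "ACME"}, 60.0),
    ("Revenue", {"period": "2011", "entity": "ACME"}, 100.0),
    ("Costs",   {"period": "2011", "entity": "ACME"}, 40.0),
    ("Income",  {"period": "2010", "entity": "ACME"}, 45.0),
    ("Revenue", {"period": "2010", "entity": "ACME"}, 90.0),
    ("Costs",   {"period": "2010", "entity": "ACME"}, 45.0),
]

def check(facts):
    """Evaluate Income == Revenue - Costs once per dimension combination."""
    groups = defaultdict(dict)
    for concept, dims, value in facts:
        groups[tuple(sorted(dims.items()))][concept] = value
    results = {}
    for key, group in groups.items():
        if {"Income", "Revenue", "Costs"} <= group.keys():
            results[key] = group["Income"] == group["Revenue"] - group["Costs"]
    return results

for key, ok in sorted(check(facts).items()):
    print(dict(key), "OK" if ok else "FAIL")
```

The assertion is written once, but evaluated once for 2010 and once for 2011; add an entity or a region and the grouping key simply grows, with no change to the rule.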

Aside from the addition of some empty square brackets, that expression really is a pretty natural and accessible expression of the assertion we’re trying to apply. You might reasonably wonder what the square brackets are for. We describe Sphinx as a “native” XBRL expression language, because its view of the world is based on a logical, dimensional model of XBRL. The square brackets allow you to navigate that model. For example, suppose I only want Revenue in US dollars:

Revenue[unit = unit(iso4217:USD)]

Or for 2010:

Revenue[period = duration("2010-01-01", "2010-12-31")]

I also have direct access to XDT dimensions. Assuming I have a CountryAxis dimension with a UnitedKingdom member, I can get UK revenue:

Revenue[CountryAxis = UnitedKingdom]

So let’s take this one step further and look at another simple example, a roll-up by region:

Total Revenue = sum of Revenue for all Countries

In Sphinx this becomes:

Revenue[] = sum(Revenue[CountryAxis = *])

Again, all the other dimensions that aren’t mentioned explicitly, such as period, units or entity, will be lined up automatically, meaning that this expression gets applied for each period, unit or entity.

As a point of detail, “CountryAxis = *” selects everything except the default, which is what we want, as the default will be the total. If for some reason I wanted to include the default, I could use “CountryAxis = **”. Once again, we can see that a native XBRL expression language gives us syntax that maps intuitively and concisely onto the data model that we’re working with.
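The roll-up check itself reduces to comparing the dimension default against the sum over the explicit members. A toy sketch, using None to stand for the dimension default (the data model and values are illustrative only):

```python
facts = {
    # (concept, CountryAxis member); None is the dimension default (total).
    ("Revenue", None): 100.0,
    ("Revenue", "UnitedKingdom"): 60.0,
    ("Revenue", "France"): 40.0,
}

def roll_up(facts, concept, axis_default=None):
    """Check the dimension default (the total) against the sum of the
    explicit members -- the shape of Revenue[] = sum(Revenue[CountryAxis = *])."""
    total = facts[(concept, axis_default)]
    members = sum(v for (c, member), v in facts.items()
                  if c == concept and member != axis_default)
    return total == members

print(roll_up(facts, "Revenue"))  # -> True
```

Excluding the default from the sum is exactly what “CountryAxis = *” does, which is why the Sphinx one-liner works without naming the members.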

Sphinx 1.0 was released on 22nd April 2011 after 18 months of intensive development. We built it because we had customers with real requirements that existing solutions couldn’t meet, and because we believe that it’s the right answer. Simple expressions about XBRL documents should be simple to write, and complex ones should be only as difficult as they need to be.


Great new resources for XBRL

I’ve been meaning to put some effort into introductory XBRL materials to place on this site, but we have been busy. Now it looks like Josef MacDonald’s team over at the International Accounting Standards Committee Foundation (IASCF) are producing such good quality stuff that we won’t need to. The diagram is an adaptation of a slide that Walter created years ago and that has become a bit of a staple for XBRL International talks. But this is heaps better! And the glossary is terrific. Congratulations to everyone involved.

How XBRL differs from other XML standards

Developers familiar with XML standards often look at XBRL and ask the same questions:

  • XBRL looks different: how does it differ?
  • Why does XBRL differ?

In this article, Steve Baker, Director of Product Development, explains how and why XBRL differs from most other XML standards.

Check back for later articles on how XBRL works and how to work with it.

XBRL is a framework, not an implementation

Most XML standards provide a schema that defines fairly tightly what structures and datatypes are allowed. If a standard is extensible, it is only in defined places and often in only narrow ways.

XBRL is different: an XBRL report is defined to contain contexts, units and concepts, but the XBRL standard does not provide any concrete concepts one can use. Instead, XBRL is defined in terms of the two types of concept: items and tuples. Concrete items and tuples are provided by XBRL taxonomies.

By leaving items and tuples to be defined by domain experts in taxonomies, XBRL provides a great way to model typical business reports in a uniform way, while allowing preparers to report whatever they need to.

XBRL models tables of data

A typical business report will have a column of labels on the left, indented to indicate structure. A table is formed by providing the data for those headings for each of several periods in time: the rows are the facts, the columns are the different periods or instants. The units of the report are given, as is the name of the business entity to which the facts refer. Additionally, footnotes are added, and certain facts typically sum to subtotals.

Sketch these features in a spreadsheet and hand them to a dozen XML gurus and you will get at least a dozen different designs. And remember we didn’t tell the gurus what we would actually be reporting!

XBRL’s big win is that it takes these typical features and gives us one design so we can move forward.

XBRL reports are defined in several parts

Where a schema is usually sufficient, XBRL requires a definition in several parts:

  • The general structure of an XBRL report is defined by the schema for XBRL.
  • The concepts which may be used in a report – the items and tuples – and any datatypes are defined in a taxonomy schema.
  • Relationships between concepts and about concepts are stored in taxonomy linkbases for flexibility:
    • The label linkbase gives various human-readable labels for concepts, often in several languages.
    • The reference linkbase points to authoritative documentation for concepts.
    • Hints about how to present concepts in relation to one another – those indents in the business report – are provided by the presentation linkbase.
    • Statements about how concepts should sum to other concepts are provided by the calculation linkbase.
    • Miscellaneous – and infrequently used! – relationships are given in the definition linkbase.

The way these components of the definition are brought together varies, but typically a report points to a taxonomy schema and the taxonomy schema points to all the other components. Taxonomies often point to other taxonomies.

XBRL uses some uncommon XML technologies and techniques

In addition to the now run-of-the-mill XML technologies like namespaces and XML Schema, XBRL has some uncommon features.

Nothing could work without substitution groups. The schema for XBRL reports says, “you can put items here” but it makes items abstract: you can never use an item directly. Taxonomies declare concrete items that can be substituted wherever XBRL allows an XBRL item. Don’t worry: we’ll be coming back to this in a later article!

A quite arcane part of the landscape of XML standards is XLink. XLink allows the definition of arbitrary networks, or graphs, of relationships between things of any type. XBRL takes XLink and applies it to good effect: all the linkbases use XLink to relate concepts to one another and to other information.

IDs and IDREFs have always been available in XML, but in XBRL they are indispensable. Taxonomy schemas give every concept an ID, to which links in the linkbases can refer. Footnotes point at IDs on reported facts via IDREFs to attach that extra information to the report.

XBRL is typically quite flat

No doubt you are used to seeing deeply-nested elements in XML documents: tags within tags within tags. In XBRL, reports are usually quite flat. Most of the structure is in the linkbases.

XBRL is not completely flat. Contexts use structure to lay out entity, period, segment and scenario information and units are structured. Tuples can contain other concepts – that is, items or tuples – so they can reintroduce as much structure as is necessary.

But why?!

It’s all about extensibility:

  • Allowing multiple accounting standards within the same framework.
  • Allowing extensions to accounting standards by adding new concepts with new labels and references while changing relationships.
  • Removing concepts from presentation views.
  • Removing concepts from calculations.
  • Permitting labels for concepts in new languages.
  • Inserting new concepts into a report, changing the existing presentation and calculation relationships.

As you can imagine, flexibility and extensibility are available in XBRL to a degree unseen in most standards. Thankfully, all this change happens within the same defined framework, so tools and applications can still get on with the processing.

Further reading

Check back for further articles as we explain how XBRL works in detail. You can also see our paper XML Flattened.

Too many tags? No!

There are a few commentators who feel that XBRL contains “too many tags”. I think they are wrong… XBRL models a complex human system (accounting) with lots of small components (disclosure rules).

So, a number of these “too many tags” comments are creeping into articles being printed about XBRL. Interesting. Fundamentally flawed, but interesting. In particular, I mean this part:

Industry commentators have said that too many taxonomies have been created within XBRL, making it too complicated, time-consuming and costly for companies to use in its current guise. (Kevin Reid, Accountancy Age)

The unnamed commentators are usually drawn from a small pool of usual suspects who have just come at this technology from the wrong angle.

Do this: download a financial statement from a leading company listed on the stock exchange of your choice. Count the number of different accounting disclosures.

Just at random, I’ve downloaded the annual report from BP, admittedly a company with a reputation for quality and quantity in their investor relations. The Excel file available from here contains just the financials, required accounting policies and certain non-GAAP statistics about the company’s exploration activities.

Sixty-eight pages of performance information. At a quick glance, there are at least 20 different concepts on each page, making around 1400 different concepts in total.

Companies don’t go to the expense of publishing this many performance concepts unless they have to (it’s a requirement of the accounting standards they work with) or they want to (they believe that their investor relations message will be enhanced or made clearer). Accounting authorities don’t include reporting concepts in their accounting standards that are unnecessary or useless. So those (at least) 1400 concepts published by BP make up a package of information that either the company itself, or the accounting authorities, consider will be analyzed by market participants.

The point about XBRL is that each of those 1400 concepts either already has been, or can be, encapsulated in a taxonomy. That makes comparison between companies a task that can be largely automated, and, just as importantly, makes comparison of a single company across time something that can be dealt with by computers instead of people. It frees people up to think about what those comparisons mean, instead of manipulating spreadsheets and summary databases or (worse) retyping 1400 concepts before being able to make those comparisons at all. Those comparisons are only possible through the use of taxonomies, which define what each disclosure means, link it to related disclosures, determine how it is calculated, impose validation rules around it and add references to relevant authoritative literature.

The number of taxonomies that exist reflects the diversity of economic life. The US has a different accounting framework to that used in Europe. Oil and gas companies need to disclose different performance measures to those that matter in aerospace. BP has different investor relations objectives to Shell. So you just can’t get away from the fact that XBRL has lots of taxonomies, and each of those taxonomies contains lots of tags.

But are there too many? Nope. Just the number in use in corporate life in the noughties. Is it too hard for companies to use? Nope. While you wouldn’t want to approach it from first principles (that specification is a nasty read), implementation right now means using tools that others have already built. Most companies just want to be able to publish performance data in a format that is an unambiguous interpretation of their financial or business performance. For them, the task amounts to:

  • finding the right taxonomies,
  • matching the right tags to their performance measures; and
  • publishing the right values inside those tags, together with a few other bits of key information, like which company, date and currency that disclosure relates to.

There are a whole bunch of tools (granted not as many as some of us would like), and some talented people, that can help them do that.

The task of XBRL has never been to change the way that accounting works… and that is the only way you would reduce the number of tags. Technology usually models human systems. In fact it tends to fail when it tries to change them. And the accounting system, while venerable, is both highly regulated and critical to the global economy. XBRL is just a better way of communicating performance.