A simple law of numbers to identify fraud… Benford’s Analyser

By Ben Russell,

Specialist Sales Consultant

Chair of Taxonomy Architecture Guidance Task Force, XBRL International

As part of our drive to showcase how innovative apps can be built on the True North® Data Platform, we’ve released a new at-a-glance fraud indicator tool. And it’s free on our website!

The design team behind this app were inspired by the principles of Benford’s law. This is an observation about the frequency distribution of leading digits in many real-life sets of numerical data, including financial accounts. In data that follows Benford’s law, leading digits are more likely to be low than high. So, given a large enough sample of financial data, numbers will start with a 9 a little under 5% of the time on average, whereas numbers will start with a 1 just over 30% of the time.
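The percentages quoted above come from the standard statement of Benford’s law, under which a leading digit d is expected with probability log10(1 + 1/d). The short Python sketch below simply tabulates that formula; the function name is our own choice for illustration and is not taken from the app.

```python
import math


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's law."""
    return {digit: math.log10(1 + 1 / digit) for digit in range(1, 10)}


expected = benford_expected()

# Print the expected share of each leading digit as a percentage.
for digit, share in expected.items():
    print(f"{digit}: {share:.1%}")
```

The nine shares add up to one, because the product of (1 + 1/d) for d from 1 to 9 is exactly 10.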


This is an example of a good fit to Benford’s law:

To show this in action, the team created the Benford’s Analyser app, a quick, high-level check that can be used to help detect fraud through analysis of the numbers in financial statements. The free version allows you to apply the rule to leading digits in the latest filing for companies that submit annual and quarterly accounts to the Securities and Exchange Commission (SEC).

As a statistical check, those using Benford’s law need to be aware of the implications of the results. Firstly, small sets of numbers are unsuitable for statistical analysis; secondly, a bad fit is not an indicator that there is fraud, rather that the numbers are not what would typically be expected. There are many company-specific reasons why this might be the case, of which fraud is just one. Those regulators, government agencies and auditors using the law may use this to decide where to look further.

As shown in the images below, Benford’s Analyser creates a chart for each filing that sets the anticipated distribution of first digits, based on Benford’s law, against the distribution actually found in the numbers reported in the company’s quarterly and annual financial statements.


This is an example where the numbers do not fit Benford’s law:

In this platform tool, a chi-square value above 20.09 indicates that a fit this poor would be expected in only about one per cent of normal sets of accounts. The value shown (61.32) is well above that threshold, so the filing merits further investigation; as noted above, a poor fit is not in itself evidence of fraud.
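For readers who want to see how such a check might work, here is a minimal Python sketch of a chi-square goodness-of-fit test against Benford’s law. It is an illustration under our own assumptions, not the code behind Benford’s Analyser: the function names and sample data are hypothetical. The 20.09 threshold is the one quoted above, which matches the one per cent critical value of the chi-square distribution with eight degrees of freedom (nine digits minus one).

```python
import math
from collections import Counter

CRITICAL_1_PERCENT = 20.09  # threshold quoted in the text (8 degrees of freedom)


def leading_digit(value):
    """First significant digit of a number, or None for zero."""
    if value == 0:
        return None
    # Scientific notation puts the first significant digit first.
    return int(f"{abs(value):.15e}"[0])


def benford_chi_square(values):
    """Chi-square statistic comparing leading digits with Benford's law."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no non-zero values to analyse")
    observed = Counter(digits)
    total = len(digits)
    statistic = 0.0
    for digit in range(1, 10):
        expected = total * math.log10(1 + 1 / digit)
        statistic += (observed[digit] - expected) ** 2 / expected
    return statistic


# Hypothetical samples: powers of two are known to follow Benford's law
# closely, while the integers 100-999 have uniformly spread leading digits.
good_fit = benford_chi_square([2 ** k for k in range(1, 201)])
bad_fit = benford_chi_square(range(100, 1000))

print("good fit exceeds threshold:", good_fit > CRITICAL_1_PERCENT)
print("bad fit exceeds threshold:", bad_fit > CRITICAL_1_PERCENT)
```

A real check would also refuse to report a result for very small samples, in line with the caveat about small sets of numbers.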

This is an example where there were too few numbers in the financial statement to apply this type of statistical analysis:

We’ve already had some great feedback for Benford’s Analyser from our latest rounds of innovation and customer outreach. Users were impressed with the immediacy of the model in creating tables for the filings under investigation. Why not have a go via the link below?


Tagged financial data, such as XBRL reports, has become more readily available. CoreFiling is responding. We’re listening to our customers, bringing other automated checks to the market.

This is an exciting innovation and a testament to the creativity of the designers at CoreFiling.


XBRL accounting taxonomy design and categorisation – Part 3: Coherence

In this series of articles, we propose a categorisation of taxonomies based on different aspects of their design.  Using this categorisation we look at the evolution of taxonomy design through three generations.

What is taxonomy coherence?

Our dictionary provides two definitions for ‘coherence’: the quality of being logical and consistent; and the quality of forming a unified whole. Both should apply to the architecture and design of XBRL taxonomies.

We said in the introductory article of this series that by coherent we meant a taxonomy that “hangs together” to produce consistent and comparable instance documents. That’s actually extending the concept beyond the taxonomy itself to the instance documents that can be created with it, but if we can’t guarantee to produce documents with those qualities then what’s the point of a coherent taxonomy?

Hypercubes and dimensions

One useful way of comparing the coherence of taxonomies is demonstrated by the graph below, which plots dimensions per hypercube for each of the three taxonomies we are examining (note that both scales are logarithmic):

The number of dimensions per hypercube is a good measure of how extensively the taxonomy uses dimensional data modelling to provide a unified data model focused around the relationship between concepts and the data aspects that apply to them.

The difference between US GAAP (green) and IFRS (red) is simply one of magnitude – the US GAAP taxonomy (345 hypercubes, 272 unique dimensions) is approximately three times larger than the IFRS taxonomy (112 hypercubes, 113 unique dimensions). However, their ‘dimensions per hypercube’ profiles are very similar, starting at a peak with one-dimensional hypercubes and diminishing quickly as the number of dimensions increases. There is a large majority (70%+) of hypercubes with one and two dimensions in both taxonomies.

The profile for UK FRS (purple) is strikingly different. There is just one hypercube with one dimension, and only ten hypercubes with two dimensions. This suggests a radically different approach to the design of the taxonomy (212 hypercubes, 115 unique dimensions) and in particular the use of hypercubes to represent highly-dimensional data.

The graph implies that, for US GAAP and IFRS, most concepts have been modelled in isolation with a small number of specialised dimensions. In contrast, the UK FRS taxonomy has been modelled comprehensively as a whole, with widely applicable dimensions being applied across numerous relevant concepts.

We will now explore in more detail the reasons behind the differences between the taxonomies.

The IFRS taxonomy

We’ve already seen that the architectural underpinnings of the IFRS taxonomy are derived from the International Financial Reporting Standards themselves.

The chief consequence of this on the design of the IFRS taxonomy is that users of it are at liberty to interpret the framework it provides very broadly. This can be to the detriment of instance document consistency and comparability.

The IFRS taxonomy is intended to act as a foundation for electronic reporting regimes in IFRS-using jurisdictions around the world. The primary ‘users’ of the taxonomy in this case are most likely taxonomy architects tasked with creating extended versions of the IFRS taxonomy suitable for local reporting purposes. This means that there is a considerable effort required on the part of the extension architects to implement a level of consistency on top of the IFRS taxonomy itself.

This “standards-first” approach shows itself in that the IFRS taxonomy has the lowest average number of dimensions per hypercube of the three taxonomies we’re examining, at just under two. This is surely the result of attempting to model the low-dimension, presentation-oriented tables commonly seen in standards documents and in the corresponding financial reports. The taxonomy also has dimensions not associated with any hypercube; and some reportable concepts are not associated with any hypercube.

By way of a small but illustrative example, consider the IFRS Earnings per share hypercube (table) which has six separate primary items and a single dimension.

Primary items:

• Basic and Diluted earnings (loss) per share from continuing operations (2 items)

• Basic and Diluted earnings (loss) per share from discontinued operations (2 items)

• Totals for both Basic and Diluted earnings (loss) per share (2 items)

Dimension:

• Classes of Ordinary Shares

There is also a “floating” dimension (axis) not associated with any hypercube – Continuing and discontinued operations – for breaking down continuing versus discontinued operations, which wasn’t used in the Earnings per share hypercube. Had it been, the number of Earnings per share primary items in the structure could have been reduced from six to two: the Basic and Diluted earnings concepts, each reported against the continuing, discontinued and (default) total dimension members. This demonstrates that a data-centric approach to modelling the taxonomy would have simplified it and improved its coherence.
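The equivalence of the two designs can be sketched in a few lines of Python. The identifiers below are simplified stand-ins of our own, not the actual IFRS taxonomy element names; the point is only that two concepts combined with three dimension members address the same six data points as six separate primary items.

```python
from itertools import product

# Document-style design: one primary item per reported figure
# (hypothetical, simplified names).
six_primary_items = {
    "BasicEPS_Continuing", "DilutedEPS_Continuing",
    "BasicEPS_Discontinued", "DilutedEPS_Discontinued",
    "BasicEPS_Total", "DilutedEPS_Total",
}

# Data-centric design: two concepts plus one dimension, with the
# total acting as the default member.
concepts = ["BasicEPS", "DilutedEPS"]
operations_members = ["Total", "Continuing", "Discontinued"]

data_points = list(product(concepts, operations_members))

# Both designs address exactly the same six data points.
as_item_names = {f"{concept}_{member}" for concept, member in data_points}
print(len(concepts), "concepts cover", len(data_points), "data points")
print("same coverage as six items:", as_item_names == six_primary_items)
```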

In summary, the IFRS taxonomy is not as coherent as it might be, and that impacts the consistency and comparability of instance documents created to adhere to it.

The US GAAP taxonomy

The US GAAP taxonomy is nearly three times the size of the IFRS taxonomy in nearly all respects, but it associates all dimensions with hypercubes, even if around one third of all reportable concepts are not associated with any hypercube. This is an approach with greater consistency than that of the IFRS taxonomy, dimensionally-speaking, even though it provides far too many degrees of freedom to instance document preparers when it comes to these “free” reportable concepts, leading to documents that may not be entirely consistent with each other or wholly comparable.

Interestingly, however, it is on a par with the IFRS taxonomy in one important respect: the average number of dimensions per hypercube is only slightly larger, at just over two. This suggests that the hypercubes (or tables in US vernacular) in the taxonomy are primarily modelling the kinds of two-dimensional tabular presentations (for human consumption!) that one might see in a financial report or defined in an accounting standard (e.g. an axis of ‘concepts’ plus one or two dimensional breakdowns).

The “document-centric” approach of US GAAP therefore yields data structures that mirror the conventional tabular presentations prescribed, or presented as exemplars, in standards documents and commonly used by preparers of financial statements.

The rigorous architectural underpinnings of the US GAAP taxonomy have resulted in a coherent taxonomy design, although one that does not lend itself to ensuring similar consistency in instance documents, particularly due to the usage of filer taxonomy extensions, as we will discuss in a forthcoming blog.

The UK FRS taxonomy

The average number of dimensions per hypercube in the UK FRS taxonomy is much higher than either IFRS or US GAAP at just over eight. This is a key indicator of a radically different architectural approach in which data modelling has taken centre-stage. As if to emphasise this, all reportable concepts belong to a hypercube, which is a very strong indicator from the taxonomy’s architect to instance document preparers of what is expected of them. There is a coherent dimensional framework in which each and every reportable concept unambiguously sits.

The result is a collection of highly-dimensional hypercubes tightly bound to reportable concepts. With judicious use of dimensions with default members, in the main, the “tagging” task for any given reportable concept is not onerous, but at the same time the full expressive power of the hypercubes can be brought to bear when the need arises. Reportable concepts are only valid in certain well-defined circumstances, and those hypercubes have been equipped with all the necessary dimensions, whether they are actually needed in any given circumstance or not.
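To show why default members keep the tagging task light, here is a small Python sketch. The hypercube, dimension and member names are hypothetical rather than drawn from the UK FRS taxonomy. A preparer tags only the dimensions that differ from the default, and the remainder of the dimensional context is implied.

```python
# Hypothetical hypercube: dimension name -> default member.
HYPERCUBE_DEFAULTS = {
    "Operations": "Total",
    "Restatement": "AsOriginallyReported",
    "EntityGroup": "Consolidated",
}


def resolve_context(tagged):
    """Fill in default members for every dimension the preparer left untagged."""
    unknown = set(tagged) - set(HYPERCUBE_DEFAULTS)
    if unknown:
        # The concept is only valid with the dimensions its hypercube allows.
        raise ValueError(f"dimensions not in hypercube: {sorted(unknown)}")
    return {**HYPERCUBE_DEFAULTS, **tagged}


# The common case needs no explicit dimensional tagging at all.
print(resolve_context({}))

# The full expressive power is available when a breakdown is needed.
print(resolve_context({"Operations": "Discontinued"}))
```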

The UK FRS taxonomy design is based on a thorough analysis of the data that is required to be conveyed by financial statements. This results in a more coherent data model since all the potential aspects (or “dimensions”) of an item of data can be considered holistically and independently of any traditional or prescribed presentation requirements. In this approach, the typical presentations’ one- and two-dimensional hypercubes (tables) tend to be represented by one- and two-dimensional “slices” through higher-dimensional structures, and there is little or no need for preparers to expand the existing hypercubes – something that can only be achieved via entity-specific taxonomy extensions.

The coherence of the taxonomy naturally assures the consistency and comparability of instance documents. It is a consequence of the taxonomy’s design that places no unreasonable demands on the ingenuity of taxonomy extenders or instance document preparers.


We have seen how the different choices in taxonomy design have influenced the coherence of the taxonomies under study and how this is illustrated by the dimension-per-hypercube metric.

In general, a coherent taxonomy should have a complete, consistent data model with full hypercube coverage, broadly-applicable dimensions and no unnecessary duplication either for dimensions or concepts. We have seen that these goals are most readily achieved by taking a data-model-first approach to taxonomy design. A coherent taxonomy leads to clear, unambiguous tagging and therefore clear, comparable instance documents with less opportunity for error.

If a taxonomy has a less extensive dimensional model, this requires extenders and/or instance document preparers to provide more interpretation of the taxonomy and to work significantly harder to produce consistent, comparable instance documents. This is by no means impossible, but some of the burden has been transferred from the taxonomy authors to taxonomy extenders and/or instance document preparers, who are less able to produce coherent, comparable data if they’re not equipped with the tools to do so.

In the next blog post we’ll cover taxonomy extensibility.

Artificial Intelligence: Applications Today, Not Tomorrow.

From reports of the DeepMind scandal a few weeks ago, to several talking spots at this year’s London Fintech week, July has arguably become Artificial Intelligence Month in the Fintech world. But while there has been a great deal of interest in future applications of AI, it seems that less attention is being paid to the AI solutions already out in the wild. Let’s take a look at the state of play for AI today.

Why the rise of AI?

It’s no secret that as computers get more powerful, and society gets more connected, organisations across all industries are feeling the pressure of too much data. Whether you’re gathering it, storing it, or trying to use it, modern data volumes present vast operational challenges. For example, Walmart, the world’s largest retailer, is reportedly building a private cloud that will process 2.5 petabytes of information each hour. That’s one company processing the equivalent of over half a million DVDs (or the estimated memory capacity of the human brain) each hour, every day.

The problem is bandwidth: a human being can only process so much information at once. Using Artificial Intelligence is a good way of dealing with such staggering amounts of data – and although we haven’t quite reached the I, Robot scenario, research into everything from deep learning to artificial neural networks continues to gather pace.

Applications today, not tomorrow.

One way that AI is helping to conquer the data mountain right now is through automation. The advantage of AI in this area is scalability – in theory, AI can learn how to recognise complex patterns of information that would normally require human understanding; AI-based pattern recognition has already been used to build surveillance cameras that can distinguish between people, objects, and even events.

At CoreFiling, we recognise that potential. That’s why we built AI-based automation right into our XBRL document creation tool, Seahorse. Seahorse learns how to interpret the fields in forms, and automatically tags the information, thanks to a form of machine learning called Logistic Regression. And because it’s hosted in the cloud, Seahorse benefits from every filing. Each and every document scanned refines its detection method, allowing Seahorse to achieve incredible levels of accuracy.
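For readers unfamiliar with logistic regression, the toy Python sketch below shows the general idea: a model learns weights that turn simple features of a form field into the probability that a given tag applies. It is purely illustrative and is not Seahorse’s model; the features, training data and function names are all invented for this example.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Probability that the tag applies to a field with these features."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train(samples, labels, epochs=500, learning_rate=0.5):
    """Fit weights by stochastic gradient descent on the log loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in zip(samples, labels):
            error = predict(weights, bias, features) - label
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Hypothetical features per field: [label mentions "revenue", label mentions "tax"].
# Label 1 means the field should be tagged with a (hypothetical) Revenue concept.
samples = [[1, 0], [1, 0], [0, 1], [0, 0], [1, 1], [0, 1]]
labels = [1, 1, 0, 0, 1, 0]

weights, bias = train(samples, labels)
print("tag 'Total revenue' as Revenue:", predict(weights, bias, [1, 0]) > 0.5)
print("tag 'Tax payable' as Revenue:", predict(weights, bias, [0, 1]) > 0.5)
```

Each additional labelled example nudges the weights, which is the sense in which such a model can refine its behaviour as more documents are processed.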

Click here to learn more about Seahorse.

A Wealth of Potential: Interview with Ian Hicks

Last week, CoreFiling’s Ian Hicks took part in the FRC’s Digital Future: Data round table, and discussed ways to combat the current “under-performance” of financial data. After the event, we sat down with Ian to talk about the benefits and challenges of using XBRL for Digital Future.

DAN: Is XBRL right for Digital Future?

IAN: Oh, absolutely. XBRL has applications around the world, and it’s flexible enough to meet the main goal of Digital Future, which to me is data versatility. The nice thing about XBRL is that it creates a kind of blank canvas with data, from which you can start to create solutions that meet the specific needs of each client or industry – that could be data automation, integration with other reporting media, and so on. That doesn’t mean it’s a perfect fit, though!

DAN: How so?

IAN: XBRL is a powerful standard, but it needs to be better at hiding complexity. CoreFiling already helped to address this in some instances when Philip Allen developed iXBRL, but XBRL itself still needs specialist applications to be useful.

DAN: You currently chair the XBRL Best Practices Board. Is reducing complexity something you focus on?

IAN: Oh absolutely, yes. But that relies on more than just developing XBRL. To reduce the complexity of an XBRL-based system, you need to take a holistic approach – develop working methods and processes that enhance customer experience, for example, or take advantage of new technologies.

DAN: What would be on your wish list for XBRL development?

IAN: I think the most useful thing for all applications, including Digital Future Reporting, would be to make XBRL a little more visual – more “renderable” – outside of specialist tools. Similar to how Microsoft plug-ins can be embedded in a browser. We’ve already started to address this with our Beacon platform, which renders XBRL instance data and displays it in a format that users can engage with.

From a more technical standpoint, I’d like to see features such as non-repudiation of instances – i.e. was the instance created by an authorised person, have its contents been changed, is the date stamp correct, etc. More widespread use of auto-tagging would also be a great benefit to the preparer, accountancy and audit communities, both from an XBRL perspective and from the point of choosing the most appropriate concept.

DAN: How would those advances relate to Digital Future Reporting?

IAN: The Digital Future Reporting model is all about taking advantage of technology. The advantage of using XBRL is that it already supports features like auto-tagging – that’s how CoreFiling’s instance creation tool, Seahorse, is able to auto-tag filing documents, for example. The challenge is just to maximise the usage of these features.

DAN: Do you think XBRL is under-used in the fintech industry?

IAN: XBRL itself has an enormous user base around the world, but I do think its more advanced features are overlooked – which is why the Digital Future Reporting model is so key. But I think the most important thing is to deploy XBRL where it can be of maximum use. There was an interesting discussion in Dublin regarding “NOXBRL” – not a rejection of XBRL as it might imply, but rather the idea of “Not Only” XBRL. This discussion was around hiding the complexity of XBRL reporting from filers. It went on to cover the idea that XBRL is fundamental, but that other technologies should be included to meet the overall regulatory reporting need.

To give you one example, investment analysts have already developed sophisticated systems sourcing data in a multitude of ways, like “web-scraping”. XBRL has the capability to massively enhance these processes through its ability to rapidly and effectively analyse large sets of structured and unstructured data – this ability to enhance other technologies is what makes XBRL so useful. We’re already seeing this in practice in other industry sectors.

Linked Data is another example. Linked Data and XBRL appear to have progressed in parallel: there needs to be closer co-operation to benefit from using both technologies. This co-operation could result in massive benefits to the analyst community by simplifying the process of comparing information from seemingly disparate data sets.

DAN: What about outside fintech?

IAN: Extending XBRL beyond finance isn’t just possible, it’s already happening, and I fully encourage it!

DAN: Can you give us an example?

IAN: Non-financial data looks to me to be the next ideal candidate for XBRL. We’ve already seen something similar in the US, with the SunShot Initiative’s Orange Button Programme – XBRL is going to be used for solar data gathering and analysis. It’s a great idea, because solar data comes from so many different sources across America. You need to keep data like that in a single, consistent format to make it useful. Equally, organisations like AECA in Spain are pioneering the reporting and analysis of sustainability data using XBRL.

DAN: Do you have any advice for people already working with XBRL?

IAN: I think taxonomy developers should broaden their approach when developing a taxonomy. Rather than thinking just about the regulation to be met, they should also consider and take into account, far more thoroughly, the needs of the groups who will be consuming and analysing the data.

I’d also advise the filer communities to require the solution vendors to provide the means to simplify XBRL filing, to hide the complexity from users and report preparers. This could only encourage a more widespread adoption of XBRL in digital reporting.

DAN: Thanks Ian!

IAN: Happy to help.

To find out more about the FRC’s financial reporting lab, and the goals of the future data model, click here. And for information on CoreFiling’s XBRL services, click here.

EDITOR’S NOTE: parts of this interview have been shortened for clarity.

CoreFiling at the FRC’s Digital Future Round Table

This week, CoreFiling’s Ian Hicks joined over 20 other industry representatives in London for the Digital Future: Data round table, hosted by the FRC’s Financial Reporting Lab. Discussions focused on how XBRL can help facilitate Digital Future Reporting: a 12-step model proposed by the FRC that uses technology to combat the current “under-performance” of financial data – read more about Digital Future Reporting here.

The event was a great success, attracting attendees from key fintech organisations (including Vizor, IFRS, Workiva, the FRC, and the Bank of England). CoreFiling was on hand to provide important insights about XBRL and its capabilities – and as chair of the XBRL Best Practices Board, Ian also outlined how to follow best practice when implementing it.

XBRL and the Digital Future model

As an open standard, XBRL is a good fit for Digital Future, because it isn’t constrained to a single supplier. The standard is also well established, with many examples of large-scale implementation across the world (including tax ecosystems in the UK, Middle East, and now the USA); XBRL provides a proven framework for creating simple, cost-effective solutions, without needing to “reinvent the wheel” or develop brand new technologies.

As IFRS’s Rita Ogun-Clijmans noted during the discussion, “XBRL should focus on where it fits best into the Digital Future Reporting model” – CoreFiling agrees, as the strength of XBRL lies in its inherent auditability, data provenance, and versatility; XBRL can be linked with other reporting media to assist compatibility.

For more information about XBRL, iXBRL, and CoreFiling’s solutions, visit corefiling.com.

UK Government Joins the Party at London Fintech Week 2017

London Fintech Week is back for 2017, with a new line-up of conferences, workshops, exhibitions and meet-ups for financial innovators. For those who haven’t attended before, London Fintech Week is a 7-day event designed to act as a hub for international financial professionals, held across the City of London, Canary Wharf and East London’s “Tech City”.

The event has gone from strength to strength since its inception in 2014 – and this year’s conference is the biggest yet. Starting on 7th July, London Fintech Week 2017 will host the UK government’s first International Fintech Conference. The conference aims to attract investors to the UK’s “high growth fintech centre”. Click here to read a comprehensive overview of the event by Liam Fox (Department for International Trade) on GOV.UK.

Ready. Set. Innovate.

At CoreFiling, we know that developing new technologies is key to the success of the fintech industry – so here are the top 4 events we’re looking forward to most at this year’s London Fintech Week:

  • Regulatory Sandbox Panel. Barclays, Santander and Credit Suisse meet to unlock the potential of regulatory reporting. As experts in regulatory reporting technology, we’ll be watching with interest.
  • Blockchain & Cyber Security Showcase. Blockchain is set to be the next evolution in financial security – but how far can we trust the hype? Thomson Reuters, the Linux Foundation and Applied Blockchain investigate.
  • Machine Learning & AI: Is it OK to Panic Yet? With the recent DeepMind scandal still fresh in our minds, we look forward to seeing this panel’s take on an important emerging technology.
  • The FCA’s Regtech Update & Innovation Hub Workshop. Innovation is at the heart of what we do at CoreFiling, and as long-time supporters of the FCA, we anticipate big things from this workshop.

A Bright Future for Orange Button & Solar Energy

CoreFiling recently joined XBRL US, the SunSpec Alliance, Wells Fargo and the US Department of Energy’s SunShot Initiative, to co-host the “Orange Button” webinar. This webinar focused on the role of XBRL in the US DoE’s Orange Button program – a solar energy development plan aimed at reducing solar energy costs (and growing the US solar industry).

Mark Goodhand, Head of Research at CoreFiling and a global authority on XBRL specifications, was on hand to give attendees expert advice about XBRL and its features. Solar energy is already big business in America, and a data-led approach is key to its growth; Mark showed that adopting XBRL will simplify and standardise solar data, aiding (and in some cases enabling) all aspects of the SunShot Initiative. The discussion of XBRL’s benefits covered everything from improved feasibility studies and financial projections, to better planning, smarter construction, and support for future research.

Chris Mills then put theory into practice, by taking the audience through a detailed demonstration of CoreFiling’s taxonomy development suite, Yeti.

The webinar was a real success, and gave the audience an exciting look at the bright future of solar development. Click here to watch the webinar on YouTube, or read a write-up on the XBRL US page. You can also get involved with solar energy by attending the InterSolar conference in July.

Launched: Beacon can solve your regulatory submission errors.

Filing rejections are a real problem for businesses that submit to regulators. Even if you have a solution in place to create your XBRL filings, there is no easy way to decode them, or to check what you’re actually sending. Using our 20+ years of experience in data integrity, we’ve created the solution:

CoreFiling is excited to announce the launch of Beacon: our cloud-based filing review platform. And to celebrate, we’re offering free trial access to Beacon for all users.

Beacon is a secure, collaborative review and validation tool that integrates effortlessly into your existing workflow. XBRL filings contain a lot of encoded information that you can’t see (or check) – but with Beacon, you can view that data in incredible, granular detail. The Beacon trial gives you access to Beacon’s advanced review tools: users can upload and review one XBRL document, completely free. Better yet, you can store your document in our cloud for up to three years… and review it as many times as you like.

Plus, we’re holding a free webinar for all filers, showing you how to avoid the most common errors in CRD IV (incl. IFRS 9) & Solvency II submissions.

Here are just a few of the ways that Beacon helps your organisation solve its submission errors:

The Benefits

  • Beacon lets you decode the XBRL filing. You can view your data inside a regulatory template, and see your filing as the regulator will see it. You can investigate broad sections of the filing, or drill down to individual data points, then apply targeted validation rules if you spot an error.
  • Beacon creates an advanced filing management system that’s flexible enough to fit right into your current workflow. Store all your filings in a secure, change-tracked environment. Control data access through custom user profiles. Import LDAP users and connect to your existing data sets with Beacon APIs.
  • Beacon promotes collaboration. Cloud access means colleagues can work together, anytime, anywhere. In fact, Beacon allows an unlimited number of users to view and mark up a filing at any one time. And thanks to Beacon’s cloud architecture, even the largest XBRL documents are accessed quickly – with no performance loss on your PC.

Download the PDF for more information about Beacon.

How do I sign up?

You can access Beacon right away by visiting our launch page, here. All you need to enter is your name, e-mail address and company. And don’t forget to sign up for our free Solvency II & CRD IV webinar too.

XBRL accounting taxonomy design and categorisation – Part 2: Architecture

In this series of articles, we propose a categorisation of taxonomies based on three aspects of their design: architecture, coherence and extensibility.  Using this categorisation we look at the evolution of taxonomy design through three generations.

1. An example of three taxonomy generations

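In this article we look at the architecture of three taxonomies that are good examples of each generation:

  • First generation: the IFRS taxonomy
  • Second generation: the US GAAP taxonomy
  • Third generation: the UK FRS taxonomy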

2. What is architecture?

The ancient Greeks bequeathed us the word they used for the “chief builder” – architect.  Good architecture manifests itself in perceived simplicity, elegance, or consistency of form; making the best or even inspired use of the available materials and prevailing building techniques; enhancing, working with or at least complementing the environment in which it resides; and performing its intended function effectively and reliably.

In the modern world architecture doesn’t just apply to physical structures. In the software world, it can be applied to any moderately complex logical structure. There is little doubt that an XBRL taxonomy qualifies in this regard, so it is reasonable to apply the principles of good architecture to taxonomies.

3. First generation: standards based – the IFRS taxonomy

The International Accounting Standards Board’s IFRS taxonomy has been in existence for over ten years – its latest incarnation is for 2016. By that measure alone it might be regarded as a first generation taxonomy.

Its architecture is derived from the structure and content of the International Financial Reporting Standards. These standards were drawn up without the knowledge that an XBRL taxonomy would be based upon them.

The IASB was breaking new ground when it first developed its IFRS taxonomy, and to provide the taxonomy with authority it was essential to model the underlying standard as closely as possible.  The design focused on ensuring that each concept in the taxonomy mapped directly onto the IFRS standard which, while being a good way of reflecting the precise structure of the standard, was not able to benefit from the elements of normalisation common to modern IT data design.  The result is a taxonomy that contains silos of repetitive concept definitions, labels and dimensions.  It is also dimensionally incomplete, using dimensions only where they are absolutely necessary to fully describe data relationships.

Secondly, in limiting the content of the IFRS taxonomy strictly to the scope of the underlying standard, it was unable to provide guidance to the builders of other taxonomies based on IFRS until the creation in 2014 of the IFRS Taxonomy Consultative Group (ITCG).  One consequence of this approach is that taxonomies already derived and extended from the IFRS taxonomy vary markedly from jurisdiction to jurisdiction.  This design freedom has turned out to have a critical side effect, which is that it makes it more difficult to compare financial data drawn up against different IFRS-based taxonomies.

4. Second generation: document based – the US GAAP taxonomy

The US GAAP taxonomy has a rigorous and well-documented three-layer architecture comprising a domain model, a logical model and a physical model, with clean separation between the layers.

Like the IFRS taxonomy, the US GAAP taxonomy is the embodiment of a set of accounting standards. However, it was modelled on the various kinds of documents that arise from the practical application of those standards rather than on the standards themselves.

The taxonomy is characteristic of architectures that have to cater for, and are developed by, numerous stakeholders. Each of these stakeholders works with a variety of different documents, each of which needs to be modelled by the taxonomy, leading to an unavoidable loss of focus.

This document-centric model has influenced the taxonomy in a number of ways. These include a proliferation of one- and two-dimensional hypercubes that closely resemble the tables that typically appear in documents and a requirement for HTML-marked-up “text block” data items.

5. Third generation: data model based – the UK FRS taxonomy

The UK FRS taxonomy is primarily based on the International Financial Reporting Standards (though not the IFRS taxonomy) so one might expect similarities with the IASB’s IFRS taxonomy. However, the architecture is very different.

The UK FRS taxonomy has no formally documented architecture, but nevertheless it does have a design that is modular and consistent.  The approach has been to model the data defined by the standards rather than the documents. This allows the taxonomy to focus on the data rather than document structure or presentation.  The key is in understanding the data and its potential dimensionality.

The taxonomy team at the FRC expended a considerable amount of time working with preparers and consumers to analyse over eight million Inline XBRL financial reports that had been produced in the UK since 2011, tapping into their experience to understand how the data is collected, organised and used. This has resulted in a data-centric architecture with a complete and comprehensive dimensional model.

6. Conclusion

We’ve described three contemporary taxonomies in terms of their architecture and indicated why we think each of them is a good exemplar of the changing architectural approaches that characterise the three generations.

With each generation the foundation of the taxonomy’s architecture has become deeper and more analytical, moving from a model based on standards, to documents, and then to data. In the process, presentation, as an architectural concern at least, has been stripped away.

In following articles we’ll discuss other aspects of these three taxonomies that reinforce this characterisation.

7. A final thought

It is worth noting that you can arrive at substantially the same conclusion by different architectural routes.

If you look at the Presentation Linkbases of these three taxonomies you’ll see much the same document-centric, IFRS-oriented structure, thanks in part to the alignment of reporting terminology.  However, the underlying architectures differ significantly.

Despite this, the architects of these three taxonomies have arranged for users to see and browse the reporting structures with which they are familiar.  It wasn’t necessary to follow one particular method of deriving a taxonomy to do this.  This is a powerful demonstration that presentation is neither a by-product of design, nor something that has to be “designed in” from the outset.

XBRL accounting taxonomy design and categorisation

1. The future of taxonomy design

This is the first in a series of articles in which I propose a novel categorisation of accounting taxonomies based on three aspects of taxonomy design: Architecture, Coherence and Extensibility.

In this first overview article I will introduce the three design aspects. In future articles I will cover these aspects in more detail and examine how they apply to the US GAAP, IFRS and UK FRS taxonomies. The series will conclude with a discussion of how I’ve categorised these taxonomies and how this categorisation might inform the current direction of taxonomy design.

2. Why are taxonomies so important?

XBRL taxonomies are the key components of any electronic financial or business reporting system. An XBRL taxonomy is the formal definition of a financial or business reporting vocabulary for a given jurisdiction or reporting domain, imparting meaning to the concepts which describe the facts being reported and providing a framework within which reports are structured. It defines the “contract” between reporter and regulator.

Just as importantly, it defines what is not permitted, except insofar as “locally negotiated” extensions allow. A taxonomy also defines relationships between reporting concepts, meaning that the “contract” not only defines the reporting vocabulary (the “what”) but also the grammar (the “how”) – how reported concepts can legitimately be combined and related to each other.

It is for these reasons that taxonomies matter in an electronic world. They are fundamental to any financial or business reporting regime and their design exerts a direct influence on the capabilities and expressive power of reporting and analysis tools.

XBRL tools and technologies are still evolving to suit the market’s needs. Experience has shown that deploying XBRL solutions takes a considerable amount of time and effort, and a large portion of this is invested in taxonomy design and development. Taxonomy authors are continually developing new ways to address the complex challenges of financial and business reporting.

It’s clear that XBRL taxonomies are currently undergoing a period of rapid evolution as they colonise a number of new financial niches, with new taxonomies building on the successes – and avoiding the perceived failures – of previous generations. I’m proposing the establishment of a new classification system for taxonomy evolution, with the hope of illuminating the future of taxonomy design.

3. Taxonomy evolution

In the family tree of taxonomies, those concerned with company financial statements can be broadly classified according to three key aspects of their design. Differences in these aspects place each taxonomy in one of three generations.

3.1 Aspects of taxonomy design

3.1.1 Architecture

Some taxonomies model the applicable accounting or financial standards; some model the required reporting documents; and some model the underlying data.

3.1.2 Coherence

‘Coherence’ is the degree to which a taxonomy “hangs together” and permits the creation of a body of instance documents that are consistent and comparable. At one extreme some taxonomies give the freedom to combine reportable concepts with any dimensions and to combine dimensions freely. At the other extreme such combinations are carefully controlled by the taxonomy.
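The two extremes can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function `is_permitted`, the `allowed_axes` mapping and all concept and axis names are hypothetical, and real taxonomies express such constraints through their own mechanisms rather than a lookup table like this one.

```python
def is_permitted(concept, axes, allowed_axes=None):
    """Return True if `concept` may be reported against every axis in `axes`.

    `allowed_axes` maps each concept to the set of axes it may use.
    Passing None models the permissive extreme: any combination is accepted.
    """
    if allowed_axes is None:
        return True
    if concept not in allowed_axes:
        return False
    return set(axes) <= allowed_axes[concept]


# Hypothetical concepts and axes, for illustration only.
controlled = {
    "Revenue": {"RegionAxis", "ProductAxis"},
    "DirectorRemuneration": {"DirectorAxis"},
}

free_choice = is_permitted("Revenue", ["DirectorAxis"])                 # True
controlled_ok = is_permitted("Revenue", ["RegionAxis"], controlled)     # True
controlled_bad = is_permitted("Revenue", ["DirectorAxis"], controlled)  # False
```

Under the permissive setting, a filer can report revenue broken down by director, which no other filer is likely to do, so the resulting facts are hard to compare. Under the controlled setting the same combination is rejected.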

3.1.3 Extensibility

Some taxonomies are very permissive when it comes to extension, to the point that “anything goes”; some taxonomies provide specific extension points so that extension can be controlled, if not actually defined; and some taxonomies provide specific mechanisms to support extension.

3.2 Taxonomy classification

3.2.1 First generation

First generation taxonomies are literal interpretations of accounting or financial standards, where the filer can do pretty much whatever they please with the base taxonomy, and any additional structured information can be captured as a privately-defined but uncontrolled extension.

3.2.2 Second generation

Second generation taxonomies model not the accounting or financial standards themselves but the regime’s required document structures derived from the applicable accounting or financial standards. Additional structured information can be captured in a private extension that should follow certain rules or guidelines laid down by the taxonomy author.

3.2.3 Third generation

Third generation taxonomies move away from an architecture derived from the accounting/financial standards or reporting document structures and instead simply model the data within the taxonomy. Additional structured data can be captured by ‘extension’ mechanisms built into the data model of the base taxonomy itself.

All three generations exhibit convergent evolution in that they all provide a document-oriented browsing and presentation view that will be familiar to preparers and accountants, but each is derived in a fundamentally different way.
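For readers who think in data structures, the three generations described in 3.2.1 to 3.2.3 can be restated as a small lookup table. This Python sketch only re-expresses the descriptions above. The class and function names are hypothetical, and the Coherence aspect is left out because it is not spelled out for each generation here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Generation:
    name: str
    models: str     # what the architecture is derived from
    extension: str  # how additional structured information is captured


# Restates sections 3.2.1 to 3.2.3; the wording is condensed for illustration.
GENERATIONS = (
    Generation("first", "standards", "private and uncontrolled"),
    Generation("second", "documents", "private, following the author's rules or guidelines"),
    Generation("third", "data", "mechanisms built into the base data model"),
)


def generation_for(models):
    """Look up the generation whose architecture models the given subject."""
    for generation in GENERATIONS:
        if generation.models == models:
            return generation
    raise ValueError(f"no generation models {models!r}")


third = generation_for("data")
print(third.name, "-", third.extension)
```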

4. A new taxonomy classification system

The key taxonomy design aspects that categorise taxonomy evolution are summarised as follows:

Taxonomy Classification

5. Next article

In the next article in this series I will discuss the Architecture aspect of taxonomy design in depth, with reference to the US GAAP, IFRS and UK FRS taxonomies.

I would also like to thank Andy Greener for his contributions.