The invention of schemas to provide a formal definition of the contents of XML documents was one of the most important innovations accompanying the creation of the World Wide Web.  For the first time it was possible to develop a generic data validator controlled wholly by the data schema.  Data processing code no longer had to include custom tests that checked the validity of each data item before processing.  Developers could focus on the processing itself without devoting resources to ensuring that each item of data was of the size and type required.

It is hard to overstate the extent to which “schema validation” simplified the problem of passing increasing amounts of data around the world.  Without it, many data transfer projects would be unviable because of the number of data errors and the cost of fixing them.

Before schemas, data transfer projects used definitional documents like MIGs (message implementation guidelines).  These were compiled by hand, often in MS Word, and were not machine-readable.  In the old world, an application receiving a stream of data had to be told precisely what type of data to expect and in what order.  If the incoming data did not follow those rules, a processor might, for instance, try to add a number to a text string and fail unpredictably, crashing with a “segmentation fault”.  To avoid this, each piece of data had to be tested before it was used, making the processing code more complex and more expensive to write – and, importantly, to maintain.
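To make the contrast concrete, here is a minimal, hypothetical sketch of that old style of processing code.  The field names, sizes and rules are invented for illustration; the point is that every check lives inside the application itself and has to be kept in step with a hand-written document.

```python
# Hypothetical pre-schema processing: every field in a fixed-order record
# must be checked by hand, following a paper MIG, before it can be used.
def process_record(fields):
    # The MIG says field 0 is an invoice number (text, max 10 characters)
    # and field 1 is an amount (decimal).  None of this is machine-readable,
    # so the checks are written, and maintained, in the processing code.
    if len(fields) < 2:
        raise ValueError("record too short")
    invoice_no = fields[0]
    if len(invoice_no) > 10:
        raise ValueError("invoice number too long")
    try:
        amount = float(fields[1])
    except ValueError:
        raise ValueError("amount is not a number")
    # Only now is it safe to do the actual processing.
    return invoice_no, amount
```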

With schema-based validation, the testing amounted simply to passing the data file and the schema file to a generic validator.  No more custom testing code – and, as a spin-off benefit, the data originator could also maintain the schema, removing a whole area of communication between data owner and developer that is always a potential source of confusion and error.
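The sketch below shows what that looks like in practice, using the widely used lxml library as one example of a generic validator; the file names are illustrative only.

```python
# A minimal sketch of schema-based validation: the same generic code
# validates any document against any schema.
from lxml import etree

def validate(schema_path, document_path):
    schema = etree.XMLSchema(etree.parse(schema_path))
    document = etree.parse(document_path)
    if schema.validate(document):
        print("Document is schema-valid")
    else:
        for error in schema.error_log:
            print(f"Line {error.line}: {error.message}")

# Illustrative file names – any schema/document pair will do.
validate("invoice.xsd", "invoice.xml")
```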

XBRL, as a primarily XML-based specification, makes heavy use of XML Schema to define the contents of XBRL taxonomies and data documents.
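As a simplified illustration of what that means, an XBRL taxonomy concept is, at its core, an ordinary XML Schema element declaration that a generic schema processor can compile like any other.  The fragment below is invented for illustration; real taxonomy schemas import the XBRL instance schema and carry additional attributes such as substitutionGroup and xbrli:periodType, which are omitted here for brevity.

```python
# Compile a simplified, illustrative taxonomy fragment with a generic
# XML Schema processor.
from lxml import etree

TAXONOMY_FRAGMENT = b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="http://example.com/taxonomy"
           elementFormDefault="qualified">
  <xs:element name="Revenue" type="xs:decimal"/>
  <xs:element name="EntityName" type="xs:string"/>
</xs:schema>
"""

schema = etree.XMLSchema(etree.fromstring(TAXONOMY_FRAGMENT))
print("Taxonomy fragment compiled:", schema is not None)
```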

But this is only one component of validation.

XBRL validation relies upon a “stack” of different types of rules.

  1. Well-formedness
  2. Schema-validity
  3. Specification validity
  4. Taxonomy validity
  5. Business rule compliance

These five levels of validation can be represented as the rungs of a ladder; each level of validation depends upon the lower levels.  If the lower rungs (well-formedness, schema-validity) aren’t in place, we can’t reach the higher rungs (business rule compliance).
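The ladder can be pictured as a sequence of checks, each attempted only once the rung below it has passed.  The sketch below shows the first two rungs using lxml; the XBRL-specific rungs are shown as a placeholder comment because they require a full XBRL processor.

```python
# A sketch of the validation ladder: climb one rung at a time, stopping
# at the first failure.
from lxml import etree

def check_well_formed(path):
    try:
        return etree.parse(path), None
    except etree.XMLSyntaxError as e:
        return None, f"Not well-formed: {e}"

def check_schema_valid(document, schema_path):
    schema = etree.XMLSchema(etree.parse(schema_path))
    if schema.validate(document):
        return None
    return f"Not schema-valid: {schema.error_log[0].message}"

def validate_ladder(document_path, schema_path):
    document, error = check_well_formed(document_path)    # rung 1
    if error:
        return error
    error = check_schema_valid(document, schema_path)      # rung 2
    if error:
        return error
    # Rungs 3-5 (specification, taxonomy and business-rule validity) would
    # be delegated to an XBRL processor here.
    return "Passed rungs 1 and 2"
```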

As a general principle, any rule should be defined as low as possible on the ladder, because lower levels are easier and cheaper to enforce.  Thus, don’t define a rule in a filer manual if it can be defined in XBRL Formula; don’t use formulae if the rule can be encapsulated in XML Schema.  This keeps implementation as simple as possible and achieves the best levels of compliance.  A second principle is that rule-writing can (and should) assume compliance at each lower level.  Business rules, for instance, should be written on the expectation that the document content rules defined in the schema have been met in full.  This keeps the number of rules to a minimum and keeps preparation and ingestion costs as low as possible.
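As a small, invented example of pushing a rule down the ladder: rather than writing a separate business rule saying “percentages must lie between 0 and 100”, the constraint can be encoded directly in the schema type, where a generic validator enforces it with no custom rule code.  The element and type names below are illustrative only.

```python
# Encoding a value-range rule as an XML Schema restriction instead of a
# higher-level business rule.
from lxml import etree

SCHEMA = b"""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="ownershipPercentage">
    <xs:simpleType>
      <xs:restriction base="xs:decimal">
        <xs:minInclusive value="0"/>
        <xs:maxInclusive value="100"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:element>
</xs:schema>
"""

schema = etree.XMLSchema(etree.fromstring(SCHEMA))
good = etree.fromstring(b"<ownershipPercentage>75</ownershipPercentage>")
bad = etree.fromstring(b"<ownershipPercentage>120</ownershipPercentage>")
print(schema.validate(good))  # True
print(schema.validate(bad))   # False – rejected without any custom rule code
```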

And because each level of validation assumes that the data complies with the tests defined at the previous level, it follows that all levels of the ladder should be tested, in turn, to ensure data compliance.

CoreFiling’s True North Data Platform represents the gold standard of validation for XBRL. Please visit our site for more information on our Taxonomy Management System for taxonomy authors and Beacon for assurance and review.