XBRL News
Issue of the Week – XBRL data encoding
06 November 2006
Multilingual label linkbases are a crucial part of the IFRS-GP Taxonomy, supporting its adoption in numerous countries implementing XBRL and IFRS.
Because users prefer to work in their native languages, many of which use special characters, the IASC Foundation XBRL Team supports those countries by providing translations of the IFRS-GP Taxonomy label linkbase.
Issue description
While preparing translations of the IFRS-GP 2006 Taxonomy label linkbases, the IASC Foundation XBRL Team noticed that different XBRL-enabled software applications encode special characters inconsistently.
Encoding introduction
XBRL taxonomies and instances, like any text-based document, may use different encodings. Fundamentally, computers deal only with numbers: they store the characters of a text-based document by assigning a number to each one. This mapping is called (data) encoding, and an application has to know which encoding a document uses in order to store and process it correctly.
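As a minimal illustration (Python is used only for the example, and the sample string is borrowed from the French label shown later), the sketch below prints the number Unicode assigns to each character and then shows that the same text produces different bytes under two encodings. This is why an application must know which encoding a document uses.

```python
text = "à payer"

# The number (Unicode code point) assigned to each character.
code_points = [ord(ch) for ch in text]

# The bytes stored on disk depend on the encoding chosen.
utf8_bytes = text.encode("utf-8")       # "à" takes 2 bytes, ASCII letters 1 byte each
utf16_bytes = text.encode("utf-16-le")  # every character here takes 2 bytes

print(code_points)                        # [224, 32, 112, 97, 121, 101, 114]
print(len(utf8_bytes), len(utf16_bytes))  # 8 14

# Decoding with the wrong encoding does not give back the original text.
print(utf8_bytes.decode("utf-16-le", errors="replace") == text)  # False
```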
Over time many different encoding standards have evolved, each using its own coding scheme and many of them conflicting with each other. These incompatible schemes arose largely from historical hardware restrictions, which forced encoding standards to represent each character in as few bits as possible (typically 7 or 8 bits, which allows at most 128 or 256 characters per scheme).
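The conflict between legacy schemes can be seen directly: in the 8-bit Windows code pages, the same byte value stands for a different character depending on which code page is assumed. The following Python sketch decodes the single byte 0xE0 with three such code pages (the choice of code pages is ours, for illustration only).

```python
# One byte, three legacy single-byte code pages, three different characters.
byte = b"\xe0"

western = byte.decode("cp1252")   # Windows-1252, Western European
central = byte.decode("cp1250")   # Windows-1250, Central European
cyrillic = byte.decode("cp1251")  # Windows-1251, Cyrillic

print(western, central, cyrillic)  # à ŕ а   (the last is Cyrillic "а", U+0430)
```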
Today's computers, however, provide enough disk space and memory to support a comprehensive global character set, Unicode, which contains the characters of all modern languages. UTF-8 is the most common Unicode encoding and is widely used to encode XML documents.
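UTF-8 reconciles compactness with full coverage by using a variable number of bytes per character: 1 byte for ASCII, 2 bytes for most accented Latin letters, and up to 4 bytes for the rest of Unicode. The Python sketch below (sample characters chosen for illustration) prints the byte count for a few characters and checks that plain ASCII text is byte-for-byte identical in UTF-8.

```python
samples = ["A", "à", "€", "中", "\U00020000"]  # the last is a CJK ideograph beyond U+FFFF

byte_lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")
    byte_lengths[ch] = len(encoded)
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), {encoded.hex(' ')}")

# ASCII characters keep their single-byte values, so ASCII text is also valid UTF-8.
ascii_compatible = "Total".encode("utf-8") == "Total".encode("ascii")
print(ascii_compatible)  # True
```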
XBRL and Unicode (UTF-8)
The XBRL data format is designed to enable data interchange without needing to resolve regional or technical distinctions. To make this work, the XBRL data needs to be encoded in a global encoding scheme.
In line with the W3C XML specification, which applies to all XML-based data formats, the IASC Foundation XBRL Team uses UTF-8 to encode XBRL taxonomies and instances. (According to the XML specification, "All XML processors MUST accept the UTF-8 and UTF-16 encodings of Unicode 3.1".)
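The requirement that XML processors accept both UTF-8 and UTF-16 can be checked with any conforming parser. The minimal Python sketch below uses the standard library's ElementTree (backed by the Expat parser) to parse the same simplified label document, which is not a complete XBRL linkbase, once in each encoding, and confirms that the label text comes out identical.

```python
import xml.etree.ElementTree as ET

LABEL_TEXT = "Charges administratives à payer, Total"

def label_document(encoding: str) -> bytes:
    """Build a simplified label document that declares `encoding`, encoded accordingly."""
    xml = (
        f'<?xml version="1.0" encoding="{encoding}"?>\n'
        '<label xmlns:xlink="http://www.w3.org/1999/xlink" '
        'xlink:type="resource" xml:lang="fr">'
        f"{LABEL_TEXT}</label>"
    )
    # Python's "utf-16" codec writes a byte order mark, which XML parsers use
    # to detect UTF-16 before reading the declaration.
    return xml.encode(encoding)

for enc in ("utf-8", "utf-16"):
    data = label_document(enc)
    root = ET.fromstring(data)
    print(enc, len(data), "bytes, text preserved:", root.text == LABEL_TEXT)
```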
UTF-8 encoding issue in practice
The specific inconsistency encountered by the IASC Foundation XBRL Team concerns how XBRL-enabled software handles special characters (e.g. é, ê, è) in UTF-8 encoded files.
Examination of the labels shows that some characters are replaced by others (e.g. à becomes ŕ, è becomes č) for no apparent reason, and each change requires manual correction.
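The substitutions reported above follow a consistent pattern. One plausible explanation, offered here as an assumption rather than a finding of the original investigation, is that one program wrote the text in the Western European code page Windows-1252 and another read it back as the Central European code page Windows-1250: byte 0xE0 is "à" in the first and "ŕ" in the second, and byte 0xE8 is "è" and "č" respectively. The Python sketch below reproduces the effect under that assumption.

```python
original = "Charges administratives à payer, Total"

# Assumed scenario: saved as Windows-1252, then read back as Windows-1250.
stored = original.encode("cp1252")
misread = stored.decode("cp1250")

print(misread)  # Charges administratives ŕ payer, Total

# The same mismatch affects the other characters mentioned in the text.
for ch in "èéê":
    print(ch, "->", ch.encode("cp1252").decode("cp1250"))
# è -> č, é -> é (same byte meaning in both code pages), ê -> ę
```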
Temporary solution using ASCII encoding with entities
As a workaround for the issue described above, the IASC Foundation XBRL Team encodes the label linkbase in ASCII and represents special characters as entities (see example below).
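Producing the ASCII version need not be done by hand. As a minimal Python sketch (the helper name to_ascii_xml_text is hypothetical), the label text is first escaped for XML and then encoded as ASCII with Python's "xmlcharrefreplace" error handler, which writes every non-ASCII character as a decimal numeric character reference (the kind of entity used in the example below), such as &#224;.

```python
from xml.sax.saxutils import escape

def to_ascii_xml_text(text: str) -> bytes:
    """Hypothetical helper: escape XML markup characters, then encode as ASCII.

    Characters outside ASCII become decimal character references (e.g. &#224;).
    """
    return escape(text).encode("ascii", errors="xmlcharrefreplace")

print(to_ascii_xml_text("Charges administratives à payer, Total"))
# b'Charges administratives &#224; payer, Total'
```

Escaping comes first so that the ampersands introduced by the character references are not escaped a second time.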
UTF-8 encoding and ASCII encoding with entities
The example below shows the same XBRL label encoded in UTF-8 and in ASCII using entities.
A major drawback of ASCII encoding is the reduced human readability of the XBRL label linkbase source files. The use of entities may also lead to poorer processing performance compared to UTF-8.
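Part of this overhead can be quantified by size. The Python sketch below measures bytes only (not parsing speed) for the label used in the following example, in both forms: "à" costs 2 bytes in UTF-8 but 6 bytes as &#224;, and a character such as 中 (U+4E2D) costs 3 bytes in UTF-8 but 8 bytes as &#20013;. The helper name encoded_sizes is hypothetical.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 size, ASCII-with-character-references size) in bytes."""
    utf8_size = len(text.encode("utf-8"))
    ascii_size = len(text.encode("ascii", errors="xmlcharrefreplace"))
    return utf8_size, ascii_size

print(encoded_sizes("Charges administratives à payer, Total"))  # (39, 43)
print(encoded_sizes("中"))                                       # (3, 8)
```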
The following example code shows the IFRS-GP label for the concept AccruedAdministrativeLiabilitiesTotal in French. Note the difference in how the special character "a" with a grave accent is encoded: the UTF-8 encoded XBRL code contains the character itself (à), whereas the ASCII encoded XBRL code uses an entity (&#224;) for the character.
UTF-8
<?xml version="1.0" encoding="utf-8"?>
…
<label xlink:type="resource"
xlink:role="http://www.xbrl.org/2003/role/label"
xlink:label="ifrs-gp_AccruedAdministrativeLiabilitiesTotal"
xml:lang="fr">
Charges administratives à payer, Total
</label>
…
ASCII
<?xml version="1.0" encoding="US-ASCII"?>
…
<label xlink:type="resource"
xlink:role="http://www.xbrl.org/2003/role/label"
xlink:label="ifrs-gp_AccruedAdministrativeLiabilitiesTotal"
xml:lang="fr">
Charges administratives &#224; payer, Total
</label>
…
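Both listings carry the same information: a conforming XML parser resolves the reference &#224; to "à", so software reading either file should obtain the identical label. The Python sketch below checks this with a simplified label element. The xlink namespace declaration is added so the fragment parses on its own, and the ASCII version is declared as "US-ASCII", the IANA-registered name for ASCII.

```python
import xml.etree.ElementTree as ET

UTF8_DOC = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<label xmlns:xlink="http://www.w3.org/1999/xlink" '
    'xlink:type="resource" xml:lang="fr">'
    "Charges administratives à payer, Total</label>"
).encode("utf-8")

# Pure ASCII bytes: the "à" is written as a numeric character reference.
ASCII_DOC = (
    b'<?xml version="1.0" encoding="US-ASCII"?>\n'
    b'<label xmlns:xlink="http://www.w3.org/1999/xlink" '
    b'xlink:type="resource" xml:lang="fr">'
    b"Charges administratives &#224; payer, Total</label>"
)

utf8_text = ET.fromstring(UTF8_DOC).text
ascii_text = ET.fromstring(ASCII_DOC).text
print(utf8_text == ascii_text)  # True
```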
Conclusion
Although ASCII encoding is not recommended for XML/XBRL-based documents, the IASC Foundation XBRL Team will continue to encode label linkbases in ASCII, using entities for special characters, until XBRL-enabled software handles UTF-8 consistently and interoperably.
Our objective is for XBRL taxonomies and instances to be encoded in UTF-8 and for XBRL-enabled software to handle this encoding consistently.
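Once software handles UTF-8 consistently, existing ASCII label linkbases could be converted back. The following is a hypothetical Python sketch of such a conversion, not a tool provided by the IASC Foundation: it replaces references to non-ASCII characters with the characters themselves, leaves references below U+0080 (such as &#60; for "<") untouched because they may be needed for well-formedness, rewrites the encoding declaration, and returns UTF-8 bytes. It assumes no reference-like text appears inside comments or CDATA sections, where such text is not interpreted as a reference.

```python
import re

# Decimal (&#224;) or hexadecimal (&#xE0;) character references.
CHAR_REF = re.compile(r"&#(?:([0-9]+)|x([0-9A-Fa-f]+));")
# The encoding pseudo-attribute in a leading XML declaration.
DECL_ENCODING = re.compile(r"""^(<\?xml[^>]*?encoding=)(["'])[^"']*\2""")

def resolve_non_ascii_refs(xml_text: str) -> str:
    """Hypothetical helper: turn references to non-ASCII characters into literal characters."""
    def replace(match: re.Match) -> str:
        dec, hexa = match.groups()
        code_point = int(dec) if dec is not None else int(hexa, 16)
        # Keep ASCII references (e.g. &#60; for "<") escaped so the XML stays well-formed.
        return chr(code_point) if code_point >= 0x80 else match.group(0)
    return CHAR_REF.sub(replace, xml_text)

def to_utf8_document(xml_text: str) -> bytes:
    """Hypothetical helper: convert an ASCII document with references to UTF-8 bytes."""
    converted = resolve_non_ascii_refs(xml_text)
    # Rewrite only the declaration at the very start of the document.
    converted = DECL_ENCODING.sub(r"\1\2utf-8\2", converted, count=1)
    return converted.encode("utf-8")
```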