XML I18N with Internationalization Tag Set
By Yves Savourel, Localization Solutions Architect, ENLASO Corporation
As XML is used more and more for storing, transporting and manipulating data and content, it is becoming important to make sure XML applications take internationalization issues into account. The Internationalization Tag Set (ITS) Working Group is the latest effort of the W3C Internationalization Activity in this area.
The aim of ITS is two-fold:
- To provide, for both developers of XML formats and content authors, a set of guidelines on best practices for internationalizing XML applications.
- To offer a "ready-to-go" set of elements and attributes implementing some of the best practices, that one can simply integrate in a new or existing schema.
At the time of writing, the Working Group has just published (May 18th 2006) the Last Call Working Draft for the "Internationalization Tag Set (ITS) Version 1.0" specification. The document is available at http://www.w3.org/TR/2006/WD-its-20060518/. All information in this article is based on this document. Be sure to refer to the latest version of the specification, available at http://www.w3.org/TR/its/.
In addition to the specification, a new "Best Practices for XML Internationalization" document has been published at http://www.w3.org/TR/2006/WD-xml-i18n-bp-20060518/. This is a First Working Draft. The latest version is available at http://www.w3.org/TR/xml-i18n-bp/.
ITS defines its features through "data categories". A data category is an abstract concept for a particular type of information for internationalization and localization of XML schemas and documents. At this stage the different data categories are:
- Translatability – to indicate what parts of a document are translatable or not.
- Localization Information – to provide notes and information about the content to the localizers.
- Terminology – to identify terms within the content.
- Directionality – to indicate the writing direction of runs of text.
- Ruby – to allow specific phonetic annotation.
- Elements within text – to identify how elements affect the segmentation of the content.
- Language identification – to identify the language of the content.
These data categories are then addressed from two different viewpoints:
First, if there is an existing construct in the targeted XML format that already provides the same information, ITS should be able to re-use it transparently. For example, in DITA (the OASIS Darwin Information Typing Architecture) there is already a translate attribute that has similar properties to the ITS translatability data category. ITS provides a way to simply associate the DITA translate attribute to the ITS translatability data category.
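A minimal sketch of such an association, assuming the global rule syntax shown later in this article and assuming the DITA translate attribute takes the values "yes" and "no": each rule selects the elements that carry the existing attribute and maps them to the corresponding ITS value. The selectors are illustrative and are not taken from the specification.

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <!-- Assumption: DITA elements flag translatability with translate="yes" or "no" -->
  <its:translateRule selector="//*[@translate='no']" translate="no"/>
  <its:translateRule selector="//*[@translate='yes']" translate="yes"/>
</its:rules>
```

With rules like these, the DITA documents themselves need no ITS markup; an ITS-aware tool reads the existing attribute through the rules.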
Second, if the targeted XML format does not have the features ITS offers, ITS should be usable as a namespace within the host format. For example, if a document type does not have a way to flag translatability, the its:translate attribute can be used directly within your documents ("local" rules). The <its:translateRule> elements, in contrast, specify at a more generic level which elements or attributes are, or are not, translatable ("global" rules). Global rules can be embedded within a document instance, or set in a standalone file that tools can re-use with all documents of a given format.
This way ITS offers both options: a ready-out-of-the-box tag set that can simply be imported into an XML schema, and a way to re-use markup already in place by associating it with well-defined internationalization features that any ITS-aware tool can work with.
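As an illustration of local markup, the fragment below uses a hypothetical document type (the <doc>, <p> and <code> element names are made up for this sketch) that has no translatability flag of its own and borrows its:translate from the ITS namespace:

```xml
<doc xmlns:its="http://www.w3.org/2005/11/its">
  <!-- Hypothetical host format: only the key name is excluded from translation -->
  <p>Press <code its:translate="no">Esc</code> to cancel the operation.</p>
</doc>
```

An ITS-aware tool would be expected to treat the paragraph text as translatable and to leave the content of <code> untouched.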
Most of the global rules follow the same pattern: a selector attribute points to an area of the XML documents (using an XPath expression) and various attributes carry the actual information to be applied at the pointed location. For example, to specify that all <draft-comment> elements in DITA are not translatable, the following ITS rule is used:
<its:rules xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <its:translateRule selector="//draft-comment" translate="no" />
</its:rules>
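The same selector-plus-information pattern is meant to carry over to the other data categories. As a hedged sketch for terminology, assuming a host format with a <term> element, and assuming the rule element is named its:termRule with a term attribute (these names should be checked against the current draft of the specification):

```xml
<its:rules xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <!-- Assumption: the host format marks terms with a <term> element -->
  <its:termRule selector="//term" term="yes"/>
</its:rules>
```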
Rules are processed in the order they are declared and, in case of conflict, the last one takes precedence. This makes it possible to build more complex declarations and to resolve just about any complicated case. For example, in XHTML an alt attribute is translatable, but the content of a <del> element is not. To make sure that no alt attribute (or any other attribute) on a <del> element or any of its descendants is translated, the following rules can be declared:
<its:rules xmlns:its="http://www.w3.org/2005/11/its" its:version="1.0">
  <its:ns prefix="h" uri="http://www.w3.org/1999/xhtml"/>
  <its:translateRule selector="//h:*/@alt" translate="yes"/>
  <its:translateRule selector="//h:del" translate="no"/>
  <its:translateRule selector="//h:del/descendant-or-self::*/@*" translate="no"/>
</its:rules>
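To see how the last-rule-wins ordering plays out, consider the small XHTML fragment below. It is an invented example (file names and text are placeholders), and the comments describe the outcome one would expect from the three translateRule elements above.

```xml
<body xmlns="http://www.w3.org/1999/xhtml">
  <!-- This alt attribute matches only the first rule: translatable -->
  <p><img src="logo.png" alt="Company logo"/></p>
  <p>
    <!-- The del element matches the second rule: its content is not translatable.
         The alt attribute inside it matches the first rule and the third rule;
         the third rule is declared last, so alt is not translatable here. -->
    <del><img src="old-logo.png" alt="Previous logo"/> Previous wording</del>
  </p>
</body>
```

Swapping the first and third rules would reverse the result for the second alt attribute, which is why the declaration order matters.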
We have looked at the translatability data category in the examples above; however, ITS covers more than that. It aims at addressing internationalization issues, and localizability is just one of them.
The ITS specifications have reached a stage where feedback from potential users is critical. It is very important for localization tool providers, as well as schema designers and content authors, to see whether the current Working Draft correctly addresses the problems it tries to solve. There is a six-week review period (ending June 30th 2006) and you are encouraged to provide feedback. For this, you can use the W3C Bugzilla system (see http://www.w3.org/International/its/its-bugzilla for instructions), or post an email to the i18n-comments mailing list (http://lists.w3.org/Archives/Public/www-i18n-comments/). All comments are public.