HTML Translation:Technology

Various pieces of technology were used and enhanced in this project, with all of the work centred on format manipulation and translation management tools.

Translate Toolkit

The Translate Toolkit allows us to convert various source texts (HTML, wiki, etc.) into standard file formats such as Gettext PO and XLIFF (a translation interchange format).
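
As a rough illustration of what such a conversion produces, the sketch below writes extracted strings out as untranslated Gettext PO entries. It is not the toolkit's own code: the function names and the location strings are hypothetical, and the header entry that a complete PO file carries is omitted.

```python
def po_escape(text):
    """Escape backslashes, double quotes and newlines for a PO string."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n"))


def to_po(units):
    """Render (location, source_text) pairs as untranslated PO entries.

    Hypothetical helper: each entry gets a location comment, the source
    text as msgid, and an empty msgstr for the translator to fill in.
    """
    entries = []
    for location, source in units:
        entries.append(
            f"#: {location}\n"
            f'msgid "{po_escape(source)}"\n'
            'msgstr ""\n'
        )
    # Entries in a PO file are separated by a blank line.
    return "\n".join(entries)


if __name__ == "__main__":
    print(to_po([("index.html:1", "Welcome"),
                 ("index.html:2", 'Click "Translate" to begin.')]))
```

The translator only ever sees the msgid and msgstr pairs; the location comment is what lets a tool find its way back to the source document.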

The key to good translation is to divorce translators from the layout, just as content is divorced from layout in the HTML and CSS paradigm. Thus the toolkit allows us to take raw HTML and extract the textual content. This content is then translated and managed using Pootle, and finally the translations are combined with the original English HTML, which serves as a template, to create the translated content.
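
The extract, translate and merge cycle described above can be sketched with nothing but the Python standard library. This is a minimal illustration under simplifying assumptions, not how html2po is implemented: the class and function names are hypothetical, every text node is treated as one translation unit, and comments, doctypes, script content and entity re-escaping are ignored.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the non-empty text nodes of an HTML document, in order."""

    def __init__(self):
        super().__init__()
        self.units = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.units.append(text)


class TemplateMerger(HTMLParser):
    """Re-emit the original markup, swapping in translated text nodes."""

    def __init__(self, translations):
        super().__init__()
        self.translations = translations
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Reuse the tag exactly as written in the source document.
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Untranslated strings fall back to the original English.
            translated = self.translations.get(text, text)
            data = data.replace(text, translated, 1)
        self.out.append(data)


def extract_units(html):
    """Return the translatable strings found in the HTML source."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.units


def merge_translations(html, translations):
    """Combine the English template with a source-to-target mapping."""
    parser = TemplateMerger(translations)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    page = '<h1>Welcome</h1>\n<p class="intro">Translate this page.</p>'
    print(extract_units(page))
    print(merge_translations(page, {"Welcome": "Welkom"}))
```

Because the merge step walks the English document again, the layout never passes through the translator's hands; only the strings do.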


A number of enhancements were made during the project. The first work was with HTML, so we made the html2po tool more robust, adding htmltidy to the process to clean up badly formed HTML.

During the course of the project we added support for wiki formats, which was essential for the translation of the Pootle documentation. This was based on the txt2po converter, since wiki syntax is usually just a simple text layout. The initial format used was DokuWiki, but MediaWiki syntax was added soon afterwards, as new flavours were relatively easy to add.
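
One way to see why a new flavour is cheap to add is to treat each flavour as a small table of patterns layered on top of plain-text paragraph splitting. The sketch below assumes exactly that design. It is not the txt2po source: the pattern table and the function name are hypothetical, and only headings and simple list items are handled.

```python
import re

# Hypothetical per-flavour patterns; group 1 captures the translatable text.
FLAVOURS = {
    "dokuwiki": [
        re.compile(r"^={2,6}\s*(.+?)\s*={2,6}$"),  # ====== Heading ======
        re.compile(r"^\s{2,}[*-]\s+(.+)$"),        # "  * item" or "  - item"
    ],
    "mediawiki": [
        re.compile(r"^={1,6}\s*(.+?)\s*={1,6}$"),  # == Heading ==
        re.compile(r"^[*#]+\s*(.+)$"),             # "* item" or "# item"
    ],
}


def wiki_units(text, flavour):
    """Split wiki text into translation units, stripping flavour markup."""
    patterns = FLAVOURS[flavour]
    units = []
    # Blank lines separate blocks, as in plain-text conversion.
    for block in re.split(r"\n\s*\n", text.strip("\n")):
        paragraph = []
        for line in block.splitlines():
            line = line.rstrip()
            match = next((m for m in (p.match(line) for p in patterns) if m), None)
            if match:
                # A heading or list item ends any running paragraph.
                if paragraph:
                    units.append(" ".join(paragraph))
                    paragraph = []
                units.append(match.group(1))
            elif line.strip():
                paragraph.append(line.strip())
        if paragraph:
            units.append(" ".join(paragraph))
    return units


if __name__ == "__main__":
    sample = ("====== Pootle ======\n\n"
              "Pootle is a translation\nmanagement system.\n\n"
              "  * Translate online\n  * Manage teams\n")
    print(wiki_units(sample, "dokuwiki"))
```

Under this design, supporting another wiki syntax means adding one more entry to the pattern table rather than writing a new converter.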

Translation Management System

The Pootle Translation Management System (TMS) was used extensively throughout the project. A TMS allows a translation manager to manage the translation process and the rights of the translators. Pootle was developed to facilitate distributed translation.


Pootle itself was monitored and its bugs were fixed. In cases such as Creative Commons, a number of scripts had to be written to transform their content to and from PO for work on the Pootle server.