HTML Translation: Project Chapter


Most projects within the FMFI project focus on connectivity: the first mile (innovative ways to connect A to B). This project falls within the first inch: the ability to access content. Many people see this as access to a device that lets the user reach content or, in some cases, as technology that allows people with disabilities to access data, such as screen readers and text-to-speech. But few people consider that when content is not available in someone's own language, first-inch and first-mile solutions may bring information to a person who is still unable to access it, simply because of a language barrier.

Creative Commons license selection page translated into Northern Sotho

All the e-Government systems in the world, delivered efficiently to every rural area in South Africa, will still not bring access to people if all the content is in English.

English content is a barrier to many people wanting to access information. By focusing on making websites easy to translate, the project aims not to eliminate the barrier posed by English but to eliminate the technological issues that prevent web content from being translated. Actual translation still needs to be done by a translator, and the project looks at the social issues that enable or prevent this translation from happening.

The majority of content on the Internet is created in English, which partly reflects the fact that many of the first Internet users were English speaking. As the Internet grows, so does a community that does not speak English and that creates content in its own languages. However, much English content would still be very useful to many non-English speakers if it were available in their mother tongue. Because so much content is monolingual, it is not widely accessible.

If this content could be translated, it would be accessible to a greater audience, bringing with it many possibilities: bridging divides between linguistic communities, sharing information and stimulating new content.

Although there is a movement towards Content Management Systems (CMS) and Wikis, a large amount of content is still stored in static HTML pages, and static content is hard to process and translate. With the right tools, however, it would be possible to transform static HTML content into a standard translatable form.

A process that transforms static content into a translatable form, makes it easy for someone to translate, and finally converts the translated content back into static HTML would make it possible to unlock monolingual content easily.

Research Question

Does the ability to easily translate website content stimulate the actual translation of that content?


The first major technical challenge was adapting the Translate Toolkit to process HTML, and later wiki pages, into a standard translation format known as Gettext PO. This proved difficult because, although HTML is a standard, its implementation varies. Wiki syntax proved much easier to manage.
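To make the target format concrete: a Gettext PO file is a list of entries, each pairing a source string (`msgid`) with its translation (`msgstr`), where an empty `msgstr` marks the entry as untranslated. The following is a minimal Python sketch of how extracted text could be written out as PO entries. It uses only the standard library and is not the Translate Toolkit's own code; the helper names `po_escape` and `to_po` and the location labels are hypothetical, and the sketch omits the header entry that real PO files carry.

```python
def po_escape(text: str) -> str:
    """Escape a string for use inside a double-quoted PO field."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\t", "\\t"))


def to_po(units):
    """Render (location, source, translation) tuples as PO entries.

    An empty translation leaves the entry untranslated.
    """
    entries = []
    for location, source, target in units:
        entries.append(
            f"#: {location}\n"                    # reference comment: where the text came from
            f'msgid "{po_escape(source)}"\n'      # the original English text
            f'msgstr "{po_escape(target)}"\n'     # the translation, empty until translated
        )
    return "\n".join(entries)


units = [
    ("index.html+h1", "Research Question", ""),
    ("index.html+p", 'Say "hello"', ""),
]
print(to_po(units))
```

Both printed entries have an empty `msgstr`, which is the state in which a translator would receive them.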

The project set out to automate the roundtrip from static content to Gettext PO and back, but this was not completed because translations of static content by the FMFI partners never materialised. The roundtrip for wiki content was carried out but not automated. Automating either roundtrip would be simple to implement.

The project used the Pootle Translation Management System (TMS) to enable online, web-based translation. This was done to lower the barrier to entry for translators: they would not need to install any software, and translating is as simple as using a web browser.

Translation process explained

The following diagram contrasts the content creation process with the localisation process:

Content creation and translation compared

In both there is a creator, who either creates new content or translates existing content into another language. While the content creator produces output that benefits users who speak his or her language, the translator allows speakers of other languages to benefit from the same content.

The following diagram allows us to understand the actual translation process:

The translation process for static content

From the HTML content we extract the translatable text. This is given to a translator, who produces translated text. By combining the original HTML with the translated text we can create a translated HTML page.

The same process applies to other content, such as wiki pages. In this project, tools were created to transform HTML into a translatable format, and translators used the Pootle TMS to create the translated text.
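The extract, translate and combine cycle described above can be sketched in Python with the standard library's `html.parser`. This is a hypothetical illustration, not the project's tool: it treats every non-blank text node as one translatable string, leaves `<script>` and `<style>` contents alone, and rebuilds the page by swapping each string for its translation, falling back to the English where none exists. The class and function names are invented for the example.

```python
from html import escape
from html.parser import HTMLParser

# Element contents that are code or styling, never translatable prose.
SKIP = {"script", "style"}


class RoundTripParser(HTMLParser):
    """Rebuild a page while collecting, and optionally replacing, its text nodes."""

    def __init__(self, translations):
        super().__init__(convert_charrefs=True)
        self.translations = translations
        self.found = []      # source strings, in document order
        self.out = []        # pieces of the rebuilt page
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP:
            self.skip_depth += 1
        self.out.append(self.get_starttag_text())  # keep the tag exactly as written

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in SKIP and self.skip_depth:
            self.skip_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth:
            self.out.append(data)  # script/style content passes through untouched
            return
        text = data.strip()
        if not text:
            self.out.append(data)  # whitespace between tags
            return
        self.found.append(text)
        target = self.translations.get(text, text)  # fall back to the source text
        lead = data[: len(data) - len(data.lstrip())]
        trail = data[len(data.rstrip()):]
        self.out.append(lead + escape(target, quote=False) + trail)

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def _run(page, translations):
    parser = RoundTripParser(translations)
    parser.feed(page)
    parser.close()
    return parser


def extract_html(page):
    """Return the translatable strings of a page (the extract step)."""
    return _run(page, {}).found


def merge_html(page, translations):
    """Return the page with known translations swapped in (the combine step)."""
    return "".join(_run(page, translations).out)


page = ('<html><head><title>Welcome</title></head><body><h1>Welcome</h1>'
        '<p>Read &amp; share</p><script>var x = "Welcome";</script></body></html>')
print(extract_html(page))  # ['Welcome', 'Welcome', 'Read & share']
print(merge_html(page, {"Welcome": "Welkom"}))
```

Because every text node becomes a separate unit, a sentence such as `Read the <b>full</b> report` would reach the translator as three fragments, and attribute text such as `alt` is ignored. Grouping inline markup into whole sentences is the kind of refinement that messy real-world HTML makes hard.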

Boundary Partners

Three boundary partners were identified:

Boundary partners
  1. Creative Commons
  2. Pootle Translation Community
  3. FMFI partners

Each has a different level of translation experience and each was given a different level of support.

Creative Commons is an organisation as well as a movement centred on copyright reform. Central to the movement is a set of licenses that allow content creators to define how people may use their content. These licenses are translated into a number of languages, and our efforts were directed at translating them into several South African languages.

The Pootle software, which was also used in this project, was developed to allow anyone to translate online. The community that has sprung up around the software is part of the broader software localisation community. During this project we made the Pootle user guide available for translation into a number of languages.

Lastly, the FMFI project partners themselves were asked to translate the FMFI website content into local languages. Because FMFI includes project partners in Angola and Mozambique, Portuguese was also a language that would help with research dissemination.

Levels of translation experience

Levels of translation experience

The boundary partners' experience of localisation ranged from none at all to a high level.

We assumed that the FMFI partners had no localisation experience and were thus the ideal representation of the average user of content. Creative Commons had some exposure to localisation, in that many of its licenses are translated and its community is multilingual and multicultural. The Pootle community had the highest level of experience: since this community is centred on localisation, it would be expected to be very aware of the need for localisation and to have experience of it.

Levels of support

The Creative Commons translations were performed in Pootle by professional translators, who were given full support. The Pootle community was not given any support, but its members are part of an existing translation community. Lastly, the FMFI partners were given the content to translate but no support at all; in most cases this is the situation a real content community would be in.

Outcome challenges and progress markers

The following are the outcome challenges and progress markers for the three boundary partners.

Creative Commons

Creative Commons adopts and actively uses the Pootle translation management system

Expect to see
Like to see
Love to see
Creative Commons wants to have their licenses translated into South African languages Translations are hosted by Creative Commons on their own version of Pootle All licenses are translated on the Pootle server
The translated licenses are made available on the Creative Commons website New language teams start translating on the Pootle server Other content and software on their website is examined to see if it can be translated
Other teams outside of Creative Commons begin adopting the tools

Pootle Community

The community actively translates the Pootle documentation

Expect to see
Like to see
Love to see
People are able to translate the wiki using Pootle The DokuWiki community supports this translation method DokuWiki translates its own content using the wiki extractor and Pootle
Wiki content is translated into at least one other language Other DokuWiki sites use this approach to translations DokuWiki enhances the extractor when it changes its internal syntax
At least one major world language, e.g. French, is actively being translated Other wiki creators, specifically MediaWiki, adopt the tools
Wiki designers add their flavour to the wiki extractor
The tools receive media attention.

FMFI partners

Partners see localisation as an integral component of dissemination of their work

Expect to see
Like to see
Love to see
The software is installed on the FMFI website Key content is translated Participate actively in the development/enhancement of the software
FMFI partners in Mozambique and Angola localise some of the content Rapid turnaround between publication and translation Contribute monetarily to the software's enhancement
Translators excited about the software and its potential application in other areas. Promote and advocate the software in other departments
Brag about meeting their multilingual mandate.

Strategy Map

I-1 I-2 I-3
Enable key languages on Pootle Discuss the issue of dissemination and local languages Provide email and online support to translators
Approach individuals to translate Update help documentation as needed
E-1 E-2 E-3
Install a Pootle server Send emails requesting help and praising progress Demonstrate Pootle translation
Upload translatable content Help set translation as an FMFI priority


This is the view of the project from a technical perspective. In it we focused on three aspects:

  1. Translation of Creative Commons licenses. This is an existing body of translations with a poor process, which allowed us to validate our tools and create a better process.
  2. Localisation of static web pages. We chose the FMFI website and processed the pages to allow them to be localised.
  3. Localisation of wiki content. Although wiki content is not static, it is still hard to localise; we looked at making the Pootle manuals easy to localise.


From a technical point of view the following positive outcomes were achieved:

Creative Commons Pootle server

Creative Commons Server

Creative Commons installed a Pootle server onto which they migrated their licenses (48 languages) and all of their software. This replaced an old system that was not ideal for translation.

Tools enhanced

The tools that convert HTML to a translatable format were enhanced but, as explained below, this work will probably never be fully complete.

The tools were enhanced to handle various wiki formats so that the Pootle documentation could be translated.


Web pages are messy, very messy

A lot of time was spent examining web pages and fixing the tools so that they extracted only the core content. The HTML standard is very loose, and web browsers have evolved over time to tolerate the idiosyncrasies of bad HTML; as a result, no matter how hard we worked, we often ended up with extracted text that was difficult for a translator to use. With more time we could have refined the quality of the output.

However, we did see that it is possible to separate the content from much of the raw HTML, so this approach still holds good potential for recovering static web content for translation.

A focus on XHTML, which is better structured, could also lead to more consistent extraction.
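One reason XHTML helps is that it must be well-formed XML, so a strict XML parser either produces a clean tree or rejects the page outright instead of guessing. A minimal Python sketch using the standard `xml.etree.ElementTree` module, assuming that one translatable unit per block-level element is wanted (the `BLOCKS` set and the function name are hypothetical choices), shows both behaviours: inline markup such as `<b>` stays inside its sentence, and an unclosed `<br>` raises `ParseError`.

```python
import xml.etree.ElementTree as ET

# Hypothetical choice of elements that each become one translatable unit.
BLOCKS = {"title", "h1", "h2", "h3", "h4", "h5", "h6", "p", "li", "td"}


def extract_xhtml(source):
    """Return one normalised string per block element.

    Raises xml.etree.ElementTree.ParseError if the page is not well-formed.
    """
    root = ET.fromstring(source)
    units = []
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip comments or processing instructions, if present
        local = elem.tag.rsplit("}", 1)[-1]  # drop the XHTML namespace prefix
        if local in BLOCKS:
            # itertext() includes text inside inline children such as <b>.
            text = " ".join("".join(elem.itertext()).split())
            if text:
                units.append(text)
    return units


page = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>FMFI</title></head>"
    "<body><p>Read the <b>full</b> report.</p></body></html>"
)
print(extract_xhtml(page))  # ['FMFI', 'Read the full report.']

try:
    extract_xhtml("<html><body><p>Unclosed<br></p></body></html>")
except ET.ParseError as err:
    print("rejected:", err)
```

Without a DTD, ElementTree does not know HTML named entities such as `&nbsp;`, so pages using them would also be rejected unless they were converted first.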

Slow translation server

The translation server hosted at Rhodes University experienced a general slowdown for anyone outside the Rhodes campus. This was initially assumed to be caused either by students using the Internet or by throttling policies within the university; it was later discovered to be caused simply by an incorrect IP allocation. Unfortunately, this was only discovered after the project had been completed. It is uncertain what influence this had on the project, but it did affect the FMFI translations. The Pootle community translations were unaffected, as these are hosted on another server.

It was not possible to find a suitable host within South Africa that could give the same level of access that the Pootle server required.


Wikis hold potential

When building the tools to make wiki pages easy to translate, we realised that wiki pages, although they are stored in a database, are simple text pages. While they follow only an informal structure, they are more regular than HTML. Thus it was quite easy to create a way to convert a wiki page into a translatable format and back.
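The wiki roundtrip can be sketched in the same spirit. The Python below assumes a small subset of DokuWiki syntax (headings written as `====== Title ======`, list items indented with `*` or `-`, and each plain paragraph on a single line); these assumptions and all names are hypothetical and do not describe the project's actual extractor. Only the text a translator needs to see is extracted, and the markup is put back around the translation on the way out.

```python
import re

# Assumed DokuWiki-style heading, e.g. "====== Installation ======".
HEADING = re.compile(r"^(=+\s*)(.*?)(\s*=+)$")
# Assumed DokuWiki-style list item, e.g. "  * Download the release".
LIST_ITEM = re.compile(r"^(\s+[*-]\s+)(.*)$")


def split_line(line):
    """Split a line into (markup before, translatable text, markup after)."""
    m = HEADING.match(line)
    if m:
        return m.group(1), m.group(2), m.group(3)
    m = LIST_ITEM.match(line)
    if m:
        return m.group(1), m.group(2), ""
    return "", line, ""  # a plain paragraph line is translated whole


def extract_wiki(page):
    """List the translatable strings of a page in document order."""
    units = []
    for line in page.splitlines():
        _, text, _ = split_line(line)
        if text.strip():
            units.append(text)
    return units


def merge_wiki(page, translations):
    """Rebuild the page, replacing each string with its translation if known."""
    out = []
    for line in page.splitlines():
        before, text, after = split_line(line)
        if text.strip():
            text = translations.get(text, text)  # fall back to the source text
        out.append(before + text + after)
    trailing = "\n" if page.endswith("\n") else ""
    return "\n".join(out) + trailing


page = ("====== Installation ======\n"
        "Pootle runs in a web browser.\n"
        "\n"
        "  * Download the release\n"
        "  * Unpack it\n")
print(extract_wiki(page))
# ['Installation', 'Pootle runs in a web browser.', 'Download the release', 'Unpack it']
print(merge_wiki(page, {"Installation": "Installasie"}))
```

A real extractor would also need to skip code blocks and cope with links and inline formatting; this sketch would simply offer them to the translator as ordinary text.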

With so much content now being generated in wikis (Wikipedia and other resources, for example), much of this content can be made available for translation. Translation should therefore become much easier as wikis and other content management systems become more pervasive.


This is a view of the project from a community or participatory perspective. In it we looked at:

  1. How skill level influenced the work done
  2. How the level of support influenced the outcome of a translation
  3. How familiarity with the need for translation affected the volume of work completed


The following results were observed:

The Creative Commons translations were conducted by professional translators, who worked on Pootle and were given a high level of support. As would be expected, all translations were completed: the Creative Commons licenses were translated into Zulu, Northern Sotho and Afrikaans.

The Pootle user guide was translated into six languages by the community that translates the Pootle user interface, which is also involved in many other software translation projects.

The FMFI content was made available on the South African Pootle server and announced twice to the FMFI partners, and the need for translating the content, along with the opportunities it offers, was highlighted at various FMFI gatherings, but no translations were made.


Partnership works

In translating the Creative Commons licenses we were able to engage Creative Commons directly, which allowed us to bring translators, customer and end result together. This was an amazing experience and mirrors what should be happening. The translators could work well, especially since all the technical details were hidden from them.

Making it available is not enough, a connection to the need must exist

We made an announcement on the FMFI group list and appealed at various gatherings, yet this still did not lead to localisation. With a number of the project partners from Portuguese-speaking countries, it was assumed that they would see the connection with translation: by translating the English FMFI content into Portuguese, they would expose their work to others and allow future collaboration.

Therefore it is clear that simply making translation easier is not enough. Potential translators will only translate if they see a need to translate. This need could be external (others need access to the information) or internal (by making this data accessible I can get more exposure for my work). The motivation itself is interesting, but what matters most is that the translator has made that connection; without it, the translations will not happen, because the translator does not prioritise them.

But, as we show below, if the motivation is present, then making the content available can sometimes act as a catalyst for further translation.

Existing techs are adopters

Across all of the projects we worked with, we found that technical people, when involved, are quick adopters of the translations or of the tools, and that their adoption often spreads to others. This was noticed in the translation of the Pootle guides, where an existing, highly technical group translated the content. In the Creative Commons scenario, too, a group of highly technical system administrators quickly adopted the tools.

Making it more accessible can get more done

People had put some effort into localising the Pootle tools documentation, but none of these efforts quite succeeded. The English documentation is stored in a wiki; previous translators had simply created translated pages in the wiki based on the English, but these were very hard to keep up to date and were often abandoned. By simply making it easier to translate and to keep translations up to date, we saw more translations. Within days we had a full Persian translation, and people had started translating into Afrikaans, French and Basque (a minority language of France and Spain).

With the Creative Commons licenses we found that by using our tools people were able to produce higher-quality translations, something they had been unable to do until then. Volunteers also emerged, including someone who wanted to translate the licenses into Xhosa; they approached us simply because the licenses were made available for easy translation. Thus easy access to translation can make more happen, once again as long as there is the motivation to translate.


Although we saw little change among the FMFI participants, we were very encouraged by the adoption within the two other communities. The fact that the documentation for the tools was localised into Persian and Afrikaans within days of its availability being announced is very encouraging, indicating that where there is a will to create local content, making it easier to translate results in more local-language content.

The Creative Commons localisation was also very encouraging: because we hosted the translation on our translation server, we were approached by students at Rhodes University to localise the licenses into Xhosa.

The most important observations are these:

  1. Lowering the technical barrier results in increasing levels of translation.
  2. A connection with the need for translated content results in people participating in content translation. Without this connection lowering the technical barriers shows no increase in translation activity.
