Challenges Localizing Wiki Files

Translation and localization tools have come a long way in the last decade. Even so, no vendor can offer a 100% complete solution. This is why relying solely on localization tools to parse all file formats can be risky, especially when dealing with wiki markup files.

According to Wikipedia, “Wikitext language, or wiki markup, is a lightweight markup language used to write pages in wiki websites, such as Wikipedia, and is a simplified alternative/intermediate to HTML. Its ultimate purpose is to be converted by wiki software into HTML, which in turn is served to web browsers.” Parsing wiki files for translation is tricky. Without a defined set of rules to distinguish the text from the markup, it is almost impossible to parse them correctly.

Most common localization tools are not yet equipped to parse wiki files and prepare them for translation the way they handle standard XML or HTML files. Unlike common XML code, MediaWiki syntax makes it easy to confuse variables or text embedded in tags with the text that needs to be translated.

Wiki markup editors do allow translating the text directly in their environments, but this is often not a preferred long-term solution. A more comprehensive localization and translation methodology needs to be integrated into the process. It should allow the use of a robust translation environment that most translators are familiar with, and it should facilitate the creation of a translation memory that enables translation reuse and consistency throughout the translated text.

The following steps can make a real difference to your project when working with wiki files:

  1. Obtaining and understanding the full wiki rules used when building the wiki files
  2. Obtaining file samples to use as reference while building the custom parser
  3. Running pseudo translate to ensure complete and accurate parsing
  4. Having access to a temporary wiki system to perform quality assurance

When dealing with wiki markup, it is best to work with experienced engineers who understand the syntax and can distinguish the code from the content. These engineers can then create a custom parser based on the rules used while building the wiki source files.

As with XML and HTML files, tags such as <tt>, </tt>, <pre>, or <code> can be easily parsed and protected from translation. But when you see interwiki links, external links, or specific wiki templates, you need experienced engineers who can identify the content and create the parser to protect the code.

For example: [[InterOp:Connect/Format and Version Support|Format and Version Support]]

The rule indicates that only the text after the | can be translated. Anything prior to the | needs to be protected.
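The rule above can be sketched in a few lines of Python. This is only an illustration, not a complete wiki parser: the function names are hypothetical, and the regular expression assumes simple piped links with a single | and no nested brackets. Real wiki files may use other link forms that need their own rules.

```python
import re

# Assumption: a piped link looks like [[target|label]], with exactly one "|"
# and no nested brackets. Other link forms are not handled by this sketch.
PIPED_LINK = re.compile(r"\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]")


def extract_labels(wikitext):
    """Return the translatable labels (the text after the |) of piped links."""
    return [match.group(2) for match in PIPED_LINK.finditer(wikitext)]


def replace_labels(wikitext, translate):
    """Rebuild each piped link, translating the label and protecting the target."""
    def rebuild(match):
        target, label = match.group(1), match.group(2)
        return "[[" + target + "|" + translate(label) + "]]"

    return PIPED_LINK.sub(rebuild, wikitext)


sample = "See [[InterOp:Connect/Format and Version Support|Format and Version Support]]."
print(extract_labels(sample))
print(replace_labels(sample, str.upper))
```

Here str.upper stands in for a real translation step so that the protected and translated parts are easy to tell apart in the output.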


Fig1: Sample of Media Wiki file syntax, parsed file ready for translation into a translation memory tool and how it will display in a browser.

It is best to work with a project manager who understands these special needs and can ask the right questions to get the information the engineers need to parse the wiki files.

Pseudo translation is also an important success factor. Running pseudo translation on the files before starting the translation process can help you identify whether any of the content was left out of the translation process. One pseudo translation technique replaces vowels with target language characters in all the words that need translation. When the pseudo translated files are displayed, any words that still show source language vowels indicate that the source files were not parsed correctly. Also, if a file fails to open properly, it could be an indication that code was translated when it should not have been. At this stage, the parsers can be corrected before real translation begins.
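The vowel replacement technique can be sketched as follows. This is a minimal example under stated assumptions: the accented characters in the mapping are an arbitrary choice for illustration, the function names are hypothetical, and the check is meant to run on text that has already been separated from the markup.

```python
# Assumption: these accented characters stand in for "target language
# characters"; a real project would choose characters from the target language.
VOWEL_MAP = str.maketrans("aeiouAEIOU", "àéîöûÀÉÎÖÛ")
SOURCE_VOWELS = set("aeiouAEIOU")


def pseudo_translate(text):
    """Replace every plain vowel with its accented stand-in."""
    return text.translate(VOWEL_MAP)


def untranslated_words(text):
    """Return words that still contain plain source-language vowels."""
    return [word for word in text.split()
            if any(char in SOURCE_VOWELS for char in word)]


print(pseudo_translate("Format and Version Support"))
# A word the parser missed keeps its plain vowels and is reported:
print(untranslated_words("Förmàt and Vérsîön Sûppört"))
```

Any word reported by the second function points to content the parser did not expose for translation.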

Once the files are fully translated, having a temporary or portable MediaWiki installation is also an important factor in ensuring a quality result. It allows you to import the translated files and perform localization quality assurance (QA) on them before final delivery. This added step helps you identify issues caused by inadvertently translating syntax that should have stayed in its source format. It also lets you perform full in-context linguistic QA to ensure the consistency and accuracy of the translated text.
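Part of this check can also be automated before the files are imported. The sketch below compares link targets and template names between a source file and its translation and reports any that differ. It is an illustration only: the function names and the {{Note}} template are hypothetical, the patterns cover only simple links and templates, and it assumes the project's rules say that template names must stay untranslated.

```python
import re
from collections import Counter

# Assumption: link targets sit between "[[" and an optional "|", and template
# names sit right after "{{". Nested or more complex markup is not handled.
LINK_TARGET = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")
TEMPLATE_NAME = re.compile(r"\{\{([^{}|]+)")


def protected_tokens(wikitext):
    """Count the link targets and template names found in the text."""
    targets = LINK_TARGET.findall(wikitext)
    templates = [name.strip() for name in TEMPLATE_NAME.findall(wikitext)]
    return Counter(targets + templates)


def markup_mismatches(source, translated):
    """Report protected tokens that were lost or altered during translation."""
    src, tgt = protected_tokens(source), protected_tokens(translated)
    return {
        "missing": sorted((src - tgt).elements()),
        "unexpected": sorted((tgt - src).elements()),
    }


source = "[[InterOp:Connect/Format and Version Support|Format and Version Support]] {{Note|Check versions}}"
translated = "[[InterOp:Connecter/Format|Prise en charge]] {{Remarque|Vérifiez}}"
print(markup_mismatches(source, translated))
```

An empty result for both lists suggests the protected syntax survived translation. It does not replace the in-context review on the temporary wiki.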

Don’t take unnecessary risks when translating or localizing your wiki files. When you work with capable engineers and project managers, and combine human intelligence with technology, even the most difficult files can be correctly parsed and translated, and your linguistic and technical goals can be achieved.

GlobalVision offers professionalism, competency and engineering skills when localizing MediaWiki and other specialized types of XML and software formats. Click on Spatial Corp to read their success story!

Translation Management System Benefits

This white paper guides you through the translation and localization process and shows you how a robust Translation Management System helps alleviate many of the translation quality issues typically encountered in traditional translation processes. Download it for free!
