Under the Hood of Translation Memory Tools

Some pundits in our industry argue that it is time for translation memory tools to become free for everyone. Before you jump on the bandwagon, have you ever wondered how translation memory (TM) tools actually work? Have you used them, run into problems, and wondered why those problems surface?

For instance, why do you get matches below 100% when you analyze a file you have just finished translating? Why does the MS Word word count differ from your TM tool’s count? And why are fuzzy matches calculated the way they are?

To really understand TM tools, one has to take a closer look under the hood. Here is a summary to help you make a more educated decision.

All TM tools contain four essential parts: a parser, a segmenter, a fuzzy match engine and a graphical user interface (GUI):

1. Parser: Before anything else can happen, the tool must parse the source files correctly. This means the TM tool has to read each file and programmatically dissect it into external code, translatable text, and the internal code embedded within that text. An example of external code is the list numbering you see in this blog post. An example of internal code is character formatting, such as bold or italic.
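To make this three-way split concrete, here is a minimal Python sketch built on the standard library’s html.parser. It assumes an HTML source file; the SegmentExtractor class, the INLINE_TAGS set that decides which tags count as internal code, and the numbered {1} and {/1} placeholders are all hypothetical choices for illustration. Commercial tools use their own file filters and tag representations.

```python
from html.parser import HTMLParser

# Assumption for this sketch: these inline tags are internal code; every
# other tag (p, li, ol, h1, table, ...) is treated as external code.
INLINE_TAGS = {"b", "i", "em", "strong", "u", "span"}


class SegmentExtractor(HTMLParser):
    """Split HTML into external code and translatable text units."""

    def __init__(self):
        super().__init__()
        self.units = []     # translatable text, internal tags as placeholders
        self.current = []   # pieces of the text unit being built
        self.open_ids = []  # placeholder numbers of currently open inline tags
        self.next_id = 1

    def _flush(self):
        """External code closes the current text unit."""
        text = "".join(self.current).strip()
        if text:
            self.units.append(text)
        self.current, self.open_ids, self.next_id = [], [], 1

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_TAGS:
            # Internal code: keep its position inside the text as {n}.
            self.current.append("{%d}" % self.next_id)
            self.open_ids.append(self.next_id)
            self.next_id += 1
        else:
            self._flush()

    def handle_endtag(self, tag):
        if tag not in INLINE_TAGS:
            self._flush()
        elif self.open_ids:
            self.current.append("{/%d}" % self.open_ids.pop())

    def handle_data(self, data):
        self.current.append(data)

    def close(self):
        super().close()
        self._flush()


extractor = SegmentExtractor()
extractor.feed("<ol><li>Click <b>Save</b> to keep your <i>changes</i>.</li>"
               "<li>Close the file.</li></ol>")
extractor.close()
print(extractor.units)
```

The ol and li tags (the list numbering) stay outside the translatable units, while the bold and italic tags travel inside them as placeholders that the translator can move but should not delete.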

What complicates parsers is the variety of file formats in use: XML, HTML, InDesign, FrameMaker, MS Word, Excel, PowerPoint, MS Publisher, PageMaker, QuarkXPress, PHP, Java, RC… Most authoring tools are moving toward XML support, but many still require their own custom formats. XLIFF can make this easier, but it has not yet gained much traction among authoring tool providers.
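XLIFF addresses the format problem by giving every tool the same container for translatable text. As a rough sketch, the Python function below (to_xliff, a hypothetical helper) wraps already-extracted segments in a minimal XLIFF 1.2 document using xml.etree.ElementTree. The datatype value "plaintext" is an assumption; real filters also carry skeleton files, notes and inline-tag markup.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", XLIFF_NS)  # serialize as the default namespace


def q(tag):
    """Qualify a tag name with the XLIFF 1.2 namespace."""
    return "{%s}%s" % (XLIFF_NS, tag)


def to_xliff(segments, source_lang, target_lang, original):
    """Wrap (id, source, target) tuples in a minimal XLIFF 1.2 document."""
    root = ET.Element(q("xliff"), version="1.2")
    file_el = ET.SubElement(root, q("file"), {
        "original": original,
        "source-language": source_lang,
        "target-language": target_lang,
        "datatype": "plaintext",  # assumption for this sketch
    })
    body = ET.SubElement(file_el, q("body"))
    for seg_id, source, target in segments:
        unit = ET.SubElement(body, q("trans-unit"), id=seg_id)
        ET.SubElement(unit, q("source")).text = source
        if target is not None:  # untranslated units carry no target yet
            ET.SubElement(unit, q("target")).text = target
    return ET.tostring(root, encoding="unicode")


print(to_xliff([("1", "Close the file.", "Fermez le fichier.")],
               "en-US", "fr-FR", "guide.html"))
```

Because the source text and its translation now sit in the same neutral structure, any XLIFF-aware tool can work on the file without understanding the original authoring format.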

2. Segmenter: This is the language-tuned engine that determines where each segment ends. Typically, a segment ends after a period (“.”) or at the end of a paragraph, such as a hard line return. Headers, titles and callouts, however, are independent segments that don’t necessarily end with a period. The title above, for instance, is a complete segment that should be translated independently of the sentence that follows it.

A good segmenter contains all the exception rules needed for each source language it supports. For instance, it must distinguish a period that ends a segment from a period used in an abbreviation or a numeric notation, so that numbers such as “2.0” or abbreviations like “misc.” do not break a sentence into incomplete segments. In short, segmenters should be smart and tuned for each language so that translators receive full sentences to translate; otherwise, the resulting translation risks being grammatically incorrect.
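The following minimal Python sketch shows the idea behind exception rules. It assumes English input, whitespace-separated tokens and a tiny hypothetical abbreviation list; a production segmenter uses far richer, language-specific rules.

```python
# Hypothetical, deliberately short English exception list; a production
# segmenter carries many more rules per source language.
ABBREVIATIONS = {"misc.", "etc.", "e.g.", "i.e.", "vs.", "dr.", "mr."}
SENTENCE_END = (".", "!", "?")


def segment(text):
    """Split text at hard returns and at sentence-final punctuation."""
    segments = []
    for paragraph in text.split("\n"):  # a hard line return always ends a segment
        current = []
        for token in paragraph.split():
            current.append(token)
            # "2.0" does not end with a period, so decimals never break here;
            # listed abbreviations such as "misc." are exempt as well.
            if token.endswith(SENTENCE_END) and token.lower() not in ABBREVIATIONS:
                segments.append(" ".join(current))
                current = []
        if current:  # headings and titles often have no final period
            segments.append(" ".join(current))
    return segments


text = "Version 2.0 adds misc. fixes. See the notes below.\nRelease Notes\nInstall the update."
for seg in segment(text):
    print(seg)
```

The heading “Release Notes” becomes its own segment because of the hard return, while “2.0” and “misc.” leave the first sentence intact. The limits of a plain exception list show as well: an abbreviation that really does end a sentence, such as a final “etc.”, would not trigger a break.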

3. Fuzzy match engine: This is the mathematical brain of the tool. It finds the closest match in the translation database and applies the appropriate penalties to it. The goal is to compensate the translator accurately for the effort needed to turn the fuzzy match into a correct translation, and to estimate accurately the time needed to perform the work.
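How match scores feed into compensation and time estimates can be illustrated with a weighted word count. In this Python sketch, the match bands, the discount weights and the 300 words-per-hour throughput are hypothetical figures chosen for illustration; actual bands and rates vary by tool, vendor and contract.

```python
# Hypothetical analysis bands: (minimum match %, share of the full rate).
BANDS = [
    (100, 0.25),  # exact matches: review only
    (95, 0.50),
    (85, 0.70),
    (75, 0.90),
    (0, 1.00),    # below 75%: treated as new text
]
WORDS_PER_HOUR = 300  # hypothetical throughput for new text


def weight_for(match_percent):
    """Return the rate multiplier for a segment's best match percentage."""
    for floor, weight in BANDS:
        if match_percent >= floor:
            return weight
    return 1.0


def weighted_words(segments):
    """Sum word counts scaled by band weight; segments = (words, match %)."""
    return sum(words * weight_for(match) for words, match in segments)


analysis = [(12, 100), (8, 96), (10, 80), (20, 40)]  # made-up analysis
effort = weighted_words(analysis)
hours = effort / WORDS_PER_HOUR
print(f"{effort:g} weighted words, about {hours * 60:.0f} minutes")
```

Under these assumed rates, the 50 raw words shrink to 36 weighted words, the figure a vendor would use to price and schedule the job. This is why a match engine that over- or under-scores matches directly distorts both pay and deadlines.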

Good fuzzy match engines are fast, robust, scalable and accurate. They don’t simply compare characters or bytes; they compare words and sequences of words. They also have built-in intelligence to distinguish the changes that matter to the translator from the ones that matter less. For instance, an internal tag change, a punctuation change or an extra space may affect the target sentence, but none of them is as important as changing “dig” into “dog”!
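The sketch below illustrates word-level matching with lighter penalties for minor edits. It is only an assumption about how such an engine might work: it uses Python’s difflib.SequenceMatcher over word lists rather than any vendor’s proprietary algorithm, assumes internal tags appear as numbered placeholders such as {1} and {/1}, and uses made-up one-point penalties.

```python
import difflib
import re

# Assumed placeholder format for internal tags, e.g. "{1}bold{/1}".
TAG = re.compile(r"\{/?\d+\}")
WORD = re.compile(r"\w+")

# Hypothetical penalties, in percentage points; every tool uses its own.
TAG_PENALTY = 1
PUNCT_PENALTY = 1
SPACE_PENALTY = 1


def words(text):
    """Lowercased words with tags and punctuation removed."""
    return WORD.findall(TAG.sub("", text).lower())


def punctuation(text):
    """Everything that is neither a word character, whitespace nor a tag."""
    return re.sub(r"[\w\s]", "", TAG.sub("", text))


def spacing(text):
    """The sequence of whitespace runs, so an extra space is detectable."""
    return re.findall(r"\s+", TAG.sub("", text))


def fuzzy_score(new, stored):
    """Word-level match percentage between a new segment and a TM entry."""
    # Compare whole words, not characters: "dig" -> "dog" costs a full word.
    matcher = difflib.SequenceMatcher(None, words(new), words(stored))
    score = 100 * matcher.ratio()
    # Minor differences cost only a small, fixed penalty each.
    if TAG.findall(new) != TAG.findall(stored):
        score -= TAG_PENALTY
    if punctuation(new) != punctuation(stored):
        score -= PUNCT_PENALTY
    if spacing(new) != spacing(stored):
        score -= SPACE_PENALTY
    return max(0, round(score))


stored = "The dog is in the yard."
print(fuzzy_score("The {1}dog{/1} is in the yard.", stored))
print(fuzzy_score("The dog is in the  yard!", stored))
print(fuzzy_score("The dig is in the yard.", stored))
```

Under these assumptions, the variant with an added tag scores 99 and the one with an exclamation mark and an extra space scores 98, while swapping “dog” for “dig” drops the match to 83, because a whole word changed.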

4. Graphical user interface: It is no secret that translation remains the critical path of any localization project, and that the more efficient the translator becomes, the faster and less costly the delivery. This is where the GUI plays a big role: it determines whether the environment as a whole is useful to the translator or not.

Ideally, all project assets, including the translation memory, the terminology database, the query database and machine translation options, should be at the translator’s fingertips so that he or she can work as efficiently as possible. With the advent of online translation environments, this is now within our reach.

Perhaps there is room for free tools for beginners or non-professional translators. If you are a professional translator, however, your time is very valuable. If you use a free tool but end up spending an extra hour or two each time you need to translate a file, or to parse it properly, segment it correctly, analyze it accurately, finalize and format it properly, or export the translation memory adequately, well then, is the free tool really free?
