Translation Database Fuzzy Matches and Word Count Demystified

How to get your translation projects on schedule and on budget every time!

Most global content publishers, whether of software or other products, update their products, manuals, websites, and marketing literature yearly or more often. They update their source-language files and then engage their localization vendor or in-house staff to update every supported target language. Calculating the word count to translate is essential for determining the effort involved. This is where translation database fuzzy matches come into play!

Translation Database

Changes to the source files take the shape of additions (new text), removal of obsolete text (deletions), or edits (modifications to existing text).

The new text requires new translations for each target language. The deleted text is disregarded. The edited or modified text will require updating in all target languages.

When a top-down localization process is applied and a translation database (translation memory or TM) is in use, the search engine compares the updated source against the database one segment (a complete phrase or sentence) at a time. The following is the result:

Fig. 1: Example of fuzzy matching analysis

  1. No match or new text: Typically finds little or no match in the database and requires full translation.
  2. Repeat or unchanged text: Generates a 100% match from the database, not requiring any changes.
  3. Edited or modified text: Results in a “fuzzy” match, meaning the database holds a similar segment that matches the new text anywhere from 50% to 99%. Anything under 50% is considered no match.
  4. Deleted text: Produces no impact on the translation update effort, since the text no longer exists.
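The four categories above amount to a simple lookup on the match score. Here is a minimal Python sketch, assuming the search engine reports a score from 0 to 100 for each segment and that a deleted segment (one that no longer exists in the new source) is represented by `None`; the function name `classify_segment` is hypothetical.

```python
from typing import Optional


def classify_segment(score: Optional[float]) -> str:
    """Map a TM match score (0-100) to one of the four update categories.

    `score` is None for a segment that was deleted from the new source.
    """
    if score is None:
        return "deleted"      # no translation effort
    if not 0 <= score <= 100:
        raise ValueError(f"match score must be between 0 and 100, got {score}")
    if score == 100:
        return "repeat"       # exact match, reused as-is
    if score >= 50:
        return "fuzzy"        # 50-99%: existing translation needs editing
    return "new"              # under 50% counts as no match
```

For example, `classify_segment(90)` returns `"fuzzy"` and `classify_segment(40)` returns `"new"`.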

Fuzzy Matches

Translation databases store language pair segments or sentences. A search engine is run on the newly released source text that analyzes the text one segment at a time, comparing it against what is already in the database.

  • If a 100% match is found, then it is considered an exact match.
  • If the search engine finds a similar but not identical segment, it assigns a fuzzy match percentage to it, anywhere from 50% to 99%.
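The segment-by-segment lookup described above can be sketched as a search for the closest stored source segment. This minimal Python version assumes the translation database is a plain dictionary of source-to-target pairs and uses the standard library's `difflib.SequenceMatcher.ratio()` as a stand-in similarity score; commercial TM tools use their own (often proprietary) scoring, so treat this as an illustration only. The name `best_tm_match` is hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, Optional, Tuple


def best_tm_match(segment: str, memory: Dict[str, str],
                  threshold: int = 50) -> Optional[Tuple[str, str, int]]:
    """Return (stored source, stored translation, score) for the closest
    stored segment, or None if nothing scores at least `threshold`."""
    best = None
    for source, target in memory.items():
        # ratio() is a character-based similarity in [0, 1]
        score = round(100 * SequenceMatcher(None, segment, source).ratio())
        if best is None or score > best[2]:
            best = (source, target, score)
    if best is None or best[2] < threshold:
        return None  # treated as new text
    return best
```

With a memory of `{"Save your changes.": "Enregistrez vos modifications."}`, looking up `"Save all your changes."` returns the stored pair with a score of 90, which falls in the fuzzy range.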

For instance, a sentence with ten words having just one word difference from a sentence stored in the database will result in a 90% fuzzy match. If it has only five words in common with another sentence, then the fuzzy match is 50%.
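The word-based arithmetic in this example (one differing word out of ten gives 90%) can be reproduced with a word-level edit distance. The sketch below is one common way to compute such a score, assuming words are split on whitespace and compared case-sensitively; it is not necessarily the formula any particular TM product uses, and the function names are hypothetical.

```python
from typing import List


def word_edit_distance(a: List[str], b: List[str]) -> int:
    """Levenshtein distance counted in whole words (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def fuzzy_match_percent(new: str, stored: str) -> int:
    """Similarity as 100 * (1 - edits / length of the longer segment)."""
    a, b = new.split(), stored.split()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100  # two empty segments are identical
    distance = word_edit_distance(a, b)
    return round(100 * (longest - distance) / longest)
```

Comparing "Click the Apply button to store your changes on disk" with the stored "Click the Save button to store your changes on disk" gives one word substitution out of ten words, so the score is 90.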

By calculating the fuzzy match of each sentence, one can approximate the effort of translation needed to perform the full update in any target language.

Calculation Algorithms

At GlobalVision, we apply weights to strings to calculate the “equivalent” new word count to translate. For instance, the sentence with ten words having just one word changed since the last release is calculated as two new words to translate (20%). A sentence of ten words with four or more words changed is calculated as ten new words (100%). Other percentages are applied in between.
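The weighting described above can be expressed as a lookup table applied to each segment's word count. In this minimal Python sketch, the weights for exact matches (0%), 90% matches (20%) and matches of 60% or lower (100%) follow the examples in the text; the 80% and 70% weights are placeholder values chosen for illustration only and are not GlobalVision's actual figures. The names `WEIGHT_BANDS`, `weight_for` and `equivalent_new_words` are hypothetical.

```python
from typing import List, Tuple

# (minimum match %, weight), checked from highest to lowest.
WEIGHT_BANDS = [
    (100, 0.0),  # exact match: no translation effort
    (90, 0.2),   # one word in ten changed -> 20% (from the text)
    (80, 0.4),   # hypothetical placeholder
    (70, 0.7),   # hypothetical placeholder
    (0, 1.0),    # four or more words in ten changed, or no match -> 100%
]


def weight_for(match_percent: int) -> float:
    """Return the effort weight for one segment's match percentage."""
    for minimum, weight in WEIGHT_BANDS:
        if match_percent >= minimum:
            return weight
    raise ValueError(f"match percent must be non-negative, got {match_percent}")


def equivalent_new_words(segments: List[Tuple[int, int]]) -> float:
    """Sum word_count * weight over (word_count, match_percent) pairs."""
    return sum(words * weight_for(match) for words, match in segments)
```

For four segments of 10 words at 90%, 10 words at 60%, 10 words at 100%, and 8 new words, the equivalent new word count is 2 + 10 + 0 + 8 = 20 words.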

Internal changes to the sentence tags (bold, italic, links, internal font or color change, etc.) will also force a fuzzy match. A weight is applied to these changes as well, as they also require translator intervention.
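One simple way to model tag changes is to compare the plain text and the inline tags separately, downgrading an otherwise exact match when the tags differ. This minimal Python sketch assumes inline tags look like `<b>` or `</i>`, and the penalty of 2 points per changed tag is a hypothetical value for illustration; real TM tools apply their own formatting penalties. The function names are hypothetical.

```python
import re
from collections import Counter
from typing import Optional, Tuple

TAG = re.compile(r"<[^>]+>")  # assumed inline tag syntax, e.g. <b>, </i>


def split_tags(segment: str) -> Tuple[str, Counter]:
    """Return the segment's plain text and a multiset of its inline tags."""
    text = " ".join(TAG.sub(" ", segment).split())
    return text, Counter(TAG.findall(segment))


def tag_adjusted_match(new: str, stored: str,
                       penalty_per_tag: int = 2) -> Optional[int]:
    """Score a segment whose plain text is unchanged but whose tags may differ.

    Returns None when the plain text itself differs (a word-level fuzzy
    score would be needed instead).
    """
    new_text, new_tags = split_tags(new)
    old_text, old_tags = split_tags(stored)
    if new_text != old_text:
        return None
    # Tags added plus tags removed, counted with multiplicity.
    changed = sum(((new_tags - old_tags) + (old_tags - new_tags)).values())
    return max(0, 100 - penalty_per_tag * changed)
```

Changing `Click <b>Save</b> to continue.` to `Click <i>Save</i> to continue.` removes two tags and adds two, so the identical text drops from 100% to a 92% fuzzy match.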

The analysis and calculations are performed by the translation database/search engine software, using built-in algorithms that objectively approximate the number of new words to translate. The results are not 100% exact, but over the past ten years of using these algorithms, we have satisfied all our clients.

99% Success Rate

We apply an appropriate weight to each fuzzy match to estimate not only the cost, but also the staffing and schedule. This is why we complete 99% of our projects on schedule and on budget for our clients! This is not just a claim; we have the data to prove it! Contact us for more information.
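Once the equivalent new word count is known, cost, staffing and schedule follow from simple arithmetic. This minimal Python sketch assumes a flat per-word rate, a fixed daily throughput per translator, and one independent team per target language working in parallel; all rates and throughputs are hypothetical inputs, and review, QA and desktop publishing time are not included. The names `Estimate` and `estimate_project` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Estimate:
    cost: float            # total cost across all target languages
    translator_days: float # total effort across all target languages
    calendar_days: int     # elapsed days, languages done in parallel


def estimate_project(equivalent_words: float, languages: int,
                     rate_per_word: float, words_per_day: float,
                     translators_per_language: int) -> Estimate:
    """Rough budget and schedule from a weighted (equivalent) word count."""
    if words_per_day <= 0 or translators_per_language < 1:
        raise ValueError("throughput and team size must be positive")
    total_words = equivalent_words * languages
    cost = round(total_words * rate_per_word, 2)
    translator_days = total_words / words_per_day
    # Each language's team works through its own copy of the update.
    team_throughput = words_per_day * translators_per_language
    calendar_days = math.ceil(equivalent_words / team_throughput)
    return Estimate(cost, translator_days, calendar_days)
```

For 5,000 equivalent words into four languages at a hypothetical 0.20 per word and 2,000 words per translator per day with one translator per language, the sketch gives a cost of 4,000, 10 translator-days, and 3 calendar days.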

How to Plan & Budget Translation Projects Whitepaper

Whitepaper – How to Plan & Budget for Translation & Localization Projects?

This whitepaper presents straightforward metrics that will help you determine the budget and schedule needed for translation and localization projects. Don’t request a bid without reading it first! Get it now for free!
