Fuzzy Match or Fuzzy Math

Recently, a blog post was published about Translation Memory Matching. In it, the authors explained what Translation Memory (TM) is and how fuzzy matching is derived. But a couple of things in their post did not add up, prompting me to write about them.

The following is the example that they gave:

Assuming a source segment:
The lazy brown fox jumped over the quick brown dog.

And a previously translated segment in the TM:
The lazy brown dog jumped over the quick brown fox.

Comparing the two, there are 10 words in the new source segment, of which 2 differ from the segment in the TM. The word dog became fox and the word fox became dog.

While explaining how Translation Memory fuzzy match engines work, the authors indicated that the source segment contains 50 characters and that the fuzzy match is calculated at 92%.

From the given information, it seems that the fuzzy match engine they apply uses the Levenshtein distance to calculate the penalty. There are 4 character substitutions out of 50 characters in total (f, x, d and g in fox and dog), so 46 / 50 characters remain the same, which equals a 92% match. Sounds good?
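For readers who want to check the arithmetic, here is a minimal Python sketch of a character-level Levenshtein score. It is my own reconstruction, not the authors' engine or any TM tool's actual formula; the function and variable names are made up for illustration, and I drop the final period so that the segment length matches the 50 characters quoted above.

```python
def levenshtein(a, b):
    """Minimum number of insertions, deletions and substitutions to turn a into b."""
    prev = list(range(len(b) + 1))          # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]                          # deleting the first i items of a
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # deletion
                curr[j - 1] + 1,            # insertion
                prev[j - 1] + (x != y),     # substitution (free if items are equal)
            ))
        prev = curr
    return prev[-1]


# Assumption: final period dropped so the length is the 50 characters quoted above.
new_segment = "The lazy brown fox jumped over the quick brown dog"
tm_segment = "The lazy brown dog jumped over the quick brown fox"

char_edits = levenshtein(new_segment, tm_segment)
char_match = (len(new_segment) - char_edits) / len(new_segment)

print(char_edits, len(new_segment), f"{char_match:.0%}")  # prints: 4 50 92%
```

Under these assumptions the sketch reproduces the 92% figure, which supports the guess that the score was computed on characters.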

But wait a second, since when do translators translate characters, one at a time? Sounds like they are comparing apples to oranges, no?

Although the Levenshtein distance is a good method to identify possible fuzzy matches from a large pool of already translated segments in the TM, it is by no means an accurate method to calculate the actual fuzzy match %. The fuzzy match % is supposed to measure the work ahead for the translator: how much of the fuzzy match segment must change to accurately represent the meaning of the new source segment. Two changed words out of 10 sounds a lot more like an 80% fuzzy match to me. But then again, translators do not translate one word at a time. Still, 80% is a lot more accurate than 92%.
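The 80% figure can be sketched the same way by counting edits over words instead of characters. Again, this is only an illustration with hypothetical names, using a naive whitespace split, and not a claim about how any TM tool should or does score matches.

```python
def edit_distance(a, b):
    """Levenshtein distance over any two sequences (here: lists of words)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # drop a word
                curr[j - 1] + 1,            # add a word
                prev[j - 1] + (x != y),     # replace a word (free if equal)
            ))
        prev = curr
    return prev[-1]


# Assumption: words are whatever a whitespace split yields; period dropped as before.
new_words = "The lazy brown fox jumped over the quick brown dog".split()
tm_words = "The lazy brown dog jumped over the quick brown fox".split()

word_edits = edit_distance(new_words, tm_words)
word_match = (len(new_words) - word_edits) / len(new_words)

print(word_edits, len(new_words), f"{word_match:.0%}")  # prints: 2 10 80%
```

Same two segments, same algorithm, and the match drops from 92% to 80% just by changing the unit being counted.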

What translation agencies won’t tell you

Why is that important, you may ask? Well, for many reasons, including proper planning, scheduling and budgeting (read word count demystified). So, if you are a translator working for translation agencies, make sure you question their fuzzy math calculations when they issue you the translation purchase order.

Another important point was the explanation given for why vendors charge their clients to accept 100% matches. The explanation makes sense. But since TM tools enable the acceptance of 100% matches with “minimal efforts”, if you are the client, what do you think is a reasonable overhead cost for you to pay in this case? Look up what you are being billed for 100% matches, and if you think it is more than a “minimal cost”, ask your translation agency why that is so. Chances are you won’t like their numbers or their answers!

