Recently, one of our blog entries, titled “Fuzzy Match or Fuzzy Math”, prompted interesting comments on the G11n – Globalization Professionals group on LinkedIn. Readers argued that the Levenshtein distance method is just as useful for calculating fuzzy matches in text translation as more refined algorithms.
Some held that since source languages, subject matter, text difficulty and target languages differ from one project to another, no method can produce a truly accurate fuzzy match calculation. Critics also argued that the law of averages will in the end normalize the results of the Levenshtein distance method, making them acceptable.
Levenshtein Distance Method Limitations
But our experience shows that in the majority of languages, the fuzzy match calculation of the Levenshtein distance method consistently leads to much more optimistic results than reality dictates.
Normalized word counts based on accurate fuzzy match calculations are important to operations at GlobalVision, because we use them to gauge not only the cost of a project but also the time required to complete it. For more information, read “Translation database fuzzy matches and word count demystified”.
Although we try to lower costs for our clients, which character-based fuzzy match engines do because of their consistently much higher fuzzy match values, we have to consider the real effort ahead and keep the interests of translators at heart as well. Squeezing translators on time and budget does a disservice to them and to our clients’ end users.
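To make the idea of a normalized word count concrete, here is a minimal sketch in Python. The bands, the weights, and the names `BANDS`, `weight_for` and `normalized_word_count` are hypothetical and chosen only for illustration; they are not GlobalVision's actual rates or any tool's built-in scheme.

```python
# Hypothetical weighting table: (minimum fuzzy score, fraction of a new word's effort).
# These bands and weights are illustrative assumptions only.
BANDS = [
    (1.00, 0.0),  # exact match: assumed here to need no translation effort
    (0.85, 0.4),  # high fuzzy match
    (0.75, 0.6),  # lower fuzzy match
    (0.00, 1.0),  # anything below the lowest band counts as new text
]


def weight_for(score: float) -> float:
    """Return the effort weight of the first band whose floor the score reaches."""
    for floor, weight in BANDS:
        if score >= floor:
            return weight
    return 1.0


def normalized_word_count(segments) -> float:
    """Sum the weighted words over (word_count, fuzzy_score) pairs."""
    return sum(words * weight_for(score) for words, score in segments)


# The same 8-word segment, scored by two hypothetical engines.
optimistic = normalized_word_count([(8, 0.86)])    # lands in the 0.85 band
conservative = normalized_word_count([(8, 0.75)])  # lands in the 0.75 band
print(optimistic, conservative)
```

Under these assumed bands, a few points of difference in the fuzzy score move the segment into another band and change the estimated effort, which is why an overly optimistic score understates both cost and time.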
Some Asian languages, like Chinese, are written with logograms rather than alphabetic letters. In their case, the plain Levenshtein distance method becomes the common denominator. But most other languages, even ones with complex morphology like Arabic, can benefit from an adapted Levenshtein distance method that accounts for word changes and not just character changes. The improvement in fuzzy matching accuracy is meaningful.
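The difference between character-based and word-based calculations can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the algorithm of any particular translation memory tool: the function names are ours, the score is normalized as one minus the distance divided by the longer length, and words are split on whitespace only.

```python
from typing import Sequence


def levenshtein(a: Sequence, b: Sequence) -> int:
    """Classic dynamic-programming edit distance over any two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def fuzzy_match(a: Sequence, b: Sequence) -> float:
    """Similarity in [0, 1]: 1 - distance / longer length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


old = "Press the red button to stop the machine"
new = "Press the green button to start the machine"

char_score = fuzzy_match(old, new)                  # compares characters
word_score = fuzzy_match(old.split(), new.split())  # compares whole words
print(f"character-based: {char_score:.0%}, word-based: {word_score:.0%}")
```

In this example two of the eight words change, so the word-based score is 75%, while the character-based score comes out above 85% because most of the characters in the sentence are untouched. The translator still has to rethink two words out of eight, which the word-based figure reflects more closely.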
Text Translations Fuzzy Matching
Fuzzy match engines can benefit from further refinements, but the return on investment diminishes quickly beyond a certain point. Issues such as factoring in the complexity of the source text are very hard to simulate in a software algorithm, and the end result would be only slightly improved metrics, not better translation quality. Once the fuzzy match engine is optimized, companies should look to other components of the translation memory tool, and perhaps to machine translation, to help achieve accurate and fast translation.
We welcome outside input and research in this area and will be glad to test any newly devised methods that can demonstrate a real, significant improvement in the accounting of words to translate.