In 2007, we were contacted by a software publisher asking us to consider using Machine Translation (MT) to translate their knowledge base. Given the volumes involved, they were looking for a way to lower their costs. So we took a stab at investigating different engines to determine the state of machine translation quality.
Machine Translation Motivation
Their hypothesis for using MT was based on the following:
- Knowledge bases, unlike software GUI, documentation or help, do not need to have a high level of quality
- Since they contain massive amounts of information, it is impossible for humans to translate them fast enough to keep up with their rapid expansion
- Some entries are never or rarely used by users
- The entries themselves are authored by many support people and not by professional tech pub writers. The grammatical quality of the source content is already at an inferior level.
Bottom line, since inferior translation quality is acceptable, perhaps use of MT is justified.
Machine Translation Quality Test
We had experimented with MT long ago and concluded that it does not save our professional translators time. Reworking MT output is more time-consuming than translating from scratch. But given the recent hype about new methods and technologies, we decided to put their hypothesis to the test.
We randomly selected sentences from their knowledge base and gave off-the-shelf MT solutions a try. Within minutes, we found many problems, mainly inaccurate translations and incorrect terminology. This was particularly true because the source was not in perfect shape (see the last bullet above). We will limit the discussion here to a simple example to illustrate our point. The source English sentence that we will use is the following. The operation of saving the assembly as a multi-body part was a point in time event.
With much press on the new Statistical Machine Translation (SMT) technology from the University of Southern California, and its proclaimed higher fidelity than rule-based translation output, we decided to give it a try. SMT depends on vast existing translation databases (many millions of words). So we opted to go to the leader in serving content, Google. After all, they are the best at indexing the World Wide Web. If anyone can benefit from the vast existing translations on the internet, it will be them.
Google Machine Translation
Google’s translation of our test sentence into French in 2007 was the following. Le fonctionnement de l’économie d’un assemblage de plusieurs partie du corps a été un moment manifestation.
In 2019, with neural network learning, the result was the following. L’opération de sauvegarde de l’assemblage en tant que pièce à plusieurs corps était un événement ponctuel.
Despite other problems, the key term that I want you to focus on is saving. It was translated by Google in 2007 as economizing, giving it a financial tone. In 2019, it wants to safeguard it!
Systran and Microsoft
With Systran, the engine used by many free online translation services like AltaVista’s Babel Fish, the translation into French was as follows. L’opération de sauver l’assemblée comme pièce de multi-corps était un point dans l’événement de temps.
The output of Microsoft’s Translate site in 2007 was very similar to Systran’s translation. L’opération de sauver l’assemblée comme pièce de multi-corps était un événement de moment.
In 2019 the result was the following. Le fonctionnement de l’enregistrement de l’assemblage en tant que pièce à plusieurs corps a été un événement de point dans le temps.
But both Systran and Microsoft in 2007 interpreted saving as rescuing!
It took a human being to realize that the text is intended for a software application and to correctly infer that saving here means recording the assembly’s file (enregistrer), not rescuing or economizing it!
But finally Microsoft figured it out in 2019! The translation of the sentence is also much more accurate!
This was not a surprise to us. When you deal with translations every day, hour and minute, you know that there is no real substitute today for human translation.
Some say that despite all this, the gist of the meaning is still maintained and the international user can benefit from MT. It is better than not having any translation at all. Perhaps. But when you are a successful and reputable professional company and your brand and image are on the line, are you willing to risk it all without looking at better options?
Your goal should be to seek quality and accuracy in everything that you publish, whether it is product, website, support, PR, training, legal, financial or knowledge base related. So how can you balance brand, image and cost trade-offs when it comes to translating bulk content?
Simple: Divide, Prioritize and Conquer! Stay tuned for the next blog.