We were recently contacted by a software publisher asking us to consider using Machine Translation (MT) to translate their knowledge base. Given the volumes involved, they were looking for a way to lower their costs.
Their hypothesis for using MT was based on the following:
- Knowledge bases, unlike software GUI, documentation or help, do not need to have a high level of quality
- Since they contain massive amounts of information, it is impossible for humans to translate them fast enough to keep up with their rapid expansion
- Some entries are never or rarely used by users
- The entries themselves are authored by many support people and not by professional tech pub writers, so the grammatical quality of the source content is already inferior
Bottom line, since inferior translation quality is acceptable, perhaps use of MT is justified.
We experimented with MT long ago and concluded that it does not save our professional translators time. Reworking the output of MT is more time consuming than translating from scratch. But given the recent hype about new methods and technologies, we decided to put their hypothesis to the test.
We randomly selected sentences from their knowledge base and gave off-the-shelf MT solutions a try. We found many problems, mainly inaccurate translations and incorrect terminology, made worse by the fact that the source was not in perfect shape (see the last bullet above). We will limit the discussion here to a simple example to illustrate our point. The source English sentence that we will use is: The operation of saving the assembly as a multi-body part was a point in time event.
With so much press about the new Statistical Machine Translation (SMT) technology from the University of Southern California, and its proclaimed higher fidelity compared with rule-based translation output, we decided to give it a try. SMT depends on vast existing translation databases (many millions of words), so we opted to go to the leader in serving content, Google. After all, they are the best at indexing the World Wide Web, and if anyone can benefit from the vast body of existing translations on the internet, it will be them.
Google’s translation of our test sentence into French was the following: Le fonctionnement de l’économie d’un assemblage de plusieurs partie du corps a été un moment manifestation.
Other problems aside, the key term that I want you to focus on is saving. Google translated it as economizing (économie), giving it a financial tone.
With Systran, the engine behind many free online translation services such as AltaVista’s Babel Fish, the translation into French was: L’opération de sauver l’assemblée comme pièce de multi-corps était un point dans l’événement de temps.
The output of Microsoft’s Beta Translate site was very similar to Systran’s translation: L’opération de sauver l’assemblée comme pièce de multi-corps était un événement de moment.
But both Systran and Microsoft interpreted saving as rescuing!
It took a human being to realize that the text is about a software application and to correctly infer that saving here means storing the assembly file (enregistrer), not rescuing or economizing it!
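For readers who like to see this kind of check spelled out, here is a minimal sketch of how a domain glossary could flag the wrong sense of saving in the outputs above. It is only an illustration under our own assumptions: the `GLOSSARY` entries, the word stems and the `check_terms` function are hypothetical names made up for this example, not part of any MT product, and simple stem matching is far cruder than what a human reviewer does.

```python
import re
import unicodedata

# Hypothetical software-domain glossary (illustrative entries only):
# English term -> French stems we accept and stems that signal a wrong sense.
GLOSSARY = {
    "saving": {
        "approved": ["enregistr"],          # enregistrer, enregistrement
        "rejected": ["sauver", "économi"],  # rescuing, economizing
    },
}


def _norm(text):
    """Normalize accents to one Unicode form and lowercase for matching."""
    return unicodedata.normalize("NFC", text).lower()


def check_terms(source, translation):
    """Return (term, problem) pairs for glossary terms present in the source."""
    issues = []
    src = _norm(source)
    tgt = _norm(translation)
    for term, rule in GLOSSARY.items():
        # Only check terms that actually occur in the source sentence.
        if not re.search(r"\b" + re.escape(term) + r"\b", src):
            continue
        bad = [stem for stem in rule["rejected"] if _norm(stem) in tgt]
        if bad:
            issues.append((term, "wrong sense: " + ", ".join(bad)))
        elif not any(_norm(stem) in tgt for stem in rule["approved"]):
            issues.append((term, "approved term missing"))
    return issues


SOURCE = ("The operation of saving the assembly as a multi-body part "
          "was a point in time event.")
SYSTRAN = ("L’opération de sauver l’assemblée comme pièce de multi-corps "
           "était un point dans l’événement de temps.")

if __name__ == "__main__":
    print(check_terms(SOURCE, SYSTRAN))
```

A check like this only catches terms that someone has already put in the glossary. Knowing that saving needs an entry in the first place still takes a person who understands the subject matter.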
This was not a surprise to us. When you deal with translations every day, hour and minute, you know that there is no real substitute today for human translation.
Some say that despite all this, the gist of the meaning is still maintained and the international user can benefit from MT. It is better than not having any translation at all. Perhaps. But when you are a successful and reputable professional company and your brand and image are on the line, are you willing to risk it all without looking at better options?
Your goal should be to seek quality and accuracy in everything that you publish, whether it is product, website, support, PR, training, legal, financial or knowledge base related. So how can you balance brand, image and cost trade-offs when it comes to translating bulk content?
Simple: Divide, Prioritize and Conquer! Stay tuned for the next blog.