Statistical Machine Translation for All

In the 1980s, I worked for a large European chip manufacturer that was marketing a new video chip technology at the time. The company designed a solid-state CCD (charge-coupled device) based on Frame Transfer (FT) technology to compete with the Interline Transfer technology adopted by most Japanese video camera manufacturers.

Although the FT technology was superior, it never took off. Why? The company focused on the scientific and professional markets instead of the consumer market, which was dominated by Japanese companies. Lacking consumer volumes, the company could not justify financing the technology for long, hence its demise.

Lesson learned? Volume often trumps technology!

But what if someone has the volume, the technology, and offers it for free? How can anyone compete under such conditions? This is the case with statistical machine translation (SMT).

At their core, all SMT solutions are based on the same algorithms. By their nature, they all require intensive mathematical operations on very large bilingual text corpora, and on even larger monolingual corpora, to stand a chance of approaching human translation quality. The winner will not be the user of the better SMT technology, but the user of the one that relies on the largest volume of translation data and the most computing muscle (see "The Unreasonable Effectiveness of Data").
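To make the role of the two kinds of corpora concrete, here is a minimal sketch of the classic noisy-channel idea behind SMT: pick the target sentence that maximizes the translation-model score (learned from bilingual text) times the language-model score (learned from monolingual text). It is a toy under stated assumptions, not how any production system is built. The function names, the tiny corpus, and the translation-model probabilities in `candidates` are all hypothetical values chosen for illustration.

```python
import math
from collections import Counter

def train_bigram_lm(sentences):
    """Count bigrams and their left contexts in monolingual target-language text."""
    contexts, bigrams, vocab = Counter(), Counter(), {"<s>", "</s>"}
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(tokens)
        contexts.update(tokens[:-1])              # every token that starts a bigram
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, len(vocab)

def lm_logprob(sentence, lm):
    """Log-probability of a sentence under the bigram model, with add-one smoothing."""
    contexts, bigrams, vocab_size = lm
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(tokens, tokens[1:])
    )

def best_translation(candidates, lm):
    """Pick the candidate e that maximizes log P(f|e) + log P(e)."""
    return max(candidates, key=lambda e: math.log(candidates[e]) + lm_logprob(e, lm))

# Hypothetical monolingual corpus; a real system would use billions of words.
monolingual = [
    "the house is small",
    "the house is big",
    "a small house",
    "the small house is old",
]
lm = train_bigram_lm(monolingual)

# Hypothetical translation-model scores P(f|e) for "la maison est petite".
# The word-for-word candidate is deliberately given the higher score.
candidates = {
    "the house is small": 0.3,
    "the house small is": 0.5,
}

print(best_translation(candidates, lm))
```

Even though the translation model alone prefers the awkward word order here, the language model tips the decision toward the fluent sentence. The more monolingual text the language model has seen, the more reliable that correction becomes, which is where volume comes in.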

Effectively, the power to harness SMT lies with the company that accesses, spiders, aligns and indexes the massive volumes of monolingual and multilingual corpora available to the public, while also commanding an enormous computing infrastructure. The larger the volume of language corpora processed, the more valuable SMT will be.

But just like translation memory (TM) and rule-based machine translation (RbMT), SMT will not replace human translators or language service providers. With adequate integration into the translation environment, and in time, SMT will offer benefits similar to, or possibly better than, fuzzy-match results from TM tools.
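For readers who have not worked with TM tools, a fuzzy match is simply the stored segment that most resembles the new source sentence, offered to the translator along with its earlier translation. The sketch below illustrates the idea with Python's standard `difflib`; commercial TM tools use their own scoring methods, so the token-level similarity ratio, the 0.75 threshold, and the sample memory are assumptions made for illustration only.

```python
from difflib import SequenceMatcher

def best_fuzzy_match(source, memory, threshold=0.75):
    """Return (score, stored_source, stored_target) for the closest entry, or None."""
    query = source.lower().split()
    best = None
    for stored_source, stored_target in memory.items():
        # Compare token sequences; ratio() is 2 * matches / total tokens.
        score = SequenceMatcher(None, query, stored_source.lower().split()).ratio()
        if best is None or score > best[0]:
            best = (score, stored_source, stored_target)
    if best is not None and best[0] >= threshold:
        return best
    return None

# Hypothetical translation memory: source segment -> previous translation.
memory = {
    "Click the Save button.": "Cliquez sur le bouton Enregistrer.",
    "Close the window.": "Fermez la fenêtre.",
}

match = best_fuzzy_match("Click the Cancel button.", memory)
no_match = best_fuzzy_match("Restart the server now.", memory)
print(match)
print(no_match)
```

The first query differs from a stored segment by one word, so the translator gets a suggestion to edit. The second has nothing close enough in memory and returns nothing, which is exactly the gap an SMT suggestion could fill.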

Certainly, human translators will keep applying the final edits and linguistic quality control; localization and QA engineers will carry on ensuring that the localized product operates correctly; and project managers will continue to see that projects are completed on time and on budget.

If you want to experiment with SMT technology today, don’t look far. Think volume, think cloud computing, think free! Think of the most popular search-engine company in the world. Get the picture?
