Measuring and Comparing Machine Translation Quality

by Drew Evans

May 20, 2021 | 1 Minute Read

In the last few decades, machine translation has become an increasingly common tool for improving speed and reducing cost as companies scale their localization programs. It has also helped make large amounts of the world’s content available to people who don’t read the language it was originally written in.

But machine translation is not a single, simple process that applies equally to all scenarios. There are multiple types of machine translation systems, different approaches to using them, and various statistics and methods for measuring their success.

One of the most common ways to measure machine translation output is automatic evaluation with the Bilingual Evaluation Understudy metric, better known as the BLEU score. This long-standing standard compares raw machine translation output to a human reference translation by measuring how many word sequences the two share: the higher the score, the closer the two are to one another. Using the BLEU score with a little research and engineering, it’s possible to compare popular machine translation systems and see how they score.
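To make the idea concrete, here is a minimal sketch of a sentence-level BLEU calculation in Python. It is an illustration only, not the evaluation code behind the whitepaper: the function names `ngrams` and `bleu` are hypothetical, and it assumes whitespace tokenization, a single reference translation, n-grams up to length 4, and no smoothing. Published BLEU scores are usually computed over a whole test set and reported on a 0 to 100 scale, so numbers from this sketch are not directly comparable to them.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every n-gram (as a tuple of tokens) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified single-reference BLEU on a 0-1 scale (assumption: no smoothing)."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        # Clipped matches: a candidate n-gram is credited at most as many
        # times as it appears in the reference.
        matches = sum(min(count, ref_counts[gram])
                      for gram, count in cand_counts.items())
        total = sum(cand_counts.values())
        if matches == 0 or total == 0:
            # Without smoothing, one empty n-gram order zeroes the score.
            return 0.0
        log_precisions.append(math.log(matches / total))

    # Brevity penalty: candidates no longer than the reference are penalized
    # in proportion to how much shorter they are (no penalty at equal length).
    if len(cand) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(cand))

    # Geometric mean of the n-gram precisions, scaled by the brevity penalty.
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    human = "the cat sat on a mat"
    machine = "the cat sat on the mat"
    print(f"BLEU: {bleu(machine, human):.3f}")
```

An output identical to the reference scores 1.0, and an output that shares no words with it scores 0.0. Because short sentences often have no matching 4-grams at all, practical tools apply smoothing or score an entire corpus at once rather than sentence by sentence.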

In our whitepaper Measuring and Comparing Machine Translation Quality, we break down the details to help you better understand the various types of machine translation and how to measure quality. It also includes a direct comparison of machine translation systems from Google, Microsoft, and Lilt to show how they stack up.

• • •

Click here to download the whitepaper, and contact us to learn more about how Lilt’s language services can help you cover your localization needs.

January 10, 2017

Machine Translation Tools: Comprehensive BLEU Evaluation

2 Minute Read

The language services industry offers an intimidating array of machine translation options. To help you separate the truly innovative from the middle-dwellers, your pals here at Lilt set out to provide reproducible and unbiased evaluations of these options using public data sets and a rigorous methodology. This evaluation is intended to assess machine translation not only in terms of baseline translation quality, but also in terms of the quality of domain-adapted systems where available. Domain adaptation and neural networks are the two most exciting recent developments in commercially available machine translation. We evaluate the relative impact of both of these technologies for the following commercial systems: