Webinar: How to Run a Data-Driven MT Evaluation

There are many machine translation systems on the market today, and choosing the most effective system for a particular production workflow can be difficult. Lilt’s CEO and co-founder, Spence Green, recently hosted a webinar demonstrating a principled, data-driven method that researchers use to score systems. The webinar covered automatic measures such…

2017 Machine Translation Quality Evaluation Addendum

This post is an addendum to our original post of 1/10/2017 entitled 2017 Machine Translation Quality Evaluation. Experimental design: we evaluate all machine translation systems for English-French and English-German. We report case-insensitive BLEU-4 [2], computed by the mteval scoring script from Phrasal (https://github.com/stanfordnlp/phrasal), Stanford's open-source toolkit. NIST tokenization was applied…
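For readers who want to reproduce a comparable score without Phrasal, the sketch below computes a case-insensitive, corpus-level BLEU-4 using the sacrebleu library as an illustrative stand-in for the mteval script; the file names and the choice of sacrebleu are assumptions, not part of the original evaluation setup.

```python
# Minimal sketch: case-insensitive corpus BLEU-4, assuming one segment per line
# in hypothetical files "system_output.fr" (MT output) and "reference.fr"
# (human reference). sacrebleu's default "13a" tokenizer approximates the
# NIST mteval tokenization mentioned in the post.
import sacrebleu

with open("system_output.fr", encoding="utf-8") as f:
    hypotheses = [line.strip() for line in f]
with open("reference.fr", encoding="utf-8") as f:
    references = [line.strip() for line in f]

# lowercase=True makes the score case-insensitive, matching the reported metric.
bleu = sacrebleu.corpus_bleu(hypotheses, [references],
                             lowercase=True, tokenize="13a")
print(f"Case-insensitive BLEU-4: {bleu.score:.2f}")
```

Running the same script over each system's output against a shared reference set gives directly comparable scores, which is the basic setup behind the English-French and English-German comparisons described above.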

2017 Machine Translation Quality Evaluation

The language services industry now has an array of machine translation options. The goal of Lilt Labs is to provide reproducible and unbiased evaluations of these options using public datasets and a rigorous methodology. The 2017 evaluation is intended to assess machine translation quality in a prototypical translation workflow. Therefore, it includes not only an evaluation of baseline translation…