The languages people learn as children influence nearly every aspect of their lives: their community, their access to information, and even their career prospects. I observed this most acutely while living in the Middle East, where I met bright and ambitious people who were often cut off from intellectual work because they didn't speak English. While the capacity for language is shared by all of humanity and is one of the most fascinating aspects of human intelligence, language differences can divide us socially and economically.
A few weeks ago, our friends over at GALA jointly hosted a webinar with our CEO Spence Green called "Scaling Localization With Artificial Intelligence and Automation."
Lilt is happy to announce the addition of Bulgarian and Slovenian to our platform today. With this, Lilt officially supports 40 languages, or 50 including language variants such as Castilian and Latin American Spanish.
Today I’m pleased to announce that we raised $9.5M in new funding led by Sequoia Capital. Bill Coughran, partner at Sequoia, will join our board. Our existing investors (Redpoint Ventures, Zetta Venture Partners, and XSeed Capital) all participated in the round. Series A funding indicates two milestones in an enterprise company’s life: strong revenue and momentum, and a compelling plan for the deployment of new capital. It also marks the start of a new partnership, in this case among us, Bill, and Sequoia. We are thrilled because Bill is that unique leader who has contributed to science, managed large technical teams, and led businesses. Sequoia has also funded the businesses we admire most, among them Google, Apple, and Stripe. In this post, I’ll describe what we’ve achieved, what we plan to do, and why we are certain that Bill is the right partner to help us do it.
How We Got Here
A major problem in deploying machine learning systems in practice is domain adaptation: given a large auxiliary supervised dataset and a smaller dataset of interest, how can the auxiliary dataset be used to improve performance on the smaller one? This paper considers the case where we have K datasets from distinct domains and want to adapt quickly to a new one. It trains a separate model on each of the K datasets and treats each model as an expert. Given a new domain, it creates another model for that domain and, in addition, computes attention over the experts. The attention weights come from a dot product that measures the similarity between the new domain’s hidden representation and the representations of the K existing domains.
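The dot-product attention over experts described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names, the softmax normalization of the similarity scores, and the way the expert outputs are averaged are all assumptions made for the example.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend_over_experts(new_hidden, expert_hiddens, expert_outputs):
    """Weight K expert outputs by similarity to the new domain (hypothetical helper).

    new_hidden:     hidden representation from the new domain's model
    expert_hiddens: one hidden representation per expert (K vectors)
    expert_outputs: one output vector per expert (K vectors)
    """
    # Similarity of the new domain to each expert via a dot product.
    weights = softmax([dot(new_hidden, h) for h in expert_hiddens])
    dim = len(expert_outputs[0])
    # Attention-weighted average of the expert outputs (an assumed combination rule).
    mixed = [sum(w * out[i] for w, out in zip(weights, expert_outputs))
             for i in range(dim)]
    return weights, mixed

weights, mixed = attend_over_experts(
    new_hidden=[1.0, 0.0],
    expert_hiddens=[[1.0, 0.0], [0.0, 1.0]],
    expert_outputs=[[1.0, 0.0], [0.0, 1.0]],
)
```

In this toy call the new domain's representation is closest to the first expert, so that expert receives the larger attention weight.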
When doing beam search in sequence-to-sequence models, one explores next words in order of their likelihood. During decoding, however, there may be other constraints to satisfy or objectives to maximize, such as sequence length, BLEU score, or mutual information between the target and source sentences. To accommodate these additional desiderata, the authors add a term Q to the likelihood that captures the appropriate criterion, and then choose words based on this combined objective.
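To make the combined objective concrete, here is a small sketch of ranking candidate hypotheses by log-likelihood plus a weighted Q term. The names `combined_score`, `select_next`, and `length_q`, the interpolation weight, and the use of hypothesis length as Q are assumptions chosen for illustration, not the authors' method of estimating Q.

```python
def combined_score(log_prob, q_value, weight):
    # Log-likelihood plus a weighted extra criterion Q (weight is an assumed hyperparameter).
    return log_prob + weight * q_value

def select_next(candidates, q_fn, weight, beam_size):
    """Keep the beam_size best hypotheses under the combined objective.

    candidates: list of (hypothesis_tokens, log_prob) pairs
    q_fn:       function mapping a hypothesis to its Q value
    """
    scored = [(hyp, combined_score(lp, q_fn(hyp), weight)) for hyp, lp in candidates]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:beam_size]

def length_q(hyp):
    # Stand-in Q that simply rewards longer hypotheses.
    return float(len(hyp))

candidates = [(("the", "cat"), -1.0), (("the", "cat", "sat"), -1.5)]
by_likelihood = select_next(candidates, length_q, weight=0.0, beam_size=1)
by_combined = select_next(candidates, length_q, weight=1.0, beam_size=1)
```

With a weight of zero the ranking reduces to plain likelihood and the shorter hypothesis wins. With a weight of one the length term outweighs the small likelihood gap and the longer hypothesis is kept instead.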
Neural MT systems generate translations one word at a time. They can still generate fluid translations because they choose each word based on all of the words generated so far. Typically, these systems are just trained to generate the next word correctly, based on all previous words. One systematic problem with this word-by-word approach to training and translating is that the translations are often too short and omit important content. In the paper Neural Machine Translation with Reconstruction, the authors describe a clever new way to train and translate. During training, their system is encouraged not only to generate each next word correctly but also to correctly generate the original source sentence based on the translation that was generated. In this way, the model is rewarded for generating a translation that is sufficient to describe all of the content in the original source.
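The training signal described above can be sketched as a sum of two negative log-likelihoods: one for generating the translation and one for regenerating the source from it. This is a simplified illustration that works on per-token probabilities rather than a real network, and the `reconstruction_weight` parameter and function name are assumptions for the example.

```python
import math

def training_loss(target_token_probs, source_token_probs, reconstruction_weight=1.0):
    """Combined loss for one sentence pair (hypothetical helper).

    target_token_probs: model probability of each correct target word
    source_token_probs: probability of each source word when reconstructing
                        the source from the generated translation
    """
    translation_nll = -sum(math.log(p) for p in target_token_probs)
    reconstruction_nll = -sum(math.log(p) for p in source_token_probs)
    return translation_nll + reconstruction_weight * reconstruction_nll

# Same translation likelihood, but the second case reconstructs the source poorly,
# as would happen if the translation omitted some of the source content.
good = training_loss([0.9, 0.8], [0.9, 0.9])
poor = training_loss([0.9, 0.8], [0.2, 0.9])
```

A translation that drops content makes some source words hard to reconstruct, which raises the second term and therefore the overall loss.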
The most popular way of finding a translation for a source sentence with a neural sequence-to-sequence model is a simple beam search. The target sentence is predicted one word at a time, and after each prediction a fixed number of possibilities (typically between 4 and 10) is retained for further exploration. This strategy can be suboptimal because these local hard decisions do not take the remainder of the translation into account and cannot be reverted later on.
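The following is a minimal sketch of the beam search procedure described above, run against a hand-written toy distribution rather than a real translation model. The function names, the `</s>` end marker, and the toy probabilities are assumptions made for the example.

```python
import math

def beam_search(next_log_probs, beam_size, max_len, eos="</s>"):
    """Return the best (tokens, log_prob) hypothesis found.

    next_log_probs: function mapping a tuple of tokens to {token: log_prob}
    """
    beams = [((), 0.0)]
    finished = []
    for _ in range(max_len):
        # Extend every retained hypothesis by one word.
        candidates = []
        for prefix, score in beams:
            for token, lp in next_log_probs(prefix).items():
                candidates.append((prefix + (token,), score + lp))
        # Hard local decision: keep only the beam_size best extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for hyp, score in candidates[:beam_size]:
            if hyp[-1] == eos:
                finished.append((hyp, score))
            else:
                beams.append((hyp, score))
        if not beams:
            break
    finished.extend(beams)  # hypotheses cut off at max_len
    return max(finished, key=lambda c: c[1])

def toy_model(prefix):
    # Hypothetical next-word distribution standing in for a neural model.
    table = {
        (): {"a": 0.6, "b": 0.4},
        ("a",): {"x": 0.55, "y": 0.45},
        ("b",): {"z": 0.9, "x": 0.1},
    }
    probs = table.get(prefix, {"</s>": 1.0})
    return {tok: math.log(p) for tok, p in probs.items()}

greedy = beam_search(toy_model, beam_size=1, max_len=5)
wider = beam_search(toy_model, beam_size=2, max_len=5)
```

With a beam of one, the search commits to "a" because it looks best locally and ends with probability 0.33. With a beam of two it also keeps "b" and finds the sequence "b z" with probability 0.36, which shows how early pruning can discard the better translation.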
This article describes the technology behind Lilt’s interactive translation suggestions. The details were first published in an academic conference paper, Models and Inference for Prefix-Constrained Machine Translation. Machine translation systems can translate whole sentences or documents, but they can also be used to finish translations that were started by a person, a form of autocomplete at the sentence level. In the computational linguistics literature, predicting the rest of a sentence is called prefix-constrained machine translation. The prefix of a sentence is the portion authored by a translator. A suffix is suggested by the machine to complete the translation. These suggestions are proposed interactively to translators after each word they type. Translators can accept all or part of the proposed suffix with a single keystroke, saving time by automating the most predictable parts of the translation process.
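The idea of completing a translator's prefix can be sketched as decoding that starts from the typed words instead of from an empty sentence. This is an illustrative simplification, not Lilt's system or the paper's inference procedure: it uses greedy decoding, and `suggest_suffix` and the toy word table are hypothetical names and data invented for the example.

```python
def suggest_suffix(prefix, next_token_scores, max_len=20, eos="</s>"):
    """Greedily propose a suffix that continues the translator's prefix.

    prefix:            tokens already typed by the translator (kept as-is)
    next_token_scores: function mapping a tuple of tokens to {token: score}
    """
    hypothesis = list(prefix)
    suffix = []
    while len(hypothesis) < max_len:
        scores = next_token_scores(tuple(hypothesis))
        token = max(scores, key=scores.get)
        if token == eos:
            break
        hypothesis.append(token)
        suffix.append(token)
    return suffix

def toy_scores(tokens):
    # Hypothetical model that conditions only on the previous word.
    table = {
        None: {"the": 0.9, "a": 0.1},
        "the": {"house": 0.7, "home": 0.3},
        "a": {"house": 0.6, "home": 0.4},
        "house": {"is": 1.0},
        "home": {"is": 1.0},
        "is": {"small": 0.8, "little": 0.2},
    }
    previous = tokens[-1] if tokens else None
    return table.get(previous, {"</s>": 1.0})

unconstrained = suggest_suffix((), toy_scores)
constrained = suggest_suffix(("a", "home"), toy_scores)
```

Without a prefix the toy model proposes "the house is small". Once the translator has typed "a home", the suggestion respects that choice and proposes only the remaining words "is small".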