What We’re Reading: Learning to Decode for Future Success

by Spence Green
1 Minute Read

When doing beam search in sequence-to-sequence models, one explores next words in order of their likelihood. However, during decoding there may be other constraints to satisfy or objectives to maximize: for example, sequence length, BLEU score, or mutual information between the target and source sentences. To accommodate these additional desiderata, the authors add a term Q to the likelihood that captures the appropriate criterion, and then choose words based on this combined objective.

The difficulty here is that we don't know the values of these quantities until decoding is complete. E.g., we don't know how long the output sequence will be until we have actually finished decoding the sentence. To solve this, the authors learn Q as a function with the following inputs: the source sentence, the prefix of previously output target symbols, and the current hidden state of the decoder. Based on this information, it predicts the quantity in question; in the sequence-length example, it predicts the number of output tokens that the decoder will generate.
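The scoring change can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `alpha` is a hypothetical interpolation weight, and the Q values are stand-ins for the output of the learned predictor.

```python
def combined_score(log_likelihood, q_value, alpha=0.5):
    """Score a partial hypothesis as its log-likelihood plus a weighted
    estimate Q of a future quantity (e.g. a predicted length bonus).
    alpha is an illustrative interpolation weight."""
    return log_likelihood + alpha * q_value

# Two partial hypotheses: "b" is slightly less likely under the model,
# but Q predicts a better final outcome, so it wins the combined score.
hyp_a = combined_score(log_likelihood=-2.0, q_value=0.1)
hyp_b = combined_score(log_likelihood=-2.2, q_value=0.8)
best = max([("a", hyp_a), ("b", hyp_b)], key=lambda t: t[1])
```

Under the plain likelihood, hypothesis "a" would be preferred; the Q term flips the decision, which is exactly the kind of future-aware reranking the paper is after.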

Paper: Learning to Decode for Future Success

Authors: Jiwei Li, Will Monroe, Dan Jurafsky

Publication: Stanford University

What We’re Reading: Single-Queue Decoding for Neural Machine Translation

1 Minute Read

The most popular way of finding a translation for a source sentence with a neural sequence-to-sequence model is a simple beam search. The target sentence is predicted one word at a time, and after each prediction a fixed number of possibilities (typically between 4 and 10) is retained for further exploration. This strategy can be suboptimal, as these local hard decisions do not take the remainder of the translation into account and cannot be reverted later on.
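The procedure described above can be sketched concretely. Here is a minimal toy beam search over a hand-written next-word distribution standing in for a neural decoder's softmax; the vocabulary, probabilities, and beam size are all illustrative, not from the paper.

```python
import heapq
import math

# Toy next-word log-probabilities, keyed by the previous word.
# In a real system these would come from the decoder network.
LOGPROBS = {
    "<s>": {"the": math.log(0.6), "a": math.log(0.4)},
    "the": {"cat": math.log(0.7), "dog": math.log(0.2), "</s>": math.log(0.1)},
    "a":   {"cat": math.log(0.3), "dog": math.log(0.6), "</s>": math.log(0.1)},
    "cat": {"</s>": math.log(1.0)},
    "dog": {"</s>": math.log(1.0)},
}

def beam_search(beam_size=2, max_len=5):
    """Keep only the top `beam_size` partial hypotheses after each step:
    the hard, irreversible local pruning described above."""
    beams = [(0.0, ["<s>"])]
    finished = []
    for _ in range(max_len):
        candidates = []
        for score, toks in beams:
            for word, lp in LOGPROBS.get(toks[-1], {}).items():
                cand = (score + lp, toks + [word])
                (finished if word == "</s>" else candidates).append(cand)
        beams = heapq.nlargest(beam_size, candidates)
        if not beams:
            break
    return max(finished) if finished else max(beams)

score, tokens = beam_search()
```

Note that any hypothesis pruned at an early step is gone for good, even if its continuations would have scored better overall; that is the suboptimality the single-queue approach targets.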


What We’re Reading: Neural Machine Translation with Reconstruction

1 Minute Read

Neural MT systems generate translations one word at a time. They can still generate fluent translations because they choose each word based on all of the words generated so far. Typically, these systems are trained only to generate the next word correctly, given all previous words. One systematic problem with this word-by-word approach to training and translating is that the translations are often too short and omit important content. In the paper Neural Machine Translation with Reconstruction, the authors describe a clever new way to train and translate. During training, their system is encouraged not only to generate each next word correctly but also to correctly regenerate the original source sentence from the translation it produced. In this way, the model is rewarded for generating a translation that is sufficient to describe all of the content in the original source.
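The effect of adding a reconstruction term can be sketched with toy numbers. This is an illustration of the idea only: `lam` is a hypothetical weight on the reconstruction loss, and the loss values are made up to show how the joint objective penalizes a translation that omits content.

```python
def training_loss(translation_nll, reconstruction_nll, lam=1.0):
    """Joint objective: the usual translation loss plus a weighted
    loss for reconstructing the source from the translation.
    lam is an illustrative weight, not the paper's setting."""
    return translation_nll + lam * reconstruction_nll

# Illustrative numbers: a short translation can look good under the
# translation loss alone, but reconstructing the source from it is hard.
short = training_loss(translation_nll=1.0, reconstruction_nll=5.0)
full = training_loss(translation_nll=1.5, reconstruction_nll=0.5)
```

Under the translation loss alone the short output wins (1.0 < 1.5); with the reconstruction term added, the fuller translation is preferred, which is how the model gets rewarded for covering the source content.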
