The most popular way of finding a translation for a source sentence with a neural sequence-to-sequence model is a simple beam search. The target sentence is predicted one word at a time, and after each prediction a fixed number of possibilities (typically between 4 and 10) is retained for further exploration. This strategy can be suboptimal: these local hard decisions do not take the remainder of the translation into account and cannot be reverted later on.
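For concreteness, here is a minimal beam-search sketch in Python. The decoder step `log_probs_next` (and the toy `toy_step` used to exercise it) are hypothetical stand-ins: a real NMT decoder would condition on the encoded source sentence and return log-probabilities over a full vocabulary.

```python
from heapq import nlargest

EOS = "</s>"

def beam_search(log_probs_next, beam_size=5, max_len=20):
    """Decode left to right, keeping `beam_size` partial hypotheses per step."""
    beams = [([], 0.0)]          # (token list, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for word, lp in log_probs_next(tokens):
                candidates.append((tokens + [word], score + lp))
        # The hard local decision: only the top `beam_size` expansions survive,
        # and everything pruned here is gone for good.
        beams = nlargest(beam_size, candidates, key=lambda c: c[1])
        still_open = []
        for hyp in beams:
            (finished if hyp[0][-1] == EOS else still_open).append(hyp)
        beams = still_open
        if not beams:            # every surviving hypothesis has ended
            break
    finished.extend(beams)       # hypotheses that hit max_len without EOS
    return max(finished, key=lambda c: c[1])

# Toy stand-in for a real decoder step, for illustration only.
def toy_step(prefix):
    return [("hello", -0.5), ("world", -1.0), (EOS, -1.5)]

print(beam_search(toy_step, beam_size=2, max_len=5))
```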
In this paper, the authors propose to tackle this problem by keeping a single priority queue of partial hypotheses of different lengths alive. If a previous decision turns out to be suboptimal a few words later, it is now possible to revisit it. To decide when this is warranted, the authors use an additional tiny neural network that predicts the expected sentence length for each source sentence and each partial translation. If the two differ significantly, this is taken as an indicator that a previous decision should be revisited.
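The sketch below illustrates this single-queue idea under loose assumptions: the length predictor appears only through its output `predicted_len`, the scoring function (per-token log-probability minus a length-mismatch penalty) is an illustrative choice rather than the authors' exact formula, and `toy_step` again stands in for a real decoder step.

```python
import heapq

EOS = "</s>"

def single_queue_decode(log_probs_next, predicted_len,
                        max_expansions=200, topk=3, alpha=0.1):
    """One priority queue over partial hypotheses of all lengths."""

    def priority(tokens, logp):
        n = max(len(tokens), 1)
        # Per-token score minus a penalty for straying from the predicted
        # target length (an illustrative penalty, not the paper's exact term).
        return logp / n - alpha * abs(predicted_len - n)

    heap = [(-priority([], 0.0), [], 0.0)]   # min-heap: store negated scores
    best = None
    for _ in range(max_expansions):
        if not heap:
            break
        neg, tokens, logp = heapq.heappop(heap)
        if tokens and tokens[-1] == EOS:
            if best is None or -neg > best[0]:
                best = (-neg, tokens)
            continue
        # Expand only the current front hypothesis; everything else, including
        # shorter prefixes from earlier steps, stays in the queue and can be
        # revisited later if its score becomes competitive again.
        for word, lp in sorted(log_probs_next(tokens),
                               key=lambda wl: wl[1], reverse=True)[:topk]:
            new_tokens, new_logp = tokens + [word], logp + lp
            heapq.heappush(heap, (-priority(new_tokens, new_logp),
                                  new_tokens, new_logp))
    return best

# Toy stand-in for a real decoder step, for illustration only.
def toy_step(prefix):
    return [("hello", -0.5), ("world", -1.0), (EOS, -1.5)]

print(single_queue_decode(toy_step, predicted_len=3, max_expansions=50))
```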
Paper: Single-Queue Decoding for Neural Machine Translation
Authors: Raphael Shu, Hideki Nakayama
Affiliation: University of Tokyo