The Encoder-Decoder Model

The sequence-to-sequence labeling problem is to algorithmically map a sequence over one alphabet to a “good” sequence over another alphabet. The two alphabets may differ, as may the lengths of the two sequences. Implicit in this definition is that there is some way to distinguish good mappings from bad ones.

Let’s see some noteworthy examples.

Machine translation: Translate sequences of words from one natural language to another, such as from English to French. Each language’s lexicon forms its alphabet.

Conversational bot: A human inputs a sentence or question, and the bot supplies a suitable response. In this setting, the input and output alphabets are the same: the human and the bot are “speaking the same language”.

Summarization: Input a long sentence, a paragraph, or a longer text and output a summary of its contents.

Part-of-speech tagging: Input a sequence of words in a language (say, English) and output the sequence of their POS tags (noun, verb, adjective, etc.).

Speech recognition: Input a sequence of spoken phonemes and output the text they represent.

This labeling problem is a machine learning problem because it involves predicting an output sequence for a given input sequence. Specifically, it is a form of supervised learning, with the twist that both the input and the output are sequences.
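To make the supervised framing concrete, here is a minimal sketch of what training data for two of the examples above might look like. The sentence pairs, the tag names, and the helper functions `alphabets` and `same_length` are all made up for illustration; they are not taken from any particular dataset or library.

```python
# Toy training pairs (invented for illustration): each example is
# (input sequence, output sequence).
pos_pairs = [
    (["the", "dog", "barks"], ["DET", "NOUN", "VERB"]),
    (["a", "cat", "sleeps"], ["DET", "NOUN", "VERB"]),
]
translation_pairs = [
    (["the", "dog", "barks"], ["le", "chien", "aboie"]),
    (["I", "am", "hungry"], ["j'ai", "faim"]),
]


def alphabets(pairs):
    """Collect the input and output alphabets seen in a set of pairs."""
    source = {tok for src, _ in pairs for tok in src}
    target = {tok for _, tgt in pairs for tok in tgt}
    return source, target


def same_length(pairs):
    """True if every output sequence is as long as its input sequence."""
    return all(len(src) == len(tgt) for src, tgt in pairs)


pos_source, pos_target = alphabets(pos_pairs)
print(sorted(pos_target))              # the tag alphabet
print(same_length(pos_pairs))          # tagging keeps lengths aligned
print(same_length(translation_pairs))  # translation need not
```

In this toy data, part-of-speech tagging emits exactly one tag per word, whereas the translation pairs show that input and output lengths can differ.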

What makes this problem hard?

Below we focus on machine translation.

Lexicons are huge: Take English and French. Each has a very large vocabulary. Which English words map to which French ones, among those that can be mapped at all?

Lexicons may not be mappable: The two languages may be so different that their lexicons cannot be mapped word for word. That is, what one language expresses concisely in a single word may require several words in the other.

It’s not just words: Consider the phrase “data mining”. It has a very specific meaning that emerges from the two words “data” and “mining” appearing next to each other. (In NLP, such a word combination is called a collocation.) Clearly, we need to somehow capture this meaning so it’s not lost during translation.
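One simple way to keep a collocation intact, sketched here purely as an illustration, is to merge it into a single token before any further processing. The function `merge_collocations` and the set `KNOWN_COLLOCATIONS` are hypothetical names, and the collocation list is assumed to be supplied by hand; this is not presented as how an encoder-decoder model handles the issue.

```python
def merge_collocations(tokens, collocations):
    """Greedily join adjacent word pairs found in `collocations` into one token."""
    merged = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])
        if len(pair) == 2 and pair in collocations:
            # Treat the two words as a single unit, e.g. "data_mining".
            merged.append("_".join(pair))
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


# Hypothetical, hand-written list of collocations.
KNOWN_COLLOCATIONS = {("data", "mining")}

tokens = "she studies data mining".split()
result = merge_collocations(tokens, KNOWN_COLLOCATIONS)
print(result)
```

Here the two words only merge when they appear adjacent and in that order, so “mining data” would be left as two separate tokens.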

This is just the tip of the iceberg. More nuanced concepts require more elaborate sequences of words to express them. For these not to get lost during translation, we need to somehow capture their essence.

As Supervised Machine Learning

The sequence-to-sequence labeling problem may be framed as learning to predict the output sequence for a given…
