240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx

Jin-Woo Jeong
Network Science Lab
Dept. of Mathematics
The Catholic University of Korea
E-mail: zeus0208b@gmail.com
Ilya Sutskever, Oriol Vinyals, Quoc V.
Le

1
 Previous work
• LSTM
 Introduction
• Motivation
• Introduction
 Model
 Experiment
• Dataset
• Training details
• Experimental results
Conclusion
Q/A

2
Previous work
LSTM
• In the previous presentation, I built a stock price prediction program using LSTM. However, I only checked
the prediction visually and did not measure the accuracy. For time series data, similar to regression, it is
difficult to measure accuracy, so I used MAE to evaluate it. For numerical comparison, I measured the
training MAE and the test MAE.

3
Introduction
Motivation
• DNNs can only be applied to problems whose inputs and targets can be sensibly encoded with vectors of
fixed dimensionality. It is a significant limitation, since many important problems are best expressed with
sequences whose lengths are not known a-priori.
• It is therefore clear that a domain-independent method that learns to map sequences to sequences would
be useful.
• In this paper, they show that a straightforward application of the Long Short-Term Memory (LSTM)
architecture can solve general sequence to sequence problems.

4
Introduction
Introduction
• The idea is to use one LSTM to read the input sequence, one timestep at a time, to obtain large fixed
dimensional vector representation, and then to use another LSTM to extract the output sequence from that
vector.
• The model stops making predictions after outputting the end-of-sentence token.
• And LSTM reads the input sentence in reverse, because doing so introduces many short term dependencies
in the data that make the optimization problem much easier.

5
Introduction
Introduction
• Here is a figure for better understanding. The source sentence enters the encoder, and one
context vector with the information of the sentence enters the decoder for translation, and then
the translation proceeds until <eos> is obtained.
• Encoders and decoders have different parameters (weights).
• In this figure, the source sentences are in their original order, but they often
mention that reversing the tokens in the sentences greatly improved the
performance of the model.
• they said that the simple trick of reversing the words in the source sentence is
one of the key technical contributions of this work.

6
Model
Model
The goal of the LSTM is to estimate the conditional probability p(𝑦1, ⋯ , 𝑦𝑇′ |𝑥1, ⋯ , 𝑥𝑇 ) where
(𝑥1, ⋯ , 𝑥𝑇 ) is an input sequence and 𝑦1, ⋯ , 𝑦𝑇′ is its corresponding output sequence whose length T′ may
differ from T . The LSTM computes this conditional probability by first obtaining the fixed dimensional
representation 𝑣 of the input sequence (𝑥1, ⋯ , 𝑥𝑇 ) given ›by the last hidden state of the LSTM, and then
computing the probability of (𝑦1, ⋯ , 𝑦𝑇′ ) with a standard LSTM-LM formulation whose initial hidden state
is set to the representation v of 𝑥1, ⋯ , 𝑥𝑇 :
In this equation, each p(𝑦𝑡|𝑣, 𝑦1, ⋯ , 𝑦𝑡−1 ) distribution is represented with a softmax over all the words in
the vocabulary.

7
Experiments
Experiments
• Dataset: WMT’14 English to French MT task
• In Experiments, they used two different LSTMs(Encoder & Decoder) and they choose LSTM with four
layers and they reverse the order of the words of the input sentence. (but not the target sentences in
training and test set)
• Training details
• Initialize all of the LSTM’s parameters with the uniform distribution between -0.08 and 0.08
• Uses stochastic gradient descent without momentum, with fixed learning rate of 0.7. After 5 epochs,
they began halving the learning rate every half epoch. And they trained models for a total of 7.5
epochs.
• Uses batches of 128 sequences.
• They made sure that all sentences in a minibatch are roughly of the same length.

8
Experiments
Experimental Results

9
Experiments
SMT : Statistical Machine Translation

10
Experiments

11
Experiments

12
Conclusion
Conclusion
• The success of their simple LSTM-based approach on MT suggests that it should do well on many other
sequence learning problems, provided they have enough training data.
• We conclude that it is important to find a problem encoding that has the greatest number of short term
dependencies, as they make the learning problem much simpler.
• These results suggest that their approach will likely do well on other challenging sequence to sequence
problems.

240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx

Recommended

Recommended

More Related Content

Similar to 240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx

Similar to 240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx (20)

More from thanhdowork

More from thanhdowork (20)

Recently uploaded

Recently uploaded (20)

240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx

Editor's Notes