SlideShare a Scribd company logo
1 of 14
Jin-Woo Jeong
Network Science Lab
Dept. of Mathematics
The Catholic University of Korea
E-mail: zeus0208b@gmail.com
Ilya Sutskever, Oriol Vinyals, Quoc V.
Le
1
 Previous work
• LSTM
 Introduction
• Motivation
• Introduction
 Model
 Experiment
• Dataset
• Training details
• Experimental results
Conclusion
Q/A
2
Previous work
LSTM
• In the previous presentation, I built a stock price prediction program using LSTM. However, I only checked
the prediction visually and did not measure the accuracy. For time series data, similar to regression, it is
difficult to measure accuracy, so I used MAE to evaluate it. For numerical comparison, I measured the
training MAE and the test MAE.
3
Introduction
Motivation
• DNNs can only be applied to problems whose inputs and targets can be sensibly encoded with vectors of
fixed dimensionality. It is a significant limitation, since many important problems are best expressed with
sequences whose lengths are not known a-priori.
• It is therefore clear that a domain-independent method that learns to map sequences to sequences would
be useful.
• In this paper, they show that a straightforward application of the Long Short-Term Memory (LSTM)
architecture can solve general sequence to sequence problems.
4
Introduction
Introduction
• The idea is to use one LSTM to read the input sequence, one timestep at a time, to obtain large fixed
dimensional vector representation, and then to use another LSTM to extract the output sequence from that
vector.
• The model stops making predictions after outputting the end-of-sentence token.
• And LSTM reads the input sentence in reverse, because doing so introduces many short term dependencies
in the data that make the optimization problem much easier.
5
Introduction
Introduction
• Here is a figure for better understanding. The source sentence enters the encoder, and one
context vector with the information of the sentence enters the decoder for translation, and then
the translation proceeds until <eos> is obtained.
• Encoders and decoders have different parameters (weights).
• In this figure, the source sentences are in their original order, but they often
mention that reversing the tokens in the sentences greatly improved the
performance of the model.
• they said that the simple trick of reversing the words in the source sentence is
one of the key technical contributions of this work.
6
Model
Model
The goal of the LSTM is to estimate the conditional probability p(𝑦1, ⋯ , 𝑦𝑇′ |𝑥1, ⋯ , 𝑥𝑇 ) where
(𝑥1, ⋯ , 𝑥𝑇 ) is an input sequence and 𝑦1, ⋯ , 𝑦𝑇′ is its corresponding output sequence whose length T′ may
differ from T . The LSTM computes this conditional probability by first obtaining the fixed dimensional
representation 𝑣 of the input sequence (𝑥1, ⋯ , 𝑥𝑇 ) given ›by the last hidden state of the LSTM, and then
computing the probability of (𝑦1, ⋯ , 𝑦𝑇′ ) with a standard LSTM-LM formulation whose initial hidden state
is set to the representation v of 𝑥1, ⋯ , 𝑥𝑇 :
In this equation, each p(𝑦𝑡|𝑣, 𝑦1, ⋯ , 𝑦𝑡−1 ) distribution is represented with a softmax over all the words in
the vocabulary.
7
Experiments
Experiments
• Dataset: WMT’14 English to French MT task
• In Experiments, they used two different LSTMs(Encoder & Decoder) and they choose LSTM with four
layers and they reverse the order of the words of the input sentence. (but not the target sentences in
training and test set)
• Training details
• Initialize all of the LSTM’s parameters with the uniform distribution between -0.08 and 0.08
• Uses stochastic gradient descent without momentum, with fixed learning rate of 0.7. After 5 epochs,
they began halving the learning rate every half epoch. And they trained models for a total of 7.5
epochs.
• Uses batches of 128 sequences.
• They made sure that all sentences in a minibatch are roughly of the same length.
8
Experiments
Experimental Results
9
Experiments
Experimental Results
SMT : Statistical Machine Translation
10
Experiments
Experimental Results
11
Experiments
Experimental Results
12
Conclusion
Conclusion
• The success of their simple LSTM-based approach on MT suggests that it should do well on many other
sequence learning problems, provided they have enough training data.
• We conclude that it is important to find a problem encoding that has the greatest number of short term
dependencies, as they make the learning problem much simpler.
• These results suggest that their approach will likely do well on other challenging sequence to sequence
problems.
13
Q & A
Q / A

More Related Content

Similar to 240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx

Transfer Learning in NLP: A Survey
Transfer Learning in NLP: A SurveyTransfer Learning in NLP: A Survey
Transfer Learning in NLP: A SurveyNUPUR YADAV
 
ELLA LC algorithm presentation in ICIP 2016
ELLA LC algorithm presentation in ICIP 2016ELLA LC algorithm presentation in ICIP 2016
ELLA LC algorithm presentation in ICIP 2016InVID Project
 
From_seq2seq_to_BERT
From_seq2seq_to_BERTFrom_seq2seq_to_BERT
From_seq2seq_to_BERTHuali Zhao
 
Polynomial Tensor Sketch for Element-wise Matrix Function (ICML 2020)
Polynomial Tensor Sketch for Element-wise Matrix Function (ICML 2020)Polynomial Tensor Sketch for Element-wise Matrix Function (ICML 2020)
Polynomial Tensor Sketch for Element-wise Matrix Function (ICML 2020)ALINLAB
 
Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Armando Vieira
 
New Microsoft PowerPoint Presentation (2).pptx
New Microsoft PowerPoint Presentation (2).pptxNew Microsoft PowerPoint Presentation (2).pptx
New Microsoft PowerPoint Presentation (2).pptxpraveen kumar
 
2019 Levenshtein Transformer
2019 Levenshtein Transformer2019 Levenshtein Transformer
2019 Levenshtein Transformer広樹 本間
 
Performance of Matching Algorithmsfor Signal Approximation
Performance of Matching Algorithmsfor Signal ApproximationPerformance of Matching Algorithmsfor Signal Approximation
Performance of Matching Algorithmsfor Signal Approximationiosrjce
 
Superefficient Monte Carlo Simulations
Superefficient Monte Carlo SimulationsSuperefficient Monte Carlo Simulations
Superefficient Monte Carlo SimulationsCheng-An Yang
 
A NEAT Way for Evolving Echo State Networks
A NEAT Way for Evolving Echo State NetworksA NEAT Way for Evolving Echo State Networks
A NEAT Way for Evolving Echo State NetworksKyriakos Chatzidimitriou
 
Presentation File of paper "Leveraging Normalization Layer in Adapters With P...
Presentation File of paper "Leveraging Normalization Layer in Adapters With P...Presentation File of paper "Leveraging Normalization Layer in Adapters With P...
Presentation File of paper "Leveraging Normalization Layer in Adapters With P...dyyjkd
 
Information retrieval as statistical translation
Information retrieval as statistical translationInformation retrieval as statistical translation
Information retrieval as statistical translationBhavesh Singh
 
Intel Cluster Poisson Solver Library
Intel Cluster Poisson Solver LibraryIntel Cluster Poisson Solver Library
Intel Cluster Poisson Solver LibraryIlya Kryukov
 
Incremental Sense Weight Training for In-depth Interpretation of Contextualiz...
Incremental Sense Weight Training for In-depth Interpretation of Contextualiz...Incremental Sense Weight Training for In-depth Interpretation of Contextualiz...
Incremental Sense Weight Training for In-depth Interpretation of Contextualiz...Jinho Choi
 
Initialization methods for the tsp with time windows using variable neighborh...
Initialization methods for the tsp with time windows using variable neighborh...Initialization methods for the tsp with time windows using variable neighborh...
Initialization methods for the tsp with time windows using variable neighborh...Konstantinos Giannakis
 

Similar to 240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx (20)

B-G-3
B-G-3B-G-3
B-G-3
 
lecture_11.pptx
lecture_11.pptxlecture_11.pptx
lecture_11.pptx
 
Transfer Learning in NLP: A Survey
Transfer Learning in NLP: A SurveyTransfer Learning in NLP: A Survey
Transfer Learning in NLP: A Survey
 
Searching Algorithms
Searching AlgorithmsSearching Algorithms
Searching Algorithms
 
ICSE20_Tao_slides.pptx
ICSE20_Tao_slides.pptxICSE20_Tao_slides.pptx
ICSE20_Tao_slides.pptx
 
ELLA LC algorithm presentation in ICIP 2016
ELLA LC algorithm presentation in ICIP 2016ELLA LC algorithm presentation in ICIP 2016
ELLA LC algorithm presentation in ICIP 2016
 
From_seq2seq_to_BERT
From_seq2seq_to_BERTFrom_seq2seq_to_BERT
From_seq2seq_to_BERT
 
Polynomial Tensor Sketch for Element-wise Matrix Function (ICML 2020)
Polynomial Tensor Sketch for Element-wise Matrix Function (ICML 2020)Polynomial Tensor Sketch for Element-wise Matrix Function (ICML 2020)
Polynomial Tensor Sketch for Element-wise Matrix Function (ICML 2020)
 
Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio Hidden Layer Leraning Vector Quantizatio
Hidden Layer Leraning Vector Quantizatio
 
New Microsoft PowerPoint Presentation (2).pptx
New Microsoft PowerPoint Presentation (2).pptxNew Microsoft PowerPoint Presentation (2).pptx
New Microsoft PowerPoint Presentation (2).pptx
 
2019 Levenshtein Transformer
2019 Levenshtein Transformer2019 Levenshtein Transformer
2019 Levenshtein Transformer
 
Performance of Matching Algorithmsfor Signal Approximation
Performance of Matching Algorithmsfor Signal ApproximationPerformance of Matching Algorithmsfor Signal Approximation
Performance of Matching Algorithmsfor Signal Approximation
 
Superefficient Monte Carlo Simulations
Superefficient Monte Carlo SimulationsSuperefficient Monte Carlo Simulations
Superefficient Monte Carlo Simulations
 
L010628894
L010628894L010628894
L010628894
 
A NEAT Way for Evolving Echo State Networks
A NEAT Way for Evolving Echo State NetworksA NEAT Way for Evolving Echo State Networks
A NEAT Way for Evolving Echo State Networks
 
Presentation File of paper "Leveraging Normalization Layer in Adapters With P...
Presentation File of paper "Leveraging Normalization Layer in Adapters With P...Presentation File of paper "Leveraging Normalization Layer in Adapters With P...
Presentation File of paper "Leveraging Normalization Layer in Adapters With P...
 
Information retrieval as statistical translation
Information retrieval as statistical translationInformation retrieval as statistical translation
Information retrieval as statistical translation
 
Intel Cluster Poisson Solver Library
Intel Cluster Poisson Solver LibraryIntel Cluster Poisson Solver Library
Intel Cluster Poisson Solver Library
 
Incremental Sense Weight Training for In-depth Interpretation of Contextualiz...
Incremental Sense Weight Training for In-depth Interpretation of Contextualiz...Incremental Sense Weight Training for In-depth Interpretation of Contextualiz...
Incremental Sense Weight Training for In-depth Interpretation of Contextualiz...
 
Initialization methods for the tsp with time windows using variable neighborh...
Initialization methods for the tsp with time windows using variable neighborh...Initialization methods for the tsp with time windows using variable neighborh...
Initialization methods for the tsp with time windows using variable neighborh...
 

More from thanhdowork

[20240429_LabSeminar_Huy]Spatio-Temporal Graph Neural Point Process for Traff...
[20240429_LabSeminar_Huy]Spatio-Temporal Graph Neural Point Process for Traff...[20240429_LabSeminar_Huy]Spatio-Temporal Graph Neural Point Process for Traff...
[20240429_LabSeminar_Huy]Spatio-Temporal Graph Neural Point Process for Traff...thanhdowork
 
240429_Thanh_LabSeminar[TranSG: Transformer-Based Skeleton Graph Prototype Co...
240429_Thanh_LabSeminar[TranSG: Transformer-Based Skeleton Graph Prototype Co...240429_Thanh_LabSeminar[TranSG: Transformer-Based Skeleton Graph Prototype Co...
240429_Thanh_LabSeminar[TranSG: Transformer-Based Skeleton Graph Prototype Co...thanhdowork
 
240429_Thuy_Labseminar[Simplifying and Empowering Transformers for Large-Grap...
240429_Thuy_Labseminar[Simplifying and Empowering Transformers for Large-Grap...240429_Thuy_Labseminar[Simplifying and Empowering Transformers for Large-Grap...
240429_Thuy_Labseminar[Simplifying and Empowering Transformers for Large-Grap...thanhdowork
 
240422_Thanh_LabSeminar[Dynamic Graph Enhanced Contrastive Learning for Chest...
240422_Thanh_LabSeminar[Dynamic Graph Enhanced Contrastive Learning for Chest...240422_Thanh_LabSeminar[Dynamic Graph Enhanced Contrastive Learning for Chest...
240422_Thanh_LabSeminar[Dynamic Graph Enhanced Contrastive Learning for Chest...thanhdowork
 
[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptx[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptxthanhdowork
 
240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...
240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...
240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...thanhdowork
 
[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...
[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...
[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...thanhdowork
 
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...thanhdowork
 
240415_Thuy_Labseminar[Simple and Asymmetric Graph Contrastive Learning witho...
240415_Thuy_Labseminar[Simple and Asymmetric Graph Contrastive Learning witho...240415_Thuy_Labseminar[Simple and Asymmetric Graph Contrastive Learning witho...
240415_Thuy_Labseminar[Simple and Asymmetric Graph Contrastive Learning witho...thanhdowork
 
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...thanhdowork
 
240122_Attention Is All You Need (2017 NIPS)2.pptx
240122_Attention Is All You Need (2017 NIPS)2.pptx240122_Attention Is All You Need (2017 NIPS)2.pptx
240122_Attention Is All You Need (2017 NIPS)2.pptxthanhdowork
 
240226_Thanh_LabSeminar[Structure-Aware Transformer for Graph Representation ...
240226_Thanh_LabSeminar[Structure-Aware Transformer for Graph Representation ...240226_Thanh_LabSeminar[Structure-Aware Transformer for Graph Representation ...
240226_Thanh_LabSeminar[Structure-Aware Transformer for Graph Representation ...thanhdowork
 
[20240304_LabSeminar_Huy]DeepWalk: Online Learning of Social Representations....
[20240304_LabSeminar_Huy]DeepWalk: Online Learning of Social Representations....[20240304_LabSeminar_Huy]DeepWalk: Online Learning of Social Representations....
[20240304_LabSeminar_Huy]DeepWalk: Online Learning of Social Representations....thanhdowork
 
240304_Thanh_LabSeminar[Pure Transformers are Powerful Graph Learners].pptx
240304_Thanh_LabSeminar[Pure Transformers are Powerful Graph Learners].pptx240304_Thanh_LabSeminar[Pure Transformers are Powerful Graph Learners].pptx
240304_Thanh_LabSeminar[Pure Transformers are Powerful Graph Learners].pptxthanhdowork
 
240304_Thuy_Labseminar[SimGRACE: A Simple Framework for Graph Contrastive Lea...
240304_Thuy_Labseminar[SimGRACE: A Simple Framework for Graph Contrastive Lea...240304_Thuy_Labseminar[SimGRACE: A Simple Framework for Graph Contrastive Lea...
240304_Thuy_Labseminar[SimGRACE: A Simple Framework for Graph Contrastive Lea...thanhdowork
 
[20240311_LabSeminar_Huy]LINE: Large-scale Information Network Embedding.pptx
[20240311_LabSeminar_Huy]LINE: Large-scale Information Network Embedding.pptx[20240311_LabSeminar_Huy]LINE: Large-scale Information Network Embedding.pptx
[20240311_LabSeminar_Huy]LINE: Large-scale Information Network Embedding.pptxthanhdowork
 
240311_Thanh_LabSeminar[Translating Embeddings for Modeling Multi-relational ...
240311_Thanh_LabSeminar[Translating Embeddings for Modeling Multi-relational ...240311_Thanh_LabSeminar[Translating Embeddings for Modeling Multi-relational ...
240311_Thanh_LabSeminar[Translating Embeddings for Modeling Multi-relational ...thanhdowork
 
240311_Thuy_Labseminar[Contrastive Multi-View Representation Learning on Grap...
240311_Thuy_Labseminar[Contrastive Multi-View Representation Learning on Grap...240311_Thuy_Labseminar[Contrastive Multi-View Representation Learning on Grap...
240311_Thuy_Labseminar[Contrastive Multi-View Representation Learning on Grap...thanhdowork
 
240318_JW_labseminar[Attention Is All You Need].pptx
240318_JW_labseminar[Attention Is All You Need].pptx240318_JW_labseminar[Attention Is All You Need].pptx
240318_JW_labseminar[Attention Is All You Need].pptxthanhdowork
 
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...thanhdowork
 

More from thanhdowork (20)

[20240429_LabSeminar_Huy]Spatio-Temporal Graph Neural Point Process for Traff...
[20240429_LabSeminar_Huy]Spatio-Temporal Graph Neural Point Process for Traff...[20240429_LabSeminar_Huy]Spatio-Temporal Graph Neural Point Process for Traff...
[20240429_LabSeminar_Huy]Spatio-Temporal Graph Neural Point Process for Traff...
 
240429_Thanh_LabSeminar[TranSG: Transformer-Based Skeleton Graph Prototype Co...
240429_Thanh_LabSeminar[TranSG: Transformer-Based Skeleton Graph Prototype Co...240429_Thanh_LabSeminar[TranSG: Transformer-Based Skeleton Graph Prototype Co...
240429_Thanh_LabSeminar[TranSG: Transformer-Based Skeleton Graph Prototype Co...
 
240429_Thuy_Labseminar[Simplifying and Empowering Transformers for Large-Grap...
240429_Thuy_Labseminar[Simplifying and Empowering Transformers for Large-Grap...240429_Thuy_Labseminar[Simplifying and Empowering Transformers for Large-Grap...
240429_Thuy_Labseminar[Simplifying and Empowering Transformers for Large-Grap...
 
240422_Thanh_LabSeminar[Dynamic Graph Enhanced Contrastive Learning for Chest...
240422_Thanh_LabSeminar[Dynamic Graph Enhanced Contrastive Learning for Chest...240422_Thanh_LabSeminar[Dynamic Graph Enhanced Contrastive Learning for Chest...
240422_Thanh_LabSeminar[Dynamic Graph Enhanced Contrastive Learning for Chest...
 
[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptx[20240422_LabSeminar_Huy]Taming_Effect.pptx
[20240422_LabSeminar_Huy]Taming_Effect.pptx
 
240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...
240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...
240422_Thuy_Labseminar[Large Graph Property Prediction via Graph Segment Trai...
 
[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...
[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...
[20240415_LabSeminar_Huy]Deciphering Spatio-Temporal Graph Forecasting: A Cau...
 
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...
240315_Thanh_LabSeminar[G-TAD: Sub-Graph Localization for Temporal Action Det...
 
240415_Thuy_Labseminar[Simple and Asymmetric Graph Contrastive Learning witho...
240415_Thuy_Labseminar[Simple and Asymmetric Graph Contrastive Learning witho...240415_Thuy_Labseminar[Simple and Asymmetric Graph Contrastive Learning witho...
240415_Thuy_Labseminar[Simple and Asymmetric Graph Contrastive Learning witho...
 
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
 
240122_Attention Is All You Need (2017 NIPS)2.pptx
240122_Attention Is All You Need (2017 NIPS)2.pptx240122_Attention Is All You Need (2017 NIPS)2.pptx
240122_Attention Is All You Need (2017 NIPS)2.pptx
 
240226_Thanh_LabSeminar[Structure-Aware Transformer for Graph Representation ...
240226_Thanh_LabSeminar[Structure-Aware Transformer for Graph Representation ...240226_Thanh_LabSeminar[Structure-Aware Transformer for Graph Representation ...
240226_Thanh_LabSeminar[Structure-Aware Transformer for Graph Representation ...
 
[20240304_LabSeminar_Huy]DeepWalk: Online Learning of Social Representations....
[20240304_LabSeminar_Huy]DeepWalk: Online Learning of Social Representations....[20240304_LabSeminar_Huy]DeepWalk: Online Learning of Social Representations....
[20240304_LabSeminar_Huy]DeepWalk: Online Learning of Social Representations....
 
240304_Thanh_LabSeminar[Pure Transformers are Powerful Graph Learners].pptx
240304_Thanh_LabSeminar[Pure Transformers are Powerful Graph Learners].pptx240304_Thanh_LabSeminar[Pure Transformers are Powerful Graph Learners].pptx
240304_Thanh_LabSeminar[Pure Transformers are Powerful Graph Learners].pptx
 
240304_Thuy_Labseminar[SimGRACE: A Simple Framework for Graph Contrastive Lea...
240304_Thuy_Labseminar[SimGRACE: A Simple Framework for Graph Contrastive Lea...240304_Thuy_Labseminar[SimGRACE: A Simple Framework for Graph Contrastive Lea...
240304_Thuy_Labseminar[SimGRACE: A Simple Framework for Graph Contrastive Lea...
 
[20240311_LabSeminar_Huy]LINE: Large-scale Information Network Embedding.pptx
[20240311_LabSeminar_Huy]LINE: Large-scale Information Network Embedding.pptx[20240311_LabSeminar_Huy]LINE: Large-scale Information Network Embedding.pptx
[20240311_LabSeminar_Huy]LINE: Large-scale Information Network Embedding.pptx
 
240311_Thanh_LabSeminar[Translating Embeddings for Modeling Multi-relational ...
240311_Thanh_LabSeminar[Translating Embeddings for Modeling Multi-relational ...240311_Thanh_LabSeminar[Translating Embeddings for Modeling Multi-relational ...
240311_Thanh_LabSeminar[Translating Embeddings for Modeling Multi-relational ...
 
240311_Thuy_Labseminar[Contrastive Multi-View Representation Learning on Grap...
240311_Thuy_Labseminar[Contrastive Multi-View Representation Learning on Grap...240311_Thuy_Labseminar[Contrastive Multi-View Representation Learning on Grap...
240311_Thuy_Labseminar[Contrastive Multi-View Representation Learning on Grap...
 
240318_JW_labseminar[Attention Is All You Need].pptx
240318_JW_labseminar[Attention Is All You Need].pptx240318_JW_labseminar[Attention Is All You Need].pptx
240318_JW_labseminar[Attention Is All You Need].pptx
 
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...
[20240318_LabSeminar_Huy]GSTNet: Global Spatial-Temporal Network for Traffic ...
 

Recently uploaded

Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Celine George
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxiammrhaywood
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Krashi Coaching
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docxPoojaSen20
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformChameera Dedduwage
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxheathfieldcps1
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppCeline George
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxNirmalaLoungPoorunde1
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersChitralekhaTherkar
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptxPoojaSen20
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 

Recently uploaded (20)

Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
Incoming and Outgoing Shipments in 1 STEP Using Odoo 17
 
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptxSOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
SOCIAL AND HISTORICAL CONTEXT - LFTVD.pptx
 
Staff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSDStaff of Color (SOC) Retention Efforts DDSD
Staff of Color (SOC) Retention Efforts DDSD
 
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
Kisan Call Centre - To harness potential of ICT in Agriculture by answer farm...
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
mini mental status format.docx
mini    mental       status     format.docxmini    mental       status     format.docx
mini mental status format.docx
 
A Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy ReformA Critique of the Proposed National Education Policy Reform
A Critique of the Proposed National Education Policy Reform
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
The basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptxThe basics of sentences session 2pptx copy.pptx
The basics of sentences session 2pptx copy.pptx
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
URLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website AppURLs and Routing in the Odoo 17 Website App
URLs and Routing in the Odoo 17 Website App
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
Micromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of PowdersMicromeritics - Fundamental and Derived Properties of Powders
Micromeritics - Fundamental and Derived Properties of Powders
 
PSYCHIATRIC History collection FORMAT.pptx
PSYCHIATRIC   History collection FORMAT.pptxPSYCHIATRIC   History collection FORMAT.pptx
PSYCHIATRIC History collection FORMAT.pptx
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 

240311_JW_labseminar[Sequence to Sequence Learning with Neural Networks].pptx

  • 1. Jin-Woo Jeong Network Science Lab Dept. of Mathematics The Catholic University of Korea E-mail: zeus0208b@gmail.com Ilya Sutskever, Oriol Vinyals, Quoc V. Le
  • 2. 1  Previous work • LSTM  Introduction • Motivation • Introduction  Model  Experiment • Dataset • Training details • Experimental results Conclusion Q/A
  • 3. 2 Previous work LSTM • In the previous presentation, I built a stock price prediction program using LSTM. However, I only checked the prediction visually and did not measure the accuracy. For time series data, similar to regression, it is difficult to measure accuracy, so I used MAE to evaluate it. For numerical comparison, I measured the training MAE and the test MAE.
  • 4. 3 Introduction Motivation • DNNs can only be applied to problems whose inputs and targets can be sensibly encoded with vectors of fixed dimensionality. It is a significant limitation, since many important problems are best expressed with sequences whose lengths are not known a-priori. • It is therefore clear that a domain-independent method that learns to map sequences to sequences would be useful. • In this paper, they show that a straightforward application of the Long Short-Term Memory (LSTM) architecture can solve general sequence to sequence problems.
  • 5. 4 Introduction Introduction • The idea is to use one LSTM to read the input sequence, one timestep at a time, to obtain large fixed dimensional vector representation, and then to use another LSTM to extract the output sequence from that vector. • The model stops making predictions after outputting the end-of-sentence token. • And LSTM reads the input sentence in reverse, because doing so introduces many short term dependencies in the data that make the optimization problem much easier.
  • 6. 5 Introduction Introduction • Here is a figure for better understanding. The source sentence enters the encoder, and one context vector with the information of the sentence enters the decoder for translation, and then the translation proceeds until <eos> is obtained. • Encoders and decoders have different parameters (weights). • In this figure, the source sentences are in their original order, but they often mention that reversing the tokens in the sentences greatly improved the performance of the model. • they said that the simple trick of reversing the words in the source sentence is one of the key technical contributions of this work.
  • 7. 6 Model Model The goal of the LSTM is to estimate the conditional probability p(𝑦1, ⋯ , 𝑦𝑇′ |𝑥1, ⋯ , 𝑥𝑇 ) where (𝑥1, ⋯ , 𝑥𝑇 ) is an input sequence and 𝑦1, ⋯ , 𝑦𝑇′ is its corresponding output sequence whose length T′ may differ from T . The LSTM computes this conditional probability by first obtaining the fixed dimensional representation 𝑣 of the input sequence (𝑥1, ⋯ , 𝑥𝑇 ) given ›by the last hidden state of the LSTM, and then computing the probability of (𝑦1, ⋯ , 𝑦𝑇′ ) with a standard LSTM-LM formulation whose initial hidden state is set to the representation v of 𝑥1, ⋯ , 𝑥𝑇 : In this equation, each p(𝑦𝑡|𝑣, 𝑦1, ⋯ , 𝑦𝑡−1 ) distribution is represented with a softmax over all the words in the vocabulary.
  • 8. 7 Experiments Experiments • Dataset: WMT’14 English to French MT task • In Experiments, they used two different LSTMs(Encoder & Decoder) and they choose LSTM with four layers and they reverse the order of the words of the input sentence. (but not the target sentences in training and test set) • Training details • Initialize all of the LSTM’s parameters with the uniform distribution between -0.08 and 0.08 • Uses stochastic gradient descent without momentum, with fixed learning rate of 0.7. After 5 epochs, they began halving the learning rate every half epoch. And they trained models for a total of 7.5 epochs. • Uses batches of 128 sequences. • They made sure that all sentences in a minibatch are roughly of the same length.
  • 10. 9 Experiments Experimental Results SMT : Statistical Machine Translation
  • 13. 12 Conclusion Conclusion • The success of their simple LSTM-based approach on MT suggests that it should do well on many other sequence learning problems, provided they have enough training data. • We conclude that it is important to find a problem encoding that has the greatest number of short term dependencies, as they make the learning problem much simpler. • These results suggest that their approach will likely do well on other challenging sequence to sequence problems.
  • 14. 13 Q & A Q / A

Editor's Notes

  1. As a result, the train MAE is 36won that means that on averages, the difference between predicted stock price and real stock price is about 36won. But the test MAE is almost 5000won. This doesn't look like a good number. However, when we look at the graph, the model is getting the flow of the graph right.
  2. 본 논문에서는 LSTM을 활용한 효율적인 Seq2Seq 기계번역 아키텍처를 제안합니다. Seq2Seq는 딥러닝 기반 기계 번역의 돌파구와 같은 역할을 수행했습니다. In this paper, they propose an efficient Seq2Seq machine translation architecture using LSTMs. Seq2Seq has been a breakthrough in deep learning-based machine translation.
  3. 1000차원의 워드 임베딩 균일한 분포를 따르도록 파라미터를 이니셜라이즈 할 때 -0.08 에서 0.08 사이로 정함. 대부분의 문장이 짧지만, 중간에 긴 문장이 들 어가면 그 배치의 다른 문장들이 패딩 되어야 하기 때문에 계산적으로 낭비가 될 수 있다. 따라서 비슷한 길이의 문장끼리 묶었다. – 학습속도를 높일 수 있었다. While most of the sentences are short, having a long sentence in the middle would be computationally wasteful because the other sentences in that batch would have to be padded. Therefore, they grouped sentences of similar length together. doing so speeds up learning.
  4. BLEU score는 일종의 기계번역 성능지표 중 하나. LSTM에 앙상블 기법만 적용하더라도 충분히 베이스라인 모델보다 좋은 성능을 내고 있다. Applying the ensemble technique to the LSTM alone is sufficient to outperform the baseline model.
  5. 딥러닝 기법을 기존의 통계적 기법과 함께 사용했더니, 좋은 성능을 보였다. 전통적인 SMT기법과 함께 했을 때, Best WMT’14 Result에 근접한 것을 볼 수 있다. When deep learning techniques were used in combination with traditional statistical techniques, they performed well. When combined with the traditional SMT technique, we can see that it is close to the best WMT'14 result.
  6. 이는 LSTM 히든스테이트 값을 PCA를 이용해서 2-dimension에 프로젝션시킨 것이다. 비슷한 의미끼리 잘 클러스터링 된 것을 확인할 수 있다. This is a two-dimensional projection of the LSTM hidden state values using PCA. You can see that similar meanings are well clustered together.
  7. 이는 베이스라인 모델과 비교했을 때, 더 좋은 성능이 나온다는 것을 보여주고 있다. This shows that compared to the baseline model, the performance is better.
  8. MT : machine translation
  9. thank you, the presentation is concluded