@DocXavi
Module 6 - Day 10 - Lecture 1
Self-supervised
Visual Learning
26th March 2020
Xavier Giró-i-Nieto
Deep Learning for Video lectures:
[https://mcv-m6-video.github.io/deepvideo-2020/]
2
Acknowledgements
Víctor
Campos
Junting
Pan
Xunyu
Lin
Sebastian
Palacio
Carlos
Arenas
Oscar
Mañas
Video lecture
3
Lecture from the Master in Computer Vision Barcelona 2019
Motivation
4
5
Outline
1. Unsupervised Learning
2. Self-supervised Learning
3. Predictive methods
4. Contrastive methods
Types of machine learning
Yann LeCun’s Black Forest cake
6
7
A broader picture of types of learning...
Slide inspired by Alex Graves (DeepMind) at
“Unsupervised Learning Tutorial” @ NeurIPS 2018.
                     ...with a teacher             ...without a teacher
Active agent...      Reinforcement learning        Intrinsic motivation /
                     (with extrinsic reward)       Exploration.
Passive agent...     Supervised learning           Unsupervised learning
Supervised learning
8
Slide credit: Kevin McGuinness
ŷ
9
Unsupervised learning
10
Slide credit: Kevin McGuinness
Unsupervised learning
11
● It is how intelligent beings perceive
the world.
● It can save a great deal of effort when building an AI agent,
compared to a fully supervised approach.
● Vast amounts of unlabelled data are available.
12
Video lectures on Unsupervised Learning
Kevin McGuinness, UPC DLCV 2016 Xavier Giró, UPC DLAI 2017
13
Outline
1. Unsupervised Learning
2. Self-supervised Learning
3. Predictive methods
4. Contrastive methods
14
Self-supervised learning
(...)
“Doing this properly and reliably is the greatest
challenge in ML and AI of the next few years in my
opinion.”
15
16
Self-supervised learning
Reference: Andrew Zisserman (PAISS 2018)
Self-supervised learning is a form of unsupervised learning where the data
provides the supervision.
● A pretext (or surrogate) task must be invented by withholding a part of the
unlabeled data and training the NN to predict it.
Unlabeled data
(X)
17
Self-supervised learning
Reference: Andrew Zisserman (PAISS 2018)
Self-supervised learning is a form of unsupervised learning where the data
provides the supervision.
● By defining a proxy loss, the NN learns representations, which should be
valuable for the actual downstream task.
ŷ
loss
Representations learned without labels
18
Transfer Learning
Target data
E.g. PASCAL
Target model
Target labels
Small
amount of
data/labels
Slide credit: Kevin McGuinness
Instead of training a deep network from scratch for your task...
19
Transfer Learning
Source data
E.g. ImageNet
Source model
Source labels
Target data
E.g. PASCAL
Target model
Target labels
Transfer Learned
Knowledge
Large
amount of
data/labels
Small
amount of
data/labels
Slide credit: Kevin McGuinness
Instead of training a deep network from scratch for your task:
● Take a network trained on a different source domain and/or task
● Adapt it for your target domain and/or task
20
Transfer Learning
Slide credit: Kevin McGuinness
1. Train a deep net on a “nearby” task using standard backprop:
● Supervised learning (e.g. ImageNet).
● Self-supervised learning (e.g. a language model).
2. Cut off the top layer(s) of the network and replace them with a
supervised objective for the target domain.
3. Fine-tune the network using backprop with labels for the
target domain until the validation loss starts to increase.
[Figure: conv1, conv2, conv3 and fc1 are first trained on surrogate data with a surrogate loss (fc2 + softmax); fc2 is then replaced by my_fc2 + softmax, trained on real data and real labels with the real loss.]
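The stopping rule of the fine-tuning step (stop when the validation loss starts to increase) can be sketched as a simple early-stopping check. This is only an illustration: the function name `stop_epoch`, the `patience` parameter and the loss values are assumptions, not part of the original slides.

```python
def stop_epoch(val_losses, patience=1):
    """Return the index of the epoch whose weights should be kept.

    Fine-tuning stops once the validation loss has failed to improve
    for `patience` consecutive epochs (a hypothetical parameter).
    """
    best_epoch = 0
    best_loss = float("inf")
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, bad_epochs = epoch, loss, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_epoch


# Made-up validation losses recorded after each fine-tuning epoch.
losses = [0.90, 0.70, 0.62, 0.65, 0.71]
print("keep the weights from epoch", stop_epoch(losses))
```

A larger `patience` tolerates noisy validation curves at the cost of a few extra epochs.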
21
Self-supervised + Transfer Learning
#Shuffle&Learn Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using
temporal order verification." ECCV 2016. [code] (Slides by Xunyu Lin):
22
Transfer Learning: Video-lectures
Kevin McGuinness (UPC DLCV 2016) Ramon Morros (UPC DLAI 2018)
23
Predictive vs Contrastive Methods
Source: Ankesh Anand, “Contrastive Self-Supervised Learning” (2020)
24
Predictive vs Contrastive Methods
Source: Ankesh Anand, “Contrastive Self-Supervised Learning” (2020)
25
Outline
1. Unsupervised Learning
2. Self-supervised Learning
3. Predictive Methods
○ Autoencoder
26
Autoencoder (AE)
Fig: “Deep Learning Tutorial” Stanford
Autoencoders:
● Predict at the output the
same input data.
● Do not need labels.
27
Autoencoder (AE)
Slide: Kevin McGuinness (DLCV UPC 2017)
[Figure: data → Encoder (W1) → latent variables h (representation/features) → Decoder (W2) → reconstruction, scored by a loss (reconstruction error).]
1. Initialize a NN by solving an autoencoding
problem.
28
Autoencoder (AE)
Slide: Kevin McGuinness (DLCV UPC 2017)
[Figure: data → Encoder (W1) → latent variables h (representation/features) → Classifier (WC) → prediction, compared with y by a loss (cross entropy).]
1. Initialize a NN by solving an autoencoding
problem.
2. Train for final task with “few” labels.
29
Autoencoder (AE)
Fig: “Deep Learning Tutorial” Stanford
What other application does the AE address?
30
Autoencoder (AE)
Fig: “Deep Learning Tutorial” Stanford
Dimensionality reduction:
Use the hidden layer as a
feature extractor of any
desired size.
31
Denoising Autoencoder (DAE)
Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. "Extracting and composing robust features
with denoising autoencoders." ICML 2008.
1. Corrupt the input signal, but predict the clean version at its output.
2. Train for final task with “few” labels.
[Figure: corrupted data → Encoder (W1) → h → Decoder (W2) → reconstruction, scored against the data by a loss (reconstruction error).]
32
[Figure: data → Encoder (W1) → latent variables h (representation/features) → Classifier (WC) → prediction, compared with y by a loss (cross entropy).]
1. Corrupt the input signal, but predict the clean version at its output.
2. Train for final task with “few” labels.
Denoising Autoencoder (DAE)
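The first step of the denoising autoencoder (corrupt the input, predict the clean version) can be sketched as a data-preparation routine. This is a minimal sketch under assumptions: masking noise (zeroing random values) is only one possible corruption, and the names `corrupt`, `make_dae_pairs` and `reconstruction_error` are hypothetical.

```python
import random


def corrupt(x, drop_prob, rng):
    """Masking noise: set each input value to 0 with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def make_dae_pairs(data, drop_prob=0.3, seed=0):
    """Build (corrupted input, clean target) pairs for a denoising autoencoder."""
    rng = random.Random(seed)
    return [(corrupt(x, drop_prob, rng), list(x)) for x in data]


def reconstruction_error(output, target):
    """Mean squared error between a reconstruction and the clean signal."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)


data = [[0.2, 0.9, 0.4, 0.7], [0.5, 0.1, 0.8, 0.3]]
noisy, clean = make_dae_pairs(data)[0]
# The loss is always measured against the clean signal, never the corrupted one.
print(noisy, clean, reconstruction_error(noisy, clean))
```

The network only ever sees the corrupted vector as input, so it cannot solve the task by copying it to the output.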
33
Denoising Autoencoder (DAE)
Source: “Tutorial: Your First Model (DAE)” by Opendeep.org
34
Eigen, David, Dilip Krishnan, and Rob Fergus. "Restoring an image taken through a window covered with dirt or rain." ICCV
2013.
35
#BERT Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "Bert: Pre-training of deep bidirectional
transformers for language understanding." NAACL 2019 (best paper award).
Take a sequence of words as input,
mask out 15% of the words, and
ask the system to predict the
missing words (or a distribution over
words).
Masked Autoencoder (MAE)
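The masking step can be sketched as follows. This is a simplified illustration, not BERT's actual pipeline: the real model works on sub-word tokens and sometimes keeps or randomly replaces a selected token instead of masking it. The function name `mask_tokens` and the example sentence are assumptions.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_ratio=0.15, seed=0):
    """Hide a fraction of the tokens and return (masked input, targets).

    `targets` maps each masked position to the original word the model
    must predict. At least one token is always masked.
    """
    rng = random.Random(seed)
    n_masked = max(1, round(mask_ratio * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_masked))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]
        masked[pos] = MASK
    return masked, targets


sentence = "the cat climbed a tree to look at the birds in the sky".split()
masked, targets = mask_tokens(sentence)
print(masked)
print(targets)
```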
36
Outline
1. Unsupervised Learning
2. Self-supervised Learning
3. Predictive Methods
○ Spatial relations
37
Spatial verification
Predict the relative position between two image patches.
#RelPos Doersch, Carl, Abhinav Gupta, and Alexei A. Efros. "Unsupervised visual representation learning by context
prediction." ICCV 2015.
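How training pairs for relative position prediction might be generated can be sketched as below. It is a minimal sketch under assumptions: images are plain lists of rows, the second patch is one of the 8 neighbors of the first, and the gaps and jitter between patches that a practical system would add are omitted. The names `crop` and `sample_patch_pair` are hypothetical.

```python
import random

# The 8 possible offsets (row, col) of the second patch around the first one.
NEIGHBOR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                    (0, -1), (0, 1),
                    (1, -1), (1, 0), (1, 1)]


def crop(image, top, left, size):
    """Cut a size x size patch out of an image stored as a list of rows."""
    return [row[left:left + size] for row in image[top:top + size]]


def sample_patch_pair(image, patch_size, rng):
    """Return (center patch, neighbor patch, label in 0..7).

    The image must be at least 3 * patch_size pixels in each dimension.
    """
    height, width = len(image), len(image[0])
    # Keep one patch of margin on every side so that any neighbor fits.
    top = rng.randrange(patch_size, height - 2 * patch_size + 1)
    left = rng.randrange(patch_size, width - 2 * patch_size + 1)
    label = rng.randrange(len(NEIGHBOR_OFFSETS))
    d_row, d_col = NEIGHBOR_OFFSETS[label]
    center = crop(image, top, left, patch_size)
    neighbor = crop(image, top + d_row * patch_size,
                    left + d_col * patch_size, patch_size)
    return center, neighbor, label


# Toy 12 x 12 "image" whose pixel values encode their own position.
image = [[r * 12 + c for c in range(12)] for r in range(12)]
center, neighbor, label = sample_patch_pair(image, 4, random.Random(0))
print("relative position label:", label)
```

The label comes for free from the sampling procedure, which is what makes the task self-supervised.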
38
Spatial verification
#Jigsaw Noroozi, M., & Favaro, P. Unsupervised learning of visual representations by solving jigsaw puzzles. ECCV 2016.
Train a neural network to solve a jigsaw puzzle.
39
Spatial verification
#Jigsaw Noroozi, M., & Favaro, P. Unsupervised learning of visual representations by solving jigsaw puzzles. ECCV 2016.
Train a neural network to solve a jigsaw puzzle.
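One way to pose the jigsaw puzzle as a classification problem is sketched below: the tiles are shuffled with a permutation drawn from a fixed set, and the index of that permutation is the label. The size of the permutation set and the names `build_permutation_set` and `make_puzzle` are assumptions made for illustration.

```python
import random


def build_permutation_set(n_tiles, n_perms, seed=0):
    """Draw a fixed set of distinct tile permutations (hypothetical sizes)."""
    rng = random.Random(seed)
    perms = set()
    while len(perms) < n_perms:
        perms.add(tuple(rng.sample(range(n_tiles), n_tiles)))
    return sorted(perms)


def make_puzzle(tiles, permutations, rng):
    """Shuffle the tiles; the index of the permutation used is the label."""
    label = rng.randrange(len(permutations))
    shuffled = [tiles[i] for i in permutations[label]]
    return shuffled, label


permutations = build_permutation_set(n_tiles=9, n_perms=10)
tiles = list("ABCDEFGHI")  # stand-ins for the 9 tiles of a 3 x 3 grid
shuffled, label = make_puzzle(tiles, permutations, random.Random(1))
print(shuffled, label)
```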
40
Outline
1. Unsupervised Learning
2. Self-supervised Learning
3. Predictive Methods
○ Temporal relations
41
#RelPos Doersch, Carl, Abhinav Gupta, and Alexei A. Efros. "Unsupervised visual representation learning by context
prediction." ICCV 2015.
What temporal-specific surrogate tasks can you think of?
42
Temporal coherence
Use temporal order as the supervisory signal for learning:
a binary classifier decides whether a (possibly shuffled) sequence of frames is
in order or not in order.
#Shuffle&Learn Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using
temporal order verification." ECCV 2016. [code] (Slides by Xunyu Lin):
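The order-verification examples can be generated without any manual label, as this sketch shows. It assumes 3-frame tuples and a simple way of building negatives (swapping two frames); the function name `make_order_example` is hypothetical, and frames are represented by their indices.

```python
import random


def make_order_example(frames, rng):
    """Return (3-frame tuple, label): 1 if in temporal order, 0 otherwise."""
    a, b, c = sorted(rng.sample(range(len(frames)), 3))
    if rng.random() < 0.5:
        order, label = (a, b, c), 1
    else:
        # Negative example: swap two frames so the order is broken.
        order, label = rng.choice([(b, a, c), (a, c, b)]), 0
    return tuple(frames[i] for i in order), label


video = list(range(30))  # each "frame" is represented by its time index
rng = random.Random(0)
for _ in range(3):
    print(make_order_example(video, rng))
```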
43
Temporal coherence
#Shuffle&Learn Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using
temporal order verification." ECCV 2016. [code] (Slides by Xunyu Lin):
Temporal order of frames is
exploited as the supervisory
signal for learning.
44
Temporal verification
#Odd-one-out Fernando, Basura, Hakan Bilen, Efstratios Gavves, and Stephen Gould. "Self-supervised video
representation learning with odd-one-out networks." ICCV 2017
Train a network to detect which of the video sequences contains frames in the wrong order.
45
Temporal sorting
Lee, Hsin-Ying, Jia-Bin Huang, Maneesh Singh, and Ming-Hsuan Yang. "Unsupervised representation learning by sorting
sequences." ICCV 2017.
Sort the sequence of frames.
46
#T-CAM Wei, Donglai, Joseph J. Lim, Andrew Zisserman, and William T. Freeman. "Learning and using the arrow of time."
CVPR 2018.
Predict whether the video moves forward or backward.
Arrow of time
47
Outline
1. Unsupervised Learning
2. Self-supervised Learning
3. Predictive Methods
○ Cycle Consistency
48
#TimeCycle Xiaolong Wang, Allan Jabri, Alexei A. Efros, “Learning Correspondence from the Cycle-Consistency of
Time”. CVPR 2019. [code] [Slides by Carles Ventura]
Cycle-consistency
Training: Tracking backward and then forward with pixel embeddings, to later train a
NN to match the pairs of associated pixel embeddings.
49
#TimeCycle Xiaolong Wang, Allan Jabri, Alexei A. Efros, “Learning Correspondence from the Cycle-Consistency of
Time”. CVPR 2019. [code] [Slides by Carles Ventura]
Cycle-consistency
Inference (video object segmentation): Track pixels across video sequences.
50
#TCC Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman, “Temporal
Cycle-Consistency Learning”. CVPR 2019
Training: Learn video frame embeddings that must satisfy a temporal alignment
between videos depicting the same activity.
Temporal Cycle-Consistency
51
#TCC Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman, “Temporal
Cycle-Consistency Learning”. CVPR 2019
Temporal Cycle-Consistency
52
#TCC Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman, “Temporal
Cycle-Consistency Learning”. CVPR 2019
Application: Anomaly detection.
Temporal Cycle-Consistency
53
#TCC Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman, “Temporal
Cycle-Consistency Learning”. CVPR 2019
Application: Fine-grained retrieval
Temporal Cycle-Consistency
54
Outline
1. Unsupervised Learning
2. Self-supervised Learning
3. Predictive Methods
○ Video Frame Prediction
55
Video Frame Prediction
Slide credit:
Yann LeCun
56
Next word prediction (language model)
Figure:
TensorFlow tutorial
Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. "A neural probabilistic language model." Journal
of machine learning research 3, no. Feb (2003): 1137-1155.
Self-supervised
learning
57
#word2vec #continuousbow Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. "Distributed
representations of words and phrases and their compositionality." NIPS 2013
the cat climbed a tree
Given context:
a, cat, the, tree
Estimate prob. of
climbed
Self-supervised
learning
Missing word prediction (language model)
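The continuous bag-of-words setup on this slide can be illustrated by building the (context, target) training pairs directly from raw text. The function name `cbow_examples` and the window size of 2 are assumptions for the sketch.

```python
def cbow_examples(tokens, window=2):
    """Return one (context words, target word) pair per position."""
    examples = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        examples.append((left + right, target))
    return examples


sentence = "the cat climbed a tree".split()
context, target = cbow_examples(sentence)[2]
print(context, "->", target)
```

For the middle word, the context is the four surrounding words and the target is "climbed", matching the example on the slide.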
58
#word2vec #skipgram Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. "Distributed
representations of words and phrases and their compositionality." NIPS 2013
Self-supervised
learning
the cat climbed a tree
Given word:
climbed
Estimate prob. of context words:
a, cat, the, tree
Context Words Prediction (language model)
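The skip-gram setup reverses the direction of the prediction: each word is paired with every word in its context window. A minimal sketch, assuming a window of 2 and a hypothetical function name `skipgram_pairs`:

```python
def skipgram_pairs(tokens, window=2):
    """Return (center word, context word) pairs for every position."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


sentence = "the cat climbed a tree".split()
for center, context in skipgram_pairs(sentence):
    if center == "climbed":
        print(center, "->", context)
```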
59
Next sentence prediction (TRUE/FALSE)
#BERT Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "Bert: Pre-training of deep bidirectional
transformers for language understanding." NAACL 2019 (best paper award).
BERT is also pre-trained on a
binarized next sentence
prediction task:
Select sentence pairs A and B
where, for 50% of the pairs, B is the
actual next sentence of A and, for the
remaining 50%, B is a sentence
randomly selected from the corpus;
the model learns to tell the two cases apart.
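The construction of next sentence prediction examples can be sketched as below. This is an illustration under assumptions: the corpus is a plain ordered list of sentences, and the function name `make_nsp_examples` is hypothetical.

```python
import random


def make_nsp_examples(sentences, rng):
    """Return (A, B, is_next) triples; needs at least 3 sentences."""
    examples = []
    for i in range(len(sentences) - 1):
        if rng.random() < 0.5:
            # Positive: B really follows A in the corpus.
            examples.append((sentences[i], sentences[i + 1], True))
        else:
            # Negative: B is a random sentence that is neither A nor its successor.
            others = [j for j in range(len(sentences)) if j not in (i, i + 1)]
            examples.append((sentences[i], sentences[rng.choice(others)], False))
    return examples


corpus = ["The cat climbed a tree.", "It stayed there for an hour.",
          "Rain is expected tomorrow.", "Bring an umbrella."]
for a, b, is_next in make_nsp_examples(corpus, random.Random(0)):
    print(is_next, "|", a, "|", b)
```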
60
Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using
LSTMs." In ICML 2015. [Github]
Learning video representations (features) by...
Video Frame Prediction
61
(1) frame reconstruction (AE):
Learning video representations (features) by...
Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using
LSTMs." ICML 2015. [Github]
Video Frame Prediction
62
Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using
LSTMs." ICML 2015. [Github]
Learning video representations (features) by...
(2) frame prediction
Video Frame Prediction
63
Features learned without supervision (lots of data) are
fine-tuned for activity recognition (small data).
Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using
LSTMs." ICML 2015. [Github]
Video Frame Prediction
64
Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error."
ICLR 2016 [project] [code]
Video frame prediction with a ConvNet.
Video Frame Prediction
65
Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error."
ICLR 2016 [project] [code]
The blurry predictions obtained with MSE (L2 loss) are improved with a multi-scale
architecture, adversarial training and an image gradient difference loss (GDL)
function.
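The idea behind a gradient difference loss can be sketched as follows: instead of comparing pixel intensities, it compares the absolute differences between neighboring pixels in the prediction and in the target, which penalizes blurred edges. This is a simplified sketch on small grayscale images stored as lists of rows; the function name and the toy images are assumptions.

```python
def gradient_difference_loss(pred, target, alpha=1):
    """Sum of differences between the image gradients of pred and target."""
    height, width = len(target), len(target[0])
    loss = 0.0
    for i in range(height):
        for j in range(width):
            if i > 0:  # vertical gradients
                g_t = abs(target[i][j] - target[i - 1][j])
                g_p = abs(pred[i][j] - pred[i - 1][j])
                loss += abs(g_t - g_p) ** alpha
            if j > 0:  # horizontal gradients
                g_t = abs(target[i][j] - target[i][j - 1])
                g_p = abs(pred[i][j] - pred[i][j - 1])
                loss += abs(g_t - g_p) ** alpha
    return loss


sharp = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
blurry = [[0.0, 0.25, 0.75, 1.0],
          [0.0, 0.25, 0.75, 1.0]]
print(gradient_difference_loss(blurry, sharp))  # the blurred edge is penalized
print(gradient_difference_loss(sharp, sharp))   # a perfect prediction costs nothing
```

Because only gradients are compared, this term is meant to complement a pixel-wise loss rather than replace it.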
Video Frame Prediction
66
Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error."
ICLR 2016 [project] [code]
Video Frame Prediction
67
#DrNet Denton, Emily L. "Unsupervised learning of disentangled representations from video." NIPS 2017.
The model learns to disentangle (“separate”) the visual features that correspond
to:
● Object pose (with respect to the camera)
● Object content (class)
Frame Prediction + Disentangled features
68
#DrNet Denton, Emily L. "Unsupervised learning of disentangled representations from video." NIPS 2017.
#MCNet R. Villegas, J. Yang, S. Hong, X. Lin, and H. Lee. Decomposing motion and content for natural video sequence
prediction. In ICLR, 2017
100 step video generation on KTH
dataset where:
● green frames indicate
reconstructions
● red frames indicate frame
predictions.
Frame Prediction + Disentangled features
69
#DDPAE Hsieh, Jun-Ting, Bingbin Liu, De-An Huang, Li F. Fei-Fei, and Juan Carlos Niebles. "Learning to
decompose and disentangle representations for video prediction." NeurIPS 2018.
Decompositional Disentangled Predictive Auto-Encoder (DDPAE):
Frame Prediction + Disentangled features
70
Outline
1. Unsupervised Learning
2. Self-supervised Learning
3. Predictive Methods
○ Pixel tracking
71
Zhang, R., Isola, P., & Efros, A. A. Colorful image colorization. ECCV 2016.
Colorization (still image)
Surrogate task: Colorize a gray scale image.
72
Vondrick, Carl, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. "Tracking emerges by
colorizing videos." ECCV 2018. [blog]
Pixel tracking
Pre-training: A NN is trained to point to the location in a reference frame with the most
similar color...
73
Vondrick, Carl, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. "Tracking emerges by
colorizing videos." ECCV 2018. [blog]
Pre-training: A NN is trained to point to the location in a reference frame with the most
similar color… computing a loss with respect to the actual color.
Pixel tracking
74
Vondrick, Carl, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. "Tracking emerges by
colorizing videos." ECCV 2018. [blog]
For each grayscale pixel to colorize, find the color of the most similar pixel embedding
(representation) from the first frame.
Pixel tracking
75
Vondrick, Carl, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. "Tracking emerges by
colorizing videos." ECCV 2018. [blog]
Pixel tracking
Application: colorization from a reference frame.
Reference frame Grayscale video Colorized video
76
Vondrick, Carl, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. "Tracking emerges by
colorizing videos." ECCV 2018. [blog]
Application: video object segmentation from an object mask
Pixel tracking
77
Outline
1. Unsupervised Learning
2. Self-supervised Learning
3. Predictive Methods
4. Contrastive Methods
78
Predictive vs Contrastive Methods
Source: Ankesh Anand, “Contrastive Self-Supervised Learning” (2020)
Hadsell, Raia, Sumit Chopra, and Yann LeCun. "Dimensionality reduction by learning an invariant mapping." CVPR 2006.
Learn G_W(X) with pairs (X_1, X_2) which are similar (Y=0) or dissimilar (Y=1).
Contrastive Loss
[Figure: a similar pair (X_1, X_2) and a dissimilar pair (X_1, X_2).]
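A margin-based contrastive loss over a pair of embeddings can be sketched as below, following the slide's convention that Y=0 marks a similar pair and Y=1 a dissimilar one. The embeddings stand for the outputs of the learned mapping for the two inputs; the margin value and the function names are assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def contrastive_loss(g1, g2, y, margin=1.0):
    """y = 0 for a similar pair, y = 1 for a dissimilar pair."""
    d = euclidean(g1, g2)
    similar_term = 0.5 * d ** 2                        # pull similar pairs together
    dissimilar_term = 0.5 * max(0.0, margin - d) ** 2  # push dissimilar pairs apart
    return (1 - y) * similar_term + y * dissimilar_term


print(contrastive_loss([0.0, 0.0], [0.5, 0.0], y=0))  # similar pair, still apart
print(contrastive_loss([0.0, 0.0], [0.5, 0.0], y=1))  # dissimilar pair inside the margin
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], y=1))  # dissimilar pair beyond the margin
```

Dissimilar pairs that are already farther apart than the margin contribute no loss, so the embedding is not pushed apart without bound.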
80
Predictive Learning
#CPC Oord, Aaron van den, Yazhe Li, and Oriol Vinyals. "Representation learning with contrastive predictive coding." arXiv
preprint arXiv:1807.03748 (2018).
81
Predictive Learning
#CPCv2 Hénaff, Olivier J., Ali Razavi, Carl Doersch, S. M. Eslami, and Aaron van den Oord. "Data-efficient image recognition
with contrastive predictive coding." arXiv preprint arXiv:1905.09272 (2019).
Patch features (in blue) are
aggregated in the context
network (in red) to predict
the features extracted from
other parts of the image.
82
The Gelato Bet
"If, by the first day of autumn (Sept 23) of 2015, a
method will exist that can match or beat the
performance of R-CNN on Pascal VOC detection,
without the use of any extra, human annotations
(e.g. ImageNet) as pre-training, Mr. Malik promises to
buy Mr. Efros one (1) gelato (2 scoops: one chocolate,
one vanilla)."
Source: Ankesh Anand, “Contrastive Self-Supervised Learning” (2020)
Contrastive loss between:
● Encoded features of a query
image.
● A running queue of encoded
keys.
Defining losses over
representations instead of pixels
facilitates abstraction.
#MoCo He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning.
CVPR 2020. [code]
Momentum Contrast (MoCo)
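The three ingredients named on this slide (a query, a running queue of keys, and a contrastive loss between them) can be sketched in plain Python. This is an illustration under assumptions, not the released code: parameters and embeddings are flat lists of floats, and the names `momentum_update`, `KeyQueue` and `info_nce` are hypothetical.

```python
import math
from collections import deque


def momentum_update(key_params, query_params, m=0.999):
    """Move the key-encoder weights slowly toward the query-encoder weights."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class KeyQueue:
    """Fixed-size FIFO of encoded keys that serve as negatives."""

    def __init__(self, max_size):
        self.keys = deque(maxlen=max_size)

    def enqueue(self, batch_keys):
        # deque(maxlen=...) discards the oldest keys automatically.
        self.keys.extend(batch_keys)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """Cross entropy that classifies the positive key among all keys."""
    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, k) / temperature for k in negative_keys]
    max_logit = max(logits)  # subtract the maximum for numerical stability
    log_sum = max_logit + math.log(sum(math.exp(l - max_logit) for l in logits))
    return log_sum - logits[0]


queue = KeyQueue(max_size=3)
queue.enqueue([[0.0, 1.0], [-1.0, 0.0]])
loss = info_nce([1.0, 0.0], [1.0, 0.0], list(queue.keys))
print(loss)
```

Keeping the key encoder as a slow moving average keeps the queued keys roughly consistent with each other even though they were encoded at different training steps.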
Wu, Z., Xiong, Y., Yu, S. X., & Lin, D. (2018). Unsupervised feature learning via non-parametric instance discrimination. CVPR
2018.
Pretext task: Treat each image instance as a distinct class of its own and train a
classifier to distinguish between individual instance classes (images).
Momentum Contrast (MoCo)
86
Transfer Learning (super. IN-1M)
Source data
E.g. ImageNet
Source model
Source labels
Target data
E.g. PASCAL
Target model
Target labels
Transfer Learned
Knowledge
Large
amount of
data/labels
Small
amount of
data/labels
Slide credit: Kevin McGuinness
87
Transfer Learning (MoCo IN-1M)
Source data
E.g. ImageNet
Source model
Source labels
Target data
E.g. PASCAL
Target model
Target labels
Transfer Learned
Knowledge
Large
amount of
data
Small
amount of
data/labels
Slide concept: Kevin McGuinness
88
Transfer Learning (super. IN-1M vs MoCo IN-1M)
Transferring to VOC object detection: a Faster R-CNN detector (C4-backbone)
is fine-tuned end-to-end.
89
90
Correlated views by data augmentation
#SimCLR Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. "A simple framework for contrastive
learning of visual representations." arXiv preprint arXiv:2002.05709 (2020). [tweet]
91
92
Correlated views by data augmentation
Original image cc-by: Von.grzanka
#SimCLR Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. "A simple framework for contrastive
learning of visual representations." arXiv preprint arXiv:2002.05709 (2020). [tweet]
93
#SimCLR Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. "A simple framework for contrastive
learning of visual representations." arXiv preprint arXiv:2002.05709 (2020). [tweet]
Correlated views by data augmentation
ImageNet Linear Classification: features learned without supervision are frozen and a
supervised linear classifier is trained on top.
94
#MoCo2 Chen, Xinlei, Haoqi Fan, Ross Girshick, and Kaiming He. "Improved Baselines with Momentum Contrastive
Learning." arXiv preprint arXiv:2003.04297 (2020).
MoCo + data augmentation + others
ImageNet Linear Classification: features learned without supervision are
frozen and a supervised linear classifier is trained on top.
95
#VINCE Gordon, Daniel, Kiana Ehsani, Dieter Fox, and Ali Farhadi. "Watching the World Go By: Representation Learning
from Unlabeled Videos." arXiv preprint arXiv:2003.07990 (2020).
Video Noise Contrastive Estimation
Videos provide data augmentation for free.
96
#VINCE Gordon, Daniel, Kiana Ehsani, Dieter Fox, and Ali Farhadi. "Watching the World Go By: Representation Learning
from Unlabeled Videos." arXiv preprint arXiv:2003.07990 (2020).
Video Noise Contrastive Estimation
97
https://paperswithcode.com/sota/self-supervised-image-classification-on
Coming next...
Object candidates + Time
Pirk, Sören, Mohi Khansari, Yunfei Bai, Corey Lynch, and Pierre Sermanet. "Online Object Representations with Contrastive
Learning." Learning from Unlabeled Videos (LUV) Workshop, CVPR 2019.
Multiview + Time
#TCN Sermanet, Pierre, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine, and Google
Brain. "Time-contrastive networks: Self-supervised learning from video." ICRA 2018.
Pre-Training: Learn features with a
triplet loss by:
● Positive: Same time-stamp
from different cameras for
view invariance.
● Negative: Other frames from
the same camera to be
sensitive to temporal ordering
Inference: Imitation learning for a
robotic arm.
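The triplet construction described above can be sketched as follows. It is a minimal sketch under assumptions: the two camera views are synchronized lists of per-frame embeddings, negatives are drawn at least `min_gap` frames away from the anchor, and the margin value and function names are hypothetical.

```python
import random


def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is farther from the anchor than the positive by the margin."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)


def make_triplet(view1, view2, t, min_gap, rng):
    """Build (anchor, positive, negative) from two synchronized camera views."""
    anchor = view1[t]
    positive = view2[t]  # same time-stamp, different camera
    candidates = [i for i in range(len(view1)) if abs(i - t) >= min_gap]
    negative = view1[rng.choice(candidates)]  # same camera, different time
    return anchor, positive, negative


# Toy 1-D embeddings: the second camera sees the same scene with a small offset.
view1 = [[float(i)] for i in range(10)]
view2 = [[i + 0.1] for i in range(10)]
a, p, n = make_triplet(view1, view2, t=5, min_gap=2, rng=random.Random(0))
print(a, p, n, triplet_loss(a, p, n))
```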
Multiview + Time
#TCN Sermanet, Pierre, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine, and Google
Brain. "Time-contrastive networks: Self-supervised learning from video." ICRA 2018.
102
Outline
1. Unsupervised Learning
2. Self-supervised Learning
3. Predictive Methods
4. Contrastive Methods
103
Coming next:
Self-supervised Audiovisual Learning [slides] [video]
104
Suggested reading
Alex Graves, Kelly Clancy, “Unsupervised learning: The curious pupil”. DeepMind blog 2019.
105
Suggested talks
Demis Hassabis (DeepMind) Alyosha Efros (UC Berkeley)
Andrew Zisserman (Oxford/DeepMind) Yann LeCun (Facebook AI)
106
Further readings
● Jeremy Howard, “Self-supervised learning and computer vision” (fast.ai, 2020)
● Zhang, L., Qi, G. J., Wang, L., & Luo, J. (2019). AET vs. AED: Unsupervised representation learning
by auto-encoding transformations rather than data. CVPR 2019.
● Tian, Yonglong, Dilip Krishnan, and Phillip Isola. "Contrastive Multiview Coding." arXiv preprint
arXiv:1906.05849 (2019).
● Wu, Z., Xiong, Y., Yu, S. X., & Lin, D. (2018). Unsupervised feature learning via non-parametric
instance discrimination. CVPR 2018.
○ Each image is its own class.
○ Memory bank.
○ Triplet loss + Softmax
107
Questions ?
Discarded Slides
109
Temporal regularization: ISA
Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for
action recognition with independent subspace analysis." CVPR 2011
Video features are learned with convolutions and pooling combined with
Independent Subspace Analysis (ISA).
110
Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for
action recognition with independent subspace analysis." CVPR 2011
Feature visualizations.
Temporal regularization: ISA
111
Assumption: adjacent video frames contain semantically similar information.
Autoencoder trained with slowness and sparsity regularization.
Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. "Unsupervised learning of spatiotemporally
coherent metrics." ICCV 2015.
Temporal regularization: Slowness
112
Jayaraman, Dinesh, and Kristen Grauman. "Slow and steady feature analysis: higher order temporal coherence in video."
CVPR 2016. [video]
Slow feature analysis
● Temporal coherence assumption: features
should change slowly over time in video
Steady feature analysis
● Second order changes also small: changes in
the past should resemble changes in the
future
Train on triplets of frames from video
Loss encourages nearby frames to have slow
and steady features, and far frames to have
different features
Temporal regularization: Slowness

More Related Content

What's hot

Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...Universitat Politècnica de Catalunya
 
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019Universitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019
Video Saliency Prediction with Deep Neural Networks - Juan Jose Nieto - DCU 2019Universitat Politècnica de Catalunya
 
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)
Language and Vision (D3L5 2017 UPC Deep Learning for Computer Vision)Universitat Politècnica de Catalunya
 
Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intell...
Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intell...Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intell...
Unsupervised Learning (DLAI D9L1 2017 UPC Deep Learning for Artificial Intell...Universitat Politècnica de Catalunya
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...
Closing, Course Offer 17/18 & Homework (D5 2017 UPC Deep Learning for Compute...Universitat Politècnica de Catalunya
 

What's hot (20)

Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
Interpretability of Convolutional Neural Networks - Xavier Giro - UPC Barcelo...
 
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019
Deep Learning Architectures for Video - Xavier Giro - UPC Barcelona 2019
 
Deep Learning from Videos (UPC 2018)
Deep Learning from Videos (UPC 2018)Deep Learning from Videos (UPC 2018)
Deep Learning from Videos (UPC 2018)
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
Deep Video Object Segmentation - Xavier Giro - UPC Barcelona 2019
 
Deep Learning for Video: Object Tracking (UPC 2018)
Deep Learning for Video: Object Tracking (UPC 2018)Deep Learning for Video: Object Tracking (UPC 2018)
Deep Learning for Video: Object Tracking (UPC 2018)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Deep Learning Representations for All (a.ka. the AI hype)
Deep Learning Representations for All (a.ka. the AI hype)Deep Learning Representations for All (a.ka. the AI hype)
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona

  • 1. @DocXavi Module 6 - Day 10 - Lecture 1 Self-supervised Visual Learning 26th March 2020 Xavier Giró-i-Nieto Deep Learning for Video lectures: [https://mcv-m6-video.github.io/deepvideo-2020/]
  • 3. Video lecture: Lecture from the Master in Computer Vision Barcelona 2019
  • 5. Outline: 1. Unsupervised Learning 2. Self-supervised Learning 3. Predictive methods 4. Contrastive methods
  • 6. Types of machine learning: Yann LeCun’s Black Forest cake
  • 7. A broader picture of types of learning... (Slide inspired by Alex Graves (DeepMind) at “Unsupervised Learning Tutorial” @ NeurIPS 2018.)
      ○ Active agent with a teacher: Reinforcement learning (with extrinsic reward)
      ○ Active agent without a teacher: Intrinsic motivation / Exploration
      ○ Passive agent with a teacher: Supervised learning
      ○ Passive agent without a teacher: Unsupervised learning
  • 11. Unsupervised learning ● It reflects how intelligent beings perceive the world. ● It can save a great deal of effort when building an AI agent, compared with a fully supervised approach. ● Vast amounts of unlabelled data are available.
  • 12. Video lectures on Unsupervised Learning: Kevin McGuinness (UPC DLCV 2016); Xavier Giró (UPC DLAI 2017)
  • 13. Outline: 1. Unsupervised Learning 2. Self-supervised Learning 3. Predictive methods 4. Contrastive methods
  • 14. Self-supervised learning: (...) “Doing this properly and reliably is the greatest challenge in ML and AI of the next few years in my opinion.”
  • 16. Self-supervised learning (Reference: Andrew Zisserman, PAISS 2018). Self-supervised learning is a form of unsupervised learning where the data provides the supervision. ● A pretext (or surrogate) task must be invented by withholding a part of the unlabeled data and training the NN to predict it. [Figure label: unlabeled data (X)]
  • 17. Self-supervised learning (Reference: Andrew Zisserman, PAISS 2018). Self-supervised learning is a form of unsupervised learning where the data provides the supervision. ● By defining a proxy loss, the NN learns representations, which should be valuable for the actual downstream task. [Figure labels: ŷ, loss, representations learned without labels]
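The pretext-task idea on slides 16 and 17 can be illustrated with a minimal sketch. It assumes a sample is a flat list of numbers and that "withholding" means zeroing a random subset of positions; the function name `make_pretext_pair` and the `mask_fraction` parameter are hypothetical and do not come from the lecture.

```python
import random

def make_pretext_pair(sample, mask_fraction=0.25, rng=random):
    """Split one unlabeled sample into a visible input and a withheld target.

    The withheld positions are zeroed in the input; their original values
    become the target that a network would be trained to predict.
    """
    n_hidden = max(1, int(len(sample) * mask_fraction))
    hidden_idx = sorted(rng.sample(range(len(sample)), n_hidden))
    visible = list(sample)
    for i in hidden_idx:
        visible[i] = 0.0  # withhold this part of the data
    target = [sample[i] for i in hidden_idx]
    return visible, hidden_idx, target

# Toy unlabeled sample (hypothetical values): no human label is involved.
rng = random.Random(0)
x = [0.1, 0.5, 0.9, 0.3, 0.7, 0.2, 0.8, 0.4]
visible, hidden_idx, target = make_pretext_pair(x, rng=rng)
```

The supervision signal (`target`) is taken from the data itself, which is the sense in which the data "provides the supervision".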
  • 18. Transfer Learning (Slide credit: Kevin McGuinness). Instead of training a deep network from scratch for your task... [Figure labels: target data (e.g. PASCAL), target model, target labels; small amount of data/labels]
  • 19. Transfer Learning (Slide credit: Kevin McGuinness). Instead of training a deep network from scratch for your task: ● Take a network trained on a different source domain and/or task ● Adapt it for your target domain and/or task [Figure labels: source data (e.g. ImageNet), source model, source labels, large amount of data/labels; transfer learned knowledge; target data (e.g. PASCAL), target model, target labels, small amount of data/labels]
  • 20. Transfer Learning (Slide credit: Kevin McGuinness)
      1. Train a deep net on a “nearby” task using standard backprop: ● Supervised learning (e.g. ImageNet). ● Self-supervised learning (e.g. Language Model).
      2. Cut off the top layer(s) of the network and replace them with a supervised objective for the target domain.
      3. Fine-tune the network using backprop with labels for the target domain until the validation loss starts to increase.
      [Figure labels: conv1, conv2, conv3, fc1; fc2 + softmax with surrogate data and surrogate loss; my_fc2 + softmax with real data, real labels and real loss]
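The stopping rule on slide 20 (fine-tune until the validation loss starts to increase) amounts to early stopping. The sketch below is one possible reading: `train_one_epoch` and `validation_loss` are hypothetical callables standing in for a real training loop, and the `patience` parameter is an assumption added for illustration.

```python
def fine_tune_with_early_stopping(train_one_epoch, validation_loss,
                                  max_epochs=100, patience=1):
    """Run fine-tuning epochs and stop once validation loss stops improving."""
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch in range(max_epochs):
        train_one_epoch()            # one pass of backprop on target labels
        loss = validation_loss()     # loss on held-out target data
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break                # validation loss started to increase
    return best_epoch, best_loss

# Toy stand-ins: a simulated validation curve that dips and then rises.
simulated = iter([0.9, 0.7, 0.6, 0.65, 0.8])
best_epoch, best_loss = fine_tune_with_early_stopping(
    lambda: None, lambda: next(simulated))
```

In practice the weights from `best_epoch` would be kept; saving and restoring them is omitted here.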
  • 21. 21 Self-supervised + Transfer Learning #Shuffle&Learn Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code] (Slides by Xunyu Lin):
  • 22. 22 Transfer Learning: Video-lectures Kevin McGuinness (UPC DLCV 2016) Ramon Morros (UPC DLAI 2018)
  • 23. 23 Predictive vs Contrastive Methods Source: Ankesh Anand, “Contrastive Self-Supervised Learning” (2020)
  • 24. 24 Predictive vs Contrastive Methods Source: Ankesh Anand, “Contrastive Self-Supervised Learning” (2020)
  • 25. 25 Outline 1. Unsupervised Learning 2. Self-supervised Learning 3. Predictive Methods ○ Autoencoder
  • 26. 26 Autoencoder (AE) Fig: “Deep Learning Tutorial” Stanford Autoencoders: ● Predict at the output the same input data. ● Do not need labels.
  • 27. 27 Autoencoder (AE) Slide: Kevin McGuinness (DLCV UPC 2017) Encoder W1 Decoder W2 h data reconstruction Loss (reconstruction error) Latent variables (representation/features) 1. Initialize a NN by solving an autoencoding problem.
  • 28. 28 Autoencoder (AE) Slide: Kevin McGuinness (DLCV UPC 2017) Latent variables (representation/features) Encoder W1 h data Classifier WC prediction y Loss (cross entropy) 1. Initialize a NN by solving an autoencoding problem. 2. Train for final task with “few” labels.
  • 29. 29 Autoencoder (AE) Fig: “Deep Learning Tutorial” Stanford What other application can the AE address?
  • 30. 30 Autoencoder (AE) Fig: “Deep Learning Tutorial” Stanford Dimensionality reduction: Use the hidden layer as a feature extractor of any desired size.
  • 31. 31 Denoising Autoencoder (DAE) Vincent, Pascal, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. "Extracting and composing robust features with denoising autoencoders." ICML 2008. 1. Corrupt the input signal, but predict the clean version at its output. 2. Train for final task with “few” labels. Encoder W1 Decoder W2 h data reconstruction Loss (reconstruction error) corrupted data
  • 32. 32 Latent variables (representation/features) Encoder W1 h data Classifier WC prediction y Loss (cross entropy) 1. Corrupt the input signal, but predict the clean version at its output. 2. Train for final task with “few” labels. Denoising Autoencoder (DAE)
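Step 1 of the denoising autoencoder (corrupt the input, predict the clean signal) can be sketched as the construction of one training pair. Masking noise, the drop probability and all function names are assumptions chosen for illustration; the network itself is omitted.

```python
import random


def corrupt(x, drop_prob, rng):
    """Masking noise: set each component to 0.0 with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else v for v in x]


def make_dae_pair(x, drop_prob=0.3, seed=0):
    """Return (network input, reconstruction target) for one sample."""
    rng = random.Random(seed)
    return corrupt(x, drop_prob, rng), list(x)


def mse(a, b):
    """Reconstruction error between two equally sized vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


clean = [0.2, 0.9, 0.5, 0.7]
noisy, target = make_dae_pair(clean)
# The encoder/decoder would be trained to minimise mse(decoder(encoder(noisy)), target).
```

The key point is that the loss compares the output with the clean signal, not with the corrupted input.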
  • 33. 33 Denoising Autoencoder (DAE) Source: “Tutorial: Your First Model (DAE)” by Opendeep.org
  • 34. 34 Eigen, David, Dilip Krishnan, and Rob Fergus. "Restoring an image taken through a window covered with dirt or rain." ICCV 2013.
  • 35. 35 #BERT Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "Bert: Pre-training of deep bidirectional transformers for language understanding." NAACL 2019 (best paper award). Take a sequence of words as input, mask out 15% of the words, and ask the system to predict the missing words (or a distribution over words). Masked Autoencoder (MAE)
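A minimal sketch of the masking step, assuming whitespace tokens and a single `[MASK]` symbol. The helper name and the example sentence are hypothetical, and this simplifies BERT, which also leaves some selected tokens unchanged or swaps them for random ones.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Hide a fraction of the tokens; return the masked sequence and the targets."""
    rng = random.Random(seed)
    n_mask = max(1, round(mask_rate * len(tokens)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]  # label the model must predict at this position
        masked[pos] = MASK
    return masked, targets


tokens = ("the cat climbed a tree " * 4).split()  # 20 tokens
masked, targets = mask_tokens(tokens)
```

The labels come from the text itself, so no human annotation is needed.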
  • 36. 36 Outline 1. Unsupervised Learning 2. Self-supervised Learning 3. Predictive Methods ○ Spatial relations
  • 37. 37 Spatial verification Predict the relative position between two image patches. #RelPos Doersch, Carl, Abhinav Gupta, and Alexei A. Efros. "Unsupervised visual representation learning by context prediction." ICCV 2015.
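The relative-position pretext task can be sketched as sampling a centre patch and one of its eight neighbours, with the neighbour index as the class label. The image here is a plain nested list and the function names are hypothetical; refinements such as gaps or jitter between patches are left out.

```python
import random

# (row, col) offsets of the 8 neighbours around the centre patch.
# The position in this list is the class label to predict.
NEIGHBOUR_OFFSETS = [(-1, -1), (-1, 0), (-1, 1),
                     (0, -1), (0, 1),
                     (1, -1), (1, 0), (1, 1)]


def crop(image, top, left, size):
    """Cut a size x size patch out of a 2D nested list."""
    return [row[left:left + size] for row in image[top:top + size]]


def sample_patch_pair(image, patch, rng):
    """Return (centre patch, neighbour patch, label in 0..7)."""
    h, w = len(image), len(image[0])
    # Keep one patch of margin on every side so that any neighbour fits.
    top = rng.randrange(patch, h - 2 * patch + 1)
    left = rng.randrange(patch, w - 2 * patch + 1)
    label = rng.randrange(len(NEIGHBOUR_OFFSETS))
    dr, dc = NEIGHBOUR_OFFSETS[label]
    centre = crop(image, top, left, patch)
    neighbour = crop(image, top + dr * patch, left + dc * patch, patch)
    return centre, neighbour, label


image = [[r * 12 + c for c in range(12)] for r in range(12)]
centre, neighbour, label = sample_patch_pair(image, patch=4, rng=random.Random(0))
```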
  • 38. 38 Spatial verification #Jigsaw Noroozi, M., & Favaro, P. Unsupervised learning of visual representations by solving jigsaw puzzles. ECCV 2016. Train a neural network to solve a jigsaw puzzle.
  • 39. 39 Spatial verification #Jigsaw Noroozi, M., & Favaro, P. Unsupervised learning of visual representations by solving jigsaw puzzles. ECCV 2016. Train a neural network to solve a jigsaw puzzle.
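A jigsaw training sample can be sketched as a set of shuffled tiles plus the index of the permutation that was applied. To stay small, this assumes a 2x2 grid with all 24 permutations, which is an illustrative choice and not the configuration of the cited paper.

```python
import itertools
import random

GRID = 2
PERMUTATIONS = list(itertools.permutations(range(GRID * GRID)))  # 24 classes


def split_tiles(image, grid):
    """Split a square 2D nested list into grid*grid tiles, in row-major order."""
    size = len(image) // grid
    return [[row[c * size:(c + 1) * size] for row in image[r * size:(r + 1) * size]]
            for r in range(grid) for c in range(grid)]


def make_jigsaw_sample(image, rng):
    """Return (shuffled tiles, permutation index to be predicted)."""
    tiles = split_tiles(image, GRID)
    label = rng.randrange(len(PERMUTATIONS))
    shuffled = [tiles[i] for i in PERMUTATIONS[label]]
    return shuffled, label


image = [[r * 4 + c for c in range(4)] for r in range(4)]
shuffled, label = make_jigsaw_sample(image, random.Random(0))
```

The network receives `shuffled` and is trained as a classifier over permutation indices.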
  • 40. 40 Outline 1. Unsupervised Learning 2. Self-supervised Learning 3. Predictive Methods ○ Temporal relations
  • 41. 41 #RelPos Doersch, Carl, Abhinav Gupta, and Alexei A. Efros. "Unsupervised visual representation learning by context prediction." ICCV 2015. What temporal-specific surrogate tasks can you think of?
  • 42. 42 Temporal coherence Take temporal order as the supervisory signals for learning Shuffled sequences Binary classification In order Not in order #Shuffle&Learn Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code] (Slides by Xunyu Lin):
  • 43. 43 Temporal coherence #Shuffle&Learn Misra, Ishan, C. Lawrence Zitnick, and Martial Hebert. "Shuffle and learn: unsupervised learning using temporal order verification." ECCV 2016. [code] (Slides by Xunyu Lin): Temporal order of frames is exploited as the supervisory signal for learning.
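Temporal order verification can be sketched as building three-frame tuples with a binary label. This is a simplified sampling scheme with hypothetical names; frames are represented by their timestamps, and the sampling strategy of the cited paper is more elaborate.

```python
import random


def make_order_sample(frames, rng):
    """Return (three frames, label): 1 if temporally ordered, 0 otherwise."""
    i, j, k = sorted(rng.sample(range(len(frames)), 3))
    if rng.random() < 0.5:
        return (frames[i], frames[j], frames[k]), 1
    # Swap the first two frames: the tuple is neither forward nor backward.
    return (frames[j], frames[i], frames[k]), 0


frames = list(range(10))  # stand-ins for video frames, indexed by time
samples = [make_order_sample(frames, random.Random(seed)) for seed in range(20)]
```

A binary classifier trained on such tuples needs no labels beyond the frame order already present in the video.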
  • 44. 44 Temporal verification #Odd-one-out Fernando, Basura, Hakan Bilen, Efstratios Gavves, and Stephen Gould. "Self-supervised video representation learning with odd-one-out networks." ICCV 2017 Train a network to detect which of the video sequences contains frames in the wrong order.
  • 45. 45 Temporal sorting Lee, Hsin-Ying, Jia-Bin Huang, Maneesh Singh, and Ming-Hsuan Yang. "Unsupervised representation learning by sorting sequences." ICCV 2017. Sort the sequence of frames.
  • 46. 46 #T-CAM Wei, Donglai, Joseph J. Lim, Andrew Zisserman, and William T. Freeman. "Learning and using the arrow of time." CVPR 2018. Predict whether the video moves forward or backward. Arrow of time
  • 47. 47 Outline 1. Unsupervised Learning 2. Self-supervised Learning 3. Predictive Methods ○ Cycle Consistency
  • 48. 48 #TimeCycle Xiaolong Wang, Allan Jabri, Alexei A. Efros, “Learning Correspondence from the Cycle-Consistency of Time”. CVPR 2019. [code] [Slides by Carles Ventura] Cycle-consistency Training: Tracking backward and then forward with pixel embeddings, to later train a NN to match the pairs of associated pixel embeddings.
  • 49. 49 #TimeCycle Xiaolong Wang, Allan Jabri, Alexei A. Efros, “Learning Correspondence from the Cycle-Consistency of Time”. CVPR 2019. [code] [Slides by Carles Ventura] Cycle-consistency Inference (video object segmentation): Track pixels across video sequences.
  • 50. 50 #TCC Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman, “Temporal Cycle-Consistency Learning”. CVPR 2019 Training: Learn video frame embeddings that must satisfy a temporal alignment between videos depicting the same activity. Temporal Cycle-Consistency
  • 51. 51 #TCC Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman, “Temporal Cycle-Consistency Learning”. CVPR 2019 Temporal Cycle-Consistency
  • 52. 52 #TCC Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman, “Temporal Cycle-Consistency Learning”. CVPR 2019 Application: Anomaly detection. Temporal Cycle-Consistency
  • 53. 53 #TCC Debidatta Dwibedi, Yusuf Aytar, Jonathan Tompson, Pierre Sermanet, Andrew Zisserman, “Temporal Cycle-Consistency Learning”. CVPR 2019 Application: Fine-grained retrieval. Temporal Cycle-Consistency
  • 54. 54 Outline 1. Unsupervised Learning 2. Self-supervised Learning 3. Predictive Methods ○ Video Frame Prediction
  • 55. 55 Video Frame Prediction Slide credit: Yann LeCun
  • 56. 56 Next word prediction (language model) Figure: TensorFlow tutorial Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. "A neural probabilistic language model." Journal of machine learning research 3, no. Feb (2003): 1137-1155. Self-supervised learning
  • 57. 57 #word2vec #continuousbow Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. "Distributed representations of words and phrases and their compositionality." NIPS 2013 the cat climbed a tree Given context: a, cat, the, tree Estimate prob. of climbed Self-supervised learning Missing word prediction (language model)
  • 58. 58 #word2vec #skipgram Mikolov, Tomas, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. "Distributed representations of words and phrases and their compositionality." NIPS 2013 Self-supervised learning the cat climbed a tree Given word: climbed Estimate prob. of context words: a, cat, the, tree Context Words Prediction (language model)
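The two word2vec setups differ only in which side of the (context, word) pair is the input. A small sketch using the sentence from the slides follows; the window size and function names are assumptions.

```python
def context_window(tokens, i, window):
    """Words within `window` positions of tokens[i], excluding tokens[i] itself."""
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]


def cbow_samples(tokens, window=2):
    """Continuous bag-of-words: (context words) -> centre word."""
    return [(context_window(tokens, i, window), tokens[i])
            for i in range(len(tokens))]


def skipgram_samples(tokens, window=2):
    """Skip-gram: centre word -> one context word per sample."""
    return [(tokens[i], ctx)
            for i in range(len(tokens))
            for ctx in context_window(tokens, i, window)]


sentence = "the cat climbed a tree".split()
cbow = cbow_samples(sentence)
skipgram = skipgram_samples(sentence)
```

For the centre word "climbed", CBOW produces one sample with four context words, while skip-gram produces four samples with one context word each.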
  • 59. 59 Next sentence prediction (TRUE/FALSE) #BERT Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. "Bert: Pre-training of deep bidirectional transformers for language understanding." NAACL 2019 (best paper award). BERT is also pre-trained on a binarized next sentence prediction task: sentence pairs (A, B) are sampled so that in 50% of the pairs B is the actual next sentence of A, and in the remaining 50% B is a random sentence from the corpus. The model learns to tell the two cases apart.
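The 50/50 sampling for next sentence prediction can be sketched as follows. The corpus is a hypothetical list of sentence identifiers, and the negative is drawn from the same list for simplicity.

```python
import random


def make_nsp_sample(sentences, rng):
    """Return (A, B, is_next). Requires at least 3 sentences."""
    i = rng.randrange(len(sentences) - 1)
    if rng.random() < 0.5:
        return sentences[i], sentences[i + 1], True
    # Negative pair: any sentence that is neither A nor the true next one.
    candidates = [j for j in range(len(sentences)) if j not in (i, i + 1)]
    return sentences[i], sentences[rng.choice(candidates)], False


corpus = ["s0", "s1", "s2", "s3", "s4"]  # stand-ins for consecutive sentences
pairs = [make_nsp_sample(corpus, random.Random(seed)) for seed in range(30)]
```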
  • 60. 60 Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." In ICML 2015. [Github] Learning video representations (features) by... Video Frame Prediction
  • 61. 61 (1) frame reconstruction (AE): Learning video representations (features) by... Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." ICML 2015. [Github] Video Frame Prediction
  • 62. 62 Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." ICML 2015. [Github] Learning video representations (features) by... (2) frame prediction Video Frame Prediction
  • 63. 63 Unsupervised learned features (lots of data) are fine-tuned for activity recognition (small data). Srivastava, Nitish, Elman Mansimov, and Ruslan Salakhutdinov. "Unsupervised Learning of Video Representations using LSTMs." ICML 2015. [Github] Video Frame Prediction
  • 64. 64 Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016 [project] [code] Video frame prediction with a ConvNet. Video Frame Prediction
  • 65. 65 Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016 [project] [code] The blurry predictions from MSE (L2 loss) are improved with a multi-scale architecture, adversarial training and an image gradient difference loss (GDL) function. Video Frame Prediction
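A gradient difference loss can be sketched on single-channel images stored as nested lists. It compares the absolute differences between neighbouring pixels in the prediction and in the target, so a blurred edge is penalised even when pixel values are close on average. The function name and the toy images are assumptions.

```python
def gradient_difference_loss(pred, target, alpha=1.0):
    """Sum over pixels of |(|target gradient| - |predicted gradient|)|**alpha,
    computed for vertical and horizontal neighbours."""
    h, w = len(target), len(target[0])
    loss = 0.0
    for i in range(h):
        for j in range(w):
            if i > 0:  # vertical neighbour
                g_t = abs(target[i][j] - target[i - 1][j])
                g_p = abs(pred[i][j] - pred[i - 1][j])
                loss += abs(g_t - g_p) ** alpha
            if j > 0:  # horizontal neighbour
                g_t = abs(target[i][j - 1] - target[i][j])
                g_p = abs(pred[i][j - 1] - pred[i][j])
                loss += abs(g_t - g_p) ** alpha
    return loss


sharp = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]   # target frame with a crisp vertical edge
blurry = [[0.0, 0.25, 0.75, 1.0],
          [0.0, 0.25, 0.75, 1.0]]  # prediction with the edge smeared out
gdl_blurry = gradient_difference_loss(blurry, sharp)
gdl_perfect = gradient_difference_loss(sharp, sharp)
```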
  • 66. 66 Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." ICLR 2016 [project] [code] Video Frame Prediction
  • 67. 67 #DrNet Denton, Emily L. "Unsupervised learning of disentangled representations from video." NIPS 2017. The model learns to disentangle (“separate”) the visual features that correspond to the: Object Pose (wrt the camera) Object Content (class) Frame Prediction + Disentangled features
  • 68. 68 #DrNet Denton, Emily L. "Unsupervised learning of disentangled representations from video." NIPS 2017. #MCNet R. Villegas, J. Yang, S. Hong, X. Lin, and H. Lee. Decomposing motion and content for natural video sequence prediction. In ICLR, 2017 100 step video generation on KTH dataset where: ● green frames indicate reconstructions ● red frames indicate frame predictions. Frame Prediction + Disentangled features
  • 69. 69 #DDPAE Hsieh, Jun-Ting, Bingbin Liu, De-An Huang, Li F. Fei-Fei, and Juan Carlos Niebles. "Learning to decompose and disentangle representations for video prediction." NeurIPS 2018. Decompositional Disentangled Predictive Auto-Encoder (DDPAE): Frame Prediction + Disentangled features
  • 70. 70 Outline 1. Unsupervised Learning 2. Self-supervised Learning 3. Predictive Methods ○ Pixel tracking
  • 71. 71 Zhang, R., Isola, P., & Efros, A. A. Colorful image colorization. ECCV 2016. Colorization (still image) Surrogate task: Colorize a gray scale image.
  • 72. 72 Vondrick, Carl, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. "Tracking emerges by colorizing videos." ECCV 2018. [blog] Pixel tracking Pre-training: A NN is trained to point to the location in a reference frame with the most similar color...
  • 73. 73 Vondrick, Carl, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. "Tracking emerges by colorizing videos." ECCV 2018. [blog] Pre-training: A NN is trained to point to the location in a reference frame with the most similar color… computing a loss with respect to the actual color. Pixel tracking
  • 74. 74 Vondrick, Carl, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. "Tracking emerges by colorizing videos." ECCV 2018. [blog] For each grayscale pixel to colorize, find the color of the most similar pixel embedding (representation) from the first frame. Pixel tracking
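The pointing mechanism can be sketched as a soft lookup: similarities between a target pixel embedding and all reference embeddings are turned into weights, and the predicted color is the weighted average of the reference colors. Embeddings, colors and the temperature below are toy assumptions; in practice the embeddings come from the trained network.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def copy_color(ref_embeddings, ref_colors, target_embedding, temperature=1.0):
    """Return (predicted color, pointer weights over reference pixels)."""
    weights = softmax([dot(f, target_embedding) / temperature
                       for f in ref_embeddings])
    n_channels = len(ref_colors[0])
    color = [sum(w * c[k] for w, c in zip(weights, ref_colors))
             for k in range(n_channels)]
    return color, weights


ref_embeddings = [[10.0, 0.0], [0.0, 10.0]]      # two reference pixels
ref_colors = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # red and blue
color, weights = copy_color(ref_embeddings, ref_colors, [10.0, 0.0])
```

At test time the same weights can propagate a segmentation mask instead of a color, which is how tracking emerges from the colorization task.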
  • 75. 75 Vondrick, Carl, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. "Tracking emerges by colorizing videos." ECCV 2018. [blog] Pixel tracking Application: colorization from a reference frame. Reference frame Grayscale video Colorized video
  • 76. 76 Vondrick, Carl, Abhinav Shrivastava, Alireza Fathi, Sergio Guadarrama, and Kevin Murphy. "Tracking emerges by colorizing videos." ECCV 2018. [blog] Application: video object segmentation from an object mask. Pixel tracking
  • 77. 77 Outline 1. Unsupervised Learning 2. Self-supervised Learning 3. Predictive Methods 4. Contrastive Methods
  • 78. 78 Predictive vs Contrastive Methods Source: Ankesh Anand, “Contrastive Self-Supervised Learning” (2020)
  • 79. Hadsell, Raia, Sumit Chopra, and Yann LeCun. "Dimensionality reduction by learning an invariant mapping." CVPR 2006. Learn G_W(X) with pairs (X1, X2) which are similar (Y=0) or dissimilar (Y=1). Contrastive Loss similar pair (X1, X2) dissimilar pair (X1, X2)
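A sketch of the pairwise contrastive loss on already-computed embeddings G_W(X1) and G_W(X2). Similar pairs (Y=0) are pulled together, and dissimilar pairs (Y=1) are pushed apart until they are at least a margin away. The margin value and the function names are assumptions for illustration.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(g1, g2, y, margin=1.0):
    """y = 0 for a similar pair, y = 1 for a dissimilar pair."""
    d = euclidean(g1, g2)
    if y == 0:
        return 0.5 * d ** 2                   # penalise any distance
    return 0.5 * max(0.0, margin - d) ** 2    # penalise only if closer than margin


loss_similar_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], y=0)
loss_dissimilar_far = contrastive_loss([0.0, 0.0], [3.0, 4.0], y=1)
loss_dissimilar_close = contrastive_loss([0.0, 0.0], [0.0, 0.0], y=1, margin=2.0)
```

Dissimilar pairs that are already farther apart than the margin contribute nothing, which keeps the embedding from expanding without bound.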
  • 80. 80 Predictive Learning #CPC Oord, Aaron van den, Yazhe Li, and Oriol Vinyals. "Representation learning with contrastive predictive coding." arXiv preprint arXiv:1807.03748 (2018).
  • 81. 81 Predictive Learning #CPCv2 Hénaff, Olivier J., Ali Razavi, Carl Doersch, S. M. Eslami, and Aaron van den Oord. "Data-efficient image recognition with contrastive predictive coding." arXiv preprint arXiv:1905.09272 (2019). Patch features (in blue) are aggregated by the context network (in red) to predict the features extracted from other parts of the image.
  • 82. 82
  • 83. The Gelato Bet "If, by the first day of autumn (Sept 23) of 2015, a method will exist that can match or beat the performance of R-CNN on Pascal VOC detection, without the use of any extra, human annotations (e.g. ImageNet) as pre-training, Mr. Malik promises to buy Mr. Efros one (1) gelato (2 scoops: one chocolate, one vanilla)." Source: Ankesh Anand, “Contrastive Self-Supervised Learning” (2020)
  • 84. Contrastive loss between: ● Encoded features of a query image. ● A running queue of encoded keys. Defining losses over representations instead of pixels facilitates abstraction. #MoCo He, K., Fan, H., Wu, Y., Xie, S., & Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. CVPR 2020. [code] Momentum Contrast (MoCo)
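The three ingredients named above (a contrastive loss on a query, a running queue of keys, and a momentum-updated key encoder) can be sketched on plain Python lists. This is a toy illustration with hypothetical names: encoders are reduced to flat parameter lists, and the InfoNCE form of the loss is assumed.

```python
import math
from collections import deque


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive_key, negative_keys, temperature=0.07):
    """Cross-entropy of picking the positive key among all keys."""
    logits = [dot(query, positive_key) / temperature]
    logits += [dot(query, k) / temperature for k in negative_keys]
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_sum_exp - logits[0]


def momentum_update(key_params, query_params, m=0.999):
    """Key encoder slowly follows the query encoder (no gradient step)."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


# Running queue of encoded keys: the oldest entries are dropped automatically.
queue = deque(maxlen=4)
queue.extend(range(6))  # integers stand in for encoded key vectors

q = [1.0, 0.0]
loss_good = info_nce(q, [1.0, 0.0], [[0.0, 1.0], [-1.0, 0.0]])
loss_bad = info_nce(q, [-1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

The queue decouples the number of negatives from the batch size, and the slow momentum update keeps older keys in the queue consistent with newer ones.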
  • 85. Wu, Z., Xiong, Y., Yu, S. X., & Lin, D. (2018). Unsupervised feature learning via non-parametric instance discrimination. CVPR 2018. Pretext task: Treat each image instance as a distinct class of its own and train a classifier to distinguish between individual instance classes (images). Momentum Contrast (MoCo)
  • 86. 86 Transfer Learning (super. IN-1M) Source data E.g. ImageNet Source model Source labels Target data E.g. PASCAL Target model Target labels Transfer Learned Knowledge Large amount of data/labels Small amount of data/labels Slide credit: Kevin McGuinness
  • 87. 87 Transfer Learning (MoCo IN-1M) Source data E.g. ImageNet Source model Source labels Target data E.g. PASCAL Target model Target labels Transfer Learned Knowledge Large amount of data Small amount of data/labels Slide concept: Kevin McGuinness
  • 88. 88 Transfer Learning (super. IN-1M vs MoCo IN-1M) Transferring to VOC object detection: a Faster R-CNN detector (C4-backbone) is fine-tuned end-to-end.
  • 89. 89
  • 90. 90 Correlated views by data augmentation #SimCLR Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. "A simple framework for contrastive learning of visual representations." arXiv preprint arXiv:2002.05709 (2020). [tweet]
  • 91. 91
  • 92. 92 Correlated views by data augmentation Original image cc-by: Von.grzanka #SimCLR Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. "A simple framework for contrastive learning of visual representations." arXiv preprint arXiv:2002.05709 (2020). [tweet]
  • 93. 93 #SimCLR Chen, Ting, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. "A simple framework for contrastive learning of visual representations." arXiv preprint arXiv:2002.05709 (2020). [tweet] Correlated views by data augmentation ImageNet Linear Classification: unsupervised learned features are frozen and a supervised linear classifier is trained on top.
  • 94. 94 #MoCo2 Chen, Xinlei, Haoqi Fan, Ross Girshick, and Kaiming He. "Improved Baselines with Momentum Contrastive Learning." arXiv preprint arXiv:2003.04297 (2020). MoCo + data augmentation + others ImageNet Linear Classification: unsupervised learned features are frozen and a supervised linear classifier is trained on top.
  • 95. 95 #VINCE Gordon, Daniel, Kiana Ehsani, Dieter Fox, and Ali Farhadi. "Watching the World Go By: Representation Learning from Unlabeled Videos." arXiv preprint arXiv:2003.07990 (2020). Video Noise Contrastive estimation Videos provide data augmentation for free.
  • 96. 96 #VINCE Gordon, Daniel, Kiana Ehsani, Dieter Fox, and Ali Farhadi. "Watching the World Go By: Representation Learning from Unlabeled Videos." arXiv preprint arXiv:2003.07990 (2020). Video Noise Contrastive estimation
  • 98. Object candidates + Time Pirk, Sören, Mohi Khansari, Yunfei Bai, Corey Lynch, and Pierre Sermanet. "Online Object Representations with Contrastive Learning." Learning from Unlabeled Videos (LUV) Workshop, CVPR 2019.
  • 99. Multiview + Time #TCN Sermanet, Pierre, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine, and Google Brain. "Time-contrastive networks: Self-supervised learning from video." ICRA 2018. Pre-Training: Learn features with a triplet loss by: ● Positive: Same time-stamp from different cameras for view invariance. ● Negative: Other frames from the same camera to be sensitive to temporal ordering
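The time-contrastive triplet can be sketched as follows: the anchor and the positive share a timestamp but come from different cameras, while the negative comes from the anchor's camera at a distant time. The margin, the `min_gap` parameter and the helper names are assumptions for illustration.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is farther from the anchor than the positive, by a margin."""
    return max(0.0, sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin)


def sample_triplet_times(n_frames, t, min_gap, rng):
    """Timestamps for (anchor in view 1, positive in view 2, negative in view 1)."""
    candidates = [s for s in range(n_frames) if abs(s - t) >= min_gap]
    return t, t, rng.choice(candidates)


t_anchor, t_positive, t_negative = sample_triplet_times(10, 5, 3, random.Random(0))
loss_satisfied = triplet_loss([0.0, 0.0], [0.0, 0.0], [1.0, 0.0])
loss_violated = triplet_loss([0.0, 0.0], [1.0, 0.0], [0.0, 0.0], margin=0.5)
```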
  • 100. Inference: Imitation learning for a robotic arm. Multiview + Time #TCN Sermanet, Pierre, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine, and Google Brain. "Time-contrastive networks: Self-supervised learning from video." ICRA 2018.
  • 101. #TCN Sermanet, Pierre, Corey Lynch, Yevgen Chebotar, Jasmine Hsu, Eric Jang, Stefan Schaal, Sergey Levine, and Google Brain. "Time-contrastive networks: Self-supervised learning from video." ICRA 2018.
  • 102. 102 Outline 1. Unsupervised Learning 2. Self-supervised Learning 3. Predictive Methods 4. Contrastive Methods
  • 103. 103 Coming next: Self-supervised Audiovisual Learning [slides] [video]
  • 104. 104 Suggested reading Alex Graves, Kelly Clancy, “Unsupervised learning: The curious pupil”. Deepmind blog 2019.
  • 105. 105 Suggested talks Demis Hassabis (Deepmind) Alyosha Efros (UC Berkeley) Andrew Zisserman (Oxford/Deepmind) Yann LeCun (Facebook AI)
  • 106. 106 Further readings ● Jeremy Howard, “Self-supervised learning and computer vision” (fast.ai, 2020) ● Zhang, L., Qi, G. J., Wang, L., & Luo, J. (2019). Aet vs. aed: Unsupervised representation learning by auto-encoding transformations rather than data. CVPR 2019. ● Tian, Yonglong, Dilip Krishnan, and Phillip Isola. "Contrastive Multiview Coding." arXiv preprint arXiv:1906.05849 (2019). ● Wu, Z., Xiong, Y., Yu, S. X., & Lin, D. (2018). Unsupervised feature learning via non-parametric instance discrimination. CVPR 2018. ○ Each image is its own class. ○ Memory bank. ○ Triplet loss + Softmax
  • 109. 109 Temporal regularization: ISA Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis." CVPR 2011 Video features are learned with convolutions and poolings combined with an Independent Subspace Analysis (ISA).
  • 110. 110 Le, Quoc V., Will Y. Zou, Serena Y. Yeung, and Andrew Y. Ng. "Learning hierarchical invariant spatio-temporal features for action recognition with independent subspace analysis." CVPR 2011 Feature visualizations. Temporal regularization: ISA
  • 111. 111 Assumption: adjacent video frames contain semantically similar information. Autoencoder trained with slowness and sparsity regularization. Goroshin, Ross, Joan Bruna, Jonathan Tompson, David Eigen, and Yann LeCun. "Unsupervised learning of spatiotemporally coherent metrics." ICCV 2015. Temporal regularization: Slowness
  • 112. 112 Jayaraman, Dinesh, and Kristen Grauman. "Slow and steady feature analysis: higher order temporal coherence in video." CVPR 2016. [video] Slow feature analysis ● Temporal coherence assumption: features should change slowly over time in video. Steady feature analysis ● Second-order changes are also small: changes in the past should resemble changes in the future. Train on triplets of frames from video. The loss encourages nearby frames to have slow and steady features, and far frames to have different features. Temporal regularization: Slowness
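The two regularizers can be sketched on feature vectors of three consecutive frames. Slowness penalises the first-order change between neighbouring frames, and steadiness penalises the change of that change. The function names and toy trajectories are assumptions, and the contrastive term for far-apart frames is omitted.

```python
def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def slowness(z_a, z_b):
    """First-order term: features of adjacent frames should be close."""
    return sq_dist(z_a, z_b)


def steadiness(z_a, z_b, z_c):
    """Second-order term: the step a->b should resemble the step b->c."""
    first_step = [b - a for a, b in zip(z_a, z_b)]
    second_step = [c - b for b, c in zip(z_b, z_c)]
    return sq_dist(first_step, second_step)


# Features moving at constant velocity: not perfectly slow, but perfectly steady.
linear = ([0.0, 0.0], [1.0, 1.0], [2.0, 2.0])
# Features that jump forward and back: same slowness, poor steadiness.
jerky = ([0.0, 0.0], [1.0, 1.0], [0.0, 0.0])

steady_linear = steadiness(*linear)
steady_jerky = steadiness(*jerky)
slow_linear = slowness(linear[0], linear[1])
```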