Towards Scaling Video
Understanding
Serena Yeung
YouTube, TV, GoPro, Smart spaces
State-of-the-art in video understanding
State-of-the-art in video understanding
Classification
Abu-El-Haija et al. 2016
4,800 categories
15.2% Top5 error
State-of-the-art in video understanding
Classification Detection
Idrees et al. 2017, Sigurdsson et al. 2016; Abu-El-Haija et al. 2016
4,800 categories
15.2% Top5 error
Tens of categories
~10-20 mAP at 0.5 overlap
State-of-the-art in video understanding
Classification Detection
Abu-El-Haija et al. 2016
Captioning
4,800 categories
15.2% Top5 error
Yu et al. 2016
Just getting started:
Short clips, niche domains
Idrees et al. 2017, Sigurdsson et al. 2016
Tens of categories
~10-20 mAP at 0.5 overlap
Comparing video with image understanding
Comparing video with image understanding
Classification
4,800 categories
15.2% Top5 error
Videos
Images 1,000 categories*
3.1% Top5 error
*Transfer learning
widespread
Krizhevsky 2012, Xie 2016
Comparing video with image understanding
He 2017
Classification Detection
4,800 categories
15.2% Top5 error
Tens of categories
~10-20 mAP at 0.5 overlap
Videos
Images 1,000 categories*
3.1% Top5 error
*Transfer learning
widespread
Hundreds of categories*
~60 mAP at 0.5 overlap
Pixel-level segmentation
*Transfer learning
widespread
Krizhevsky 2012, Xie 2016
Comparing video with image understanding
He 2017; Johnson 2016, Krause 2017
Classification Detection Captioning
4,800 categories
15.2% Top5 error
Just getting started:
Short clips, niche
domains
Videos
Images 1,000 categories*
3.1% Top5 error
*Transfer learning
widespread
Hundreds of categories*
~60 mAP at 0.5 overlap
Pixel-level segmentation
*Transfer learning
widespread
Dense captioning
Coherent paragraphs
Krizhevsky 2012, Xie 2016
Tens of categories
~10-20 mAP at 0.5 overlap
Comparing video with image understanding
He 2017; Johnson 2016, Krause 2017
Classification Detection Captioning
4,800 categories
15.2% Top5 error
Just getting started:
Short clips, niche
domains
Videos
Beyond
Images 1,000 categories*
3.1% Top5 error
*Transfer learning
widespread
Hundreds of categories*
~60 mAP at 0.5 overlap
Pixel-level segmentation
*Transfer learning
widespread
Dense captioning
Coherent paragraphs
Significant work on
question-answering
—
Krizhevsky 2012, Xie 2016
Yang 2016
Tens of categories
~10-20 mAP at 0.5 overlap
The challenge of scale
Training labels Inference
Video processing is
computationally expensive
Video annotation is
labor-intensive
Models
Temporal dimension adds
complexity
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end
Learning of Action Detection from Frame
Glimpses in Videos. CVPR 2016.
Task: Temporal action detection
Input: a video from t = 0 to t = T. Output: the temporal extent [start, end] of each action instance (e.g., Running, Talking).
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
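For concreteness, the detector's output can be represented as a set of detection instances, each pairing an action class with a temporal interval; the class names and times below are illustrative only:

```python
# Hypothetical output of a temporal action detector for one video spanning t = 0 .. T:
# a list of detection instances, each with an action class and a [start, end] interval (seconds).
detections = [
    {"action": "Running", "start": 3.2, "end": 7.8},
    {"action": "Talking", "start": 12.0, "end": 19.5},
]
```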
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
Efficient video processing
t = 0 t = T
Output
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
Frame model
Input: a frame
Output: Detection instance [start, end]; Next frame to glimpse
Our model for efficient action detection
t = 0 t = T
Recurrent neural network
(time information)
[ ] Output:
Detection instance [start, end]
Next frame to glimpse
Output
Convolutional neural network
(frame information)
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
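A minimal sketch of the architecture on this slide, assuming PyTorch and illustrative layer names and sizes (not the paper's exact code): a CNN embeds the glimpsed frame, a recurrent cell carries temporal state across glimpses, and each step emits a candidate detection [start, end] together with the next frame to glimpse.

```python
import torch
import torch.nn as nn

class GlimpseActionDetector(nn.Module):
    """Illustrative frame-glimpse agent: CNN for frame information, RNN for time information."""

    def __init__(self, frame_feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.frame_encoder = nn.Linear(frame_feat_dim, hidden_dim)   # stand-in for a CNN backbone
        self.rnn = nn.LSTMCell(hidden_dim, hidden_dim)                # temporal state across glimpses
        self.detection_head = nn.Linear(hidden_dim, 3)                # [start, end, emit-confidence]
        self.location_head = nn.Linear(hidden_dim, 1)                 # where to look next

    def step(self, frame_feat, state=None):
        h, c = self.rnn(torch.relu(self.frame_encoder(frame_feat)), state)
        detection = self.detection_head(h)                    # candidate detection instance [start, end]
        next_glimpse = torch.sigmoid(self.location_head(h))   # next frame to glimpse, as a fraction of T
        return detection, next_glimpse, (h, c)
```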
Our model for efficient action detection
t = 0 t = T
Output
Our model for efficient action detection
Recurrent neural network
(time information)
[ ] [ ] Output:
Detection instance [start, end]
Next frame to glimpse
Output
Convolutional neural network
(frame information)
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
t = 0 t = T
Output
Our model for efficient action detection
Recurrent neural network
(time information)
Output
Convolutional neural network
(frame information)
Output
[ ]
…
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
Optional output:
Detection instance [start, end]
Output:
Next frame to glimpse
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
Our model for efficient action detection
• Train differentiable outputs (detection output class and bounds) using standard backpropagation
• Train non-differentiable outputs (where to look next, when to emit a prediction) using reinforcement learning (the REINFORCE algorithm); a rough training sketch follows below
• Achieves detection performance on par with dense sliding window-
based approaches, while observing only 2% of frames
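The REINFORCE bullet above can be made concrete with a rough training sketch (assumed PyTorch; the reward definition and the way log-probabilities are collected are simplified placeholders, not the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def training_step(pred_detections, gt_detections, glimpse_log_probs, reward):
    """Combine backpropagation for differentiable outputs with REINFORCE for sampled decisions.

    pred_detections, gt_detections: tensors of [start, end] pairs (differentiable path).
    glimpse_log_probs: log-probabilities of the sampled glimpse locations / emit decisions.
    reward: scalar episode reward, e.g. based on overlap of emitted detections with ground truth.
    """
    # Differentiable outputs (detection bounds): ordinary supervised loss.
    detection_loss = F.smooth_l1_loss(pred_detections, gt_detections)

    # Non-differentiable outputs (where to look next, when to emit): REINFORCE.
    # Maximizing expected reward is done by minimizing -reward * sum of log-probabilities.
    reinforce_loss = -(reward * glimpse_log_probs.sum())

    return detection_loss + reinforce_loss
```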
Learned policy in action
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
The challenge of scale
Training labels Inference
Video processing is
computationally expensive
Video annotation is
labor-intensive
Models
Temporal dimension adds
complexity
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end
Learning of Action Detection from Frame
Glimpses in Videos. CVPR 2016.
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei.
Every Moment Counts: Dense Detailed Labeling
of Actions in Complex Videos. IJCV 2017.
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei.
Every Moment Counts: Dense Detailed Labeling
of Actions in Complex Videos. IJCV 2017.
Dense action labeling
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex
Videos. IJCV 2017.
MultiTHUMOS
• Extends the THUMOS’14 action detection dataset with dense, multilevel,
frame-level action annotations for 30 hours across 400 videos
                          THUMOS    MultiTHUMOS
Annotations               6,365     38,690
Classes                   20        65
Density (labels / frame)  0.3       1.5
Classes per video         1.1       10.5
Max actions per frame     2         9
Max actions per video     3         25
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex
Videos. IJCV 2017.
Modeling dense, multilabel actions
• Need to reason about multiple potential actions simultaneously
• High degree of temporal dependency
• In standard recurrent models for action recognition, all state is in the hidden-layer representation
• At each time step, the model predicts the current frame's labels from the current frame and the previous hidden representation (a rough sketch follows below)
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex
Videos. IJCV 2017.
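As a reference point, the standard recurrent baseline described above can be sketched as follows (assumed PyTorch, with illustrative dimensions; 65 classes matches MultiTHUMOS): an LSTM reads one frame feature per time step and emits independent per-class scores for that frame, trained with a multilabel binary cross-entropy loss.

```python
import torch.nn as nn

class FrameLSTMBaseline(nn.Module):
    """Standard LSTM for dense action labeling: one input frame, one multilabel prediction per step."""

    def __init__(self, feat_dim=2048, hidden_dim=512, num_classes=65):
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, frame_feats):            # frame_feats: (batch, time, feat_dim)
        hidden, _ = self.lstm(frame_feats)     # hidden: (batch, time, hidden_dim)
        return self.classifier(hidden)         # per-frame logits, one score per action class

# Multilabel loss: each frame can contain several simultaneous actions.
loss_fn = nn.BCEWithLogitsLoss()
```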
MultiLSTM
• Extension of LSTM that expands the temporal receptive field of input
and output connections
• Key idea: providing the model with more freedom in both reading
input and writing output reduces the burden placed on the hidden
layer representation
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex
Videos. IJCV 2017.
MultiLSTM
Standard LSTM (Donahue 2014): a single input video frame and a single frame class prediction per time step.
MultiLSTM: multiple input video frames and multiple frame class predictions per time step.
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
MultiLSTM
Multiple Inputs (soft attention)
Multiple Outputs (weighted average)
Multilabel Loss
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex
Videos. IJCV 2017.
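A hedged sketch of one MultiLSTM step, under the assumption of a PyTorch implementation with made-up names and a fixed attention window (the paper's exact parameterization may differ): the cell reads a soft-attention-weighted combination of several nearby input frames and emits predictions for several frames at once; a frame's final score is then a weighted average of the predictions it receives from neighboring steps.

```python
import torch
import torch.nn as nn

class MultiLSTMStep(nn.Module):
    """Illustrative MultiLSTM step: multiple inputs via soft attention, multiple outputs per step."""

    def __init__(self, feat_dim=2048, hidden_dim=512, num_classes=65, window=5):
        super().__init__()
        self.window = window
        self.num_classes = num_classes
        self.attn = nn.Linear(feat_dim + hidden_dim, 1)          # score each frame in the input window
        self.cell = nn.LSTMCell(feat_dim, hidden_dim)
        self.out = nn.Linear(hidden_dim, window * num_classes)   # predictions for every frame in the window

    def forward(self, window_feats, state):
        # window_feats: (window, feat_dim), the frames this step may read; state: (h, c) with batch size 1.
        h, c = state
        scores = self.attn(torch.cat([window_feats, h.expand(self.window, -1)], dim=1))
        weights = torch.softmax(scores, dim=0)                    # soft attention over input frames
        attended = (weights * window_feats).sum(dim=0)            # single attended input for this step
        h, c = self.cell(attended.unsqueeze(0), (h, c))
        preds = self.out(h).view(self.window, self.num_classes)  # one multilabel prediction per window frame
        return preds, (h, c)

# A frame's final prediction would be a weighted average of the `preds` rows
# it receives from the steps whose output windows cover it.
```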
MultiLSTM
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex
Videos. IJCV 2017.
Retrieving sequential actions
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex
Videos. IJCV 2017.
Retrieving co-occurring actions
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex
Videos. IJCV 2017.
The challenge of scale
Training labels Inference
Video processing is
computationally expensive
Video annotation is
labor-intensive
Models
Temporal dimension adds
complexity
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end
Learning of Action Detection from Frame
Glimpses in Videos. CVPR 2016.
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei.
Every Moment Counts: Dense Detailed Labeling
of Actions in Complex Videos. IJCV 2017.
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei.
Every Moment Counts: Dense Detailed Labeling
of Actions in Complex Videos. IJCV 2017.
Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-
Fei. Learning to learn from noisy web videos.
CVPR 2017.
Labeling videos is expensive
• Takes significantly longer to label a video than an image
• Even worse if spatial or temporal bounds are desired
• How can we practically learn about new concepts in video?
Web queries are a source of noisy video labels
Image search is much cleaner!
Can we effectively learn from the noisy web queries?
• Our approach: learn how to select positive training examples from noisy queries in order to train classifiers for new classes
• Use a reinforcement learning-based formulation to learn a data
labeling policy that achieves strong performance on a small,
manually-labeled dataset of classes
• Then use this policy to automatically label noisy web data for new
classes
Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
Balancing diversity vs. semantic drift
• Want diverse training examples to improve classifier
• But too much diversity can also lead to semantic drift
• Our approach: balance diversity and drift by training labeling policies using an annotated reward set on which the resulting classifiers must perform well (a rough sketch of the loop follows below)
Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
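A simplified sketch of the labeling loop described above (the agent, classifier, and helper names are placeholders, not the paper's API; the paper trains the query-selection policy with reinforcement learning rather than this stub): at each step the agent picks one candidate web query, its videos are added as positives, the classifier is retrained against a fixed negative set, and accuracy on the small annotated reward set supplies the reward.

```python
def run_labeling_episode(candidate_queries, fixed_negatives, reward_set, agent, train_classifier):
    """Hypothetical loop for learning to label noisy web videos for a new class.

    candidate_queries: dict mapping a query string (e.g. a YouTube autocomplete suggestion
                       such as "Boomerang on a beach") to the noisy videos it returns.
    fixed_negatives:   fixed pool of negative videos.
    reward_set:        small manually labeled set used only to compute the reward.
    """
    positives, state, classifier = [], None, None
    while candidate_queries:
        # Agent decides which query's videos to trust as new positives.
        query = agent.select_query(state, candidate_queries)
        positives.extend(candidate_queries.pop(query))

        # Update the classifier with the grown positive set and the fixed negatives.
        classifier = train_classifier(positives, fixed_negatives)

        # Reward: performance of the updated classifier on the annotated reward set.
        reward = classifier.accuracy(reward_set)
        agent.observe_reward(reward)          # used to train the labeling policy (RL)
        state = agent.update_state(state, query, classifier)
    return classifier
```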
Overview of approach
Boomerang …
Boomerang
on a beach
Boomerang
music video
Classifier
Candidate web queries
(YouTube autocomplete)
Agent
Label new positives
+
Boomerang
on a beach
Current
positive set
Update classifier
Update
state
Fixed
negative set
Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
Training reward: evaluate on the reward set
Sports1M
Greedy classifier Ours
Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
Novel classes
Greedy classifier Ours
Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
The challenge of scale
Training labels Inference
Video processing is
computationally expensive
Video annotation is
labor-intensive
Models
Temporal dimension adds
complexity
Yeung, Russakovsky, Mori, Fei-Fei. End-to-end
Learning of Action Detection from Frame
Glimpses in Videos. CVPR 2016.
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei.
Every Moment Counts: Dense Detailed Labeling
of Actions in Complex Videos. IJCV 2017.
Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei.
Every Moment Counts: Dense Detailed Labeling
of Actions in Complex Videos. IJCV 2017.
Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-
Fei. Learning to learn from noisy web videos.
CVPR 2017.
Learning to learn
Unsupervised learning
Towards Knowledge
Training labels Inference
Video processing is
computationally expensive
Video annotation is
labor-intensive
Models
Temporal dimension adds
complexity
Videos
Knowledge of the
dynamic visual world
Collaborators
Olga Russakovsky, Mykhaylo Andriluka, Ning Jin, Vignesh Ramanathan, Liyue Shen,
Greg Mori, Fei-Fei Li
Thank You

More Related Content

Viewers also liked

Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...MLconf
 
Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...
Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...
Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...MLconf
 
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017MLconf
 
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017MLconf
 
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017MLconf
 
Scott Clark, CEO, SigOpt, at MLconf Seattle 2017
Scott Clark, CEO, SigOpt, at MLconf Seattle 2017Scott Clark, CEO, SigOpt, at MLconf Seattle 2017
Scott Clark, CEO, SigOpt, at MLconf Seattle 2017MLconf
 
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017MLconf
 
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016MLconf
 
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016MLconf
 
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...MLconf
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016MLconf
 
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016MLconf
 
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...MLconf
 
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017MLconf
 
Anjuli Kannan, Software Engineer, Google at MLconf SF 2016
Anjuli Kannan, Software Engineer, Google at MLconf SF 2016Anjuli Kannan, Software Engineer, Google at MLconf SF 2016
Anjuli Kannan, Software Engineer, Google at MLconf SF 2016MLconf
 
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017MLconf
 
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017MLconf
 
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017MLconf
 
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017MLconf
 

Viewers also liked (19)

Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
Caroline Sinders, Online Harassment Researcher, Wikimedia at The AI Conferenc...
 
Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...
Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...
Andrew Musselman, Committer and PMC Member, Apache Mahout, at MLconf Seattle ...
 
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
Alexandra Johnson, Software Engineer, SigOpt, at MLconf NYC 2017
 
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
Corinna Cortes, Head of Research, Google, at MLconf NYC 2017
 
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
Ross Goodwin, Technologist, Sunspring, MLconf NYC 2017
 
Scott Clark, CEO, SigOpt, at MLconf Seattle 2017
Scott Clark, CEO, SigOpt, at MLconf Seattle 2017Scott Clark, CEO, SigOpt, at MLconf Seattle 2017
Scott Clark, CEO, SigOpt, at MLconf Seattle 2017
 
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
Aaron Roth, Associate Professor, University of Pennsylvania, at MLconf NYC 2017
 
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
Chris Fregly, Research Scientist, PipelineIO at MLconf ATL 2016
 
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016
Brian Lucena, Senior Data Scientist, Metis at MLconf SF 2016
 
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
 
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
Nikhil Garg, Engineering Manager, Quora at MLconf SF 2016
 
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
Rajat Monga, Engineering Director, TensorFlow, Google at MLconf 2016
 
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
Alex Dimakis, Associate Professor, Dept. of Electrical and Computer Engineeri...
 
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
Irina Rish, Researcher, IBM Watson, at MLconf NYC 2017
 
Anjuli Kannan, Software Engineer, Google at MLconf SF 2016
Anjuli Kannan, Software Engineer, Google at MLconf SF 2016Anjuli Kannan, Software Engineer, Google at MLconf SF 2016
Anjuli Kannan, Software Engineer, Google at MLconf SF 2016
 
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
Ben Lau, Quantitative Researcher, Hobbyist, at MLconf NYC 2017
 
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
Luna Dong, Principal Scientist, Amazon at MLconf Seattle 2017
 
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
Mayur Thakur, Managing Director, Goldman Sachs, at MLconf NYC 2017
 
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
Ashirth Barthur, Security Scientist, H2O, at MLconf Seattle 2017
 

Similar to Serena Yeung, PHD, Stanford, at MLconf Seattle 2017

論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions
論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions
論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future DirectionsToru Tamaki
 
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...sipij
 
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...sipij
 
Real-Time Video Copy Detection in Big Data
Real-Time Video Copy Detection in Big DataReal-Time Video Copy Detection in Big Data
Real-Time Video Copy Detection in Big DataIRJET Journal
 
Video search by deep-learning
Video search by deep-learningVideo search by deep-learning
Video search by deep-learningvoginip
 
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformHuman Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformFadwa Fouad
 
Recognition and tracking moving objects using moving camera in complex scenes
Recognition and tracking moving objects using moving camera in complex scenesRecognition and tracking moving objects using moving camera in complex scenes
Recognition and tracking moving objects using moving camera in complex scenesIJCSEA Journal
 
A Smart Target Detection System using Fuzzy Logic and Background Subtraction
A Smart Target Detection System using Fuzzy Logic and Background SubtractionA Smart Target Detection System using Fuzzy Logic and Background Subtraction
A Smart Target Detection System using Fuzzy Logic and Background SubtractionIRJET Journal
 
Generating a time shrunk lecture video by event
Generating a time shrunk lecture video by eventGenerating a time shrunk lecture video by event
Generating a time shrunk lecture video by eventYara Ali
 
Video Object Extraction Using Feature Matching Based on Nonlocal Matting
Video Object Extraction Using Feature Matching Based on Nonlocal MattingVideo Object Extraction Using Feature Matching Based on Nonlocal Matting
Video Object Extraction Using Feature Matching Based on Nonlocal MattingMeidya Koeshardianto
 
Objective Evaluation of Video Quality
Objective Evaluation of Video QualityObjective Evaluation of Video Quality
Objective Evaluation of Video QualityAnton Venema
 
Measuring Heart Rate from Video using EVM
Measuring Heart Rate from Video using EVMMeasuring Heart Rate from Video using EVM
Measuring Heart Rate from Video using EVMIRJET Journal
 
IRJET- Storage Optimization of Video Surveillance from CCTV Camera
IRJET- Storage Optimization of Video Surveillance from CCTV CameraIRJET- Storage Optimization of Video Surveillance from CCTV Camera
IRJET- Storage Optimization of Video Surveillance from CCTV CameraIRJET Journal
 
Video Manifold Feature Extraction Based on ISOMAP
Video Manifold Feature Extraction Based on ISOMAPVideo Manifold Feature Extraction Based on ISOMAP
Video Manifold Feature Extraction Based on ISOMAPinventionjournals
 
Tablet tools and micro tasks
Tablet tools and micro tasksTablet tools and micro tasks
Tablet tools and micro tasksmhilde
 

Similar to Serena Yeung, PHD, Stanford, at MLconf Seattle 2017 (20)

Video + Language 2019
Video + Language 2019Video + Language 2019
Video + Language 2019
 
Video + Language
Video + LanguageVideo + Language
Video + Language
 
論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions
論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions
論文紹介:Temporal Sentence Grounding in Videos: A Survey and Future Directions
 
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
 
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
TARGET DETECTION AND CLASSIFICATION PERFORMANCE ENHANCEMENT USING SUPERRESOLU...
 
Real-Time Video Copy Detection in Big Data
Real-Time Video Copy Detection in Big DataReal-Time Video Copy Detection in Big Data
Real-Time Video Copy Detection in Big Data
 
presentation
presentationpresentation
presentation
 
Video+Language: From Classification to Description
Video+Language: From Classification to DescriptionVideo+Language: From Classification to Description
Video+Language: From Classification to Description
 
Video search by deep-learning
Video search by deep-learningVideo search by deep-learning
Video search by deep-learning
 
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon TransformHuman Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
Human Action Recognition in Videos Employing 2DPCA on 2DHOOF and Radon Transform
 
Recognition and tracking moving objects using moving camera in complex scenes
Recognition and tracking moving objects using moving camera in complex scenesRecognition and tracking moving objects using moving camera in complex scenes
Recognition and tracking moving objects using moving camera in complex scenes
 
A Smart Target Detection System using Fuzzy Logic and Background Subtraction
A Smart Target Detection System using Fuzzy Logic and Background SubtractionA Smart Target Detection System using Fuzzy Logic and Background Subtraction
A Smart Target Detection System using Fuzzy Logic and Background Subtraction
 
Generating a time shrunk lecture video by event
Generating a time shrunk lecture video by eventGenerating a time shrunk lecture video by event
Generating a time shrunk lecture video by event
 
Video Object Extraction Using Feature Matching Based on Nonlocal Matting
Video Object Extraction Using Feature Matching Based on Nonlocal MattingVideo Object Extraction Using Feature Matching Based on Nonlocal Matting
Video Object Extraction Using Feature Matching Based on Nonlocal Matting
 
Objective Evaluation of Video Quality
Objective Evaluation of Video QualityObjective Evaluation of Video Quality
Objective Evaluation of Video Quality
 
Measuring Heart Rate from Video using EVM
Measuring Heart Rate from Video using EVMMeasuring Heart Rate from Video using EVM
Measuring Heart Rate from Video using EVM
 
IRJET- Storage Optimization of Video Surveillance from CCTV Camera
IRJET- Storage Optimization of Video Surveillance from CCTV CameraIRJET- Storage Optimization of Video Surveillance from CCTV Camera
IRJET- Storage Optimization of Video Surveillance from CCTV Camera
 
Multiple Object Tracking - Laura Leal-Taixe - UPC Barcelona 2018
Multiple Object Tracking - Laura Leal-Taixe - UPC Barcelona 2018Multiple Object Tracking - Laura Leal-Taixe - UPC Barcelona 2018
Multiple Object Tracking - Laura Leal-Taixe - UPC Barcelona 2018
 
Video Manifold Feature Extraction Based on ISOMAP
Video Manifold Feature Extraction Based on ISOMAPVideo Manifold Feature Extraction Based on ISOMAP
Video Manifold Feature Extraction Based on ISOMAP
 
Tablet tools and micro tasks
Tablet tools and micro tasksTablet tools and micro tasks
Tablet tools and micro tasks
 

More from MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingMLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushMLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceMLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionMLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLMLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksMLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldMLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeMLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareMLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesMLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsMiki Katsuragi
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 

Recently uploaded (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 

Serena Yeung, PHD, Stanford, at MLconf Seattle 2017

  • 2.
  • 5. State-of-the-art in video understanding Classification Abu-El-Haija et al. 2016 4,800 categories 15.2 Top5 error
  • 6. State-of-the-art in video understanding Classification Detection Idrees et al. 2017, Sigurdsson et al. 2016Abu-El-Haija et al. 2016 4,800 categories 15.2 Top5 error Tens of categories ~10-20 mAP at 0.5 overlap
  • 7. State-of-the-art in video understanding Classification Detection Abu-El-Haija et al. 2016 Captioning 4,800 categories 15.2 Top5 error Yu et al. 2016 Just getting started: Short clips, niche domains Idrees et al. 2017, Sigurdsson et al. 2016 Tens of categories ~10-20 mAP at 0.5 overlap
  • 8. Comparing video with image understanding
  • 9. Comparing video with image understanding Classification 4,800 categories 15.2% Top5 error Videos Images 1,000 categories* 3.1% Top5 error *Transfer learning widespread Krizhevsky 2012, Xie 2016
  • 10. Comparing video with image understanding He 2017 Classification Detection 4,800 categories 15.2% Top5 error Tens of categories ~10-20 mAP at 0.5 overlap Videos Images 1,000 categories* 3.1% Top5 error *Transfer learning widespread Hundreds of categories* ~60 mAP at 0.5 overlap Pixel-level segmentation *Transfer learning widespread Krizhevsky 2012, Xie 2016
  • 11. Comparing video with image understanding He 2017 Johnson 2016, Krause 2017 Classification Detection Captioning 4,800 categories 15.2% Top5 error Just getting started: Short clips, niche domains Videos Images 1,000 categories* 3.1% Top5 error *Transfer learning widespread Hundreds of categories* ~60 mAP at 0.5 overlap Pixel-level segmentation *Transfer learning widespread Dense captioning Coherent paragraphs Krizhevsky 2012, Xie 2016 Tens of categories ~10-20 mAP at 0.5 overlap
  • 12. Comparing video with image understanding He 2017 Johnson 2016, Krause 2017 Classification Detection Captioning 4,800 categories 15.2 Top5 error Just getting started: Short clips, niche domains Videos Beyond Images 1,000 categories* 3.1% Top5 error *Transfer learning widespread Hundreds of categories* ~60 mAP at 0.5 overlap Pixel-level segmentation *Transfer learning widespread Dense captioning Coherent paragraphs Significant work on question-answering — Krizhevsky 2012, Xie 2016 Yang 2016 Tens of categories ~10-20 mAP at 0.5 overlap
  • 13. The challenge of scale Training labels Inference Models Video processing is computationally expensive Video annotation is labor-intensive Temporal dimension adds complexity
  • 14. The challenge of scale Training labels Inference Video processing is computationally expensive Video annotation is labor-intensive Models Temporal dimension adds complexity Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
  • 15. Input Output t = 0 t = T Running Task: Temporal action detection Talking Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
  • 16. Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Efficient video processing
  • 17. t = 0 t = T Output Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Frame model Input: a frame Our model for efficient action detection
  • 18. t = 0 t = T [ ] Output Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Output: Detection instance [start, end] Next frame to glimpse Frame model Input: a frame Our model for efficient action detection
  • 19. t = 0 t = T Recurrent neural network (time information) [ ] Output: Detection instance [start, end] Next frame to glimpse Output Convolutional neural network (frame information) Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Our model for efficient action detection
  • 22. t = 0 t = T Output Our model for efficient action detection Recurrent neural network (time information) [ ] [ ] Output: Detection instance [start, end] Next frame to glimpse Output Convolutional neural network (frame information) Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
  • 25. t = 0 t = T Output Our model for efficient action detection Recurrent neural network (time information) [ ] [ ] Output: Detection instance [start, end] Next frame to glimpse Output Convolutional neural network (frame information) Output [ ] Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
  • 26. t = 0 t = T Output Our model for efficient action detection Recurrent neural network (time information) [ ] [ ] Output: Detection instance [start, end] Next frame to glimpse Output Convolutional neural network (frame information) Output [ ] … Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
  • 27. t = 0 t = T Output Our model for efficient action detection Recurrent neural network (time information) Output Convolutional neural network (frame information) Output [ ] … Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Optional output: Detection instance [start, end] Output: Next frame to glimpse
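To make the glimpse architecture above concrete, here is a minimal sketch of a single glimpse step, assuming precomputed per-frame CNN features. The layer names, dimensions, and the three-way detection output are illustrative choices, not the authors' released implementation.

```python
# Minimal sketch of one glimpse step of a frame-glimpse detection model.
# Assumptions (not the paper's code): per-frame CNN features are precomputed,
# and all layer names/dimensions here are illustrative.
import torch
import torch.nn as nn

class GlimpseStep(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512):
        super().__init__()
        self.rnn = nn.LSTMCell(feat_dim, hidden_dim)   # time information
        self.detect = nn.Linear(hidden_dim, 3)         # [confidence, start, end]
        self.where = nn.Linear(hidden_dim, 1)          # next frame to glimpse

    def forward(self, frame_feat, state):
        h, c = self.rnn(frame_feat, state)             # fuse current frame with history
        detection = self.detect(h)                     # candidate detection [conf, start, end]
        next_loc = torch.sigmoid(self.where(h))        # normalized location of the next glimpse
        return detection, next_loc, (h, c)

# One step: observe a glimpsed frame, emit a candidate detection and the next location.
feat = torch.randn(1, 2048)                            # CNN feature of the glimpsed frame
step = GlimpseStep()
state = (torch.zeros(1, 512), torch.zeros(1, 512))
det, nxt, state = step(feat, state)
```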
  • 28. Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Our model for efficient action detection • Train differentiable outputs (detection output class and bounds) using standard backpropagation • Train non-differentiable outputs (where to look next, when to emit a prediction) using reinforcement learning (REINFORCE algorithm) • Achieves detection performance on par with dense sliding window-based approaches, while observing only 2% of frames
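The snippet below sketches what a REINFORCE update could look like for the non-differentiable glimpse-location output, assuming a Gaussian policy over normalized frame locations and a scalar per-episode reward; both are simplified stand-ins for the paper's formulation, and the detection outputs would be trained with ordinary supervised losses alongside this term.

```python
# Sketch of a REINFORCE update for the non-differentiable "where to look next"
# decision. The Gaussian location policy and scalar reward are simplified
# stand-ins, not the paper's exact reward design.
import torch

def reinforce_loss(location_means, sampled_locations, rewards, sigma=0.1, baseline=0.0):
    # log-probability of the sampled glimpse locations under a Gaussian policy
    dist = torch.distributions.Normal(location_means, sigma)
    log_probs = dist.log_prob(sampled_locations).sum(dim=1)
    # REINFORCE: maximize E[(reward - baseline) * log pi], so minimize the negative
    advantage = rewards - baseline
    return -(advantage * log_probs).mean()

# Example: a batch of 4 episodes, each with 6 glimpses.
means = torch.rand(4, 6, requires_grad=True)
samples = means.detach() + 0.1 * torch.randn(4, 6)       # locations actually glimpsed
rewards = torch.tensor([1.0, 0.0, 1.0, 0.5])             # e.g. detection quality per episode
loss = reinforce_loss(means, samples, rewards, baseline=rewards.mean().item())
loss.backward()   # combined with standard backprop losses for the detection outputs
```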
  • 29. Learned policy in action Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016.
  • 31. The challenge of scale Training labels Inference Video processing is computationally expensive Video annotation is labor-intensive Models Temporal dimension adds complexity Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 32. Dense action labeling Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 33. MultiTHUMOS • Extends the THUMOS’14 action detection dataset with dense, multilevel, frame-level action annotations for 30 hours across 400 videos • THUMOS → MultiTHUMOS: annotations 6,365 → 38,690; classes 20 → 65; density (labels/frame) 0.3 → 1.5; classes per video 1.1 → 10.5; max actions per frame 2 → 9; max actions per video 3 → 25 • Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 34. Modeling dense, multilabel actions • Need to reason about multiple potential actions simultaneously • High degree of temporal dependency • In standard recurrent models for action recognition, all state is in the hidden layer representation • At each time step, the model predicts the current frame's labels from the current frame and the previous hidden representation Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
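Because several actions can be active in the same frame, the per-frame objective is naturally a multi-label one. A minimal sketch, assuming the 65 MultiTHUMOS classes and a per-class binary cross-entropy (the exact loss weighting used in the paper may differ):

```python
# Sketch of the per-frame multi-label objective implied by dense labeling:
# each frame can have several of the 65 MultiTHUMOS classes active, so a
# per-class binary cross-entropy (rather than a single softmax) is natural.
import torch
import torch.nn.functional as F

num_classes = 65
logits = torch.randn(8, num_classes)        # predictions for 8 frames
targets = torch.zeros(8, num_classes)
targets[0, [3, 17]] = 1.0                   # frame 0: two actions co-occur
loss = F.binary_cross_entropy_with_logits(logits, targets)
```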
  • 35. MultiLSTM • Extension of LSTM that expands the temporal receptive field of input and output connections • Key idea: providing the model with more freedom in both reading input and writing output reduces the burden placed on the hidden layer representation Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 36. MultiLSTM Input video frames Frame class predictions t Standard LSTM …… Donahue 2014 Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 37. MultiLSTM Frame class predictions t …… Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017. Standard LSTM: Single input, single output Input video frames
  • 38. MultiLSTM Frame class predictions t …… Frame class predictions t MultiLSTM: Multiple inputs, multiple outputs …… Standard LSTM: Single input, single output Input video frames Input video frames Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 39. MultiLSTM Multiple Inputs (soft attention) Multiple Outputs (weighted average) Multilabel Loss Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
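The sketch below illustrates the MultiLSTM idea under simplifying assumptions: the cell reads a soft-attention-weighted combination of a small window of recent frame features, and each frame's final prediction averages the predictions made over a window of steps. The window size, dimensions, and the uniform output averaging are illustrative, not the paper's exact design.

```python
# Sketch of the MultiLSTM idea: soft attention over a window of input frames,
# and per-frame outputs formed by averaging predictions from several steps.
# Names, dimensions, and the uniform output averaging are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiLSTMSketch(nn.Module):
    def __init__(self, feat_dim=512, hidden_dim=256, num_classes=65, window=5):
        super().__init__()
        self.window = window
        self.attn = nn.Linear(feat_dim + hidden_dim, 1)   # soft attention over the input window
        self.cell = nn.LSTMCell(feat_dim, hidden_dim)
        self.classify = nn.Linear(hidden_dim, num_classes)

    def forward(self, frames):                            # frames: (T, feat_dim)
        T = frames.size(0)
        h = torch.zeros(1, self.cell.hidden_size)
        c = torch.zeros(1, self.cell.hidden_size)
        per_step_logits = []
        for t in range(T):
            lo = max(0, t - self.window + 1)
            window_feats = frames[lo:t + 1]               # recent frames (multiple inputs)
            # attention weights conditioned on the current hidden state
            scores = self.attn(torch.cat(
                [window_feats, h.expand(window_feats.size(0), -1)], dim=1))
            weights = F.softmax(scores, dim=0)
            pooled = (weights * window_feats).sum(dim=0, keepdim=True)
            h, c = self.cell(pooled, (h, c))
            per_step_logits.append(self.classify(h))
        logits = torch.cat(per_step_logits, dim=0)        # (T, num_classes)
        # multiple outputs: each frame's score averages predictions from the
        # following `window` steps (a simple uniform version of the weighted average)
        outputs = torch.stack(
            [logits[t:min(T, t + self.window)].mean(dim=0) for t in range(T)])
        return outputs                                     # one multilabel score vector per frame

preds = MultiLSTMSketch()(torch.randn(20, 512))            # 20 frames -> 20 multilabel predictions
```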
  • 40. MultiLSTM Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 41. Retrieving sequential actions Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 42. Retrieving co-occurring actions Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017.
  • 43. The challenge of scale Training labels Inference Video processing is computationally expensive Video annotation is labor-intensive Models Temporal dimension adds complexity Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017. Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
  • 44. Labeling videos is expensive • Takes significantly longer to label a video than an image • If spatial or temporal bounds desired, even worse • How can we practically learn about new concepts in video?
  • 45. Web queries are a source of noisy video labels
  • 46. Image search is much cleaner!
  • 47. Can we effectively learn from the noisy web queries? • Our approach: learn how to select positive training examples from noisy queries in order to train classifiers for new classes • Use a reinforcement learning-based formulation to learn a data labeling policy that achieves strong performance on a small, manually-labeled dataset of classes • Then use this policy to automatically label noisy web data for new classes Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
  • 48. Balancing diversity vs. semantic drift • Want diverse training examples to improve classifier • But too much diversity can also lead to semantic drift • Our approach: balance diversity and drift by training labeling policies using an annotated reward set which the policy must successfully classify Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
  • 49. Overview of approach Boomerang … Boomerang on a beach Boomerang music video Classifier Candidate web queries (YouTube autocomplete) Agent Label new positives + Boomerang on a beach Current positive set Update classifier Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017. Update state
  • 50. Overview of approach Boomerang … Boomerang on a beach Boomerang music video Classifier Candidate web queries (YouTube autocomplete) Agent Label new positives + Boomerang on a beach Current positive set Update classifier Fixed negative set Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017. Update state
  • 51. Overview of approach Boomerang … Boomerang on a beach Boomerang music video Classifier Candidate web queries (YouTube autocomplete) Agent Label new positives + Boomerang on a beach Current positive set Update classifier Update state Fixed negative set Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017. Training reward Eval on reward set
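A toy sketch of the outer loop described in the overview above: an agent repeatedly picks a candidate web query, adds its videos to the positive set, retrains the classifier against a fixed negative set, and receives a reward from the small annotated reward set, which is what keeps diversity from turning into semantic drift. Every function body here is a hypothetical stand-in; in the paper the agent is a learned policy and the classifier is a video model.

```python
# Toy sketch of selecting noisy web queries to label as positives.
# fetch_videos / train_classifier / eval_on_reward_set are hypothetical
# stand-ins, and the random query choice stands in for the learned agent.
import random

def fetch_videos(query):
    # stand-in for retrieving YouTube results for a query
    return [f"{query}_video_{i}" for i in range(5)]

def train_classifier(positives, negatives):
    # stand-in: a real system would retrain a video classifier here
    return {"num_pos": len(positives), "num_neg": len(negatives)}

def eval_on_reward_set(classifier):
    # stand-in for accuracy on the small annotated reward set
    return random.random()

def select_queries(candidate_queries, negatives, num_rounds=3):
    positives, classifier = [], None
    for _ in range(num_rounds):
        # Agent step: choose the next query to label as positive, given the
        # current classifier and positive set (a learned policy in the paper).
        query = random.choice(candidate_queries)
        positives.extend(fetch_videos(query))              # label its videos as new positives
        classifier = train_classifier(positives, negatives)
        reward = eval_on_reward_set(classifier)             # reward keeps drift in check
        # a REINFORCE-style update of the agent would use `reward` here
    return classifier, positives

clf, pos = select_queries(
    ["boomerang", "boomerang on a beach", "boomerang trick shots"],
    negatives=["random_video_1", "random_video_2"])
```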
  • 52. Sports1M Greedy classifier Ours Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
  • 54. Novel classes Greedy classifier Ours Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
  • 55. The challenge of scale Training labels Inference Video processing is computationally expensive Video annotation is labor-intensive Models Temporal dimension adds complexity Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017. Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017.
  • 56. The challenge of scale Training labels Inference Video processing is computationally expensive Video annotation is labor-intensive Models Temporal dimension adds complexity Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017. Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017. Learning to learn
  • 57. The challenge of scale Training labels Inference Video processing is computationally expensive Video annotation is labor-intensive Models Temporal dimension adds complexity Yeung, Russakovsky, Mori, Fei-Fei. End-to-end Learning of Action Detection from Frame Glimpses in Videos. CVPR 2016. Yeung, Russakovsky, Jin, Andriluka, Mori, Fei-Fei. Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos. IJCV 2017. Yeung, Ramanathan, Russakovsky, Shen, Mori, Fei-Fei. Learning to learn from noisy web videos. CVPR 2017. Learning to learn Unsupervised learning
  • 58. Towards Knowledge Training labels Inference Video processing is computationally expensive Video annotation is labor-intensive Models Temporal dimension adds complexity Videos Knowledge of the dynamic visual world
  • 59. Collaborators Olga Russakovsky Mykhaylo Andriluka Ning Jin Vignesh Ramanathan Liyue Shen Greg Mori Fei-Fei Li