PR-455: CoTracker: It is Better to Track Together

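This video covers a paper that can be seen as the point-tracking version of RAFT, which I introduced earlier as PR-278. Object tracking usually refers to the task of tracking a given bounding box, but this paper addresses the task of following points given in the first frame. As the title suggests, the main idea is that tracking several points together, letting them interact by exchanging information, improves tracking performance compared with following a single given point on its own. Paper link: https://arxiv.org/abs/2307.07635 Video link: https://youtu.be/BDfTSm3_hys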

CoTracker: It is Better to Track Together
PR-455
Hyeongmin Lee
Twelve Labs

Released 2023.7.14
Meta AI
arXiv (NeurIPS Format)

Smoothness Constraint [PR-302]

Smoothness Constraint [PR-302]

Optical Flow + Tracking
“Tracking Together”

Notations
Data
Input Output

Features
Image Features
Track Features
⇒ properties of each track point
⇒ (Like RAFT)

Correlation Feature
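The slide gives only the heading, so the following is a minimal sketch of a RAFT-style correlation feature under stated assumptions: the track feature of one point is compared by dot product with image features sampled in a small neighborhood around the point's current position estimate. The function names, the single-scale setup, and the `radius` parameter are illustrative and not taken from the slides.

```python
import numpy as np

def bilinear_sample(fmap, x, y):
    """Sample a (H, W, D) feature map at a real-valued (x, y), clamped to the border."""
    h, w, _ = fmap.shape
    x = float(np.clip(x, 0, w - 1))
    y = float(np.clip(y, 0, h - 1))
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * fmap[y0, x0] + ax * fmap[y0, x1]
    bottom = (1 - ax) * fmap[y1, x0] + ax * fmap[y1, x1]
    return (1 - ay) * top + ay * bottom

def correlation_feature(fmap, track_feat, x, y, radius=1):
    """Dot products between one track feature and image features around (x, y).

    Returns a vector of length (2 * radius + 1) ** 2, ordered row by row.
    """
    offsets = range(-radius, radius + 1)
    corr = [np.dot(track_feat, bilinear_sample(fmap, x + dx, y + dy))
            for dy in offsets for dx in offsets]
    return np.array(corr)
```

A high value at some offset suggests that the point has moved toward that offset, which is the signal an iterative update can use to correct the position estimate.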

Tokens

Initialization
Windowed Inference
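As a rough illustration of windowed inference with carried-over initialization, the sketch below assumes half-overlapping windows, a query point given on the first frame, and a hypothetical `refine(start, init)` callable standing in for the model's per-window update. None of these names come from the slides.

```python
def sliding_windows(num_frames, window_len):
    """Half-overlapping (start, end) windows covering frames [0, num_frames)."""
    if window_len < 2 or window_len % 2:
        raise ValueError("window_len must be an even number >= 2")
    stride = window_len // 2
    windows = []
    start = 0
    while True:
        end = min(start + window_len, num_frames)
        windows.append((start, end))
        if end >= num_frames:
            break
        start += stride
    return windows

def track_point(num_frames, window_len, query_xy, refine):
    """Run a per-window refiner over a video, carrying estimates across windows."""
    track = [None] * num_frames
    for start, end in sliding_windows(num_frames, window_len):
        init = []
        for t in range(start, end):
            if track[t] is not None:
                init.append(track[t])   # overlap: reuse the previous window's estimate
            elif init:
                init.append(init[-1])   # unseen frame: copy the latest known position
            else:
                init.append(query_xy)   # first frame of the video: the query itself
        track[start:end] = refine(start, init)
    return track
```

Because each new window starts from the previous window's output on the overlapping frames, a video longer than one window can be processed without restarting the track from the query position.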

Other Details
Grid-Time Factorization
⇒
Point Sampling
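The factorization named on this slide can be sketched as follows, assuming it means alternating attention along the time axis and along the track axis instead of attending over all track-time token pairs at once. The attention here has no learned projections and the function names are illustrative, so this is a toy version of the idea rather than the paper's layer.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention over the second-to-last axis (no learned weights)."""
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)
    scores = scores - scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ x

def factorized_attention(tokens):
    """tokens: (N tracks, T frames, D). Attend across time, then across tracks."""
    tokens = tokens + self_attention(tokens)      # per track, over the T frames
    across = np.swapaxes(tokens, 0, 1)            # (T, N, D)
    across = across + self_attention(across)      # per frame, over the N tracks
    return np.swapaxes(across, 0, 1)              # back to (N, T, D)
```

With N tracks and T frames, the two attention maps have sizes T x T and N x N rather than one map of size NT x NT, while the track-axis step still lets every track see the others, which is the "tracking together" interaction.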

Experiments

Experiments

Experiments

“Alignment”
Conclusion
An effort to break down the barrier between video understanding and pixel correspondence, though it is still not easy.
