Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
- Authors: Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard L. Lewis, Xiaoshi Wang
- Origin: https://papers.nips.cc/paper/5421-deep-learning-for-real-time-atari-game-play-using-offline-monte-carlo-tree-search-planning
- Related: https://github.com/number9473/nn-algorithm/issues/251
1. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard L. Lewis, Xiaoshi Wang. NIPS 2014.
Yu Kai Huang
2. Outline
● Main idea
● Monte-Carlo Tree Search
○ Selection
○ Expansion
○ Simulation
○ Backpropagation
● Experiment
○ Three methods
○ Visualization
4. Main Idea
“We achieve this by introducing new methods for combining RL and DL that use
slow, off-line Monte Carlo tree search planning methods to generate training
data for a deep-learned classifier capable of state-of-the-art real-time play.”
9. MCTS
● The true value of any action can be approximated by running several random
simulations.
● These values can be efficiently used to adjust the policy (strategy) towards a
best-first strategy.
Image from https://www.zhihu.com/question/39916945
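The claim above, that random simulations can approximate an action's true value, can be sketched in a few lines of Python. The toy step function and two-action setup are assumptions for illustration, not the paper's environment:

```python
import random

def rollout_return(step_fn, state, actions, depth, rng):
    """Total reward of one random simulation (playout) from `state`."""
    total = 0.0
    for _ in range(depth):
        state, reward = step_fn(state, rng.choice(actions))
        total += reward
    return total

def monte_carlo_value(step_fn, state, actions, n_sims=1000, depth=10, seed=0):
    """Approximate the value of `state` as the mean return of many random rollouts."""
    rng = random.Random(seed)
    returns = [rollout_return(step_fn, state, actions, depth, rng)
               for _ in range(n_sims)]
    return sum(returns) / len(returns)

# Toy dynamics (an assumption): the reward of a step equals the chosen
# action, 0 or 1, so random play earns about 0.5 per step on average.
toy_step = lambda s, a: (s, float(a))
v = monte_carlo_value(toy_step, state=None, actions=[0, 1])
# v approaches 5.0 over the 10-step horizon as n_sims grows.
```

With enough simulations the estimate concentrates around the true expected return, which is what lets MCTS steer toward a best-first strategy.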
10. MCTS
● Iteratively builds a partial search tree
● Each iteration:
○ Select the most urgent node
■ Tree policy
■ Exploration/exploitation trade-off
○ Simulation
■ Add a child node
■ Run the default policy
○ Update node statistics (visit counts and values)
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
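The iteration just listed (tree policy, expansion, simulation, update) can be sketched end-to-end. Everything here is a toy assumption for illustration: a one-move game with two actions, where action 1 always pays 1 and action 0 pays 0, so the full selection/expansion/simulation/update loop fits in one function:

```python
import math
import random

class Node:
    """One node of the partial search tree."""
    def __init__(self, parent=None, action=None, untried=()):
        self.parent, self.action = parent, action
        self.children = []
        self.untried = list(untried)   # actions with no child node yet
        self.visits, self.value = 0, 0.0

def ucb1(node, c=math.sqrt(2)):
    """Tree-policy score: exploitation term plus exploration term."""
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(n_iters, reward_fn, rng):
    root = Node(untried=[0, 1])
    for _ in range(n_iters):
        # 1. Selection: descend via the tree policy to the most urgent node.
        node = root
        while not node.untried and node.children:
            node = max(node.children, key=ucb1)
        # 2. Expansion: add a child for one untried action.
        if node.untried:
            action = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(parent=node, action=action)  # terminal in this toy game
            node.children.append(child)
            node = child
        # 3. Simulation: the toy game ends after one move, so the outcome is
        #    just that move's reward (no random playout is needed here).
        outcome = reward_fn(node.action)
        # 4. Update: back the outcome up along the saved path.
        while node is not None:
            node.visits += 1
            node.value += outcome
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda ch: ch.visits).action

best = mcts(200, reward_fn=lambda a: float(a), rng=random.Random(0))
```

After 200 iterations the visit counts concentrate on the rewarding action, illustrating how repeated iterations bias the tree toward a best-first strategy.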
11. MCTS - UCT
● Upper Confidence Bounds applied to Trees
Image from https://www.researchgate.net/publication/220978338_Monte-Carlo_Tree_Search_A_New_Framework_for_Game_AI
12. MCTS - UCT
Selection
● Start at the root node
● Select a child based on the Tree Policy: UCB
○ UCB trades off exploitation (a node's average value) against exploration (a visit-count bonus)
● Apply recursively, descending through the tree
○ Stop when an expandable node is reached
○ Expandable
■ A non-terminal node with unexplored children
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
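The UCB score used during selection can be written directly. This uses the standard UCB1 form (mean value plus an exploration bonus); the child statistics below are hypothetical numbers chosen to show exploration winning:

```python
import math

def ucb1(child_value, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: exploitation (mean value) + exploration (visit bonus)."""
    if child_visits == 0:
        return float("inf")  # unvisited children are tried first
    exploitation = child_value / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration

def select_child(children, parent_visits):
    """Pick the child maximizing UCB1 -- the 'most urgent' node."""
    return max(children,
               key=lambda ch: ucb1(ch["value"], ch["visits"], parent_visits))

children = [
    {"name": "a", "value": 9.0, "visits": 10},  # mean 0.9, well explored
    {"name": "b", "value": 2.0, "visits": 2},   # mean 1.0, barely explored
]
best = select_child(children, parent_visits=12)
```

Here the barely explored child wins because its exploration bonus outweighs the difference in mean value, which is exactly the exploration/exploitation balance the slide describes.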
14. MCTS - UCT
Expansion
● Add one or more child nodes to the tree
○ Depends on which actions are available at the current position
○ How this is done depends on the Tree Policy
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
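Expansion can be sketched as adding a single child for a not-yet-tried action. The `Node` class and two-action step function are illustrative assumptions:

```python
import random

class Node:
    def __init__(self, state, untried_actions, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}                    # action -> child Node
        self.untried = list(untried_actions)  # actions with no child yet
        self.visits = 0
        self.value = 0.0

    def expandable(self):
        """Non-terminal node with at least one unexplored child."""
        return bool(self.untried)

def expand(node, step_fn, rng):
    """Add one child node for an untried action available at this position."""
    action = node.untried.pop(rng.randrange(len(node.untried)))
    child = Node(step_fn(node.state, action), untried_actions=[0, 1], parent=node)
    node.children[action] = child
    return child

rng = random.Random(0)
root = Node(state=0, untried_actions=[0, 1])
child = expand(root, step_fn=lambda s, a: s + a, rng=rng)
```

After one expansion the root still has one untried action left, so it remains expandable.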
15. MCTS - UCT
Simulation
● Run a simulation (playout) from the newly expanded node
● The Default Policy determines how the simulation is run
● The outcome of the simulation determines the value estimate
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
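A single playout under the simplest default policy, uniform-random action choice, might look like this; the step function and horizon are assumptions for illustration:

```python
import random

def default_policy(legal_actions, rng):
    """The simplest default policy: choose an action uniformly at random."""
    return rng.choice(legal_actions)

def simulate(state, step_fn, legal_actions, horizon, rng):
    """Run one playout from `state` under the default policy; return total reward."""
    total = 0.0
    for _ in range(horizon):
        state, reward = step_fn(state, default_policy(legal_actions, rng))
        total += reward
    return total

# Toy dynamics (an assumption): each step's reward equals the chosen action.
toy_step = lambda s, a: (s, float(a))
ret = simulate(None, toy_step, legal_actions=[0, 1],
               horizon=20, rng=random.Random(1))
```

The returned outcome is what backpropagation then credits to every node on the selected path.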
16. MCTS - UCT
Backpropagation
● Move backward through the saved path
● Value of a node
○ represents the benefit of taking that path from its parent
● Values are updated based on how the simulated game ends
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
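The backward pass is a short walk up the parent pointers. The three-node path and the win/loss outcomes below are hypothetical:

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.value = 0.0  # sum of simulation outcomes backed up through this node

def backpropagate(node, outcome):
    """Move backward through the saved path, crediting the simulation outcome."""
    while node is not None:
        node.visits += 1
        node.value += outcome
        node = node.parent

# Hypothetical three-node path root -> mid -> leaf, after two simulations.
root = Node()
mid = Node(parent=root)
leaf = Node(parent=mid)
backpropagate(leaf, outcome=1.0)  # first simulated game was a win
backpropagate(leaf, outcome=0.0)  # second was a loss
# Every node on the path now has visits == 2; the mean value at root is 0.5.
```

Each node's value-to-visits ratio is then what the UCB tree policy exploits on the next descent.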
19. Three Methods
● UCTtoRegression
○ The UCT training data is used to train the CNN via regression.
● UCTtoClassification
○ The UCT training data is used to train the CNN via classification.
● UCTtoClassification-Interleaved
○ The UCT training data is used to train the CNN via classification.
○ The trained CNN then chooses the actions used to collect further UCT runs.
○ The CNN is fine-tuned on the new data, and the cycle repeats.
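The difference between the regression and classification variants comes down to the training target built from UCT's output. A minimal sketch, with hypothetical action-value numbers (in the interleaved variant, the same classification targets are simply recomputed on states visited while the current CNN chooses actions):

```python
# Hypothetical UCT output at one game state: an estimated value per action.
uct_action_values = [0.1, 0.7, 0.2]

def regression_target(action_values):
    """UCTtoRegression: the CNN regresses onto the UCT action values directly."""
    return list(action_values)

def classification_target(action_values):
    """UCTtoClassification: the CNN learns to pick UCT's best action (one-hot)."""
    best = max(range(len(action_values)), key=lambda a: action_values[a])
    return [1.0 if a == best else 0.0 for a in range(len(action_values))]

reg = regression_target(uct_action_values)
cls = classification_target(uct_action_values)
```

The classification target only asks the network to reproduce UCT's decision, not its exact values, which the paper reports works better for real-time play.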
23. Reference
[1] Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, https://papers.nips.cc/paper/5421-deep-learning-for-real-time-atari-game-play-using-offline-monte-carlo-tree-search-planning
[2] Monte Carlo Tree Search and AlphaGo, Suraj Nair, Peter Kundzicz, Kevin An, Vansh Kumar,
http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
[3] tobe: How to learn Monte Carlo Tree Search (MCTS), https://zhuanlan.zhihu.com/p/30458774