Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard L. Lewis, Xiaoshi Wang. NIPS 2014.
Yu Kai Huang
Outline
● Main idea
● Monte-Carlo Tree Search
○ Selection
○ Expansion
○ Simulation
○ Backpropagation
● Experiment
○ Three methods
○ Visualization
Main Idea
“We achieve this by introducing new methods for combining RL and DL that use
slow, off-line Monte Carlo tree search planning methods to generate training
data for a deep-learned classifier capable of state-of-the-art real-time play.”
Deep Q-Network (DQN)
Image from https://arxiv.org/pdf/1312.5602.pdf
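The DQN paper linked above trains its network toward a one-step Q-learning target, r + γ·max_a' Q(s', a'), with the target reduced to r at terminal steps. A minimal sketch of computing those targets for a minibatch, assuming transitions stored as (state, action, reward, next_state, done) tuples and a Q-function given as a plain callable; `q_learning_targets` and `q_table` are hypothetical names, not the paper's code.

```python
from typing import Callable, List, Sequence, Tuple

# (state, action, reward, next_state, done); integer states keep the toy simple.
Transition = Tuple[int, int, float, int, bool]

def q_learning_targets(batch: Sequence[Transition],
                       q: Callable[[int], List[float]],
                       gamma: float = 0.99) -> List[float]:
    """Return y = r for terminal steps, else y = r + gamma * max_a' Q(s', a')."""
    targets = []
    for _state, _action, reward, next_state, done in batch:
        if done:
            targets.append(reward)
        else:
            targets.append(reward + gamma * max(q(next_state)))
    return targets

# Hypothetical Q-table with two actions per state.
q_table = {0: [0.0, 1.0], 1: [2.0, 0.5], 2: [0.0, 0.0]}
batch = [(0, 1, 1.0, 1, False), (1, 0, 0.0, 2, True)]
print(q_learning_targets(batch, q_table.__getitem__, gamma=0.5))  # [2.0, 0.0]
```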
Sampling training data
● Experience Replay
● ϵ-greedy action selection
○ Exploration & Exploitation
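The two sampling ingredients above can be sketched in a few lines: a fixed-capacity replay memory sampled uniformly, and ϵ-greedy selection that explores with probability ϵ and otherwise exploits the highest Q-value. This is a minimal illustration under assumed names (`ReplayBuffer`, `epsilon_greedy`), not DQN's actual implementation.

```python
import random
from collections import deque
from typing import Deque, List, Sequence, Tuple

Transition = Tuple[int, int, float, int, bool]  # (s, a, r, s', done)

class ReplayBuffer:
    """Fixed-size FIFO memory of transitions; minibatches are sampled uniformly."""
    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.memory: Deque[Transition] = deque(maxlen=capacity)  # oldest evicted first
        self.rng = random.Random(seed)

    def push(self, transition: Transition) -> None:
        self.memory.append(transition)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling reduces the correlation between consecutive frames.
        return self.rng.sample(list(self.memory), batch_size)

    def __len__(self) -> int:
        return len(self.memory)

def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a random action (explore), else the argmax (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

rng = random.Random(0)
buf = ReplayBuffer(capacity=3)
for t in range(5):
    buf.push((t, 0, 0.0, t + 1, False))
print(len(buf))  # 3 (the two oldest transitions were evicted)
print(epsilon_greedy([0.1, 0.9, 0.3], epsilon=0.0, rng=rng))  # 1
```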
Sampling training data
● Off-line Monte Carlo tree search planning method
○ UCT-agent
Monte-Carlo Tree Search
MCTS
● The value of an action can be approximated by averaging the outcomes of many random simulations.
● These estimates can then be used to shift the policy (strategy) toward a best-first strategy.
Image from https://www.zhihu.com/question/39916945
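As a concrete instance of estimating an action's value by random simulation, here is a sketch on a hypothetical toy problem (not Atari): the agent steps left or right on a line for `HORIZON` steps and receives final position / `HORIZON`. Each action is scored by the mean return of random rollouts that begin with it, and the greedy choice picks the best estimate.

```python
import random

ACTIONS = (-1, +1)  # toy action set: step left or right (hypothetical)
HORIZON = 5         # toy episode length (hypothetical)

def rollout_return(position: int, steps_left: int, rng: random.Random) -> float:
    """Play uniformly random actions for the remaining steps; reward is final position / HORIZON."""
    for _ in range(steps_left):
        position += rng.choice(ACTIONS)
    return position / HORIZON

def mc_action_value(position: int, action: int, steps_left: int,
                    n_sims: int, rng: random.Random) -> float:
    """Estimate Q(position, action) as the mean return of n_sims random simulations."""
    total = 0.0
    for _ in range(n_sims):
        total += rollout_return(position + action, steps_left - 1, rng)
    return total / n_sims

rng = random.Random(0)
estimates = {a: mc_action_value(0, a, HORIZON, n_sims=2000, rng=rng) for a in ACTIONS}
best_action = max(estimates, key=estimates.get)  # greedy ("best-first") choice
print(estimates, best_action)
```

With these dynamics the exact values are +0.2 for stepping right and -0.2 for stepping left, so the estimates should land close to those and the greedy choice should be +1.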
MCTS
● Iteratively builds a partial search tree
● Each iteration
○ Selects the most urgent node
■ Tree policy
■ Balances exploration and exploitation
○ Expansion and simulation
■ Adds a child node
■ Runs the default policy from it
○ Updates node statistics (backpropagation)
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
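The four-phase iteration above can be sketched end to end as a small UCT search. This is a self-contained illustration on a hypothetical line-walk toy (step left or right for `HORIZON` steps, reward = final position / `HORIZON`), not the paper's Atari UCT agent; the UCB1 constant c = √2 and recommending the most-visited root action are common choices assumed here.

```python
import math
import random
from typing import Dict, List, Optional

ACTIONS = (-1, +1)  # toy action set: step left or right (hypothetical)
HORIZON = 5         # toy episode length; reward = final position / HORIZON (hypothetical)

class Node:
    def __init__(self, position: int, depth: int, parent: Optional["Node"] = None) -> None:
        self.position = position
        self.depth = depth
        self.parent = parent
        self.children: Dict[int, "Node"] = {}  # action -> child node
        self.visits = 0
        self.value_sum = 0.0

    def is_terminal(self) -> bool:
        return self.depth == HORIZON

    def untried_actions(self) -> List[int]:
        return [a for a in ACTIONS if a not in self.children]

def ucb1(parent: Node, child: Node, c: float) -> float:
    """Mean reward (exploitation) plus a visit-count bonus (exploration)."""
    return (child.value_sum / child.visits
            + c * math.sqrt(math.log(parent.visits) / child.visits))

def uct_search(root_position: int, n_iterations: int,
               c: float = math.sqrt(2), seed: int = 0) -> int:
    rng = random.Random(seed)
    root = Node(root_position, depth=0)
    for _ in range(n_iterations):
        node = root
        # 1. Selection: follow the tree policy (UCB1) through fully expanded nodes.
        while not node.is_terminal() and not node.untried_actions():
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb1(parent, ch, c))
        # 2. Expansion: add one child for an untried action.
        if not node.is_terminal():
            action = rng.choice(node.untried_actions())
            child = Node(node.position + action, node.depth + 1, parent=node)
            node.children[action] = child
            node = child
        # 3. Simulation: the default policy plays random actions to the horizon.
        position = node.position
        for _ in range(HORIZON - node.depth):
            position += rng.choice(ACTIONS)
        reward = position / HORIZON
        # 4. Backpropagation: update visit counts and value sums back to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    # Recommend the most visited root action (one common choice).
    return max(root.children, key=lambda a: root.children[a].visits)

print(uct_search(0, n_iterations=1000))
```

Every child is backpropagated in the same iteration it is created, so `ucb1` never divides by a zero visit count.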
MCTS - UCT
● Upper Confidence bounds for Trees
Image from https://www.researchgate.net/publication/220978338_Monte-Carlo_Tree_Search_A_New_Framework_for_Game_AI
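One common form of the UCT selection rule applies UCB1 at every tree node. In the sketch below, \(\bar{X}_j\) is the mean reward of child \(j\), \(n\) is the visit count of the parent \(v\), \(n_j\) is the visit count of child \(j\), and \(c\) is an exploration constant (\(c = \sqrt{2}\) recovers UCB1 for rewards in \([0, 1]\)). The constant used in the paper is not given on these slides.

```latex
a^{*} = \operatorname*{arg\,max}_{j \in \mathrm{children}(v)}
        \left( \bar{X}_j + c \sqrt{\frac{\ln n}{n_j}} \right)
```

The first term favors children that have paid off so far (exploitation); the second grows for children visited rarely relative to their parent (exploration).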
MCTS - UCT
Selection
● Start at root node
● Select a child according to the tree policy (UCB)
● Apply recursively - descend through tree
○ Stop when expandable node is reached
○ Expandable
■ Node that is non-terminal and has unexplored children
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
MCTS - UCT
Selection
● In the UCB score, the mean-reward term drives exploitation and the confidence-bonus term drives exploration
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
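The selection step can be sketched as a descent that always takes the child with the highest UCB score and stops at the first expandable (non-terminal, not fully expanded) node. The `Node` fields and the constant c = √2 are assumptions for illustration.

```python
import math
from typing import Dict, Optional

class Node:
    """Minimal tree node (hypothetical structure): visits, summed reward, children by action."""
    def __init__(self, n_actions: int, parent: Optional["Node"] = None,
                 terminal: bool = False) -> None:
        self.parent = parent
        self.children: Dict[int, "Node"] = {}
        self.n_actions = n_actions
        self.terminal = terminal
        self.visits = 0
        self.value_sum = 0.0

    def is_expandable(self) -> bool:
        """Non-terminal and still has at least one unexplored action."""
        return not self.terminal and len(self.children) < self.n_actions

def ucb_score(parent: Node, child: Node, c: float = math.sqrt(2)) -> float:
    exploitation = child.value_sum / child.visits
    exploration = c * math.sqrt(math.log(parent.visits) / child.visits)
    return exploitation + exploration

def select(root: Node) -> Node:
    """Descend via max-UCB children until reaching an expandable or terminal node."""
    node = root
    while not node.is_expandable() and not node.terminal:
        parent = node
        node = max(parent.children.values(), key=lambda ch: ucb_score(parent, ch))
    return node

root = Node(n_actions=2)
root.visits = 10
a = Node(2, parent=root)
a.visits, a.value_sum = 8, 6.0  # mean 0.75, well explored
b = Node(2, parent=root)
b.visits, b.value_sum = 2, 1.0  # mean 0.5, rarely visited
root.children = {0: a, 1: b}
print(select(root) is b)  # True
```

Here the rarely visited child `b` is selected despite its lower mean, because its exploration bonus outweighs the exploitation gap.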
MCTS - UCT
Expansion
● Add one or more child nodes to tree
○ Depends on what actions are available for the current position
○ Method in which this is done depends on Tree Policy
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
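A minimal expansion step adds one child for an action that has not been tried yet from the current node. Choosing that action uniformly at random is one simple option assumed here; `transition` and `legal_actions` are hypothetical callables standing in for the game's rules.

```python
import random
from typing import Callable, Dict, List, Optional, Sequence

class Node:
    """Minimal tree node (hypothetical structure) that knows its state and legal actions."""
    def __init__(self, state: int, legal_actions: Sequence[int],
                 parent: Optional["Node"] = None) -> None:
        self.state = state
        self.parent = parent
        self.legal_actions = list(legal_actions)
        self.children: Dict[int, "Node"] = {}  # action -> child node

    def untried_actions(self) -> List[int]:
        return [a for a in self.legal_actions if a not in self.children]

def expand(node: Node,
           transition: Callable[[int, int], int],
           legal_actions: Callable[[int], Sequence[int]],
           rng: random.Random) -> Node:
    """Add one child for a randomly chosen untried action and return the new child."""
    action = rng.choice(node.untried_actions())
    next_state = transition(node.state, action)
    child = Node(next_state, legal_actions(next_state), parent=node)
    node.children[action] = child
    return child

def step(s: int, a: int) -> int:
    return s + a  # toy dynamics (hypothetical)

def moves(s: int) -> Sequence[int]:
    return (-1, +1)  # toy legal actions (hypothetical)

rng = random.Random(0)
root = Node(0, moves(0))
first = expand(root, step, moves, rng)
second = expand(root, step, moves, rng)
print(sorted(root.children), root.untried_actions())  # [-1, 1] []
```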
MCTS - UCT
Simulation
● Run a simulation (rollout) from the node reached by selection and expansion
● The default policy determines how the simulation is played
● The outcome of the simulation gives the value estimate
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
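A uniformly random default policy is the simplest rollout: play random legal actions until a step budget runs out or no action is legal, then score the final state. For simplicity this sketch scores only the final state; in a game with per-step rewards (as in Atari) one would instead sum the rewards collected along the rollout. All callables are hypothetical stand-ins.

```python
import random
from typing import Callable, Sequence

def simulate(state: int, steps_left: int,
             legal_actions: Callable[[int], Sequence[int]],
             transition: Callable[[int, int], int],
             reward: Callable[[int], float],
             rng: random.Random) -> float:
    """Default policy: play uniformly random legal actions until the step budget
    runs out or no action is legal, then score the final state."""
    for _ in range(steps_left):
        actions = legal_actions(state)
        if not actions:  # terminal state reached early
            break
        state = transition(state, rng.choice(actions))
    return reward(state)

def moves(s: int) -> Sequence[int]:
    return (-1, +1)  # toy legal actions (hypothetical)

def step(s: int, a: int) -> int:
    return s + a  # toy dynamics (hypothetical)

def score(s: int) -> float:
    return s / 5  # toy reward: final position / horizon (hypothetical)

print(simulate(0, 5, moves, step, score, random.Random(1)))
```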
MCTS - UCT
Backpropagation
● Move backward along the saved path to the root
● Value of a node
○ represents the benefit of going down that path from its parent
● Each node on the path is updated with the outcome of the simulated game
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
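Backpropagation can be sketched as a walk from the simulated node to the root that increments each node's visit count and adds the simulation's reward to its running sum, so that a node's value is the mean outcome of simulations that passed through it. In two-player games the reward is typically negated at alternate levels; Atari play is single-agent, so this sketch does not. Names are hypothetical.

```python
from typing import Optional

class Node:
    """Minimal tree node (hypothetical structure) holding the statistics UCT needs."""
    def __init__(self, parent: Optional["Node"] = None) -> None:
        self.parent = parent
        self.visits = 0
        self.value_sum = 0.0

    def mean_value(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0

def backpropagate(leaf: Node, reward: float) -> None:
    """Walk from the simulated node back to the root, updating each node on the path."""
    node: Optional[Node] = leaf
    while node is not None:
        node.visits += 1
        node.value_sum += reward
        node = node.parent

root = Node()
child = Node(parent=root)
leaf = Node(parent=child)
backpropagate(leaf, 1.0)   # a simulation through leaf ended with reward 1.0
backpropagate(child, 0.0)  # a later simulation from child ended with reward 0.0
print(root.visits, root.mean_value(), leaf.mean_value())  # 2 0.5 1.0
```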
MCTS - UCT
Image from https://zhuanlan.zhihu.com/p/30458774
Experiment
Three Methods
● UCTtoRegression
○ The CNN is trained via regression on the UCT training data (the action values UCT computes at each state).
● UCTtoClassification
○ The CNN is trained via classification on the UCT training data (the action chosen by UCT is the label).
● UCTtoClassification-Interleaved
○ The CNN is first trained via classification on the UCT training data.
○ The trained CNN then decides the action choices when collecting further runs.
○ Finally, the CNN is fine-tuned.
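The difference between the regression and classification variants lies in what the CNN is fit to for each state. A minimal sketch, assuming UCT yields a vector of root action values: regression fits all per-action outputs to those values (mean squared error), while classification uses the single action UCT picks as a label (softmax cross-entropy). Taking the highest-value action as UCT's choice is an assumption here; UCT implementations may instead recommend the most-visited action. All names and numbers are hypothetical.

```python
import math
from typing import List, Sequence

def regression_target(uct_q_values: Sequence[float]) -> List[float]:
    """Regression-style target: every per-action output is fit to UCT's value."""
    return list(uct_q_values)

def classification_label(uct_q_values: Sequence[float]) -> int:
    """Classification-style label: the action UCT would choose (here, the max value)."""
    return max(range(len(uct_q_values)), key=lambda a: uct_q_values[a])

def mse_loss(outputs: Sequence[float], targets: Sequence[float]) -> float:
    return sum((o - t) ** 2 for o, t in zip(outputs, targets)) / len(outputs)

def cross_entropy_loss(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy, using the max-shift for numerical stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[label]

uct_q = [0.5, 2.0, 1.0]    # hypothetical UCT root action values for one frame
cnn_out = [0.5, 1.0, 1.0]  # hypothetical CNN outputs for the same frame
print(mse_loss(cnn_out, regression_target(uct_q)))
print(classification_label(uct_q))  # 1
print(cross_entropy_loss(cnn_out, classification_label(uct_q)))
```

The classification loss only cares which action ranks first, whereas the regression loss also penalizes errors in the values of actions UCT would never take.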
CNN Architecture
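The architecture diagram for this slide did not survive extraction. As an assumption, the sketch below traces layer shapes for the DQN layout from the arXiv paper cited earlier (84×84×4 input; 16 filters 8×8 stride 4; 32 filters 4×4 stride 2; 256 fully connected units; one output per action); whether this matches the paper's CNN exactly should be checked against the paper. The helper names and the 18-action count are hypothetical.

```python
from typing import List, Tuple

def conv_out(size: int, kernel: int, stride: int) -> int:
    """Spatial size after a 'valid' (unpadded) convolution."""
    return (size - kernel) // stride + 1

def dqn_style_shapes(n_actions: int, frame: int = 84,
                     history: int = 4) -> List[Tuple[str, Tuple[int, ...]]]:
    shapes: List[Tuple[str, Tuple[int, ...]]] = [("input", (history, frame, frame))]
    s1 = conv_out(frame, kernel=8, stride=4)  # conv1: 16 filters, 8x8, stride 4, ReLU
    shapes.append(("conv1", (16, s1, s1)))
    s2 = conv_out(s1, kernel=4, stride=2)     # conv2: 32 filters, 4x4, stride 2, ReLU
    shapes.append(("conv2", (32, s2, s2)))
    shapes.append(("flatten", (32 * s2 * s2,)))
    shapes.append(("fc", (256,)))             # fully connected, 256 ReLU units
    shapes.append(("output", (n_actions,)))   # one score per action (values or logits)
    return shapes

for name, shape in dqn_style_shapes(n_actions=18):
    print(name, shape)
```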
Experimental Results
Visualization of the first-layer features
Reference
[1] Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard L. Lewis, Xiaoshi Wang, "Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning," NIPS 2014, https://papers.nips.cc/paper/5421-deep-learning-for-real-time-atari-game-play-using-offline-monte-carlo-tree-search-planning
[2] Suraj Nair, Peter Kundzicz, Kevin An, Vansh Kumar, "Monte Carlo Tree Search and AlphaGo," http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
[3] tobe, "How to Learn Monte Carlo Tree Search (MCTS)" (in Chinese), https://zhuanlan.zhihu.com/p/30458774
