Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
- Authors: Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard L. Lewis, Xiaoshi Wang
- Origin: https://papers.nips.cc/paper/5421-deep-learning-for-real-time-atari-game-play-using-offline-monte-carlo-tree-search-planning
- Related: https://github.com/number9473/nn-algorithm/issues/251
1. Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning
Xiaoxiao Guo, Satinder Singh, Honglak Lee, Richard L. Lewis, Xiaoshi Wang. NIPS 2014.
Yu Kai Huang
2. Outline
● Main idea
● Monte-Carlo Tree Search
○ Selection
○ Expansion
○ Simulation
○ Backpropagation
● Experiment
○ Three methods
○ Visualization
4. Main Idea
“We achieve this by introducing new methods for combining RL and DL that use
slow, off-line Monte Carlo tree search planning methods to generate training
data for a deep-learned classifier capable of state-of-the-art real-time play.”
9. MCTS
● The true value of any action can be approximated by running several random
simulations.
● These values can be efficiently used to adjust the policy (strategy) towards a
best-first strategy.
Image from https://www.zhihu.com/question/39916945
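The claim above, that random simulations can approximate an action's true value, can be sketched in a few lines of Python. The toy step function and two-action setup are assumptions for illustration, not the paper's environment:

```python
import random

def rollout_return(step_fn, state, actions, depth, rng):
    """Total reward of one random simulation (playout) from `state`."""
    total = 0.0
    for _ in range(depth):
        state, reward = step_fn(state, rng.choice(actions))
        total += reward
    return total

def monte_carlo_value(step_fn, state, actions, n_sims=1000, depth=10, seed=0):
    """Approximate the value of `state` as the mean return of many random rollouts."""
    rng = random.Random(seed)
    returns = [rollout_return(step_fn, state, actions, depth, rng)
               for _ in range(n_sims)]
    return sum(returns) / len(returns)

# Toy dynamics (an assumption): the reward of a step equals the chosen
# action, 0 or 1, so random play earns about 0.5 per step on average.
toy_step = lambda s, a: (s, float(a))
v = monte_carlo_value(toy_step, state=None, actions=[0, 1])
# v approaches 5.0 over the 10-step horizon as n_sims grows.
```

With enough simulations the estimate concentrates around the true expected return, which is what lets MCTS steer toward a best-first strategy.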
10. MCTS
● Iteratively builds a partial search tree
● Each iteration:
○ Select the most urgent node
■ Tree policy
■ Exploration/exploitation trade-off
○ Simulation
■ Add a child node
■ Run the default policy
○ Update node statistics (visit counts and values)
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
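The iteration just listed (tree policy, expansion, simulation, update) can be sketched end-to-end. Everything here is a toy assumption for illustration: a one-move game with two actions, where action 1 always pays 1 and action 0 pays 0, so the full selection/expansion/simulation/update loop fits in one function:

```python
import math
import random

class Node:
    """One node of the partial search tree."""
    def __init__(self, parent=None, action=None, untried=()):
        self.parent, self.action = parent, action
        self.children = []
        self.untried = list(untried)   # actions with no child node yet
        self.visits, self.value = 0, 0.0

def ucb1(node, c=math.sqrt(2)):
    """Tree-policy score: exploitation term plus exploration term."""
    return (node.value / node.visits
            + c * math.sqrt(math.log(node.parent.visits) / node.visits))

def mcts(n_iters, reward_fn, rng):
    root = Node(untried=[0, 1])
    for _ in range(n_iters):
        # 1. Selection: descend via the tree policy to the most urgent node.
        node = root
        while not node.untried and node.children:
            node = max(node.children, key=ucb1)
        # 2. Expansion: add a child for one untried action.
        if node.untried:
            action = node.untried.pop(rng.randrange(len(node.untried)))
            child = Node(parent=node, action=action)  # terminal in this toy game
            node.children.append(child)
            node = child
        # 3. Simulation: the toy game ends after one move, so the outcome is
        #    just that move's reward (no random playout is needed here).
        outcome = reward_fn(node.action)
        # 4. Update: back the outcome up along the saved path.
        while node is not None:
            node.visits += 1
            node.value += outcome
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda ch: ch.visits).action

best = mcts(200, reward_fn=lambda a: float(a), rng=random.Random(0))
```

After 200 iterations the visit counts concentrate on the rewarding action, illustrating how repeated iterations bias the tree toward a best-first strategy.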
11. MCTS - UCT
● Upper Confidence Bounds applied to Trees
Image from https://www.researchgate.net/publication/220978338_Monte-Carlo_Tree_Search_A_New_Framework_for_Game_AI
12. MCTS - UCT
Selection
● Start at the root node
● Select a child based on the Tree Policy: UCB
○ UCB trades off exploitation (a node's average value) against exploration (a visit-count bonus)
● Apply recursively, descending through the tree
○ Stop when an expandable node is reached
○ Expandable
■ A non-terminal node with unexplored children
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
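The UCB score used during selection can be written directly. This uses the standard UCB1 form (mean value plus an exploration bonus); the child statistics below are hypothetical numbers chosen to show exploration winning:

```python
import math

def ucb1(child_value, child_visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: exploitation (mean value) + exploration (visit bonus)."""
    if child_visits == 0:
        return float("inf")  # unvisited children are tried first
    exploitation = child_value / child_visits
    exploration = c * math.sqrt(math.log(parent_visits) / child_visits)
    return exploitation + exploration

def select_child(children, parent_visits):
    """Pick the child maximizing UCB1 -- the 'most urgent' node."""
    return max(children,
               key=lambda ch: ucb1(ch["value"], ch["visits"], parent_visits))

children = [
    {"name": "a", "value": 9.0, "visits": 10},  # mean 0.9, well explored
    {"name": "b", "value": 2.0, "visits": 2},   # mean 1.0, barely explored
]
best = select_child(children, parent_visits=12)
```

Here the barely explored child wins because its exploration bonus outweighs the difference in mean value, which is exactly the exploration/exploitation balance the slide describes.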
14. MCTS - UCT
Expansion
● Add one or more child nodes to the tree
○ Depends on which actions are available at the current position
○ How this is done depends on the Tree Policy
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
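Expansion can be sketched as adding a single child for a not-yet-tried action. The `Node` class and two-action step function are illustrative assumptions:

```python
import random

class Node:
    def __init__(self, state, untried_actions, parent=None):
        self.state = state
        self.parent = parent
        self.children = {}                    # action -> child Node
        self.untried = list(untried_actions)  # actions with no child yet
        self.visits = 0
        self.value = 0.0

    def expandable(self):
        """Non-terminal node with at least one unexplored child."""
        return bool(self.untried)

def expand(node, step_fn, rng):
    """Add one child node for an untried action available at this position."""
    action = node.untried.pop(rng.randrange(len(node.untried)))
    child = Node(step_fn(node.state, action), untried_actions=[0, 1], parent=node)
    node.children[action] = child
    return child

rng = random.Random(0)
root = Node(state=0, untried_actions=[0, 1])
child = expand(root, step_fn=lambda s, a: s + a, rng=rng)
```

After one expansion the root still has one untried action left, so it remains expandable.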
15. MCTS - UCT
Simulation
● Run a simulation (playout) from the newly expanded node
● The Default Policy determines how the simulation is run
● The outcome of the simulation determines the value estimate
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
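A single playout under the simplest default policy, uniform-random action choice, might look like this; the step function and horizon are assumptions for illustration:

```python
import random

def default_policy(legal_actions, rng):
    """The simplest default policy: choose an action uniformly at random."""
    return rng.choice(legal_actions)

def simulate(state, step_fn, legal_actions, horizon, rng):
    """Run one playout from `state` under the default policy; return total reward."""
    total = 0.0
    for _ in range(horizon):
        state, reward = step_fn(state, default_policy(legal_actions, rng))
        total += reward
    return total

# Toy dynamics (an assumption): each step's reward equals the chosen action.
toy_step = lambda s, a: (s, float(a))
ret = simulate(None, toy_step, legal_actions=[0, 1],
               horizon=20, rng=random.Random(1))
```

The returned outcome is what backpropagation then credits to every node on the selected path.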
16. MCTS - UCT
Backpropagation
● Move backward through the saved path
● Value of a node
○ represents the benefit of taking that path from its parent
● Values are updated based on how the simulated game ends
Image from http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
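The backward pass is a short walk up the parent pointers. The three-node path and the win/loss outcomes below are hypothetical:

```python
class Node:
    def __init__(self, parent=None):
        self.parent = parent
        self.visits = 0
        self.value = 0.0  # sum of simulation outcomes backed up through this node

def backpropagate(node, outcome):
    """Move backward through the saved path, crediting the simulation outcome."""
    while node is not None:
        node.visits += 1
        node.value += outcome
        node = node.parent

# Hypothetical three-node path root -> mid -> leaf, after two simulations.
root = Node()
mid = Node(parent=root)
leaf = Node(parent=mid)
backpropagate(leaf, outcome=1.0)  # first simulated game was a win
backpropagate(leaf, outcome=0.0)  # second was a loss
# Every node on the path now has visits == 2; the mean value at root is 0.5.
```

Each node's value-to-visits ratio is then what the UCB tree policy exploits on the next descent.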
19. Three Methods
● UCTtoRegression
○ The UCT training data is used to train the CNN via regression.
● UCTtoClassification
○ The UCT training data is used to train the CNN via classification.
● UCTtoClassification-Interleaved
○ The UCT training data is used to train the CNN via classification.
○ The trained CNN then chooses the actions used to collect further UCT runs.
○ The CNN is fine-tuned on the new data, and the cycle repeats.
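The difference between the regression and classification variants comes down to the training target built from UCT's output. A minimal sketch, with hypothetical action-value numbers (in the interleaved variant, the same classification targets are simply recomputed on states visited while the current CNN chooses actions):

```python
# Hypothetical UCT output at one game state: an estimated value per action.
uct_action_values = [0.1, 0.7, 0.2]

def regression_target(action_values):
    """UCTtoRegression: the CNN regresses onto the UCT action values directly."""
    return list(action_values)

def classification_target(action_values):
    """UCTtoClassification: the CNN learns to pick UCT's best action (one-hot)."""
    best = max(range(len(action_values)), key=lambda a: action_values[a])
    return [1.0 if a == best else 0.0 for a in range(len(action_values))]

reg = regression_target(uct_action_values)
cls = classification_target(uct_action_values)
```

The classification target only asks the network to reproduce UCT's decision, not its exact values, which the paper reports works better for real-time play.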
23. Reference
[1] Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning, https://papers.nips.cc/paper/5421-deep-learning-for-real-time-atari-game-play-using-offline-monte-carlo-tree-search-planning
[2] Monte Carlo Tree Search and AlphaGo, Suraj Nair, Peter Kundzicz, Kevin An, Vansh Kumar,
http://www.yisongyue.com/courses/cs159/lectures/MCTS.pdf
[3] tobe: How to learn Monte Carlo Tree Search (MCTS), https://zhuanlan.zhihu.com/p/30458774