Article published at the recent CoSECiVi 2020, held online on 7 and 8 October 2020.
ABSTRACT:
The core challenge facing search techniques when used to play Real-Time Strategy (RTS) games is the extensive combinatorial decision space. Several approaches were proposed to alleviate this dimensionality burden, using scripts or action probability distributions, based on expert knowledge. We propose to replace expert-authored scripts by a collection of smaller parametric scripts we call heuristics and use them to pre-select actions for Monte Carlo Tree Search (MCTS). The advantages of this proposal consist of granular control of the decision space and the ability to adapt the agent’s strategy in-game, all by altering the heuristics and their parameters. Experimentation results in μRTS using a proposed implementation have shown a significant performance gain over state-of-the-art agents.
CoSECiVi 2020 - Parametric Action Pre-Selection for MCTS in Real-Time Strategy Games
1. Parametric Action Pre-Selection for MCTS in
Real-Time Strategy Games
Abdessamed Ouessai, Mohammed Salem, and Antonio M. Mora
University of Mascara,
Algeria
University of Granada,
Spain
VI CoSECiVi-2020
2. Overview
→ Introduction
→ RTS Games & AI
→ Monte Carlo Tree Search
→ Parametric Action Pre-Selection
→ Experiments & Results
→ Conclusion & Future Work
3. Overview
4. Introduction
→ First game AI research domain: Classic board games
→ Evolution of board games is constrained by physics
→ Video games represent an unconstrained medium
→ The Real-Time Strategy (RTS) sub-genre concretized abstract board games (warfare theme)
→ RTS games are an evolution of abstract board games
→ ++ Concrete | ++ Challenging for humans | ++ Complex for AI
5. Overview
6. RTS Games & AI
→ Multiplayer, zero-sum, non-deterministic game with imperfect information.
→ Top-down perspective. Recognizable mouse and keyboard-based UI.
[Diagram] General strategy: gather resources → build structures & train units → confront the opponent. Victory condition: destruction of the opponent's forces.
7. RTS Games & AI
→ What does an RTS game-playing AI have to deal with?
→ Real-time aspect: short decision cycles (~50/s), simultaneous moves for different units, durative actions (longer than one decision cycle)
→ Uncertainty: partial observability (opponent & environment), non-determinism
→ Complexity: large topographic environments, exponential growth of the decision/state spaces

Approximate estimates:
                 | Chess | Go     | StarCraft
Branching Factor | 36    | 180    | 10^50
State Space      | 10^47 | 10^171 | 10^1685
8. RTS Games & AI
→ Notable developments:
→ Scripts: Portfolio Greedy Search (Churchill et al, 2013), Puppet Search (Barriga et al, 2015)
→ Learning: Bayesian Models (Synnaeve et al, 2011), AlphaStar (Vinyals et al, 2019)
→ Planning: NaïveMCTS (Ontañón, 2013), AHTN (Ontañón and Buro, 2015), CCG (Kantharaju et al, 2018)
→ Evaluation: CNN (Stanescu et al, 2016), (Barriga et al, 2019)
→ Competitions:
→ IEEE CoG (StarCraft & µRTS), AAAI AIIDE (StarCraft), SSCAIT
→ RTS AI Testbeds:
→ ORTS – Wargus – BWAPI (SC) – SparCraft – SC2LE – ELF – DeepRTS – µRTS
9. Overview
10. Monte Carlo Tree Search
→ An iterative, anytime, sampling-based search framework
→ Main components:
→ Tree Policy
→ Default Policy
→ Popular variant:
→ UCT (UCB1 as Tree Policy)
→ Popular application:
→ Go (AlphaGo)
→ Downside:
→ Scalability issues
[Diagram] MCTS loop: (1) Selection (Tree Policy) → (2) Expansion → (3) Simulation (Default Policy) → (4) Backpropagation of the reward.
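The Tree Policy of the UCT variant mentioned above can be sketched in a few lines. This is an illustrative Python sketch of UCB1-based child selection over a generic node representation, not code from µRTS (which is written in Java):

```python
import math

def ucb1(value_sum, visits, parent_visits, c=math.sqrt(2)):
    """UCB1 score: average reward (exploitation) plus an exploration bonus."""
    if visits == 0:
        return float("inf")  # unvisited children are always tried first
    return value_sum / visits + c * math.sqrt(math.log(parent_visits) / visits)

def select_child(children, parent_visits):
    """Tree Policy step: descend to the child maximizing UCB1.
    'children' is a hypothetical list of {"value", "visits"} dicts."""
    return max(children, key=lambda ch: ucb1(ch["value"], ch["visits"], parent_visits))
```

The exploration constant `c = √2` is the textbook default; in practice it is tuned per game.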
11. Monte Carlo Tree Search
→ Proposed solutions to enhance MCTS scalability:
→ CMAB:
→ Selection phase framed as a Combinatorial Multi-Armed Bandit (CMAB) problem
→ NaïveMCTS is based on a CMAB formulation and a naïve assumption
[Diagram] Units u_1, u_2, u_3, …, u_n each issue a unit-action a_1, a_2, a_3, …, a_n with local values v_1, v_2, v_3, …, v_n; together the unit-actions form the player-action α_t
The naïve assumption: V(α_t) = Σ_{i=1..n} v_i (the value of a player-action is the sum of its unit-action values)
→ Successfully adapts MCTS to combinatorial decision spaces (e.g. RTS games)
→ Downside: the algorithm is still affected by the dimensionality of the decision space
→ Abstraction:
→ Search the decision space induced by expert-authored scripts instead of the original decision space
→ Downsides: (1) sacrifices tactical performance; (2) performance depends on the scripts
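The naïve assumption above lets each unit be sampled as an independent local bandit instead of exploring the full combinatorial space. A minimal illustrative sketch, with made-up value tables and an epsilon-greedy per-unit choice (not the actual NaïveMCTS sampling scheme or the µRTS API):

```python
import random

def naive_sample_player_action(unit_actions, unit_values, epsilon=0.25):
    """Sample a player-action alpha_t under the naive assumption:
    because V(alpha_t) decomposes as the sum of per-unit values v_i,
    each unit's action can be chosen independently.
    unit_actions: unit -> list of legal unit-actions (hypothetical names).
    unit_values:  unit -> {action: estimated value v_i}."""
    alpha_t = []
    for unit, actions in unit_actions.items():
        if random.random() < epsilon:
            choice = random.choice(actions)        # explore a local arm
        else:
            choice = max(actions, key=lambda a: unit_values[unit].get(a, 0.0))
        alpha_t.append((unit, choice))
    return alpha_t
```

With n units and l actions each, this samples from l·n local arms instead of l^n combinations, which is the point of the CMAB formulation.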
12. Monte Carlo Tree Search
→ Our proposal:
→ A multi-stage parametric action pre-selection scheme to control the decision space
and its granularity
→ Combine abstraction with CMAB (NaïveMCTS) using small-scale parametric scripts
(heuristics)
→ Define a strategy as a collection of heuristics and parameters
13. Overview
14. Parametric Action Pre-Selection
→ Expert-authored scripts usually encode a deterministic strategy using a limited portion of
the decision space
→ How to generate novel strategies that can better exploit the available actions?
→ How to preserve low-level tactical performance?
→ A strategy is a combination of heuristics
[Diagram] Example: the Worker Rush strategy combines a Harvest heuristic, a Train heuristic, and a Direct-offense heuristic
→ Heuristic: a parametric single-goal procedure for controlling a sub-group of units
→ Single unit: h ∈ H : S × U × A^l × R_h → A^k, with k ≤ l
→ S: states, U: units, A: unit-actions, R_h: parameters
→ Group of units: applied to each member
→ In expert-authored scripts, k = 1 and |R_h| = 1
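A heuristic h : S × U × A^l × R_h → A^k can be sketched as a plain function that keeps only the k ≤ l legal actions serving its single goal. This is a hypothetical Harvest-style heuristic in Python; the action names and parameter keys are illustrative, not the µRTS API:

```python
def harvest_heuristic(state, unit, legal_actions, params):
    """Hypothetical single-goal heuristic sketch: from a unit's l legal
    actions, pre-select the k <= l that serve the harvesting goal.
    'state' is unused here; a real heuristic would inspect it."""
    k = params.get("maxActions", 2)  # k: how many candidates to keep
    goal_related = [a for a in legal_actions
                    if a.startswith(("harvest", "return", "move"))]
    # Fall back to one arbitrary action so the unit is never left actionless.
    return goal_related[:k] or legal_actions[:1]
```

Setting `maxActions` to 1 reproduces the deterministic expert-script case (k = 1); larger values leave the final choice to the search.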
15. Parametric Action Pre-Selection
→ Action Pre-Selection: Downsizing the decision space by selecting a subset of actions satisfying a certain
criterion (strategy), prior to planning
→ When k > 1, the final decision is made by a search approach (e.g. MCTS)
→ A unit partitioning 𝑑 ∈ D determines unit groups (manually or automatically)
→ Each unit group is associated with a heuristic. Heuristics’ output defines the search space
[Diagram] Pipeline: Original Actions → Action Pre-Selection (partitioning, heuristics, parameters) → Pre-Selected Actions → Planning (MCTS)
16. Parametric Action Pre-Selection
→ The general algorithm:
→ Pre-selected actions are refined over successive phases
→ Parametric Action Pre-Selection: T(s, U, A^0, x_1, …, x_n), with stages x_i = (A^{i−1}, d_i, H_i, θ_i)
→ A strategy can be expressed as: σ = (d_1, …, d_n, H_1, …, H_n, θ_1, …, θ_n)
[Diagram] Given game state s and units U: stage x_1 partitions the units into groups g_1 … g_{m1} (via d_1) and applies heuristics H_1 = {h_1, …, h_{m1}} with parameters θ_1 to A^0; stage x_2 refines the result with (d_2, H_2, θ_2); …; stage x_n produces A^n, which is handed to Search and then Execution.
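The refinement T(s, U, A^0, x_1, …, x_n) described above can be sketched as a fold over the stages, each shrinking every unit's candidate set. All names here (partition functions, heuristic signatures) are hypothetical, mirroring the formalism rather than the µRTS implementation:

```python
def preselect(state, units, actions, stages):
    """Sketch of T(s, U, A0, x1..xn).
    actions: unit -> list of candidate actions (A^0).
    stages:  list of (partition_fn, heuristics, params) tuples, where
             partition_fn(state, units) -> {group_name: [units]} plays
             the role of d_i, heuristics maps group names to heuristic
             functions (H_i), and params carries theta_i."""
    current = dict(actions)  # A^{i-1}, refined in place per stage
    for partition_fn, heuristics, params in stages:
        for group, members in partition_fn(state, units).items():
            h = heuristics[group]
            for unit in members:
                current[unit] = h(state, unit, current[unit], params)
    return current  # A^n, handed to the search
```

Because each stage only sees the previous stage's output A^{i−1}, later stages can only narrow (never widen) the decision space, which is what gives granular control over its size.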
17. Parametric Action Pre-Selection
→ Proposed implementation: ParaMCTS
→ A 2-phase action pre-selection process using NaïveMCTS for search
→ Inspired by the macro- and micro-management task decomposition
→ 47 parameters govern the behaviour of ParaMCTS; they were tuned manually
→ NaïveMCTS enhancement: Inactive player-action pruning (previous study)
Phase-1 (x_1):
Groups     | Heuristics | Parameters
Harvesters | <Harvest>  | maxU, buildMode, pf, …
Offense    | <Attack>   | maxU, offMode, maxTargets, pf, …
Defense    | <Defend>   | maxU, defMode, defPerimeter, pf, …
Structures | <Train>    | maxU, trainMode, …

Phase-2 (x_2):
Groups     | Heuristics           | Parameters
Front-Line | <Front-Line Tactics> | maxU, waitDuration, …
Back       | <Back Tactics>       | waitDuration, …

→ Search: NaïveMCTS
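The strategy σ behind ParaMCTS can be written down as plain configuration data. The group, heuristic, and parameter names below follow the tables above; the concrete values are made-up placeholders, not the tuned 47-parameter set from the paper:

```python
# Strategy sigma = (d_1..d_n, H_1..H_n, theta_1..theta_n) as plain data.
# Parameter values are illustrative placeholders only.
strategy = {
    "phase1": {  # x_1: macro-management groups
        "Harvesters": {"heuristic": "Harvest", "params": {"maxU": 2, "buildMode": 0}},
        "Offense":    {"heuristic": "Attack",  "params": {"maxU": 6, "offMode": 1, "maxTargets": 3}},
        "Defense":    {"heuristic": "Defend",  "params": {"maxU": 2, "defMode": 0, "defPerimeter": 4}},
        "Structures": {"heuristic": "Train",   "params": {"maxU": 1, "trainMode": 0}},
    },
    "phase2": {  # x_2: micro-management groups
        "Front-Line": {"heuristic": "Front-Line Tactics", "params": {"maxU": 4, "waitDuration": 5}},
        "Back":       {"heuristic": "Back Tactics",       "params": {"waitDuration": 10}},
    },
}
```

Representing σ as data rather than code is what allows in-game strategy adaptation: swapping a heuristic or retuning a parameter changes the pre-selected decision space without touching the search.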
18. Overview
19. Experiments & Results
→ How can MCTS benefit from the downsized decision space?
→ Should we increase the playout duration, the maximum search depth, or both? By how much?
→ How does the performance of ParaMCTS compare to state-of-the-art agents?
→ Experiments setting:
→ Computation budget: 100 ms per game cycle. Maps: basesWorkers 8×8, 16×16, 32×32
→ Tested maximum search depths: {10, 15, 20, 30, 50}. Tested playout durations: {100, 150, 200, 300, 500}
Testbed: µRTS (or microRTS)
→ A lightweight, AI research-focused RTS simulator
→ Open source, written in Java by Santiago Ontañón
→ Includes a forward model and many baseline agents
→ Subject of a yearly AI competition as part of IEEE CoG
20. Experiments & Results
→ Experiment 1: two 120-iteration round-robin tournaments
1) Between ParaMCTS variants with a fixed playout duration (100 cycles) and different maximum search depths
2) Between ParaMCTS variants with a fixed maximum search depth (10) and different playout durations
→ Total matches: 4800 in each map. Score = (Wins + Draws/2), normalized.
→ Results:
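The tournament score above is the usual round-robin convention (a draw counts half a win, normalized by games played). A one-line sketch to make the precedence unambiguous:

```python
def round_robin_score(wins, draws, games):
    """Normalized score in [0, 1]: a draw is worth half a win."""
    return (wins + draws / 2) / games
```

For example, 60 wins and 20 draws over 120 games score (60 + 10) / 120 ≈ 0.58.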
21. Experiments & Results
→ Experiment 2: maximum search depth and playout duration combinations
→ 100 matches between each ParaMCTS(search depth, playout duration) variant and MixedBot
→ Sides switched after 50 matches. ParaMCTS implements a strategy similar to MixedBot's
→ Total matches: 2500 in each map
→ Results:
22. Experiments & Results
→ Experiment 3: Vs. state-of-the-art.
→ 100-iteration round-robin tournament
→ Participants:
→ ParaMCTS
→ MixedBot
→ Izanagi (top-ranking agent from the 2019 µRTS competition)
→ Droplet (top-ranking agent from the 2019 µRTS competition)
→ NaïveMCTS* (same hyperparameters as ParaMCTS)
→ NaïveMCTS (using its best hyperparameters)
→ Total matches: 3000 in each map
→ Overall margin: 11.9 to 19.1
23. Overview
24. Conclusion & Future Work
→ Parametric action pre-selection describes a general action/state abstraction framework,
applicable to any game with similar characteristics to RTS games
→ Using heuristics instead of scripts grants greater flexibility
→ A proposed implementation, ParaMCTS, significantly outperformed state-of-the-art
agents, using manually tuned parameters
→ The computation budget recovered by pre-selection is better spent on deeper search
Future Work
→ ParaMCTS parameter optimization for different objectives (maps, opponents, …)
→ Dynamic parameter adaptation through RL
→ Heuristic/partitioning discovery
→ Difficulty adjustment given adequate heuristics and parameters