2. Juantomás García
• Data Solutions Manager @ OpenSistemas
• GDE (Google Developer Expert) for Cloud
Other roles
• Co-author of “La Pastilla Roja”, the first Spanish book on free software
• President of Hispalinux (Spanish Linux User Group)
• Organizer of Machine Learning Spain and GDG Cloud Madrid.
Who I am
3. • People interested in Machine Learning
• Who want to know more about what AlphaGo is
• With a good technical background.
Who the Audience Is
4. • I love Machine Learning.
• There are a lot of takeaways from this project.
• I want to share them widely.
Why I did this presentation
5. • AlphaGo: the epic project
• AlphaGo Zero: the re-evolution version
• AlphaZero: looking for general solutions
• DIY: AlphaZero Connect 4
• Takeaways
Outline
6. A brief introduction
• Deep Blue was about brute force
• Its creators were emulating how humans play chess
7. A brief introduction
• A huge search space:
Chess -> 20 possible opening moves
Go -> 361 possible opening moves
8. AlphaGo Main Concepts
• Policy Neural Network
“To decide which are the most sensible moves in
a particular board position”.
9. AlphaGo Main Concepts
• Value Neural Network
“How good is a particular board arrangement”.
“How likely you are to win the game with this
position”.
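To make the two roles concrete, here is a minimal sketch in PyTorch. Layer sizes and plane counts are illustrative assumptions, not the paper's architecture; the original AlphaGo trained two separate networks, while the shared-trunk, two-headed form below is closer to what AlphaGo Zero later used.

import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    """Tiny two-headed net: one head for P(a|s), one for V(s)."""
    def __init__(self, board_size=19, in_planes=17, channels=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(in_planes, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
        )
        # Policy head: one logit per board point, plus one for "pass".
        self.policy = nn.Sequential(
            nn.Conv2d(channels, 2, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(2 * board_size ** 2, board_size ** 2 + 1),
        )
        # Value head: one number in [-1, 1], "how likely am I to win here".
        self.value = nn.Sequential(
            nn.Conv2d(channels, 1, 1), nn.ReLU(), nn.Flatten(),
            nn.Linear(board_size ** 2, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),
        )

    def forward(self, x):  # x: (batch, in_planes, board_size, board_size)
        h = self.trunk(x)
        return self.policy(h), self.value(h)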
11. AlphaGo First Approach: SL
• Just train both networks using human games.
• Just plain, ordinary supervised learning.
• With this alone, AlphaGo plays only like a weak human.
• It is like the approach of Deep Blue: just emulating human chess players.
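A minimal sketch of that supervised step, assuming a net like the PolicyValueNet above and a hypothetical batch of human positions (boards) with the moves the experts played (expert_moves):

import torch
import torch.nn.functional as F

def supervised_step(net, optimizer, boards, expert_moves):
    """One SGD step: make the policy head imitate the human expert's move."""
    policy_logits, _ = net(boards)
    # Cross-entropy pulls the predicted move distribution toward the
    # move the human actually played in that position.
    loss = F.cross_entropy(policy_logits, expert_moves)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()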
13. AlphaGo Second Approach: RL
• Improve the SL version by playing against itself.
• With Reinforcement Learning it is able to play well against state-of-the-art Go programs.
• These programs use MCTS (Monte Carlo Tree Search).
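A minimal sketch of the self-play policy-gradient idea, assuming a finished game has already been unpacked into per-position tensors (boards, moves, outcomes are hypothetical names, not AlphaGo's actual code):

import torch
import torch.nn.functional as F

def reinforce_step(net, optimizer, boards, moves, outcomes):
    """One policy-gradient step over the positions of one finished game."""
    # `outcomes` holds z = +1 (win) / -1 (loss) from the point of view
    # of the player to move at each position.
    policy_logits, _ = net(boards)
    log_probs = F.log_softmax(policy_logits, dim=1)
    chosen = log_probs.gather(1, moves.unsqueeze(1)).squeeze(1)
    # Moves from won games become more likely, moves from lost games less.
    loss = -(outcomes * chosen).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()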
15. AlphaGo Second Approach: RL
• It is not the two NNs versus Monte Carlo Tree Search.
• It is a better MCTS thanks to the NNs.
16. AlphaGo Second Approach: RL
• Optimal Value Function V*(s)
“Determines the outcome of the game from every board position (s is the state)”.
A brute-force solution is impossible (branching factor ** game length):
Chess: 35 ** 80
Go: 250 ** 150
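A quick back-of-the-envelope check of those numbers in Python:

import math

# b ** d: branching factor raised to the game length, as on the slide.
chess_digits = 80 * math.log10(35)     # ~ 123 -> chess tree ~ 10 ** 123
go_digits = 150 * math.log10(250)      # ~ 360 -> Go tree    ~ 10 ** 360
print(f"chess ~ 10**{chess_digits:.0f} positions, "
      f"go ~ 10**{go_digits:.0f} positions")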
17. AlphaGo Second Approach: RL
• Two solutions to reduce the effective search space:
Truncate the search tree in depth: approximate V*(s) with a learned V(s)
Reduce the breadth of the search with the policy: P(a|s)
MCTS rolls out the moves chosen by the policy function and evaluates positions with the value function, as sketched below.
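A minimal sketch of how the two reductions fit into MCTS. The Game interface (is_over, result, legal_moves, play, hashable states) and net.evaluate are assumptions for illustration, not AlphaGo's actual code:

import math

def search(root, net, n_simulations=100):
    P, N, Q = {}, {}, {}  # priors, visit counts, mean action values

    def simulate(s):
        # Returns the value of `s` for the player to move.
        if s.is_over():
            return s.result()
        if s not in P:                      # unexpanded leaf
            priors, v = net.evaluate(s)     # the policy narrows the breadth
            P[s] = priors
            N[s] = {a: 0 for a in s.legal_moves()}
            Q[s] = {a: 0.0 for a in s.legal_moves()}
            return v                        # the value truncates the depth
        total = 1 + sum(N[s].values())
        # Prior-weighted selection: prefer moves with high value estimates,
        # high priors, and few visits so far.
        a = max(N[s], key=lambda m: Q[s][m]
                + P[s][m] * math.sqrt(total) / (1 + N[s][m]))
        v = -simulate(s.play(a))            # opponent's value, negated
        N[s][a] += 1
        Q[s][a] += (v - Q[s][a]) / N[s][a]  # running mean
        return v

    for _ in range(n_simulations):
        simulate(root)
    return max(N[root], key=N[root].get)    # play the most visited move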
19. AlphaGo Zero: Re-Evolution Version
• Just trained with Reinforcement Learning.
• Explores less-visited moves via an exploration bonus u(s,a), sketched below.
• Just one neural network for both policy and value.
• Every time a search is done, the neural network is retrained.
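The exploration bonus as it is usually written, in a short sketch (the function and parameter names are illustrative; c_puct is a tunable constant):

import math

def u(prior, n_action, n_total, c_puct=1.0):
    # High prior and few visits -> large bonus, so the search tries
    # promising but less-visited moves first.
    return c_puct * prior * math.sqrt(n_total) / (1 + n_action)

# The tree picks the move maximising Q(s, a) + u(s, a).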
20. AlphaGo Zero: Re-Evolution Version
• Human games were noisy and not reliable.
• Rollouts are not used to predict who will win; the value network does it directly.
23. AlphaZero: New Challenges
AlphaGo Zero vs AlphaZero:
• Binary outcome (win / loss) vs expected outcome (including draws or potentially other outcomes)
• Board positions transformed before being passed to the neural network (by a randomly selected rotation or reflection) vs no data augmentation
• Games generated by the best player from previous iterations (selected by a 55 % win margin) vs continual update using the latest parameters (without the evaluation and selection steps)
• Hyper-parameters tuned by Bayesian optimisation vs the same hyper-parameters reused without game-specific tuning