environment, agent, reward, interaction, policy, action value, state value, value function, internal state, learning, programming, optimization, strategy, unsupervised, supervised, deep learning, exploit, discover, exploration, exploitation, slides from cmu deep imitation learning lecture, asynchronous reinforcement, policy gradient methods, reward signal, neural networks and deep learn, finite Markov decision process, introduction to reinforcement, neural network, artificial intelligence