12/11/2020

Beat the Slots in Pokémon Using Reinforcement Learning

Daniel Saunders, How I Beat the Slots in Pokémon Using Reinforcement Learning, Towards Data Science, 2020/12/10. (Python code)

Given a set of possible actions (“arms” of a multi-armed bandit — in this case, different machines to try), Thompson sampling optimally trades off exploration against exploitation to find the best action. It tries the most promising actions more often, obtaining more detailed estimates of their reward probabilities, while still occasionally sampling the other actions in case one of them turns out to be the best after all. At each step, the system’s knowledge, represented as posterior probability distributions over each arm’s reward probability, is updated using Bayes’ rule. The simplest version of the multi-armed bandit problem involves Bernoulli trials, where there are only two possible outcomes, reward or no reward, and we are trying to determine which action has the highest probability of reward.
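The Bernoulli case described above can be sketched in a few lines using only the standard library. This is a minimal illustration, not the article's code: the arm probabilities below are made up, and each arm's posterior is a Beta distribution starting from a uniform Beta(1, 1) prior, which is conjugate to the Bernoulli likelihood so the Bayesian update is just a counter increment.

```python
import random


def thompson_sampling(true_probs, n_steps=10_000, seed=0):
    """Simulate Thompson sampling on a Bernoulli multi-armed bandit.

    true_probs: hidden reward probability of each arm (unknown to the agent).
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(alpha, beta) posterior per arm, starting from the uniform prior.
    alpha = [1] * n_arms  # 1 + number of observed rewards
    beta = [1] * n_arms   # 1 + number of observed non-rewards
    pulls = [0] * n_arms
    for _ in range(n_steps):
        # Draw one sample from each arm's posterior and play the arm
        # whose sampled reward probability is highest. Uncertain arms
        # sometimes produce high samples, which is what drives exploration.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = rng.random() < true_probs[arm]  # Bernoulli trial
        # Conjugate Bayesian update of the chosen arm's posterior.
        if reward:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.7])
```

After a few thousand steps, the pull counts concentrate on the arm with the highest true reward probability, while the weaker arms still receive a small number of exploratory pulls.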

This is a nice article, and it should make an engaging example for motivating students when teaching this topic.
