12/11/2020

Beat the Slots in Pokémon Using Reinforcement Learning

Daniel Saunders, How I Beat the Slots in Pokémon Using Reinforcement Learning, Towards Data Science, 2020/12/10. (Python code)

Given a set of possible actions (“arms” of a multi-armed bandit — in this case, different machines to try), Thompson sampling optimally trades off exploration against exploitation to find the best action. It tries the most promising actions more often, obtaining more detailed estimates of their reward probabilities, while still occasionally sampling the other actions in case one of them turns out to be the best after all. At each step, the system’s knowledge, represented as posterior probability distributions over each arm’s reward probability, is updated using Bayes’ rule. The simplest version of the multi-armed bandit problem involves Bernoulli trials, where there are only two possible outcomes, reward or no reward, and we are trying to determine which action has the highest probability of reward.
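The Bernoulli case described above can be sketched in a few lines using only the standard library. This is a minimal illustration, not the article's code: the arm probabilities below are made up, and each arm's posterior is a Beta distribution starting from a uniform Beta(1, 1) prior, which is conjugate to the Bernoulli likelihood so the Bayesian update is just a counter increment.

```python
import random


def thompson_sampling(true_probs, n_steps=10_000, seed=0):
    """Simulate Thompson sampling on a Bernoulli multi-armed bandit.

    true_probs: hidden reward probability of each arm (unknown to the agent).
    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    # Beta(alpha, beta) posterior per arm, starting from the uniform prior.
    alpha = [1] * n_arms  # 1 + number of observed rewards
    beta = [1] * n_arms   # 1 + number of observed non-rewards
    pulls = [0] * n_arms
    for _ in range(n_steps):
        # Draw one sample from each arm's posterior and play the arm
        # whose sampled reward probability is highest. Uncertain arms
        # sometimes produce high samples, which is what drives exploration.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        reward = rng.random() < true_probs[arm]  # Bernoulli trial
        # Conjugate Bayesian update of the chosen arm's posterior.
        if reward:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        pulls[arm] += 1
    return pulls


pulls = thompson_sampling([0.2, 0.5, 0.7])
```

After a few thousand steps, the pull counts concentrate on the arm with the highest true reward probability, while the weaker arms still receive a small number of exploratory pulls.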

This is a nice article, and it should make an engaging example for motivating students when teaching this topic.
