Difference between revisions of "Reinforcement Learning"

From Wiki2
Line 9:
 
* [https://github.com/bgalbraith/bandits basic] (softmax, UCB, epsilon-greedy)
 
* [https://github.com/david-cortes/contextualbandits intermediate] (more algorithms, contextual bandits)
+
+ ===Literature===
+ * [https://arxiv.org/pdf/0805.3415.pdf On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems]

Revision as of 17:25, 9 July 2019

Multi-Armed Bandit Examples


Git Repos

  • basic (softmax, UCB, epsilon-greedy)
  • intermediate (more algorithms, contextual bandits)
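
For orientation, the three strategies named for the basic repo can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not code taken from either linked repository: the function names (epsilon_greedy, softmax, ucb1, run), the Bernoulli reward model, and the parameter values are all hypothetical choices made for this example.

```python
import math
import random


def epsilon_greedy(values, epsilon, rng):
    """With probability epsilon pick a random arm, otherwise the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])


def softmax(values, temperature, rng):
    """Sample an arm with probability proportional to exp(value / temperature)."""
    top = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - top) / temperature) for v in values]
    return rng.choices(range(len(values)), weights=weights)[0]


def ucb1(values, counts):
    """Pick the arm maximizing estimate + sqrt(2 ln t / n); unpulled arms go first."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    total = sum(counts)
    return max(
        range(len(values)),
        key=lambda a: values[a] + math.sqrt(2 * math.log(total) / counts[a]),
    )


def run(select, probs, steps, seed=0):
    """Simulate Bernoulli arms (assumed reward model) and track running means."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    values = [0.0] * len(probs)
    for _ in range(steps):
        arm = select(values, counts, rng)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # incremental update of the mean reward for the chosen arm
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.8]  # hypothetical success probabilities per arm
    print(run(lambda v, c, r: epsilon_greedy(v, 0.1, r), probs, 2000))
    print(run(lambda v, c, r: softmax(v, 0.1, r), probs, 2000))
    print(run(lambda v, c, r: ucb1(v, c), probs, 2000))
```

Each strategy only decides which arm to pull next; the shared run loop keeps the pull counts and mean-reward estimates, so the three can be compared on the same simulated arms.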


Literature