Difference between revisions of "Reinforcement Learning"

From Wiki2
Line 13:

===Literature===

* [https://arxiv.org/pdf/0805.3415.pdf On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems]
+ * [https://arxiv.org/pdf/1706.06978.pdf Zhou et al. 2018] (Alibaba Group, Deep Interest Network, click-through rate prediction)

Revision as of 07:17, 10 July 2019

Multi-Armed Bandit Examples


Git Repos

  • basic (softmax, UCB, epsilon-greedy)
  • intermediate (more algorithms, contextual bandits)
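The repository links are not preserved on this page. Below is a minimal Python sketch of the three "basic" policies listed above. It is not the repos' code. It runs them on a hypothetical Bernoulli bandit, where each arm pays 1 with a fixed probability. All names (`BernoulliBandit`, `epsilon_greedy`, `softmax`, `ucb1`, `run`), the arm probabilities, and the settings epsilon = 0.1 and temperature = 0.1 are illustrative assumptions. UCB1 uses the common exploration bonus sqrt(2 ln t / N(a)).

```python
import math
import random


class BernoulliBandit:
    """Hypothetical test environment: arm i pays 1 with probability probs[i], else 0."""

    def __init__(self, probs, seed=0):
        self.probs = list(probs)
        self.rng = random.Random(seed)

    def pull(self, arm):
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0


def epsilon_greedy(values, counts, t, rng, epsilon=0.1):
    # With probability epsilon explore a uniformly random arm, otherwise exploit the best estimate.
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])


def softmax(values, counts, t, rng, temperature=0.1):
    # Boltzmann exploration: sample arm a with probability proportional to exp(Q(a) / temperature).
    m = max(values)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    return rng.choices(range(len(values)), weights=weights)[0]


def ucb1(values, counts, t, rng):
    # Play every arm once, then pick argmax of Q(a) + sqrt(2 ln t / N(a)).
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(range(len(values)),
               key=lambda a: values[a] + math.sqrt(2.0 * math.log(t) / counts[a]))


def run(policy, bandit, horizon, seed=1):
    """Play `horizon` rounds with `policy`; return (total reward, per-arm pull counts)."""
    rng = random.Random(seed)
    k = len(bandit.probs)
    values = [0.0] * k  # running mean reward Q(a) per arm
    counts = [0] * k    # number of pulls N(a) per arm
    total = 0.0
    for t in range(1, horizon + 1):
        arm = policy(values, counts, t, rng)
        reward = bandit.pull(arm)
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean update
        total += reward
    return total, counts


if __name__ == "__main__":
    probs = [0.2, 0.5, 0.75]
    for name, policy in [("epsilon-greedy", epsilon_greedy),
                         ("softmax", softmax),
                         ("UCB1", ucb1)]:
        total, counts = run(policy, BernoulliBandit(probs, seed=42), horizon=5000)
        print(f"{name:15s} reward={total:.0f} pulls={counts}")
```

All three policies share the same loop and the same incremental-mean estimate Q(a). They differ only in how they trade off exploration against exploitation: a fixed random-exploration rate (epsilon-greedy), reward-weighted random sampling (softmax), or an optimism bonus that shrinks as an arm is pulled more often (UCB1). Over a few thousand rounds each one should concentrate most pulls on the p = 0.75 arm. The exact counts depend on the seeds.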


Literature