Reinforcement Learning

 
* [https://arxiv.org/pdf/0805.3415.pdf On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems]
 
* [https://arxiv.org/pdf/1706.06978.pdf Zhou et al. 2018] (Alibaba Group, Deep Interest Network, Click Through Rate Prediction)
 
* [https://medium.com/@vermashresth/a-primer-on-deep-reinforcement-learning-frameworks-part-1-6c9ab6a0f555 RL Frameworks]
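The Garivier & Moulines paper linked above studies UCB policies for non-stationary bandits, where the confidence index is computed only from recent rewards so the policy can track arms whose payoff drifts. A rough per-arm sliding-window sketch of that idea (the window length, exploration constant, and the per-arm window itself are simplifying choices for illustration, not the paper's exact algorithm):

```python
import math
from collections import deque

def sw_ucb_select(histories, t, window=100, c=2.0):
    """Sliding-window UCB index: each arm's mean and pull count use only
    its last `window` observed rewards, so old payoffs are forgotten."""
    # Play every arm once before applying the confidence bound.
    for a, h in enumerate(histories):
        if not h:
            return a
    def index(a):
        h = histories[a]
        mean = sum(h) / len(h)
        return mean + math.sqrt(c * math.log(min(t, window)) / len(h))
    return max(range(len(histories)), key=index)

# Toy non-stationary run: the best arm switches halfway through.
histories = [deque(maxlen=100), deque(maxlen=100)]
pulls_second_half = [0, 0]
for t in range(1, 1001):
    arm = sw_ucb_select(histories, t)
    # Deterministic rewards: arm 0 pays 1 before t=500, arm 1 after.
    reward = 1.0 if (arm == 0) == (t <= 500) else 0.0
    histories[arm].append(reward)
    if t > 500:
        pulls_second_half[arm] += 1
# After the change point the window empties of stale rewards and the
# policy migrates to arm 1.
```

Because stale rewards fall out of the window, the index of the formerly best arm decays within a few pulls of the change point, whereas plain UCB1 would keep averaging over the whole history and adapt far more slowly.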

Revision as of 09:09, 10 July 2019

Multi-Armed Bandit Examples


Image Ranking


Git Repos

  • basic (softmax, UCB, epsilon-greedy)
  • intermediate (more algorithms, contextual bandits)
  • MobileNet (hotel ranking, extending the MobileNet architecture)
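The basic repo above covers softmax, UCB, and epsilon-greedy action selection. As a minimal illustrative sketch (not code from the listed repos; arm probabilities and hyperparameters are made up for the example), epsilon-greedy and softmax selection on Bernoulli arms with incremental mean estimates might look like:

```python
import math
import random

def epsilon_greedy(values, epsilon=0.1):
    """With probability epsilon explore a random arm, else exploit the
    arm with the highest estimated value."""
    if random.random() < epsilon:
        return random.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])

def softmax_select(values, temperature=0.1):
    """Sample an arm with probability proportional to exp(value / T)."""
    m = max(values)  # subtract the max for numerical stability
    weights = [math.exp((v - m) / temperature) for v in values]
    return random.choices(range(len(values)), weights=weights)[0]

def update(counts, values, arm, reward):
    """Incremental sample-mean update for the pulled arm."""
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

# Toy run: two Bernoulli arms with success probabilities 0.2 and 0.8.
random.seed(0)
probs = [0.2, 0.8]
counts, values = [0, 0], [0.0, 0.0]
for _ in range(2000):
    arm = epsilon_greedy(values)
    update(counts, values, arm, 1.0 if random.random() < probs[arm] else 0.0)
# The better arm (index 1) should accumulate most of the pulls.
```

Swapping `epsilon_greedy(values)` for `softmax_select(values)` in the loop exercises the Boltzmann policy instead; lowering the temperature makes it behave more greedily.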

Literature