Reinforcement Learning

==Multi-Armed Bandit Examples==

===Image Ranking===

===Git Repos===
 
* [https://github.com/bgalbraith/bandits basic] (softmax, UCB, epsilon-greedy)
 
* [https://github.com/david-cortes/contextualbandits intermediate] (more algorithms, contextual bandits)
 
* [https://github.com/idealo/image-quality-assessment/blob/master/data/TID2013/get_labels.py MobileNet] (Rank Hotels, extending MobileNet Architecture)
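As a hedged illustration of the basic algorithms the first repo covers, here is a minimal epsilon-greedy agent on a Bernoulli bandit. The function name and defaults are my own for this sketch, not the repo's API:

```python
import random

def epsilon_greedy_bandit(true_means, steps=10000, epsilon=0.1, seed=0):
    """Run an epsilon-greedy agent on a Bernoulli multi-armed bandit.

    With probability epsilon the agent explores a random arm; otherwise
    it exploits the arm with the highest running mean reward so far.
    """
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k      # number of pulls per arm
    values = [0.0] * k    # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(k)                        # explore
        else:
            arm = max(range(k), key=lambda a: values[a])  # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return values, counts
```

With arms paying off at rates 0.2, 0.5 and 0.8, the agent should concentrate most of its pulls on the last arm and estimate its mean reward closely.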
  
 
===Literature===
 
* [https://arxiv.org/pdf/0805.3415.pdf On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems]
 
* [https://arxiv.org/pdf/1706.06978.pdf Zhou et al. 2018] (Alibaba Group, Deep Interest Network, Click Through Rate Prediction)
 

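The first paper analyses UCB policies that discount old observations so the index can track arms whose reward distributions drift over time. Below is a hedged sketch of that discounted-UCB idea; the parameter names (discount gamma, exploration constant xi) follow the paper's notation, but the exact constants and the environment interface are my own simplifications:

```python
import math

def discounted_ucb(rewards_fn, n_arms, horizon, gamma=0.95, xi=0.6):
    """Sketch of discounted UCB for non-stationary bandits.

    All pull counts and reward sums are multiplied by gamma each step,
    so old observations fade and the policy can re-detect a changed
    best arm. rewards_fn(arm, t) returns the reward at time t.
    """
    disc_counts = [0.0] * n_arms  # discounted pull counts
    disc_sums = [0.0] * n_arms    # discounted reward sums
    chosen = []
    for t in range(horizon):
        if t < n_arms:
            arm = t  # play each arm once to initialise
        else:
            n_total = sum(disc_counts)

            def ucb(i):
                mean = disc_sums[i] / disc_counts[i]
                bonus = math.sqrt(xi * math.log(n_total) / disc_counts[i])
                return mean + bonus

            arm = max(range(n_arms), key=ucb)
        reward = rewards_fn(arm, t)
        # discount all statistics, then add the new observation
        for i in range(n_arms):
            disc_counts[i] *= gamma
            disc_sums[i] *= gamma
        disc_counts[arm] += 1.0
        disc_sums[arm] += reward
        chosen.append(arm)
    return chosen
```

On a toy environment where the best arm switches halfway through, the discounted statistics let the policy abandon the formerly best arm within a few dozen steps of the change.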
Revision as of 08:45, 10 July 2019
