# Difference between revisions of "Reinforcement Learning"



## Revision as of 15:29, 10 July 2019


## Multi-Armed Bandit Examples

- Click-Through Rate (random selection and UCB)
- Digital Advertising (Epsilon-greedy and Thompson sampling)
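
The selection rules named in the examples above (random, epsilon-greedy, UCB, and Thompson sampling) can be compared on a simulated ad-click problem. The following is a minimal sketch using only the Python standard library, not code from the linked examples. The function names, the click-through rates, the `epsilon` value, and the use of the UCB1 variant with a Beta(1, 1) prior for Thompson sampling are all assumptions chosen for illustration.

```python
import math
import random


def best_index(values):
    """Return the index of the largest value (first one wins on ties)."""
    return max(range(len(values)), key=values.__getitem__)


def choose_epsilon_greedy(counts, clicks, rng, epsilon=0.1):
    """Explore a random ad with probability epsilon, else exploit the best mean."""
    if rng.random() < epsilon:
        return rng.randrange(len(counts))
    means = [k / n if n else 0.0 for k, n in zip(clicks, counts)]
    return best_index(means)


def choose_ucb1(counts, clicks, t):
    """UCB1: empirical mean plus an exploration bonus that shrinks with pulls."""
    for arm, n in enumerate(counts):
        if n == 0:  # show every ad once before using the bound
            return arm
    scores = [k / n + math.sqrt(2.0 * math.log(t) / n)
              for k, n in zip(clicks, counts)]
    return best_index(scores)


def choose_thompson(counts, clicks, rng):
    """Thompson sampling: draw a CTR from each Beta posterior, pick the largest."""
    draws = [rng.betavariate(1 + k, 1 + n - k) for k, n in zip(clicks, counts)]
    return best_index(draws)


def simulate(policy, true_ctrs, steps=5000, seed=0):
    """Run one policy against Bernoulli ads; return (impressions per ad, total clicks)."""
    rng = random.Random(seed)
    counts = [0] * len(true_ctrs)   # impressions per ad
    clicks = [0] * len(true_ctrs)   # clicks per ad
    for t in range(1, steps + 1):
        if policy == "random":
            arm = rng.randrange(len(true_ctrs))
        elif policy == "epsilon_greedy":
            arm = choose_epsilon_greedy(counts, clicks, rng)
        elif policy == "ucb1":
            arm = choose_ucb1(counts, clicks, t)
        elif policy == "thompson":
            arm = choose_thompson(counts, clicks, rng)
        else:
            raise ValueError(f"unknown policy: {policy}")
        reward = 1 if rng.random() < true_ctrs[arm] else 0
        counts[arm] += 1
        clicks[arm] += reward
    return counts, sum(clicks)


if __name__ == "__main__":
    # Hypothetical click-through rates, exaggerated so the gap is easy to see.
    ctrs = [0.1, 0.3, 0.7]
    for name in ("random", "epsilon_greedy", "ucb1", "thompson"):
        counts, total = simulate(name, ctrs)
        print(f"{name:15s} impressions={counts} clicks={total}")
```

Running the script prints, for each policy, how often each ad was shown and how many clicks were collected. With rates this far apart, the adaptive policies would be expected to concentrate impressions on the best ad, while random selection spreads them evenly.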

## Image Ranking

- Hotel Image Ranking (aesthetic and technical quality of images)

## Multi-Agent Learning

- Stochastic games, Nash-Q, gradient ascent, WoLF (Win or Learn Fast), mean-field Q-learning, particle swarm intelligence, and Ant Colony Optimization (Colorni et al., 1991)
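- [Game Theory in Smart Decentralised multi-agent RL](https://towardsdatascience.com/smart-incentives-and-game-theory-in-decentralized-multi-agent-reinforcement-learning-systems-58442e508378)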

## Extra

### Git Repos

- basic (softmax, UCB, epsilon-greedy)
- intermediate (more algorithms, contextual bandits)
- MobileNet (Rank Hotels, extending MobileNet Architecture)
- Google Dopamine (a research framework for fast prototyping of reinforcement learning algorithms)
- TRFL Reinforcement Learning
- Facebook ELF Research RL
- TF-Agents (a library for reinforcement learning in TensorFlow)
- SLM-Lab (a modular deep reinforcement learning framework in PyTorch)
- Coordinated-Multi-Agent-Imitation-Learning

### Literature

- On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems
- Zhou et al. 2018 (Alibaba Group, Deep Interest Network, Click Through Rate Prediction)
- RL Frameworks
- Real Time Bidding (Distributed Coordinated Multi-agent reinforcement learning)