# Reinforcement Learning

Latest revision as of 17:48, 10 July 2019.

- [OpenAI Gym](https://gym.openai.com/)
- [RLlib](https://rise.cs.berkeley.edu/blog/scaling-multi-agent-rl-with-rllib/) ([docs](https://ray.readthedocs.io/en/latest/rllib.html))

## Multi-Armed Bandit Examples

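- Click-Through Rate: Random, UCB
- [Digital Advertising](https://www.spotx.tv/resources/blog/developer-blog/introduction-to-multi-armed-bandits-with-applications-in-digital-advertising/) (epsilon-greedy and Thompson sampling)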
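
The bandit strategies named above can be compared on a simulated click-through problem. This is a minimal sketch, not code from the linked posts. It assumes Bernoulli rewards (click or no click) with made-up CTRs, uses UCB1 as the UCB variant, and keeps a random policy as the baseline. The `run` helper and the policy names are illustrative.

```python
import math
import random


def epsilon_greedy(values, rng, epsilon=0.1):
    # Explore a random arm with probability epsilon, otherwise exploit the best estimate.
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda a: values[a])


def ucb1(counts, values, t):
    # Play every arm once, then maximise estimated CTR + exploration bonus.
    for a, n in enumerate(counts):
        if n == 0:
            return a
    return max(range(len(counts)),
               key=lambda a: values[a] + math.sqrt(2.0 * math.log(t) / counts[a]))


def thompson(successes, failures, rng):
    # Sample a CTR from each arm's Beta posterior (uniform Beta(1, 1) prior), pick the largest.
    samples = [rng.betavariate(s + 1, f + 1) for s, f in zip(successes, failures)]
    return max(range(len(samples)), key=lambda a: samples[a])


def run(policy, true_ctr, steps=2000, seed=0):
    """Simulate `steps` impressions and return how often each arm (ad) was shown."""
    rng = random.Random(seed)
    k = len(true_ctr)
    counts, values = [0] * k, [0.0] * k
    successes, failures = [0] * k, [0] * k
    for t in range(1, steps + 1):
        if policy == "random":
            arm = rng.randrange(k)
        elif policy == "epsilon-greedy":
            arm = epsilon_greedy(values, rng)
        elif policy == "ucb1":
            arm = ucb1(counts, values, t)
        elif policy == "thompson":
            arm = thompson(successes, failures, rng)
        else:
            raise ValueError(f"unknown policy: {policy}")
        click = 1 if rng.random() < true_ctr[arm] else 0  # simulated click
        counts[arm] += 1
        values[arm] += (click - values[arm]) / counts[arm]  # running mean CTR
        successes[arm] += click
        failures[arm] += 1 - click
    return counts


for policy in ("random", "epsilon-greedy", "ucb1", "thompson"):
    print(policy, run(policy, [0.1, 0.3, 0.8]))
```

The random baseline spreads impressions evenly. The other three strategies should concentrate on the third ad, while epsilon-greedy keeps spending a fixed fraction (`epsilon`) of its impressions on random exploration.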

## Image Ranking

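- [Hotel Image Ranking](https://medium.com/idealo-tech-blog/using-deep-learning-to-automatically-rank-millions-of-hotel-images-c7e2d2e5cae2) (aesthetic and technical quality of images)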
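
Ranking images by a quality model's output can be sketched as follows. The sketch assumes a NIMA-style model that outputs a probability distribution over ratings 1 to 10 for each image, and it ranks images by the expected rating. The image names and probabilities are hypothetical placeholders, not real model outputs.

```python
def mean_score(probs):
    # Expected rating when probs[i] is the predicted probability of score i + 1.
    return sum((i + 1) * p for i, p in enumerate(probs))


def rank_images(predictions):
    # predictions: image name -> list of 10 probabilities (hypothetical input format).
    # Highest expected score first.
    return sorted(predictions, key=lambda name: mean_score(predictions[name]), reverse=True)


predictions = {
    "bathroom.jpg": [0, 0, 0.1, 0.3, 0.4, 0.2, 0, 0, 0, 0],
    "lobby.jpg": [0, 0, 0, 0, 0.1, 0.2, 0.4, 0.2, 0.1, 0],
}
print(rank_images(predictions))  # ['lobby.jpg', 'bathroom.jpg']
```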

## Multi-Agent Learning

- Stochastic games, Nash-Q, gradient ascent, WoLF ("Win or Learn Fast"), mean-field Q-learning, particle swarm intelligence, and Ant Colony Optimization (Colorni et al., 1991)
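- [Game Theory in Smart Decentralised multi-agent RL](https://towardsdatascience.com/smart-incentives-and-game-theory-in-decentralized-multi-agent-reinforcement-learning-systems-58442e508378)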
- The approach in the article above uses multi-agent reinforcement learning (MARL) to compute the Nash equilibrium and Bayesian optimization to compute the optimal incentive, within a simulated environment. The Prowler architecture combines the two to optimize the incentives in the network of agents. For a given choice of parameters by the meta-agent, MARL simulates the agents’ actions and produces their Nash equilibrium behavior. Bayesian optimization then selects the parameters of the game that lead to more desirable outcomes. Bayesian optimization suits this outer loop because it builds a probabilistic model of a noisy, expensive-to-evaluate objective, which matches the stochastic dynamics of the system.
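
Gradient ascent and WoLF, both listed above, combine into WoLF-IGA (Bowling and Veloso). Each player follows the gradient of its own expected payoff. It uses a small learning rate when "winning" (doing better than its equilibrium strategy would against the opponent's current strategy) and a large one otherwise. This is a minimal sketch on matching pennies. It assumes two players with two actions each and a known mixed equilibrium, which WoLF-IGA requires. The step sizes and starting point are arbitrary choices.

```python
def wolf_iga(steps=3000, lr_min=0.01, lr_max=0.04, alpha=0.9, beta=0.2):
    """WoLF-IGA sketch on matching pennies.

    alpha, beta: probabilities that the row and column players pick action 0.
    """
    R = [[1.0, -1.0], [-1.0, 1.0]]   # row player's payoffs
    C = [[-1.0, 1.0], [1.0, -1.0]]   # column player's payoffs (zero-sum game)
    alpha_eq, beta_eq = 0.5, 0.5     # mixed Nash equilibrium, assumed known

    def value(M, a, b):
        # Expected payoff under mixed strategies a (row) and b (column).
        return (a * b * M[0][0] + a * (1 - b) * M[0][1]
                + (1 - a) * b * M[1][0] + (1 - a) * (1 - b) * M[1][1])

    def clip(p):
        return min(1.0, max(0.0, p))

    for _ in range(steps):
        # Gradient of each player's expected payoff with respect to its own strategy.
        grad_a = beta * (R[0][0] - R[0][1] - R[1][0] + R[1][1]) + (R[0][1] - R[1][1])
        grad_b = alpha * (C[0][0] - C[0][1] - C[1][0] + C[1][1]) + (C[1][0] - C[1][1])
        # Win or Learn Fast: small step when winning, large step when losing.
        lr_a = lr_min if value(R, alpha, beta) > value(R, alpha_eq, beta) else lr_max
        lr_b = lr_min if value(C, alpha, beta) > value(C, alpha, beta_eq) else lr_max
        # Simultaneous update, projected back onto valid probabilities.
        alpha, beta = clip(alpha + lr_a * grad_a), clip(beta + lr_b * grad_b)
    return alpha, beta


alpha, beta = wolf_iga()
print(alpha, beta)  # both should end close to 0.5
```

With `lr_min` equal to `lr_max` (plain gradient ascent), the strategies keep cycling around the equilibrium instead of settling on it. The variable learning rate is what pulls them in.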

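The two-level loop described above can be sketched on a toy problem. An inner simulation finds the agents' equilibrium, and an outer Bayesian optimization searches over the meta-agent's incentive. Everything in this sketch is an illustrative assumption, not the Prowler implementation:

- a hypothetical two-route congestion game with a toll `theta` on the congestible route A;
- replicator dynamics standing in for MARL in the inner loop;
- a Gaussian process with expected improvement over a grid of tolls as the Bayesian optimizer.

Only the standard library is used, so the GP linear algebra is written out by hand.

```python
import math

# Hypothetical toy game: a fraction x of agents uses route A (cost 2x), the rest
# use route B (cost 1). The meta-agent's incentive theta is a toll on route A.


def inner_equilibrium(theta, steps=2000, lr=0.5):
    # Stand-in for the MARL inner loop: replicator dynamics on the share x of
    # agents using route A; it settles where perceived costs are equal.
    x = 0.5
    for _ in range(steps):
        cost_a, cost_b = 2.0 * x + theta, 1.0
        x += lr * x * (1.0 - x) * (cost_b - cost_a)
    return x


def social_cost(theta):
    # Meta-agent's objective: total travel cost (tolls are transfers, not costs).
    x = inner_equilibrium(theta)
    return x * (2.0 * x) + (1.0 - x) * 1.0


def rbf(a, b, length=0.2):
    return math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def cholesky(m):
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(m[i][i] - s) if i == j else (m[i][j] - s) / L[j][j]
    return L


def solve(L, b):
    # Solve (L L^T) v = b by forward, then backward substitution.
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i]
    v = [0.0] * n
    for i in reversed(range(n)):
        v[i] = (z[i] - sum(L[k][i] * v[k] for k in range(i + 1, n))) / L[i][i]
    return v


def gp_fit(xs, ys, noise=1e-6):
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    mean_y = sum(ys) / len(ys)
    return L, solve(L, [y - mean_y for y in ys]), mean_y


def gp_predict(model, xs, t):
    # Posterior mean and standard deviation of the GP at point t.
    L, alpha, mean_y = model
    k_star = [rbf(t, a) for a in xs]
    mu = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve(L, k_star)
    var = max(1.0 - sum(k * w for k, w in zip(k_star, v)), 1e-12)
    return mu, math.sqrt(var)


def expected_improvement(mu, sigma, best):
    # EI for minimisation: expected amount by which f(t) falls below `best`.
    z = (best - mu) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf


def bayes_opt(n_iter=10):
    grid = [i / 100 for i in range(101)]  # candidate tolls in [0, 1]
    xs = [0.1, 0.9]                       # two initial incentive settings
    ys = [social_cost(t) for t in xs]
    for _ in range(n_iter):
        model, best = gp_fit(xs, ys), min(ys)

        def ei(t):
            return expected_improvement(*gp_predict(model, xs, t), best)

        t_next = max((t for t in grid if t not in xs), key=ei)
        xs.append(t_next)
        ys.append(social_cost(t_next))
    i = min(range(len(ys)), key=lambda j: ys[j])
    return xs[i], ys[i]


theta, cost = bayes_opt()
print(theta, cost)
```

In this toy game the social optimum is at `theta = 0.5`, which halves the selfish equilibrium share of route A, so `bayes_opt()` should return a toll close to 0.5 with a social cost near 0.875.
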
## Extra

### Git Repos

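- [basic](https://github.com/bgalbraith/bandits) (softmax, UCB, epsilon-greedy)
- [intermediate](https://github.com/david-cortes/contextualbandits) (more algorithms, contextual bandits)
- [MobileNet](https://github.com/idealo/image-quality-assessment/blob/master/data/TID2013/get_labels.py) (ranks hotel images by extending the MobileNet architecture)
- [Google Dopamine](https://github.com/google/dopamine) (a research framework for fast prototyping of reinforcement learning algorithms)
- [TRFL](https://github.com/deepmind/trfl/blob/master/docs/index.md) (DeepMind reinforcement learning library)
- [Facebook ELF](https://github.com/facebookresearch/ELF) (Facebook Research RL platform)
- [TF-Agents](https://github.com/tensorflow/agents) (a library for reinforcement learning in TensorFlow)
- [SLM-Lab](https://github.com/kengz/SLM-Lab) (modular deep reinforcement learning framework in PyTorch)
- [Coordinated-Multi-Agent-Imitation-Learning](https://github.com/samshipengs/Coordinated-Multi-Agent-Imitation-Learning)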
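
The softmax (Boltzmann) exploration listed for the basic repository picks each arm with probability proportional to exp(Q / temperature), where Q is the arm's estimated value. This is a minimal sketch with hypothetical function names, not the repository's code.

```python
import math
import random


def softmax_probs(values, temperature):
    # P(a) is proportional to exp(Q(a) / temperature); subtract the max for numerical stability.
    m = max(values)
    weights = [math.exp((v - m) / temperature) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(values, temperature, rng):
    # Draw one arm index according to the Boltzmann probabilities.
    return rng.choices(range(len(values)), weights=softmax_probs(values, temperature), k=1)[0]


rng = random.Random(0)
print(softmax_probs([1.0, 0.0], 1.0))
print(softmax_select([0.2, 0.5, 0.4], 0.1, rng))
```

Lower temperatures approach greedy selection. Higher temperatures approach uniform random exploration.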

### Literature

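- [On Upper-Confidence Bound Policies for Non-Stationary Bandit Problems](https://arxiv.org/pdf/0805.3415.pdf)
- [Zhou et al. 2018](https://arxiv.org/pdf/1706.06978.pdf) (Alibaba Group, Deep Interest Network, click-through rate prediction)
- [RL Frameworks](https://medium.com/@vermashresth/a-primer-on-deep-reinforcement-learning-frameworks-part-1-6c9ab6a0f555) (a primer on deep reinforcement learning frameworks)
- [Real Time Bidding](https://arxiv.org/pdf/1802.09756.pdf) (distributed coordinated multi-agent reinforcement learning); [RTB image](https://chemoinformatician.co.uk/images/RTB_multi-agent.png)
- [Scaling Multi-Agent RL with RLlib](https://rise.cs.berkeley.edu/blog/scaling-multi-agent-rl-with-rllib/) (Berkeley, open source)
- [Coordinating the Crowd: Inducing Desirable Equilibria in Non-Cooperative Systems](https://arxiv.org/pdf/1901.10923.pdf) (multi-agent RL, 2019)
- [Kim et al. 2019](https://arxiv.org/pdf/1902.01554) (Learning to Schedule Communication in Multi-agent Reinforcement Learning)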
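
The first paper above studies UCB policies for bandits whose reward distributions change over time. One of its proposals, sliding-window UCB, computes each arm's mean and confidence bonus only from the last `window` steps, so old rewards are forgotten. This is a minimal sketch assuming rewards in [0, 1]. The `window` and `xi` values and the deterministic reward switch in the usage example are illustrative choices, not settings from the paper.

```python
import math
from collections import deque


def sliding_window_ucb(reward_fn, n_arms, horizon, window=100, xi=0.5):
    """Sliding-window UCB sketch: statistics use only the last `window` steps."""
    history = deque()  # (arm, reward) pairs for the most recent `window` steps
    choices = []
    for t in range(1, horizon + 1):
        counts = [0] * n_arms
        sums = [0.0] * n_arms
        for arm, r in history:
            counts[arm] += 1
            sums[arm] += r

        def index(a):
            if counts[a] == 0:
                return math.inf  # arms unseen inside the window are tried first
            bonus = math.sqrt(xi * math.log(min(t, window)) / counts[a])
            return sums[a] / counts[a] + bonus

        arm = max(range(n_arms), key=index)
        r = reward_fn(arm, t)
        history.append((arm, r))
        if len(history) > window:
            history.popleft()  # forget the oldest step
        choices.append(arm)
    return choices


# Usage: arm 0 pays off for the first 500 steps, then arm 1 does.
def switching_reward(arm, t):
    return 1.0 if arm == (0 if t <= 500 else 1) else 0.0


choices = sliding_window_ucb(switching_reward, n_arms=2, horizon=1000)
print(choices[:500].count(0), choices[500:].count(1))
```

Because the window drops rewards older than `window` steps, the policy shifts to arm 1 within roughly one window after the switch. Plain UCB1, which averages over all history, would keep favouring arm 0 much longer.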