Reinforcement Learning

 
* [https://rise.cs.berkeley.edu/blog/scaling-multi-agent-rl-with-rllib/ Berkeley Multi-agent RL Scaling OpenSource]
 
* [https://arxiv.org/pdf/1901.10923.pdf?source=your_stories_page--------------------------- Coordinating the Crowd: Inducing Desirable Equilibria in Non-Cooperative Systems Multi-agent RL 2019]
 
* [https://arxiv.org/pdf/1902.01554 Kim et al 2019] (Learning to Schedule Communication in Multi-agent Reinforcement Learning)


Multi-Armed Bandit Examples


Image Ranking
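
Image ranking is a standard multi-armed bandit example: each candidate image is an arm, showing an image to a user is a pull, and a click is the reward. The following epsilon-greedy sketch is only illustrative; the image count, true click-through rates, and simulated feedback are assumptions, not data from any real system.

<syntaxhighlight lang="python">
import random

# Illustrative epsilon-greedy bandit for image ranking (assumed setup:
# each candidate image is an arm; a click gives reward 1, no click gives 0).
N_IMAGES = 5                                          # hypothetical arm count
TRUE_CLICK_RATES = [0.02, 0.05, 0.03, 0.08, 0.04]     # hidden from the learner
EPSILON = 0.1                                         # exploration probability
ROUNDS = 10_000

counts = [0] * N_IMAGES        # how often each image was shown
values = [0.0] * N_IMAGES      # running mean reward (estimated CTR)

for t in range(ROUNDS):
    # Explore with probability EPSILON, otherwise show the best-looking image.
    if random.random() < EPSILON:
        arm = random.randrange(N_IMAGES)
    else:
        arm = max(range(N_IMAGES), key=lambda i: values[i])

    # Simulated user feedback: click with the arm's true (hidden) rate.
    reward = 1.0 if random.random() < TRUE_CLICK_RATES[arm] else 0.0

    # Incremental update of the mean reward estimate for the chosen arm.
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]

print("Estimated click rates:", [round(v, 3) for v in values])
print("Best image by estimate:", max(range(N_IMAGES), key=lambda i: values[i]))
</syntaxhighlight>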

Multi-Agent Learning

  • Stochastic games, Nash-Q, gradient ascent, WoLF (Win or Learn Fast), mean-field Q-learning, particle swarm intelligence, and Ant Colony Optimization (Colorni et al., 1991)
  • Game theory in smart, decentralised multi-agent RL
  • As above: the approach combines multi-agent reinforcement learning (MARL), which computes the Nash equilibrium, with Bayesian optimization, which computes the optimal incentive, all within a simulated environment. In the Prowler architecture, the two are combined in an ensemble to optimize incentives across a network of agents: MARL simulates the agents' actions and produces their Nash-equilibrium behaviour for a given choice of parameter by the meta-agent, while Bayesian optimization selects the game parameters that lead to more desirable outcomes. Bayesian optimization fits this role because it searches for the best setting from noisy evaluations, which matches the stochastic dynamics of the simulated system. A toy sketch of this two-level loop follows this list.
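
The sketch below illustrates the two-level loop on a toy Pigou-style routing game, not the paper's actual setup: independent epsilon-greedy Q-learning stands in for the MARL simulator that finds the agents' equilibrium for a given toll (the incentive), and plain random search over the toll stands in for the Bayesian optimizer. All names, the cost function, and the parameter values are illustrative assumptions.

<syntaxhighlight lang="python">
import random

N_AGENTS = 20        # hypothetical number of commuters choosing route 0 or 1
EPISODES = 2000      # inner learning episodes per candidate toll
EPSILON = 0.1        # exploration rate for the agents
ALPHA = 0.1          # Q-learning step size

def travel_cost(route, load, toll):
    """Pigou-style costs: route 0 congests with load and carries the toll,
    route 1 has a fixed travel time of 1."""
    if route == 0:
        return load / N_AGENTS + toll
    return 1.0

def inner_marl(toll):
    """Stand-in for the MARL simulator: independent epsilon-greedy Q-learning
    drives the agents toward equilibrium for the given toll, then the learned
    behaviour is scored by average real travel time (the toll itself is a
    transfer between agents and meta-agent, not a social cost)."""
    q = [[0.0, 0.0] for _ in range(N_AGENTS)]
    for _ in range(EPISODES):
        # Each agent picks the route with the lower estimated cost,
        # exploring with probability EPSILON.
        choices = [random.randrange(2) if random.random() < EPSILON
                   else min((0, 1), key=lambda r: q[i][r])
                   for i in range(N_AGENTS)]
        loads = [choices.count(0), choices.count(1)]
        for i, r in enumerate(choices):
            q[i][r] += ALPHA * (travel_cost(r, loads[r], toll) - q[i][r])
    final = [min((0, 1), key=lambda r: q[i][r]) for i in range(N_AGENTS)]
    loads = [final.count(0), final.count(1)]
    return (loads[0] * (loads[0] / N_AGENTS) + loads[1] * 1.0) / N_AGENTS

# Stand-in for the Bayesian optimizer: the meta-agent samples candidate
# tolls at random and keeps the one that induces the lowest social cost
# at the equilibrium reached by the inner learning loop.
best_toll, best_cost = None, float("inf")
for _ in range(20):
    toll = random.uniform(0.0, 1.0)
    cost = inner_marl(toll)
    if cost < best_cost:
        best_toll, best_cost = toll, cost

print("best toll: %.2f, social cost per agent: %.3f" % (best_toll, best_cost))
</syntaxhighlight>

In this toy game, with no toll the agents crowd the congestible route and average travel time approaches 1, while a toll near 0.5 induces a roughly even split with average travel time near 0.75, which is the more desirable equilibrium the meta-agent is searching for.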

Extra

Git Repos

Literature

RTB image