Two simple examples of optimizing transformer-based reward functions for reinforcement learning (RL): a fleet of taxis in New York learning from its interactions with the environment, and multi-agent RL for the swarm intelligence of 100 drones exploring Jupiter's stormy atmosphere.
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
https://arxiv.org/pdf/2307.15217.pdf
#ai
#reinforcementlearning
#datascience