publications
2024
- [pre-print] The Surprising Effectiveness of Test-Time Training for Abstract Reasoning. Akyürek, Ekin; Damani, Mehul; Qiu, Linlu; Guo, Han; Kim, Yoon; and Andreas, Jacob. 2024.
- [pre-print] Learning How Hard to Think: Input-Adaptive Allocation of LM Computation. Damani, Mehul; Shenfeld, Idan; Peng, Andi; Bobu, Andreea; and Andreas, Jacob. arXiv preprint arXiv:2410.04707, 2024.
- [AAMAS] Formal Contracts Mitigate Social Dilemmas in Multi-Agent Reinforcement Learning. Haupt, Andreas; Christoffersen, Phillip; Damani, Mehul; and Hadfield-Menell, Dylan. Autonomous Agents and Multi-Agent Systems, 2024.
2023
- [NeurIPS] Mitigating Generative Agent Social Dilemmas. Yocum, Julian; Christoffersen, Phillip; Damani, Mehul; Svegliato, Justin; Hadfield-Menell, Dylan; and Russell, Stuart. In NeurIPS 2023 Foundation Models for Decision Making Workshop, 2023.
- [TMLR] Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. Casper, Stephen; Davies, Xander; Shi, Claudia; Gilbert, Thomas Krendl; Scheurer, Jérémy; Rando, Javier; Freedman, Rachel; Korbak, Tomasz; Lindner, David; Freire, Pedro; Wang, Tony Tong; Marks, Samuel; Segerie, Charbel-Raphael; Carroll, Micah; Peng, Andi; Christoffersen, Phillip; Damani, Mehul; Slocum, Stewart; Anwar, Usman; Siththaranjan, Anand; Nadeau, Max; Michaud, Eric J; Pfau, Jacob; Krasheninnikov, Dmitrii; Chen, Xin; Langosco, Lauro; Hase, Peter; Biyik, Erdem; Dragan, Anca; Krueger, David; Sadigh, Dorsa; and Hadfield-Menell, Dylan. Transactions on Machine Learning Research, 2023.
- [AAMAS] SocialLight: Distributed Cooperation Learning towards Network-Wide Traffic Signal Control. Goel, Harsh; Zhang, Yifeng; Damani, Mehul; and Sartoretti, Guillaume. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, 2023.
Many recent works have turned to multi-agent reinforcement learning (MARL) for adaptive traffic signal control to optimize the travel time of vehicles over large urban networks. However, achieving effective and scalable cooperation among junctions (agents) remains an open challenge, as existing methods often rely on extensive, non-generalizable reward shaping or on non-scalable centralized learning. To address these problems, we propose a new MARL method for traffic signal control, SocialLight, which learns cooperative traffic control policies by estimating, in a distributed manner, each agent's individual marginal contribution to its local neighborhood. SocialLight relies on the Asynchronous Actor Critic (A3C) framework, and makes learning scalable by training a locally-centralized critic conditioned on the states and actions of neighboring agents, which each agent uses to estimate its individual contribution via counterfactual reasoning. We further introduce important modifications to the advantage calculation that help stabilize policy updates. These modifications decouple the impact of the neighbors' actions on the computed advantages, thereby reducing the variance in the gradient updates. We benchmark our trained network against state-of-the-art traffic signal control methods on standard benchmarks in two traffic simulators, SUMO and CityFlow. Our results show that SocialLight exhibits improved scalability to larger road networks and better performance across usual traffic metrics.
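The counterfactual-reasoning step described in the abstract resembles COMA-style advantage estimation: an agent's advantage is its critic value for the chosen joint action minus a baseline that marginalizes out only that agent's own action. A minimal sketch of that baseline computation, assuming a critic that returns per-action Q-values with neighbors' actions held fixed (all names here are illustrative, not the paper's implementation):

```python
import numpy as np

def counterfactual_advantage(q_values, policy_probs, chosen_action):
    """COMA-style counterfactual advantage for a single agent.

    q_values:      critic estimates Q(s, (u_neighbors, a)) for each candidate
                   action a of this agent, with neighbors' actions held fixed.
    policy_probs:  this agent's policy distribution over its own actions.
    chosen_action: index of the action the agent actually took.
    """
    # Baseline: expected Q under the agent's own policy, marginalizing
    # out only this agent's action (neighbors' actions stay fixed).
    baseline = float(np.dot(policy_probs, q_values))
    return float(q_values[chosen_action]) - baseline

# Toy example with 3 candidate actions.
q = np.array([1.0, 2.0, 0.5])
pi = np.array([0.2, 0.5, 0.3])
adv = counterfactual_advantage(q, pi, chosen_action=1)  # 2.0 - 1.35 = 0.65
```

A positive advantage means the chosen action beat the agent's own average behavior given what its neighbors did, which is the "individual contribution" signal the abstract refers to.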
2022
- [Springer] Distributed Reinforcement Learning for Robot Teams: A Review. Wang, Yutong; Damani, Mehul; Wang, Pamela; Cao, Yuhong; and Sartoretti, Guillaume. Current Robotics Reports, 2022.
Recent advances in sensing, actuation, and computation have opened the door to multi-robot systems consisting of hundreds or thousands of robots, with promising applications to automated manufacturing, disaster relief, harvesting, last-mile delivery, port and airport operations, and search and rescue. The community has leveraged model-free multi-agent reinforcement learning (MARL) to devise efficient, scalable controllers for multi-robot systems (MRS). This review provides an analysis of the state of the art in distributed MARL for multi-robot cooperation.
- [AAMAS] Multi-Agent Traffic Signal Control via Distributed RL with Spatial and Temporal Feature Extraction. Zhang, Yifeng; Damani, Mehul; and Sartoretti, Guillaume. In International Workshop on Agent-Based Modelling of Urban Systems (ABMUS), 2022.
2021
- [NeurIPS] Flatland Competition 2020: MAPF and MARL for Efficient Train Coordination on a Grid World. Laurent, Florian; Schneider, Manuel; Scheller, Christian; Watson, Jeremy; Li, Jiaoyang; Chen, Zhe; Zheng, Yi; Chan, Shao-Hung; Makhnev, Konstantin; Svidchenko, Oleg; Egorov, Vladimir; Ivanov, Dmitry; Shpilman, Aleksei; Spirovska, Evgenija; Tanevski, Oliver; Nikov, Aleksandar; Grunder, Ramon; Galevski, David; Mitrovski, Jakov; Sartoretti, Guillaume; Luo, Zhiyao; Damani, Mehul; Bhattacharya, Nilabha; Agarwal, Shivam; Egli, Adrian; Nygren, Erik; and Mohanty, Sharada. In Proceedings of the NeurIPS 2020 Competition and Demonstration Track, 2021.
The Flatland competition aimed at finding novel approaches to solve the vehicle re-scheduling problem (VRSP). The VRSP is concerned with scheduling trips in traffic networks and the re-scheduling of vehicles when disruptions occur, for example the breakdown of a vehicle. While solving the VRSP in various settings has been an active area in operations research (OR) for decades, the ever-growing complexity of modern railway networks makes dynamic real-time scheduling of traffic virtually impossible. Recently, multi-agent reinforcement learning (MARL) has successfully tackled challenging tasks where many agents need to be coordinated, such as multiplayer video games. However, the coordination of hundreds of agents in a real-life setting like a railway network remains challenging, and the Flatland environment used for the competition models these real-world properties in a simplified manner. Submissions had to bring as many trains (agents) to their target stations in as little time as possible. While the best submissions were in the OR category, participants found many promising MARL approaches. Using both centralized and decentralized learning-based approaches, top submissions used graph representations of the environment to construct tree-based observations. Further, different coordination mechanisms were implemented, such as communication and prioritization between agents. This paper presents the competition setup, four outstanding solutions to the competition, and a cross-comparison between them.