Mehul Damani | publications

2025

pre-print

Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty

Damani, Mehul, Puri, Isha, Slocum, Stewart, Shenfeld, Idan, Choshen, Leshem, Kim, Yoon, and Andreas, Jacob

2025

arXiv

ICML

The Surprising Effectiveness of Test-Time Training for Abstract Reasoning

Akyürek, Ekin, Damani, Mehul, Qiu, Linlu, Guo, Han, Kim, Yoon, and Andreas, Jacob

International Conference on Machine Learning 2025

arXiv

ICLR

Learning How Hard to Think: Input-Adaptive Allocation of LM Computation

Damani, Mehul, Shenfeld, Idan, Peng, Andi, Bobu, Andreea, and Andreas, Jacob

International Conference on Learning Representations 2025

arXiv

2024

AAMAS

Formal contracts mitigate social dilemmas in multi-agent reinforcement learning

Haupt, Andreas, Christoffersen, Phillip, Damani, Mehul, and Hadfield-Menell, Dylan

Autonomous Agents and Multi-Agent Systems 2024

2023

NeurIPS

Mitigating Generative Agent Social Dilemmas

Yocum, Julian, Christoffersen, Phillip, Damani, Mehul, Svegliato, Justin, Hadfield-Menell, Dylan, and Russell, Stuart

In NeurIPS 2023 Foundation Models for Decision Making Workshop 2023

TMLR

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback

Casper, Stephen, Davies, Xander, Shi, Claudia, Gilbert, Thomas Krendl, Scheurer, Jérémy, Rando, Javier, Freedman, Rachel, Korbak, Tomasz, Lindner, David, Freire, Pedro, Wang, Tony Tong, Marks, Samuel, Segerie, Charbel-Raphael, Carroll, Micah, Peng, Andi, Christoffersen, Phillip, Damani, Mehul, Slocum, Stewart, Anwar, Usman, Siththaranjan, Anand, Nadeau, Max, Michaud, Eric J, Pfau, Jacob, Krasheninnikov, Dmitrii, Chen, Xin, Langosco, Lauro, Hase, Peter, Biyik, Erdem, Dragan, Anca, Krueger, David, Sadigh, Dorsa, and Hadfield-Menell, Dylan

Transactions on Machine Learning Research 2023

arXiv

AAMAS

SocialLight: Distributed Cooperation Learning towards Network-Wide Traffic Signal Control

Goel, Harsh, Zhang, Yifeng, Damani, Mehul, and Sartoretti, Guillaume

In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems 2023

Many recent works have turned to multi-agent reinforcement learning (MARL) for adaptive traffic signal control to optimize the travel time of vehicles over large urban networks. However, achieving effective and scalable cooperation among junctions (agents) remains an open challenge, as existing methods often rely on extensive, non-generalizable reward shaping or on non-scalable centralized learning. To address these problems, we propose a new MARL method for traffic signal control, SocialLight, which learns cooperative traffic control policies by estimating, in a distributed manner, the individual marginal contribution of agents to their local neighborhood. SocialLight relies on the Asynchronous Advantage Actor-Critic (A3C) framework, and makes learning scalable by learning a locally-centralized critic conditioned over the states and actions of neighboring agents, used by agents to estimate individual contributions by counterfactual reasoning. We further introduce important modifications to the advantage calculation that help stabilize policy updates. These modifications decouple the impact of the neighbors' actions on the computed advantages, thereby reducing the variance in the gradient updates. We benchmark our trained network against state-of-the-art traffic signal control methods on standard benchmarks in two traffic simulators, SUMO and CityFlow. Our results show that SocialLight exhibits improved scalability to larger road networks and better performance across usual traffic metrics.

2022

Springer

Distributed Reinforcement Learning for Robot Teams: a Review

Wang, Yutong, Damani, Mehul, Wang, Pamela, Cao, Yuhong, and Sartoretti, Guillaume

Current Robotics Reports 2022

Recent advances in sensing, actuation, and computation have opened the door to multi-robot systems consisting of hundreds/thousands of robots, with promising applications to automated manufacturing, disaster relief, harvesting, last-mile delivery, port/airport operations, or search and rescue. The community has leveraged model-free multi-agent reinforcement learning (MARL) to devise efficient, scalable controllers for multi-robot systems (MRS). This review aims to provide an analysis of the state-of-the-art in distributed MARL for multi-robot cooperation.

AAMAS

Multi-Agent Traffic Signal Control via Distributed RL with Spatial and Temporal Feature Extraction

Zhang, Yifeng, Damani, Mehul, and Sartoretti, Guillaume

In International Workshop on Agent-Based Modelling of Urban Systems (ABMUS) 2022

2021

NeurIPS

Flatland Competition 2020: MAPF and MARL for Efficient Train Coordination on a Grid World

Laurent, Florian, Schneider, Manuel, Scheller, Christian, Watson, Jeremy, Li, Jiaoyang, Chen, Zhe, Zheng, Yi, Chan, Shao-Hung, Makhnev, Konstantin, Svidchenko, Oleg, Egorov, Vladimir, Ivanov, Dmitry, Shpilman, Aleksei, Spirovska, Evgenija, Tanevski, Oliver, Nikov, Aleksandar, Grunder, Ramon, Galevski, David, Mitrovski, Jakov, Sartoretti, Guillaume, Luo, Zhiyao, Damani, Mehul, Bhattacharya, Nilabha, Agarwal, Shivam, Egli, Adrian, Nygren, Erik, and Mohanty, Sharada

In Proceedings of the NeurIPS 2020 Competition and Demonstration Track 2021

PDF

The Flatland competition aimed at finding novel approaches to solve the vehicle re-scheduling problem (VRSP). The VRSP is concerned with scheduling trips in traffic networks and the re-scheduling of vehicles when disruptions occur, for example, the breakdown of a vehicle. While solving the VRSP in various settings has been an active area in operations research (OR) for decades, the ever-growing complexity of modern railway networks makes dynamic real-time scheduling of traffic virtually impossible. Recently, multi-agent reinforcement learning (MARL) has successfully tackled challenging tasks where many agents need to be coordinated, such as multiplayer video games. However, the coordination of hundreds of agents in a real-life setting like a railway network remains challenging, and the Flatland environment used for the competition models these real-world properties in a simplified manner. Submissions had to bring as many trains (agents) to their target stations in as little time as possible. While the best submissions were in the OR category, participants found many promising MARL approaches. Using both centralized and decentralized learning-based approaches, top submissions used graph representations of the environment to construct tree-based observations. Further, different coordination mechanisms were implemented, such as communication and prioritization between agents. This paper presents the competition setup, four outstanding solutions to the competition, and a cross-comparison between them.

IEEE-RAL, ICRA

PRIMAL2: Pathfinding Via Reinforcement and Imitation Multi-Agent Learning - Lifelong

Damani, Mehul, Luo, Zhiyao, Wenzel, Emerson, and Sartoretti, Guillaume

IEEE Robotics and Automation Letters 2021

arXiv PDF Code