Mehul Damani

Hello! I am a third year Ph.D. student at MIT advised by Jacob Andreas.
My research interests lie at the intersection of reinforcement learning (RL) and large language models (LLMs).
I am very excited by the potential of RL to improve reasoning, math, coding, and other capabilities in LLMs. Currently, I am thinking about how RL can be used to improve calibration and reduce hallucinations in LLMs. I have also been thinking about the paradigm of inference-time compute, and how optimally selecting inference-time techniques can significantly improve the efficiency of LLMs.
Previously, I worked with Lerrel Pinto at NYU on developing automatic curriculum learning methods for RL agents. Before that, I was a part of the MARMot Lab at NUS, where I worked with Guillaume Sartoretti on applying multi-agent reinforcement learning to traffic signal control and multi-agent pathfinding.
I’m always excited to explore new research directions and am open to collaborating with or advising students. If you are interested in my research or simply want to chat, don’t hesitate to get in touch!
Selected Publications
- [ICML] The Surprising Effectiveness of Test-Time Training for Abstract Reasoning. International Conference on Machine Learning, 2025.
- [ICLR] Learning How Hard to Think: Input-Adaptive Allocation of LM Computation. International Conference on Learning Representations, 2025.
- [TMLR] Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback. Transactions on Machine Learning Research, 2023.