Mehul Damani

Hello! I am a fourth year Ph.D. student at MIT advised by Jacob Andreas.

My research interests lie at the intersection of RL and LLMs: I think about how RL can drive improvements across a wide range of LLM capabilities. Currently, I’m exploring methods that leverage the general-purpose abilities of LLMs to augment or improve standard learning algorithms. Recently, I used RL to improve calibration and reduce hallucinations in LLMs.

Previously, I worked with Lerrel Pinto at NYU on developing automatic curriculum learning methods for RL agents. Before that, I was a part of the MARMot Lab at NUS, where I worked with Guillaume Sartoretti on applying multi-agent reinforcement learning to traffic signal control and multi-agent pathfinding.

I’m always excited to explore new research directions and am open to collaborating. If you are interested in my research or simply want to chat, don’t hesitate to get in touch!

News

Jul 23, 2025 New paper! We trained reasoning models to reason about their uncertainty using RL!
Jun 1, 2025 Started an internship at the MIT-IBM Watson AI Lab to work on RL for tool use.
Apr 30, 2025 Our paper on test-time training was accepted to ICML!

Selected Publications

  1. Self-Distillation Enables Continual Learning
    ICLR 2026 Workshop on Lifelong Agents (LLA)
    Idan Shenfeld, Mehul Damani, Jonas Hübotter, and Pulkit Agrawal
  2. Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty
    ICLR, 2026
    Mehul Damani, Isha Puri, Stewart Slocum, Idan Shenfeld, Leshem Choshen, Yoon Kim, and Jacob Andreas
  3. The Surprising Effectiveness of Test-Time Training for Few-Shot Learning
    ICML, 2025
    Ekin Akyürek, Mehul Damani, Adam Zweiger, Linlu Qiu, Han Guo, Jyo Pari, Yoon Kim, and Jacob Andreas
  4. Learning How Hard to Think: Input-Adaptive Allocation of LM Computation
    ICLR, 2025
    Mehul Damani, Idan Shenfeld, Andi Peng, Andreea Bobu, and Jacob Andreas