Abhishek Sharma
@abhishekshar.bsky.social
43 followers 200 following 7 posts
CS PhD @Harvard w/ Finale Doshi-Velez | Research in {Reinforcement Learning | Healthcare | Representation Learning} 🌐 https://abhishekshar.com/
abhishekshar.bsky.social
Our paper: Decision-Point Guided Safe Policy Improvement
We show that a simple approach to learning safe RL policies can outperform most offline RL methods. (+theoretical guarantees!)

How? Just allow only the state-action pairs that have been seen enough times! 🤯

arxiv.org/abs/2410.09361
Decision-Point Guided Safe Policy Improvement
Within batch reinforcement learning, safe policy improvement (SPI) seeks to ensure that the learnt policy performs at least as well as the behavior policy that generated the dataset. The core challeng...
arxiv.org
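To make the "seen enough times" idea concrete, here is a minimal tabular sketch. It is an illustration under stated assumptions, not the paper's algorithm: the function name `count_thresholded_policy`, the threshold `n_min`, and the rule "switch only if the best well-observed action beats the behavior policy's estimated value" are all hypothetical simplifications.

```python
import numpy as np

def count_thresholded_policy(counts, q_hat, behavior_policy, n_min):
    """Deviate from the behavior policy only on well-observed state-action pairs.

    counts:          (S, A) array, times each state-action pair appears in the dataset
    q_hat:           (S, A) array, estimated action values (estimation is out of scope here)
    behavior_policy: (S, A) array, each row is the data-collecting policy's action distribution
    n_min:           hypothetical threshold for "seen enough times"
    """
    counts = np.asarray(counts)
    q_hat = np.asarray(q_hat, dtype=float)
    behavior_policy = np.asarray(behavior_policy, dtype=float)

    policy = behavior_policy.copy()
    # Estimated value of simply following the behavior policy in each state.
    v_behavior = (behavior_policy * q_hat).sum(axis=1)

    for s in range(counts.shape[0]):
        allowed = counts[s] >= n_min
        if not allowed.any():
            continue  # nothing seen often enough: defer to the behavior policy
        # Ignore rarely seen actions, however good their estimates look.
        masked_q = np.where(allowed, q_hat[s], -np.inf)
        best = int(np.argmax(masked_q))
        if masked_q[best] > v_behavior[s]:
            policy[s] = 0.0
            policy[s, best] = 1.0
    return policy

# Toy example with made-up numbers: 3 states, 2 actions.
counts = [[10, 1], [0, 0], [5, 5]]
q_hat = [[1.0, 5.0], [2.0, 3.0], [0.0, 1.0]]
behavior = [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]]
policy = count_thresholded_policy(counts, q_hat, behavior, n_min=5)
```

In the toy example, state 0 keeps the behavior policy because its high-value action was seen only once, state 1 keeps it because nothing was observed, and only state 2 switches to the better, well-observed action.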
abhishekshar.bsky.social
Going to my first ever AISTATS in Thailand this year! 🎉🌴
abhishekshar.bsky.social
Wow this is amazing! Thanks for sharing!
Reposted by Abhishek Sharma
rl-conference.bsky.social
The call for papers for RLC is now up! Abstract deadline of 2/14, submission deadline of 2/21!
Please help us spread the word.
rl-conference.cc/callforpaper...
RLJ | RLC Call for Papers
rl-conference.cc
Reposted by Abhishek Sharma
kempnerinstitute.bsky.social
NEW: we have an exciting opportunity for a tenure-track professor at the #KempnerInstitute and the John A. Paulson School of Engineering and Applied Sciences (SEAS). Read the full description & apply today: academicpositions.harvard.edu/postings/14362
#ML #AI
abhishekshar.bsky.social
I was today years old when I realized that when people use log-sum-exp instead of the softmax function in the soft-Bellman operator, they are probably going for the mellowmax function!
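A small sketch of the relationship, assuming the standard mellowmax definition mm_ω(x) = log((1/n) Σ exp(ω x_i)) / ω: it equals the temperature-scaled log-sum-exp minus log(n)/ω. The function names and the parameter name `omega` are my own choices for illustration.

```python
import numpy as np

def log_sum_exp(x, omega=1.0):
    """Temperature-scaled log-sum-exp: log(sum(exp(omega * x))) / omega."""
    z = omega * np.asarray(x, dtype=float)
    m = z.max()  # subtract the max for numerical stability
    return (m + np.log(np.exp(z - m).sum())) / omega

def mellowmax(x, omega=1.0):
    """Mellowmax: log(mean(exp(omega * x))) / omega."""
    x = np.asarray(x, dtype=float)
    # Replacing the sum with a mean is the same as subtracting log(n) / omega.
    return log_sum_exp(x, omega) - np.log(x.size) / omega

q_values = [1.0, 2.0, 3.0]
lse = log_sum_exp(q_values)   # always at least max(q_values)
mm = mellowmax(q_values)      # always between mean(q_values) and max(q_values)
```

The two differ only by a constant shift for a fixed number of actions, but the shift matters: log-sum-exp overshoots the max, while mellowmax returns the input value when all entries are equal, tends to the max as `omega` grows, and tends to the mean as `omega` approaches zero.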
abhishekshar.bsky.social
How are public comments a thing?? @iclr-conf.bsky.social

I can see why they can help get feedback from the crowds, but then why even have a double-blind process?
abhishekshar.bsky.social
The notes are great! Thank you!
abhishekshar.bsky.social
Would be cool to be included! (I work on RL in healthcare.)