Reposted
Abhishek Sharma
@abhishekshar.bsky.social
· Jan 23
Decision-Point Guided Safe Policy Improvement
Within batch reinforcement learning, safe policy improvement (SPI) seeks to ensure that the learnt policy performs at least as well as the behavior policy that generated the dataset. The core challeng...
arxiv.org