Lightnews — Scholar-powered news

Raphaël Avalos

@raphael.avalos.fr

540 followers 290 following 11 posts

Fine-tuning LLMs @Cohere | PhD Candidate in RL @VUB


Raphaël Avalos

@raphael.avalos.fr

I’m not sure about 1, but you could look into results on belief MDPs.
For 2, consider an environment with two rooms where the agent needs to press a different button in each room to get the optimal reward. If there's a cheap way to determine which room the agent is in, then checking the room first and pressing the matching button would be the optimal policy :)

November 25, 2024 at 11:00 PM
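The two-room example can be made concrete with a toy calculation in belief-MDP terms, where the agent's belief is the probability of being in room A. The numbers are assumptions for illustration, not from the post: the correct button pays 1, the wrong one pays 0, and checking the room costs a fixed amount. The helper names are hypothetical.

```python
# Hypothetical two-room problem: the agent does not know which room it is in.
# Assumptions (not from the post): reward 1 for the correct button, 0 for the
# wrong one, and a fixed cost for observing the room before pressing.

def value_guess(p_room_a: float) -> float:
    """Best expected reward when pressing a button without observing the room."""
    # Press the button for whichever room is more likely under the belief.
    return max(p_room_a, 1.0 - p_room_a)


def value_observe(obs_cost: float) -> float:
    """Expected reward when paying to observe the room, then pressing the right button."""
    return 1.0 - obs_cost


def best_action(p_room_a: float, obs_cost: float) -> str:
    """Compare gathering information against acting on the current belief."""
    return "observe" if value_observe(obs_cost) > value_guess(p_room_a) else "guess"


if __name__ == "__main__":
    for cost in (0.1, 0.4, 0.6):
        print(cost, best_action(0.5, cost))
    # 0.1 observe
    # 0.4 observe
    # 0.6 guess
```

With a uniform belief, guessing is worth 0.5, so checking the room is optimal whenever it costs less than 0.5; the more confident the belief already is, the less the information is worth.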

Raphaël Avalos

@raphael.avalos.fr

Bsky’s strength lies in being open-source and federated. This lets anyone host servers, set moderation policies, and create custom feeds, while avoiding incentives to let bots survive. It’s a tough challenge, but there’s hope!

November 25, 2024 at 10:47 PM

Raphaël Avalos

@raphael.avalos.fr

IMO, UCB favors exploration, not information-seeking, as it adds an exploration bonus rather than aiming to reduce state uncertainty. However, effective exploration can uncover policies where gathering information leads to better outcomes.
Hope that helps!

November 25, 2024 at 9:25 PM
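To illustrate the point about the exploration bonus, here is a minimal sketch of UCB1, one common UCB variant, on a hypothetical three-armed Bernoulli bandit (the arm probabilities and function names are assumptions for illustration). The bonus √(2 ln t / n) depends only on how often each arm has been pulled; nothing in it models or reduces uncertainty about a hidden state.

```python
import math
import random


def ucb1_select(counts, values, t):
    """Pick an arm by UCB1: empirical mean plus a count-based exploration bonus."""
    # Play every arm once so that the bonus is defined for all arms.
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    scores = [
        values[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(counts)), key=lambda a: scores[a])


def run_ucb1(probs, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with success probabilities `probs` (assumed)."""
    rng = random.Random(seed)
    counts = [0] * len(probs)
    values = [0.0] * len(probs)  # running mean reward per arm
    for t in range(1, horizon + 1):
        arm = ucb1_select(counts, values, t)
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the arm's mean reward.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values


if __name__ == "__main__":
    counts, values = run_ucb1([0.2, 0.5, 0.8], horizon=2000)
    print(counts)
```

With these assumed probabilities, most pulls end up on the 0.8 arm. The bonus shrinks only because an arm's count grows, which is optimism in the face of uncertainty rather than deliberate information-seeking.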

Raphaël Avalos

@raphael.avalos.fr

I am down!

November 25, 2024 at 6:45 AM

Raphaël Avalos

@raphael.avalos.fr

Would love to be added!

November 22, 2024 at 11:16 AM
