@ohav.bsky.social
3 followers 12 following 8 posts
Posts Media Videos Starter Packs
ohav.bsky.social
Using our method, we see consistent gains across varying difficulty levels of WhoDunitEnv. We apply our method to GovSim, a recent resource sharing environment and show an increase in both survival rates and efficiency of agents.
@giorgiopiatti
ohav.bsky.social
The environment features several difficulty scales and two variants of action separation, to ensure an interesting task for a wide range of agents.
ohav.bsky.social
To evaluate our approach we release WhoDunitEnv, a collaborative environment where agents play the role of detectives🕵️, attempting to point out a culprit out of a suspect lineup. Agents each have different pieces of information about either the suspects or the culprit.
ohav.bsky.social
During communication, we monitor agent uncertainty, and train a classifier that predicts task success.
If our monitor signifies a failure is likely to occur, we intervene by resetting the current communication, and allow the agents another opportunity to discuss.
ohav.bsky.social
In multi-agent systems, we often rely on agents to each contribute their part to solve a task. But sometimes agents make mistakes. Those mistakes can spread through the communication channel, infecting other agents and causing a complete failure!
ohav.bsky.social
"One bad apple can spoil the bunch 🍎", and that's doubly true for language agents!
Our new paper shows how monitoring and intervention can prevent agents from going rogue, boosting performance by up to 20%. We're also releasing a new multi-agent environment 🕵️‍♂️