An RL algorithm inspired by trad retrieval that trains agents to more effectively use lists of documents in context for better multi-hop {QA, agentic tasks, and more}!
A benchmark of a few hundred text envs: science experiments and embodied cooking to solving murder mysteries. We test over 30 of the best LLM agents and pinpoint failure modes +how to improve
👨💻pip install tale-suite
A benchmark of a few hundred text envs: science experiments and embodied cooking to solving murder mysteries. We test over 30 of the best LLM agents and pinpoint failure modes +how to improve
👨💻pip install tale-suite