Nicolas Yax
@nicolasyax.bsky.social
65 followers 81 following 30 posts
PhD student working on the cognition of LLMs | HRL team - ENS Ulm | FLOWERS - Inria Bordeaux
Pinned
nicolasyax.bsky.social
🔥Our paper PhyloLM got accepted at ICLR 2025 !🔥
In this work we show how easy it can be to infer relationships between LLMs by constructing trees, and to predict their performance and behavior at very low cost, with @stepalminteri.bsky.social and @pyoudeyer.bsky.social! Here is a brief recap ⬇️
Reposted by Nicolas Yax
stepalminteri.bsky.social
New (revised) preprint with @thecharleywu.bsky.social
We rethink how to assess machine consciousness: not by code or circuitry, but by behavioral inference—as in cognitive science.
Extraordinary claims still need extraordinary evidence.
👉 osf.io/preprints/ps...
#AI #Consciousness #LLM
nicolasyax.bsky.social
Curious about LLM interpretability and understanding? We borrowed concepts from genetics to map language models and predict their capabilities, and even uncovered surprising insights about their training!

Come see my poster at #ICLR2025, 3pm, Hall 2B #505!
nicolasyax.bsky.social
In short, PhyloLM is a cheap and versatile algorithm that generates useful representations of LLMs, with creative applications in practice. 9/10
paper : arxiv.org/abs/2404.04671
colab : colab.research.google.com/drive/1agNE5...
code : github.com/Nicolas-Yax/...
ICLR : Saturday 3pm Poster 505
nicolasyax.bsky.social
A collaborative PhyloLM Hugging Face space is available to try the algorithm and visualize maps: huggingface.co/spaces/nyax/... The Model Submit button has been temporarily suspended for technical reasons but should be back very soon! 8/10
PhyloLM - a Hugging Face Space by nyax
This app allows you to explore and compare language models through various visualizations, including similarity matrices, 2D scatter plots, and tree diagrams. You can search for models by name, adj...
huggingface.co
nicolasyax.bsky.social
Using code-related contexts, we obtain a fairly different map. For example, we notice that Qwen and GPT-3.5 code in a very different way from the other models, which was not visible on the reasoning map. 7/10
nicolasyax.bsky.social
The choice of contexts matters, as it probes different capabilities of LLMs. Here, on general reasoning contexts, we can plot a map of models using UMAP. The larger the edge, the closer the models are to each other. Models in the same cluster are even closer! 6/10
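For reference, here is a minimal sketch of producing such a 2D map from a precomputed distance matrix with umap-learn. The matrix below is a random placeholder, not real PhyloLM distances:

```python
import numpy as np
import umap  # pip install umap-learn

# Placeholder symmetric distance matrix standing in for PhyloLM distances
rng = np.random.default_rng(0)
D = rng.random((20, 20))
D = (D + D.T) / 2
np.fill_diagonal(D, 0.0)

# Embed the models in 2D while respecting the precomputed distances
coords = umap.UMAP(metric="precomputed", random_state=42).fit_transform(D)
# coords[i] gives the (x, y) position of model i on the map
```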
nicolasyax.bsky.social
It can also measure quantization efficiency by observing the behavioral distance between an LLM and its quantized versions. In the Qwen 1.5 release, GPTQ seems to perform best. This kind of metric could provide additional insight into quantization efficiency. 5/10
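Reading this off the distance matrix is straightforward; a toy sketch with hypothetical model names and illustrative values:

```python
import numpy as np

# Hypothetical slice of a PhyloLM distance matrix (illustrative values only)
names = ["qwen1.5-7b", "qwen1.5-7b-gptq", "qwen1.5-7b-awq"]
D = np.array([[0.00, 0.03, 0.07],
              [0.03, 0.00, 0.08],
              [0.07, 0.08, 0.00]])

base = names.index("qwen1.5-7b")
for i, name in enumerate(names):
    if i != base:
        # Smaller behavioral distance = quantization preserved behavior better
        print(f"{name}: {D[base, i]:.2f}")
```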
nicolasyax.bsky.social
Aside from plotting trees, the PhyloLM similarity matrix is very versatile. For example, running a logistic regression on the distance matrix makes it possible to predict the performance of new models, even from unseen families, with good accuracy. Here is what we got on ARC. 4/10
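A rough sketch of that idea on synthetic data (in the real setup, the features are a model's PhyloLM distances to reference models and the labels come from benchmark results):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# Each row: distances from one model to 100 reference models (synthetic here)
X_train = rng.random((50, 100))
# Label: whether that model answers a given ARC item correctly (synthetic here)
y_train = rng.integers(0, 2, size=50)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)

x_new = rng.random((1, 100))  # distance profile of an unseen model
print(clf.predict_proba(x_new)[0, 1])  # predicted probability of success
```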
nicolasyax.bsky.social
Ignoring these requirements still produces efficient distance-visualization trees. However, it is important to remember that they do not represent evolutionary trees. Feel free to zoom in to see the model names. 3/10
nicolasyax.bsky.social
Phylogenetic algorithms often require that common ancestors do not appear among the objects studied, yet they clearly recover how a family evolved. Here is an example from the rich open-access model ecosystem: @teknium.bsky.social @maximelabonne.bsky.social @mistralai.bsky.social 2/10
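For the curious, a minimal neighbor-joining sketch with Biopython, using made-up model names and distances in place of the real PhyloLM matrix:

```python
from Bio import Phylo
from Bio.Phylo.TreeConstruction import DistanceMatrix, DistanceTreeConstructor

# Lower-triangular distance matrix (diagonal included); values are made up
names = ["llama-2-7b", "mistral-7b", "vicuna-7b", "zephyr-7b"]
dm = DistanceMatrix(names, [[0],
                            [0.6, 0],
                            [0.2, 0.7, 0],
                            [0.5, 0.1, 0.6, 0]])

tree = DistanceTreeConstructor().nj(dm)  # neighbor joining
Phylo.draw_ascii(tree)
```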
nicolasyax.bsky.social
We build a distance matrix by comparing the outputs of LLMs across a hundred different contexts, and build maps and trees from this distance matrix. Because PhyloLM only requires sampling very few tokens after very short contexts, the algorithm is particularly cheap to run. 1/10
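A toy version of that pipeline, assuming Hugging Face causal LMs (the exact similarity formula and hyperparameters differ in the paper; this only conveys the idea):

```python
import torch
from collections import Counter
from transformers import AutoModelForCausalLM, AutoTokenizer

def sample_continuations(model_name, context, n=16, new_tokens=4):
    """Sample n short continuations of a context ("alleles" of a "gene")."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    inputs = tok(context, return_tensors="pt")
    out = model.generate(**inputs, do_sample=True, max_new_tokens=new_tokens,
                         num_return_sequences=n, pad_token_id=tok.eos_token_id)
    return [tok.decode(seq[inputs.input_ids.shape[1]:]) for seq in out]

def similarity(model_a, model_b, contexts, n=16):
    """Average overlap of sampled continuations across contexts."""
    scores = []
    for ctx in contexts:
        a = Counter(sample_continuations(model_a, ctx, n))
        b = Counter(sample_continuations(model_b, ctx, n))
        scores.append(sum((a & b).values()) / n)  # shared "alleles"
    return sum(scores) / len(scores)

# distance(model_a, model_b) = 1 - similarity(model_a, model_b)
```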
nicolasyax.bsky.social
🔥Our paper PhyloLM got accepted at ICLR 2025 !🔥
In this work we show how easy it can be to infer relationships between LLMs by constructing trees, and to predict their performance and behavior at very low cost, with @stepalminteri.bsky.social and @pyoudeyer.bsky.social! Here is a brief recap ⬇️
Reposted by Nicolas Yax
cartathomas.bsky.social
🚀 Introducing 🧭MAGELLAN—our new metacognitive framework for LLM agents! It predicts its own learning progress (LP) in vast natural language goal spaces, enabling efficient exploration of complex domains.🌍✨Learn more: 🔗 arxiv.org/abs/2502.07709 #OpenEndedLearning #LLM #RL
MAGELLAN: Metacognitive predictions of learning progress guide...
Open-ended learning agents must efficiently prioritize goals in vast possibility spaces, focusing on those that maximize learning progress (LP). When such autotelic exploration is achieved by LLM...
arxiv.org
Reposted by Nicolas Yax
ccolas.bsky.social
we are recruiting interns for a few projects with @pyoudeyer
in bordeaux
> studying llm-mediated cultural evolution with @nisioti_eleni
@Jeremy__Perez

> balancing exploration and exploitation with autotelic rl with @ClementRomac

details and links in 🧵
please share!
Reposted by Nicolas Yax
hamongautier.bsky.social
Putting some Flow Lenia here too
Reposted by Nicolas Yax
hamongautier.bsky.social
1/⚡️Looking for a fast and simple Transformer baseline for your RL environment in JAX?
Sharing my implementation of transformerXL-PPO: github.com/Reytuag/tran...
The implementation is the first to reach the 3rd floor and obtain advanced achievements in the challenging Craftax.
nicolasyax.bsky.social
It is part of a research agenda to open the LLM black box and provide tools for researchers to interact with models in a more transparent manner. The last paper in this agenda was PhyloLM, which proposes methods to investigate the phylogeny of LLMs: arxiv.org/abs/2404.04671 15/15
arxiv.org
nicolasyax.bsky.social
This method was first introduced in our paper "Studying and improving reasoning in humans and machines", which investigates the evolution of cognitive biases in language models. www.nature.com/articles/s44... 14/15
www.nature.com
nicolasyax.bsky.social
As such, LogProber can be a useful tool to check for contamination in language models at very low cost (one forward pass), given some high-level assumptions about the training method (which are very often verified in practice). 13/15
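The gist in code, assuming a Hugging Face causal LM (the exact LogProber criterion and thresholds are in the paper; this only shows the single forward pass):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def token_logprobs(model_name, text):
    """Per-token log-probabilities of `text` from one forward pass."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    logp = torch.log_softmax(logits[:, :-1], dim=-1)
    # Log-probability the model assigns to each actual next token
    return logp.gather(2, ids[:, 1:, None]).squeeze(-1)

# An anomalously high likelihood on a benchmark question suggests the
# model may have seen it during training (contamination).
```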
nicolasyax.bsky.social
Lastly, scenario A is more common in instruction finetuning settings. For open-access models, finetuning databases are often shared, making it possible to check directly whether an item appears in the training set, which is rarely the case for pretraining databases. 12/15
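When the finetuning data is public, that check can be as direct as a string search; a sketch with the datasets library (the dataset name and column here are examples, not from the paper):

```python
from datasets import load_dataset

item = "A bat and a ball cost $1.10 in total..."  # benchmark item to look for

# Example public instruction-tuning dataset; swap in the one actually shared
ds = load_dataset("tatsu-lab/alpaca", split="train")
hits = ds.filter(lambda ex: item in ex["instruction"])
print(f"{len(hits)} matching training examples")
```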