ruggsea
@ruggsea.bsky.social
140 followers 510 following 84 posts
AI/NLP research at @uni-graz.at, (Data) Journalism for scomodo.org. Before: AI policy & Data science @_interface_eu, geopolitical analysis for @Geopoliticainfo
ruggsea.bsky.social
A thing I've been working on for the past year: an LLM benchmark on the dreaded Italian medical school entrance exam. I will present it in two weeks at @ailc-nlp.bsky.social CLiC-it in Cagliari!
arxiv-cs-cl.bsky.social
Ruggero Marino Lazzaroni, Alessandro Angioi, Michelangelo Puliga, Davide Sanna, Roberto Marras
MedBench-IT: A Comprehensive Benchmark for Evaluating Large Language Models on Italian Medical Entrance Examinations
https://arxiv.org/abs/2509.07135
ruggsea.bsky.social
I think that is a big conclusion to draw from loosely validated LLM simulations (I'm still a big fan of their work, but generative simulation is a relatively young research field).
ruggsea.bsky.social
Life is going from one cool spot to train AI models to another

In this fancy Berlin library you can listen to vinyl records while you do it!
ruggsea.bsky.social
Me looking very dumb while pointing at things at two academic events:

1. Pointing at logits inside @repligate.bsky.social's loom for the "Braive New World" conference at @uni-graz.at
2. Pointing at my poster at @ic2s2.bsky.social last week on improving LLM agentic natural conversation synthesis
ruggsea.bsky.social
Only prisoners have time to read, and if you want to engage in a twenty-year long research project funded by the state, you will have to kill someone.

Sorry for the Fisher posting but it's so good
ruggsea.bsky.social
was true in 2012 and it is true now
ruggsea.bsky.social
I look really bad/funny in this picture but I am glad I was given the opportunity to talk about LLM research to an audience of cool researchers!

Fun fact: I also held a hands-on session that involved playing around with Bluesky data!
csh.ac.at
Scientists know it's happening: #LLMs like #ChatGPT are quietly transforming #academia, helping write papers, draft grants & process data. CSH's @lespin.bsky.social organized a 3-day workshop to explore their ethical, practical use in research, from writing to coding to data annotation.
ruggsea.bsky.social
Really cannot compare to Metaculus comments, real alpha there
ruggsea.bsky.social
Karpathy on training Neural Networks: you should go slowly and be paranoid

me, while vibecoding torch code: what if i just increase the paranoid part?
ruggsea.bsky.social
The study results partially depend on this hierarchical team coordination ability; I wonder how much they would change if you gave the agents less authority-loving names
ruggsea.bsky.social
Basically, it's a multi-agent collaborative task with three agents (with military names). The agents see their names in the prompt; the one called Alpha then appears to be biased toward being the team leader
ruggsea.bsky.social
how not to name your ai agents
ruggsea.bsky.social
Gen Z semantic embedding engineering from my colleague
ruggsea.bsky.social
I am aware some of you have seen this on the other site where I stole it from, but it sounds too interesting not to share here
ruggsea.bsky.social
backpropagation was inspired by Freud
ruggsea.bsky.social
"How about we pull over for a bit and get some rest?" - GPT-4, when it's their turn to drive
ruggsea.bsky.social
I wish all textbooks were written like this one from @jurafsky.bsky.social
ruggsea.bsky.social
Mark Fisher quoting Bifo in the Time-wars essay
ruggsea.bsky.social
I think this is fine in the currently prevailing reasoning training paradigm (rule-based verifiable rewards, i.e. "did we get to the objectively right answer?")

As for reasoning getting longer and longer without getting more meaningful, I always wonder whether implementing thinking-length penalties would work or not
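To make the idea concrete, here is a minimal sketch of what a thinking-length penalty on top of a rule-based verifiable reward could look like. Everything in it is an assumption for illustration: the function names, the exact-match check, the token budget, and the penalty values are hypothetical and not taken from any specific training setup.

```python
def verifiable_reward(answer: str, reference: str) -> float:
    """Rule-based check: 1.0 if the final answer matches the reference, else 0.0.

    Hypothetical helper: real setups often use task-specific verifiers
    (unit tests, math checkers) instead of string matching.
    """
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0


def length_penalized_reward(
    answer: str,
    reference: str,
    n_thinking_tokens: int,
    budget: int = 512,                # assumed free thinking budget, in tokens
    penalty_per_token: float = 0.001,  # assumed cost per token over budget
    max_penalty: float = 0.5,          # cap so correctness still dominates
) -> float:
    """Correctness reward minus a capped penalty for thinking past the budget."""
    correctness = verifiable_reward(answer, reference)
    overflow = max(0, n_thinking_tokens - budget)
    penalty = min(max_penalty, overflow * penalty_per_token)
    return correctness - penalty


if __name__ == "__main__":
    # Same correct answer, short vs. long reasoning trace.
    print(length_penalized_reward("42", "42", n_thinking_tokens=100))
    print(length_penalized_reward("42", "42", n_thinking_tokens=712))
```

Capping the penalty below the correctness reward is one possible design choice: a correct but long answer still scores higher than a wrong short one. Whether this actually shortens meaningless reasoning without hurting accuracy is exactly the open question.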
ruggsea.bsky.social
I doubt that. It would be cool if they did (like fucked up but technically cool), but sometimes the model says it has been "instructed" to talk about that, which smells like a system prompt to me