Alexander Hoyle
@alexanderhoyle.bsky.social
2.3K followers 290 following 200 posts
Postdoctoral fellow at ETH AI Center, working on Computational Social Science + NLP. Previously a PhD in CS at UMD, advised by Philip Resnik. Internships at MSR, AI2. he/him alexanderhoyle.com
alexanderhoyle.bsky.social
I would like the full slide deck!
Reposted by Alexander Hoyle
manoelhortaribeiro.bsky.social
Computer Science is no longer just about building systems or proving theorems--it's about observation and experiments.

In my latest blog post, I argue it’s time we had our own "Econometrics," a discipline devoted to empirical rigor.

doomscrollingbabel.manoel.xyz/p/the-missin...
alexanderhoyle.bsky.social
really like this post! I feel that ML/NLP is often an empirical field that fails to adopt the practices of one—and being in an econ group for my postdoc, I’ve also noticed the big gulf in rigor. curious to see how your course shapes up (you might be interested in this paper arxiv.org/abs/2411.10939)
alexanderhoyle.bsky.social
Circular meaning: if you model bargaining behavior using personas with different risk profiles and recover behavior that mirrors those risk profiles, what have you learned?
alexanderhoyle.bsky.social
Nice share I missed before! I've been thinking of a similar argument along these lines. We lack a good mathematical model of psychology undergraduates in a lab, so to imagine we can simulate whole populations via prompting seems like wishful thinking (not to mention potentially circular)
alexanderhoyle.bsky.social
You're doing terrific work. In fact, I'll be referencing your "Using ChatGPT is not bad for the environment" research in a talk tomorrow
alexanderhoyle.bsky.social
Was about to share this with you @nbalepur.bsky.social until I realized it was your paper :P
alexanderhoyle.bsky.social
Thanks! Also, we should have a Zoom catch-up soon!
alexanderhoyle.bsky.social
Accepted to EMNLP (and more to come 👀)! The camera-ready version is now online---very happy with how this turned out

arxiv.org/abs/2507.01234
alexanderhoyle.bsky.social
New preprint! Have you ever tried to cluster text embeddings from different sources, but the clusters just reproduce the sources? Or attempted to retrieve similar documents across multiple languages, and even multilingual embeddings return items in the same language?

Turns out there's an easy fix🧵
Bar chart of the number of items in four clusters of text embeddings, with colors showing the distribution of sources in each cluster.

Caption: Clustering text embeddings from disparate sources (here, U.S. congressional bill summaries and senators’ tweets) can produce clusters where one source dominates (Panel A). Using linear erasure to remove the source information produces more evenly balanced clusters that maintain semantic coherence (Panel B; sampled items relate to immigration). Four random k-means clusters are shown (k=25), trained on a combined 5,000 samples from each dataset.
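For anyone curious, here is a rough sketch of the idea in code (my own simplification for illustration, not necessarily the paper's exact procedure): estimate the linear directions that separate the sources and project them out of every embedding before clustering.

```python
import numpy as np
from sklearn.cluster import KMeans

def erase_source(X, sources):
    """Remove the linear directions that separate the sources from embeddings X."""
    X = X - X.mean(axis=0, keepdims=True)
    # Per-source mean embeddings (rows), computed on the centered data
    means = np.stack([X[sources == s].mean(axis=0) for s in np.unique(sources)])
    # Orthonormal basis for the subspace spanned by the source means
    _, S, Vt = np.linalg.svd(means, full_matrices=False)
    basis = Vt[S > 1e-8]
    # Project each embedding onto the complement of that subspace
    return X - (X @ basis.T) @ basis

def cluster_without_source(X, sources, k=25):
    """k-means on embeddings after erasing source-identifying directions."""
    return KMeans(n_clusters=k, random_state=0).fit_predict(erase_source(X, sources))

# X: (n_docs, dim) array of text embeddings; sources: (n_docs,) array of source labels
# labels = cluster_without_source(X, sources, k=25)
```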
alexanderhoyle.bsky.social
You might be interested in this blog post, which discusses these issues in depth: andymasley.substack.com/p/individual...
alexanderhoyle.bsky.social
This is terrific advice (that I'd do better to take into account myself, too). Whenever I've read more deeply into a subject/theory, it ends up paying dividends
alexanderhoyle.bsky.social
hot take: I think the one at Kebab aur Sharab on the UWS is better (although I didn’t go to the main Dishoom location last time I was in London, and I think their QC maybe wasn't up to par?). Also terrific butter chicken
alexanderhoyle.bsky.social
Yes—in fact a couple of years ago I’d been chatting with your now-colleague Maria Pacheco about a paper with the premise “the methods are new but the pitfalls remain” re: CSS/TADA research
alexanderhoyle.bsky.social
this looks terrific, very excited to read
emollick.bsky.social
LLMs introduce a huge range of new capabilities for research, but also make it possible for researchers to "hack" their results in new ways by how they choose to use models for annotation

This is a useful pass at quantifying some of the risk, and some mitigation strategies arxiv.org/pdf/2509.08825
alexanderhoyle.bsky.social
Spoiler alert!! I’m currently reading A Place of Greater Safety
alexanderhoyle.bsky.social
You swap between these regularly? Or is there a way to consolidate? Staying on top of my half-dozen active Slacks is already too much
alexanderhoyle.bsky.social
a friend bought these a size too large and I love them. Can wear to work or (as I just did) on a long-haul flight
alexanderhoyle.bsky.social
While writing my dissertation, I came to realize we perhaps presented an overly neat view of QCA in the second paper (e.g., that inter-annotator agreement is an unalloyed good) and failed to characterize the full extent of practices/debates. But I still think an unstable model is not desirable :)
alexanderhoyle.bsky.social
Yep, unfortunately that can be an issue---if you need support, these all show BERTopic underperforming (apologies for the blatant self-promotion):

aclanthology.org/2025.acl-lon...
aclanthology.org/2025.acl-lon...
aclanthology.org/2024.naacl-l...
aclanthology.org/2024.eacl-lo...
alexanderhoyle.bsky.social
One piece of advice (which applies to all topic models) is to vary hyperparameters (like the underlying sentence embedding model or the minimum cluster size) and see whether the outputs differ
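For example, a quick sensitivity check might look something like this (a sketch assuming the standard BERTopic/HDBSCAN interface; `docs` is your list of documents):

```python
from bertopic import BERTopic
from hdbscan import HDBSCAN
from sentence_transformers import SentenceTransformer

# docs: list of document strings (assumed to exist)
for model_name in ["all-MiniLM-L6-v2", "all-mpnet-base-v2"]:
    for min_cluster_size in [10, 25, 50]:
        topic_model = BERTopic(
            embedding_model=SentenceTransformer(model_name),
            hdbscan_model=HDBSCAN(min_cluster_size=min_cluster_size, prediction_data=True),
        )
        topics, _ = topic_model.fit_transform(docs)
        n_topics = len(set(topics)) - (1 if -1 in topics else 0)  # -1 = outlier "topic"
        print(f"{model_name}, min_cluster_size={min_cluster_size}: {n_topics} topics")
```

If the number of topics or their top words shift substantially across these runs, treat any single configuration's output with caution.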
alexanderhoyle.bsky.social
Yeah, we've found BERTopic is empirically worse than bog-standard LDA (Mallet/Tomotopy) across several different evaluation settings. It's also less stable

It can be useful if you have multilingual data, although other options like CTM may be preferable. Is there a reason you need to use BERTopic?
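If it helps, a bog-standard LDA baseline with Tomotopy is only a few lines (a sketch; the hyperparameters here are placeholders, not recommendations):

```python
import tomotopy as tp

# docs: list of token lists (pre-tokenized documents), assumed to exist
mdl = tp.LDAModel(k=25, min_cf=3, rm_top=10, seed=42)
for tokens in docs:
    mdl.add_doc(tokens)

for _ in range(10):
    mdl.train(100)  # 100 Gibbs-sampling iterations per call
    print(f"log-likelihood per word: {mdl.ll_per_word:.3f}")

for topic_id in range(mdl.k):
    top_words = [word for word, _ in mdl.get_topic_words(topic_id, top_n=10)]
    print(topic_id, " ".join(top_words))
```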
Reposted by Alexander Hoyle
dallascard.bsky.social
I am delighted to share our new #PNAS paper, with @grvkamath.bsky.social @msonderegger.bsky.social and @sivareddyg.bsky.social, on whether age matters for the adoption of new meanings. That is, as words change meaning, does the rate of adoption vary across generations? www.pnas.org/doi/epdf/10....