David Smith
dasmiq.bsky.social
Associate professor of computer science at Northeastern University. Natural language processing, digital humanities, OCR, computational bibliography, and computational social sciences. Artificial intelligence is an archival science.
Reposted by David Smith
I had the absolute pleasure to visit @craicexeter.bsky.social, where I laid out an argument for how critical & computational scholars should lead the conversation on AI. We need to expand research on harms, interrogate corporate hype, and support people’s critical understanding of these technologies
January 22, 2026 at 4:32 PM
Reposted by David Smith
I couldn't figure out eloquent language to describe digital agents that assist with web navigation tasks, so I just wrote "click click" and if I keep this up maybe I will start referring to language generation as "word word"
January 22, 2026 at 5:27 PM
Reposted by David Smith
A very random view into how some people* outside of tech think about and use chatbots. It's not coding, that's for sure, and some of it might sound ridiculous, but I think this kind of perspective and usage is way more common than we might assume.

*LA people (sorry, I love LA, but this is very LA)
Hi Honey, I’m Homo Neuricus
Six Ways I'm using AI to Become More Human
sissychacon.substack.com
January 22, 2026 at 5:11 PM
Reposted by David Smith
The second new class I'm teaching is a very experimental graduate level seminar in CSE: "Building Small Language Models". I taught the grad level NLP class last semester (so fun!) but students wanted more—which of these new ideas work, and which work for SLMs? jurgens.people.si.umich.edu/CSE598-004/
CSE 598-004 - Building Small Language Models
jurgens.people.si.umich.edu
January 19, 2026 at 9:29 PM
Reposted by David Smith
For social scientists interested in LLMs for text classification/coding, the process here is potentially very helpful (even if you don't use the product itself).

Their core technique: Contradictory Example Training
Their training method: Binocular Labeling

More details in the linked post below.
We just published the methodology behind CoPE, our 9B parameter model that matches GPT-4o at content classification at 1% the size! The model is already open source, but now we're sharing our training technique. blog.zentropi.ai/how-we-built... 🧵 1/6
How we built CoPE
We just published the methodology behind CoPE. This is the model that powers Zentropi, and we think the approach might be useful for others working on policy-steerable classification systems. We had ...
blog.zentropi.ai
January 15, 2026 at 7:37 PM
Reposted by David Smith
The first research paper from WashU's AI Humanities Lab, which I co-direct with Gabi Kirilloff, is available now in the Harvard Data Science Review! Read to learn more about how (badly) current LLMs do at replicating literary style: doi.org/10.1162/9960...
‘Written in the Style of’: ChatGPT and the Literary Canon
doi.org
January 10, 2026 at 9:14 PM
Reposted by David Smith
… that the paper defines “AI” very expansively, including many kinds of analysis that, for most of the data, are not what we now think of as “AI”. Like logistic regression, PCA, LDA, and KNN methods. 🤨 So I feel just a little baited-and-switched.
January 14, 2026 at 11:39 PM
Reposted by David Smith
Chat about this paper naturally focuses on the headline about AI being high “impact” but also narrowing science; a nice something-for-everyone (including the haters) story. Looking at Fig 2, you might ask “How can they have 30 yrs of impact data for AI?” Well, … www.nature.com/articles/s41...
January 14, 2026 at 11:39 PM
Reposted by David Smith
📢 New article in #JCLS 5(1)! 🎉
@axelpichler.fedihum.org.ap.brid.gy, Endres, M. & @nilsreiter.de (2026) “#Interpretation, Argument, #Evaluation. A Workflow for Assessing #LLM-Generated Interpretations of #Poetry” doi.org/10.48694/jcl...

#RollingIssue #NLG #CLS #LiteraryComputing
January 14, 2026 at 10:12 PM
Reposted by David Smith
yes, this is a really great paper, showing how AI can enhance individual science but narrow the general scope.
January 14, 2026 at 10:08 PM
Reposted by David Smith
But if this is the case, why are models acting so differently between languages?
Datasets like eclektic show that models know different things in different languages. A rare fact is usually only known in the language in which it was seen.
bsky.app/profile/lcho...
🚀 "Multilingual" LLMs are really just clusters of monolingual ones!
They might know a Brazilian 🇧🇷 brewery—but only in Portuguese
With ECLEKTIC, you can now test this. The challenge? Making them truly multilingual.
alphaxiv.org/pdf/2502.21228 📈🤖
🧵⬇️
#LLM #ai #genAI #nlp
January 14, 2026 at 4:52 PM
Reposted by David Smith
Do you have ideas for the future of reading?

Submit a 2-4 page paper to the CHI workshop I am co-organising! (deadline Feb 12) “Science and Technology for Augmenting Reading”

chi-star-workshop.github.io
January 12, 2026 at 3:46 AM
Reading environments for classical languages FTW
I can't read Chinese, but my family has old genealogy documents I've always wanted to understand. Claude and Gemini helped me build an interactive reader to explore the calligraphy character by character.

I can finally read my great-grandfather's epitaph. Try it:
davidbau.com/archives/202...
January 12, 2026 at 3:30 AM
Reposted by David Smith
Excited about this Duke AI conference + stoked to present new work on cultural AI. Grateful this high profile conference will include humanistic perspectives. Meaning, history, aesthetics, narrative etc are a part of the society centered AI question. Glad the humanities will be a part of the convo.
Join us for the 2nd annual Conference on Society-Centered AI at Duke University (Feb 12-14th). Last year’s event drew over 700 people from 50+ companies and 20 universities to discuss topics ranging from AI safety to alignment to the impact of AI systems. Register here: sites.duke.edu/scai/
Main - #SCAI2026
Conference on Society Centered AI February 12-14, 2026 Duke University, Durham, NC https://youtu.be/m9CGFovLZGQ The #SCAI2026: Conference on Society-Centered AI (previously Responsible AI…
sites.duke.edu
January 7, 2026 at 5:32 PM
Reposted by David Smith
✨The NLP+CSS workshop is returning to ACL 2026!✨

And this year, we have a new shared task with prizes!

Website/CfP: sites.google.com/site/nlpandc...
Deadlines: March 5 (direct), March 24 (pre-reviewed ARR)

#NLProc #CompSocialSci #ComputationalSocialScience #ACL2026NLP
@aclmeeting.bsky.social
NLP+CSS Workshops
sites.google.com
December 18, 2025 at 12:38 PM
Reposted by David Smith
The world must boycott the World Cup and the Olympics.

It is both the only moral choice and will actually get the attention of these dead-eyed clout demons.
January 3, 2026 at 7:22 AM
Reposted by David Smith
also, 21 years on from my last comp-lit course: I finally read Lord's Singer of Tales (in digital ed., h/t @dasmiq.bsky.social for the link on a syllabus of his).
https://bookwyrm.social/user/agoldst/comment/9290902#anchor-9290902
December 26, 2025 at 4:32 PM
Reposted by David Smith
I'm very excited about our new work which aims to model causes and effects on stories online! Narratives and stories are everywhere, so it's helpful to be able to understand how people use them in nuanced ways.
Reading social media stories evokes a wide range of contextual reader reactions—inferential, affective, evaluative—yet we lack methods to study these at scale.

Excited to share our new paper that builds a framework for analyzing storytelling practices across online communities!
December 22, 2025 at 9:20 AM
Reposted by David Smith
The next edition of the NLP+CSS workshop will be at ACL 2026! It includes an open-ended shared task (work with the Opioid Industry Documents Archive) with travel grants as prizes!
December 18, 2025 at 9:41 PM
Reposted by David Smith
WE HAVE A BYTE MODEL THAT DOESN'T SUCK
Introducing Bolmo, a new family of byte-level language models built by "byteifying" our open Olmo 3—and to our knowledge, the first fully open byte-level LM to match or surpass SOTA subword models across a wide range of tasks. 🧵
December 15, 2025 at 5:19 PM
Reposted by David Smith
We’ve seen huge improvements thanks to improvements in scaling and data curation, which admittedly were hard to build scientific careers on. But there’s been no revolutionary shift in methodology since the victory of neural machine translation with attention over ngram models ~2015.
December 13, 2025 at 2:59 PM
Reposted by David Smith
LLMs didn’t move language modeling research from linguists to AI people, they just moved it from computer scientists who thought language was interesting to computer scientists who thought language was boring
December 12, 2025 at 7:38 PM
Reposted by David Smith
Excited to get this work out in the world at #chr2025 (with Sabrina Baur, Mackenzie Cramer, Anna Ho and Tom McEnaney) -- asking: how much do contemporary songs tell stories, and how has that changed over the past half century?

anthology.ach.org/volumes/vol0...
Measuring the Stories in Contemporary Songs
anthology.ach.org
December 12, 2025 at 1:09 PM