Johannes B. Gruber
@jbgruber.bsky.social
1.3K followers 950 following 410 posts
Senior Researcher @gesis.org // Data Editor @polcommjournal.bsky.social 🔎 political communication (#polsky + #commsky) with text analysis and #rstats (#opendata + #openscience) 🌏 JohannesBGruber.eu 👨‍💻 research software github.com/JBGruber
Pinned
jbgruber.bsky.social
Some big personal/professional news: starting next month, I will be leading a team in the Data Services for the Social Sciences department at @gesis.org (in Cologne)!
Gesis building in Cologne
jbgruber.bsky.social
Academic life hack: check which papers AI hallucinated most often and write them 🚀🚀🚀
rikefranke.bsky.social
And here we go. I never wrote this article, and yet it is cited here.

www.liberalbriefs.com/geopolitics/...

And of course, it sounds so plausible, I seriously checked whether I had forgotten it, or the footnote was slightly wrong.

#AIisnotresearch
Reposted by Johannes B. Gruber
kwelle.bsky.social
Social media data between research and infrastructures - sustainable archiving, indexing, and provision: The Social Media Access Days will take place at the @dnb-aktuelles.bsky.social from 17 to 19 March 2026. We welcome submissions until 31 October 2025. www.dnb.de/DE/Professio...
Call for Submissions: Social Media Access Days
www.dnb.de
Reposted by Johannes B. Gruber
societal-analytics.nl
#AmCAT is proudly developed by the @societal-analytics.nl

You can learn more about it in the:
* Book: amcat.nl/book/
* Blog post: societal-analytics.nl/blogs/202501...
medem.bsky.social
Day 2 of the #MEDemConference at @gesis.org starts with powerful tool demos:
🔍 AmCAT @sof14g1l.bsky.social on enabling large-scale text analysis of media & political debates.
🌐 HarDIS @sziaja.bsky.social on harmonizing and sustaining cross-national democracy data (surveys, parties, experts).
Sofia Gil-Clavel stands at a podium presenting AmCAT at the 3rd MEDem Conference. Behind her, a slide shows the AmCAT team (Kasper Welbers, Wouter van Atteveldt, Johannes Gruber, Sofia Gil-Clavel) with the tagline: “Developed by researchers for researchers, society, and data savvy users.” Logos of MEDem, VU Amsterdam, and the Societal Analytics Lab are displayed at the top.
Sebastian Ziaja stands at a podium presenting HarDIS (Harmony in the Democratic Ideological Space) at the 3rd MEDem Conference. A slide behind him shows the HarDIS team (Lea Kaftan, Paul Bederke, Selçuk Timur Uluer) with the logos of MEDem, GESIS Leibniz Institute for the Social Sciences, and OSCARS (the funding initiative).
jbgruber.bsky.social
@sebstier.bsky.social at #MEDem Conf: computational research of democracy stands on the shoulders of the few enthusiasts who create datasets, software and infrastructure for it. How can we move forward? Short answer: more collaboration & sharing!
How to move forward

Collaborate on improving data coverage
Filling gaps in political text corpora
Collecting online platform data via APIs, web scraping and the Digital Services Act
Share open-source software, R packages and infrastructure components
Improve conditions for data sharing
Critically evaluate and improve the application of AI
Requires collaboration and funding at the European level
jbgruber.bsky.social
@simonsaysnothin.bsky.social at #MEDem Conf: we need to integrate our efforts instead of researchers all building their own datasets and infrastructure. Couldn't agree more!


Integrate

○ Researchers get lost in building fragmented data infrastructures (case in point: me).
○ Specialized collection is fine, but fragmented dissemination hinders use.
○ Lack of integration blocks comparative research.

What I'd like MEDem to build towards

○ Incentivization of shared infrastructures
○ Toolkits to standardize data collection (measurement and interoperability)
Reposted by Johannes B. Gruber
alissonmasoares.bsky.social
The "validate, validate, validate" (GRIMMER, 2014) principle of Text Analysis/NLP never gets old.
jbgruber.bsky.social
Bluesky is not just a clone of the old Twitter. It's meant to look and feel like it to popularise a version of social media with a fundamental difference to the big platforms: its infrastructure is open.

Nice write up of that background: overreacted.io/open-social/
Open Social — overreacted
The protocol is the API.
overreacted.io
Reposted by Johannes B. Gruber
ulrikeklinger.bsky.social
Wanna know more about #data #access and the Digital Services Act? Here’s our latest policy paper about how it works👇

www.weizenbaum-library.de/items/86842c...

#commsky #polisky #dsa @weizenbauminstitut.bsky.social
Reposted by Johannes B. Gruber
dariia.bsky.social
❗️Our next workshop will be on October 2nd, 6 pm CEST, on Effective and Useful Feature engineering by @emilhvitfeldt.bsky.social

Register or sponsor a student by donating to support Ukraine!
Details: bit.ly/3wBeY4S
Please share!
#AcademicSky #EconSky #RStats
jbgruber.bsky.social
And if it's not about responsibility, why else? The models don't care. OpenAI does not need free advertisement. And whether somebody can write well or not does not factor into the quality of a piece, imo. I'm honestly trying to understand how this transparency would improve anything.
jbgruber.bsky.social
"acknowledging LLM contributions is key to maintaining transparency and ethical standards in academic publishing"

Why though? Acknowledging the use of LLMs only dilutes responsibility. Authors are responsible for everything in an article. And if it's fake/plagiarised, authors are responsible.
jbgruber.bsky.social
Just wanted to share this Google Scholar trick: I often have the problem that I want to find papers using certain computational methods, but specifically in my own field (for lit reviews).

You can do that by limiting the search to certain sources. My (imperfect) collection in the alt text.
"BERT" AND "multilingual" source:"Digital Journalism" OR source:"International Journal of Press/Politics" OR source:"Journal of Communication" OR source:"New Media and Society" OR source:"Communication Methods and Measures" OR source:"Communication Research" OR source:"Journal of Computer-Mediated Communication" OR source:"Big Data and Society" OR source:"Political Communication" OR source:"Social Media and Society" OR source:"Computational Communication Research"
Reposted by Johannes B. Gruber
jbengler.de
The new ggplot2 4.0.0 now supports absolute plot dimensions 🤩

#rstats #dataviz #phd
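library(tidyverse)

mtcars |> 
  head(8) |> 
  # move the car names from row names into a regular column
  rownames_to_column("name") |> 
  ggplot(aes(x = drat, y = name, fill = name)) + 
  geom_col() +
  # fix the panel at an absolute size of 50 mm x 50 mm
  theme(panel.widths = unit(50, "mm"), panel.heights = unit(50, "mm"))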
jbgruber.bsky.social
Works fine for me. Maybe the user agent got blocked after too many tries?
Reposted by Johannes B. Gruber
dprex.bsky.social
Find us Sep 22-26 at the #DGS2025 Conference, Campus Duisburg.
At the @gesis.org stand we present DP-R|EX – the Data Portal for Right-Wing & Extremism Data.
Let’s talk about sharing data for reuse, data management & hate speech!
👉info: datenportal-rechtsextremismus.de #ResearchData #ExtremismData
Reposted by Johannes B. Gruber
inessadeangelis.bsky.social
Which Canadian MPs are on Bluesky and what do they post?

My new paper w/ @rohanalexander.bsky.social in @cjps-rcsp.bsky.social unpacks these questions, finding MPs use it like Twitter to discuss policy, the Ottawa bubble & constituency

Read more: doi.org/10.1017/S000...

#polsky #commsky #cdnpoli
What are Canadian Members of Parliament Doing on Bluesky? research note abstract
jbgruber.bsky.social
If you feel uneasy using LLMs for data annotation, you are right (if not, you should). It offers new chances for research that is difficult with traditional #NLP/#textasdata methods, but the risk of false conclusions is high!

Experiment + *evidence-based* mitigation strategies in this preprint 👇
joachimbaumann.bsky.social
🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825
We present our new preprint titled "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation".
We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks.
For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations.
Then, we collect 13 million LLM annotations across plausible LLM configurations.
These annotations feed into 1.4 million regressions testing the hypotheses. 
For a hypothesis with no true effect (ground truth p > 0.05), different LLM configurations yield conflicting conclusions.
Checkmarks indicate correct statistical conclusions matching ground truth; crosses indicate LLM hacking -- incorrect conclusions due to annotation errors.
Across all experiments, LLM hacking occurs in 31-50% of cases even with highly capable models.
Since minor configuration changes can flip scientific conclusions, from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.
Reposted by Johannes B. Gruber
aurman21.bsky.social
Are you a CSS researcher using LLMs for text annotation tasks? Do you integrate the results into downstream statistical analyses?

Turns out, even with SOTA models you have a 31-50% chance of coming to wrong conclusions this way!

Learn more about this and mitigation strategies in our new preprint 👇
joachimbaumann.bsky.social
🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825
Reposted by Johannes B. Gruber
davekarpf.bsky.social
Turned this into a blog post, adding a dash of Alinsky as well, just to get the ideas out of my head and down on paper.

(Some people smoke as a stress-reliever. I quote Schattschneider and Alinsky.)

open.substack.com/pub/davekarp...
Reposted by Johannes B. Gruber
stefanmueller.bsky.social
The University of Cologne plans drastic cuts to its Political Science department, potentially abolishing the MSc in Political Science and eliminating the professorships in IR and Comparative Politics. This would severely weaken the social sciences at @unicologne.bsky.social.
Petition (de/en) ⤵️
Sign the petition
No WiSo without PoWi! For strong political science in the bachelor's & master's programmes
https://www.change.org/p/keine-wiso-ohne-powi-für-eine-starke-politikwissenschaft-im-bachelor-master
jbgruber.bsky.social
Such a fun course to teach! Just a very brief overview of:

- obtaining (scraping) data
- network analysis
- computational text analysis
- how to use LLMs for social science
gesistraining.bsky.social
🚀 The #GESISfallseminar in Computational Social Science 2025 has officially kicked off!
Today, we started with two blended learning courses: self-learning sessions in the morning followed by live sessions in the afternoon.
👋 Welcome to all lecturers & participants for week 1!
📸 A glimpse into our first online live session of the #GESISfallseminar in Computational Social Science 2025!

📊 Introduction to CSS with R by Johannes B. Gruber

📸 A glimpse into our first online live session of the #GESISfallseminar in Computational Social Science 2025!

👩‍💻 Introduction to CSS with Python by John McLevey
Reposted by Johannes B. Gruber
fabiennelind.bsky.social
🎙️ New @ccspod.bsky.social episode!
I unpack how computational methods transform the study of opinionated communication + share insights from my work on studying climate change campaigns on social media with @cidape.bsky.social

Listen here 👉 aboutccs.net/opinion/
Observing Opinions – What is it about CCS
aboutccs.net
Reposted by Johannes B. Gruber
dariia.bsky.social
This is becoming so normalized, I did not even wake up the second time there was an air alert.
100s of drones, missiles, with loud explosions & house shaking, reading about the destruction & deaths in the morning - several times a week

#UkrainianView