Lightnews — Scholar-powered news

Johannes B. Gruber @jbgruber.bsky.social · 16h

Academic life hack: check which papers AI hallucinated most often and write them 🚀🚀🚀

Ulrike Franke @rikefranke.bsky.social · 1d

And here we go. I never wrote this article, and yet it is cited here.

www.liberalbriefs.com/geopolitics/...

And of course, it sounds so plausible, I seriously checked whether I had forgotten it, or the footnote was slightly wrong.

#AIisnotresearch

1 11

Reposted by Johannes B. Gruber

Katrin Weller @kwelle.bsky.social · 7d

Social media data between research and infrastructures: sustainable archiving, indexing and provision. The Social Media Access Days will take place at the @dnb-aktuelles.bsky.social from 17 to 19 March 2026. We look forward to submissions until 31 October 2025. www.dnb.de/DE/Professio...

Call for Submissions: Social Media Access Days

www.dnb.de

6 14

Reposted by Johannes B. Gruber

Societal Analytics Lab @societal-analytics.nl · 8d

#AmCAT is proudly developed by the @societal-analytics.nl

You can learn more about it in the:
* Book: amcat.nl/book/
* Blog post: societal-analytics.nl/blogs/202501...

MEDem @medem.bsky.social · 8d

Day 2 of the #MEDemConference at @gesis.org starts with powerful tool demos:
🔍 AmCAT @sof14g1l.bsky.social on enabling large-scale text analysis of media & political debates.
🌐 HarDIS @sziaja.bsky.social on harmonizing and sustaining cross-national democracy data (surveys, parties, experts).

Sofia Gil-Clavel stands at a podium presenting AmCAT at the 3rd MEDem Conference. Behind her, a slide shows the AmCAT team (Kasper Welbers, Wouter van Atteveldt, Johannes Gruber, Sofia Gil-Clavel) with the tagline: “Developed by researchers for researchers, society, and data savvy users.” Logos of MEDem, VU Amsterdam, and the Societal Analytics Lab are displayed at the top.

Sebastian Ziaja stands at a podium presenting HarDIS (Harmony in the Democratic Ideological Space) at the 3rd MEDem Conference. A slide behind him shows the HarDIS team (Lea Kaftan, Paul Bederke, Selçuk Timur Uluer) with the logos of MEDem, GESIS Leibniz Institute for the Social Sciences, and OSCARS (the funding initiative).

3 5

Johannes B. Gruber @jbgruber.bsky.social · 7d

@sebstier.bsky.social at #MEDem Conf: computational research of democracy stands on the shoulders of the few enthusiasts who create datasets, software and infrastructure for it. How can we move forward? Short answer: more collaboration & sharing!

How to move forward

Collaborate on improving data coverage
Filling gaps in political text corpora
Collecting online platform data via APIs, webscraping and the Digital Services Act
Share open-source software, R packages and infrastructure components
Improve conditions for data sharing
Critically evaluate and improve the application of AI
Requires collaboration and funding at the European level

6 17

Johannes B. Gruber @jbgruber.bsky.social · 8d

@simonsaysnothin.bsky.social at #MEDem Conf: we need to integrate our efforts instead of researchers all building their own datasets and infrastructure. Couldn't agree more!

Integrate

○ Researchers get lost in building fragmented data infrastructures (case in point: me).
○ Specialized collection is fine, but fragmented dissemination hinders use.
○ Lack of integration blocks comparative research.

What I'd like MEDem to build towards

○ Incentivization of shared infrastructures
○ Toolkits to standardize data collection (measurement and interoperability)

1 18

Reposted by Johannes B. Gruber

Alisson Soares @alissonmasoares.bsky.social · 10d

The "validate, validate, validate" (GRIMMER, 2014) principle of Text Analysis/NLP never gets old.

1 3
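One common way to act on the "validate, validate, validate" principle is to compare automated labels against a human-coded gold standard. The sketch below computes Cohen's kappa for that comparison. The choice of kappa, the function name `cohen_kappa` and the example labels are assumptions made for illustration; none of them come from the post.

```python
from collections import Counter


def cohen_kappa(gold, predicted):
    """Chance-corrected agreement between two label sequences."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(gold)
    # Share of items on which both label sources agree.
    observed = sum(g == p for g, p in zip(gold, predicted)) / n
    # Agreement expected by chance, from each source's label frequencies.
    gold_freq = Counter(gold)
    pred_freq = Counter(predicted)
    expected = sum(gold_freq[c] * pred_freq[c] for c in gold_freq) / n**2
    if expected == 1:
        return 1.0  # both sources use a single identical label
    return (observed - expected) / (1 - expected)


# Hypothetical human codes and model output for six documents.
human = ["pos", "pos", "neg", "neg", "neg", "pos"]
model = ["pos", "neg", "neg", "neg", "pos", "pos"]
print(round(cohen_kappa(human, model), 3))
```

Kappa is only one option. Precision and recall per class, or a check on whether downstream estimates change, may suit a given study better.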

Johannes B. Gruber @jbgruber.bsky.social · 11d

Bluesky is not just a clone of the old Twitter. It's meant to look and feel like it to popularise a version of social media with a fundamental difference to the big platforms: its infrastructure is open.

Nice write up of that background: overreacted.io/open-social/

Open Social — overreacted

The protocol is the API.

overreacted.io

10

Reposted by Johannes B. Gruber

Ulrike Klinger @ulrikeklinger.bsky.social · 12d

Wanna know more about #data #access and the Digital Services Act? Here’s our latest policy paper about how it works👇

www.weizenbaum-library.de/items/86842c...

#commsky #polisky #dsa @weizenbauminstitut.bsky.social

7 14

Reposted by Johannes B. Gruber

Dariia Mykhailyshyna @dariia.bsky.social · 12d

❗️Our next workshop will be on October 2nd, 6 pm CEST, on Effective and Useful Feature engineering by @emilhvitfeldt.bsky.social

Register or sponsor a student by donating to support Ukraine!
Details: bit.ly/3wBeY4S
Please share!
#AcademicSky #EconSky #RStats

1 9 18

Reposted by Johannes B. Gruber

Alexia Katsanidou @alexiakatsanidou.bsky.social · 15d

Coming up on Monday the @medem.bsky.social conference at @gesis.org in Cologne. Stay tuned for the future of democracy research infrastructures www.medem.eu/coming-up-th... Keynotes from @simonsaysnothin.bsky.social and @sldelange.bsky.social

Coming Up: The 2025 MEDem Conference & Workshop! - Monitoring Electoral Democracy

Coming Up: the 2025 MEDem Conference! We are thrilled for the upcoming 3rd MEDem Conference, scheduled to take place from September 29-30 at GESIS in Cologne! The 3rd MEDem conference will bring togeth...

www.medem.eu

17 21

Johannes B. Gruber @jbgruber.bsky.social · 15d

And if it's not about responsibility, why else? The models don't care. OpenAI does not need free advertisement. And whether somebody can write well or not does not factor into the quality of a piece, imo. I'm honestly trying to understand how this transparency would improve anything.

2

Johannes B. Gruber @jbgruber.bsky.social · 15d

"acknowledging LLM contributions is key to maintaining transparency and ethical standards in academic publishing"

Why though? Acknowledging the use of LLMs only dilutes responsibility. Authors are responsible for everything in an article. And if it's fake/plagiarised, authors are responsible.

Kai Arzheimer @kai-arzheimer.com · 15d

I smell some social desirability bias. Also, who acknowledges their (overly wordy) spell checker?

What do researchers acknowledge ChatGPT for in their papers? - Impact of Social Sciences

A new study finds LLMs to be acknowledged for only a narrow set of academic tasks.

blogs.lse.ac.uk

1 6

Johannes B. Gruber @jbgruber.bsky.social · 16d

Just wanted to share this Google Scholar trick: I often have the problem that I want to find papers using certain computational methods, but specifically in my own field (for lit reviews).

You can do that by limiting the search to certain sources. My (imperfect) collection in the alt text.

"BERT" AND "multilingual" source:"Digital Journalism" OR source:"Digital Journalism" OR source:"International Journal of Press/Politics" OR source:"Journal of Communication" OR source:"New Media and Society" OR source:"Communication Methods and Measures" OR source:"Communication Research" OR source:"Journal of Computer-Mediated Communication" OR source:"Big Data and Society" OR source:"Political Communication" OR source:"Social Media and Society" OR source:"Computational Communication Research"

4 17
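The source-restricted query from the Google Scholar post above can also be assembled programmatically, which makes the journal list easier to maintain. This is a minimal sketch: the function name `scholar_query` is hypothetical, and passing the query through the `q` URL parameter is an assumption about how Scholar accepts searches, not something stated in the post.

```python
from urllib.parse import quote_plus


def scholar_query(terms, sources):
    """Build a Google Scholar query restricted to the given journals."""
    # Quote each search term and combine them with AND, as in the post.
    term_part = " AND ".join(f'"{term}"' for term in terms)
    # Drop repeated journal names while preserving their order.
    unique_sources = list(dict.fromkeys(sources))
    source_part = " OR ".join(f'source:"{name}"' for name in unique_sources)
    return f"{term_part} {source_part}"


# Journal list taken from the post; extend it for your own field.
journals = [
    "Digital Journalism",
    "International Journal of Press/Politics",
    "Journal of Communication",
    "New Media and Society",
    "Communication Methods and Measures",
    "Communication Research",
    "Journal of Computer-Mediated Communication",
    "Big Data and Society",
    "Political Communication",
    "Social Media and Society",
    "Computational Communication Research",
]

query = scholar_query(["BERT", "multilingual"], journals)
print(query)

# Assumption: Scholar accepts the query via the `q` URL parameter.
print("https://scholar.google.com/scholar?q=" + quote_plus(query))
```

Swapping the `terms` list is enough to reuse the same set of journals for a different method.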

Reposted by Johannes B. Gruber

Jan Broder Engler @jbengler.de · 19d

The new ggplot2 4.0.0 now supports absolute plot dimensions 🤩

#rstats #dataviz #phd

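library(tidyverse)

# Plot the first eight cars in a panel of fixed physical size
mtcars |>
  head(8) |>
  rownames_to_column("name") |>
  ggplot(aes(x = drat, y = name, fill = name)) +
  geom_col() +
  # panel.widths / panel.heights set absolute panel dimensions
  theme(panel.widths = unit(50, "mm"), panel.heights = unit(50, "mm"))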

16 90

Johannes B. Gruber @jbgruber.bsky.social · 16d

Works fine for me. Maybe the user agent got blocked after too many tries?

1 2

Reposted by Johannes B. Gruber

dprex.bsky.social @dprex.bsky.social · 17d

Find us Sep 22-26 at the #DGS2025 Conference, Campus Duisburg.
At the @gesis.org stand we present DP-R|EX – the Data Portal for Right-Wing & Extremism Data.
Let’s talk about sharing data for reuse, data management & hate speech!
👉info: datenportal-rechtsextremismus.de #ResearchData #ExtremismData

11 12

Reposted by Johannes B. Gruber

Inessa De Angelis @inessadeangelis.bsky.social · Sep 4

Which Canadian MPs are on Bluesky and what do they post?

My new paper w/ @rohanalexander.bsky.social in @cjps-rcsp.bsky.social unpacks these questions, finding MPs use it like Twitter to discuss policy, the Ottawa bubble & constituency

Read more: doi.org/10.1017/S000...

#polsky #commsky #cdnpoli

What are Canadian Members of Parliament Doing on Bluesky? research note abstract

2 12 30

Johannes B. Gruber @jbgruber.bsky.social · 22d

If you feel uneasy using LLMs for data annotation, you are right (and if you don't, you should). LLM annotation opens up research that is difficult with traditional #NLP/#textasdata methods, but the risk of false conclusions is high!

Experiment + *evidence-based* mitigation strategies in this preprint 👇

Joachim Baumann @joachimbaumann.bsky.social · 26d

🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825

We present our new preprint titled "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation". We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks. For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations. Then, we collect 13 million LLM annotations across plausible LLM configurations. These annotations feed into 1.4 million regressions testing the hypotheses. For a hypothesis with no true effect (ground truth p > 0.05), different LLM configurations yield conflicting conclusions. Checkmarks indicate correct statistical conclusions matching ground truth; crosses indicate LLM hacking -- incorrect conclusions due to annotation errors. Across all experiments, LLM hacking occurs in 31-50% of cases even with highly capable models. Since minor configuration changes can flip scientific conclusions, from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.

1 4 22
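The mechanism described in the preprint summary above, annotation errors changing a statistical conclusion, can be illustrated with a toy simulation. This is a sketch under stated assumptions, not the preprint's procedure: the two-group design, the function names, the label-flip rates and the two-proportion z-test are all hypothetical choices made here for illustration.

```python
import math
import random


def two_proportion_p_value(pos_a, n_a, pos_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (pos_a + pos_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (pos_a / n_a - pos_b / n_b) / se
    # Two-sided standard normal tail probability.
    return math.erfc(abs(z) / math.sqrt(2))


def simulate(flip_rates, n=5000, base_rate=0.3, seed=1):
    """Return (p_true, p_annotated) for two groups with no true difference.

    flip_rates holds a hypothetical annotator's label-flip probability per group.
    """
    rng = random.Random(seed)
    true_pos = [0, 0]
    annotated_pos = [0, 0]
    for group in (0, 1):
        for _ in range(n):
            # The true label has the same base rate in both groups.
            truth = rng.random() < base_rate
            # The annotator flips the label with a group-specific probability.
            flipped = rng.random() < flip_rates[group]
            label = (not truth) if flipped else truth
            true_pos[group] += truth
            annotated_pos[group] += label
    p_true = two_proportion_p_value(true_pos[0], n, true_pos[1], n)
    p_annotated = two_proportion_p_value(annotated_pos[0], n, annotated_pos[1], n)
    return p_true, p_annotated


# Two hypothetical annotator configurations applied to the same true labels.
configs = {
    "equal error rates": (0.05, 0.05),
    "group-dependent error rates": (0.05, 0.25),
}
for name, flip_rates in configs.items():
    p_true, p_annotated = simulate(flip_rates)
    print(f"{name}: p (ground truth) = {p_true:.3f}, p (annotations) = {p_annotated:.3g}")
```

Because the seed fixes the true labels, the ground-truth p-value is the same for both configurations. Only the annotation errors differ, and that alone is enough to change the test on the annotated labels.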

Reposted by Johannes B. Gruber

Aleksandra Urman @aurman21.bsky.social · 26d

Are you a CSS researcher using LLMs for text annotation tasks? Do you integrate the results into downstream statistical analyses?

Turns out, even with SOTA models you have a 31-50% chance of coming to wrong conclusions this way!

Learn more about this and mitigation strategies in our new preprint 👇

Joachim Baumann @joachimbaumann.bsky.social · 26d

🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825

We present our new preprint titled "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation". We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks. For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations. Then, we collect 13 million LLM annotations across plausible LLM configurations. These annotations feed into 1.4 million regressions testing the hypotheses. For a hypothesis with no true effect (ground truth p > 0.05), different LLM configurations yield conflicting conclusions. Checkmarks indicate correct statistical conclusions matching ground truth; crosses indicate LLM hacking -- incorrect conclusions due to annotation errors. Across all experiments, LLM hacking occurs in 31-50% of cases even with highly capable models. Since minor configuration changes can flip scientific conclusions, from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.

2 3 10

Reposted by Johannes B. Gruber

Joachim Baumann @joachimbaumann.bsky.social · 26d

🚨 New paper alert 🚨 Using LLMs as data annotators, you can produce any scientific result you want. We call this **LLM Hacking**.

Paper: arxiv.org/pdf/2509.08825

We present our new preprint titled "Large Language Model Hacking: Quantifying the Hidden Risks of Using LLMs for Text Annotation". We quantify LLM hacking risk through systematic replication of 37 diverse computational social science annotation tasks. For these tasks, we use a combined set of 2,361 realistic hypotheses that researchers might test using these annotations. Then, we collect 13 million LLM annotations across plausible LLM configurations. These annotations feed into 1.4 million regressions testing the hypotheses. For a hypothesis with no true effect (ground truth p > 0.05), different LLM configurations yield conflicting conclusions. Checkmarks indicate correct statistical conclusions matching ground truth; crosses indicate LLM hacking -- incorrect conclusions due to annotation errors. Across all experiments, LLM hacking occurs in 31-50% of cases even with highly capable models. Since minor configuration changes can flip scientific conclusions, from correct to incorrect, LLM hacking can be exploited to present anything as statistically significant.

5 94 260

Reposted by Johannes B. Gruber

Dave Karpf @davekarpf.bsky.social · 27d

Turned this into a blog post, adding a dash of Alinsky as well, just to get the ideas out of my head and down on paper.

(Some people smoke as a stress-reliever. I quote Schattschneider and Alinsky.)

open.substack.com/pub/davekarp...

4 17 98

Reposted by Johannes B. Gruber

Stefan Müller @stefanmueller.bsky.social · 29d

The University of Cologne plans drastic cuts to its Political Science department, potentially abolishing the MSc in Political Science and eliminating the professorships in IR and Comparative Politics. This would severely weaken the social sciences at @unicologne.bsky.social.
Petition (de/en) ⤵️

Sign the petition

No WiSo without PoWi! For strong political science in the bachelor's and master's programmes

https://www.change.org/p/keine-wiso-ohne-powi-für-eine-starke-politikwissenschaft-im-bachelor-master

6 96 140

Johannes B. Gruber @jbgruber.bsky.social · Sep 4

Such a fun course to teach! Just a very brief overview of:

- obtaining (scraping) data
- network analysis
- computational text analysis
- how to use LLMs for social science

GESIS Training @gesistraining.bsky.social · Sep 1

🚀 The #GESISfallseminar in Computational Social Science 2025 has officially kicked off!
Today, we started with two blended learning courses: self-learning sessions in the morning followed by live sessions in the afternoon.
👋 Welcome to all lecturers & participants for week 1!

📸 A glimpse into our first online live session of the #GESISfallseminar in Computational Social Science 2025!

📊 Introduction to CSS with R by Johannes B. Gruber

📸 A glimpse into our first online live session of the #GESISfallseminar in Computational Social Science 2025!

👩‍💻 Introduction to CSS with Python by John McLevey

1 9

Reposted by Johannes B. Gruber

Fabienne Lind @fabiennelind.bsky.social · Sep 3

🎙️ New @ccspod.bsky.social episode!
I unpack how computational methods transform the study of opinionated communication + share insights from my work on studying climate change campaigns on social media with @cidape.bsky.social

Listen here 👉 aboutccs.net/opinion/

Observing Opinions – What is it about CCS

aboutccs.net

1 9 18

Reposted by Johannes B. Gruber

Dariia Mykhailyshyna @dariia.bsky.social · Sep 3

This is becoming so normalized, I did not even wake up the second time there was an air alert.
100s of drones, missiles, with loud explosions & house shaking, reading about the destruction & deaths in the morning - several times a week

#UkrainianView

Maksym Polyakov 🇺🇦🌻 @maksympolyakov.bsky.social · Sep 3

It was a difficult night in Ukraine

t.me/mon1tor_ua/5...

11 70 150