Lightnews — Scholar-powered news

Jörg Lehmann

@jrglmn.bsky.social

That’s my favourite dedication I’ve seen in 2025 …

„Dedicated to Donald Lau, fortune cookie writer for 30 years“

December 26, 2025 at 5:39 PM

Reposted by Jörg Lehmann

Stabi Berlin

@stabiberlin.bsky.social

Since when has the game "Stadt, Land, Fluss" existed? Dr. Tarek Saier has found answers to this question, with the help of datasets published here at #StabiBerlin and the clever use of AI (#KI). Read more about his research project here 👉 blog.sbb.berlin/stadt_land_f...

Since when has "Stadt, Land, Fluss" existed? Finding answers with Stabi data and AI - SBB aktuell

The game "Stadt, Land, Fluss" can be dated in its present form to the 1930s; the name is attested by 1937 at the latest. The underlying game principle – answers to several ...

blog.sbb.berlin

December 19, 2025 at 11:08 AM

Jörg Lehmann

@jrglmn.bsky.social

Two results from the project "Human.Machine.Culture" at @stabiberlin.bsky.social have been published:

Guidelines for the Documentation of Ethical, Legal and Social Issues (ELSI) in Cultural Data
doi.org/10.5281/zeno...

Guidelines for the Publication of Cultural Data for AI Research
doi.org/10.5281/zeno...

December 17, 2025 at 5:17 PM

Jörg Lehmann

@jrglmn.bsky.social

A note on the cost-benefit ratio of #chatgpt: a user of one of our datasets (doi.org/10.5281/zeno...) calculated what it would have cost to process the 5 million pages we provide with GPT-5 via #OpenAI's service: $1.2 million (gasp!)

@stabiberlin.bsky.social

Berlin State Library (2023). Fulltexts of the Digitized Collections of the Berlin State Library (SBB)

The motivation for creating this dataset was to enable research on the basis of fulltexts which are available in a cultural heritage institution on a large scale. Libraries such as the Berlin State Li...

doi.org

December 12, 2025 at 9:26 AM
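The per-page cost implied by the two figures in the post above can be worked out in a few lines. This is only a sketch: the total ($1.2 million) and the page count (5 million) come from the post, while the function name and the derived per-page figure are illustrative and not part of the original calculation, whose pricing assumptions are not stated.

```python
def cost_per_page(total_cost_usd: float, pages: int) -> float:
    """Return the average cost per page for a given total and page count."""
    if pages <= 0:
        raise ValueError("pages must be a positive integer")
    return total_cost_usd / pages


# Figures quoted in the post; everything derived from them is an estimate.
TOTAL_COST_USD = 1_200_000
PAGES = 5_000_000

per_page = cost_per_page(TOTAL_COST_USD, PAGES)
print(f"{per_page:.2f} USD per page")  # prints "0.24 USD per page"
```

At roughly 24 cents per page, the bill scales linearly with the size of the collection, which is why the total looks so large for a dataset of this size.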

Jörg Lehmann

@jrglmn.bsky.social

In the poster session at #CHR2025 starting at 5pm, our dear colleague Michał Bubula will present our short paper entitled "How Scalable is Quality Assessment of Text Recognition?"

anthology.ach.org/volumes/vol0...

Go there and talk to him, since we aim to provide you with high-quality research data!

How Scalable is Quality Assessment of Text Recognition? A Combination of Ground Truth and Confidence Scores

anthology.ach.org

December 11, 2025 at 3:56 PM

Reposted by Jörg Lehmann

CRETA e.V.

@cretaverein.bsky.social

Tomorrow, Thursday, 11 December 2025, Nils Reiter & Janis Pagel of the CRETA association will be at the poster session of the CHR2025 Conference in Luxembourg. They will present an approach to "Automatic detection and classification of literary character properties in German narratives"!
#CHR2025

December 10, 2025 at 10:33 AM

Jörg Lehmann

@jrglmn.bsky.social

#FF2025

@danielvanstrien.bsky.social has published his workshop slides on #opensource AI for #GLAM institutions

danielvanstrien.xyz/slides.html

I consider his contribution, which is well worth reading, the AI complement to my reflections on open data (and their shades) published here:

mmk.sbb.berlin/2024/06/21/o...

Slides – Daniel van Strien

danielvanstrien.xyz

December 10, 2025 at 9:42 AM

Reposted by Jörg Lehmann

Christof Schöch

@christof.fedihum.org.ap.brid.gy

Very cool!

The Proceedings for #chr2025 have already been published, now at the new and slick #ach, the "Anthology of Computers and the Humanities", developed and maintained by ACH, the "Association for Computers and the Humanities".

As an example, you can […]

[Original post on fedihum.org]

Screenshot of the landing page of one article in the ACH, in simple tones of gray.

November 20, 2025 at 12:31 PM

Reposted by Jörg Lehmann

Daniel van Strien

@danielvanstrien.bsky.social

Just posted my slides from the AI4LAM #FF2025 workshop on open source AI for GLAMs.

Probably slides on their own aren't that useful, but they do feature one of my growing collection of libraries-and-AI memes, so there's that danielvanstrien.xyz/slides.html

Here's alt text for the meme:

Alt text: "Flex Tape meme format. Top panel: Phil Swift (labeled 'Library Systems Vendor') aggressively spraying water representing 'Outdated systems, metadata issues, disjointed search and complex user needs.' Bottom panel: A hand slapping Flex Tape underwater, with the tape labeled 'AI-powered chat interface.'"

This captures the joke that vendors are positioning AI chat as a quick fix for deep-seated library infrastructure problems—a bit like slapping tape on a leak rather than fixing the plumbing.

December 9, 2025 at 10:13 AM

Jörg Lehmann

@jrglmn.bsky.social

#FF2025 pickings:
This year has been extremely productive with regard to the AI & commons debate, as well as in view of the publication of open, public domain datasets.

Paul Keller & Europeana Foundation: Publishing cultural heritage data in the age of AI, Dec 2025
openfuture.eu/publication/...

Impulse paper: Publishing cultural heritage data in the age of AI – Open Future

This paper proposes a framework to help cultural heritage institutions decide when and how to share collection data for AI training, balancing open access with managing large-scale AI reuse aligned wi...

openfuture.eu

December 8, 2025 at 10:48 AM

Jörg Lehmann

@jrglmn.bsky.social

A colleague from the UK and I add our two cents on a highly divisive issue, written from the perspective of European CHIs and research libraries.

A Position Paper on AI and Copyrights in Cultural Heritage and Research (EU and UK)

doi.org/10.5334/johd.290

#genAI #commons #openness

A Position Paper on AI and Copyrights in Cultural Heritage and Research (EU and UK) | Journal of Open Humanities Data

doi.org

April 2, 2025 at 3:35 PM

Jörg Lehmann

@jrglmn.bsky.social

Sigh. The topic of #openness, intellectual property rights (#IPR) and #genAI is getting really complicated for #GLAM institutions.
I wrote a blogpost to chart what's up in the EU and what we currently need:

mmk.sbb.berlin/2024/03/13/o...

Currently, there is no technical solution to implement an opt out...

Orientation in Turbulent Times – Mensch.Maschine.Kultur

mmk.sbb.berlin

March 14, 2024 at 2:42 PM

Jörg Lehmann

@jrglmn.bsky.social

New post on the "power hungry magic" of contemporary artificial intelligence published on the blog of the HumanMachineCulture project:
Energy, CO2 intensity and sustainability as mostly overlooked issues in the deployment of GPTs.

mmk.sbb.berlin/2024/01/26/p...

#LLMs #ChatGPT #metaverse

Power Hungry Magic – Mensch.Maschine.Kultur

mmk.sbb.berlin

January 26, 2024 at 3:40 PM

Jörg Lehmann

@jrglmn.bsky.social

Copyright is but one indicator of the value of digital texts, which have gone through a quality filter called 'publishing houses'. The same can apply to texts in the public domain, and GLAM institutions should reflect on this. Open access texts are valuable as well, see
doi.org/10.54900/zg9...

January 18, 2024 at 9:32 PM

Jörg Lehmann

@jrglmn.bsky.social

Dutch National Library restricts access for commercial AI
Blocking is done via the robots.txt. Crawlers are thus excluded regardless of copyright. Consequently, public domain material is not accessible to the crawlers. Restriction is selective: Googlebot-image, dataforseo.com, GPTBot, ChatGPT-User

January 14, 2024 at 8:58 AM
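How a robots.txt block of this kind behaves can be sketched with Python's standard-library parser. The robots.txt content, the example.org URLs and the "SomeResearchCrawler" agent below are hypothetical and are not the Dutch National Library's actual file; only the agent names GPTBot and ChatGPT-User are taken from the post. The point illustrated is that the rule matches on the crawler's user agent, not on the copyright status of the page.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks two AI crawlers, allows everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A public domain page is blocked for the listed agents all the same,
# because the rule only looks at the user agent and the path.
PUBLIC_DOMAIN_URL = "https://example.org/public-domain/book-1"

for agent in ("GPTBot", "ChatGPT-User", "SomeResearchCrawler"):
    print(agent, allowed(agent, PUBLIC_DOMAIN_URL))
```

Note that robots.txt is a request to crawlers, not an access control: it only has an effect on crawlers that choose to honour it.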

Jörg Lehmann

@jrglmn.bsky.social

New post "Feeding the cuckoo" published on the blog of the MMK project, focusing on privacy issues in large language models, especially Google's Bard (my friend, the poet).

mmk.sbb.berlin/2024/01/12/f...

#LLMs #privacy #ChatGPT #ethics #elsi

Feeding the Cuckoo – Mensch.Maschine.Kultur

mmk.sbb.berlin

January 12, 2024 at 2:23 PM

Jörg Lehmann

@jrglmn.bsky.social

Power Hungry Processing

Luccioni, Jernite & Strubell, November 2023

"the most efficient text generation model uses as much energy as 16% of a full smartphone charge for 1,000 inferences, whereas the least efficient image generation model uses as much energy as 950 smartphone charges (11.49 kWh)"

Power Hungry Processing: Watts Driving the Cost of AI Deployment?

Recent years have seen a surge in the popularity of commercial AI products based on generative, multi-purpose AI systems promising a unified approach to building machine learning (ML) models into...

doi.org

January 9, 2024 at 3:40 PM
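The figures in the quote above can be put on a common scale with a little arithmetic. This is a rough sketch, assuming that both figures use the same "smartphone charge" unit and refer to the same workload of 1,000 inferences; the quote itself only states the workload for the text model, so treat the comparison as indicative. The variable names are illustrative.

```python
# Figures taken from the quote.
IMAGE_MODEL_KWH = 11.49            # least efficient image generation model
IMAGE_MODEL_CHARGES = 950          # the same energy in smartphone charges
TEXT_MODEL_CHARGE_FRACTION = 0.16  # most efficient text model: 16% of a charge

# Derived values (assumption: one shared definition of a "charge").
kwh_per_charge = IMAGE_MODEL_KWH / IMAGE_MODEL_CHARGES
text_model_kwh = TEXT_MODEL_CHARGE_FRACTION * kwh_per_charge
ratio = IMAGE_MODEL_KWH / text_model_kwh

print(f"One smartphone charge: about {kwh_per_charge * 1000:.1f} Wh")
print(f"Text model: about {text_model_kwh * 1000:.2f} Wh")
print(f"Image model uses about {ratio:,.0f} times as much energy")
```

Under these assumptions the spread between the two models is more than three orders of magnitude.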

Jörg Lehmann

@jrglmn.bsky.social

I wrote a blogpost on LLMs and anthropomorphism for the blog of our project:

mmk.sbb.berlin/2023/12/20/h...

People who are a bit lonely before Christmas may want to read it…

Human-Machine-Cognition – Mensch.Maschine.Kultur

mmk.sbb.berlin

December 23, 2023 at 2:28 PM

Jörg Lehmann

@jrglmn.bsky.social

Datasheets for Digital Cultural Heritage Datasets:

doi.org/10.5334/johd.124

What are the characteristics of digital cultural heritage datasets? What would dataset documentation look like?
We formulate a series of recommendations and propose a datasheet template, see:
doi.org/10.5281/ZENODO.8375033

December 22, 2023 at 5:02 PM
