Jörg Lehmann
jrglmn.bsky.social
Digital humanism | machine learning | digital cultural heritage | Berlin State Library | „Name a bias – we have it!“
That’s my favourite dedication I’ve seen in 2025 …
December 26, 2025 at 5:39 PM
Reposted by Jörg Lehmann
Since when has the game "Stadt, Land, Fluss" existed? Dr. Tarek Saier has found answers to this question, using datasets published by us at #StabiBerlin and the clever use of #KI (AI). Read more about his research project here 👉 blog.sbb.berlin/stadt_land_f...
Since when has „Stadt, Land, Fluss“ existed? Finding answers with Stabi data and AI - SBB aktuell
In its present form, the game „Stadt, Land, Fluss“ can be dated to the 1930s; the name is attested by 1937 at the latest. The underlying game principle – answers in several ca...
blog.sbb.berlin
December 19, 2025 at 11:08 AM
Two results from the "Human.Machine.Culture" project at @stabiberlin.bsky.social have been published:

Guidelines for the Documentation of Ethical, Legal and Social Issues (ELSI) in Cultural Data
doi.org/10.5281/zeno...

Guidelines for the Publication of Cultural Data for AI Research
doi.org/10.5281/zeno...
December 17, 2025 at 5:17 PM
A note on the cost-benefit ratio of #chatgpt: a user of one of our datasets (doi.org/10.5281/zeno...) calculated what it would have cost to process the 5 million pages we provide with GPT-5 via #OpenAI's service: $1.2 million (gasp!)

@stabiberlin.bsky.social
Berlin State Library (2023). Fulltexts of the Digitized Collections of the Berlin State Library (SBB)
The motivation for creating this dataset was to enable research on the basis of fulltexts which are available in a cultural heritage institution on a large scale. Libraries such as the Berlin State Li...
doi.org
December 12, 2025 at 9:26 AM
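For a sense of scale, the $1.2 million estimate can be broken down per page. A back-of-the-envelope sketch in Python, assuming the estimate covers exactly the 5 million pages and nothing else (no retries, no extra prompt overhead):

```python
from decimal import Decimal

# Figures taken from the post; treating them as exact is an assumption.
TOTAL_COST_USD = Decimal("1200000")  # estimated GPT-5 processing cost
PAGES = Decimal("5000000")           # pages in the fulltext dataset

# Decimal avoids binary floating-point rounding in the currency arithmetic.
cost_per_page = TOTAL_COST_USD / PAGES
cost_per_1000_pages = cost_per_page * 1000

print(f"per page: ${cost_per_page}")
print(f"per 1,000 pages: ${cost_per_1000_pages}")
```

That works out to $0.24 per page, or $240 per 1,000 pages.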
In the poster session at #CHR2025 starting at 5pm, our dear colleague Michał Bubula will present our short paper entitled "How Scalable is Quality Assessment of Text Recognition?"

anthology.ach.org/volumes/vol0...

Go there and talk to him: we aim to provide you with high-quality research data!
How Scalable is Quality Assessment of Text Recognition? A Combination of Ground Truth and Confidence Scores
anthology.ach.org
December 11, 2025 at 3:56 PM
Reposted by Jörg Lehmann
Tomorrow, Thursday, 11 December 2025, Nils Reiter & Janis Pagel of the CRETA association will be at the poster session of the CHR2025 conference in Luxembourg. They will present an approach to "Automatic detection and classification of literary character properties in German narratives"!
#CHR2025
December 10, 2025 at 10:33 AM
#FF2025

@danielvanstrien.bsky.social has published his workshop slides on #opensource AI for #GLAM institutions

danielvanstrien.xyz/slides.html

I consider his contribution, well worth reading, the AI complement to my reflections on open data (and its shades), published here:

mmk.sbb.berlin/2024/06/21/o...
Slides – Daniel van Strien
danielvanstrien.xyz
December 10, 2025 at 9:42 AM
Reposted by Jörg Lehmann
Very cool!

The Proceedings for #chr2025 have already been published, now at the new and slick #ach, the "Anthology of Computers and the Humanities", developed and maintained by ACH, the "Association for Computers and the Humanities".

As an example, you can […]

[Original post on fedihum.org]
November 20, 2025 at 12:31 PM
Reposted by Jörg Lehmann
Just posted my slides from the AI4LAM #FF2025 workshop on open source AI for GLAMs.

Probably slides on their own aren't that useful, but they do feature one of my growing collection of libraries-and-AI memes, so there's that danielvanstrien.xyz/slides.html
December 9, 2025 at 10:13 AM
#FF2025 pickings:
This year has been extremely productive, both for the AI & commons debate and for the publication of open, public domain datasets.

Paul Keller & Europeana Foundation: Publishing cultural heritage data in the age of AI, Dec 2025
openfuture.eu/publication/...
Impulse paper: Publishing cultural heritage data in the age of AI – Open Future
This paper proposes a framework to help cultural heritage institutions decide when and how to share collection data for AI training, balancing open access with managing large-scale AI reuse aligned wi...
openfuture.eu
December 8, 2025 at 10:48 AM
A colleague from the UK and I add our two cents on a highly divisive issue, written from the perspective of European CHIs and research libraries.

A Position Paper on AI and Copyrights in Cultural Heritage and Research (EU and UK)

doi.org/10.5334/johd.290

#genAI #commons #openness
A Position Paper on AI and Copyrights in Cultural Heritage and Research (EU and UK) | Journal of Open Humanities Data
doi.org
April 2, 2025 at 3:35 PM
Sigh. The intersection of #openness, intellectual property rights (#IPR) and #genAI is getting really complicated for #GLAM institutions.
I wrote a blog post charting what's happening in the EU and what we currently need:

mmk.sbb.berlin/2024/03/13/o...

Currently, there is no technical solution to implement an opt-out...
Orientation in Turbulent Times – Mensch.Maschine.Kultur
mmk.sbb.berlin
March 14, 2024 at 2:42 PM
A new post on the "power hungry magic" of contemporary artificial intelligence has been published on the blog of the HumanMachineCulture project:
energy, CO2 intensity and sustainability as largely overlooked issues in the deployment of GPTs.

mmk.sbb.berlin/2024/01/26/p...

#LLMs #ChatGPT #metaverse
Power Hungry Magic – Mensch.Maschine.Kultur
mmk.sbb.berlin
January 26, 2024 at 3:40 PM
Copyright is but one indicator of the value of digital texts that have passed through a quality filter called ‚publishing houses‘. The same can apply to texts in the public domain, and GLAM institutions should reflect on this. Open access texts are valuable as well; see
doi.org/10.54900/zg9...
January 18, 2024 at 9:32 PM
Dutch National Library restricts access for commercial AI
Blocking is done via robots.txt, so crawlers are excluded regardless of copyright status; consequently, public domain material is not accessible to these crawlers either. The restriction is selective: Googlebot-image, dataforseo.com, GPTBot, ChatGPT-User.
January 14, 2024 at 8:58 AM
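A minimal sketch of how such selective blocking works, using Python's standard-library `urllib.robotparser`. The robots.txt below is a hypothetical reconstruction from the agents named in the post (the library's actual file may use other tokens or paths), and the URL and the `ResearchCrawler` agent are made up for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt reconstructed from the agents named in the post;
# the library's real file may use other user-agent tokens or paths.
ROBOTS_TXT = """\
User-agent: Googlebot-image
Disallow: /

User-agent: dataforseo.com
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

# A public domain item on an example host: the rules match on user agent and
# URL path only, so copyright status plays no role in the decision.
URL = "https://example.org/public-domain/item-1"

# robotparser matches agent tokens case-insensitively, so the
# "Googlebot-image" rule also applies to "Googlebot-Image".
agents = ["GPTBot", "ChatGPT-User", "Googlebot-Image", "Googlebot", "ResearchCrawler"]
results = {agent: parser.can_fetch(agent, URL) for agent in agents}

for agent, allowed in results.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

The listed AI crawlers are blocked from the public domain item just as they would be from copyrighted material, while unlisted agents such as Googlebot's main crawler fall through to the catch-all rule and remain allowed. Compliance with robots.txt is voluntary on the crawler's side.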
A new post, "Feeding the Cuckoo", has been published on the blog of the MMK project; it focuses on privacy issues in large language models, especially Google's Bard (my friend, the poet).

mmk.sbb.berlin/2024/01/12/f...

#LLMs #privacy #ChatGPT #ethics #elsi
Feeding the Cuckoo – Mensch.Maschine.Kultur
mmk.sbb.berlin
January 12, 2024 at 2:23 PM
Power Hungry Processing

Luccioni, Jernite & Strubell, November 2023

"the most efficient text generation model uses as much energy as 16% of a full smartphone charge for 1,000 inferences, whereas the least efficient image generation model uses as much energy as 950 smartphone charges (11.49 kWh)"
Power Hungry Processing: Watts Driving the Cost of AI Deployment?
Recent years have seen a surge in the popularity of commercial AI products based on generative, multi-purpose AI systems promising a unified approach to building machine learning (ML) models into...
doi.org
January 9, 2024 at 3:40 PM
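The quoted figures can be unpacked with a little arithmetic. A sketch in Python, assuming (as the parallel wording of the quote suggests) that both figures refer to 1,000 inferences; the variable names are illustrative:

```python
# Figures quoted from Luccioni, Jernite & Strubell (2023). Assumption: both
# figures refer to 1,000 inferences, as the parallel wording suggests.
INFERENCES = 1_000
IMAGE_MODEL_KWH = 11.49      # least efficient image generation model
IMAGE_MODEL_CHARGES = 950    # the same energy expressed in smartphone charges
TEXT_MODEL_CHARGES = 0.16    # most efficient text generation model

# Energy of one full smartphone charge implied by the quote, in watt-hours.
charge_wh = IMAGE_MODEL_KWH * 1000 / IMAGE_MODEL_CHARGES

# Per-inference energy for each model, in watt-hours.
text_wh_per_inference = TEXT_MODEL_CHARGES * charge_wh / INFERENCES
image_wh_per_inference = IMAGE_MODEL_KWH * 1000 / INFERENCES

# How many times more energy the image model uses than the text model.
ratio = IMAGE_MODEL_CHARGES / TEXT_MODEL_CHARGES

print(f"one smartphone charge: {charge_wh:.2f} Wh")
print(f"text model: {text_wh_per_inference * 1000:.2f} mWh per inference")
print(f"image model: {image_wh_per_inference:.2f} Wh per inference")
print(f"image/text ratio: {ratio:,.1f}")
```

On these assumptions, one smartphone charge corresponds to roughly 12.1 Wh, a single image generation uses close to one full charge, and the least efficient image model needs about 5,900 times the energy of the most efficient text model.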
I wrote a blog post on LLMs and anthropomorphism for the blog of our project:

mmk.sbb.berlin/2023/12/20/h...

People who are a bit lonely before Christmas may want to read it…
Human-Machine-Cognition – Mensch.Maschine.Kultur
mmk.sbb.berlin
December 23, 2023 at 2:28 PM
Datasheets for Digital Cultural Heritage Datasets:

doi.org/10.5334/johd.124

What are the characteristics of digital cultural heritage datasets? What would dataset documentation look like?
We formulate a series of recommendations and propose a datasheet template, see:
doi.org/10.5281/ZENODO.8375033
December 22, 2023 at 5:02 PM
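To make "dataset documentation" concrete, here is a minimal sketch of a datasheet as a Python data structure. The section names are an assumption: they follow the seven question groups of Gebru et al.'s "Datasheets for Datasets" (motivation, composition, collection process, preprocessing, uses, distribution, maintenance), not necessarily the template proposed in the paper, and the class and method names are hypothetical. The example entry reuses the motivation statement of the SBB fulltext dataset quoted earlier in this feed:

```python
from dataclasses import dataclass, fields


@dataclass
class Datasheet:
    """Hypothetical datasheet; section names follow Gebru et al.'s question groups."""
    title: str
    motivation: str = ""
    composition: str = ""
    collection_process: str = ""
    preprocessing: str = ""
    uses: str = ""
    distribution: str = ""
    maintenance: str = ""

    def missing_sections(self) -> list[str]:
        """Return the names of sections that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def to_markdown(self) -> str:
        """Render the title and all filled-in sections as a simple Markdown document."""
        lines = [f"# {self.title}"]
        for f in fields(self):
            if f.name == "title":
                continue
            value = getattr(self, f.name).strip()
            if value:
                heading = f.name.replace("_", " ").capitalize()
                lines += ["", f"## {heading}", value]
        return "\n".join(lines)


sheet = Datasheet(
    title="Fulltexts of the Digitized Collections of the Berlin State Library (SBB)",
    motivation=(
        "The motivation for creating this dataset was to enable research on the basis "
        "of fulltexts which are available in a cultural heritage institution on a large scale."
    ),
)
print(sheet.missing_sections())
print(sheet.to_markdown())
```

Keeping each section as an explicit field makes gaps in the documentation visible: `missing_sections()` lists what still has to be written before the dataset is published.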