Daniel van Strien
@danielvanstrien.bsky.social
4.3K followers 2.4K following 270 posts
Machine Learning Librarian at @hf.co
Reposted by Daniel van Strien
danielvanstrien.bsky.social
DoTS.ocr just got native vLLM support!

I built a UV script so you can run SOTA multilingual OCR in seconds with zero setup using @hf.co Jobs

Tested on 1800s library cards - works great ✨
Screenshot of an index card with annotated bounding box predictions from the OCR model, and a screenshot of a code command.
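For local experimentation outside of Jobs, a minimal vLLM sketch along these lines should work; the repo id, image URL, and prompt below are illustrative assumptions, not taken from the post or the script.

from vllm import LLM, SamplingParams

# Assumed dots.ocr repo id; swap in the checkpoint you actually use.
llm = LLM(model="rednote-hilab/dots.ocr")

# OpenAI-style chat messages; vLLM applies the model's chat template.
messages = [{
    "role": "user",
    "content": [
        {"type": "image_url", "image_url": {"url": "https://example.org/library-card.jpg"}},
        {"type": "text", "text": "Extract the text and layout from this document."},
    ],
}]

outputs = llm.chat(messages, SamplingParams(temperature=0.0, max_tokens=2048))
print(outputs[0].outputs[0].text)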
danielvanstrien.bsky.social
Also uploaded related datasets for index cards bsky.app/profile/dani...
danielvanstrien.bsky.social
Card catalogues aren't just a relic of the past - many institutions still rely on them because full migration is too expensive. VLMs could help change that.

I uploaded two new @hf.co datasets (~470K cards) for training/evaluating models to extract structured metadata from catalogue cards.
Picture of a digitised index card.
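To poke at the cards, a load_dataset sketch like this should work; the repo id below is a placeholder since the post doesn't give the exact dataset names, and the column names may differ from the published schema.

from datasets import load_dataset

# Placeholder repo id; substitute one of the two catalogue-card datasets.
cards = load_dataset("biglam/catalogue-cards", split="train", streaming=True)

# Inspect the first record; expect a card image plus any existing metadata fields.
first = next(iter(cards))
print(first.keys())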
Reposted by Daniel van Strien
jay.bsky.team
We’re hiring for two machine learning roles. A chance to do cutting edge things with ML to make this place a lot more personalized.

jobs.gem.com/bluesky/am9i...
Bluesky Jobs
jobs.gem.com
danielvanstrien.bsky.social
Let me know if you think it would be worth adding more context about that in the dataset card!
danielvanstrien.bsky.social
New @hf.co BigLAM dataset: 9,363 OA books with page images + rich MARC metadata for evaluating (and training) VLMs on metadata extraction.

Libraries are starting to explore AI-assisted cataloguing, but we lack public evaluation data. Hoping this helps fill that gap.

huggingface.co/datasets/big...
Screenshot of the dataset viewer showing a column of marc data + the first few pages of an open access monograph
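A quick way to peek at how the MARC records pair with page images; the full repo id is truncated above, so the name here is a placeholder and the column names depend on the published schema.

from datasets import load_dataset

# Placeholder repo id; see the truncated huggingface.co/datasets/big... link for the real one.
books = load_dataset("biglam/oa-books-marc", split="train", streaming=True)

record = next(iter(books))
print(record.keys())  # e.g. MARC metadata fields plus page images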
Reposted by Daniel van Strien
danielvanstrien.bsky.social
Blogged: Fine-tuning a VLM for art history in hours, not weeks

iconclass-vlm generates museum catalog codes (fun fact: "71H7131" = "Bathsheba with David's letter"!)

@hf.co TRL + Jobs = magic ✨

Guide here: danielvanstrien.xyz/posts/2025/i...
danielvanstrien.xyz
danielvanstrien.bsky.social
I fine-tuned a smol VLM to generate specialized art history metadata!

iconclass-vlm: Qwen2.5-VL-3B trained using SFT to generate ICONCLASS codes (think Dewey Decimal for art!)

Trained with @hf.co TRL + Jobs - single UV script, no GPU needed!

Blog soon!
Screenshot of the iconclass-vlm model demo showing predictions for a 17th century portrait painting of a standing woman in black dress with white ruff collar. The interface displays the model's raw JSON prediction with ICONCLASS codes, then compares predictions against ground truth labels in two columns. Model correctly identifies "31A231 standing figure" and "61B(+55) historical persons (portraits and scenes from the life) (+ full length portrait)" among others, achieving 3 out of 6 matches. Some predictions marked as "Not a valid iconclass label" showing areas where the model needs improvement.
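A minimal inference sketch using the transformers image-text-to-text pipeline; the repo id, image URL, and prompt here are assumptions for illustration, so check the model card for the actual usage.

from transformers import pipeline

# Assumed repo id for the fine-tuned model.
pipe = pipeline("image-text-to-text", model="davanstrien/iconclass-vlm")

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "url": "https://example.org/portrait.jpg"},
        {"type": "text", "text": "Generate ICONCLASS codes for this image."},
    ],
}]

result = pipe(text=messages, max_new_tokens=128)
print(result[0]["generated_text"])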
danielvanstrien.bsky.social
Try it with one line of code via Jobs!

It processes images from any dataset and outputs a new dataset with extracted markdown - all using HF GPUs.

See the full OCR uv scripts collection: huggingface.co/datasets/uv-...
Screenshot of a hf jobs uv run command with some flags and a URL pointing to a script.
danielvanstrien.bsky.social
What if OCR models could show you their thought process?

NuMarkdown-8B-Thinking from NuMind (YC S22) doesn't just extract text - it reasons through documents first.

Could be pretty valuable for weird historical documents?

Example here: davanstrien-ocr-time-capsule.static.hf.space/index.html?d...
Screenshot of an app showing an image from a page + model reasoning showing how the model is parsing the text and layout.
danielvanstrien.bsky.social
You can now generate synthetic data using OpenAI's GPT OSS models on @hf.co Jobs!

One command, no setup:

hf jobs uv run --flavor l4x4 [script-url] \
--input-dataset your/dataset \
--output-dataset your/output

Works on L4 GPUs ⚡

huggingface.co/datasets/uv-...
uv-scripts/openai-oss · Datasets at Hugging Face
huggingface.co
danielvanstrien.bsky.social
I’m continuing my experiments with VLM-based OCR…

How well do these models handle Victorian theatre playbills from @bldigischol.bsky.social?

RolmOCR vs traditional OCR on tricky playbills (ornate fonts, faded ink, DRAMATIC ALL CAPS!)

@hf.co Demo: huggingface.co/spaces/davan...
Screenshot of a playbill with some OCR results on the right.
danielvanstrien.bsky.social
It's often not documented, but "traditional" OCR in this case is whatever libraries and archives used in the past to generate some OCR. My goal with this work is mainly to see how much better VLMs might be (and in which situations), to get a better sense of when redoing OCR might be worth it.
Reposted by Daniel van Strien
danielvanstrien.bsky.social
Many VLM-based OCR models have been released recently. Are they useful for libraries and archives?

I made a quick Space to compare VLM OCR with "traditional" OCR using 11k Scottish exam papers from @natlibscot.bsky.social

huggingface.co/spaces/davanstrien/ocr-time-capsule
Screenshot of the app showing a page from a book + different views of existing and new ocr.
danielvanstrien.bsky.social
I'm planning to add more example datasets & OCR models using HF Jobs. Feel free to suggest collections to test with: I need image + existing OCR!

Even better: upload your GLAM datasets to @hf.co! 🤗
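If you're wondering what shape works for the comparison Space, here's a hedged sketch of a minimal image + existing-OCR dataset push; the column names and repo id are illustrative, not a required schema.

from datasets import Dataset, Image

# Two columns: the page image and whatever OCR text already exists for it.
ds = Dataset.from_dict({
    "image": ["scans/page_001.jpg", "scans/page_002.jpg"],
    "ocr_text": ["existing OCR for page 1...", "existing OCR for page 2..."],
}).cast_column("image", Image())

ds.push_to_hub("your-org/your-glam-dataset")  # illustrative repo id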