Luca
@sciencialab.com
On 26-27 November we held the #Grobid Camp at the #Inria Paris Centre.
The goal was to bring together the major players in the French community, which spans government institutes, companies, and large-scale projects.
1/4
December 8, 2025 at 12:59 PM
Me: Please, fix the tests!
LLM Agent: OK
November 1, 2025 at 9:49 PM
The Safari browser is like a car with only one gear that claims it does not pollute...
August 24, 2025 at 2:11 PM
Exactly! There is a common misconception that throwing any kind of crap into a vector database will magically work. Even in the age of AI, metadata cannot be ignored.
Vector search is good, but it's not enough.
Metadata adds a layer of logic on top of your text chunks.
This lets you filter by date or source first, then retrieve all the related pieces at once.
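A minimal in-memory sketch of that two-step idea: filter on metadata first, then rank the survivors by vector similarity. The `Chunk` fields, the `filtered_search` helper, and the toy two-dimensional embeddings are all hypothetical; a real vector database would apply the metadata filter inside its own index query rather than in a Python list comprehension.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Chunk:
    text: str
    embedding: list[float]
    # Hypothetical metadata attached to each chunk at indexing time.
    source: str
    published: date


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def filtered_search(chunks, query_vec, *, source=None, since=None, k=3):
    # Step 1: exact metadata filter narrows the candidate set.
    candidates = [
        c for c in chunks
        if (source is None or c.source == source)
        and (since is None or c.published >= since)
    ]
    # Step 2: rank only the remaining chunks by similarity to the query.
    candidates.sort(key=lambda c: cosine(c.embedding, query_vec), reverse=True)
    return candidates[:k]


chunks = [
    Chunk("Grobid 0.8.2 release notes", [0.9, 0.1], "blog", date(2025, 5, 18)),
    Chunk("Old Grobid changelog", [0.95, 0.05], "blog", date(2023, 1, 10)),
    Chunk("Unrelated cooking recipe", [0.0, 1.0], "forum", date(2025, 6, 1)),
]
hits = filtered_search(chunks, [1.0, 0.0], source="blog", since=date(2025, 1, 1))
print([h.text for h in hits])  # → ['Grobid 0.8.2 release notes']
```

The old changelog is actually the closest match to the query vector, but the date filter drops it before ranking. Pure vector search cannot express that kind of constraint on its own.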

#RAG #AI #VectorSearch
July 28, 2025 at 7:25 AM
Reposted by Luca
Yes. The time is now. Vaccines to treat and prevent cancer.
www.jci.org/articles/vie...
July 1, 2025 at 5:57 PM
Grobid 0.8.2 is out! 🚀
- 🧠 New processing "flavors" for different doc types (e.g. SDO, corrections, editorials)
- 🔗 Improved URL extraction
- ✅ Better text extraction for paragraphs around figures and tables
🧵🔽
May 18, 2025 at 8:27 AM
Dear @github, would it be possible to save certain "search parameters" in issues/pulls, so that we can keep our work focused on the important tasks? E.g. when working on a specific milestone, I want to see everything that is not yet done:
May 11, 2025 at 6:49 AM
Reposted by Luca
GROBID by Patrice Lopez turns messy PDFs into well-structured text in TEI format including references- super useful! https://github.com/kermitt2/grobid
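A minimal sketch of reading that TEI output with Python's standard library. The sample document is a hand-written fragment that assumes Grobid's usual layout (the paper title under `teiHeader/fileDesc/titleStmt`, each reference as a `biblStruct` inside a `listBibl`); it is not real Grobid output, and `summarize_tei` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# TEI elements live in this namespace; ElementTree needs it mapped to a prefix.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def summarize_tei(tei_xml: str) -> dict:
    """Extract the document title and the title of each cited reference."""
    root = ET.fromstring(tei_xml)  # root is the <TEI> element
    title_el = root.find("tei:teiHeader/tei:fileDesc/tei:titleStmt/tei:title", TEI_NS)
    title = "".join(title_el.itertext()).strip() if title_el is not None else None
    refs = []
    for bibl in root.findall(".//tei:listBibl/tei:biblStruct", TEI_NS):
        # Take the first title found anywhere inside the reference entry.
        t = bibl.find(".//tei:title", TEI_NS)
        refs.append("".join(t.itertext()).strip() if t is not None else "")
    return {"title": title, "references": refs}


SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title level="a" type="main">A Sample Paper</title></titleStmt>
    </fileDesc>
  </teiHeader>
  <text>
    <back>
      <div type="references">
        <listBibl>
          <biblStruct xml:id="b0">
            <analytic><title level="a" type="main">First cited work</title></analytic>
          </biblStruct>
          <biblStruct xml:id="b1">
            <monogr><title level="m">Second cited book</title></monogr>
          </biblStruct>
        </listBibl>
      </div>
    </back>
  </text>
</TEI>"""

summary = summarize_tei(SAMPLE)
print(summary["title"], len(summary["references"]))  # → A Sample Paper 2
```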
December 1, 2024 at 3:55 PM
Reposted by Luca
To what extent do researchers funded by Dutch Research Council NWO and ZonMw share the research data and code underlying their publications?

Today we published an analysis based on 10,000+ papers using the open source tool Grobid: www.nwo.nl/en/news/shar...

All underlying data openly available!
February 10, 2025 at 9:10 PM
Grobid's popularity is still growing, despite LMs, LLMs, LLLMs...
May 6, 2025 at 2:57 PM
Unsolicited suggestion: if you don't want ads on Twitter anymore, you can pay 200 EUR per year (100 EUR only cut them in half, lol), or you can pay 0 EUR and
1/2
November 22, 2024 at 8:35 AM
Here are a few tips for using Bluesky: github.com/JefTek/Blues...
GitHub - JefTek/BlueskyGuide: Collection of Tips & Tricks for collaborating on Bluesky
November 21, 2024 at 8:13 AM