Etienne Bernard
etiennebcp.bsky.social
CEO@NuMind - Making NuExtract LLMs

previously: head of ML@WolframResearch // Physics PhD @ ENS & MIT
Quick links:
- Hugging Face model: huggingface.co/numind/NuMar...
- GitHub repository: github.com/numindai/NuM...
- Discord server: discord.gg/3tsEtJNCDe
August 7, 2025 at 11:52 AM
We find that NuMarkdown outperforms generalist non-reasoning models like GPT-4o and specialized OCR models like OCRFlux, and is competitive with large closed-source reasoning models like Gemini 2.5!

Congrats to Alexandre Constantin for this tour de force
August 7, 2025 at 11:14 AM
This model is a fine-tune of Qwen 2.5-VL-7B on synthetic Doc → Reasoning → Markdown examples, followed by an RL phase (GRPO) with a layout-centric reward.
August 7, 2025 at 11:14 AM
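For readers unfamiliar with GRPO, here is a minimal sketch of its core idea: several completions are sampled for the same input, each gets a scalar reward, and each reward is normalized against the rest of its group to produce an advantage. The reward values below are made up for illustration, and the function name is hypothetical. The actual layout-centric reward used for NuMarkdown is not described in the post, so it is not modeled here. Using the population standard deviation and a small epsilon is also an assumption of this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of completions sampled for the same input.

    Completions that score above the group mean get a positive advantage,
    those below get a negative one.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward scores for 4 Markdown outputs sampled for one document.
rewards = [0.9, 0.4, 0.4, 0.1]
advantages = group_relative_advantages(rewards)
```

The best-scoring completion ends up with the largest positive advantage, and the advantages within a group sum to roughly zero, so the update signal depends only on how a completion compares to its siblings.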
It generates thought tokens to figure out the layout of the document before generating the Markdown. It is particularly good at understanding documents with weird layouts and complex tables. The number of thinking tokens can vary from 20% to 500% of the length of the final answer, depending on task difficulty.
August 7, 2025 at 11:14 AM
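To make the 20% to 500% range concrete, here is a small back-of-the-envelope sketch for sizing a generation budget. The function names are hypothetical, and the expected answer length is something you would have to estimate yourself. Only the two percentages come from the post.

```python
def thinking_token_range(answer_tokens, low_pct=20, high_pct=500):
    """Return the (min, max) thinking tokens implied by the quoted range."""
    return answer_tokens * low_pct // 100, answer_tokens * high_pct // 100


def generation_budget(answer_tokens, high_pct=500):
    """Worst-case total tokens to allow: thinking at the upper bound plus the answer."""
    return answer_tokens + answer_tokens * high_pct // 100


print(thinking_token_range(1000))  # (200, 5000)
print(generation_budget(1000))     # 6000
```

So a page whose Markdown is around 1,000 tokens could need up to about 6,000 generated tokens in total on a hard layout.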