Sarah Guthals
guthals.com
Sarah Guthals
@guthals.com
Engineer, writer, advocate, mom
Reposted by Sarah Guthals
www.nytimes.com/shared/comme... “We have reached peak insanity: $100B in annual compensation is equivalent to approximately 3M full-time minimum-wage workers (at $16/hr) for one year.
No accomplishment by any individual justifies receiving the same compensation as 3M people working full-time.”
Read a Times Reader's Comment on: Elon Musk Wins $1 Trillion Tesla Pay Package
Tesla shareholders approved a plan to grant Elon Musk shares worth nearly $1 trillion if he meets ambitious goals, including vastly expanding the company’s stock market valuation.
www.nytimes.com
November 7, 2025 at 3:02 AM
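The arithmetic in the reader's comment above can be checked with a few lines of Python. This is only a sketch: the 40-hour week and 52-week year are assumptions, since the comment does not say how "full-time" was defined.

```python
HOURLY_WAGE = 16                 # dollars per hour, from the comment
HOURS_PER_YEAR = 40 * 52         # assumption: 40 h/week for 52 weeks
ANNUAL_COMPENSATION = 100_000_000_000  # $100B, from the comment


def equivalent_workers(compensation: float, wage: float, hours: float) -> float:
    """Number of full-time workers whose yearly pay adds up to `compensation`."""
    return compensation / (wage * hours)


workers = equivalent_workers(ANNUAL_COMPENSATION, HOURLY_WAGE, HOURS_PER_YEAR)
print(f"{workers / 1e6:.2f} million workers")
```

Under those assumptions the result lands at roughly 3 million, which matches the comment's "approximately 3M".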
Reposted by Sarah Guthals
The tensorlake playground was, unlike AWS textract and every other tool I have tried, able to parse my angled, low-quality scan of Norwegian pay statistics from 1926.

Not that 1926 Norwegian statistical tables are a generally useful benchmark…
Document parsing benchmarks have been measuring the wrong thing.

We tested every major parser on real enterprise documents.

The results will change how you think about OCR accuracy 🧵
November 5, 2025 at 7:52 PM
It's time to start measuring accuracy of data extraction with downstream systems and usability in mind, not just vanity metrics for a marketing slide
Document parsing benchmarks have been measuring the wrong thing.

We tested every major parser on real enterprise documents.

The results will change how you think about OCR accuracy 🧵
November 5, 2025 at 5:53 PM
Reposted by Sarah Guthals
Document parsing benchmarks have been measuring the wrong thing.

We tested every major parser on real enterprise documents.

The results will change how you think about OCR accuracy 🧵
November 5, 2025 at 5:05 PM
Make your agents smarter with accurate and complete data

Learn how to extract data from unstructured documents with @tensorlake.ai, store it in @qdrant.bsky.social, and then use @langchain.bsky.social for natural language querying.

Check out our lesson 👇
Want to build scalable data lakes w/ Tensorlake + @qdrant.bsky.social?

In the free Qdrant Essentials Course, learn how to:
- Architect vector-powered data lakes
- Optimize ETL pipelines
- Create knowledge graphs
- Integrate @langchain.bsky.social agents for natural language queries

t.co/OoPZswrL7z
October 23, 2025 at 7:39 PM
Reposted by Sarah Guthals
Want to build scalable data lakes w/ Tensorlake + @qdrant.bsky.social?

In the free Qdrant Essentials Course, learn how to:
- Architect vector-powered data lakes
- Optimize ETL pipelines
- Create knowledge graphs
- Integrate @langchain.bsky.social agents for natural language queries

t.co/OoPZswrL7z
October 23, 2025 at 7:37 PM
Reposted by Sarah Guthals
New: Vision Language Models now power key document processing features
We're using VLMs for:
- Page classification in large documents
- Table/figure summarization
- Fast structured extraction (skip_ocr mode)

Here's what this means for document processing 🧵
October 16, 2025 at 9:44 PM
Reposted by Sarah Guthals
Most parsers strip all tracked changes when you extract the text.

That means:
❌ Lost audit trails
❌ Manual review of revision history
❌ No programmatic access to reviewer comments
❌ Workflows that can't route based on specific edits
October 10, 2025 at 5:25 PM
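To make the point above concrete, here is a minimal sketch of reading tracked changes programmatically. It assumes the document is a Word .docx file (the post does not name a format), where insertions and deletions are stored as `w:ins` and `w:del` elements in `word/document.xml`. The function name `tracked_changes` is hypothetical, and this is not Tensorlake's implementation.

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by .docx files
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def tracked_changes(document_xml: str) -> list[dict]:
    """Collect tracked insertions and deletions with author and date.

    Results are grouped by kind (all insertions, then all deletions),
    not returned in document order.
    """
    root = ET.fromstring(document_xml)
    changes = []
    # Inserted text lives in w:t, deleted text in w:delText
    for kind, tag, text_tag in (("insert", "ins", "t"), ("delete", "del", "delText")):
        for node in root.iter(f"{{{W}}}{tag}"):
            text = "".join(t.text or "" for t in node.iter(f"{{{W}}}{text_tag}"))
            changes.append({
                "kind": kind,
                "author": node.get(f"{{{W}}}author"),
                "date": node.get(f"{{{W}}}date"),
                "text": text,
            })
    return changes
```

For a real file, the XML string can be read with `zipfile.ZipFile(path).read("word/document.xml").decode("utf-8")`. Reviewer comments are stored in a separate part of the package and would need their own pass.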
OCR has its limitations when it comes to document layout/structure.

When you need to have an accurate representation of the document (e.g. header levels), you need something more than OCR.

Tensorlake fixes OCR results, detecting and correcting header levels when parsing. 👇
OCR engines constantly mess up document hierarchy.

Section 2.2 becomes a top-level header (##) instead of nested (###).

We just shipped automatic header correction.

🧵 How it works:
October 2, 2025 at 4:21 PM
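One simple way to picture the header problem described above is to derive the Markdown level from the section number. The sketch below is a naive heuristic, not Tensorlake's method. It assumes headers start with a dotted section number and follows the convention in the post, where "2" is `##` and "2.2" is `###`. The name `correct_header_levels` is hypothetical.

```python
import re

# Matches e.g. "## 2.2 Methods" or "## 2. Background"
HEADER_RE = re.compile(r"^(#+)\s+((\d+(?:\.\d+)*)\.?\s+.*)$")


def correct_header_levels(markdown: str) -> str:
    """Rewrite numbered headers so their level follows the section number."""
    fixed = []
    for line in markdown.splitlines():
        match = HEADER_RE.match(line)
        if match:
            # "2" has depth 1, "2.2" has depth 2, "2.2.1" has depth 3
            depth = match.group(3).count(".") + 1
            # The document title is assumed to own "#", so sections start at "##"
            line = "#" * (depth + 1) + " " + match.group(2)
        fixed.append(line)
    return "\n".join(fixed)
```

Headers without a leading section number are left untouched, so this only helps with explicitly numbered documents.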
Reposted by Sarah Guthals
OCR engines constantly mess up document hierarchy.

Section 2.2 becomes a top-level header (##) instead of nested (###).

We just shipped automatic header correction.

🧵 How it works:
October 2, 2025 at 4:21 PM
Anyone else notice that Chloe’s address number is 101?

(She’s the Dalmatian)

Or just my kid?
September 27, 2025 at 2:34 PM
How do you handle citations in RAG?

Being able to add spatial information from the original parse API call makes it super easy, but I'm curious how others are handling it.
Citations.

When users ask "where did this come from?" your system should point to the exact page fragment...not just "file_name.pdf".

Build citation-aware RAG with spatial metadata:
→ Parse docs with bounding boxes
→ Embed citation anchors in chunks
→ Return page numbers + coordinates

A 🧵
September 19, 2025 at 6:13 PM
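The three steps in the quoted post can be sketched in plain Python. Everything here is an assumption made for illustration: the `Fragment` shape, the `[c0]`-style anchor format, and the function names are hypothetical and are not taken from the Tensorlake API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One parsed page fragment with its location (hypothetical shape)."""
    text: str
    page: int
    bbox: tuple[float, float, float, float]  # (x1, y1, x2, y2)


def build_chunk(fragments: list[Fragment], start_id: int = 0) -> tuple[str, dict[str, Fragment]]:
    """Join fragments into one chunk, appending a citation anchor to each."""
    anchors: dict[str, Fragment] = {}
    parts = []
    for offset, frag in enumerate(fragments):
        anchor = f"c{start_id + offset}"
        anchors[anchor] = frag
        parts.append(f"{frag.text} [{anchor}]")
    return " ".join(parts), anchors


ANCHOR_RE = re.compile(r"\[(c\d+)\]")


def resolve_citations(answer: str, anchors: dict[str, Fragment]) -> list[dict]:
    """Map anchors found in a model answer back to page numbers and coordinates."""
    cited = []
    for anchor in ANCHOR_RE.findall(answer):
        frag = anchors.get(anchor)
        if frag is not None:  # ignore anchors the model made up
            cited.append({"anchor": anchor, "page": frag.page, "bbox": frag.bbox})
    return cited
```

The anchored chunk text is what would be embedded and passed to the model. If the model is prompted to keep the anchors in its answer, `resolve_citations` turns them back into a page number and bounding box instead of just a file name.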
Reposted by Sarah Guthals
Citations.

When users ask "where did this come from?" your system should point to the exact page fragment...not just "file_name.pdf".

Build citation-aware RAG with spatial metadata:
→ Parse docs with bounding boxes
→ Embed citation anchors in chunks
→ Return page numbers + coordinates

A 🧵
September 19, 2025 at 5:44 PM
"Because the AI said so" is exactly the kind of future we don't want to move towards.

Those of us working in the space know that the bar is set *much* higher than that. AI is a tool. You wouldn't just hammer everything and shrug when a window breaks.

Use tools intelligently.
“Because the AI said so” isn’t good enough.

Every answer should come with receipts (citations + context).

Learn how to make your AI correct and verifiable in this month’s Document Digest newsletter 👇
The Document Digest by Tensorlake
Product updates and dev insights from the Tensorlake team.
tlake.link
September 11, 2025 at 6:49 PM
I was listening to Monday's AI Daily Brief about infra costs/investment wrt AI and it got me wondering:
How are smaller companies deciding between what to build, what infra to pay directly for, and what tools to just leverage?
September 10, 2025 at 10:52 PM
Back in my day you had to take seeds out of watermelon unless you wanted a watermelon to grow in your stomach
September 5, 2025 at 7:52 PM
Reposted by Sarah Guthals
You can now log in to Tensorlake using Microsoft and Azure SSO credentials.

This is the beginning of better integration between Microsoft Azure and Tensorlake.

If you are using Azure and need better Document Ingestion and ETL for unstructured data, reach out to us!
September 5, 2025 at 5:49 PM
Humans will always be in the loop, so our tools should make data extracted with AI easily verifiable.
To build trustworthy AI, your data needs proof.

Get citations for every field extracted with Tensorlake.

Read the blog and try our citations with the example notebooks: tlake.link/blog/citations
September 5, 2025 at 4:30 PM
Reposted by Sarah Guthals
Just noticed this on the checkout notification I get from the Chicago Public Library. I think it’s new and I love it!
August 29, 2025 at 8:19 PM
Reposted by Sarah Guthals
Always happy when something I've been thinking about, someone else actually writes, posts, and shares! I'm glad to see this conceptual shift from "vibe coding" to context engineering with AI-enabled developer tools! 🥳

Great article by @guthals.com.

dev.to/drguthals/th...
The Mythical Vibe-Month: Vibe Coding, Context Engineering, and the Future of AI Dev Tools
In The Mythical Man-Month, Fred Brooks famously wrote: The magic of myth and legend has come true...
dev.to
August 26, 2025 at 6:22 PM
"If Brooks were writing today, I think he’d smile at the idea of The Mythical Vibe-Month. But he’d also remind us that engineering discipline is what makes software scale."

It's why I love this industry: it's really all about learning 🤓

Regardless of my title or how tech evolves
The Mythical Vibe-Month: Vibe Coding, Context Engineering, and the Future of AI Dev Tools
In The Mythical Man-Month, Fred Brooks famously wrote: The magic of myth and legend has come true...
dev.to
August 21, 2025 at 3:44 AM
RAG isn’t dead.

It was fun coming up with the colab notebooks for this one: Compare the claims made in news articles about Tesla with actual Tesla SEC filings 👀

Check out the thread and blog (with notebooks) 👇
“RAG is dead” is lazy.

What’s dead is cosine‑N without a retrieval plan.

We ship advanced RAG...out of the box:
• Classify pages → target sections
• Extract structured fields → filter by form_type, fiscal_period
• Verify data; cite page/bbox

Want to know how? 🧵👇
August 21, 2025 at 3:00 AM
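The contrast between plain "cosine-N" and a retrieval plan can be sketched as a metadata filter applied before similarity ranking. This is a toy illustration, not Tensorlake's pipeline. The field names `form_type` and `fiscal_period` come from the quoted post, while the chunk layout, the `retrieve` function, and the sample values used with it are assumptions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: list[float], chunks: list[dict],
             form_type: str, fiscal_period: str, top_n: int = 3) -> list[dict]:
    """Filter chunks by extracted fields first, then rank the survivors by cosine."""
    candidates = [
        c for c in chunks
        if c["form_type"] == form_type and c["fiscal_period"] == fiscal_period
    ]
    ranked = sorted(candidates, key=lambda c: cosine(query_vec, c["embedding"]), reverse=True)
    return ranked[:top_n]
```

Each returned chunk can carry its page number and bounding box, so the final answer can cite where a figure came from. A chunk from the wrong filing never reaches the ranking step, no matter how similar its embedding is.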
Genuinely curious…why are people over 30 on Snapchat?
July 26, 2025 at 9:52 PM
My best friend, platonic soul mate, died today.

I’m so grateful he ever came into my life.

I am forever changed by him in the best ways, and would have never survived the last three years without him.

If you need to rest, then fine, but I’d much prefer if you haunt me
July 17, 2025 at 6:59 AM