“No accomplishment by any individual justifies receiving the same compensation as 3M people working full-time.”
Not that 1926 Norwegian statistical tables are a generally useful benchmark…
We tested every major parser on real enterprise documents.
The results will change how you think about OCR accuracy 🧵
Learn how to extract data from unstructured documents with @tensorlake.ai, store it in @qdrant.bsky.social, and then use @langchain.bsky.social for natural language querying.
Check out our lesson 👇
In the free Qdrant Essentials Course, learn how to:
- Architect vector-powered data lakes
- Optimize ETL pipelines
- Create knowledge graphs
- Integrate @langchain.bsky.social agents for natural language queries
t.co/OoPZswrL7z
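A rough sketch of that extract → store → query flow, assuming a local Qdrant instance: the Tensorlake parse is reduced to a placeholder function, and the LangChain step is reduced to "embed the question, search, hand the hits to an LLM". Collection and model names are arbitrary.

```python
# Sketch only: placeholder parse step, real Qdrant calls.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct
from sentence_transformers import SentenceTransformer

def parse_with_tensorlake(path: str) -> list[str]:
    # Placeholder: swap in your actual Tensorlake parse call here.
    return ["Example chunk of extracted text.", "Another chunk."]

encoder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(
        size=encoder.get_sentence_embedding_dimension(),
        distance=Distance.COSINE,
    ),
)

chunks = parse_with_tensorlake("contract.pdf")

client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=i,
            vector=encoder.encode(chunk).tolist(),
            payload={"text": chunk, "source": "contract.pdf"},
        )
        for i, chunk in enumerate(chunks)
    ],
)

# Retrieval side: embed the question, pull the closest chunks,
# then pass them to whatever LLM chain you're using for the answer.
hits = client.search(
    collection_name="documents",
    query_vector=encoder.encode("What are the termination terms?").tolist(),
    limit=3,
)
```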
We're using VLMs for:
- Page classification in large documents
- Table/figure summarization
- Fast structured extraction (skip_ocr mode)
Here's what this means for document processing 🧵
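Page classification is the easiest of the three to sketch: render the page to an image and ask a general-purpose VLM for a label. This is an illustration of the pattern only, not our actual pipeline; the model name and label set are placeholders.

```python
# Illustrative VLM page classification: PDF page -> PNG -> label.
import base64
from io import BytesIO
from pdf2image import convert_from_path   # renders PDF pages to PIL images
from openai import OpenAI

client = OpenAI()
LABELS = ["invoice", "contract", "financial_table", "correspondence", "other"]

def classify_page(pdf_path: str, page_number: int) -> str:
    page = convert_from_path(pdf_path, first_page=page_number, last_page=page_number)[0]
    buf = BytesIO()
    page.save(buf, format="PNG")
    image_b64 = base64.b64encode(buf.getvalue()).decode()

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": f"Classify this page as one of: {', '.join(LABELS)}. "
                         "Reply with the label only."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content.strip()
```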
That means:
❌ Lost audit trails
❌ Manual review of revision history
❌ No programmatic access to reviewer comments
❌ Workflows that can't route based on specific edits
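The frustrating part is that the edits and comments are sitting right there in the file. A .docx is a zip of OOXML parts: tracked changes live in word/document.xml as w:ins / w:del elements, and reviewer comments in word/comments.xml. A rough sketch of reading them directly (error handling omitted):

```python
# Pull tracked changes and reviewer comments straight out of a .docx.
import zipfile
from lxml import etree

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

def revisions_and_comments(path: str):
    with zipfile.ZipFile(path) as z:
        doc = etree.fromstring(z.read("word/document.xml"))
        revisions = [
            {
                "type": "insert" if el.tag == f"{{{W}}}ins" else "delete",
                "author": el.get(f"{{{W}}}author"),
                "date": el.get(f"{{{W}}}date"),
                "text": "".join(el.itertext()),
            }
            for el in doc.iter(f"{{{W}}}ins", f"{{{W}}}del")
        ]
        comments = []
        if "word/comments.xml" in z.namelist():
            croot = etree.fromstring(z.read("word/comments.xml"))
            comments = [
                {"author": c.get(f"{{{W}}}author"), "text": "".join(c.itertext())}
                for c in croot.findall("w:comment", NS)
            ]
    return revisions, comments
```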
When you need an accurate representation of the document (e.g. header levels), you need more than OCR.
Tensorlake fixes OCR results, detecting and correcting header levels when parsing. 👇
Section 2.2 becomes a second-level header (##) instead of an over-nested one (###).
We just shipped automatic header correction.
🧵 How it works:
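A toy version of the idea, not the shipped implementation: when a heading carries its own numbering, the numbering is a more reliable depth signal than OCR font-size guesses.

```python
# Toy heuristic: derive markdown header depth from section numbering.
# "2" -> "#", "2.2" -> "##", "2.2.1" -> "###", and so on.
import re

NUMBERED = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+(.+)$")

def correct_header(line: str) -> str:
    match = NUMBERED.match(line.strip())
    if not match:
        return line                      # not a numbered heading, leave it alone
    numbering, title = match.groups()
    depth = numbering.count(".") + 1     # "2.2" has two components -> "##"
    return f"{'#' * depth} {numbering} {title}"

print(correct_header("2.2 Payment Terms"))   # -> "## 2.2 Payment Terms"
```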
(She’s the Dalmatian)
Or just my kid?
Being able to add spatial information from the original parse API call makes it super easy, but I'm curious how others are handling it.
When users ask "where did this come from?" your system should point to the exact page fragment...not just "file_name.pdf".
Building citation-aware RAG with spatial metadata means:
→ Parse docs with bounding boxes
→ Embed citation anchors in chunks
→ Return page numbers + coordinates
A 🧵
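Here's the minimal shape I mean by "citation anchors in chunks"; field names are illustrative, not any particular parser's schema.

```python
# Carry the citation anchor (page + bounding box) with every chunk so the
# retriever can hand it straight back with the answer.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str
    page: int
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates

    def citation(self) -> str:
        x0, y0, x1, y1 = self.bbox
        return f"{self.source}, p.{self.page} @ ({x0:.0f},{y0:.0f})-({x1:.0f},{y1:.0f})"

# Whatever you store alongside the embedding (Qdrant payload, pgvector column, ...)
# should include these fields, so a retrieved hit can answer
# "where did this come from?" with more than a filename.
chunk = Chunk(
    text="Termination requires 30 days written notice.",
    source="contract.pdf", page=4, bbox=(72.0, 310.0, 540.0, 362.0),
)
print(chunk.citation())   # contract.pdf, p.4 @ (72,310)-(540,362)
```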
Those of us working in the space know that the bar is set *much* higher than that. AI is a tool. You wouldn't hammer everything like it's a nail and then shrug when a window breaks.
Use tools intelligently.
Every answer should come with receipts (citations + context).
Learn how to make your AI correct and verifiable in this month’s Document Digest newsletter 👇
How are smaller companies deciding between what to build, what infra to pay directly for, and what tools to just leverage?
This is the beginning of better integration between Microsoft Azure and Tensorlake.
If you are using Azure and need better document ingestion and ETL for unstructured data, reach out to us!
Get citations for every field extracted with Tensorlake.
Read the blog and try our citations with the example notebooks: tlake.link/blog/citations
Great article by @guthals.com.
dev.to/drguthals/th...
It's why I love this industry: it's really all about learning 🤓
Regardless of my title or how tech evolves
It was fun coming up with the Colab notebooks for this one: compare the claims made in news articles about Tesla with actual Tesla SEC filings 👀
Check out the thread and blog (with notebooks) 👇
What’s dead is cosine top‑N retrieval without a retrieval plan.
We ship advanced RAG...out of the box:
• Classify pages → target sections
• Extract structured fields → filter by form_type, fiscal_period
• Verify data; cite page/bbox
Want to know how? 🧵👇
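The "retrieval plan" part, sketched with Qdrant as one example of a store with payload filters: filter on the extracted form_type / fiscal_period fields first, then run vector similarity. Collection, model, and values here are made up.

```python
# Metadata-filtered retrieval: structured fields narrow the search space
# before cosine similarity ever runs.
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
client = QdrantClient(url="http://localhost:6333")

claim = "Tesla reported record automotive gross margin this quarter."

hits = client.search(
    collection_name="sec_filings",
    query_vector=encoder.encode(claim).tolist(),
    query_filter=Filter(must=[
        FieldCondition(key="form_type", match=MatchValue(value="10-Q")),
        FieldCondition(key="fiscal_period", match=MatchValue(value="2024-Q2")),
    ]),
    limit=5,
)

# Each hit's payload would carry page/bbox for the citation step.
for h in hits:
    print(h.score, h.payload.get("page"), h.payload.get("text", "")[:80])
```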
I’m so grateful he ever came into my life.
I am forever changed by him in the best ways, and would have never survived the last three years without him.
If you need to rest, then fine, but I’d much prefer if you haunt me