Lightnews — Scholar-powered news

Reposted by Oussama

Alex Miller

@alexmillerdb.bsky.social

Recording for those who missed the talk:
www.youtube.com/watch?v=Xdg3...

ChatGPT Ain’t Got $%@& On Me! The Future of Automated Database Tuning

YouTube video by South Bay Systems

www.youtube.com

August 8, 2025 at 12:46 AM

Oussama

@oussamasaoudi.com

I learned about this while listening to
@tlbh.it's episode on Parsers: tlbh.it/005_parsers....

This Stack Overflow thread also helped shed light on the motivation: stackoverflow.com/questions/55...

TLB hit 💥 Parsers Podcast Notes

tlbh.it

June 23, 2025 at 3:39 AM

Oussama

@oussamasaoudi.com

The motivation for `SETEND` is to support manipulating big-endian data (e.g., data arriving from the network) in an otherwise little-endian application.

`SETEND` was later deprecated in ARMv8. 🪦

You can check out the docs here: developer.arm.com/documentatio...

Documentation – Arm Developer

developer.arm.com

June 23, 2025 at 3:39 AM
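
Without an instruction like `SETEND`, a little-endian application converts big-endian data explicitly in software. A minimal sketch of that conversion, assuming a made-up 8-byte packet layout (the `packet` bytes and field names are hypothetical, not taken from the post or the Arm docs):

```python
import struct
import sys

# Hypothetical 8-byte network packet: a 32-bit length followed by a 32-bit id,
# both stored big-endian ("network byte order").
packet = bytes([0x00, 0x00, 0x01, 0x00, 0x12, 0x34, 0x56, 0x78])

# ">" tells struct to decode as big-endian regardless of the host byte order.
length, ident = struct.unpack(">II", packet)

# Decoding the same bytes as little-endian yields different numbers,
# which is the mistake an explicit conversion guards against.
wrong_length, wrong_ident = struct.unpack("<II", packet)

print("host byte order:", sys.byteorder)
print("big-endian read:   ", hex(length), hex(ident))
print("little-endian read:", hex(wrong_length), hex(wrong_ident))
```

The format prefix plays the role of the endianness switch here: the application states the byte order per read instead of changing a processor-wide setting.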

Oussama

@oussamasaoudi.com

Most systems lie in between. They expose a bunch of knobs to tune scalability and replication, and these often trade off against consistency and fault tolerance. Some knobs that come to mind: partitioning, consistency level, how many machines must be queried for a read to count as successful/up to date, etc.

January 7, 2025 at 8:19 PM
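
One way to picture the "how many machines are queried" knob is the quorum-overlap rule used by many replicated stores: with `n` replicas, writes acknowledged by `w` of them, and reads querying `r` of them, a read is guaranteed to touch at least one replica holding the latest write when `r + w > n`. The post does not name this rule; it is brought in here as an illustration, and the function name is hypothetical. The sketch checks the overlap by brute force and ignores failures and concurrent writes.

```python
from itertools import combinations


def read_always_sees_latest(n: int, w: int, r: int) -> bool:
    """Return True if every set of r replicas overlaps every set of w replicas."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)  # an empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


# Try a few settings of the knob on a 3-replica system.
for w, r in [(3, 1), (2, 2), (1, 1), (1, 3)]:
    print(f"n=3 w={w} r={r} overlap={read_always_sees_latest(3, w, r)}")
```

Lower `r` makes reads cheaper but pushes the cost onto writes (higher `w`) or gives up the overlap guarantee, which is the tradeoff the post describes.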

Oussama

@oussamasaoudi.com

On the other hand, if you sacrifice consistency, you can scale to n machines and only ever consult any one of them for a read. So we have n machines, and each read is processed only once.

These are toy examples, but illustrate the extremes.

January 7, 2025 at 8:19 PM

Oussama

@oussamasaoudi.com

I think it depends on the system. Consider a replication scheme that requires each read to consult all participating machines. In other words, we scaled to n machines but made each of them process every read, doing n times the work. We didn’t gain any capacity to process more reads!

January 7, 2025 at 8:19 PM
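
The two extremes in this thread can be put into a back-of-the-envelope capacity model. This is a toy sketch under stated assumptions: load spreads evenly, coordination costs nothing, and each machine can serve a fixed number of reads per second. The function and parameter names are hypothetical.

```python
def read_capacity(n_machines: int, per_machine_reads: float, machines_per_read: int) -> float:
    """Cluster-wide reads per second when each read consults `machines_per_read` machines."""
    if not 1 <= machines_per_read <= n_machines:
        raise ValueError("machines_per_read must be between 1 and n_machines")
    # Total work available divided by the work each read costs.
    return n_machines * per_machine_reads / machines_per_read


n = 5
single_machine = 1000.0  # assumed reads per second one machine can serve

print("read consults all n:", read_capacity(n, single_machine, n))
print("read consults one:  ", read_capacity(n, single_machine, 1))
print("read consults three:", read_capacity(n, single_machine, 3))
```

Consulting all `n` machines leaves capacity at the single-machine level, consulting one multiplies it by `n`, and anything in between lands between those two bounds.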

Oussama

@oussamasaoudi.com

A consequence of using a shared library is that it requires FFI to the C ABI. For a language like Rust, that means giving up the borrow checker's guarantees across that boundary.

January 2, 2025 at 7:53 PM

Oussama

@oussamasaoudi.com

Typically you’ll want to opt for the native TLS lib. Suppose there is a vulnerability in TLS. The native library can be updated once system-wide, while language-specific implementations demand that all binaries be recompiled. Binaries can easily slip through the cracks, leaving you exposed!

January 2, 2025 at 7:53 PM

Oussama

@oussamasaoudi.com

3. A failed node in a DB often causes the entire computation to fail, forcing a retry. On the other hand, MR and DFEs are built to recover and retry partially completed computations. After all, large, long-running, multi-stage computations are more likely to experience failures.

December 29, 2024 at 11:45 PM
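
Partial recovery can be sketched as retrying at the level of individual tasks, so one failure does not restart the whole job. This is a single-process illustration with a simulated failure. The task names, `make_flaky`, and `run_tasks_with_retry` are hypothetical and not the API of any particular MR or dataflow engine.

```python
from typing import Callable, Dict, Tuple


def make_flaky(value: int, failures: int) -> Callable[[], int]:
    """Build a task that raises `failures` times before returning `value`."""
    remaining = {"n": failures}

    def task() -> int:
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise RuntimeError("simulated node failure")
        return value

    return task


def run_tasks_with_retry(
    tasks: Dict[str, Callable[[], int]], max_attempts: int = 3
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Run each task, retrying only the ones that fail."""
    results: Dict[str, int] = {}
    attempts: Dict[str, int] = {}
    for name, task in tasks.items():
        for attempt in range(1, max_attempts + 1):
            attempts[name] = attempt
            try:
                results[name] = task()
                break  # this task is done; completed tasks are never re-run
            except RuntimeError:
                if attempt == max_attempts:
                    raise  # give up only after exhausting retries
    return results, attempts


tasks = {
    "partition-0": make_flaky(10, failures=0),
    "partition-1": make_flaky(20, failures=1),
    "partition-2": make_flaky(30, failures=0),
}
results, attempts = run_tasks_with_retry(tasks)
print(results)
print(attempts)
```

Only `partition-1` runs twice; the other partitions keep their first result, which is the property that matters for long multi-stage jobs.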

Oussama

@oussamasaoudi.com

2. DBs exclusively ingest and operate on structured data, while MR and DFEs allow for all sorts of data, such as images and vector embeddings. This allows the flexibility of storing raw data first and processing it into structured data later. This is the so-called sushi principle: “raw data is better”.

December 29, 2024 at 11:45 PM

Oussama

@oussamasaoudi.com

1. MR and DFEs provide a general-purpose compute platform that can execute arbitrary code, while DBs are restricted to SQL operations that may not be able to perform domain-specific computations such as those needed in ML or text search with relevance ranking.

December 29, 2024 at 11:45 PM
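
The "arbitrary code" point can be illustrated with a tiny in-memory map/reduce loop where the mapper and reducer are ordinary functions. This is a sketch, not a real engine: it runs in one process with no distribution or fault tolerance, and `map_reduce`, `tokenize`, and `total` are hypothetical names.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Apply `mapper` to every record, group values by key, then reduce each group."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def tokenize(doc: str) -> Iterable[Tuple[str, int]]:
    """User-supplied map step: emit (word, 1) for each word in raw text."""
    for word in doc.lower().split():
        yield word.strip(".,!?"), 1


def total(key: str, values: List[int]) -> int:
    """User-supplied reduce step: sum the counts for one word."""
    return sum(values)


docs = ["Raw data is better.", "Data beats schema, data wins."]
counts = map_reduce(docs, tokenize, total)
print(counts)
```

Swapping `tokenize` and `total` for other functions changes the computation without touching the framing loop, and the input here is raw text rather than pre-structured rows.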
