Garik
@garik.codes
Independent SWE + IT guy in the rural inland PNW
Turns out it was a bad GIF. Whether it was the title or the file itself, I'm not sure.

Regardless, better error descriptions and handling in general lead to a better world :)
December 1, 2025 at 7:35 PM
🧵 19/19 To recap, S3 is the world's hard drive.

It's cheap, fast enough, and extremely reliable for most scenarios.

Smart system design lets AWS cut costs on compute and storage while maintaining high availability and acceptable latency.

Now you know!

/rant
December 1, 2025 at 7:31 PM
🧵 18/19 Crazier still, AWS doesn't just pile on more and more features; they proactively rework their codebase.

In 2021 they rewrote a fundamental service, ShardStore, in Rust. It's roughly 40k LOC and is frequently updated without service interruptions.
December 1, 2025 at 7:31 PM
🧵 17/19 S3 Tables deals with Parquet files and can optimize how they're squished together and stashed away.

Meanwhile, S3 Metadata makes it much easier to search and organize all that data. It makes the data that is already present more useful.
December 1, 2025 at 7:31 PM
🧵 16/19 One of the newer, bigger user groups on S3 is data analytics. Think endless stuff in data lakes.

It's one thing to get data, but how to use it?

AWS is making changes to their infra based on this shift, and they have now automated key workflows that users previously had to implement themselves.
December 1, 2025 at 7:31 PM
🧵 15/19 The most recent storage class even uses SSDs!

Here the expensive SSDs make sense: using them for just the right data means savings in both time and money, for AWS as well as customers.

It acts as S3's RAM--serving objects at very low latency so that compute elsewhere isn't stuck waiting on storage.
December 1, 2025 at 7:31 PM
🧵 14/19 Similarly, S3 offers many different tiers of storage to optimize the spread of hot and cold data throughout the system.

You can make rules yourself with S3 Lifecycle or automate the process with Intelligent-Tiering.

Retrieval times span a more than 10^6x range, from single-digit ms to 12 hours 🕦
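
A Lifecycle rule is just a small config document. Here's a hypothetical sketch of one (the rule name, prefix, and day counts are all made up), shaped the way boto3's put_bucket_lifecycle_configuration expects:

```python
# Hypothetical Lifecycle rule: objects under logs/ move to Glacier
# Flexible Retrieval after 90 days, then to Deep Archive after a year.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-old-logs",        # made-up rule name
            "Filter": {"Prefix": "logs/"},   # made-up prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}

# With real credentials you'd apply it with something like:
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=lifecycle)
```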
December 1, 2025 at 7:31 PM
🧵 13/19 That's where balancing data comes in.

AWS engineers designed the system to preload colder data onto new storage racks to maintain an even distribution as newer, hotter data arrives.
December 1, 2025 at 7:31 PM
🧵 12/19 When data moves into S3 it starts hot and gradually cools, i.e. it's used more often when it's young and is accessed less frequently as it ages.

This fact of life could mess with operations if not dealt with properly.
December 1, 2025 at 7:31 PM
🧵 11/19 Another cool feature of S3 is that it becomes more predictable and resilient as it scales.

Since data is spread across many hard drives, and no single customer's reads can really be forecast, running at large scale smooths out aggregate demand.

As the service grows, it becomes less spiky.
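
That smoothing is just statistics: the relative variability of a sum of independent demands shrinks roughly as 1/sqrt(n). A toy simulation (the demand distribution and fleet sizes are invented):

```python
import random
import statistics

random.seed(42)

def spikiness(n_customers: int, trials: int = 1000) -> float:
    """Coefficient of variation (stdev/mean) of total demand when each
    of n_customers independently asks for 0-100 units of I/O."""
    totals = [
        sum(random.uniform(0, 100) for _ in range(n_customers))
        for _ in range(trials)
    ]
    return statistics.stdev(totals) / statistics.mean(totals)

small_fleet = spikiness(4)
big_fleet = spikiness(400)

# 100x more independent customers -> roughly sqrt(100) = 10x less spiky.
print(small_fleet, big_fleet)
```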
December 1, 2025 at 7:31 PM
🧵 10/19 Erasure coding also helps with testing code in production. Since everything is super redundant, it's fine if things break in prod.

S3 even tames long tails by canceling any request that runs past its p95 latency. The request is resent to a different server, and thus a different shard. And it works!
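
A toy sketch of that hedging trick, with made-up latency distributions: abandon anything past the p95 cutoff and retry against an independent server (in real S3, a different shard of the same object):

```python
import random

random.seed(7)

def request_latency() -> float:
    """Toy latency in ms: usually fast, occasionally pathologically slow."""
    if random.random() < 0.95:
        return random.uniform(1, 10)
    return random.uniform(100, 500)

def hedged_latency(cutoff: float) -> float:
    """Give up on any attempt that runs past the cutoff and resend
    the request to an independent server instead."""
    first = request_latency()
    if first <= cutoff:
        return first
    return cutoff + request_latency()  # cancel + retry elsewhere

def percentile(q: float, xs: list[float]) -> float:
    xs = sorted(xs)
    return xs[int(q * (len(xs) - 1))]

baseline = [request_latency() for _ in range(10_000)]
cutoff = percentile(0.95, baseline)
hedged = [hedged_latency(cutoff) for _ in range(10_000)]

# Hedging collapses the tail: a hedged request is only slow if BOTH
# attempts are slow, which is far rarer than one slow attempt.
assert percentile(0.99, hedged) < percentile(0.99, baseline)
```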
December 1, 2025 at 7:31 PM
🧵 9/19 S3 achieves such high durability through erasure coding: essentially splitting objects into chunks, then doing some voodoo math to derive extra parity chunks and storing those, too.

The advantage of this approach is that instead of needing 3x storage from straight replication, data can be kept safe at about 1.8x.
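
S3's exact coding parameters aren't public in detail, but the core idea fits in a few lines: here's the simplest possible version, one XOR parity shard, which lets you rebuild any single lost shard. (Real S3 uses stronger codes that survive multiple simultaneous losses.)

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal shards plus one XOR parity shard."""
    shard_len = -(-len(data) // k)              # ceiling division
    data = data.ljust(k * shard_len, b"\0")     # pad to a multiple of k
    shards = [data[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = shards[0]
    for s in shards[1:]:
        parity = xor_bytes(parity, s)
    return shards + [parity]

def recover(shards: list[bytes], lost: int) -> bytes:
    """Rebuild any single missing shard by XOR-ing all survivors."""
    survivors = [s for i, s in enumerate(shards) if i != lost]
    rebuilt = survivors[0]
    for s in survivors[1:]:
        rebuilt = xor_bytes(rebuilt, s)
    return rebuilt

obj = b"hello, durable world"
shards = encode(obj, k=4)                      # 5 shards stored for 4 of data
assert recover(shards, lost=2) == shards[2]    # lost data shard: rebuilt
assert recover(shards, lost=4) == shards[4]    # lost parity shard: rebuilt
```

This toy scheme stores 1.25x the data; schemes like S3's trade a bit more overhead (~1.8x) for tolerating several lost shards at once.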
December 1, 2025 at 7:31 PM
🧵 8/19 And not only is it big, it is reliable.

S3 is designed for 99.999999999% data durability. Famously that's 11 nines.

Say you had 10,000 objects in S3; the math says you'd expect to lose one object every 10,000,000 years.
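
The arithmetic, spelled out (reading 11 nines as a 1-in-10^11 chance of losing any given object in a given year):

```python
# 99.999999999% annual durability per object
annual_loss_prob = 1e-11          # i.e. 1 - 0.99999999999
objects = 10_000

expected_losses_per_year = objects * annual_loss_prob   # 1e-7
years_per_lost_object = 1 / expected_losses_per_year

print(f"{years_per_lost_object:,.0f} years")  # 10,000,000 years
```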
December 1, 2025 at 7:31 PM
🧵 7/19 To say S3 is a large service is an understatement. Look at these numbers 👀
December 1, 2025 at 7:31 PM
🧵 6/19 For the past 30 years, HDDs have been stuck at about 120 IOPS. And they might be stuck there forever.

Progress elsewhere, however, isn't slowing yet, and there are already solid sketches of 200TB drives within the next 10 years.

So, design around the constraint! Shard, and shard hard.
December 1, 2025 at 7:31 PM
🧵 5/19 Slight detour--hard drives are wonderful and illustrate the insane progress in hardware over the last 75 years.

The catch is that they are constrained for I/O 😭
December 1, 2025 at 7:31 PM
🧵 4/19 The way data is added to the system is called shuffle sharding. It's totally random, but not just regular random.

Before committing to a drive, S3 actually looks at 2 random drives, then picks the least used one.

This small change has an outsized impact on keeping data organized and evenly spread.
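
That "peek at two, pick the emptier" trick is the classic power-of-two-choices result, and a toy simulation shows it off (the drive and object counts here are made up):

```python
import random

random.seed(1)

DRIVES, OBJECTS = 1_000, 100_000   # invented fleet size

def place_random() -> list[int]:
    """Drop every object on one uniformly random drive."""
    load = [0] * DRIVES
    for _ in range(OBJECTS):
        load[random.randrange(DRIVES)] += 1
    return load

def place_best_of_two() -> list[int]:
    """Peek at two random drives and commit to the emptier one."""
    load = [0] * DRIVES
    for _ in range(OBJECTS):
        a, b = random.randrange(DRIVES), random.randrange(DRIVES)
        load[a if load[a] <= load[b] else b] += 1
    return load

# The second look dramatically flattens the hottest drive.
assert max(place_best_of_two()) < max(place_random())
```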
December 1, 2025 at 7:31 PM
🧵 3/19 Basically S3 operates by spreading out simple GET and PUT HTTP requests across many servers and stores sharded data on insanely cheap--and slow--hard disks.

Since S3 leverages massive parallelism, customers hardly notice any lag. Some customers have data stored on over a million hard drives!
December 1, 2025 at 7:31 PM
🧵 2/19 Amazon's Simple Storage Service (S3) came onto the scene in 2006 as a backup utility and place to keep media.

It has grown and evolved a lot in the past two decades!

Its biggest customer today, Netflix, wasn't even streaming video in 2006!

Still, S3's core concepts remain unchanged.
December 1, 2025 at 7:31 PM
No prob! Think I identified the form issue here:

github.com/overcommitte...
December 1, 2025 at 6:22 PM
Just started digging into some re:Invent videos on YouTube and it's nice to be able to learn so much about AWS's infra + design philosophies!

Looking forward to new material 👍
November 29, 2025 at 2:51 PM