Garik
@garik.codes
Independent SWE + IT guy in the rural inland PNW
Turns out it was a bad GIF. Whether it was the title or the file itself, I'm not sure.

Regardless, better error descriptions and handling in general lead to a better world :)
December 1, 2025 at 7:35 PM
🧵 19/19 To recap, S3 is the world's hard drive.

It's cheap, fast enough, and extremely reliable for most scenarios.

Smart system design lets AWS cut costs on compute and storage while maintaining high availability and acceptable latency.

Now you know!

/rant
December 1, 2025 at 7:31 PM
🧵 18/19 Crazier still, AWS doesn't just pile on more and more features; they proactively rework their codebase.

In 2021 they rewrote a fundamental service, ShardStore, in Rust. It's roughly 40k LOC and is frequently updated without service interruptions.
December 1, 2025 at 7:31 PM
🧵 17/19 S3 Tables deals with Parquet files and can optimize how they're squished together and stashed away.

Meanwhile, S3 Metadata makes it much easier to search and organize all that data. It makes the data that is already present more useful.
December 1, 2025 at 7:31 PM
🧵 16/19 One of the newer, bigger user groups on S3 is data analytics. Think endless stuff in data lakes.

It's one thing to get data, but how to use it?

AWS is making changes to their infra based on this shift, and they have now automated key workflows that users previously had to implement themselves.
December 1, 2025 at 7:31 PM
🧵 15/19 The most recent storage class even uses SSDs!

Here the expensive SSDs make sense: using them for just the right data means savings in both time and money, for AWS as well as customers.

It acts as S3's RAM--serving objects at very low latency so that compute elsewhere isn't stuck waiting on storage.
December 1, 2025 at 7:31 PM
🧵 14/19 Similarly, S3 offers many different tiers of storage to optimize the spread of hot and cold data throughout the system.

You can make rules yourself with S3 Lifecycle or automate the process with Intelligent-Tiering.

Retrieval times span a more than 10^6x range, from single-digit ms to 12 hours 🕦
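
A Lifecycle rule is just a small config document. Here's a hypothetical sketch of one (the rule name, prefix, and day counts are all made up), shaped the way boto3's put_bucket_lifecycle_configuration expects:

```python
# Hypothetical Lifecycle rule: objects under logs/ move to Glacier
# Flexible Retrieval after 90 days, then to Deep Archive after a year.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-old-logs",        # made-up rule name
            "Filter": {"Prefix": "logs/"},   # made-up prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 90, "StorageClass": "GLACIER"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}

# With real credentials you'd apply it with something like:
#   import boto3
#   s3 = boto3.client("s3")
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="my-bucket", LifecycleConfiguration=lifecycle)
```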
December 1, 2025 at 7:31 PM
🧵 13/19 That's where balancing data comes in.

AWS engineers designed the system to preload colder data onto new storage racks to maintain an even distribution as newer, hotter data arrives.
December 1, 2025 at 7:31 PM
🧵 12/19 When data moves into S3 it starts hot and gradually cools, i.e. it's used more often when it's young and is accessed less frequently as it ages.

This fact of life could mess with operations if not dealt with properly.
December 1, 2025 at 7:31 PM
🧵 11/19 Another cool feature of S3 is that it becomes more predictable and resilient as it scales.

Since data is spread across many hard drives, and no single customer's reads can really be forecast, running at large scale smooths out aggregate demand.

As the service grows, it becomes less spiky.
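
That smoothing is just statistics: the relative variability of a sum of independent demands shrinks roughly as 1/sqrt(n). A toy simulation (the demand distribution and fleet sizes are invented):

```python
import random
import statistics

random.seed(42)

def spikiness(n_customers: int, trials: int = 1000) -> float:
    """Coefficient of variation (stdev/mean) of total demand when each
    of n_customers independently asks for 0-100 units of I/O."""
    totals = [
        sum(random.uniform(0, 100) for _ in range(n_customers))
        for _ in range(trials)
    ]
    return statistics.stdev(totals) / statistics.mean(totals)

small_fleet = spikiness(4)
big_fleet = spikiness(400)

# 100x more independent customers -> roughly sqrt(100) = 10x less spiky.
print(small_fleet, big_fleet)
```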
December 1, 2025 at 7:31 PM
🧵 10/19 Erasure coding also helps with testing code in production. Since everything is super redundant, it's fine if things break in prod.

S3 even tames long tails by canceling any request that runs past its p95 latency. The request is resent to a different server, and thus a different shard. And it works!
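
A toy sketch of that hedging trick, with made-up latency distributions: abandon anything past the p95 cutoff and retry against an independent server (in real S3, a different shard of the same object):

```python
import random

random.seed(7)

def request_latency() -> float:
    """Toy latency in ms: usually fast, occasionally pathologically slow."""
    if random.random() < 0.95:
        return random.uniform(1, 10)
    return random.uniform(100, 500)

def hedged_latency(cutoff: float) -> float:
    """Give up on any attempt that runs past the cutoff and resend
    the request to an independent server instead."""
    first = request_latency()
    if first <= cutoff:
        return first
    return cutoff + request_latency()  # cancel + retry elsewhere

def percentile(q: float, xs: list[float]) -> float:
    xs = sorted(xs)
    return xs[int(q * (len(xs) - 1))]

baseline = [request_latency() for _ in range(10_000)]
cutoff = percentile(0.95, baseline)
hedged = [hedged_latency(cutoff) for _ in range(10_000)]

# Hedging collapses the tail: a hedged request is only slow if BOTH
# attempts are slow, which is far rarer than one slow attempt.
assert percentile(0.99, hedged) < percentile(0.99, baseline)
```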
December 1, 2025 at 7:31 PM
🧵 9/19 S3 achieves such high durability through erasure coding: essentially splitting objects into chunks, then doing some voodoo math to derive extra parity chunks and storing those, too.

The advantage of this approach is that instead of needing 3x storage from straight replication, data can be kept safe at about 1.8x.
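
S3's exact coding parameters aren't public in detail, but the core idea fits in a few lines: here's the simplest possible version, one XOR parity shard, which lets you rebuild any single lost shard. (Real S3 uses stronger codes that survive multiple simultaneous losses.)

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal shards plus one XOR parity shard."""
    shard_len = -(-len(data) // k)              # ceiling division
    data = data.ljust(k * shard_len, b"\0")     # pad to a multiple of k
    shards = [data[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    parity = shards[0]
    for s in shards[1:]:
        parity = xor_bytes(parity, s)
    return shards + [parity]

def recover(shards: list[bytes], lost: int) -> bytes:
    """Rebuild any single missing shard by XOR-ing all survivors."""
    survivors = [s for i, s in enumerate(shards) if i != lost]
    rebuilt = survivors[0]
    for s in survivors[1:]:
        rebuilt = xor_bytes(rebuilt, s)
    return rebuilt

obj = b"hello, durable world"
shards = encode(obj, k=4)                      # 5 shards stored for 4 of data
assert recover(shards, lost=2) == shards[2]    # lost data shard: rebuilt
assert recover(shards, lost=4) == shards[4]    # lost parity shard: rebuilt
```

This toy scheme stores 1.25x the data; schemes like S3's trade a bit more overhead (~1.8x) for tolerating several lost shards at once.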
December 1, 2025 at 7:31 PM
🧵 8/19 And not only is it big, it is reliable.

S3 is designed for 99.999999999% data durability. Famously that's 11 nines.

Say you had 10,000 objects in S3; the math says you'd expect to lose one object every 10,000,000 years.
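
The arithmetic, spelled out (reading 11 nines as a 1-in-10^11 chance of losing any given object in a given year):

```python
# 99.999999999% annual durability per object
annual_loss_prob = 1e-11          # i.e. 1 - 0.99999999999
objects = 10_000

expected_losses_per_year = objects * annual_loss_prob   # 1e-7
years_per_lost_object = 1 / expected_losses_per_year

print(f"{years_per_lost_object:,.0f} years")  # 10,000,000 years
```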
December 1, 2025 at 7:31 PM
🧵 7/19 To say S3 is a large service is an understatement. Look at these numbers 👀
December 1, 2025 at 7:31 PM
🧵 6/19 For the past 30 years, HDDs have been stuck at about 120 IOPS. And they might be stuck there forever.

Progress elsewhere, however, isn't slowing yet, and there are already solid sketches of 200TB drives within the next 10 years.

So, design around the constraint! Shard, and shard hard.
December 1, 2025 at 7:31 PM
🧵 5/19 Slight detour--hard drives are wonderful and illustrate the insane progress in hardware over the last 75 years.

The catch is that they are constrained for I/O 😭
December 1, 2025 at 7:31 PM
🧵 4/19 The way data is added to the system is called shuffle sharding. It's totally random, but not just regular random.

Before committing to a drive, S3 actually looks at 2 random drives, then picks the least used one.

This small change has an outsized impact on keeping data organized and evenly spread.
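
That "peek at two, pick the emptier" trick is the classic power-of-two-choices result, and a toy simulation shows it off (the drive and object counts here are made up):

```python
import random

random.seed(1)

DRIVES, OBJECTS = 1_000, 100_000   # invented fleet size

def place_random() -> list[int]:
    """Drop every object on one uniformly random drive."""
    load = [0] * DRIVES
    for _ in range(OBJECTS):
        load[random.randrange(DRIVES)] += 1
    return load

def place_best_of_two() -> list[int]:
    """Peek at two random drives and commit to the emptier one."""
    load = [0] * DRIVES
    for _ in range(OBJECTS):
        a, b = random.randrange(DRIVES), random.randrange(DRIVES)
        load[a if load[a] <= load[b] else b] += 1
    return load

# The second look dramatically flattens the hottest drive.
assert max(place_best_of_two()) < max(place_random())
```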
December 1, 2025 at 7:31 PM
🧵 3/19 Basically S3 operates by spreading out simple GET and PUT HTTP requests across many servers and stores sharded data on insanely cheap--and slow--hard disks.

Since S3 leverages massive parallelism, customers hardly notice any lag. Some customers have data stored on over a million hard drives!
December 1, 2025 at 7:31 PM
🧵 2/19 Amazon's Simple Storage Service (S3) came onto the scene in 2006 as a backup utility and place to keep media.

It has grown and evolved a lot in the past two decades!

Its biggest customer today, Netflix, wasn't even streaming video in 2006!

Still, S3's core concepts remain unchanged.
December 1, 2025 at 7:31 PM
No prob! Think I identified the form issue here:

github.com/overcommitte...
December 1, 2025 at 6:22 PM
Just started digging into some re:Invent videos on YouTube and it's nice to be able to learn so much about AWS's infra + design philosophies!

Looking forward to new material 👍
November 29, 2025 at 2:51 PM