It's cheap, fast enough, and extremely reliable for most scenarios.
Smart system design lets AWS cut costs and save on compute and storage while still maintaining high availability and acceptable latency.
Now you know!
/rant
In 2021 they rewrote a fundamental service, ShardStore, in Rust. It has 40k LOC and is frequently updated without interruptions to service.
Meanwhile, S3 Metadata makes it much easier to search and organize all that data. It makes the data that is already present more useful.
It's one thing to get data, but how do you use it?
AWS is changing its infra in response to this shift, and it has now automated key workflows that users previously had to implement themselves.
Here this makes sense because utilizing expensive SSDs for the right data means savings, both in time and money for AWS as well as customers.
It acts as S3's RAM--serving objects with very low latency so compute elsewhere is minimized.
You can make rules yourself with S3 Lifecycle or automate the process with Intelligent-Tiering.
Retrieval times vary by more than 10^6x, from single-digit ms to 12 hours 🕦
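Quick sanity check on that ratio. This is only arithmetic on the two figures quoted above; taking 9 ms as the top of "single-digit ms" is my assumption, so treat the result as a lower bound.

```python
# Ratio between the slowest and fastest retrieval figures quoted above.
fast_ms = 9                     # assumed upper end of "single-digit ms"
slow_ms = 12 * 60 * 60 * 1000   # 12 hours expressed in milliseconds

ratio = slow_ms / fast_ms
print(f"{ratio:,.0f}x")         # comfortably above 10**6
```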
AWS engineers designed the system to preload colder data onto new storage racks to maintain an even distribution as newer, hotter data arrives.
This fact of life could mess with operations if not dealt with properly.
Since data is accessed across hard drives, and you can't really forecast reads from customers, having a large operation smooths out aggregate demand.
As the service grows, it becomes less spiky.
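One way to see why, as a rough sketch: if the workloads are independent and statistically alike (my assumption, real traffic can be correlated), the relative spread of the total shrinks with the square root of the number of workloads. The function name `aggregate_cv` is made up for illustration.

```python
import math

def aggregate_cv(per_workload_cv: float, n_workloads: int) -> float:
    """Coefficient of variation (std / mean) of the summed demand.

    Assumes n independent workloads with the same mean and variance:
    the mean grows like n, the std only like sqrt(n).
    """
    if n_workloads < 1:
        raise ValueError("need at least one workload")
    return per_workload_cv / math.sqrt(n_workloads)

# A single very bursty workload (std is 2x its mean) vs. many of them combined.
for n in (1, 100, 10_000, 1_000_000):
    print(n, aggregate_cv(2.0, n))
```

Under those assumptions, a million bursty workloads add up to something that looks nearly flat.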
S3 even eliminates long tails by canceling requests that go over its p95. The request is resent to a different server, and thus a different shard. And it works!
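The cancel-and-retry idea can be sketched client-side like this. It is not S3's actual code: `hedged_get`, `fetchers`, and `deadline_s` are hypothetical names, and the deadline stands in for a p95 latency you would have measured yourself.

```python
import asyncio

async def hedged_get(fetchers, deadline_s):
    """Try each fetcher in turn, abandoning any attempt that exceeds deadline_s.

    `fetchers` is a list of zero-argument async callables, one per server.
    """
    if not fetchers:
        raise ValueError("need at least one fetcher")
    last = len(fetchers) - 1
    for i, fetch in enumerate(fetchers):
        # The final candidate gets unlimited time so the call still completes.
        timeout = None if i == last else deadline_s
        try:
            # wait_for cancels the in-flight attempt when the timeout fires.
            return await asyncio.wait_for(fetch(), timeout)
        except asyncio.TimeoutError:
            continue  # too slow: move on to a different server
```

Usage would look like `asyncio.run(hedged_get([read_from_a, read_from_b], 0.05))`, where the two readers are whatever coroutines talk to your replicas.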
The advantage of this approach is that instead of needing 3x storage from straight replication, data can be kept safe at 1.8x.
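Where could 1.8x come from? A sketch, assuming "this approach" is erasure coding and assuming an illustrative split of 5 data shards plus 4 parity shards (the thread doesn't give the actual parameters):

```python
def storage_overhead(data_shards: int, parity_shards: int) -> float:
    """Raw bytes stored per byte of user data under a k+m erasure code."""
    return (data_shards + parity_shards) / data_shards

# Hypothetical 5+4 split: any 5 of the 9 shards can rebuild the object.
print(storage_overhead(5, 4))   # overhead of the erasure-coded layout
print(storage_overhead(1, 2))   # plain 3x replication, for comparison
```

With that split, an object survives the loss of any 4 shards, while 3x replication only survives the loss of 2 copies.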
S3 is designed for 99.999999999% data durability. Famously that's 11 nines.
Say you had 10,000 objects in S3: the math says you'd lose, on average, 1 object every 10,000,000 years.
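The arithmetic behind that, reading the 11 nines as a per-object, per-year figure (an assumption on my part about how to interpret the number):

```python
from fractions import Fraction

# Exact arithmetic avoids float rounding on a number this close to 1.
durability = Fraction("0.99999999999")       # 11 nines
annual_loss_probability = 1 - durability     # chance of losing one object in a year

objects = 10_000
expected_losses_per_year = objects * annual_loss_probability
years_per_lost_object = 1 / expected_losses_per_year

print(years_per_lost_object)
```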
Progress elsewhere, however, isn't slowing yet, and there are already solid sketches of 200TB drives within the next 10 years.
So, design around the constraint! Shard, and shard hard.
The catch is that they are constrained for I/O 😭
Before committing to a drive, S3 actually looks at 2 random drives, then picks the least used one.
This small change has an outsized impact on how evenly data gets spread out.
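Here's a toy version of that "look at 2 random drives, pick the emptier one" rule, compared against plain random placement. It's a simulation sketch, not how S3 is implemented; `pick_drive` and `place` are made-up names and every object counts as one unit of usage.

```python
import random

def pick_drive(used, rng=random):
    """Sample two distinct drives at random and return the index of the emptier one."""
    a, b = rng.sample(range(len(used)), 2)
    return a if used[a] <= used[b] else b

def place(n_objects, n_drives, two_choices, seed=0):
    """Place objects one by one and return the per-drive object counts."""
    rng = random.Random(seed)
    used = [0] * n_drives
    for _ in range(n_objects):
        i = pick_drive(used, rng) if two_choices else rng.randrange(n_drives)
        used[i] += 1
    return used

if __name__ == "__main__":
    print("fullest drive, one random choice: ", max(place(10_000, 100, False)))
    print("fullest drive, best of two:       ", max(place(10_000, 100, True)))
```

The fullest drive under best-of-two should sit much closer to the average of 100 objects per drive than under a single random choice.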
Since S3 leverages massive parallelism, customers hardly notice any lag. Some customers have data stored on over a million hard drives!
It has grown and evolved a lot in the past two decades!
Its biggest customer today, Netflix, wasn't even streaming video in 2006!
Still, S3's core concepts remain unchanged.
Thread incoming.
Well, at the very least I learned a bunch about S3 the other day. Here's hoping I can retrieve it 🤞
The kids tweaked the robot design ever so slightly and increased the consistency of one of their combo moves.
I thought they should move on, but they were just so happy to see it repeat itself for 15 minutes.
Guess they meant front and back.
Tried on multiple browsers with ad blockers off and it always 404/405s.