Former: Storage infra things 🤗 @hf.co, Devops things @lexblog.bsky.social and devex/cloud infra things at @pantheon.io
A little over a year ago, @hf.co acquired XetHub to unlock the next phase of growth in models and datasets. huggingface.co/blog/xethub-...
In April, there were 1,000 Hugging Face repos on Xet. Now every repo (over 6M) on the Hub is on Xet.
We call this the Git LFS Bridge internally, and like our migration process, its power is in its simplicity.
Each spike corresponds to a significant migration (where we download from LFS and upload to Xet) with the baseline steadily increasing to just shy of 100 Gb/s
In June, we hit upload speeds of 577Gb/s (crossing 500Gb/s for the first time).
A month ago there were 5,500 users/orgs on Xet with 150K repos and 4PB. Today?
🤗 700,000 users/orgs
📈 350,000 repos
🚀 15PB
But I'll never know how often she was chasing squirrels for those few hours.
How am I going to replay/track the events of her going in and out the dog door during this outage?
These are the important questions.
🤗 5,500 users and orgs with Xet access
🚀 150,000 Xet-backed models and datasets
🤯 4+ PB managed by Xet
How much more to go? If the Hub's top storage users are any indication: many bytes
Here you can see how different versions of the Qwen, Llama, and Phi models are grouped together.
Interactive graph here: huggingface.co/spaces/xet-t...
It's a byte-level map of the Hub.
The result is a beautiful visualization from Saba Noorassa and @reverius42.bsky.social that I’ve already lost way too much time to.
yellow = GETs; dashed line = launch time.
I think it's pretty easy to spot when Xet started to send the first bytes to excited downloaders 👀
Every request to download these files comes to our infrastructure.
On average, we're seeing ~25% dedupe, providing huge savings to the community members who iterate on these state-of-the-art models. Here are a few selected models and how they perform on Xet.
There's nothing more satisfying than working on infrastructure for months and seeing requests funnel through and take off like a rocket 🚀
We fixed these issues on the fly without any major disruption.
Head over to your account settings for more information or join anywhere you see the Xet logo on a repository you know.
✅ First off, no action needed - this migration is helping us test and scale the infrastructure before a broader rollout.
👀🔎 But you can play "spot the Xet logo" - if you see our logo on a file in a repo, that's a file we're serving now!
Download away 🌐