Horace He
chhillee.bsky.social
@PyTorch "My learning style is Horace twitter threads" - @typedfemale
Reposted by Horace He
@chhillee.bsky.social's talk at Jane Street is now up!

youtu.be/139UPjoq7Kw?...
Building Machine Learning Systems for a Trillion Trillion Floating Point Operations
YouTube video by Jane Street
December 11, 2024 at 12:19 PM
Reposted by Horace He
Getting different attention masks working for AstroPT (a proto-foundation model for astronomy github.com/Smith42/astr...), so much nicer to do it with Flex Attention vs custom CUDA kernels -- thank you for releasing it to the world 🫡
GitHub - Smith42/astroPT: Transformer for galaxy images (and general astronomy)
December 2, 2024 at 9:30 AM
I judge social networks by how many FlexAttention users I can find on each one, and by that metric, Bluesky is doing pretty good!
December 1, 2024 at 2:21 AM
If you'd like to influence what features the PyTorch distributed team works on in torchtitan (e.g. MoE, multimodal, context parallelism, etc.), go make your voices heard here!
Vote on new features! · pytorch torchtitan · Discussion #693
Hi torchtitanists, Thank you for your interests in torchtitan! Please upvote on what features you would like to see next, and add one if it's not already there. We'll try to prioritize on the most ...
November 25, 2024 at 7:06 PM
First thought: Seems kinda "FlexAttention-y": https://bsky.app/profile/sungkim.bsky.social/post/3lbjbfmyqts27

Second thought: oh cool, they're already using FlexAttention!

it's a nice usage of the `or_masks` and `and_masks` API - I think they do (causal & sliding_window) | (register_mask)
November 23, 2024 at 1:55 AM
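
For anyone curious what a composition like (causal & sliding_window) | (register_mask) might look like, here is a rough sketch in plain Python. The `and_masks` and `or_masks` below are local stand-ins written for illustration; in PyTorch the real combinators are imported from `torch.nn.attention.flex_attention` and operate on tensors. The window size, the number of register tokens, and the definition of `register_mask` (every query may attend to a few leading register tokens) are assumptions, not details taken from the linked work.

```python
# Plain-Python stand-ins for FlexAttention-style mask composition.
# A mask_mod takes (batch, head, q_idx, kv_idx) and returns True when the
# query position is allowed to attend to the key/value position.

WINDOW = 2          # assumed sliding-window size (illustrative only)
NUM_REGISTERS = 1   # assumed number of leading register tokens (illustrative only)


def causal(b, h, q_idx, kv_idx):
    # A query may only look at itself and earlier positions.
    return q_idx >= kv_idx


def sliding_window(b, h, q_idx, kv_idx):
    # A query may only look at positions at most WINDOW steps behind it.
    return q_idx - kv_idx <= WINDOW


def register_mask(b, h, q_idx, kv_idx):
    # Assumption: every query may attend to the leading register tokens.
    return kv_idx < NUM_REGISTERS


def and_masks(*mask_mods):
    # Stand-in combinator: allowed only if every mask_mod allows it.
    def combined(b, h, q_idx, kv_idx):
        return all(m(b, h, q_idx, kv_idx) for m in mask_mods)
    return combined


def or_masks(*mask_mods):
    # Stand-in combinator: allowed if any mask_mod allows it.
    def combined(b, h, q_idx, kv_idx):
        return any(m(b, h, q_idx, kv_idx) for m in mask_mods)
    return combined


# (causal & sliding_window) | (register_mask)
mask = or_masks(and_masks(causal, sliding_window), register_mask)


def render(mask_mod, seq_len):
    # One string per query row: "x" where attention is allowed, "." elsewhere.
    return [
        "".join("x" if mask_mod(0, 0, q, kv) else "." for kv in range(seq_len))
        for q in range(seq_len)
    ]


if __name__ == "__main__":
    for row in render(mask, 6):
        print(row)
```

With the real API the same composition would be passed to `create_block_mask` and then to `flex_attention`, with the mask functions written using tensor operations (`&`, `|`) rather than Python booleans.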