Joel Burget
joelburget.bsky.social
Do you all have plans for how multimodal would work? Treating an image as a sequence of bytes (the rows in a bitmap or something) seems pretty bad since it throws away so much structure. Reverting to tokens is ugly. You probably want to learn the encoding but it's not clear how.
December 14, 2024 at 9:21 PM
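To make the "throws away so much structure" worry concrete, here is a minimal sketch, assuming a row-major RGB bitmap with interleaved channels. The helper `flat_index` and the 256-pixel width are hypothetical, picked only for illustration. It shows that two pixels that touch vertically end up hundreds of bytes apart once the image is flattened, so a byte-level model would have to rediscover that adjacency on its own.

```python
def flat_index(row, col, width, channels=3):
    """Byte offset of pixel (row, col) in a row-major, interleaved-channel bitmap."""
    return (row * width + col) * channels


width = 256  # hypothetical image width in pixels (an assumption, not from the post)

here = flat_index(10, 10, width)
right = flat_index(10, 11, width)   # horizontal neighbour
below = flat_index(11, 10, width)   # vertical neighbour

horizontal_gap = right - here  # distance in bytes to the pixel on the right
vertical_gap = below - here    # distance in bytes to the pixel directly below

print(f"horizontal neighbour is {horizontal_gap} bytes away")
print(f"vertical neighbour is {vertical_gap} bytes away")
```

The vertical gap scales with the image width, so the same spatial relationship maps to a different sequence distance for every image size, which is one way to read the objection to raw byte sequences.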