Morris Alper
@malper.bsky.social
PhD student researching multimodal learning (language, vision, ...). Also a linguistics enthusiast. morrisalp.github.io
malper.bsky.social
Now accepted to #NeurIPS2025!
malper.bsky.social
💥New preprint! WildCAT3D uses tourist photos in-the-wild as supervision to learn to generate novel, consistent views of scenes like the one shown below. h/t Tom Monnier and all collaborators (1/5)
malper.bsky.social
At inference time, we inject the appearance of the observed view to get consistent novel views. This also enables cool applications like appearance-conditioned NVS! (4/5)
malper.bsky.social
To learn from this data, we use a novel multi-view diffusion architecture adapted from CAT3D, modeling appearance variations with a bottleneck encoder applied to VAE latents and disambiguating scene scale via warping. (3/5)
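The appearance bottleneck described above can be sketched in a few lines. This is a minimal numpy illustration, not the paper's implementation: the dimensions, weight matrices, and function names are all hypothetical. The idea it demonstrates is that pooling a VAE latent spatially and squeezing it through a low-dimensional code lets only coarse global appearance (lighting, color cast) pass through, not scene geometry.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: a VAE latent map (C x H x W) is pooled and
# projected through a low-dimensional bottleneck, so only global
# appearance statistics survive -- not spatial structure.
C, H, W, BOTTLENECK = 4, 32, 32, 8

W_down = rng.normal(scale=0.1, size=(BOTTLENECK, C))  # illustrative weights
W_up = rng.normal(scale=0.1, size=(C, BOTTLENECK))

def appearance_code(latent):
    """Pool a VAE latent spatially, then squeeze through the bottleneck."""
    pooled = latent.mean(axis=(1, 2))      # (C,) global statistics only
    return W_down @ pooled                 # (BOTTLENECK,)

def inject_appearance(latent, code):
    """Broadcast an appearance code back over a latent as a global bias."""
    bias = (W_up @ code)[:, None, None]    # (C, 1, 1)
    return latent + bias

observed = rng.normal(size=(C, H, W))
code = appearance_code(observed)
novel = inject_appearance(rng.normal(size=(C, H, W)), code)
print(code.shape, novel.shape)  # (8,) (4, 32, 32)
```

Because the code is computed per view, the same mechanism supports the appearance-conditioned novel view synthesis mentioned in the thread: swap in the code from any reference view at inference time.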
malper.bsky.social
Photos like the ones below differ in global appearance (day vs. night, lighting), aspect ratio, and even weather. But they give clues to how scenes are built in 3D. (2/5)
malper.bsky.social
Disappointing that arXiv doesn't allow XeLaTeX/LuaLaTeX submissions, which have the least broken multilingual support of the LaTeX compilers. The web shouldn't be limited to English in 2025!
malper.bsky.social
Finally, we show that ProtoSnap-aligned skeletons can be used as conditions for a ControlNet model to generate synthetic OCR training data. By controlling the shapes of signs in training, we can achieve SOTA on cuneiform sign recognition. (Bottom: synthetically generated sign images)
malper.bsky.social
Our results show that ProtoSnap effectively aligns wedge-based skeletons to scans of real cuneiform signs, with global and local refinement steps. We provide a new expert-annotated test set to quantify these results.
malper.bsky.social
ProtoSnap uses features from a fine-tuned diffusion model to optimize for the correct alignment between a skeleton matched with a prototype font image and a scanned sign. Perhaps surprising that image generation models can be applied to this sort of discriminative task!
malper.bsky.social
We tackle this by directly measuring the internal configuration of characters. Our approach ProtoSnap "snaps" a prototype (font)-based skeleton onto a scanned cuneiform sign using a multi-stage pipeline with SOTA methods from computer vision and generative AI.
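The global stage of such a snapping pipeline can be illustrated with a toy alignment problem. Everything here is a simplification for illustration: the real method optimizes against diffusion-model feature similarity and refines each wedge locally, whereas this sketch fits a single global scale-and-shift to hypothetical keypoints with a known answer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in: align prototype skeleton keypoints (e.g. from a font
# rendering) to their positions on a "scan" by searching over a global
# similarity transform. The cost function here is plain point distance;
# the actual method compares learned image features instead.
proto_pts = rng.uniform(0, 1, size=(6, 2))       # prototype wedge keypoints
true_scale, true_shift = 1.3, np.array([0.2, -0.1])
scan_pts = true_scale * proto_pts + true_shift   # where they land on the scan

def cost(params):
    s, tx, ty = params
    moved = s * proto_pts + np.array([tx, ty])
    return np.mean(np.sum((moved - scan_pts) ** 2, axis=1))

# Coarse grid search over (scale, tx, ty), mirroring a global alignment
# step that would precede per-wedge local refinement.
best, best_c = None, np.inf
for s in np.linspace(0.5, 2.0, 61):
    for tx in np.linspace(-0.5, 0.5, 41):
        for ty in np.linspace(-0.5, 0.5, 41):
            c = cost((s, tx, ty))
            if c < best_c:
                best, best_c = (s, tx, ty), c

print(np.round(best, 2))  # recovers scale 1.3, shift (0.2, -0.1)
```

In the real setting the cost landscape is far less clean, which is why a strong feature extractor and a subsequent local refinement stage matter.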
malper.bsky.social
Some prior work has tried to classify scans of signs categorically, but signs' shapes differ drastically across time periods and regions, making this less effective. E.g. both signs below are AN, from different eras. (Top: font prototype; bottom: scan of a sign from a real tablet)
malper.bsky.social
Cuneiform is arguably the most ancient writing system in the world (in use since ~3300 BCE). Inscriptions in ancient languages (e.g. Sumerian, Akkadian) are numerous but hard to read due to the complex writing system, wide variation in sign shapes, and their physical nature as imprints in clay.
malper.bsky.social
Cuneiform at #ICLR2025! ProtoSnap finds the configuration of wedges in scanned cuneiform signs for downstream applications like OCR. A new tool for understanding the ancient world!
tau-vailab.github.io/ProtoSnap/
h/t Rachel Mikulinsky @ShGordin @ElorHadar and all collaborators.
🧵👇
Reposted by Morris Alper
kjain14.bsky.social
Thrilled to announce our new work TestGenEval, a benchmark that measures unit test generation and test completion capabilities. This work was done in collaboration with the FAIR CodeGen team.

Preprint: arxiv.org/abs/2410.00752
Leaderboard: testgeneval.github.io/leaderboard....
malper.bsky.social
Great news! BERT-like models are extremely useful and imo unfairly overlooked in the recent GenAI hype cycle. Looking forward to playing with this
mm-jj-nn.bsky.social
Great blog post (by a 15-author team!) on their release of ModernBERT, the continuing relevance of encoder-only models, and how they relate to, say, GPT-4/llama. Accessible enough that I might use this as an undergrad reading.
Finally, a Replacement for BERT: Introducing ModernBERT
huggingface.co
malper.bsky.social
We look forward to progress on architectural tasks being benchmarked and accelerated by WAFFLE!
See our project page for more details and links to our paper, code, and data: tau-vailab.github.io/WAFFLE/
WAFFLE: Multimodal Floorplan Understanding in the Wild
tau-vailab.github.io
malper.bsky.social
We show that our dataset serves as a new, challenging benchmark for common floorplan understanding tasks such as semantic segmentation. We also show it can be used to enable new tasks such as floorplan generation conditioned on building type and boundary.