@yuluqin.bsky.social
yuluqin.bsky.social
[8/9] Finally, a preliminary exploration of why vision helps: visual similarity among the members of a concept predicts the VLM's performance better than it predicts the LM's. The effect varies by concept (which is intuitive) - concepts with higher visual cohesion seem to show larger effects!
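A minimal sketch of one way such an analysis could be run: score each concept's visual cohesion as the mean pairwise cosine similarity among image embeddings of its members, then correlate that with the VLM-LM accuracy gap. The embeddings, accuracies, and all names below are illustrative placeholders on toy random data, not the paper's actual pipeline.

```python
# Hypothetical sketch: relate a concept's visual cohesion to the VLM-LM accuracy gap.
# Assumes precomputed image embeddings per concept member (e.g., from a vision encoder)
# and per-concept accuracies; variable names are illustrative, not the paper's code.
import numpy as np
from scipy.stats import spearmanr

def visual_cohesion(image_embs: np.ndarray) -> float:
    """Mean pairwise cosine similarity among images of a concept's members."""
    normed = image_embs / np.linalg.norm(image_embs, axis=1, keepdims=True)
    sims = normed @ normed.T
    iu = np.triu_indices(len(normed), k=1)   # upper triangle, excluding the diagonal
    return float(sims[iu].mean())

# concept -> (member image embeddings, VLM accuracy, LM accuracy); toy random data here
rng = np.random.default_rng(0)
concepts = {f"concept_{i}": (rng.normal(size=(20, 512)),
                             rng.uniform(0.5, 1.0), rng.uniform(0.4, 0.9))
            for i in range(30)}

cohesion = [visual_cohesion(embs) for embs, _, _ in concepts.values()]
gap = [vlm_acc - lm_acc for _, vlm_acc, lm_acc in concepts.values()]
rho, p = spearmanr(cohesion, gap)
print(f"Spearman rho between visual cohesion and VLM-LM gap: {rho:.3f} (p={p:.3f})")
```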
yuluqin.bsky.social
[7/9] Next, questions containing hypernym vs. non-hypernym substitutions were more linearly separable in the VLM than in the LM, even when restricted to questions where the gold answer was the same (“No”).

Both findings support H2!
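A minimal sketch of how such a linear-separability comparison could look, assuming hidden states for the substituted questions have already been extracted: fit a linear probe to distinguish hypernym from non-hypernym substitutions and compare held-out accuracy across the two models' representations. The data here is synthetic and the setup is illustrative, not the authors' exact protocol.

```python
# Hypothetical sketch of the linear-separability comparison: fit a linear probe on
# hidden states of hypernym- vs. non-hypernym-substituted questions and compare
# held-out probe accuracy for the VLM's vs. the LM's representations.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def probe_separability(hidden_states: np.ndarray, is_hypernym: np.ndarray) -> float:
    """5-fold CV accuracy of a linear probe separating the two substitution types."""
    clf = LogisticRegression(max_iter=1000)
    return cross_val_score(clf, hidden_states, is_hypernym, cv=5).mean()

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=400)                              # 1 = hypernym substitution
vlm_states = rng.normal(size=(400, 768)) + 0.5 * labels[:, None]   # toy: more separable
lm_states = rng.normal(size=(400, 768)) + 0.1 * labels[:, None]    # toy: less separable

print("VLM probe accuracy:", probe_separability(vlm_states, labels))
print("LM probe accuracy: ", probe_separability(lm_states, labels))
```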
yuluqin.bsky.social
[6/9] To test H2, we analyze contextual embeddings of TaxonomiGQA instances for the Qwen2.5 pair.
First, the difference in how close the hyponym was to the hypernym vs. the non-hypernym in the intermediate layers was a stronger predictor of model correctness in the VLM than in the LM.
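One plausible way to operationalize this, sketched on toy data: at a chosen intermediate layer, take the difference between cos(hyponym, hypernym) and cos(hyponym, non-hypernym), then fit a logistic regression predicting per-instance correctness from that difference and compare the fitted coefficients across the VLM and the LM. Embeddings, correctness labels, and function names below are assumptions for illustration.

```python
# Hypothetical sketch of the H2 analysis: the similarity difference at an intermediate
# layer is used to predict whether the model answered the instance correctly; a larger
# fitted coefficient corresponds to a larger effect. All inputs are toy stand-ins.
import numpy as np
import statsmodels.api as sm

def cos(a, b):
    """Row-wise cosine similarity between two matrices of embeddings."""
    return (a * b).sum(-1) / (np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1))

def similarity_effect(hypo, hyper, non_hyper, correct):
    """Logistic-regression coefficient of the similarity difference on correctness."""
    sim_diff = cos(hypo, hyper) - cos(hypo, non_hyper)
    model = sm.Logit(correct, sm.add_constant(sim_diff)).fit(disp=0)
    return model.params[1]   # slope on the similarity difference

rng = np.random.default_rng(0)
n, d = 500, 1024
hypo, hyper, non_hyper = (rng.normal(size=(n, d)) for _ in range(3))
correct_vlm = rng.integers(0, 2, size=n)   # toy per-instance correctness labels
correct_lm = rng.integers(0, 2, size=n)

print("VLM effect:", similarity_effect(hypo, hyper, non_hyper, correct_vlm))
print("LM effect: ", similarity_effect(hypo, hyper, non_hyper, correct_lm))
```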
yuluqin.bsky.social
[5/9] To test H1, we perform a series of behavioral and representational evaluations, including direct elicitation of taxonomic knowledge, embedding similarity analysis, and the analysis of @kihopark.bsky.social et al. of the geometric organization of concepts. We find ~no evidence supporting H1.
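For a concrete sense of what direct elicitation could look like, here is a hedged sketch that asks a causal LM whether a taxonomic relation holds and compares the next-token scores for "Yes" vs. "No". The model name, prompt template, and scoring heuristic are illustrative assumptions, not the paper's exact evaluation.

```python
# Hypothetical sketch of "direct elicitation": ask the model whether a taxonomic
# relation holds and compare next-token logits for "Yes" vs. "No".
# Model name and prompt are illustrative assumptions, not the paper's setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def prefers_yes(model, tokenizer, hyponym: str, hypernym: str) -> bool:
    prompt = f"Question: Is a {hyponym} a kind of {hypernym}? Answer Yes or No.\nAnswer:"
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]        # next-token distribution
    yes_id = tokenizer(" Yes", add_special_tokens=False).input_ids[0]
    no_id = tokenizer(" No", add_special_tokens=False).input_ids[0]
    return (logits[yes_id] > logits[no_id]).item()

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
print(prefers_yes(model, tokenizer, "dalmatian", "dog"))     # expect True
print(prefers_yes(model, tokenizer, "dalmatian", "vehicle")) # expect False
```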
yuluqin.bsky.social
[4/9] Why are VLMs performing better❓Two possible hypotheses are: (H1) VL training improves the underlying taxonomic knowledge in LMs; (H2) VL training improves the deployment of the taxonomic knowledge in specific task contexts. Our results support H2 but not H1.
yuluqin.bsky.social
[3/9] We test 7 VLM-LM pairs that share the same base LM (i.e., each pair differs only in the additional V+L training), and find that most VLMs outperform their LM counterparts despite TaxonomiGQA being a purely text-based task!
yuluqin.bsky.social
[2/9] We tackle this question by evaluating both VLMs & LMs on a new QA dataset, TaxonomiGQA, that requires sensitivity to taxonomic relations. TaxonomiGQA is a *TEXT-ONLY* dataset derived from GQA by substituting entities with their hypernyms and non-hypernyms.
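A rough sketch of the substitution idea behind such a dataset: given a GQA-style question mentioning an entity, generate variants where the entity is replaced by one of its hypernyms or by a sampled non-hypernym. The toy taxonomy, question, and helper below are hypothetical stand-ins, not the released TaxonomiGQA construction code.

```python
# Hypothetical sketch of the substitution step: swap an entity in a GQA-style question
# for one of its hypernyms or for a sampled non-hypernym distractor.
# The taxonomy dict and question are toy stand-ins, not the actual pipeline or data.
import random

TAXONOMY = {"dog": ["animal", "mammal"], "car": ["vehicle"]}   # hyponym -> hypernyms

def substituted_instances(question: str, entity: str, n_negatives: int = 2):
    instances = []
    for hypernym in TAXONOMY[entity]:
        instances.append((question.replace(entity, hypernym), "hypernym"))
    non_hypernyms = [h for k, hs in TAXONOMY.items() if k != entity for h in hs]
    for distractor in random.sample(non_hypernyms, k=min(n_negatives, len(non_hypernyms))):
        instances.append((question.replace(entity, distractor), "non-hypernym"))
    return instances

for text, kind in substituted_instances("Is there a dog to the left of the tree?", "dog"):
    print(kind, "->", text)
```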
yuluqin.bsky.social
Does vision training change how language is represented and used in meaningful ways?🤔The answer is a nuanced yes! Comparing VLM-LM minimal pairs, we find that while the taxonomic organization of the lexicon is similar, VLMs are better at _deploying_ this knowledge. [1/9]