Alexis Carrillo
yagwar.bsky.social
To address these limitations, future research should explore a wider range of model architectures, training paradigms, and task variations. Scaling up models and datasets will be crucial for deeper insight into the factors that influence stimulus equivalence.
December 5, 2024 at 4:48 PM
The restricted scope of our experiments, focusing on specific training structures and relation types, limits the generalizability of our findings. Additionally, the use of relatively small, untrained models may have constrained our ability to observe more complex behaviors.
December 5, 2024 at 4:48 PM
Our research contributes to the ongoing discussion of language model abilities. These findings caution against drawing definitive conclusions about LLM capabilities from limited or biased observations.
December 5, 2024 at 4:48 PM
This finding aligns with the differentiation between formal and functional linguistic competence. Stimulus Equivalence can be seen as a test of functional competence in language models.
December 5, 2024 at 4:48 PM
The ability to treat different stimuli as equivalent is essential for language use. This process requires both the creation of symbolic relationships (SE) and the ability to utilize these symbols appropriately within social and linguistic interactions (functional competence).
December 5, 2024 at 4:48 PM
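To make the SE logic above concrete, here is a minimal sketch of how derived relations follow from trained baseline pairs. The stimulus names (A1, B1, C1) and the closure function are illustrative assumptions, not the study's actual stimuli or code.

```python
# Sketch: derive the relations an equivalence class entails
# (reflexivity, symmetry, transitivity) from trained pairs.
# Stimulus names are placeholders.

def derived_relations(trained):
    """Close a set of trained pairs under reflexivity,
    symmetry, and transitivity."""
    stimuli = {s for pair in trained for s in pair}
    relations = set(trained)
    # Reflexivity: every stimulus matches itself.
    relations |= {(s, s) for s in stimuli}
    # Symmetry + transitivity: iterate to a fixed point.
    changed = True
    while changed:
        changed = False
        for a, b in list(relations):
            if (b, a) not in relations:               # symmetry
                relations.add((b, a)); changed = True
            for c, d in list(relations):
                if b == c and (a, d) not in relations:  # transitivity
                    relations.add((a, d)); changed = True
    return relations

# Linear series training: A1->B1, B1->C1.
trained = [("A1", "B1"), ("B1", "C1")]
tests = derived_relations(trained) - set(trained)
```

Only the two baseline pairs are trained; everything in `tests`, such as the symmetry pair (B1, A1) and the transitivity pair (A1, C1), must be derived without direct reinforcement, which is what the equivalence tests probe.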
Models with Stimulus Equivalence capabilities could reduce the need for extensive RLHF by independently deriving correct responses, and could also lower the rate of hallucinations.
December 5, 2024 at 4:48 PM
Both BERT and GPT exhibited some degree of few-shot learning by responding correctly to transitivity, reflexivity, and symmetry tests in linear series. BERT's bidirectional processing might contribute to its slightly better performance in few-shot learning scenarios.
December 5, 2024 at 4:48 PM
These findings relate to the reversal curse: GPT had difficulty with reversed transitivity pairs in the linear series (LS) structure under select and reject conditions, and both agents failed in the one-to-many (OTM) and many-to-one (MTO) structures.
December 5, 2024 at 4:48 PM
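The three training structures mentioned above can be sketched as follows; the class members and trial format are hypothetical placeholders. In the linear series, reversed pairs are never trained, which is where reversal-curse-like failures would show up.

```python
# Sketch of the three matching-to-sample training structures.
# Members A1, B1, C1 are illustrative, not the study's tokens.

def training_pairs(structure, members=("A1", "B1", "C1")):
    a, b, c = members
    if structure == "LS":    # linear series: A->B, B->C
        return [(a, b), (b, c)]
    if structure == "OTM":   # one-to-many: one sample, many comparisons
        return [(a, b), (a, c)]
    if structure == "MTO":   # many-to-one: many samples, one comparison
        return [(a, c), (b, c)]
    raise ValueError(structure)

# Reversed (symmetry) test pairs are the trained pairs flipped;
# none of them ever appear during training.
def symmetry_tests(structure):
    return [(y, x) for x, y in training_pairs(structure)]
```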
Our study aligns with the goals of the Abstraction and Reasoning Corpus (ARC) dataset (@fchollet.bsky.social) and relates to other abstract reasoning measures such as Raven's Progressive Matrices (@adamsantoro.bsky.social).
December 5, 2024 at 4:48 PM
In this research, we introduced Stimulus Equivalence as a tool to probe abstraction and symbolic manipulation in transformers, demonstrated its potential as an explainability technique, and positioned SE as a valuable benchmark for evaluating language models.
December 5, 2024 at 4:48 PM
Our findings suggest that Transformers may require further development to achieve human-level symbolic manipulation. Stimulus Equivalence can help us understand the limitations of transformer-based models (TBMs) and serve as a valuable benchmark for evaluating symbolic reasoning.
December 5, 2024 at 4:48 PM
BERT and GPT agents failed all tests across all training structures under the select-only condition.
December 5, 2024 at 4:48 PM
BERT and GPT models failed to pass reflexivity, transitivity, and symmetry tests in one-to-many and many-to-one training structures with both select and reject conditions.
December 5, 2024 at 4:48 PM
Linear series training with select-reject relations yielded the best performance, but still fell short of human-level equivalence. BERT slightly outperformed GPT. TBMs struggled to demonstrate true stimulus equivalence, relying on reject rule-based decision making.
December 5, 2024 at 4:48 PM
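A hypothetical sketch of the "reject rule" shortcut mentioned above: with select-reject training, an agent can answer many trials by excluding comparisons it learned to reject, without ever representing an equivalence class. Names and trial format are illustrative assumptions, not the study's code.

```python
# Sketch: responding by exclusion using only learned reject relations.

def respond_by_exclusion(sample, comparisons, learned_rejects):
    """Pick the one comparison not covered by a learned
    (sample, comparison) reject relation, if exactly one remains."""
    remaining = [c for c in comparisons
                 if (sample, c) not in learned_rejects]
    # If exclusion leaves exactly one option, the trial is solvable
    # with no equivalence-class knowledge at all.
    return remaining[0] if len(remaining) == 1 else None

# Trained reject relations: "given A1, reject B2" and "given A1, reject B3".
rejects = {("A1", "B2"), ("A1", "B3")}
choice = respond_by_exclusion("A1", ["B1", "B2", "B3"], rejects)
```

Here `choice` is "B1" even though no A1-B1 relation was ever learned, which is why above-chance test scores under select-reject training do not by themselves demonstrate equivalence class formation.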
Hallucinations were prevalent in the models, indicating difficulties in understanding the task. High hallucination rates correlated with lower overall performance and the models' reliance on rule-based decision making rather than true equivalence class formation.
December 5, 2024 at 4:48 PM
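One simple way to operationalize the hallucination measure implied above: count a response as a hallucination when it falls outside the comparison set offered on that trial. The variable names and example data are illustrative assumptions, not the study's code.

```python
# Sketch: hallucination rate as the fraction of trials whose response
# is not among the presented comparison stimuli.

def hallucination_rate(responses, comparison_sets):
    """Fraction of trials where the model's output is not one of
    the presented comparisons."""
    misses = sum(resp not in comps
                 for resp, comps in zip(responses, comparison_sets))
    return misses / len(responses)

# Two valid picks, two out-of-set outputs -> rate of 0.5.
responses = ["B1", "Z9", "B2", "banana"]
comparisons = [{"B1", "B2"}] * 4
rate = hallucination_rate(responses, comparisons)
```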
Models struggled to demonstrate true stimulus equivalence across all conditions, tending to rely on rule-based decision making rather than forming equivalence classes. The best performance was achieved in linear series training with select-reject relations.
December 5, 2024 at 4:48 PM