Auditory-Visual Speech Association (AVISA)
Auditory-Visual Speech Association (AVISA)
@avsp.bsky.social
The official(ish) account of the Auditory-Visual Speech Association (AVISA). AV 👄 👓 speech references, but mostly what interests me. avisa.loria.fr
Pinned
A teaser for the next instalment of AVSP Visionaries
youtube.com/watch?v=y2e9...
Preconfigured neuronal firing sequences in human brain organoids www.nature.com/articles/s41... "...results suggest that temporal sequences do not arise in an experience-dependent manner, but are rather constrained by a preconfigured architecture established during neurodevelopment"
Preconfigured neuronal firing sequences in human brain organoids - Nature Neuroscience
Examining human brain organoids and ex vivo neonatal murine cortical slices demonstrates that structured neuronal sequences emerge independently of sensory input, highlighting the potential of brain o...
www.nature.com
November 25, 2025 at 7:58 AM
Fun read (Q and As) from the workshop: Speech Production Models and Empirical Evidence from Typical and Pathological Speech
hal.science/hal-05352262...
hal.science
November 25, 2025 at 12:49 AM
Reposted by Auditory-Visual Speech Association (AVISA)
for all of you using the ALIGN library (to measure lexical, syntactic and semantic alignment in conversations), Nick Duran has put together a great refactoring: ALIGN 2.0 (github.com/nickduran/al...), now integrated with spaCy and BERT
GitHub - nickduran/align2-linguistic-alignment: ALIGN 2.0: Modern Python package for multi-level linguistic alignment analysis. Faster, streamlined, and feature-rich while maintaining full compatibili...
ALIGN 2.0: Modern Python package for multi-level linguistic alignment analysis. Faster, streamlined, and feature-rich while maintaining full compatibility with the original ALIGN methodology (Duran...
github.com
November 24, 2025 at 10:17 AM
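Not the ALIGN 2.0 API itself (see the repo above for that), but a minimal sketch of what turn-by-turn semantic alignment means, using spaCy similarity scores; the conversation turns and model name below are made up for illustration:

```python
# Minimal sketch of turn-by-turn semantic alignment between two speakers.
# NOT the ALIGN 2.0 API (see github.com/nickduran/align2-linguistic-alignment);
# it just illustrates the idea with spaCy's vector-based similarity.
import spacy

nlp = spacy.load("en_core_web_md")  # medium model ships with word vectors

turns = [
    ("A", "I think we should move the meeting to Friday."),
    ("B", "Moving the meeting to Friday works for me."),
    ("A", "Great, Friday afternoon then."),
]

# Semantic alignment proxy: cosine similarity between consecutive turns.
docs = [nlp(text) for _, text in turns]
for (spk1, _), (spk2, _), d1, d2 in zip(turns, turns[1:], docs, docs[1:]):
    print(f"{spk1} -> {spk2}: semantic similarity = {d1.similarity(d2):.2f}")
```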
The Role of Modality-specific Brain Regions in Statistical Learning: Insights from Intracranial Neural Entrainment direct.mit.edu/jocn/article... iEEG to trisyllabic words - tracking at (approx) syllable & word frequencies - SL assessed with explicit & implicit measures - SL found, but modality-specific
The Role of Modality-specific Brain Regions in Statistical Learning: Insights from Intracranial Neural Entrainment
Abstract. Statistical learning (SL) is a powerful mechanism that supports the ability to extract regularities from environmental input. Yet, its neural underpinnings are not well understood. Previous ...
direct.mit.edu
November 21, 2025 at 9:46 PM
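The frequency-tagging logic behind that kind of entrainment analysis is simple to sketch: with syllables presented at a fixed rate, statistical learning should show up as extra spectral power at the word rate (one third of the syllable rate for trisyllabic words). A toy numpy version with made-up rates and a synthetic signal, not the paper's pipeline:

```python
# Toy frequency-tagging check: does a neural-like signal carry power at the
# word rate (syllable rate / 3) as well as at the syllable rate?
# Not the paper's analysis -- just the logic, with synthetic numbers.
import numpy as np

fs = 250.0                     # sampling rate (Hz), assumed
syll_rate = 4.0                # syllables per second, assumed
word_rate = syll_rate / 3.0    # trisyllabic words -> ~1.33 Hz
t = np.arange(0, 60, 1 / fs)   # 60 s of signal

# Synthetic "entrained" signal: syllable-rate response plus a weaker
# word-rate component (the learning signature), plus noise.
signal = (np.sin(2 * np.pi * syll_rate * t)
          + 0.4 * np.sin(2 * np.pi * word_rate * t)
          + 0.5 * np.random.randn(t.size))

spectrum = np.abs(np.fft.rfft(signal)) / t.size
freqs = np.fft.rfftfreq(t.size, 1 / fs)
for f in (word_rate, syll_rate):
    idx = np.argmin(np.abs(freqs - f))
    print(f"power near {f:.2f} Hz: {spectrum[idx]:.3f}")
```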
AV integration and acoustic change complex in adult cochlear implant users: an electrophysiological and behavioral investigation www.tandfonline.com/doi/full/10.... CI users show limitations in AV integration & prolonged ACC latencies (reflecting auditory discrimination, not multisensory integration)
www.tandfonline.com
November 21, 2025 at 9:41 PM
The ADVANCE toolkit: Automated descriptive video annotation in naturalistic child environments link.springer.com/article/10.3... Looks like: YOLOv8; MediaPipe (used to mask frames); OSNet; OpenPose (via Docker), plus some k-means clustering & XGBoost - code @ osf.io/4mfsk/ Hmm - any info on setting it up?
The ADVANCE toolkit: Automated descriptive video annotation in naturalistic child environments - Behavior Research Methods
Video recordings are commonplace for observing human and animal behaviours, including interindividual interactions. In studies of humans, analyses for clinical applications remain particularly cumbers...
link.springer.com
November 20, 2025 at 8:52 PM
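No setup notes yet, but the listed components suggest a detect-then-pose pipeline; here is a rough sketch of the first two stages (person detection with YOLOv8, pose landmarks with MediaPipe). This is not the ADVANCE code (that's at osf.io/4mfsk/); the model file, frame path and confidence threshold are assumptions:

```python
# Rough sketch of a detect-then-pose pipeline of the kind ADVANCE describes:
# YOLOv8 for person detection, MediaPipe for pose landmarks on each crop.
# NOT the ADVANCE code; model file, frame path and threshold are illustrative.
import cv2
import mediapipe as mp
from ultralytics import YOLO

detector = YOLO("yolov8n.pt")                       # small pretrained COCO model
pose = mp.solutions.pose.Pose(static_image_mode=True)

frame = cv2.imread("frame_0001.png")                # hypothetical video frame
for box in detector(frame)[0].boxes:
    if int(box.cls) != 0 or float(box.conf) < 0.5:
        continue                                    # keep confident 'person' boxes only
    x1, y1, x2, y2 = map(int, box.xyxy[0])
    crop = frame[y1:y2, x1:x2]
    result = pose.process(cv2.cvtColor(crop, cv2.COLOR_BGR2RGB))
    if result.pose_landmarks:
        print(f"person at ({x1},{y1}): {len(result.pose_landmarks.landmark)} landmarks")
```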
Joint Population Coding and Temporal Coherence Link an Attended Talker’s Voice and Location Features in Naturalistic Multi-talker Scenes www.jneurosci.org/content/45/4... Two complementary mechanisms: joint population coding & temporal coherence enable integration of voice & location features
Joint Population Coding and Temporal Coherence Link an Attended Talker’s Voice and Location Features in Naturalistic Multi-talker Scenes
Listeners effortlessly extract multi-dimensional auditory objects, such as a localized talker, from complex acoustic scenes. However, the neural mechanisms that enable simultaneous encoding and linkin...
www.jneurosci.org
November 19, 2025 at 11:56 PM
Head Gestures Do Not Serve as Precursors of Prosodic Focus Marking in the Second Language as They Do in the First Language onlinelibrary.wiley.com/doi/pdf/10.1... "L2 learners highlight the same, possibly inaccurate, part of an utterance in both modalities"
<em>Language Learning</em> | Language Learning Research Club Journal | Wiley Online Library
Research shows that children use head gestures to mark discourse focus before developing the required prosodic cues in their first language (L1), and their gestures affect the prosodic parameters of ...
onlinelibrary.wiley.com
November 17, 2025 at 3:49 AM
Audiovisual Speech Perception With Less Familiar and Frequent Words
pubs.asha.org/doi/full/10....
Investigated how word familiarity affected word recognition in A-only & AV conditions using medically related sentences embedded in a simulated hospital soundscape
Audiovisual Speech Perception With Less Familiar and Frequent Words
Purpose: Lexical factors, such as word frequency and neighborhood density, impact word recognition accuracy in challenging listening environments...
pubs.asha.org
November 14, 2025 at 9:05 PM
Front page note on Perceptrons draft

"We are anxious to collect: ...3. statements that are clear but false....Diligence will be rewarded, somehow or other" : )
November 14, 2025 at 4:47 AM
Neural Tracking of the Maternal Voice in the Infant Brain
www.jneurosci.org/content/earl...
Used TRF to look at how 7-month-old human infants track maternal vs unfamiliar speech & whether this affects simultaneous face processing - maternal speech enhances neural tracking & alters how faces are processed
Neural Tracking of the Maternal Voice in the Infant Brain
Infants preferentially process familiar social signals, but the neural mechanisms underlying continuous processing of maternal speech remain unclear. Using EEG-based neural encoding models based on te...
www.jneurosci.org
November 11, 2025 at 8:59 PM
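For anyone new to TRFs: the idea is to regress the continuous EEG onto the speech envelope across a window of time lags. A minimal sketch with mne.decoding.ReceptiveField on synthetic data; the lag window and ridge regularisation are assumptions, not the authors' settings:

```python
# Minimal temporal response function (TRF) sketch: regress EEG onto the speech
# envelope over time lags. Synthetic data stands in for real recordings; the
# lag window and ridge alpha are assumptions, not the paper's values.
import numpy as np
from mne.decoding import ReceptiveField

sfreq = 100.0                          # Hz, assumed
n_times, n_channels = 6000, 32         # 60 s of fake data
rng = np.random.default_rng(0)

envelope = rng.standard_normal((n_times, 1))      # speech envelope (predictor)
eeg = rng.standard_normal((n_times, n_channels))  # EEG channels (targets)
# Give one channel a smeared copy of the envelope so there is something to track.
eeg[:, 0] += np.convolve(envelope[:, 0], np.ones(10) / 10, mode="same")

rf = ReceptiveField(tmin=-0.1, tmax=0.4, sfreq=sfreq,
                    feature_names=["envelope"], estimator=1.0)  # ridge, alpha=1.0
rf.fit(envelope, eeg)
print("prediction score, first channels:", rf.score(envelope, eeg)[:3])
```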
Visual Speech Reduces Cognitive Effort as Measured by EEG Theta Power and Pupil Dilation
www.eneuro.org/content/12/1...
Combined pupillometry & EEG to investigate how visual speech cues modulate cognitive effort during speech recognition ... code/software & data github.com/brianman515/... - nice!
Visual Speech Reduces Cognitive Effort as Measured by EEG Theta Power and Pupil Dilation
Listening effort reflects the cognitive and motivational resources allocated to speech comprehension, particularly under challenging conditions. Visual cues are known to enhance speech perception, pot...
www.eneuro.org
November 11, 2025 at 8:52 PM
Reposted by Auditory-Visual Speech Association (AVISA)
happy to share our new paper, out now in Neuron! led by the incredible Yizhen Zhang, we explore how the brain segments continuous speech into word-forms and uses adaptive dynamics to code for relative time - www.sciencedirect.com/science/arti...
Human cortical dynamics of auditory word form encoding
We perceive continuous speech as a series of discrete words, despite the lack of clear acoustic boundaries. The superior temporal gyrus (STG) encodes …
www.sciencedirect.com
November 7, 2025 at 6:16 PM
Correlation detection as a stimulus computable account for AV perception, causal inference & saliency maps in mammals elifesciences.org/articles/106... Image- & sound-computable population model for AV perception -> Used simulation to model psychophysical, eye-tracking & pharmacological experiments
Correlation detection as a stimulus computable account for audiovisual perception, causal inference, and saliency maps in mammals
Optimal cue integration, Bayesian Causal Inference, spatial orienting, speech illusions and other key phenomena in audiovisual perception naturally emerge from the collective behavior of a population ...
elifesciences.org
November 7, 2025 at 12:22 PM
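The core computation in a correlation-detector account is easy to caricature: correlate the auditory envelope with a visual intensity time course over a range of lags and take the peak as evidence for a common cause. A toy numpy version, not the published population model:

```python
# Toy audiovisual correlation detector: cross-correlate an auditory envelope
# with a visual signal (e.g. mouth opening) and read off the best lag.
# A cartoon of the idea, not the eLife population model; all signals synthetic.
import numpy as np

fs = 60.0                              # common sampling rate (Hz), assumed
t = np.arange(0, 5, 1 / fs)
rng = np.random.default_rng(1)

visual = np.clip(rng.standard_normal(t.size).cumsum(), 0, None)   # mouth opening
lag_s = 0.10                                                       # audio trails video by 100 ms
audio = np.roll(visual, int(lag_s * fs)) + 0.3 * rng.standard_normal(t.size)

# Normalised cross-correlation over lags of +/- 300 ms.
max_lag = int(0.3 * fs)
lags = np.arange(-max_lag, max_lag + 1)
v = (visual - visual.mean()) / visual.std()
a = (audio - audio.mean()) / audio.std()
corr = [np.mean(v * np.roll(a, -k)) for k in lags]
best = lags[int(np.argmax(corr))]
print(f"peak correlation at lag {best / fs * 1000:.0f} ms (audio relative to video)")
```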
A richly annotated dataset of co-speech hand gestures across diverse speaker contexts www.nature.com/articles/s41... Dataset comprising 2373 annotated gestures from 9 speakers across 3 distinct categories (university lecturers, politicians, and psychotherapists); can be accessed at doi.org/10.17605/OSF...
A richly annotated dataset of co-speech hand gestures across diverse speaker contexts - Scientific Data
Scientific Data - A richly annotated dataset of co-speech hand gestures across diverse speaker contexts
www.nature.com
November 6, 2025 at 12:38 PM
Reposted by Auditory-Visual Speech Association (AVISA)
Applications are now open for the MARCS International Visiting Scholar Program 2026! 🌏

We are pleased to offer scholarships for PhD students and postdocs for visits of 1–3 months before the end of 2026.

📅 Applications close 4 December 2025.

If you are interested, email [email protected]
November 5, 2025 at 10:38 PM
Distinct Portions of Superior Temporal Sulcus Combine Auditory Representations with Different Visual Streams www.jneurosci.org/content/45/4... Used ANNs to analyse open-source fMRI data of auditory cortex from people watching a movie, to investigate how STS combines auditory information with the 2 visual streams
Distinct Portions of Superior Temporal Sulcus Combine Auditory Representations with Different Visual Streams
In humans, the superior temporal sulcus (STS) combines auditory and visual information. However, the extent to which it relies on visual information from the ventral or dorsal stream remains uncertain...
www.jneurosci.org
November 5, 2025 at 9:34 PM
When the brain talks back to the eye: "The state of our brain shapes what we see, but how early in the visual system does this start? A new study in PLOS Biology shows that brain state-dependent release of histamine modulates the very first stage of vision in the retina" journals.plos.org/plosbiology/...
When the brain talks back to the eye
The state of our brain shapes what we see, but how early in the visual system does this start? This Primer explores a new PLOS Biology study which shows that brain state-dependent release of histamine...
journals.plos.org
November 5, 2025 at 9:30 PM
Reposted by Auditory-Visual Speech Association (AVISA)
Here's an interesting new study exploring whether LLMs are able to understand the narrative sequencing of comics and... even the best AI models are *terrible* at it for pretty much all tasks that were analyzed aclanthology.org/2025.finding...
Beyond Single Frames: Can LMMs Comprehend Implicit Narratives in Comic Strip?
Xiaochen Wang, Heming Xia, Jialin Song, Longyu Guan, Qingxiu Dong, Rui Li, Yixin Yang, Yifan Pu, Weiyao Luo, Yiru Wang, Xiangdi Meng, Wenjie Li, Zhifang Sui. Findings of the Association for Computatio...
aclanthology.org
November 4, 2025 at 8:05 PM
Reposted by Auditory-Visual Speech Association (AVISA)

Can you feel what I am saying? Speech-based vibrotactile stimulation enhances the cortical tracking of attended speech in a multi-talker background

https://www.biorxiv.org/content/10.1101/2025.10.31.685484v1
November 2, 2025 at 3:28 AM
Audiovisual Synchrony in Left-hemisphere Brain-lesioned Individuals with Aphasia direct.mit.edu/jocn/article... Found a statistically significant effect of aphasia type on measures of AV synchrony, an effect not explained by lesion volume... damage to the left posterior temporal cortex is bad for AV processing.
Audiovisual Synchrony in Left-hemisphere Brain-lesioned Individuals with Aphasia
Abstract. We investigated the ability of 40 left-hemisphere brain-lesioned individuals with various diagnoses of aphasia to temporally synchronize the audio of a spoken word to its congruent video usi...
direct.mit.edu
November 1, 2025 at 5:17 AM
Expectation-driven shifts in perception and production pubs.aip.org/asa/jasa/art... Failed to find evidence that individuals' expectation-driven shifts in perception correlate with those in production ...
Expectation-driven shifts in perception and production
While phonetic convergence has been taken as evidence for tight perception–production links, attempts to correlate perceptual adjustments with production shifts
pubs.aip.org
October 29, 2025 at 12:49 AM
Yes, the McGurk effect, that's right -> "The influence of age, listener sex, and speaker sex on the McGurk effect" journals.sagepub.com/doi/10.1177/... Are reports of higher sensitivity to the McGurk effect in females than males influenced by the match of Listener-Speaker sex?
journals.sagepub.com
October 29, 2025 at 12:46 AM
Audiovisual speech perception in Mandarin cochlear implant users across age and listening conditions www.sciencedirect.com/science/arti... "AV cues play a critical role in speech perception for Mandarin-speaking CI users, especially under acoustically challenging conditions"
Audiovisual speech perception in Mandarin cochlear implant users across age and listening conditions
To investigate how visual cues influence speech recognition in Mandarin-speaking cochlear implant (CI) users and examine age-related differences in au…
www.sciencedirect.com
October 28, 2025 at 12:17 AM
Visual induction of spatial release from masking during speech perception in noise pubs.aip.org/asa/jel/arti... "There was no enhancement of auditory SRM through visual spatial separation" (shown before) - it did have a negative effect, though, i.e., visual separation could "disrupt existing auditory SRM" ...
Visual induction of spatial release from masking during speech perception in noise
Spatially separating target and masker talkers improves speech perception in noise, an effect known as spatial release from masking (SRM). Independently, the pe
pubs.aip.org
October 27, 2025 at 9:50 PM