Andrew Chang
@candrew123.bsky.social
260 followers 96 following 38 posts
Postdoctoral researcher at NYU, working on computational cognitive neuroscience, audition (music and speech), and real-world communication. 🇹🇼🇨🇦🇺🇸
candrew123.bsky.social
Huge thanks to co-authors @yikeli.bsky.social, Iran R. Roman, @davidpoeppel.bsky.social, and to the Interspeech reviewers for the perfect 4/4 score! 🙌

Can’t wait to present and discuss how this bridges machine and human perception! See you in Rotterdam!
candrew123.bsky.social
💥 Key Impact 3:
This paves the way for advances in #CognitiveComputing and audio-related brain–computer interface (#BCI) applications (e.g., sound/speech reconstruction).
candrew123.bsky.social
💥 Key Impact 2:
STM features link directly to brain processing, offering a more interpretable, biologically grounded representation.
candrew123.bsky.social
💥 Key Impact 1:
Without any pretraining, our STM-based DNN matches popular spectrogram-based models on speech, music, and environmental sound classification.
candrew123.bsky.social
While spectrogram-based audio DNNs excel, they’re often bulky, compute-heavy, hard to interpret, and data-hungry.
We explored an alternative: training a DNN on spectrotemporal modulation (#STM) features—an approach inspired by how the human auditory cortex processes sound.
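To make the STM front end concrete, here is a minimal Python sketch, assuming STM features are approximated by the modulation power spectrum, i.e., a 2D FFT of a log-mel spectrogram. The paper's actual feature pipeline may differ; `stm_features` and its parameters are illustrative, not the authors' code:

```python
import numpy as np
import librosa

def stm_features(audio, sr=16000, n_mels=64):
    """Illustrative spectrotemporal modulation (STM) features:
    the 2D Fourier transform of a log-mel spectrogram, whose axes
    are temporal modulation (Hz) and spectral modulation
    (cycles along the mel-frequency axis)."""
    # Log-mel spectrogram: (frequency bands x time frames)
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=n_mels)
    log_mel = np.log(mel + 1e-8)
    # Modulation power spectrum: magnitude of the 2D FFT
    return np.abs(np.fft.fftshift(np.fft.fft2(log_mel)))

# Example: 1 s of noise at 16 kHz -> one (n_mels x n_frames) STM map
mps = stm_features(np.random.randn(16000).astype(np.float32))
```

A DNN would then take such STM maps as input in place of raw spectrograms.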
Reposted by Andrew Chang
Our Interspeech2025 contrib (for geeks)
arxiv.org/pdf/2505.23509
Audio DNNs: impressive performance on machine listening tasks. But most representations are computationally costly & uninterpretable. Let's try something different:
Reposted by Andrew Chang
haleykrag.bsky.social
why DO babies dance? when do they start dancing? what counts as dancing, anyway (and how can we measure it)? out online today in CDPS, @lkcirelli.bsky.social and i attempt to integrate what is known about the development of dance
journals.sagepub.com/doi/epub/10.... (2/4)
candrew123.bsky.social
I have emailed @interspeech.bsky.social, but it would be great if you could also reach out to them at [email protected] if this concerns you as well, so they understand that this will affect many people. I’m sure none of us want to be stuck writing a rebuttal in a hotel at #ICASSP!
candrew123.bsky.social
@interspeech.bsky.social just changed its rebuttal period to April 4-11, which overlaps with #ICASSP.

Given the overlap in research communities, I believe many researchers who submitted to #Interspeech2025 will also be attending #ICASSP2025. Could the rebuttal period be moved at least a week later?
candrew123.bsky.social
What's next? We are currently working on (1) refining our ML model by combining active learning and semi-supervised learning approaches and (2) experimenting with new human-computer interaction designs to mitigate negative experiences during videoconferencing. 7/end
candrew123.bsky.social
Beyond improving technical aspects like signal quality and latency of a videoconferencing system, social dynamics can deeply affect user experience. Our research paves the way for future enhancements by predicting and preventing conversational derailments in real time.
6/n
candrew123.bsky.social
One surprising insight: awkward silences—those long gaps in turn-taking—were more detrimental to conversational fluidity and enjoyment than chaotic overlaps or interruptions.
5/n
candrew123.bsky.social
We used multimodal ML on 100+ person-hours of videoconferences, modeling voice, facial expressions, and body movements. Key result: an ROC-AUC of 0.87 for predicting unfluid and unenjoyable moments and for classifying disruptive events such as gaps and interruptions.
4/n
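A hedged sketch of the kind of evaluation behind that number: concatenate per-moment features from the three modalities and score a binary classifier with ROC-AUC. The synthetic features, feature-level fusion, and logistic regression below are placeholders, not the study's actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-ins for per-moment voice, face, and body-movement features
voice = rng.normal(size=(500, 8))
face = rng.normal(size=(500, 8))
body = rng.normal(size=(500, 8))
X = np.hstack([voice, face, body])  # simple feature-level fusion
# Synthetic labels: 1 = unfluid/unenjoyable moment (toy dependence on X)
y = (voice[:, 0] + face[:, 0] + body[:, 0] + rng.normal(size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("ROC-AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```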
candrew123.bsky.social
Videoconferencing has become essential in our professional and personal lives, especially post-pandemic. Yet we've all experienced “derailed” moments, such as awkward pauses and uncoordinated turn-taking, which can make virtual meetings less effective and less enjoyable.
3/n
candrew123.bsky.social
Thanks for your comment. Yes, there are several recent studies suggesting that chroma is not an innate or universal property of pitch perception. Our study cannot answer that question, but we did find that the effect of chroma is much weaker than that of height.
candrew123.bsky.social
In short: By combining machine learning and MEG, we show how the brain’s dynamic pitch representation echoes ideas proposed over 100 years ago. Feels like completing a full circle in music cognitive neuroscience! Huge thanks to my collaborators! End/n
candrew123.bsky.social
The helix model reflects the idea that pitches separated by an octave (e.g., the repeating piano keys) are perceived as inherently similar. This concept was first explored in the early 1900s by Géza Révész, laying the groundwork for modern music cognition! 🧠🎹 6/n
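For reference, one standard parameterization of the pitch helix (a textbook form, not necessarily the exact model fit in this study) puts chroma on a circle and height on the vertical axis, so pitches an octave apart (f and 2f) share the same angle:

```latex
% Pitch helix: chroma is the angular position, height the vertical axis.
% Since \log_2(2f) = \log_2 f + 1, octave-related pitches align in (x, y).
\[
\mathbf{p}(f) = \bigl( \cos(2\pi \log_2 f),\; \sin(2\pi \log_2 f),\; h \log_2 f \bigr)
\]
```

Here $h$ sets the weight of height relative to the chroma circle's radius: the larger $h$ is, the more distances along the helix are dominated by height rather than chroma.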
candrew123.bsky.social
The brain doesn’t process pitch in an unstructured way. Typically, it represents pitches in a mostly linear structure—think piano keyboard layout. BUT—just 0.3 seconds after hearing a sound, something wild happens: the brain briefly represents pitch in a helix-like structure! 5/n