Andrew Chang
@candrew123.bsky.social
260 followers 96 following 38 posts
Postdoctoral researcher at NYU, working on computational cognitive neuroscience, audition (music and speech), and real-world communication. 🇹🇼🇨🇦🇺🇸
candrew123.bsky.social
Huge thanks to co-authors @yikeli.bsky.social, Iran R. Roman, @davidpoeppel.bsky.social, and to the Interspeech reviewers for the perfect 4/4 score! 🙌

Can’t wait to present and discuss how this bridges machine and human perception! See you in Rotterdam!
candrew123.bsky.social
💥 Key Impact 3:
This paves the way for advances in #CognitiveComputing and audio-related brain–computer interface (#BCI) applications (e.g., sound/speech reconstruction).
candrew123.bsky.social
💥 Key Impact 2:
STM features link directly to brain processing, offering a more interpretable, biologically grounded representation.
candrew123.bsky.social
💥 Key Impact 1:
Without any pretraining, our STM-based DNN matches popular spectrogram-based models on speech, music, and environmental sound classification.
candrew123.bsky.social
While spectrogram-based audio DNNs excel, they’re often bulky, compute-heavy, hard to interpret, and data-hungry.
We explored an alternative: training a DNN on spectrotemporal modulation (#STM) features—an approach inspired by how the human auditory cortex processes sound.
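To make the STM front end concrete, here is a minimal Python sketch, assuming STM features are approximated by the modulation power spectrum, i.e., a 2D FFT of a log-mel spectrogram. The paper's actual feature pipeline may differ; `stm_features` and its parameters are illustrative, not the authors' code:

```python
import numpy as np
import librosa

def stm_features(audio, sr=16000, n_mels=64):
    """Illustrative spectrotemporal modulation (STM) features:
    the 2D Fourier transform of a log-mel spectrogram, whose axes
    are temporal modulation (Hz) and spectral modulation
    (cycles along the mel-frequency axis)."""
    # Log-mel spectrogram: (frequency bands x time frames)
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=n_mels)
    log_mel = np.log(mel + 1e-8)
    # Modulation power spectrum: magnitude of the 2D FFT
    return np.abs(np.fft.fftshift(np.fft.fft2(log_mel)))

# Example: 1 s of noise at 16 kHz -> one (n_mels x n_frames) STM map
mps = stm_features(np.random.randn(16000).astype(np.float32))
```

A DNN would then take such STM maps as input in place of raw spectrograms.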
Reposted by Andrew Chang
Our Interspeech2025 contrib (for geeks)
arxiv.org/pdf/2505.23509
Audio DNNs: impressive performance on machine listening tasks. But most representations are computationally costly & uninterpretable. Let's try something different:
Reposted by Andrew Chang
haleykrag.bsky.social
why DO babies dance? when do they start dancing? what counts as dancing, anyway (and how can we measure it)? out online today in CDPS, @lkcirelli.bsky.social and i attempt to integrate what is known about the development of dance
journals.sagepub.com/doi/epub/10.... (2/4)
candrew123.bsky.social
I have emailed @interspeech.bsky.social, but it would be great if you could also reach out to them at [email protected] if this concerns you as well, so they understand that this will affect many people. I’m sure none of us want to be stuck writing a rebuttal in a hotel at #ICASSP!
candrew123.bsky.social
@interspeech.bsky.social just changed its rebuttal period to April 4-11, which overlaps with #ICASSP.

Given the overlap in research communities, I believe many researchers who submitted to #Interspeech2025 will also be attending #ICASSP2025. Could the rebuttal period be moved at least a week later?
candrew123.bsky.social
What's next? We are currently working on (1) refining our ML model by combining active learning and semi-supervised learning approaches and (2) experimenting with new human-computer interaction designs to mitigate negative experiences during videoconferencing. 7/end
candrew123.bsky.social
Beyond improving technical aspects like signal quality and latency of a videoconferencing system, social dynamics can deeply affect user experience. Our research paves the way for future enhancements by predicting and preventing conversational derailments in real time.
6/n
candrew123.bsky.social
One surprising insight: awkward silences—those long gaps in turn-taking—were more detrimental to conversational fluidity and enjoyment than chaotic overlaps or interruptions.
5/n
candrew123.bsky.social
We used multimodal ML on 100+ person-hours of videoconferences, modeling voice, facial expressions, and body movements. Key result: an ROC-AUC of 0.87 for predicting unfluid and unenjoyable moments and for classifying disruptive events such as gaps and interruptions.
4/n
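A hedged sketch of the kind of evaluation behind that number: concatenate per-moment features from the three modalities and score a binary classifier with ROC-AUC. The synthetic features, feature-level fusion, and logistic regression below are placeholders, not the study's actual pipeline:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Stand-ins for per-moment voice, face, and body-movement features
voice = rng.normal(size=(500, 8))
face = rng.normal(size=(500, 8))
body = rng.normal(size=(500, 8))
X = np.hstack([voice, face, body])  # simple feature-level fusion
# Synthetic labels: 1 = unfluid/unenjoyable moment (toy dependence on X)
y = (voice[:, 0] + face[:, 0] + body[:, 0] + rng.normal(size=500) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("ROC-AUC:", roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))
```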
candrew123.bsky.social
Videoconferencing has become essential in our professional and personal lives, especially post-pandemic. Yet we've all experienced “derailed” moments, such as awkward pauses and uncoordinated turn-taking, which can make virtual meetings less effective and less enjoyable.
3/n
candrew123.bsky.social
Thanks for your comment. Yes, there are several recent studies suggesting that chroma is not an innate or universal property of pitch perception. Our study cannot answer that question, but we did find that the effect of chroma is much weaker than that of height.
candrew123.bsky.social
In short: By combining machine learning and MEG, we show how the brain’s dynamic pitch representation echoes ideas proposed over 100 years ago. Feels like completing a full circle in music cognitive neuroscience! Huge thanks to my collaborators! End/n
candrew123.bsky.social
The helix model reflects the idea that pitches separated by an octave (e.g., the repeating piano keys) are perceived as inherently similar. This concept was first explored in the early 1900s by Géza Révész, laying the groundwork for modern music cognition! 🧠🎹 6/n
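For reference, one standard parameterization of the pitch helix (a textbook form, not necessarily the exact model fit in this study) puts chroma on a circle and height on the vertical axis, so pitches an octave apart (f and 2f) share the same angle:

```latex
% Pitch helix: chroma is the angular position, height the vertical axis.
% Since \log_2(2f) = \log_2 f + 1, octave-related pitches align in (x, y).
\[
\mathbf{p}(f) = \bigl( \cos(2\pi \log_2 f),\; \sin(2\pi \log_2 f),\; h \log_2 f \bigr)
\]
```

Here $h$ sets the weight of height relative to the chroma circle's radius: the larger $h$ is, the more distances along the helix are dominated by height rather than chroma.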
candrew123.bsky.social
The brain doesn’t process pitch in an unstructured way. Typically, it represents pitches in a mostly linear structure—think piano keyboard layout. BUT—just 0.3 seconds after hearing a sound, something wild happens: the brain briefly represents pitch in a helix-like structure! 5/n