
Journal: arXiv preprint
Published: Unknown
Neuroscience, Computational Auditory Cognition

Decoding Selective Auditory Attention to Musical Elements in Ecologically Valid Music Listening

Sony Computer Science Laboratories, Inc., Tokyo, Japan

Taketo Akama, Zhuohao Zhang, Tsukasa Nagashima, Takagi Yutaka, Shun Minamikawa, Natalia Polouliakh

The 30-Second View

IN SHORT: This paper addresses the core challenge of objectively quantifying listeners' selective attention to specific musical components (e.g., vocals, drums, bass) during naturalistic music listening, a task previously hindered by the lack of overt behavioral correlates and by a reliance on simplified, non-ecological stimuli.

Innovation (TL;DR)

  • Methodology: First study to decode auditory attention using real, studio-produced, polyphonic songs across diverse genres (pop, rock, jazz, electronic), moving beyond simplified instrument tracks or synthetic mixtures.
  • Methodology: Demonstrates the practical feasibility of using a lightweight, four-channel consumer-grade EEG device (Muse2) for reliable neural decoding in an ecologically valid music listening paradigm.
  • Biology: Provides empirical evidence that a frontal–temporal four-electrode montage can effectively support the decoding of selective musical attention, offering insights into the neural correlates of auditory focus.

Key conclusions

  • The 'all-0 ms' model (trained on all trials with no EEG-audio delay) achieved the highest global decoding accuracy, significantly outperforming both the model trained only on high-attention trials ('attn-0 ms', p = 1.89e-25) and the model trained with a 200 ms delay ('all-200 ms', p = 2.31e-19).
  • The model generalized robustly, achieving a mean global accuracy of 86.41% across subjects on unseen songs and maintaining comparable performance (mean 84.54%) when evaluated only on trials in which participants self-reported high attention.
  • In the best model, decoding performance was broadly consistent across the four musical component tasks (Vocal, Drum, Bass, Others): task-level accuracy exceeded 80% for every task in the all-data evaluation, although the Bass task was notably lower (65%) in the high-attention evaluation.
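
The summary does not describe the decoder itself, but two quantities in the conclusions can be made concrete: what an EEG-audio delay of 0 ms versus 200 ms means in samples, and how a pooled ("global") accuracy relates to task-level accuracy. The following is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes the Muse 2 streams EEG at 256 Hz from the TP9, AF7, AF8 and TP10 electrodes, assumes the delay means pairing the audio at time t with the EEG at time t + delay, and assumes "global accuracy" is the fraction of correct trials pooled across all tasks. The function names (`lag_in_samples`, `align_with_delay`, `global_and_task_accuracy`, `mean_over_subjects`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# Assumption (not stated in the summary): Muse 2 EEG at 256 Hz from four
# electrodes, TP9/TP10 (temporal) and AF7/AF8 (frontal).
FS_HZ = 256
CHANNELS = ("TP9", "AF7", "AF8", "TP10")


def lag_in_samples(delay_ms: float, fs_hz: int = FS_HZ) -> int:
    """Convert an EEG-audio delay in milliseconds to a whole number of samples."""
    return round(delay_ms * fs_hz / 1000)


def align_with_delay(eeg: Sequence, audio: Sequence, delay_ms: float) -> Tuple[list, list]:
    """Pair the audio frame at index t with the EEG frame at index t + lag.

    Both streams are assumed to share the EEG sampling rate. A 0 ms delay pairs
    samples one-to-one; a positive delay skips the first `lag` EEG frames and
    trims both streams to the same overlapping length.
    """
    lag = lag_in_samples(delay_ms)
    n = min(len(eeg), len(audio)) - lag
    if n <= 0:
        raise ValueError("recording is too short for this delay")
    return list(eeg[lag:lag + n]), list(audio[:n])


def global_and_task_accuracy(
    y_true: Sequence, y_pred: Sequence, tasks: Sequence[str]
) -> Tuple[float, Dict[str, float]]:
    """Return accuracy pooled over all trials and accuracy within each task."""
    if not (len(y_true) == len(y_pred) == len(tasks)) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    hits: Dict[str, List[bool]] = defaultdict(list)
    for truth, pred, task in zip(y_true, y_pred, tasks):
        hits[task].append(truth == pred)
    global_acc = sum(sum(h) for h in hits.values()) / len(y_true)
    per_task = {task: sum(h) / len(h) for task, h in sorted(hits.items())}
    return global_acc, per_task


def mean_over_subjects(per_subject_acc: Sequence[float]) -> float:
    """Average per-subject global accuracies (one reading of 'mean across subjects')."""
    return sum(per_subject_acc) / len(per_subject_acc)


if __name__ == "__main__":
    # Toy inputs only; these are not the paper's data or results.
    eeg = [[0.0] * len(CHANNELS) for _ in range(1000)]
    audio = [0.0] * 1000
    e200, a200 = align_with_delay(eeg, audio, 200)
    print(lag_in_samples(200), len(e200), len(a200))  # 51 949 949

    tasks = ["Vocal", "Vocal", "Drum", "Bass", "Bass", "Others"]
    g, per = global_and_task_accuracy([1, 0, 1, 1, 0, 1], [1, 0, 1, 0, 0, 1], tasks)
    print(round(g, 3), per)
```

Under these assumptions, the 200 ms condition shifts the EEG by 200 × 256 / 1000 = 51.2, rounded to 51 samples, so 1,000 samples per stream leave 949 aligned pairs, while the 0 ms condition keeps all 1,000. Pooling correct trials before dividing means a task with few trials contributes little to the global figure even when its own task-level accuracy is low.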

Background and Gap: Prior research on auditory attention decoding has largely focused on controlled speech paradigms ('cocktail party' scenarios) or simplified musical stimuli (isolated tones, instrument streams, or spatially separated audio), creating a significant gap in understanding how attention is allocated during natural, complex, and emotionally engaging music listening.

Abstract: Art has long played a profound role in shaping human emotion, cognition, and behavior. While visual arts such as painting and architecture have been studied through eye-tracking, revealing distinct gaze patterns between experts and novices, analogous methods for auditory art forms remain underdeveloped. Music, despite being a pervasive component of modern life and culture, still lacks objective tools to quantify listeners’ attention and perceptual focus during natural listening experiences. To our knowledge, this is the first attempt to decode selective attention to musical elements using naturalistic, studio-produced songs and a lightweight consumer-grade EEG device with only four electrodes. By analyzing neural responses during real-world–like music listening, we test whether decoding is feasible under conditions that minimize participant burden and preserve the authenticity of the musical experience. Our contributions are fourfold: (i) decoding music attention in real studio-produced songs, (ii) demonstrating feasibility with a four-channel consumer EEG, (iii) providing insights for music attention decoding, and (iv) demonstrating improved model ability over prior work. Our findings suggest that musical attention can be decoded not only for novel songs but also across new subjects, showing performance improvements compared to existing approaches under our tested conditions. These findings show that consumer-grade devices can reliably capture signals, and that neural decoding in music could be feasible in real-world settings. This paves the way for applications in education, personalized music technologies, and therapeutic interventions.