Cable USB

This is not necessarily the primary cognitive process involved… This study demonstrates a synergistic process of timbre recognition.

Encoding of Natural Timbre Dimensions in Human Auditory Cortex

1. Introduction

Timbre, the perceptual quality or color of a sound, is defined as everything by which a listener can distinguish between two sounds with the same loudness, pitch, spatial location, and duration (ANSI, 2013). For instance, it is differences in timbre that allow us to distinguish a violin from a guitar, or one vowel sound from another. Among the typical adjectives that fall under the category of timbre are “brightness”, “clarity”, “harshness”, “fullness”, and “noisiness” (Stepanek, 2006). Efforts have been made to identify and quantify the most salient aspects of timbre through the use of multidimensional scaling (MDS) techniques (e.g., Grey, 1977; Elliott et al., 2013). MDS utilizes subjective measures to determine how perceptually similar a selection of sounds are to one another, thereby creating a geometric representation that derives the subjective distances between a diverse set of stimuli using as few dimensions as possible (Grey, 1977). After collecting similarity ratings for musical instrument sounds with unique timbres, Grey (1977) used MDS to identify three dimensions that best represented the distribution of timbres. The first dimension was related to the spectral energy distribution of the sounds (ranging from a low to high spectral centroid, corresponding to timbral descriptors ranging from dull to bright), and the other two related to temporal patterns, such as whether the onset was rapid (like a struck piano note or a plucked guitar string) or slow (as is characteristic of many woodwind instruments) and the synchronicity of higher harmonic transients.
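As a rough illustration of the "spectral centroid" mentioned above (this is not code from the paper), the sketch below builds two synthetic harmonic tones on the same 311 Hz fundamental and compares their centroids. The function names, the harmonic amplitudes, and the 6220 Hz sample rate are all assumptions made for the example; the sample rate is picked only so that each harmonic lands exactly on a DFT bin.

```python
import cmath
import math

SAMPLE_RATE = 6220   # Hz; assumed value so that 311 Hz harmonics fall on exact DFT bins
N_SAMPLES = 200      # bin spacing = 6220 / 200 = 31.1 Hz


def harmonic_tone(f0, amplitudes, sample_rate=SAMPLE_RATE, n_samples=N_SAMPLES):
    """Sum of sine harmonics of f0; amplitudes[k] scales harmonic k + 1."""
    return [
        sum(a * math.sin(2 * math.pi * f0 * (k + 1) * n / sample_rate)
            for k, a in enumerate(amplitudes))
        for n in range(n_samples)
    ]


def spectral_centroid(signal, sample_rate=SAMPLE_RATE):
    """Magnitude-weighted mean frequency (Hz) over the non-negative DFT bins."""
    n = len(signal)
    total = 0.0
    weighted = 0.0
    for k in range(n // 2 + 1):
        # Plain O(n^2) DFT; fine for a short illustrative signal.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        magnitude = abs(coeff)
        total += magnitude
        weighted += (k * sample_rate / n) * magnitude
    return weighted / total if total else 0.0


# Same F0, different spectral energy distribution (hypothetical amplitudes).
dull = harmonic_tone(311.0, [1.0, 0.5, 0.25, 0.125, 0.0625, 0.03125])
bright = harmonic_tone(311.0, [1.0, 1.0, 1.0, 1.0, 1.0, 1.0])

dull_centroid = spectral_centroid(dull)
bright_centroid = spectral_centroid(bright)
print(f"dull:   {dull_centroid:.1f} Hz")
print(f"bright: {bright_centroid:.1f} Hz")
```

Both tones have the same pitch, but the one with stronger upper harmonics has the higher centroid, which is the sense in which the first dimension runs from "dull" to "bright".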

Grey’s influential study contained only sixteen instrumental sounds from three instrument families, placing some limits on the generalizability of the outcomes, and used sounds that may not have all had exactly the same fundamental frequency (F0), which itself may have affected some aspects of timbre judgments (e.g., Moore and Glasberg, 1990; Warrier and Zatorre, 2002; Allen and Oxenham, 2014). Elliott et al. (2013) extended Grey’s approach by using 42 natural orchestral instruments from five instrument families, all with the same F0 (311 Hz, the E♭ above middle C). After collecting similarity and semantic ratings, they performed multiple analyses, including MDS. They consistently found five dimensions to be both necessary and sufficient for describing the timbre space of these orchestral sounds.

The aim of the current study was to determine whether similar dimensions can be identified in the cortical representations of timbral differences. Although the literature on the neural representations of timbre is limited, there is some evidence to suggest it is processed in both primary and secondary auditory cortical regions including superior temporal sulcus (STS), posterior Heschl’s gyrus (HG), and planum temporale (PT), bilaterally, with possible hemispheric asymmetries (Casey et al., 2012; Halpern et al., 2004; Menon et al., 2002; Staeren et al., 2009; Warren et al., 2005). However, previous studies have not attempted to differentiate the neural representations of different timbral dimensions, and have not explored the possibility that a subjectively based model of timbre could predict patterns of cortical activation in response to sound. In the present study, we use fMRI encoding (Kay et al., 2008; Moerel et al., 2012; Santoro et al., 2014) to determine whether neural populations in the cortex can represent the timbre dimensions identified by Elliott et al. (2013), and compare this model’s performance with that of models based on the spectral and temporal characteristics of the sounds.

Thank you for the reference to this very interesting and informative paper. It’s somewhat orthogonal to my point about pattern recognition: the test subjects here underwent training, so they experienced repetition, which is an aid to pattern recognition. There was no control group of untrained subjects to determine whether training aided recognition of timbral differences. (I imagine it would, but without experimental confirmation that’s just speculation on my part.)

The paper is concerned with whether various forms of objective modeling or a subjective model best predict brain response to timbral differences. The subjective model does best, which is a lovely result; it shows us that, at least as of 2019 (the publication date), models based on measurements of the sound itself don’t account for higher-level auditory processing in the brain. The best-performing objective model takes into account not only frequency-based but also time-based characteristics, which makes a nice counterbalance to some advocates of objective measurements who tend to concentrate primarily on the frequency-based characteristics of sound reproduction.

Thanks again.

The ‘training’ relates to this seminal research, as referenced in the article at ‘2.3 Stimuli and procedure’.