Spectral centroid
The spectral centroid is a measure used in digital signal processing to characterise a spectrum. It indicates where the center of mass of the spectrum is located. Perceptually, it has a robust connection with the impression of brightness of a sound.[1] It is sometimes called center of spectral mass.[2]
Calculation
It is calculated as the weighted mean of the frequencies present in the signal, determined using a Fourier transform, with their magnitudes as the weights:[3]
$$\text{Centroid} = \frac{\sum_{n=0}^{N-1} f(n)\, x(n)}{\sum_{n=0}^{N-1} x(n)}$$
where x(n) represents the weighted frequency value, or magnitude, of bin number n, f(n) represents the center frequency of that bin, and N is the number of bins.
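The weighted mean described above can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy; the helper name `centroid_from_bins` and the three bin values are made up for the example and do not come from the cited report.

```python
import numpy as np

def centroid_from_bins(freqs, mags):
    """Weighted mean of bin frequencies f(n) using magnitudes x(n) as weights."""
    freqs = np.asarray(freqs, dtype=float)
    mags = np.asarray(mags, dtype=float)
    total = mags.sum()
    if total == 0:
        return float("nan")  # centroid is undefined for an all-zero spectrum
    return float((freqs * mags).sum() / total)

# Three illustrative bins (made-up values): the 100 Hz bin has the largest magnitude.
example = centroid_from_bins([100.0, 200.0, 400.0], [2.0, 1.0, 1.0])
print(example)  # (100*2 + 200*1 + 400*1) / (2 + 1 + 1) = 200.0
```

Returning NaN for an all-zero spectrum is one possible convention; the formula itself leaves that case undefined.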
Alternative usage
Some people use "spectral centroid" to refer to the median of the spectrum. This is a different statistic, the difference being essentially the same as the difference between the unweighted median and mean statistics. Since both are measures of central tendency, in some situations they will exhibit some similarity of behaviour. But since typical audio spectra are not normally distributed, the two measures will often give strongly different values. Grey and Gordon in 1978 found the mean a better fit than the median.[1]
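To illustrate how far the two statistics can diverge on a skewed spectrum, here is a small sketch assuming NumPy. The function names and the five-bin spectrum are hypothetical, and the "spectral median" is taken here to mean the lowest bin frequency at which the cumulative magnitude reaches half of the total, which is one common reading rather than a definition given in the text.

```python
import numpy as np

def spectral_mean(freqs, mags):
    """Magnitude-weighted mean frequency (the centroid)."""
    return float(np.sum(freqs * mags) / np.sum(mags))

def spectral_median(freqs, mags):
    """Lowest bin frequency at which cumulative magnitude reaches half the total."""
    cumulative = np.cumsum(mags)
    idx = int(np.searchsorted(cumulative, 0.5 * cumulative[-1]))
    return float(freqs[idx])

# Made-up spectrum: mostly low-frequency content plus one strong high bin.
freqs = np.array([100.0, 200.0, 300.0, 400.0, 5000.0])
mags = np.array([4.0, 3.0, 2.0, 1.0, 2.0])

print(spectral_mean(freqs, mags))    # 12000 / 12 = 1000.0
print(spectral_median(freqs, mags))  # cumulative sums 4, 7, ... cross 6 at 200.0
```

The single high-frequency bin pulls the mean up to 1000 Hz while the median stays at 200 Hz, which is the kind of disagreement expected for non-symmetric spectra.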
Applications
Because the spectral centroid is a good predictor of the "brightness" of a sound,[1] it is widely used in digital audio and music processing as an automatic measure of musical timbre.[4]
References
- Grey, John M.; Gordon, John W. (1978). "Perceptual effects of spectral modifications on musical timbres". The Journal of the Acoustical Society of America. 63 (5). Acoustical Society of America (ASA): 1493–1500. Bibcode:1978ASAJ...63.1493G. doi:10.1121/1.381843. ISSN 0001-4966.
- Pulavarti, Surya V. S. R. K.; Maguire, Jack B.; Yuen, Shirley; Harrison, Joseph S.; Griffin, Jermel; Premkumar, Lakshmanane; Esposito, Edward A.; Makhatadze, George I.; Garcia, Angel E.; Weiss, Thomas M.; Snell, Edward H. (2022-02-17). "From Protein Design to the Energy Landscape of a Cold Unfolding Protein". The Journal of Physical Chemistry B. 126 (6): 1212–1231. doi:10.1021/acs.jpcb.1c10750. ISSN 1520-6106. PMC 9281400. PMID 35128921.
- A Large Set of Audio Features for Sound Description. Technical report published by IRCAM in 2003. Section 6.1.1 describes the spectral centroid.
- Schubert, Emery; Wolfe, Joe; Tarnopolsky, Alex (2004). "Spectral centroid and timbre in complex, multiple instrumental textures" (PDF). Proceedings of the 8th International Conference on Music Perception & Cognition, North Western University, Illinois. International Conference on Music Perception & Cognition. Lipscomb, S.D.; Ashley, R.; Gjerdingen, R. O.; Webster, P. (Eds.). Sydney, Australia: School of Music and Music Education; School of Physics, University of New South Wales. Archived from the original (PDF) on 2011-08-10.
Spectral centroid
Definition and Background
Definition
The spectral centroid is a fundamental spectral descriptor in digital signal processing that represents the "center of mass" of a signal's frequency spectrum, calculated as the average of the frequencies weighted by their corresponding magnitude (amplitude) values.[7] This measure treats the spectrum as a distribution of mass, where higher magnitudes contribute more to the central tendency, analogous to the physical center of gravity of an object with uneven density.[2] Conceptually, it is expressed as the sum of each frequency multiplied by its magnitude, divided by the total sum of magnitudes across the spectrum.[5]
Unlike related spectral measures such as spectral rolloff, which identifies the frequency below which a specified percentage of the total spectral energy is contained, or spectral flux, which quantifies the rate of change in the spectrum between consecutive frames, the spectral centroid specifically emphasizes the overall balance and distribution of energy across frequencies.[8] This focus makes it particularly useful for characterizing the spectral shape without regard to temporal variations or energy thresholds.[4]
The concept emerged in the late 20th century within digital signal processing as a tool for spectrum characterization, coinciding with advancements in Fourier analysis techniques that enabled detailed frequency-domain representations of signals.[7] In audio processing, it serves as a key indicator for timbre analysis by capturing the perceptual "brightness" of sounds through their spectral weighting.[9]
Historical Development
The concept of the spectral centroid emerged in the context of spectral analysis using the Fourier transform during the 1970s and 1980s, as researchers began quantifying the distribution of spectral energy to describe auditory perceptions such as timbre brightness. Early applications focused on psychoacoustic experiments to model how modifications to the spectrum altered perceived sound qualities, laying the groundwork for its use as a descriptor beyond basic frequency analysis.[10]
A pivotal advancement occurred in 1978 with the work of Grey and Gordon, who formalized the spectral centroid as the center of mass of the spectrum and demonstrated its superior correlation with perceptual similarity judgments for musical timbres compared to other metrics like the median frequency. This validation in psychoacoustics marked its transition from a mathematical construct to a perceptually grounded feature, influencing subsequent timbre research in the 1980s and 1990s. By the early 1990s, it gained traction in music information retrieval (MIR), as evidenced by Freed's 1990 study linking spectral centroid variations to perceived mallet hardness in percussive sounds, enabling automated audio classification and analysis.
In the 2010s, practical tools facilitated its widespread adoption; for instance, MATLAB's Audio Toolbox, introduced in 2016, began supporting spectral centroid computations with the spectralCentroid function, aiding researchers in MIR and signal analysis workflows.[6] During the same decade, the concept extended beyond audio to fields like ultrasound imaging, where spectral centroid shifts were used to estimate tissue properties and attenuation in backscattered signals.
Mathematical Formulation
Continuous-Time Calculation
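The spectral centroid in the continuous-time domain is defined as the ratio of the first moment of the magnitude spectrum to the zeroth moment, providing a measure of the spectrum's central tendency in frequency. For a continuous-time signal $x(t)$ with Fourier transform $X(f)$, the spectral centroid is given by
$$C = \frac{\int_{0}^{\infty} f\,|X(f)|\,df}{\int_{0}^{\infty} |X(f)|\,df}$$
where the numerator is the first moment and the denominator the zeroth moment of $|X(f)|$ over non-negative frequencies.
Discrete-Time Implementation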
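In discrete-time systems, the spectral centroid is computed from the discrete Fourier transform (DFT) of finite-length signal frames, approximating the continuous-time integral via summation over frequency bins. For a frame of length $N$ sampled at rate $f_s$, the spectral centroid at time index $m$ is given by
$$C(m) = \frac{\sum_{k=0}^{M-1} f_k\,|X_m(k)|}{\sum_{k=0}^{M-1} |X_m(k)|}, \qquad f_k = \frac{k f_s}{N}, \qquad M = \left\lfloor \frac{N}{2} \right\rfloor + 1$$
where $X_m(k)$ is the DFT of the windowed $m$-th frame and $M$ is the number of non-negative-frequency bins. The computation can be written with NumPy as follows:
import numpy as np

def spectral_centroid(frame, fs, N):
    # Step 1: apply a Hamming window to the first N samples, then take the FFT
    windowed = np.asarray(frame, dtype=float)[:N] * np.hamming(N)
    X = np.fft.fft(windowed)
    M = N // 2 + 1
    magnitudes = np.abs(X[:M])  # non-negative-frequency bins of a real signal
    # Step 2: bin center frequencies in Hz (DC bin included)
    freqs = np.arange(M) * fs / N
    # To exclude DC (k = 0), drop the first bin from both arrays:
    # freqs, magnitudes = freqs[1:], magnitudes[1:]
    # Steps 3-4: weighted sum, then normalize by the total magnitude
    numerator = np.sum(freqs * magnitudes)
    denominator = np.sum(magnitudes)
    if denominator == 0:
        return 0.0  # or NaN / skip the frame
    return numerator / denominator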
This routine assumes a real-valued input frame and symmetric spectrum.[12]
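As a usage illustration, the per-frame computation can be applied along a signal to obtain a centroid value per time index. The sketch below assumes NumPy and uses `np.fft.rfft` for real input; the names `frame_centroid` and `centroid_track`, the frame length of 1024, the hop of 512 and the 1 kHz test tone are all illustrative choices, not values taken from the cited sources.

```python
import numpy as np

def frame_centroid(frame, fs):
    """Centroid of one real-valued frame in Hz (Hamming window, DC bin included)."""
    n = len(frame)
    mags = np.abs(np.fft.rfft(frame * np.hamming(n)))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    total = mags.sum()
    return float(np.dot(freqs, mags) / total) if total > 0 else float("nan")

def centroid_track(signal, fs, frame_len=1024, hop=512):
    """One centroid per frame; frame_len and hop are illustrative defaults."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([frame_centroid(signal[s:s + frame_len], fs) for s in starts])

fs = 8000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000.0 * t)  # one second of a 1 kHz sine
track = centroid_track(tone, fs)
print(track.shape)   # one value per frame
print(track.round(1))  # values should sit close to 1000 Hz for a pure tone
```

For a stationary pure tone the track is essentially flat; on real audio the same loop yields a time-varying brightness curve.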
Edge cases require careful handling to ensure numerical stability. If the denominator (sum of magnitudes) is zero, indicating a silent or zero-energy frame, the centroid is undefined; implementations often return zero or propagate NaN while skipping such frames in analysis.[5] DC offset in the input signal concentrates energy at the zero-frequency bin, lowering the centroid; this is addressed by pre-filtering the signal with a high-pass filter or subtracting the mean to remove the offset before windowing.[14] Windowing introduces estimation errors due to sidelobe leakage, particularly overestimating at low frequencies and underestimating at high frequencies; larger window sizes (e.g., 1024 samples) reduce bias, and advanced methods like thresholding the magnitude spectrum below a noise floor (e.g., -14 dB) can yield exact estimates for structured signals.[15]
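The edge cases above can be handled together in one routine. The following is a rough sketch assuming NumPy; `robust_centroid` and its `floor_db` parameter are hypothetical names, and the threshold is interpreted here as a level relative to the frame's peak magnitude, which is an assumption rather than the method of the cited work.

```python
import numpy as np

def robust_centroid(frame, fs, floor_db=None):
    """Centroid in Hz with mean removal, optional magnitude floor, NaN for silence."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()  # remove DC offset before windowing
    mags = np.abs(np.fft.rfft(frame * np.hamming(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    if floor_db is not None and mags.max() > 0:
        # Zero every bin lying more than |floor_db| dB below the peak bin
        # (assumed interpretation of the noise-floor threshold).
        cutoff = mags.max() * 10.0 ** (floor_db / 20.0)
        mags = np.where(mags >= cutoff, mags, 0.0)
    total = mags.sum()
    if total == 0:
        return float("nan")  # silent frame: centroid undefined
    return float(np.dot(freqs, mags) / total)

fs = 8000
n = np.arange(1024)
offset_tone = np.sin(2 * np.pi * 1000.0 * n / fs) + 0.5  # 1 kHz tone with DC offset

print(robust_centroid(np.zeros(1024), fs))              # nan for a silent frame
print(robust_centroid(offset_tone, fs, floor_db=-14.0))  # close to 1000 Hz
```

Subtracting the mean is the simplest of the two DC remedies mentioned; a high-pass filter would be the alternative when the offset drifts over time.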
The dominant computational cost arises from the FFT, yielding $O(N \log N)$ complexity per frame, suitable for offline processing but challenging for real-time applications at high sampling rates. Optimizations, such as recursive updates to the spectrogram or Toeplitz matrix formulations, keep the same asymptotic order while reducing constant factors for streaming audio, enabling real-time centroid tracking without full recomputation.[16]