
Spectral centroid

from Wikipedia

The spectral centroid is a measure used in digital signal processing to characterise a spectrum. It indicates where the center of mass of the spectrum is located. Perceptually, it has a robust connection with the impression of brightness of a sound.[1] It is sometimes called center of spectral mass.[2]

Calculation

It is calculated as the weighted mean of the frequencies present in the signal, determined using a Fourier transform, with their magnitudes as the weights:[3]

$ \text{Centroid} = \frac{\sum_{n=0}^{N-1} f(n) \, x(n)}{\sum_{n=0}^{N-1} x(n)} $

where x(n) represents the weighted frequency value, or magnitude, of bin number n, and f(n) represents the center frequency of that bin.
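As an illustrative sketch (the four-bin spectrum below is hypothetical, not from the article), the weighted mean can be computed directly from bin frequencies and magnitudes:

```python
def centroid(freqs, mags):
    """Weighted mean of frequencies f(n), with magnitudes x(n) as weights."""
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)

# Hypothetical 4-bin spectrum with energy concentrated at low frequencies
freqs = [0.0, 100.0, 200.0, 300.0]  # f(n): bin center frequencies in Hz
mags = [1.0, 4.0, 2.0, 1.0]         # x(n): magnitudes
print(centroid(freqs, mags))        # 137.5
```

The result, 137.5 Hz, sits below the midpoint of the band because most of the magnitude lies in the lower bins.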

Alternative usage

Some people use "spectral centroid" to refer to the median of the spectrum. This is a different statistic, the difference being essentially the same as the difference between the unweighted median and mean statistics. Since both are measures of central tendency, in some situations they will exhibit some similarity of behaviour. But since typical audio spectra are not normally distributed, the two measures will often give strongly different values. Grey and Gordon in 1978 found the mean a better fit than the median.[1]

Applications

Because the spectral centroid is a good predictor of the "brightness" of a sound,[1] it is widely used in digital audio and music processing as an automatic measure of musical timbre.[4]

from Grokipedia
The spectral centroid is a fundamental measure in digital signal processing, particularly for audio signals, that quantifies the "center of mass" of a signal's frequency spectrum, providing insight into its spectral distribution.[1] It is computed as the weighted average of the frequencies in the spectrum, where each frequency bin is weighted by the corresponding magnitude (or power) of the spectral coefficients, typically derived from the short-time Fourier transform (STFT) of the signal.[2] For a discrete spectrum with magnitude values $ |X(k)| $ at frequency bins $ k $, the formula is $ C = \frac{\sum_k k \cdot |X(k)|}{\sum_k |X(k)|} $, often scaled by the sampling rate to yield a value in Hertz.[3] Perceptually, the spectral centroid correlates strongly with the perceived brightness or sharpness of a sound's timbre, with higher values indicating emphasis on higher frequencies and lower values suggesting duller or bass-heavy tones.[4]

In audio analysis and music information retrieval (MIR), the spectral centroid serves as a key low-level feature for tasks such as genre classification, instrument recognition, and timbre modeling, as it captures the overall tonal balance of a sound.[2] For instance, studies have shown that pop and rock music often exhibit higher average spectral centroids than classical music, reflecting brighter instrumentation and production styles.[2] It is routinely extracted frame-by-frame from magnitude spectrograms in libraries like LibROSA and MATLAB's Audio Toolbox, enabling time-varying analysis of audio evolution, such as in onset detection or mood estimation.[5][6]

Beyond music, the spectral centroid finds applications in speech processing for accent detection and emotion recognition, as well as in broader signal processing for radar and sonar analysis, where it helps characterize the dominant frequency content of complex waveforms.[4] Its robustness to noise and computational efficiency make it a staple in real-time systems, though extensions like spectral spread (measuring variance around the centroid) provide complementary information for more nuanced spectral descriptions.[4]

Definition and Background

Definition

The spectral centroid is a fundamental spectral descriptor in digital signal processing that represents the "center of mass" of a signal's frequency spectrum, calculated as the weighted average of the frequencies weighted by their corresponding magnitude (amplitude) values.[7] This measure treats the spectrum as a distribution of mass, where higher magnitudes contribute more to the central tendency, analogous to the physical center of gravity of an object with uneven density.[2] Conceptually, it is expressed as the sum of each frequency multiplied by its magnitude, divided by the total sum of magnitudes across the spectrum.[5] Unlike related spectral measures such as spectral rolloff, which identifies the frequency below which a specified percentage of the total spectral energy is contained, or spectral flux, which quantifies the rate of change in the spectrum between consecutive frames, the spectral centroid specifically emphasizes the overall balance and distribution of energy across frequencies.[8] This focus makes it particularly useful for characterizing the spectral shape without regard to temporal variations or energy thresholds.[4] The concept emerged in the late 20th century within digital signal processing as a tool for spectrum characterization, coinciding with advancements in Fourier analysis techniques that enabled detailed frequency-domain representations of signals.[7] In audio processing, it serves as a key indicator for timbre analysis by capturing the perceptual "brightness" of sounds through their spectral weighting.[9]

Historical Development

The concept of the spectral centroid emerged in the context of spectral analysis using the Fourier transform during the 1970s and 1980s, as researchers began quantifying the distribution of spectral energy to describe auditory perceptions such as timbre brightness. Early applications focused on psychoacoustic experiments to model how modifications to the spectrum altered perceived sound qualities, laying the groundwork for its use as a descriptor beyond basic frequency analysis.[10] A pivotal advancement occurred in 1978 with the work of Grey and Gordon, who formalized the spectral centroid as the center of mass of the spectrum and demonstrated its superior correlation with perceptual similarity judgments for musical timbres compared to other metrics like the median frequency. This validation in psychoacoustics marked its transition from a mathematical construct to a perceptually grounded feature, influencing subsequent timbre research in the 1980s and 1990s. By the early 1990s, it gained traction in music information retrieval (MIR), as evidenced by Freed's 1990 study linking spectral centroid variations to perceived mallet hardness in percussive sounds, enabling automated audio classification and analysis. In the 2010s, practical tools facilitated its widespread adoption; for instance, MATLAB's Audio Toolbox, introduced in 2016, began supporting spectral centroid computations with the spectralCentroid function, aiding researchers in MIR and signal analysis workflows.[6] In the same decade, the concept extended beyond audio to fields like ultrasound imaging, where spectral centroid shifts were used to estimate tissue properties and attenuation in backscattered signals.

Mathematical Formulation

Continuous-Time Calculation

The spectral centroid in the continuous-time domain is defined as the ratio of the first moment of the magnitude spectrum to the zeroth moment, providing a measure of the spectrum's central tendency in frequency. For a continuous-time signal $ x(t) $ with Fourier transform $ X(\omega) $, the spectral centroid $ C $ is given by

$ C = \frac{\int_0^\infty \omega |X(\omega)| \, d\omega}{\int_0^\infty |X(\omega)| \, d\omega}, $

where $ \omega $ denotes angular frequency in radians per second and $ |X(\omega)| $ is the magnitude spectrum. This formulation arises from treating the magnitude spectrum $ |X(\omega)| $ as a continuous density function, analogous to a mass distribution along the frequency axis. The numerator computes the first moment by integrating frequency $ \omega $ weighted by the spectral magnitude, capturing the "balance point" of the distribution, while the denominator normalizes by the total integrated magnitude, equivalent to the total "mass" or spectral energy in the positive frequency domain. This construction ensures $ C $ is invariant to uniform scaling of the spectrum and represents a true mean frequency.[11] The derivation assumes idealized continuous signals with support from 0 to $ \infty $, though real signals are typically bandlimited; negative frequencies are excluded owing to the conjugate symmetry of real-valued signals. The resulting $ C $ has units of angular frequency (rad/s), but it is often expressed in Hertz (Hz) via division by $ 2\pi $, or normalized relative to a reference bandwidth for comparability across signals. As the first moment of the spectrum, the spectral centroid quantifies the mean frequency location and forms the basis for higher-order moments that describe spectral shape, such as variance or asymmetry.[11] Perceptually, it correlates with the brightness of a sound's timbre.[11]
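As a simple worked check of this definition, consider an idealized flat low-pass magnitude spectrum, $ |X(\omega)| = 1 $ for $ 0 \le \omega \le \Omega $ and zero above; its centroid sits at the midpoint of the band:

```latex
C = \frac{\int_0^{\Omega} \omega \, d\omega}{\int_0^{\Omega} d\omega}
  = \frac{\Omega^2 / 2}{\Omega}
  = \frac{\Omega}{2}
```

Tilting the same band toward high frequencies, e.g. $ |X(\omega)| = \omega $ on $ [0, \Omega] $, raises the centroid to $ 2\Omega/3 $, consistent with its role as a brightness indicator.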

Discrete-Time Implementation

In discrete-time systems, the spectral centroid is computed from the discrete Fourier transform (DFT) of finite-length signal frames, approximating the continuous-time integral by a summation over frequency bins. For a frame of length $ N $ sampled at rate $ f_s $, the spectral centroid $ C(n) $ at time index $ n $ is given by

$ C(n) = \frac{\sum_{k=0}^{\lfloor N/2 \rfloor} f_k \cdot |X(n,k)|}{\sum_{k=0}^{\lfloor N/2 \rfloor} |X(n,k)|}, $

where $ f_k = k \cdot f_s / N $ is the center frequency of the $ k $-th DFT bin in Hz (for positive frequencies), and $ |X(n,k)| $ is the magnitude of the DFT coefficient at bin $ k $ for the $ n $-th frame.[12] This formulation treats the magnitude spectrum as a discrete probability distribution over frequencies, yielding the first moment as the centroid.[13] The computation typically proceeds in four steps within a short-time Fourier transform (STFT) framework: (1) apply a window function (e.g., Hamming) to the signal frame to reduce spectral leakage, then compute the DFT using the fast Fourier transform (FFT) algorithm to obtain $ X(n,k) $; (2) extract the magnitude spectrum $ |X(n,k)| $ for positive frequency bins $ k = 0 $ to $ \lfloor N/2 \rfloor $; (3) compute the numerator as the sum of bin frequencies weighted by magnitudes, and the denominator as the sum of magnitudes; (4) divide to obtain $ C(n) $, often excluding the DC bin ($ k = 0 $) to mitigate low-frequency bias.[12][13] An implementation for a single frame is as follows:
import numpy as np

def spectral_centroid(frame, fs):
    """Spectral centroid in Hz of a single real-valued frame."""
    N = len(frame)
    # Step 1: window the frame and take the FFT; rfft returns only the
    # positive-frequency bins k = 0 .. floor(N/2) of a real signal
    windowed = frame * np.hamming(N)
    magnitudes = np.abs(np.fft.rfft(windowed))

    # Step 2: bin center frequencies in Hz
    freqs = np.fft.rfftfreq(N, d=1.0 / fs)
    # Optionally drop the DC bin (k = 0) to avoid low-frequency bias:
    # freqs, magnitudes = freqs[1:], magnitudes[1:]

    # Steps 3-4: weighted sum of frequencies, normalized by total magnitude
    denominator = magnitudes.sum()
    if denominator == 0:
        return 0.0  # silent frame: centroid undefined (or return NaN / skip)
    return float((freqs * magnitudes).sum() / denominator)
This routine assumes a real-valued input frame and symmetric spectrum.[12] Edge cases require careful handling to ensure numerical stability. If the denominator (sum of magnitudes) is zero, indicating a silent or zero-energy frame, the centroid is undefined; implementations often return zero or propagate NaN while skipping such frames in analysis.[5] DC offset in the input signal concentrates energy at the zero-frequency bin, lowering the centroid; this is addressed by pre-filtering the signal with a high-pass filter or subtracting the mean to remove the offset before windowing.[14] Windowing introduces estimation errors due to sidelobe leakage, particularly overestimating at low frequencies and underestimating at high frequencies; larger window sizes (e.g., 1024 samples) reduce bias, and advanced methods like thresholding the magnitude spectrum below a noise floor (e.g., -14 dB) can yield exact estimates for structured signals.[15]

The dominant computational cost arises from the FFT, yielding $ O(N \log N) $ complexity per frame, suitable for offline processing but challenging for real-time applications at high sampling rates. Optimizations, such as recursive updates to the spectrogram or Toeplitz matrix formulations, maintain $ O(N \log N) $ complexity while reducing constants for streaming audio, enabling real-time centroid tracking without full recomputation.[16]
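As a quick numerical sanity check (a sketch assuming NumPy; the sampling rate and tone frequency are arbitrary), the centroid of a windowed pure tone should land near the tone's frequency, since the magnitude spectrum concentrates around that bin:

```python
import numpy as np

fs, N = 8000, 1024
f0 = 1000.0  # test-tone frequency in Hz (an exact bin center: 1000 = 128 * fs/N)
t = np.arange(N) / fs
frame = np.sin(2 * np.pi * f0 * t)

# Window, magnitude spectrum over positive bins, then weighted mean frequency
windowed = frame * np.hamming(N)
magnitudes = np.abs(np.fft.rfft(windowed))
freqs = np.fft.rfftfreq(N, d=1.0 / fs)
c = (freqs * magnitudes).sum() / magnitudes.sum()
print(c)  # close to 1000 Hz; sidelobe leakage shifts it slightly
```

The small deviation from exactly 1000 Hz illustrates the leakage bias discussed above: sidelobes above the tone extend all the way to the Nyquist frequency, while those below are truncated at 0 Hz.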

Properties and Interpretation

Perceptual Significance

The spectral centroid serves as a key perceptual cue in human auditory processing, particularly correlating with the perceived "brightness" or "sharpness" of a sound. Psychoacoustic studies have demonstrated that higher spectral centroid values are associated with brighter timbres, such as the high-frequency energy in cymbals (often exceeding 5 kHz), while lower values correspond to duller or warmer sounds, like those of a bass drum (typically below 1 kHz). This relationship arises because the centroid reflects the spectral balance toward higher frequencies, influencing subjective impressions of timbral acuity.[17] In timbre modeling, the spectral centroid plays a significant role in distinguishing musical instruments, as it captures variations in their characteristic spectral envelopes. For instance, the average spectral centroid for violin tones typically falls in the range of 2-3 kHz, contributing to its relatively bright and projecting quality, whereas clarinet tones exhibit lower averages around 1-2 kHz, resulting in a smoother, less piercing timbre. These differences aid in instrument identification tasks by providing a quantifiable proxy for perceptual distinctions in sound color.[18][19][20] Perceived spectral centroid is modulated by factors beyond the raw acoustic measure, including the harmonic structure and formant characteristics of the sound. Complex interactions, such as the distribution of harmonic amplitudes and resonant formants, can alter the effective perceptual weighting of frequencies, leading to deviations from purely computational estimates. 
Experimental evidence from psychoacoustic research, including multidimensional scaling analyses of synthesized and natural tones from the 1990s, confirms the spectral centroid as one of the primary dimensions underlying timbre dissimilarity judgments, often accounting for a substantial portion of variance in perceptual spaces.[18][21] Despite its prominence, the spectral centroid has limitations in fully describing timbre perception, as it primarily addresses spectral distribution and overlooks temporal attributes like attack time, which form independent perceptual dimensions. Studies using multidimensional scaling have shown that while centroid effectively models brightness-related aspects, integrating it with temporal cues is necessary for comprehensive timbre representation.[18]

Key Characteristics

The spectral centroid is monotonic in its response to shifts in frequency emphasis: as the relative energy allocation increases toward higher frequencies, the centroid value rises proportionally, reflecting a "brighter" spectral balance in a purely mathematical sense.[22] This property arises directly from its formulation as a weighted average, where higher-frequency bins contribute more to the numerator when their amplitudes grow. The measure is invariant to uniform amplitude scaling across the entire spectrum, since both the weighted frequency sum and the normalizing total amplitude scale identically, preserving the ratio.[22] However, it proves highly sensitive to alterations in spectral shape, such as uneven energy redistribution or the introduction of new frequency components, which can cause substantial shifts even without overall amplitude changes.[22] Expressed in units of hertz (Hz), the spectral centroid quantifies a central frequency location, akin to the center of mass in a physical distribution.[22] In discrete-time implementations for signals sampled at rate $ f_s $, its possible values span from near 0 Hz—for spectra dominated by low-frequency content, as in low-pass filtered noise—to approaching the Nyquist limit of $ f_s / 2 $ for high-pass configurations where energy concentrates at the upper band edge.[4] This range underscores its utility in characterizing broadband versus narrowband spectral behaviors, though actual values depend on the signal's bandwidth and sampling parameters. 
As the first moment of the power spectral density treated as a probability distribution, the spectral centroid captures the expected frequency value weighted by energy.[22] It pairs naturally with subsequent moments for a fuller statistical profile: the second central moment, known as spectral spread or variance, measures deviation from the centroid, quantifying the spectrum's width or irregularity.[22] Higher moments like skewness further describe asymmetry, enabling a comprehensive moment-based analysis of spectral form without assuming specific distributional shapes. The spectral centroid shows relative robustness to additive Gaussian noise, retaining discriminative capability in degraded conditions, as utilized in subband-based features for speech processing amid environmental interference.[23] Its estimation, however, remains vulnerable to aliasing from undersampling, where high-frequency content folds into lower bins and skews the weighting toward erroneous locations. Inadequate windowing during short-time spectral analysis exacerbates this through spectral leakage and sidelobe artifacts, which distort amplitude estimates and bias the centroid.[15] Computationally, it benefits from high stability, relying on finite summations that resist overflow or precision loss in floating-point arithmetic for typical audio spectra. Due to its reliance on an untrimmed arithmetic mean, the centroid is particularly sensitive to spectral outliers—such as transient high-amplitude impulses at extreme frequencies—that can disproportionately influence the result despite minimal total energy impact.[22]
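The scaling invariance and shape sensitivity described above can be verified numerically in a few lines (a sketch with hypothetical bin values; any spectrum would do):

```python
def centroid(freqs, mags):
    # Weighted mean of bin frequencies, with magnitudes as the weights
    return sum(f * m for f, m in zip(freqs, mags)) / sum(mags)

freqs = [100.0, 200.0, 300.0, 400.0]
mags = [2.0, 1.0, 1.0, 2.0]

c1 = centroid(freqs, mags)                      # baseline
c2 = centroid(freqs, [10.0 * m for m in mags])  # uniform gain applied
c3 = centroid(freqs, [2.0, 1.0, 1.0, 6.0])      # extra high-frequency energy

print(c1 == c2)  # True: invariant to uniform amplitude scaling
print(c3 > c1)   # True: shifting energy upward raises the centroid
```

The uniform gain multiplies numerator and denominator by the same factor, so the ratio is unchanged, while boosting a single high-frequency bin moves the centroid upward.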

Applications

Audio and Music Processing

In audio and music processing, the spectral centroid serves as a key feature for timbre analysis and instrument recognition, capturing the perceived brightness or "center of mass" of a sound's spectrum to differentiate timbral qualities among instruments. For instance, it is widely used in feature extraction pipelines within libraries like Essentia and Librosa, where it helps classify acoustic instruments by quantifying spectral distribution; in Essentia, the SpectralCentroid algorithm computes this measure to indicate timbral brightness, aiding in tasks such as isolating orchestral sounds based on their harmonic content.[24][5] Research demonstrates its effectiveness in musical instrument timbre classification, where spectral centroid, alongside inharmonicity and partial energy, achieves high recognition accuracy for solo instruments like piano and violin by highlighting differences in spectral spread.[25] The spectral centroid also plays a prominent role in music genre and emotion classification, where higher values often correlate with energetic or bright genres such as rock and disco, contrasting with lower centroids in classical or jazz tracks. 
In analyses using the GTZAN dataset, it emerges as a discriminative feature for genre categorization, reflecting the spectral energy concentration that distinguishes rhythmic, high-frequency dominant styles from smoother, lower-frequency ones; for example, studies report improved classification accuracy when combining spectral centroid with other spectral features like rolloff and flux.[26] For emotion recognition, it contributes to identifying arousal levels, with brighter centroids (e.g., above 2-3 kHz) linked to happy or exciting moods in multimodal systems analyzing both audio and lyrics.[27] In audio effects and synthesis, real-time spectral centroid tracking enables dynamic adjustments to enhance or modify perceived brightness, such as in adaptive equalization or filter automation within digital audio workstations (DAWs). This is applied in effects processing where the centroid controls parameters like high-pass filter cutoffs to simulate timbral shifts, correlating its value with perceptual sharpness for transparent mappings in real-time synthesis.[28] For example, in environments like Ableton Live, spectral analysis tools leverage centroid estimation via short-time Fourier transform (STFT) frames to drive effects chains, allowing producers to automate brightness adjustments during mixing for more vivid sound design.[29] In speech processing, the spectral centroid aids in distinguishing phonetic elements like vowels and consonants by capturing formant-related spectral centers, where vowels exhibit lower centroids due to concentrated energy in lower frequencies, while consonants show higher, more dispersed values indicative of frication or bursts. 
This feature supports tasks such as consonant-vowel ratio modification, using centroid transitions to detect and enhance speech clarity in noisy environments.[30] Studies highlight its role in voicing decisions, where combining centroid with spectral spread improves discrimination of fricatives and stops, enhancing automatic speech recognition systems.[31] A notable case study is its application in automatic music transcription, where spectral centroid features improve note detection and polyphony resolution in polyphonic audio from the 2000s onward. In systems processing orchestral or piano recordings, incorporating centroid alongside harmonicity and bandwidth boosts transcription accuracy by up to 20-30% in benchmark tests on datasets like MAPS, aiding onset detection and pitch estimation by modeling timbral context.[32] Early research in that decade emphasized its utility in feature sets for monophonic and polyphonic transcription, reducing errors in instrument-specific note identification through spectral balance analysis.[33]
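As a toy illustration of centroid-as-brightness (a sketch assuming NumPy; the two tones are arbitrary stand-ins for a dull and a bright source, not a real classifier):

```python
import numpy as np

def frame_centroid(x, fs):
    """Spectral centroid (Hz) of one frame, via magnitude-weighted mean."""
    magnitudes = np.abs(np.fft.rfft(x * np.hamming(len(x))))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return float((freqs * magnitudes).sum() / magnitudes.sum())

fs, N = 16000, 2048
t = np.arange(N) / fs
dull = np.sin(2 * np.pi * 200.0 * t)     # low-frequency "dull" tone
bright = np.sin(2 * np.pi * 4000.0 * t)  # high-frequency "bright" tone

print(frame_centroid(dull, fs) < frame_centroid(bright, fs))  # True
```

A real feature-extraction pipeline would compute this per STFT frame and feed the trajectory, alongside rolloff and flux, into a classifier.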

Other Scientific Domains

In ultrasound imaging, the spectral centroid of backscattered radio-frequency signals serves as a key parameter for quantitative tissue characterization, enabling the estimation of properties such as attenuation and backscatter coefficients without relying on computationally intensive frequency-domain transforms. A time-domain approach proposed in 2012 calculates the spectral centroid directly from the envelope of backscattered signals using circular autocorrelation, achieving estimation errors below 0.2% across various propagation depths and enabling real-time analysis in clinical systems.[34] This method has been applied to differentiate tissue types, such as in elastography, where centroid shifts reflect strain-induced frequency changes, providing robust estimates insensitive to phase decorrelation noise compared to traditional time-delay methods.[34] In power systems engineering, the spectral centroid facilitates harmonic analysis by quantifying the "balance" of distortion in electrical signals, aiding fault detection and grid monitoring through improved accuracy in frequency and magnitude estimation under noisy conditions. By dividing harmonic spectra into sub-bands and computing sub-band centroids, the technique avoids peak-search interpolation errors common in conventional methods, enhancing detection of imbalances caused by non-linear loads like inverters or transformers. Simulations from 2013 demonstrate its efficacy in overcoming fundamental frequency fluctuations, with applications in real-time power quality assessment to prevent equipment damage from harmonic distortions.[35] An analogous two-dimensional spectral centroid, derived from the Fourier transform of images, extends the concept to computer vision for texture analysis, where it captures the spatial distribution of energy in the frequency domain to quantify coarseness, directionality, and periodicity. 
In texture classification tasks, the 2D centroid's position and magnitude help discriminate patterns, such as in road extraction from aerial imagery, by aligning with Gabor filter orientations that model texture primitives. This measure proves particularly useful in rotation-invariant feature extraction, reducing sensitivity to image orientation while preserving discriminative power for applications like material recognition or segmentation.[36] In environmental acoustics, spectral centroid shifts are employed for noise source localization within urban soundscapes, leveraging variations in the "brightness" of acoustic spectra to isolate and map contributors like traffic or machinery amid complex ambient mixtures. By analyzing centroid dynamics in recordings, researchers distinguish anthropogenic from natural sounds, facilitating targeted interventions in noise pollution management; for instance, lower centroids correlate with muffled urban traffic, enabling localization via beamforming arrays. This approach integrates with eco-acoustic indices to evaluate soundscape quality, as seen in campus studies where centroid trends reveal event typologies and support perceptual assessments.[37] Emerging applications in seismic signal processing post-2020 utilize the spectral centroid to center wave energy distributions, aiding in the characterization of rockfall or firn attenuation by identifying the mean frequency of seismic spectra and quantifying energy shifts due to propagation effects. In analyses of fragmental rockfalls, the centroid (also termed frequency centroid) correlates with source size and material properties, providing a non-parametric indicator of wave energy localization without assuming Gaussian spectra.[38] Recent studies on Antarctic firn demonstrate its role in computing quality factors from centroid differences along propagation paths, enhancing models of seismic wave dissipation in ice structures.[39]

Variations and Alternatives

Modified Formulations

One adaptation of the standard spectral centroid incorporates perceptual weighting to better align with human auditory perception, particularly by emphasizing higher frequencies in a manner consistent with psychoacoustic models. This weighted spectral centroid, often used to compute sharpness—a perceptual attribute related to the "high-frequency" content of a sound—is calculated as $ C_w = \frac{\sum_k f_k \cdot |X(k)| \cdot w_k}{\sum_k |X(k)| \cdot w_k} $, where $ f_k $ is the frequency bin, $ |X(k)| $ is the magnitude spectrum, and $ w_k $ is a frequency-dependent weight, such as one derived from the Bark scale to approximate critical bands of hearing.[40][41] The Bark scale weighting shifts the centroid toward perceptually relevant bands, enhancing its utility in timbre analysis where linear frequency scaling may underrepresent high-frequency contributions.[42] For dynamic audio signals, the spectral centroid is extended to a time-varying form by computing it across consecutive frames of the short-time Fourier transform (STFT), typically with window lengths of 20–60 ms to capture temporal evolution. This approach tracks changes in spectral "brightness" over time, revealing patterns such as the brightening during note attacks in musical instruments. To mitigate frame-to-frame variability and noise, smoothing techniques like moving averages or low-pass filtering are applied to the sequence of centroid values, producing a more stable trajectory for analysis.[43] Multidimensional extensions generalize the centroid to two or more dimensions, treating the spectrum or spectrogram as a 2D distribution for applications beyond 1D audio spectra. 
In image processing and 2D spectral analysis, the 2D spectral centroid is computed as the center of mass in the 2D frequency domain, given by $ C_x = \frac{\sum_{u,v} u \cdot |F(u,v)|}{\sum_{u,v} |F(u,v)|} $ and $ C_y = \frac{\sum_{u,v} v \cdot |F(u,v)|}{\sum_{u,v} |F(u,v)|} $, where $ F(u,v) $ is the 2D Fourier transform and $ (u,v) $ are frequency coordinates; this locates dominant spatial frequencies in textures or patterns.[36] For audio spectrograms, similar 2D formulations treat time and frequency axes jointly to estimate overall energy distribution centroids.[44] Hybrid variants integrate the spectral centroid with other spectral measures, such as spectral flux, to enhance detection of signal changes like note onsets in polyphonic music. In these approaches, the centroid's brightness trajectory is combined with flux's sensitivity to spectral differences between frames, often through feature concatenation or weighted fusion in onset detection algorithms, improving robustness to varying instrument timbres.[45][46] The spectral centroid, as a measure of central tendency analogous to the mean of a frequency-weighted distribution, differs from the spectral rolloff, which identifies the frequency threshold containing a specified percentile (typically 85-95%) of the total spectral energy. While both features assess the distribution of high-frequency content and correlate with perceived brightness, the centroid provides a balanced "center of gravity" across the entire spectrum, whereas rolloff emphasizes the upper edge of energy concentration, making it more indicative of bandwidth extent.[8][4] In contrast to spectral flux, which quantifies temporal variations by computing the squared differences between successive normalized magnitude spectra, the spectral centroid evaluates the static balance of spectral energy at a given instant without capturing frame-to-frame changes. 
Spectral flux is thus suited for detecting onsets or segmentation in audio, complementing the centroid's role in timbre description.[8] The spectral spread, often computed as the standard deviation of frequencies around the centroid in the same weighted distribution, measures the dispersion or "instantaneous bandwidth" of the spectrum, indicating how tightly energy is concentrated. This positions the centroid as the mean and spread as the variance of the spectral distribution, allowing joint analysis of location and variability in spectral shape.[8][4] In music information retrieval (MIR) tasks such as genre classification, the spectral centroid is frequently combined with rolloff, flux, and spread in feature vectors to enhance robustness, as these measures capture complementary aspects of timbral texture. For instance, a study achieved classification accuracies of up to 61% across ten genres using timbral features (including centroid, rolloff, and flux), along with rhythmic and pitch features.[26] Although computationally simpler—requiring only a single weighted average—the spectral centroid, as a mean-like statistic, is more sensitive to spectral outliers (e.g., narrow high-amplitude peaks) than median-based alternatives, which use the 50th percentile of the energy distribution for greater robustness in noisy or skewed spectra.[47]
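The two-dimensional extension described earlier in this section can be sketched with NumPy (the constant test image is hypothetical, chosen because all of its energy lies at DC, pinning the centroid to the (0, 0) frequency bin):

```python
import numpy as np

def spectral_centroid_2d(img):
    """Center of mass of the 2D magnitude spectrum, in (row, column) bin units.

    Uses unshifted FFT indices (u, v) = 0 .. M-1, matching
    C_x = sum(u * |F(u,v)|) / sum(|F(u,v)|), and likewise for C_y.
    """
    F = np.abs(np.fft.fft2(img))
    u = np.arange(F.shape[0])[:, None]  # row-frequency index
    v = np.arange(F.shape[1])[None, :]  # column-frequency index
    total = F.sum()
    return float((u * F).sum() / total), float((v * F).sum() / total)

# Hypothetical constant-brightness image: all spectral energy at DC
flat = np.ones((8, 8))
cx, cy = spectral_centroid_2d(flat)
print(cx, cy)  # approximately 0.0 0.0
```

For texture analysis one would typically apply `np.fft.fftshift` first so the centroid is measured relative to the spectrum's center rather than the corner.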
