Dynamic time warping

In time series analysis, dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences, which may vary in speed. For instance, similarities in walking could be detected using DTW, even if one person was walking faster than the other, or if there were accelerations and decelerations during the course of an observation. DTW has been applied to temporal sequences of video, audio, and graphics data — indeed, any data that can be turned into a one-dimensional sequence can be analyzed with DTW. A well-known application has been automatic speech recognition, to cope with different speaking speeds. Other applications include speaker recognition and online signature recognition. It can also be used in partial shape matching applications.
In general, DTW is a method that calculates an optimal match between two given sequences (e.g. time series) under certain restrictions and rules:
- Every index from the first sequence must be matched with one or more indices from the other sequence, and vice versa
- The first index from the first sequence must be matched with the first index from the other sequence (but it does not have to be its only match)
- The last index from the first sequence must be matched with the last index from the other sequence (but it does not have to be its only match)
- The mapping of the indices from the first sequence to indices from the other sequence must be monotonically increasing, and vice versa, i.e. if j > i are indices from the first sequence, then there must not be two indices l > k in the other sequence such that index i is matched with index l and index j is matched with index k, and vice versa
We can plot each match between the sequences x_1, …, x_n and y_1, …, y_m as a path in an n × m matrix from (1, 1) to (n, m), such that each step moves by (1, 0), (0, 1), or (1, 1). In this formulation, we see that the number of possible matches is a Delannoy number (the paths from (1, 1) to (n, m) with these steps are counted by D(n − 1, m − 1)).
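This count can be checked with the standard three-term recurrence for Delannoy numbers. The following is a short illustrative sketch (the `delannoy` helper is not part of the article's algorithm):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def delannoy(a, b):
    """Number of monotone lattice paths from (0, 0) to (a, b)
    using steps (1, 0), (0, 1), and (1, 1)."""
    if a == 0 or b == 0:
        return 1
    return delannoy(a - 1, b) + delannoy(a, b - 1) + delannoy(a - 1, b - 1)

# Central Delannoy numbers: 1, 3, 13, 63, 321, ...
print([delannoy(k, k) for k in range(5)])  # [1, 3, 13, 63, 321]
```

For two sequences of lengths n and m, the number of possible matches is delannoy(n − 1, m − 1), since the path is anchored at (1, 1) rather than (0, 0).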
The optimal match is the match that satisfies all the restrictions and the rules and that has the minimal cost, where the cost is computed as the sum of absolute differences, for each matched pair of indices, between their values.
The sequences are "warped" non-linearly in the time dimension to determine a measure of their similarity independent of certain non-linear variations in the time dimension. This sequence alignment method is often used in time series classification. Although DTW measures a distance-like quantity between two given sequences, it does not guarantee that the triangle inequality holds.
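The failure of the triangle inequality can be seen with a small counterexample (an illustrative sketch; the `dtw` helper below is the classic dynamic-programming distance with absolute-difference local costs):

```python
def dtw(s, t):
    """Classic O(nm) DTW distance with absolute-difference local cost."""
    n, m = len(s), len(t)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

a, b, c = [1], [1, 2], [2, 2, 2]
# dtw(a, c) = 3 exceeds dtw(a, b) + dtw(b, c) = 1 + 1,
# so DTW violates the triangle inequality.
print(dtw(a, b), dtw(b, c), dtw(a, c))  # 1 1 3
```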
In addition to a similarity measure between the two sequences, a so-called "warping path" is produced; by warping according to this path the two signals may be aligned in time. The signal with an original set of points X(original), Y(original) is transformed to X(warped), Y(warped). This finds applications in genetic sequence alignment and audio synchronisation. In a related technique, sequences of varying speed may be averaged; see the average sequence section.
This is conceptually very similar to the Needleman–Wunsch algorithm.
Implementation
This example illustrates the implementation of the dynamic time warping algorithm when the two sequences s and t are strings of discrete symbols. For two symbols x and y, d(x, y) is a distance between the symbols, e.g., d(x, y) = |x − y|.
int DTWDistance(s: array [1..n], t: array [1..m]) {
    DTW := array [0..n, 0..m]

    for i := 0 to n
        for j := 0 to m
            DTW[i, j] := infinity
    DTW[0, 0] := 0

    for i := 1 to n
        for j := 1 to m
            cost := d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i  , j-1],    // deletion
                                        DTW[i-1, j-1])    // match

    return DTW[n, m]
}
where DTW[i, j] is the distance between s[1:i] and t[1:j] with the best alignment.
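A direct Python translation of the pseudocode above can be sketched as follows (taking d to be the absolute difference, as suggested earlier):

```python
def dtw_distance(s, t):
    """DTW distance between sequences s and t, following the pseudocode above."""
    n, m = len(s), len(t)
    INF = float("inf")
    # DTW[i][j] is the distance between s[1:i] and t[1:j] with the best alignment.
    DTW = [[INF] * (m + 1) for _ in range(n + 1)]
    DTW[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])  # d(s[i], t[j])
            DTW[i][j] = cost + min(DTW[i - 1][j],      # insertion
                                   DTW[i][j - 1],      # deletion
                                   DTW[i - 1][j - 1])  # match
    return DTW[n][m]

print(dtw_distance([1, 3, 9, 2, 1], [2, 0, 0, 8, 7, 2]))  # 9
```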
We sometimes want to add a locality constraint. That is, we require that if s[i] is matched with t[j], then |i − j| is no larger than w, a window parameter.
We can easily modify the above algorithm to add a locality constraint (differences marked). However, the above given modification works only if |n − m| is no larger than w, i.e. the end point is within the window length from the diagonal. In order to make the algorithm work, the window parameter w must be adapted so that w ≥ |n − m| (see the line marked with (*) in the code).
int DTWDistance(s: array [1..n], t: array [1..m], w: int) {
    DTW := array [0..n, 0..m]

    w := max(w, abs(n-m)) // adapt window size (*)

    for i := 0 to n
        for j := 0 to m
            DTW[i, j] := infinity
    DTW[0, 0] := 0
    for i := 1 to n
        for j := max(1, i-w) to min(m, i+w)
            DTW[i, j] := 0

    for i := 1 to n
        for j := max(1, i-w) to min(m, i+w)
            cost := d(s[i], t[j])
            DTW[i, j] := cost + minimum(DTW[i-1, j  ],    // insertion
                                        DTW[i  , j-1],    // deletion
                                        DTW[i-1, j-1])    // match

    return DTW[n, m]
}
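The windowed variant can be sketched in Python as follows (absolute-difference local costs; mirrors the pseudocode, including its zero-initialization pass over the window cells):

```python
def dtw_distance_windowed(s, t, w):
    """DTW with a locality constraint |i - j| <= w (Sakoe-Chiba-style band)."""
    n, m = len(s), len(t)
    w = max(w, abs(n - m))  # adapt window size (*)
    INF = float("inf")
    DTW = [[INF] * (m + 1) for _ in range(n + 1)]
    DTW[0][0] = 0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            DTW[i][j] = 0  # mark window cells, as in the pseudocode
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(s[i - 1] - t[j - 1])
            DTW[i][j] = cost + min(DTW[i - 1][j],
                                   DTW[i][j - 1],
                                   DTW[i - 1][j - 1])
    return DTW[n][m]

# A wide window recovers the unconstrained distance; a narrow one
# forbids the optimal path and increases the cost.
print(dtw_distance_windowed([1, 3, 9, 2, 1], [2, 0, 0, 8, 7, 2], 10))  # 9
print(dtw_distance_windowed([1, 3, 9, 2, 1], [2, 0, 0, 8, 7, 2], 1))   # 12
```

Cells outside the band keep the value infinity, so no warping path can leave the window.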
Warping properties
The DTW algorithm produces a discrete matching between existing elements of one series and another. In other words, it does not allow time-scaling of segments within the sequence. Other methods allow continuous warping. For example, Correlation Optimized Warping (COW) divides the sequence into uniform segments that are scaled in time using linear interpolation, to produce the best matching warping. Scaling segments up or down can create new elements, and thus produces a more sensitive warping than DTW's discrete matching of raw elements.
Complexity
The time complexity of the DTW algorithm is O(nm), where n and m are the lengths of the two input sequences. The 50-year-old quadratic time bound was broken in 2016: an algorithm due to Gold and Sharir enables computing DTW in O(n² log log log n / log log n) time for two input sequences of length n.[2] This algorithm can also be adapted to sequences of different lengths. Despite this improvement, it was shown that a strongly subquadratic running time of the form O(n^(2−ε)) for some ε > 0 cannot exist unless the Strong exponential time hypothesis fails.[3][4]
While the dynamic programming algorithm for DTW requires O(nm) space in a naive implementation, the space consumption can be reduced to O(n + m) using Hirschberg's algorithm.
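When only the distance (not the alignment itself) is needed, the space saving is simpler than Hirschberg's algorithm: the recurrence reads only the current and previous rows, so two rows suffice. A sketch of this illustrative two-row variant:

```python
def dtw_distance_two_rows(s, t):
    """DTW distance in O(len(t)) extra space: keep only two rows of the table."""
    n, m = len(s), len(t)
    INF = float("inf")
    prev = [INF] * (m + 1)  # row i-1 of the accumulated-cost table
    prev[0] = 0
    for i in range(1, n + 1):
        curr = [INF] * (m + 1)  # row i, rebuilt on each iteration
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            curr[j] = cost + min(prev[j], curr[j - 1], prev[j - 1])
        prev = curr
    return prev[m]

print(dtw_distance_two_rows([1, 3, 9, 2, 1], [2, 0, 0, 8, 7, 2]))  # 9
```

Recovering the optimal warping path in linear space is what requires the divide-and-conquer strategy of Hirschberg's algorithm.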
Fast computation
Fast techniques for computing DTW include PrunedDTW,[5] SparseDTW,[6] FastDTW,[7] and MultiscaleDTW.[8][9]
A common task, retrieval of similar time series, can be accelerated by using lower bounds such as LB_Keogh,[10] LB_Improved,[11] or LB_Petitjean.[12] However, the Early Abandon and Pruned DTW algorithm reduces the degree of acceleration that lower bounding provides and sometimes renders it ineffective.
In a survey, Wang et al. reported slightly better results with the LB_Improved lower bound than the LB_Keogh bound, and found that other techniques were inefficient.[13] Subsequent to this survey, the LB_Enhanced bound was developed that is always tighter than LB_Keogh while also being more efficient to compute.[14] LB_Petitjean is the tightest known lower bound that can be computed in linear time.[12]
Average sequence
Averaging for dynamic time warping is the problem of finding an average sequence for a set of sequences. NLAAF[15] is an exact method to average two sequences using DTW. For more than two sequences, the problem is related to that of multiple alignment and requires heuristics. DBA[16] is currently a reference method to average a set of sequences consistently with DTW. COMASA[17] efficiently randomizes the search for the average sequence, using DBA as a local optimization process.
Supervised learning
A nearest-neighbour classifier can achieve state-of-the-art performance when using dynamic time warping as a distance measure.[18]
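A minimal sketch of such a classifier, on made-up toy series (the `dtw` helper is the classic O(nm) dynamic-programming distance with absolute-difference costs; labels and data are purely illustrative):

```python
def dtw(s, t):
    """Classic DTW distance with absolute-difference local cost."""
    n, m = len(s), len(t)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def nn_classify(query, train):
    """1-NN classification: label of the training series nearest in DTW distance."""
    return min(train, key=lambda ex: dtw(query, ex[0]))[1]

train = [([0, 0, 1, 2, 1, 0], "bump"), ([0, 2, 0, 2, 0, 2], "zigzag")]
# The query is a time-stretched bump, so DTW matches it to "bump"
# even though its length differs from both training series.
print(nn_classify([0, 1, 2, 2, 1, 0, 0], train))  # bump
```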
Amerced Dynamic Time Warping
Amerced Dynamic Time Warping (ADTW) is a variant of DTW designed to better control DTW's permissiveness in the alignments that it allows.[19] The windows that classical DTW uses to constrain alignments introduce a step function. Any warping of the path is allowed within the window and none beyond it. In contrast, ADTW employs an additive penalty that is incurred each time that the path is warped. Any amount of warping is allowed, but each warping action incurs a direct penalty. ADTW significantly outperforms DTW with windowing when applied as a nearest neighbor classifier on a set of benchmark time series classification tasks.[19]
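Based on the description above, the recurrence can be sketched as follows. This is an illustrative sketch only, with an assumed penalty parameter `omega` added to the non-diagonal (warping) steps; consult the ADTW paper[19] for the exact formulation:

```python
def adtw_distance(s, t, omega):
    """Sketch of amerced DTW: each non-diagonal (warping) step incurs
    an additive penalty omega on top of the local cost."""
    n, m = len(s), len(t)
    INF = float("inf")
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(s[i - 1] - t[j - 1])
            D[i][j] = min(cost + D[i - 1][j - 1],          # diagonal: no penalty
                          cost + omega + D[i - 1][j],      # warping step: penalized
                          cost + omega + D[i][j - 1])      # warping step: penalized
    return D[n][m]

# With omega = 0 this reduces to unconstrained DTW;
# a large omega pushes the alignment toward the diagonal.
```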
Alternative approaches
In functional data analysis, time series are regarded as discretizations of smooth (differentiable) functions of time. By viewing the observed samples as smooth functions, one can utilize continuous mathematics for analyzing data.[20] Smoothness and monotonicity of time warp functions may be obtained for instance by integrating a time-varying radial basis function, thus being a one-dimensional diffeomorphism.[21] Optimal nonlinear time warping functions are computed by minimizing a measure of distance of the set of functions to their warped average. Roughness penalty terms for the warping functions may be added, e.g., by constraining the size of their curvature. The resultant warping functions are smooth, which facilitates further processing. This approach has been successfully applied to analyze patterns and variability of speech movements.[22][23]
Another related approach is hidden Markov models (HMMs): it has been shown that the Viterbi algorithm used to search for the most likely path through the HMM is equivalent to stochastic DTW.[24][25][26]
DTW and related warping methods are typically used as pre- or post-processing steps in data analyses. If the observed sequences contain random variation in both their values and their timing (random temporal misalignment), the warping may overfit to noise, leading to biased results. A simultaneous model formulation with random variation in both values (vertical) and time-parametrization (horizontal) is an example of a nonlinear mixed-effects model.[27] In human movement analysis, simultaneous nonlinear mixed-effects modeling has been shown to produce superior results compared to DTW.[28]
Open-source software
- The tempo C++ library with Python bindings implements Early Abandoned and Pruned DTW as well as Early Abandoned and Pruned ADTW and DTW lower bounds LB_Keogh, LB_Enhanced and LB_Webb.
- The UltraFastMPSearch Java library implements the UltraFastWWSearch algorithm[29] for fast warping window tuning.
- The lbimproved C++ library implements Fast Nearest-Neighbor Retrieval algorithms under the GNU General Public License (GPL). It also provides a C++ implementation of dynamic time warping, as well as various lower bounds.
- The FastDTW library is a Java implementation of DTW and a FastDTW implementation that provides optimal or near-optimal alignments with an O(N) time and memory complexity, in contrast to the O(N²) requirement for the standard DTW algorithm. FastDTW uses a multilevel approach that recursively projects a solution from a coarser resolution and refines the projected solution.
- FastDTW fork (Java) published to Maven Central.
- time-series-classification (Java) a package for time series classification using DTW in Weka.
- The DTW suite provides Python (dtw-python) and R packages (dtw) with a comprehensive coverage of the DTW algorithm family members, including a variety of recursion rules (also called step patterns), constraints, and substring matching.
- The mlpy Python library implements DTW.
- The pydtw Python library implements the Manhattan and Euclidean flavoured DTW measures including the LB_Keogh lower bounds.
- The cudadtw C++/CUDA library implements subsequence alignment of Euclidean-flavoured DTW and z-normalized Euclidean distance similar to the popular UCR-Suite on CUDA-enabled accelerators.
- The JavaML machine learning library implements DTW.
- The ndtw C# library implements DTW with various options.
- Sketch-a-Char uses Greedy DTW (implemented in JavaScript) as part of LaTeX symbol classifier program.
- The MatchBox implements DTW to match mel-frequency cepstral coefficients of audio signals.
- Sequence averaging: a GPL Java implementation of DBA.[16]
- The Gesture Recognition Toolkit (GRT) C++ real-time gesture-recognition toolkit implements DTW.
- The PyHubs software package implements DTW and nearest-neighbour classifiers, as well as their extensions (hubness-aware classifiers).
- The simpledtw Python library implements the classic O(NM) dynamic programming algorithm and is based on NumPy. It supports values of any dimension, as well as custom norm functions for the distances. It is licensed under the MIT license.
- The tslearn Python library implements DTW in the time-series context.
- The cuTWED CUDA Python library implements an improved Time Warp Edit Distance algorithm using only linear memory, with significant speedups.
- DynamicAxisWarping.jl is a Julia implementation of DTW and related algorithms such as FastDTW, SoftDTW, GeneralDTW and DTW barycenters.
- The Multi_DTW implements DTW to match two 1-D arrays or 2-D speech files (2-D array).
- The dtwParallel (Python) package incorporates the main functionalities available in current DTW libraries and novel functionalities such as parallelization, computation of similarity (kernel-based) values, and consideration of data with different types of features (categorical, real-valued, etc.).[30]
Applications
Spoken-word recognition
Due to different speaking rates, a non-linear fluctuation occurs in the speech pattern versus the time axis, which needs to be eliminated.[31] DP matching is a pattern-matching algorithm based on dynamic programming (DP), which uses a time-normalization effect, where the fluctuations in the time axis are modeled using a non-linear time-warping function. Considering any two speech patterns, we can get rid of their timing differences by warping the time axis of one so that the maximal coincidence is attained with the other. Moreover, if the warping function is allowed to take any possible value, little distinction can be made between words belonging to different categories. So, to enhance the distinction between words belonging to different categories, restrictions were imposed on the warping function slope.
Correlation power analysis
Unstable clocks are used to defeat naive power analysis. Several techniques are used to counter this defense, one of which is dynamic time warping.
Finance and econometrics
Dynamic time warping is used in finance and econometrics to assess the quality of the prediction versus real-world data.[32][33][34]
References
- ^ Olsen, NL; Markussen, B; Raket, LL (2018), "Simultaneous inference for misaligned multivariate functional data", Journal of the Royal Statistical Society, Series C, 67 (5): 1147–76, arXiv:1606.03295, doi:10.1111/rssc.12276, S2CID 88515233
- ^ Gold, Omer; Sharir, Micha (2018). "Dynamic Time Warping and Geometric Edit Distance: Breaking the Quadratic Barrier". ACM Transactions on Algorithms. 14 (4). doi:10.1145/3230734. S2CID 52070903.
- ^ Bringmann, Karl; Künnemann, Marvin (2015). "Quadratic Conditional Lower Bounds for String Problems and Dynamic Time Warping". 2015 IEEE 56th Annual Symposium on Foundations of Computer Science. pp. 79–97. arXiv:1502.01063. doi:10.1109/FOCS.2015.15. ISBN 978-1-4673-8191-8. S2CID 1308171.
- ^ Abboud, Amir; Backurs, Arturs; Williams, Virginia Vassilevska (2015). "Tight Hardness Results for LCS and Other Sequence Similarity Measures". 2015 IEEE 56th Annual Symposium on Foundations of Computer Science. pp. 59–78. doi:10.1109/FOCS.2015.14. ISBN 978-1-4673-8191-8. S2CID 16094517.
- ^ Silva, D. F., Batista, G. E. A. P. A. (2015). Speeding Up All-Pairwise Dynamic Time Warping Matrix Calculation.
- ^ Al-Naymat, G., Chawla, S., Taheri, J. (2012). SparseDTW: A Novel Approach to Speed up Dynamic Time Warping.
- ^ Stan Salvador, Philip Chan, FastDTW: Toward Accurate Dynamic Time Warping in Linear Time and Space. KDD Workshop on Mining Temporal and Sequential Data, pp. 70–80, 2004.
- ^ Meinard Müller, Henning Mattes, and Frank Kurth (2006). An Efficient Multiscale Approach to Audio Synchronization. Proceedings of the International Conference on Music Information Retrieval (ISMIR), pp. 192–197.
- ^ Thomas Prätzlich, Jonathan Driedger, and Meinard Müller (2016). Memory-Restricted Multiscale Dynamic Time Warping. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 569–573.
- ^ Keogh, E.; Ratanamahatana, C. A. (2005). "Exact indexing of dynamic time warping". Knowledge and Information Systems. 7 (3): 358–386. doi:10.1007/s10115-004-0154-9. S2CID 207056701.
- ^ Lemire, D. (2009). "Faster Retrieval with a Two-Pass Dynamic-Time-Warping Lower Bound". Pattern Recognition. 42 (9): 2169–2180. arXiv:0811.3301. Bibcode:2009PatRe..42.2169L. doi:10.1016/j.patcog.2008.11.030. S2CID 8658213.
- ^ a b Webb, Geoffrey I.; Petitjean, Francois (2021). "Tight lower bounds for Dynamic Time Warping". Pattern Recognition. 115 107895. arXiv:2102.07076. Bibcode:2021PatRe.11507895W. doi:10.1016/j.patcog.2021.107895. S2CID 231925247.
- ^ Wang, Xiaoyue; et al. (2010). "Experimental comparison of representation methods and distance measures for time series data". Data Mining and Knowledge Discovery. 2010: 1–35. arXiv:1012.2789.
- ^ Tan, Chang Wei; Petitjean, Francois; Webb, Geoffrey I. (2019). "Elastic bands across the path: A new framework and method to lower bound DTW". Proceedings of the 2019 SIAM International Conference on Data Mining. pp. 522–530. arXiv:1808.09617. doi:10.1137/1.9781611975673.59. ISBN 978-1-61197-567-3. S2CID 52120426.
- ^ Gupta, L.; Molfese, D. L.; Tammana, R.; Simos, P. G. (1996). "Nonlinear alignment and averaging for estimating the evoked potential". IEEE Transactions on Biomedical Engineering. 43 (4): 348–356. doi:10.1109/10.486255. PMID 8626184. S2CID 28688330.
- ^ a b Petitjean, F. O.; Ketterlin, A.; Gançarski, P. (2011). "A global averaging method for dynamic time warping, with applications to clustering". Pattern Recognition. 44 (3): 678. Bibcode:2011PatRe..44..678P. doi:10.1016/j.patcog.2010.09.013.
- ^ Petitjean, F. O.; Gançarski, P. (2012). "Summarizing a set of time series by averaging: From Steiner sequence to compact multiple alignment". Theoretical Computer Science. 414: 76–91. doi:10.1016/j.tcs.2011.09.029.
- ^ Ding, Hui; Trajcevski, Goce; Scheuermann, Peter; Wang, Xiaoyue; Keogh, Eamonn (2008). "Querying and mining of time series data: experimental comparison of representations and distance measures". Proc. VLDB Endow. 1 (2): 1542–1552. doi:10.14778/1454159.1454226.
- ^ a b Herrmann, Matthieu; Webb, Geoffrey I. (2023). "Amercing: An intuitive and effective constraint for dynamic time warping". Pattern Recognition. 137 109333. Bibcode:2023PatRe.13709333H. doi:10.1016/j.patcog.2023.109333. S2CID 256182457.
- ^ Lucero, J. C.; Munhall, K. G.; Gracco, V. G.; Ramsay, J. O. (1997). "On the Registration of Time and the Patterning of Speech Movements". Journal of Speech, Language, and Hearing Research. 40 (5): 1111–1117. doi:10.1044/jslhr.4005.1111. PMID 9328881.
- ^ Durrleman, S; Pennec, X.; Trouvé, A.; Braga, J.; Gerig, G. & Ayache, N. (2013). "Toward a Comprehensive Framework for the Spatiotemporal Statistical Analysis of Longitudinal Shape Data". International Journal of Computer Vision. 103 (1): 22–59. doi:10.1007/s11263-012-0592-x. PMC 3744347. PMID 23956495.
- ^ Howell, P.; Anderson, A.; Lucero, J. C. (2010). "Speech motor timing and fluency". In Maassen, B.; van Lieshout, P. (eds.). Speech Motor Control: New Developments in Basic and Applied Research. Oxford University Press. pp. 215–225. ISBN 978-0-19-923579-7.
- ^ Koenig, Laura L.; Lucero, Jorge C.; Perlman, Elizabeth (2008). "Speech production variability in fricatives of children and adults: Results of functional data analysis". The Journal of the Acoustical Society of America. 124 (5): 3158–3170. Bibcode:2008ASAJ..124.3158K. doi:10.1121/1.2981639. ISSN 0001-4966. PMC 2677351. PMID 19045800.
- ^ Nakagawa, Seiichi; Nakanishi, Hirobumi (1988-01-01). "Speaker-Independent English Consonant and Japanese Word Recognition by a Stochastic Dynamic Time Warping Method". IETE Journal of Research. 34 (1): 87–95. doi:10.1080/03772063.1988.11436710. ISSN 0377-2063.
- ^ Fang, Chunsheng. "From Dynamic Time Warping (DTW) to Hidden Markov Model (HMM)" (PDF).
- ^ Juang, B. H. (September 1984). "On the hidden Markov model and dynamic time warping for speech recognition — A unified view". AT&T Bell Laboratories Technical Journal. 63 (7): 1213–1243. doi:10.1002/j.1538-7305.1984.tb00034.x. ISSN 0748-612X. S2CID 8461145.
- ^ Raket LL, Sommer S, Markussen B (2014). "A nonlinear mixed-effects model for simultaneous smoothing and registration of functional data". Pattern Recognition Letters. 38: 1–7. Bibcode:2014PaReL..38....1R. doi:10.1016/j.patrec.2013.10.018.
- ^ Raket LL, Grimme B, Schöner G, Igel C, Markussen B (2016). "Separating timing, movement conditions and individual differences in the analysis of human movement". PLOS Computational Biology. 12 (9) e1005092. arXiv:1601.02775. Bibcode:2016PLSCB..12E5092R. doi:10.1371/journal.pcbi.1005092. PMC 5033575. PMID 27657545.
- ^ Tan, Chang Wei; Herrmann, Matthieu; Webb, Geoffrey I. (2021). "Ultra fast warping window optimization for Dynamic Time Warping" (PDF). 2021 IEEE International Conference on Data Mining (ICDM). pp. 589–598. doi:10.1109/ICDM51629.2021.00070. ISBN 978-1-6654-2398-4. S2CID 246291550.
- ^ Escudero-Arnanz, Óscar; Marques, Antonio G; Soguero-Ruiz, Cristina; Mora-Jiménez, Inmaculada; Robles, Gregorio (2023). "dtwParallel: A Python package to efficiently compute dynamic time warping between time series". SoftwareX. 22 (101364). Bibcode:2023SoftX..2201364E. doi:10.1016/J.SOFTX.2023.101364. hdl:10115/24752. Retrieved 2024-12-06.
- ^ Sakoe, Hiroaki; Chiba, Seibi (1978). "Dynamic programming algorithm optimization for spoken word recognition". IEEE Transactions on Acoustics, Speech, and Signal Processing. 26 (1): 43–49. doi:10.1109/tassp.1978.1163055. S2CID 17900407.
- ^ Orlando, Giuseppe; Bufalo, Michele; Stoop, Ruedi (2022-02-01). "Financial markets' deterministic aspects modeled by a low-dimensional equation". Scientific Reports. 12 (1): 1693. Bibcode:2022NatSR..12.1693O. doi:10.1038/s41598-022-05765-z. ISSN 2045-2322. PMC 8807815. PMID 35105929.
- ^ Mastroeni, Loretta; Mazzoccoli, Alessandro; Quaresima, Greta; Vellucci, Pierluigi (2021-02-01). "Decoupling and recoupling in the crude oil price benchmarks: An investigation of similarity patterns". Energy Economics. 94 105036. Bibcode:2021EneEc..9405036M. doi:10.1016/j.eneco.2020.105036. ISSN 0140-9883. S2CID 230536868.
- ^ Orlando, Giuseppe; Bufalo, Michele (2021-12-10). "Modelling bursts and chaos regularization in credit risk with a deterministic nonlinear model". Finance Research Letters. 47 102599. doi:10.1016/j.frl.2021.102599. ISSN 1544-6123.
Further reading
- Pavel Senin, Dynamic Time Warping Algorithm Review
- Vintsyuk, T. K. (1968). "Speech discrimination by dynamic programming". Kibernetika. 4: 81–88.
- Sakoe, H.; Chiba (1978). "Dynamic programming algorithm optimization for spoken word recognition". IEEE Transactions on Acoustics, Speech, and Signal Processing. 26 (1): 43–49. doi:10.1109/tassp.1978.1163055. S2CID 17900407.
- Myers, C. S.; Rabiner, L. R. (1981). "A Comparative Study of Several Dynamic Time-Warping Algorithms for Connected-Word Recognition". Bell System Technical Journal. 60 (7): 1389–1409. doi:10.1002/j.1538-7305.1981.tb00272.x. ISSN 0005-8580. S2CID 12857347.
- Rabiner, Lawrence; Juang, Biing-Hwang (1993). "Chapter 4: Pattern-Comparison Techniques". Fundamentals of speech recognition. Englewood Cliffs, N.J.: PTR Prentice Hall. ISBN 978-0-13-015157-5.
- Müller, Meinard (2007). Dynamic Time Warping. In Information Retrieval for Music and Motion, chapter 4, pages 69-84. Springer. doi:10.1007/978-3-540-74048-3_4. ISBN 978-3-540-74047-6.
- Rakthanmanon, Thanawin (September 2013). "Addressing Big Data Time Series: Mining Trillions of Time Series Subsequences Under Dynamic Time Warping". ACM Transactions on Knowledge Discovery from Data. 7 (3): 10:1–10:31. doi:10.1145/2513092.2500489. PMC 6790126. PMID 31607834.
Fundamentals
Definition and Motivation
Time series data consist of ordered sequences of observations recorded at successive points in time, capturing temporal dependencies and patterns in domains such as finance, biology, and signal processing.[5] Dynamic time warping (DTW) is an algorithm for measuring similarity between two temporal sequences by aligning them through a nonlinear warping of the time axis, which minimizes the overall distance between the sequences while accommodating variations in timing or speed.[6] This approach finds an optimal warping path that maps corresponding elements between the sequences, allowing similar shapes to be matched despite elastic distortions.[7] The motivation for DTW arises from the limitations of rigid similarity metrics like the Euclidean distance, which assumes a one-to-one point correspondence and performs poorly on sequences with temporal shifts, stretching, or compression, even if their underlying patterns are similar.[5] For instance, in speech recognition, spoken words exhibit nonlinear fluctuations in speaking rate, making direct comparisons unreliable; DTW addresses this by time-normalizing the patterns to enhance matching accuracy, as demonstrated in early applications achieving up to 99.8% recognition rates for isolated words.[7] By enabling flexible alignments, DTW provides a more robust measure for tasks involving variable-speed sequences across diverse fields.[6]
History
Dynamic time warping (DTW) originated in the late 1960s as a technique for aligning speech signals in pattern matching tasks. The method was first proposed by Taras K. Vintsyuk in 1968, who introduced a dynamic programming approach for speech discrimination by optimally aligning sequences to account for temporal variations in pronunciation.[8] Building on this, Hiroaki Sakoe and Seibi Chiba advanced the algorithm in 1971 for continuous speech recognition, presenting it at the Seventh International Congress on Acoustics as a way to handle variable speaking rates through nonlinear time normalization.[9] Their seminal 1978 paper further optimized the dynamic programming framework, introducing the Sakoe-Chiba band constraint to limit warping paths and improve computational efficiency for isolated word recognition.[10] In the 1970s, DTW found early applications primarily in isolated word speech recognition systems developed at laboratories such as Bell Labs and NEC, where it enabled robust matching of utterances despite speed differences across speakers.[11] By the 1980s, DTW was integrated with hidden Markov models (HMMs) to enhance continuous speech recognition, as explored in unified frameworks that combined DTW's alignment capabilities with HMM's probabilistic modeling for improved accuracy in large-vocabulary tasks.[12] DTW experienced a revival in the early 2000s for time series data mining, where researchers like Eamonn Keogh addressed common misconceptions about its computational cost and applicability, demonstrating its efficiency and effectiveness for tasks beyond speech, such as motion capture and sensor data analysis in papers from 2002 to 2004.[3] During the 2010s, the technique expanded to diverse domains including genomics for aligning DNA sequences and finance for detecting patterns in stock price movements, reflecting its versatility in handling non-stationary sequential data.[13][14] In the 2020s, focus has shifted toward acceleration methods for big data applications, with GPU-optimized implementations enabling scalable DTW computations for high-volume time series in fields like selective nanopore sequencing.[15] As of 2025, DTW has seen further applications in psychological research for modeling symptom dynamics and in neuroimaging for brain network analysis.[16][17]
Algorithm
Basic Implementation
The basic implementation of dynamic time warping (DTW) uses dynamic programming to find the minimum-cost alignment between two sequences of lengths n and m, respectively, under the assumption of unconstrained warping, where the alignment path is required to be monotonic but allows arbitrary stretching or compression along the time axis. This approach computes an accumulated cost matrix that captures the optimal matching up to each pair of positions in the sequences. Given two sequences X = (x_1, …, x_n) and Y = (y_1, …, y_m), the algorithm first constructs a cost matrix where each entry represents a local distance between elements. The local distance c(x_i, y_j) is typically defined as the squared Euclidean distance (x_i − y_j)² or the absolute difference |x_i − y_j|, depending on whether the goal is to emphasize larger deviations or treat them linearly. The accumulated cost matrix γ is then filled using the recurrence relation

γ(i, j) = c(x_i, y_j) + min{γ(i − 1, j), γ(i, j − 1), γ(i − 1, j − 1)}

for 2 ≤ i ≤ n and 2 ≤ j ≤ m. The base cases initialize the boundaries to allow alignments starting from the beginning of either sequence:

γ(1, 1) = c(x_1, y_1)
γ(i, 1) = γ(i − 1, 1) + c(x_i, y_1) for 2 ≤ i ≤ n
γ(1, j) = γ(1, j − 1) + c(x_1, y_j) for 2 ≤ j ≤ m

These ensure that the first row and column accumulate costs along the sequence axes, effectively permitting one sequence to "wait" while the other advances. The DTW distance, which quantifies the total alignment cost, is given by the bottom-right entry γ(n, m). The algorithm can be implemented in O(nm) time and space using a two-dimensional array for γ. The following pseudocode outlines the core computation, assuming 1-based indexing for clarity and using the squared Euclidean distance:

function DTW(X, Y):
    n = length(X)
    m = length(Y)
    γ = array of size (n+1) x (m+1), initialized to infinity
    γ[1,1] = (X[1] - Y[1])^2
    for i = 2 to n:
        γ[i,1] = γ[i-1,1] + (X[i] - Y[1])^2
    for j = 2 to m:
        γ[1,j] = γ[1,j-1] + (X[1] - Y[j])^2
    for i = 2 to n:
        for j = 2 to m:
            γ[i,j] = (X[i] - Y[j])^2 + min(γ[i-1,j], γ[i,j-1], γ[i-1,j-1])
    return γ[n,m]
For example, for X = (1, 3, 2) and Y = (1, 2, 3) with the absolute difference as local cost, the local cost matrix is:

|      | Y1=1 | Y2=2 | Y3=3 |
|---|---|---|---|
| X1=1 | 0 | 1 | 2 |
| X2=3 | 2 | 1 | 0 |
| X3=2 | 1 | 0 | 1 |
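Accumulating these local costs with the recurrence above yields the DTW distance for this pair (a short sketch, using absolute-difference costs to match the table):

```python
X, Y = [1, 3, 2], [1, 2, 3]
n, m = len(X), len(Y)
INF = float("inf")
g = [[INF] * (m + 1) for _ in range(n + 1)]  # accumulated cost matrix γ
g[1][1] = abs(X[0] - Y[0])
for i in range(2, n + 1):                    # first column: Y "waits"
    g[i][1] = g[i - 1][1] + abs(X[i - 1] - Y[0])
for j in range(2, m + 1):                    # first row: X "waits"
    g[1][j] = g[1][j - 1] + abs(X[0] - Y[j - 1])
for i in range(2, n + 1):
    for j in range(2, m + 1):
        g[i][j] = abs(X[i - 1] - Y[j - 1]) + min(g[i - 1][j],
                                                 g[i][j - 1],
                                                 g[i - 1][j - 1])
print(g[n][m])  # 2
```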
Warping Path Computation
Once the dynamic programming (DP) table for the accumulated costs has been constructed, the optimal warping path is retrieved through a backtracking procedure. This process begins at the bottom-right cell of the table, corresponding to the indices (n, m) where n and m are the lengths of the two sequences, and traces backward to the top-left cell at (1, 1). At each step, the algorithm selects the predecessor cell that offers the minimum accumulated cost, moving either horizontally, vertically, or diagonally, thereby forming a sequence of index pairs (i_k, j_k) that define the alignment between the sequences.[2] The warping path exhibits specific properties that ensure a valid alignment. It is monotonic, meaning the indices i_k and j_k are non-decreasing along the path (i_k ≥ i_{k−1} and j_k ≥ j_{k−1}), preventing reversals in the time progression. The path is also continuous, with adjacent steps limited to increments of at most one in each index (i_k − i_{k−1} ≤ 1 and j_k − j_{k−1} ≤ 1), which avoids skipping elements in either sequence. Additionally, boundary conditions enforce that the path starts at (1, 1) and ends at (n, m), guaranteeing full coverage of both sequences.[2] These properties allow the path to accommodate expansions and contractions in the sequences, such as aligning multiple points from one sequence to a single point in the other. For instance, consider two short sequences: X = [1, 3, 9, 2, 1] and Y = [2, 0, 0, 8, 7, 2]. The optimal warping path, visualized on a 2D grid of the cost matrix, might proceed as follows: starting at (1, 1), it moves to align early elements, then contracts by matching a single X element to multiple Y elements (horizontal steps, e.g. X[3] = 9 aligned with Y[4] = 8 and Y[5] = 7), expands later to align Y's trailing points, and ends at (5, 6). This path can be represented as a list of pairs:

| Step | i_k | j_k |
|---|---|---|
| 1 | 1 | 1 |
| 2 | 1 | 2 |
| 3 | 2 | 3 |
| 4 | 3 | 4 |
| 5 | 3 | 5 |
| 6 | 4 | 6 |
| 7 | 5 | 6 |
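The backtracking procedure can be sketched as follows. This minimal plain-Python version (illustrative, using the absolute difference as local cost) fills the accumulated-cost table and then walks from (n, m) back to (1, 1), reproducing the path tabulated above.

```python
def dtw_path(x, y):
    """Fill the accumulated-cost table, then backtrack from (n, m) to (1, 1)."""
    n, m = len(x), len(y)
    INF = float("inf")
    g = [[INF] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = abs(x[i - 1] - y[j - 1]) + min(
                g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    # Backtrack: at each cell choose the minimum-cost predecessor.
    i, j, path = n, m, [(n, m)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)],
                   key=lambda p: g[p[0]][p[1]])
        path.append((i, j))
    return path[::-1]

print(dtw_path([1, 3, 9, 2, 1], [2, 0, 0, 8, 7, 2]))
# [(1, 1), (1, 2), (2, 3), (3, 4), (3, 5), (4, 6), (5, 6)]
```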
Properties and Analysis
Key Properties
Dynamic time warping (DTW) alignments are defined by a warping path that satisfies several fundamental properties to ensure meaningful temporal correspondences between two sequences. These properties guarantee that the alignment respects the sequential order and local structure of the data while minimizing a cumulative cost measure.[6]

The monotonicity property requires that the indices in the warping path are non-decreasing, meaning the path can only move forward or stay in place along both sequences, preventing backward jumps that would violate temporal order. Formally, for a warping path p = (p_1, ..., p_L) where p_k = (i_k, j_k), monotonicity holds if i_{k-1} \leq i_k and j_{k-1} \leq j_k for all k. This ensures faithful timing preservation, such that if one element precedes another in the original sequence, the corresponding elements maintain this order in the alignment.[6][2]

Continuity enforces that steps in the path are adjacent, allowing only horizontal, vertical, or diagonal moves without jumps over elements in either sequence. This is captured by the conditions i_k - i_{k-1} \leq 1 and j_k - j_{k-1} \leq 1, which prohibit omissions and promote smooth transitions. Together with monotonicity, continuity restricts possible steps to the set {(1, 0), (0, 1), (1, 1)}, ensuring no element is skipped.[6][2]

Boundary conditions mandate that the path begins at the first elements of both sequences and ends at the last, i.e., p_1 = (1, 1) and p_L = (n, m) for sequences of lengths n and m. This anchors the alignment to the full extent of the input data.[6][2]

The locality property in basic DTW arises from the continuity constraint, which bounds individual steps, though global warping can extend far without additional limits; extensions like the Sakoe-Chiba band impose a warping window to further restrict deviations, such as |i_k - j_k| \leq r, enhancing computational efficiency and alignment quality.[2]

The optimality property ensures the warping path minimizes the total alignment cost, defined as c(p) = \sum_{k=1}^{L} d(x_{i_k}, y_{j_k}), where d is a local distance such as the absolute or squared difference between matched values.
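The boundary, monotonicity, and continuity conditions can be verified mechanically for any candidate path; a small sketch (the function name is illustrative):

```python
def is_valid_warping_path(path, n, m):
    """Check boundary, monotonicity, and continuity for a path of (i, j) pairs."""
    if path[0] != (1, 1) or path[-1] != (n, m):
        return False  # boundary condition violated
    for (i0, j0), (i1, j1) in zip(path, path[1:]):
        di, dj = i1 - i0, j1 - j0
        if di < 0 or dj < 0:
            return False  # monotonicity violated
        if (di, dj) not in {(1, 0), (0, 1), (1, 1)}:
            return False  # continuity: only unit steps allowed
    return True

print(is_valid_warping_path(
    [(1, 1), (1, 2), (2, 3), (3, 4), (3, 5), (4, 6), (5, 6)], 5, 6))  # True
```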
Dynamic programming achieves this global optimum by recursively computing minimum costs for subproblems, leveraging the additive cost structure and the monotonic, continuous nature of paths: the optimal path to any cell is the minimum over admissible predecessors, ensuring no better path exists due to the exhaustive exploration of all valid alignments.[6][2] Basic DTW exhibits invariance to temporal shifts and speed variations, as the alignment adapts to differences in sequence length and pacing without assuming uniform sampling rates. In some variants, alignments are piecewise linear, approximating nonlinear warps with linear segments between matched points.[6]
Computational Complexity
The standard dynamic time warping (DTW) algorithm exhibits a time complexity of O(nm), where n and m are the lengths of the two input sequences, arising from the need to fill an n × m dynamic programming table by evaluating each cell in constant time.[20] This quadratic scaling stems from the exhaustive search over all possible alignments in the warping path, making DTW computationally intensive for long sequences.[3] In terms of space complexity, the naive implementation requires O(nm) storage to hold the entire cost matrix, which can become prohibitive for large datasets; for instance, sequences of length 177,000 could demand terabytes of memory.[20] However, this can be optimized to O(min(n, m)) by computing the table row-by-row (or column-by-column) and retaining only the previous and current rows, as each subsequent row depends solely on the prior one, though this approach prevents full path reconstruction without additional bookkeeping.[20] The impact of sequence length is significant: for equal-length sequences with n = m = 1,000, DTW requires approximately 1 million operations, which remains feasible on modern hardware but highlights the poor scalability for big data applications where quadratic growth quickly overwhelms resources.[3] This limitation motivates trade-offs between exact DTW, which guarantees optimality at O(nm) cost, and approximate methods that reduce complexity for practical large-scale use while introducing bounded error.[20]
Variants and Extensions
Fast and Approximate DTW
Dynamic time warping (DTW) computations can become prohibitive for long time series due to their quadratic time and space complexity of O(nm), where n and m are the lengths of the input sequences. To address this, fast and approximate DTW techniques employ pruning, approximations, and hardware acceleration to achieve significant speedups while maintaining acceptable accuracy for practical applications. Pruning methods leverage lower bounds to enable early abandoning of unpromising computations during the dynamic programming process. A seminal approach is the LB_Keogh lower bound, which constructs an envelope around one time series and computes the distance to this envelope as a quick lower bound on the true DTW distance, allowing branches in the search to be pruned if their lower bound exceeds the current best distance. This technique, introduced for exact indexing of DTW, facilitates faster similarity searches by skipping irrelevant path regions without missing the optimal alignment. Multiscale approximations reduce computational load by performing DTW at multiple resolutions, starting coarse and refining progressively. The FastDTW algorithm exemplifies this with a recursive multi-resolution strategy: it first computes an approximate warping path on downsampled sequences, then refines it on higher-resolution versions, achieving linear O(n) time and space complexity.[21] FastDTW demonstrates high accuracy, often within 1% of exact DTW, and was originally reported to provide substantial speedups over the full algorithm; however, recent optimizations to exact DTW, such as advanced lower bounding, can make the exact algorithm faster than FastDTW in realistic applications.[22] Window-based constraints limit the warping path to a local band around the diagonal of the cost matrix, drastically reducing the number of cells evaluated.
The Sakoe-Chiba band imposes a fixed-width constraint, typically parameterized by a radius r, confining alignments to |i - j| ≤ r, which lowers complexity to O(n · r) for equal-length series and is particularly effective for sequences with limited temporal distortions. Similarly, the Itakura parallelogram enforces a slope constraint with a maximum slope s, forming a parallelogram-shaped region that prevents excessive stretching or compression, suitable for applications like speech where distortions follow predictable patterns. Sparse DTW variants further optimize by using heuristics to focus computations on promising path regions, avoiding a full matrix fill. The SparseDTW method dynamically identifies and exploits local similarities between series to sparsely populate the cost matrix, ensuring the exact optimal path is found while using O(k · n) space, where k is the sparsity factor determined by sequence similarity.[23] This approach is especially beneficial for highly similar time series, yielding speedups proportional to the degree of alignment. Recent advances incorporate GPU acceleration for parallelizing DTW computations, enabling real-time processing of large-scale data. For instance, GPU implementations like those optimized for nanopore sequencing parallelize the dynamic programming wavefront across thousands of threads, achieving up to 100x speedups over CPU versions for batch alignments while preserving exactness. These methods scale well to modern hardware, supporting applications in streaming time series analysis, with ongoing developments enhancing efficiency further as of 2025. 
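A Sakoe-Chiba band is straightforward to add to the basic recurrence by restricting the inner loop to cells with |i - j| <= r; a minimal sketch for equal-length sequences (illustrative, absolute-difference local cost):

```python
def dtw_band(x, y, r):
    """DTW restricted to the Sakoe-Chiba band |i - j| <= r (equal-length series)."""
    n = len(x)
    INF = float("inf")
    g = [[INF] * (n + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        # Only cells within the band are ever evaluated: O(n * r) work total.
        for j in range(max(1, i - r), min(n, i + r) + 1):
            g[i][j] = abs(x[i - 1] - y[j - 1]) + min(
                g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[n][n]
```

When r is at least n - 1 the band covers the whole matrix and the result equals unconstrained DTW; smaller r trades possible suboptimality for speed.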
Evaluations of these techniques highlight trade-offs between accuracy and speed: FastDTW and window-based methods often deliver 10-100x speedups with error rates under 5% relative to exact DTW on benchmark datasets, while pruning like LB_Keogh enables exact results with 5-50x gains in indexing tasks, depending on data characteristics.[21] Sparse and GPU approaches excel in specific regimes, such as sparse alignments or high-throughput scenarios, but may incur minor approximations or hardware dependencies.
Constrained and Weighted DTW
Constrained variants of dynamic time warping (DTW) introduce restrictions on the allowable warping paths to prevent excessive distortions, improving robustness in applications like speech recognition where alignments must remain plausible. The Sakoe-Chiba band imposes a global constraint by limiting the warping path to a diagonal band of width r around the main diagonal in the cost matrix, such that |i_k - j_k| \leq r for all aligned points (i_k, j_k), where r is a user-defined parameter typically set to 10-20% of the sequence length. This constraint reduces computational complexity from O(nm) to O(n \cdot r) while avoiding implausible alignments in similar-length sequences, as originally proposed for optimizing spoken word recognition. The Itakura parallelogram, in contrast, applies a local constraint that allows more flexibility at the endpoints but restricts the slope of the warping path to between 1/s and a maximum value s, forming a parallelogram-shaped region in the cost matrix suitable for utterance-like signals with varying durations. This approach, developed for the minimum prediction residual principle in speech recognition, better handles cases where one sequence is significantly shorter by permitting asymmetric warping near the boundaries. Amerced DTW (ADTW) addresses outliers and extreme warping by adding a fixed additive penalty \omega to the cost of each non-diagonal step in the warping path, discouraging excessive compressions or expansions without completely forbidding them. Introduced as an intuitive constraint for time series classification, ADTW has demonstrated superior performance over standard DTW with Sakoe-Chiba banding on UCR datasets, with mean rank improvements in nearest-neighbor classifiers.[24] The penalty is tuned based on the data scale, often set to 10-20% of the average local distance, making ADTW particularly effective for signals with irregular perturbations.
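The amercing penalty can be grafted onto the standard recurrence by charging omega on the two non-diagonal transitions; a minimal sketch (illustrative, absolute-difference local cost):

```python
def adtw(x, y, omega):
    """Amerced DTW: each non-diagonal (warping) step incurs an additive penalty omega."""
    n, m = len(x), len(y)
    INF = float("inf")
    g = [[INF] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            g[i][j] = min(g[i - 1][j - 1] + cost,          # diagonal: no penalty
                          g[i - 1][j] + cost + omega,      # vertical: amerced
                          g[i][j - 1] + cost + omega)      # horizontal: amerced
    return g[n][m]
```

With omega = 0 this reduces to ordinary DTW, while a large omega pushes the alignment toward the diagonal, interpolating between DTW and a rigid lock-step comparison.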
Weighted DTW variants modify the local cost function to incorporate domain-specific priorities, enhancing alignment quality for non-stationary data. Time-weighted DTW (TWDTW) assigns exponentially increasing weights w_t to observations at time t, emphasizing recent data in the alignment, which is ideal for phenological time series where later seasonal changes are more informative; the local distance for aligned points (i, j) is multiplied by w_t, with the base of the exponential typically around 1.1-1.5. This method, applied in remote sensing for land-cover mapping, improves classification accuracy by 5-15% over vanilla DTW on MODIS imagery by prioritizing temporal recency. Soft-DTW, a smoothed and differentiable approximation, replaces the min operation in standard DTW with a log-sum-exp softmin over all possible paths, weighted by a temperature parameter \gamma > 0, enabling gradient-based optimization in machine learning pipelines while approximating the original DTW distance closely for small \gamma. Proposed as a loss function for time-series tasks, soft-DTW facilitates end-to-end learning in neural networks, with convergence rates comparable to Euclidean losses but better handling of temporal misalignments. Derivative DTW (DDTW) incorporates velocity information by augmenting the local distance to include both amplitude and first-derivative differences, defined as d(i, j) = |x_i - y_j| + \alpha |x'_i - y'_j|, where \alpha balances the contributions (often 1-3) and primes denote derivatives approximated via finite differences. This extension produces smoother alignments by penalizing abrupt changes, addressing limitations of vanilla DTW on series with varying speeds, such as motion trajectories, and improving classification accuracy on datasets like ECG signals. DDTW maintains the same computational complexity as DTW but enhances invariance to linear trends, making it suitable for preprocessing before averaging warped series.
Multi-dimensional and Supervised DTW
Multi-dimensional dynamic time warping (DTW) extends the univariate algorithm to sequences in d-dimensional feature spaces, such as multivariate time series from sensors, by computing alignments that account for multiple attributes simultaneously. Two primary approaches exist: independent DTW (DTW_I), which applies univariate DTW to each dimension separately and sums the costs, and dependent DTW (DTW_D), which enforces a single warping path across all dimensions using a multivariate distance metric, such as the Euclidean norm in the feature space. Neither method universally outperforms the other, as DTW_I may fail when dimensions are correlated, while DTW_D can be overly rigid; an adaptive strategy, DTW_A, selects the better approach per query instance by comparing their minimum costs against a learned threshold, achieving accuracy at least as good as the superior of the two and often higher across diverse datasets like gestures and signals.[25] A notable application of multi-dimensional DTW is in identifying time-varying lead-lag relationships in multivariate time series, where a two-step method first computes alignments using shapeDTW—a variant incorporating local shape descriptors—and then extracts lag structures from the warping path. This approach, applied to financial and economic data, outperforms alternatives like rolling cross-correlations and standard DTW in robustness to noise and varying lag patterns, with simulations showing superior recovery of true structures under additive noise levels up to 20%. In sensor data contexts, such as motion capture for human activity analysis, multi-dimensional DTW classifies skeletal trajectories by warping multidimensional joint positions or quaternions, achieving up to 90% accuracy with nearest-neighbor classifiers when using geodesic distances, and enabling joint selection to reduce dimensionality without performance loss.
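The independent and dependent strategies can be sketched with a shared DTW core whose local distance is pluggable (function names are illustrative; sequences are lists of d-tuples):

```python
def dtw(x, y, dist):
    """DTW over sequences of points, with a pluggable local distance."""
    n, m = len(x), len(y)
    INF = float("inf")
    g = [[INF] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = dist(x[i - 1], y[j - 1]) + min(
                g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    return g[n][m]

def dtw_independent(X, Y):
    """DTW_I: sum of per-dimension univariate DTW distances (each dimension warps on its own)."""
    d = len(X[0])
    return sum(dtw([p[k] for p in X], [q[k] for q in Y],
                   lambda a, b: abs(a - b)) for k in range(d))

def dtw_dependent(X, Y):
    """DTW_D: a single warping path over d-dimensional points (squared Euclidean local cost)."""
    return dtw(X, Y, lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
```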
For gesture recognition, multi-dimensional DTW synchronizes trajectories across spatial dimensions using a 1-norm distance, outperforming single-dimension variants by 10-15% in noisy conditions when incorporating derivatives.[26][27] Supervised DTW integrates warping into machine learning pipelines for tasks like classification and clustering, where optimal paths are learned from labeled data to enhance discriminative power. ShapeDTW improves alignment in supervised nearest-neighbor classification by matching points based on local neighborhoods (e.g., via shape contexts), yielding over 10% accuracy gains on 18 of 84 UCR time series datasets compared to vanilla DTW. It supports template computation by producing more accurate averages for prototypes in gesture or shape recognition. For kernel-based methods, weighted DTW serves as a kernel in support vector machines (SVMs), penalizing phase differences to classify non-aligned sequences, with multiclass SVM-WDTW achieving state-of-the-art results on UCR archives by leveraging supervised training on annotated time series. Neural network integrations, such as DTWNet, embed learnable DTW layers for end-to-end feature extraction, using stochastic backpropagation on warping paths to handle temporal invariances in classification, reducing complexity while matching or exceeding DTW-based baselines on tasks like activity recognition.[28][29][30] In clustering, DTW Barycenter Averaging (DBA) computes an average template by iteratively aligning sequences to a barycenter via an expectation-maximization-like process: it finds DTW associations between the current average and inputs, then updates positions as weighted means of aligned points, converging to minimize within-cluster sum of squares. DBA reduces clustering inertia by up to 65% over prior methods like shape-level averaging on standard datasets, making it suitable for template-based supervised refinement in multi-dimensional settings. 
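One DBA iteration can be sketched as follows: align every input sequence to the current average with DTW, collect the sequence points matched to each average position, and replace each position by the mean of its matched points. The self-contained plain-Python sketch below (initialization from the first sequence and a fixed iteration count are simplifications) uses the absolute difference as local cost:

```python
def dtw_path(x, y):
    """Accumulated-cost DTW (absolute difference) plus a backtracked warping path."""
    n, m = len(x), len(y)
    INF = float("inf")
    g = [[INF] * (m + 1) for _ in range(n + 1)]
    g[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            g[i][j] = abs(x[i - 1] - y[j - 1]) + min(
                g[i - 1][j], g[i][j - 1], g[i - 1][j - 1])
    i, j, path = n, m, [(n, m)]
    while (i, j) != (1, 1):
        i, j = min([(i - 1, j), (i, j - 1), (i - 1, j - 1)],
                   key=lambda p: g[p[0]][p[1]])
        path.append((i, j))
    return path[::-1]

def dba(sequences, n_iter=10):
    """DTW Barycenter Averaging: iteratively re-estimate an average sequence."""
    average = list(sequences[0])                     # initialize from one input
    for _ in range(n_iter):
        buckets = [[] for _ in average]
        for seq in sequences:
            for i, j in dtw_path(average, seq):
                buckets[i - 1].append(seq[j - 1])    # points aligned to average[i-1]
        average = [sum(b) / len(b) for b in buckets]
    return average
```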
High dimensionality poses challenges, as the curse of dimensionality amplifies sparsity and computational costs in DTW, potentially degrading alignment quality; solutions include preprocessing with Principal Component Analysis (PCA) to project data onto lower-dimensional subspaces while preserving variance, as commonly applied in motion capture to focus on principal joint movements before DTW. Recent advancements as of 2025 include hybrid models combining DTW with transformers for better scalability in large multivariate datasets.
Applications
Speech and Audio Processing
Dynamic time warping (DTW) emerged as a foundational technique in speech recognition during the 1970s, enabling the alignment of variable-length utterances to reference templates by compensating for temporal distortions in spoken words. Early applications focused on isolated digit and word recognition, where DTW facilitated endpoint detection and whole-word matching by computing optimal nonlinear alignments between input signals and stored patterns. At Bell Laboratories, systems in the early 1970s transitioned from linear time normalization to DTW-based approaches, achieving high accuracy for speaker-dependent isolated digits using spectral features like linear predictive coding (LPC) coefficients.[11] The seminal optimization of DTW for spoken word recognition, proposed by Sakoe and Chiba, introduced symmetric formulations and slope constraints to improve discrimination, reducing error rates to as low as 0.3% for Japanese digits in controlled tests. In isolated word recognition, DTW excelled by aligning acoustic features frame-by-frame, allowing systems to handle speaking rate variations without requiring fixed durations; this was particularly effective for small-vocabulary tasks like command recognition in the 1970s and 1980s. For connected word recognition, extensions such as level-building DTW enabled phoneme-level alignment by segmenting utterances into subword units, supporting fluent speech with minimal pauses. 
By the mid-1980s, DTW was increasingly combined with hidden Markov models (HMMs) to model probabilistic transitions, enhancing robustness for continuous speech; this hybrid approach, developed at Bell Labs, integrated DTW's alignment with HMM's statistical framework, reducing errors in connected digit sequences by incorporating mixture Gaussian densities for acoustic variability.[11][31] Beyond speech, DTW found applications in music information retrieval, notably query-by-humming systems where users retrieve songs by singing or humming melodies; a pioneering implementation used DTW to match hummed pitch sequences against MIDI databases, achieving retrieval accuracies over 70% for short queries by tolerating tempo and rhythm deviations. In beat tracking, DTW variants align onset detection outputs to estimated tempo grids, enabling precise extraction of rhythmic structures in audio signals; for instance, dynamic programming akin to DTW has been applied to minimize path costs in tempo-varying music, supporting real-time applications.[32][33] Despite its strengths in temporal alignment, DTW's performance in speech recognition is limited by sensitivity to speaker variability, as templates are typically speaker-dependent and struggle with inter-speaker acoustic differences like pitch or formant shifts. Adaptations such as spectral normalization and multiple-template averaging mitigate this, improving speaker independence, but full robustness often requires integration with HMMs or speaker adaptation techniques to normalize features across users.[11][34]
Time Series in Finance and Other Domains
Dynamic time warping (DTW) has been applied in finance and econometrics to measure similarity between time series such as stock prices and economic indicators, accommodating timing shifts and non-linear alignments that Euclidean distances cannot handle effectively.[14] For instance, DTW enables the analysis of temporal alignment across economic indicators like real GDP, revealing synchronization patterns despite phase differences.[14] In stock market applications, it facilitates pattern recognition in price fluctuations, allowing traders to identify recurring motifs adjusted for varying speeds of market movements.[35] Additionally, DTW supports portfolio similarity assessment by computing elastic distances between asset return series, aiding in diversification strategies that account for asynchronous behaviors.[36] In cryptanalysis, particularly for side-channel attacks since the 2010s, DTW has been used to align power traces in correlation power analysis (CPA), improving key recovery by compensating for timing variations in hardware implementations of cryptographic algorithms like AES. 
This elastic alignment enhances the correlation between hypothesized and measured power consumption, enabling more robust detection of data-dependent leaks in non-synchronized traces.[37] Beyond finance, DTW finds applications in genomics for aligning gene expression time series, where it warps trajectories to synchronize expression profiles across experiments with differing sampling rates or biological variability.[38] In motion capture for human activity recognition, multi-dimensional DTW aligns skeletal joint trajectories, distinguishing actions like walking or gesturing despite speed variations in captured data.[39] For nanopore sequencing, DTW accelerates real-time alignment of raw electrical signals to reference sequences, supporting selective sequencing decisions with GPU-optimized implementations that achieve up to 1.92× speedup in processing.[40] In sensor data analysis, DTW aids anomaly detection by measuring deviations in time-warped distances, such as identifying faults in industrial equipment through outlier scoring in aligned multivariate streams.[41] Recent advances include multi-dimensional DTW for detecting lead-lag relationships in multivariate financial time series, such as between sector indices, where it identifies directional influences with statistical significance in high-frequency data from 2022 studies.[42] Weighted DTW variants address non-stationarity in financial markets by assigning higher penalties to recent observations, improving similarity measures for volatile, irregularly sampled economic indicators.[43] These extensions highlight DTW's benefit in handling irregular sampling inherent in real-world time series, enhancing robustness in domains with sparse or asynchronous data collection.[44]
Alternatives and Implementations
Alternative Similarity Measures
While dynamic time warping (DTW) excels at handling temporal distortions, several alternative similarity measures address its limitations, such as computational expense and potential over-alignment, by imposing stricter assumptions or incorporating noise tolerance. The Euclidean distance computes the point-wise difference between aligned points of two time series X and Y, defined as d(X, Y) = sqrt(\sum_i (x_i - y_i)^2), assuming n = m after padding or truncation. This lock-step approach has linear complexity, enabling rapid comparisons on large datasets. However, it performs poorly on series with speed variations or shifts, as it enforces rigid one-to-one correspondence. Experiments on benchmark datasets show Euclidean distance yielding significantly lower classification accuracy than DTW on warped time series, such as human motion trajectories.[45] Edit distance on real sequences (EDR) adapts string edit distance to continuous values by penalizing substitutions when |x_i - y_j| > \epsilon (a noise threshold), along with deletions and insertions, all at equal cost. Formulated as a dynamic programming minimization of edit costs, EDR suits noisy trajectories like GPS data, where gaps represent measurement errors. Introduced for moving object databases, it balances DTW's flexibility with explicit outlier handling, often outperforming DTW in retrieval precision on perturbed series. Unlike DTW's cumulative squaring, which amplifies mismatches, EDR's thresholded penalties prevent over-warping in sparse data.[46] The longest common subsequence (LCSS) identifies the longest matching subsequence within an \epsilon-tolerance, converting it to a similarity score via LCSS(X, Y) / min(n, m), or 1 - LCSS(X, Y) / min(n, m) for a distance. This elastic measure ignores non-matching segments, making it robust to outliers and length differences in ecological or sensor data.
Developed for multidimensional trajectories, LCSS provides a bounded, probabilistic metric that avoids DTW's sensitivity to global alignment, achieving superior results in noisy settings like animal movement tracking, where it reduces false positives by focusing on shape invariance. Threshold selection impacts performance, but probabilistic variants mitigate this.[46] Lock-step metrics like the Pearson correlation coefficient extend point-wise comparison by normalizing for amplitude and offset, computing r = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / (sqrt(\sum_i (x_i - \bar{x})^2) sqrt(\sum_i (y_i - \bar{y})^2)), yielding a value in [-1, 1] for synchronized series. This assumes no warping, suiting applications like synchronized financial indicators or physiological signals, where it captures linear dependencies faster than elastic measures. As a true metric under normalization, it supports efficient indexing but degrades on desynchronized data, often underperforming DTW in alignment tasks. Learning-based methods, emerging prominently after 2015, use neural networks to derive embeddings where simple distances approximate complex similarities. Sequence autoencoders, for instance, learn latent representations that implicitly align series via reconstruction loss, enabling Euclidean distance in embedding space to mimic warping. Autowarp, trained on unlabeled data, optimizes a differentiable DTW surrogate, improving clustering performance over classical measures on motion datasets while scaling to millions of series via GPU acceleration.[47] Recurrent neural networks (RNNs) or transformers further embed variable-length inputs, handling noise through attention mechanisms, ideal for data-rich domains like genomics where hand-crafted measures falter. These approaches require training data but adapt to domain specifics, addressing DTW's rigidity. Alternatives are preferable for short, aligned series (Euclidean or correlation, due to speed), high-noise scenarios (LCSS or EDR, for outlier tolerance), or abundant unlabeled data (learning-based, for scalability).
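The epsilon-thresholded matching used by LCSS reduces to a small dynamic program; a minimal sketch (names are illustrative):

```python
def lcss_similarity(x, y, eps):
    """Longest common subsequence under an eps-tolerance, normalized to [0, 1]."""
    n, m = len(x), len(y)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if abs(x[i - 1] - y[j - 1]) <= eps:
                L[i][j] = L[i - 1][j - 1] + 1        # points match within tolerance
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m] / min(n, m)

def lcss_distance(x, y, eps):
    """Distance form: 1 - LCSS(X, Y) / min(n, m)."""
    return 1.0 - lcss_similarity(x, y, eps)
```

Note how an outlier is simply skipped rather than aligned: for x = [1, 2, 3, 100, 4] and y = [1, 2, 3, 4] with eps = 0.5, the spike at 100 is ignored and the distance is 0.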
DTW's quadratic cost and over-warping risk, exacerbated in long series, make these options more efficient when temporal invariance is mild or computational budgets are tight.[46][45]
Open-source Software
Several open-source libraries and tools implement dynamic time warping (DTW) and its variants, providing efficient computation for time series alignment across various programming languages. These implementations support core DTW algorithms, often with extensions for constraints, multi-dimensional data, and approximations to handle large-scale datasets. In Python, the dtw-python library offers a comprehensive implementation of DTW, including support for Sakoe-Chiba band constraints and Itakura parallelogram windowing to reduce computational complexity, making it suitable for univariate and multivariate time series. It provides functions for distance computation and optimal path extraction, with optional visualization of warping paths. Another prominent package is tslearn, which integrates DTW into a broader toolkit for time series machine learning; it includes soft-DTW for differentiable optimization in neural networks and supports averaging techniques like DTW Barycenter Averaging (DBA). tslearn also handles multi-dimensional data and approximations such as the FastDTW algorithm to approximate full DTW for longer sequences. For R users, the proxy package provides a basic yet efficient DTW distance metric, integrable with dist() functions for clustering and classification tasks, supporting both univariate and multivariate series. The dtwwave package extends DTW by incorporating wavelet transforms, enabling hybrid approaches for noisy or irregularly sampled data, with built-in functions for alignment and distance calculation. In other languages, Java's Weka machine learning framework includes DTW as a distance measure within its time series classification tools, such as the TimeSeriesTranslationFunction, facilitating supervised learning pipelines with built-in support for multi-dimensional inputs. 
The mlpack C++ library implements DTW for fast nearest-neighbor search in time series datasets, optimized for high-performance computing with bindings to Python and other languages. Specialized tools include the FastDTW implementation in Java, which uses a multi-resolution approximation to achieve linear time complexity for approximate alignments. Extensions to scikit-learn, such as the dtw-python integration or tslearn's compatibility, allow DTW to be used within popular pipelines for tasks like clustering and anomaly detection, with parameters tunable for approximations. Community-driven development is prominent on GitHub, where repositories like those for tslearn and dtw-python host benchmarks against the UCR Time Series Archive, demonstrating scalability and accuracy on standard datasets. These tools collectively enable researchers to apply DTW without implementing the core algorithm from scratch, fostering reproducible time series analysis.
