Recent from talks
Contribute something
Nothing was collected or created yet.
Sparse approximation
View on WikipediaSparse approximation (also known as sparse representation) theory deals with sparse solutions for systems of linear equations. Techniques for finding these solutions and exploiting them in applications have found wide use in image processing, signal processing, machine learning, medical imaging, and more.
Sparse decomposition
[edit]Noiseless observations
[edit]Consider a linear system of equations , where is an underdetermined matrix and . The matrix (typically assumed to be full-rank) is referred to as the dictionary, and is a signal of interest. The core sparse representation problem is defined as the quest for the sparsest possible representation satisfying . Due to the underdetermined nature of , this linear system admits in general infinitely many possible solutions, and among these we seek the one with the fewest non-zeros. Put formally, we solve
where is the pseudo-norm, which counts the number of non-zero components of . This problem is known to be NP-hard with a reduction to NP-complete subset selection problems in combinatorial optimization.
Sparsity of implies that only a few () components in it are non-zero. The underlying motivation for such a sparse decomposition is the desire to provide the simplest possible explanation of as a linear combination of as few as possible columns from , also referred to as atoms. As such, the signal can be viewed as a molecule composed of a few fundamental elements taken from .
While the above posed problem is indeed NP-Hard, its solution can often be found using approximation algorithms. One such option is a convex relaxation of the problem, obtained by using the -norm instead of , where simply sums the absolute values of the entries in . This is known as the basis pursuit (BP) algorithm, which can be handled using any linear programming solver. An alternative approximation method is a greedy technique, such as the matching pursuit (MP), which finds the location of the non-zeros one at a time.
Surprisingly, under mild conditions on (using the spark (mathematics), the mutual coherence or the restricted isometry property) and the level of sparsity in the solution, , the sparse representation problem can be shown to have a unique solution, and BP and MP are guaranteed to find it perfectly.[1][2][3]
Noisy observations
[edit]Often the observed signal is noisy. By relaxing the equality constraint and imposing an -norm on the data-fitting term, the sparse decomposition problem becomes
or put in a Lagrangian form,
where is replacing the .
Just as in the noiseless case, these two problems are NP-Hard in general, but can be approximated using pursuit algorithms. More specifically, changing the to an -norm, we obtain
which is known as the basis pursuit denoising. Similarly, matching pursuit can be used for approximating the solution of the above problems, finding the locations of the non-zeros one at a time until the error threshold is met. Here as well, theoretical guarantees suggest that BP and MP lead to nearly optimal solutions depending on the properties of and the cardinality of the solution . [4] [5] [6] Another interesting theoretical result refers to the case in which is a unitary matrix. Under this assumption, the problems posed above (with either or ) admit closed-form solutions in the form of non-linear shrinkage.[4]
Variations
[edit]There are several variations to the basic sparse approximation problem.
Structured sparsity: In the original version of the problem, any of the atoms in the dictionary can be picked. In the structured (block) sparsity model, instead of picking atoms individually, groups of them are to be picked. These groups can be overlapping and of varying size. The objective is to represent such that it is sparse while forcing this block-structure.[7]
Collaborative (joint) sparse coding: The original version of the problem is defined for a single signal . In the collaborative (joint) sparse coding model, a set of signals is available, each believed to emerge from (nearly) the same set of atoms from . In this case, the pursuit task aims to recover a set of sparse representations that best describe the data while forcing them to share the same (or close-by) support.[8]
Other structures: More broadly, the sparse approximation problem can be cast while forcing a specific desired structure on the pattern of non-zero locations in . Two cases of interest that have been extensively studied are tree-based structure, and more generally, a Boltzmann distributed support.[9]
Algorithms
[edit]As already mentioned above, there are various approximation (also referred to as pursuit) algorithms that have been developed for addressing the sparse representation problem:
We mention below a few of these main methods.
- Matching pursuit is a greedy iterative algorithm for approximately solving the above problem. It works by gradually finding the locations of the non-zeros in one at a time. The core idea is to find in each step the column (atom) in that best correlates with the current residual (initialized to ), and then updating this residual to take the new atom and its coefficient into account. Matching pursuit might pick the same atom multiple times.
- Orthogonal matching pursuit is very similar to matching pursuit, with one major difference: in each of the algorithm's step, all the non-zero coefficients are updated by a least squares. As a consequence, the residual is orthogonal to the already chosen atoms, and thus an atom cannot be picked more than once.
- Stage-wise greedy methods: Improved variations over the above are algorithms that operate greedily while adding two critical features: (i) the ability to add groups of non-zeros at a time (instead of one non-zero per round); and (ii) including a pruning step in each round in which several of the atoms are discarded from the support. Representatives of this approach are the Subspace-Pursuit algorithm and the CoSaMP.[10]
- Basis pursuit solves a convex relaxed version of the problem by replacing the by an -norm. Note that this only defines a new objective, while leaving open the question of the algorithm to use for getting the desired solution. Commonly considered such algorithms are the IRLS, LARS, and iterative soft-shrinkage methods.[11]
- There are several other methods for solving sparse decomposition problems: homotopy method, coordinate descent, iterative hard-thresholding, first order proximal methods, which are related to the above-mentioned iterative soft-shrinkage algorithms, and Dantzig selector.
Applications
[edit]Sparse approximation ideas and algorithms have been extensively used in signal processing, image processing, machine learning, medical imaging, array processing, data mining, and more. In most of these applications, the unknown signal of interest is modeled as a sparse combination of a few atoms from a given dictionary, and this is used as the regularization of the problem. These problems are typically accompanied by a dictionary learning mechanism that aims to fit to best match the model to the given data. The use of sparsity-inspired models has led to state-of-the-art results in a wide set of applications.[12][13][14] Recent work suggests that there is a tight connection between sparse representation modeling and deep-learning.[15]
See also
[edit]References
[edit]- ^ Donoho, D.L. and Elad, M. (2003). "Optimally sparse representation in general (nonorthogonal) dictionaries via L1 minimization" (PDF). Proceedings of the National Academy of Sciences. 100 (5): 2197–2202. Bibcode:2003PNAS..100.2197D. doi:10.1073/pnas.0437847100. PMC 153464. PMID 16576749.
{{cite journal}}: CS1 maint: multiple names: authors list (link) - ^ Tropp, J.A. (2004). "Greed is good: Algorithmic results for sparse approximation" (PDF). IEEE Transactions on Information Theory. 50 (10): 2231–2242. CiteSeerX 10.1.1.321.1443. doi:10.1109/TIT.2004.834793. S2CID 675692.
- ^ Donoho, D.L. (2006). "For most large underdetermined systems of linear equations the minimal l1-norm solution is also the sparsest solution" (PDF). Communications on Pure and Applied Mathematics. 56 (6): 797–829. doi:10.1002/cpa.20132. S2CID 8510060.
- ^ a b Elad, M. (2010). Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer. CiteSeerX 10.1.1.331.8963. doi:10.1007/978-1-4419-7011-4. ISBN 978-1441970107.
- ^ Donoho, D.L., Elad, M. and Templyakov, V. (2006). "Stable recovery of sparse overcomplete representations in the presence of noise" (PDF). IEEE Transactions on Information Theory. 52 (1): 6–18. CiteSeerX 10.1.1.125.5610. doi:10.1109/TIT.2005.860430. S2CID 14813938.
{{cite journal}}: CS1 maint: multiple names: authors list (link) - ^ Tropp, J.A. (2006). "Just relax: Convex programming methods for identifying sparse signals in noise" (PDF). IEEE Transactions on Information Theory. 52 (3): 1030–1051. CiteSeerX 10.1.1.184.2957. doi:10.1109/TIT.2005.864420. S2CID 6496872.
- ^ Eldar, Y.C, Kuppinger, P. and Bolcskei, H. (2009). "Block-sparse signals: Uncertainty relations and efficient recovery". IEEE Transactions on Signal Processing. 58 (6): 3042–3054. arXiv:0906.3173. Bibcode:2010ITSP...58.3042E. doi:10.1109/TSP.2010.2044837. S2CID 335122.
{{cite journal}}: CS1 maint: multiple names: authors list (link) - ^ Tropp, J.A., Gilbert, A.C. and Strauss, M.J. (2006). "Algorithms for simultaneous sparse approximation. Part I: Greedy pursuit". Signal Processing. 86 (3): 572–588. doi:10.1016/j.sigpro.2005.05.030.
{{cite journal}}: CS1 maint: multiple names: authors list (link) - ^ Peleg, T. Eldar, Y.C. and Elad, M. (2012). "Exploiting Statistical Dependencies in Sparse Representations for Signal Recovery". IEEE Transactions on Signal Processing. 60 (5): 2286–2303. arXiv:1010.5734. Bibcode:2012ITSP...60.2286P. doi:10.1109/TSP.2012.2188520. S2CID 3179803.
{{cite journal}}: CS1 maint: multiple names: authors list (link) - ^ Needell, D. and Tropp, J.A. (2009). "CoSaMP: Iterative signal recovery from incomplete and inaccurate samples". Applied and Computational Harmonic Analysis. 26 (3): 301–321. arXiv:0803.2392. doi:10.1016/j.acha.2008.07.002.
{{cite journal}}: CS1 maint: multiple names: authors list (link) - ^ Zibulevsky, M. and Elad, M. (2010). "L1-L2 optimization in signal and image processing" (PDF). IEEE Signal Processing Magazine. 27 (3): 76–88. Bibcode:2010ISPM...27...76Z. doi:10.1109/MSP.2010.936023. S2CID 2783691.
{{cite journal}}: CS1 maint: multiple names: authors list (link) - ^ Baraniuk, R.G. Candes, E. Elad, M. and Ma, Y. (2010). "Applications of sparse representation and compressive sensing". Proceedings of the IEEE. 98 (6): 906–909. doi:10.1109/JPROC.2010.2047424.
{{cite journal}}: CS1 maint: multiple names: authors list (link) - ^ Elad, M. Figueiredo, M.A.T., and Ma, Y. (2010). "On the role of sparse and redundant representations in image processing" (PDF). Proceedings of the IEEE. 98 (6): 972–982. CiteSeerX 10.1.1.160.465. doi:10.1109/JPROC.2009.2037655. S2CID 10992685. Archived from the original (PDF) on 2018-01-17.
{{cite journal}}: CS1 maint: multiple names: authors list (link) - ^ Plumbley, M.D. Blumensath, T. Daudet, L. Gribonval, R. and Davies, M.E. (2010). "Sparse representations in audio and music: From coding to source separation". Proceedings of the IEEE. 98 (6): 995–1005. CiteSeerX 10.1.1.160.1607. doi:10.1109/JPROC.2009.2030345. S2CID 4461063.
{{cite journal}}: CS1 maint: multiple names: authors list (link) - ^ Papyan, V. Romano, Y. and Elad, M. (2017). "Convolutional Neural Networks Analyzed via Convolutional Sparse Coding" (PDF). Journal of Machine Learning Research. 18 (83): 1–52. arXiv:1607.08194. Bibcode:2016arXiv160708194P.
{{cite journal}}: CS1 maint: multiple names: authors list (link)
Sparse approximation
View on GrokipediaFundamentals
Definition and Motivation
Sparse approximation is a signal representation technique that seeks to express a given signal as an approximate linear combination , where serves as a dictionary or basis matrix composed of atoms, and is a sparse coefficient vector with most entries equal to zero or negligible in magnitude. This approach leverages the principle that many real-world signals, such as images or audio, can be efficiently modeled using only a small subset of dictionary elements, promoting parsimony and interpretability in data modeling.[5] Overcomplete dictionaries, which contain more atoms than the dimensionality of the signal, play a crucial role by allowing for sparser representations compared to complete orthogonal bases like Fourier or standard wavelet transforms, as they provide greater flexibility in capturing signal structures.[6] The motivation for sparse approximation arises from its practical advantages in handling high-dimensional data efficiently. In data compression, sparsity enables the storage and transmission of signals using fewer coefficients, significantly reducing redundancy while preserving essential information—for instance, sparse approximation methods can compress images to under 1,000 bytes without substantial loss in quality.[5] Noise reduction benefits from thresholding small coefficients to suppress artifacts, improving signal-to-noise ratios in applications like image denoising. Additionally, in feature selection, sparse models identify the most relevant atoms or variables, aiding tasks such as pattern recognition by focusing on dominant signal components and discarding irrelevant ones. These efficiencies stem from the observation that natural signals often exhibit inherent sparsity in suitable dictionaries, aligning with principles like Ockham's razor for simpler, more robust models.[5][7] An illustrative example is the approximation of a simple 1D piecewise constant signal, such as a step function, using Haar wavelets. In this case, the Haar basis captures the abrupt change with just a few non-zero coefficients corresponding to the scaling and wavelet functions at the discontinuity, yielding a highly sparse representation; in contrast, a Fourier basis would require many coefficients to approximate the same sharp transition, resulting in a denser vector. This demonstrates how sparsity reduces computational and storage demands while maintaining fidelity.[8] The conceptual foundations of sparse approximation have early roots in 1970s signal processing, particularly in geophysics and statistics, where adaptive bases began to emerge for modeling complex data structures—for example, Claerbout and Muir's 1973 use of the -norm for sparse seismic deconvolution—paving the way for modern developments in sparse modeling and its integration with techniques like dictionary learning.[5][1]Historical Development
The origins of sparse approximation trace back to the 1970s and early 1980s, when researchers in signal processing began exploring adaptive signal decomposition techniques to represent signals using fewer components for efficient analysis and compression. These early efforts laid the groundwork for sparsity concepts, particularly through the development of wavelet theory, where Stéphane Mallat contributed foundational work in the mid-1980s by establishing multiresolution frameworks that enabled sparse representations of signals in time-frequency domains.[9] The field gained momentum in the 1990s with the introduction of greedy algorithms for sparse decomposition over redundant dictionaries. A pivotal milestone was the 1993 paper by Mallat and Zhifeng Zhang, which proposed the matching pursuit algorithm, allowing signals to be iteratively decomposed into atoms from overcomplete time-frequency dictionaries selected to best match local signal structures.[10] This approach marked a shift from fixed orthogonal bases like Fourier or early wavelet transforms toward more flexible, adaptive representations. Another key development came in 1998 with the introduction of basis pursuit by Shaobing Chen, David Donoho, and Michael Saunders, which reformulated sparse approximation as a convex optimization problem to find the sparsest solution in the -norm sense, promoting stable and unique recoveries.[3] In the 2000s, sparse approximation evolved significantly through its integration with compressive sensing, pioneered by Emmanuel Candès and Terence Tao, who demonstrated that sparse signals could be accurately recovered from far fewer measurements than traditionally required, linking sparsity directly to undersampled data acquisition.[11] Their 2005 work on decoding via linear programming and subsequent 2006 papers established theoretical guarantees for stable recovery under noise, catalyzing applications in imaging and beyond.[12] Post-2000, the paradigm shifted further from fixed bases to overcomplete dictionaries learned from data, with methods like dictionary learning enabling adaptive, task-specific sparse models that improved approximation quality in diverse signal processing tasks.[5]Mathematical Formulation
Noiseless Case
In the noiseless case, the sparse approximation problem aims to represent a given signal exactly as a linear combination of the fewest possible atoms from an overcomplete dictionary (with ), where the columns of serve as the atoms. This is formulated as the combinatorial optimization problem where counts the number of nonzero entries in the coefficient vector , corresponding to the sparsity level.[3] Direct minimization of the -"norm" is NP-hard, as shown by reduction to known hard problems in sparse linear systems, rendering exact solutions computationally intractable for large-scale instances.[13] To address this, the problem is often relaxed to the convex -norm minimization known as Basis Pursuit: This formulation promotes sparsity by favoring solutions with concentrated energy in few coefficients while remaining solvable via linear programming.[3] Geometrically, a sparse solution selects a minimal subset of dictionary atoms whose linear span contains , ensuring an exact linear reconstruction with the smallest support size (or minimal -weight in the relaxation). Under suitable conditions on , such as the restricted isometry property (RIP) of order with constant , the minimizer exactly recovers any -sparse that satisfies the constraint. The RIP requires that for all -sparse vectors , preserving the Euclidean structure of sparse signals and enabling uniform recovery guarantees.[14] This noiseless setup provides the ideal foundation for sparse recovery, with extensions to measurement errors handled separately.Noisy Case
In real-world applications, sparse approximation must account for noise in the measurements, modeled as , where is an additive noise vector. This introduces the need for formulations that balance the promotion of sparsity in with fidelity to the observed data . A standard constrained optimization approach minimizes the -norm subject to a bound on the reconstruction error: where is chosen to encompass the expected noise magnitude, ensuring the solution is robust to perturbations.[15] This constrained problem can be reformulated into an equivalent penalized form: where the parameter trades off sparsity against approximation error. The parameter is typically tuned via cross-validation, evaluating predictive performance across a grid of values to select the one minimizing validation error.[16] Direct optimization of the -norm remains computationally intractable due to its combinatorial nature, particularly in noisy settings where exact sparsity patterns are obscured. A convex relaxation replaces with the -norm, resulting in the least absolute shrinkage and selection operator (Lasso) formulation: The factor of normalizes the quadratic term for convenience in optimization. This promotes sparse solutions while penalizing large deviations from the data, with the term shrinking irrelevant coefficients to zero. Noise is commonly modeled as Gaussian, , aligning the least-squares fidelity term with maximum likelihood estimation under this distribution. In such cases, the exact solution breaks down, as noise perturbs the measurements away from any true sparse representation, necessitating approximations that tolerate residual error proportional to the noise variance. The Lasso relaxation mitigates this by providing stable sparse estimates, though recovery quality degrades with increasing noise levels.Algorithms
Greedy Algorithms
Greedy algorithms for sparse approximation operate by iteratively selecting dictionary atoms that best match the current residual signal, building the approximation step by step in a non-optimization-based manner. At each iteration, the algorithm computes correlations between the residual and all dictionary elements, chooses the atom with the maximum absolute inner product, and updates the residual by subtracting the projection onto that atom. This process continues until a stopping criterion, such as a maximum number of iterations or a residual energy threshold, is met. These methods are computationally efficient for large dictionaries due to their local, heuristic nature, making them suitable for real-time applications despite not guaranteeing global optimality.[2] The foundational greedy algorithm is Matching Pursuit (MP), introduced by Mallat and Zhang. In MP, the signal is decomposed as , where are selected dictionary atoms, , and is the residual after iterations. The selection step identifies the atom as , followed by the residual update . This maintains energy conservation: . The algorithm can be outlined in pseudocode as follows:Input: Signal f, dictionary D = {φ_j}, max iterations K
R ← f # Initial residual
coefficients ← empty list
indices ← empty list
for n = 1 to K:
j ← argmax_j |⟨R, φ_j⟩| # Greedy selection
a ← ⟨R, φ_j⟩ / ||φ_j||^2 # Coefficient (assuming normalized atoms)
coefficients.[append](/page/Append)(a)
indices.append(j)
R ← R - a φ_j # Update residual
Output: coefficients, indices, residual R
Input: Signal f, dictionary D = {φ_j}, max iterations K
R ← f # Initial residual
coefficients ← empty list
indices ← empty list
for n = 1 to K:
j ← argmax_j |⟨R, φ_j⟩| # Greedy selection
a ← ⟨R, φ_j⟩ / ||φ_j||^2 # Coefficient (assuming normalized atoms)
coefficients.[append](/page/Append)(a)
indices.append(j)
R ← R - a φ_j # Update residual
Output: coefficients, indices, residual R
