Sparse approximation
from Wikipedia

Sparse approximation (also known as sparse representation) theory deals with sparse solutions for systems of linear equations. Techniques for finding these solutions and exploiting them in applications have found wide use in image processing, signal processing, machine learning, medical imaging, and more.

Sparse decomposition

Noiseless observations

Consider a linear system of equations $x = D\alpha$, where $D \in \mathbb{R}^{m \times n}$ is an underdetermined matrix ($m < n$) and $x \in \mathbb{R}^m$, $\alpha \in \mathbb{R}^n$. The matrix $D$ (typically assumed to be full-rank) is referred to as the dictionary, and $x$ is a signal of interest. The core sparse representation problem is defined as the quest for the sparsest possible representation $\alpha$ satisfying $x = D\alpha$. Due to the underdetermined nature of $D$, this linear system admits in general infinitely many possible solutions, and among these we seek the one with the fewest non-zeros. Put formally, we solve

$$\min_{\alpha \in \mathbb{R}^n} \|\alpha\|_0 \quad \text{subject to} \quad x = D\alpha,$$

where $\|\alpha\|_0$ is the $\ell_0$ pseudo-norm, which counts the number of non-zero components of $\alpha$. This problem is known to be NP-hard, by reduction to NP-complete subset selection problems in combinatorial optimization.

Sparsity of $\alpha$ implies that only a few ($k \ll n$) components in it are non-zero. The underlying motivation for such a sparse decomposition is the desire to provide the simplest possible explanation of $x$ as a linear combination of as few as possible columns from $D$, also referred to as atoms. As such, the signal $x$ can be viewed as a molecule composed of a few fundamental elements taken from $D$.

While the above posed problem is indeed NP-hard, its solution can often be found using approximation algorithms. One such option is a convex relaxation of the problem, obtained by using the $\ell_1$-norm instead of $\ell_0$, where $\|\alpha\|_1$ simply sums the absolute values of the entries in $\alpha$. This is known as the basis pursuit (BP) algorithm, which can be handled using any linear programming solver. An alternative approximation method is a greedy technique, such as matching pursuit (MP), which finds the location of the non-zeros one at a time.
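
As an illustration of the convex relaxation, the following sketch casts basis pursuit as a linear program by splitting $\alpha$ into non-negative parts; the dictionary $D$ and signal $x$ here are randomly generated stand-ins (not data from this article), and SciPy's linprog is used as the LP solver.

```python
# Basis pursuit (min ||alpha||_1 s.t. D @ alpha = x) as a linear program.
# Illustrative sketch: D and x are random stand-ins.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, k = 20, 50, 3                      # underdetermined: m < n
D = rng.standard_normal((m, n))
D /= np.linalg.norm(D, axis=0)           # normalize the atoms
alpha_true = np.zeros(n)
alpha_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x = D @ alpha_true                       # noiseless observation

# Split alpha = u - v with u, v >= 0, so ||alpha||_1 = sum(u + v).
c = np.ones(2 * n)
A_eq = np.hstack([D, -D])
res = linprog(c, A_eq=A_eq, b_eq=x, bounds=(0, None))
alpha_bp = res.x[:n] - res.x[n:]

print("recovered support:", np.flatnonzero(np.abs(alpha_bp) > 1e-6))
print("true support:     ", np.flatnonzero(alpha_true))
```

For a random dictionary of this size and a 3-sparse signal, the recovered support typically matches the true one, consistent with the recovery guarantees discussed next.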

Surprisingly, under mild conditions on $D$ (using the spark of the dictionary, the mutual coherence or the restricted isometry property) and the level of sparsity in the solution, $\|\alpha\|_0$, the sparse representation problem can be shown to have a unique solution, and BP and MP are guaranteed to find it perfectly.[1][2][3]
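
As a small numerical illustration of the mutual-coherence condition (a sketch added here, using a randomly generated dictionary), one can compute $\mu(D)$ as the largest absolute inner product between distinct normalized atoms and evaluate the classical bound $\|\alpha\|_0 < \tfrac{1}{2}(1 + 1/\mu)$ under which the sparsest representation is unique.

```python
# Mutual coherence mu(D) and the classical uniqueness/recovery bound
# ||alpha||_0 < (1 + 1/mu) / 2.  D here is a random stand-in dictionary.
import numpy as np

rng = np.random.default_rng(1)
m, n = 30, 60
D = rng.standard_normal((m, n))
D /= np.linalg.norm(D, axis=0)           # unit-norm atoms

G = np.abs(D.T @ D)                      # Gram matrix of normalized atoms
np.fill_diagonal(G, 0.0)                 # ignore self-correlations
mu = G.max()                             # mutual coherence

bound = 0.5 * (1.0 + 1.0 / mu)
print(f"coherence mu = {mu:.3f}; unique and recoverable if ||alpha||_0 < {bound:.2f}")
```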

Noisy observations

Often the observed signal $x$ is noisy. By relaxing the equality constraint and imposing an $\ell_2$-norm on the data-fitting term, the sparse decomposition problem becomes

$$\min_{\alpha \in \mathbb{R}^n} \|\alpha\|_0 \quad \text{subject to} \quad \|x - D\alpha\|_2^2 \le \epsilon^2,$$

or, put in a Lagrangian form,

$$\min_{\alpha \in \mathbb{R}^n} \lambda \|\alpha\|_0 + \frac{1}{2} \|x - D\alpha\|_2^2,$$

where $\lambda$ replaces $\epsilon$.

Just as in the noiseless case, these two problems are NP-hard in general, but can be approximated using pursuit algorithms. More specifically, changing the $\ell_0$ to an $\ell_1$-norm, we obtain

$$\min_{\alpha \in \mathbb{R}^n} \lambda \|\alpha\|_1 + \frac{1}{2} \|x - D\alpha\|_2^2,$$

which is known as basis pursuit denoising. Similarly, matching pursuit can be used for approximating the solution of the above problems, finding the locations of the non-zeros one at a time until the error threshold is met. Here as well, theoretical guarantees suggest that BP and MP lead to nearly optimal solutions, depending on the properties of $D$ and the cardinality of the solution $\alpha$.[4][5][6] Another interesting theoretical result refers to the case in which $D$ is a unitary matrix. Under this assumption, the problems posed above (with either $\ell_0$ or $\ell_1$) admit closed-form solutions in the form of non-linear shrinkage.[4]
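
The unitary case admits a particularly simple solver: for the $\ell_1$ Lagrangian problem with a unitary $D$, the minimizer is obtained by soft-thresholding the analysis coefficients $D^\top x$ at level $\lambda$. The sketch below illustrates this with an assumed random orthogonal dictionary and a synthetic noisy signal.

```python
# Closed-form solution of  min_a  lam*||a||_1 + 0.5*||x - D a||_2^2
# when D is unitary: a = soft(D.T @ x, lam).  D is an assumed random
# orthogonal dictionary and x a synthetic noisy observation.
import numpy as np

def soft_threshold(t, lam):
    """Element-wise soft shrinkage: sign(t) * max(|t| - lam, 0)."""
    return np.sign(t) * np.maximum(np.abs(t) - lam, 0.0)

rng = np.random.default_rng(2)
n = 64
D, _ = np.linalg.qr(rng.standard_normal((n, n)))   # unitary dictionary
alpha_true = np.zeros(n)
alpha_true[rng.choice(n, 5, replace=False)] = 3.0
x = D @ alpha_true + 0.1 * rng.standard_normal(n)  # noisy observation

lam = 0.3
alpha_hat = soft_threshold(D.T @ x, lam)           # non-linear shrinkage
print("non-zeros found:", np.count_nonzero(alpha_hat))
```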

Variations

There are several variations on the basic sparse approximation problem.

Structured sparsity: In the original version of the problem, any of the atoms in the dictionary can be picked. In the structured (block) sparsity model, instead of picking atoms individually, groups of them are to be picked. These groups can be overlapping and of varying size. The objective is to represent $x$ such that it is sparse while forcing this block-structure.[7]
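
One common computational primitive in block-sparse solvers is group soft-thresholding, the proximal operator of the group penalty $\lambda \sum_g \|\alpha_g\|_2$ for non-overlapping groups. The sketch below is an added illustration under that simplifying assumption; the group partition and coefficient values are arbitrary stand-ins.

```python
# Group (block) soft-thresholding: the proximal operator of
# lam * sum_g ||alpha_g||_2 for non-overlapping groups, a building block
# of many block-sparse solvers.  Groups and coefficients are stand-ins.
import numpy as np

def group_soft_threshold(alpha, groups, lam):
    """Shrink each group of coefficients toward zero in l2 norm."""
    out = np.zeros_like(alpha)
    for g in groups:
        norm = np.linalg.norm(alpha[g])
        if norm > lam:
            out[g] = (1.0 - lam / norm) * alpha[g]   # whole group survives
        # otherwise the whole group is set to zero
    return out

alpha = np.array([0.1, -0.2, 0.05, 2.0, 1.5, -0.8])
groups = [[0, 1, 2], [3, 4, 5]]                      # non-overlapping blocks
print(group_soft_threshold(alpha, groups, lam=0.5))  # first block is zeroed
```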

Collaborative (joint) sparse coding: The original version of the problem is defined for a single signal $x$. In the collaborative (joint) sparse coding model, a set of signals is available, each believed to emerge from (nearly) the same set of atoms from $D$. In this case, the pursuit task aims to recover a set of sparse representations that best describe the data while forcing them to share the same (or close-by) support.[8]

Other structures: More broadly, the sparse approximation problem can be cast while forcing a specific desired structure on the pattern of non-zero locations in $\alpha$. Two cases of interest that have been extensively studied are tree-based structure, and more generally, a Boltzmann distributed support.[9]

Algorithms

As already mentioned above, various approximation (also referred to as pursuit) algorithms have been developed for addressing the sparse representation problem. A few of the main methods are listed below.

  • Matching pursuit is a greedy iterative algorithm for approximately solving the above problem. It works by gradually finding the locations of the non-zeros in $\alpha$ one at a time. The core idea is to find in each step the column (atom) in $D$ that best correlates with the current residual (initialized to $x$), and then to update this residual to take the new atom and its coefficient into account. Matching pursuit might pick the same atom multiple times.
  • Orthogonal matching pursuit is very similar to matching pursuit, with one major difference: in each of the algorithm's steps, all the non-zero coefficients are updated by a least-squares fit. As a consequence, the residual is orthogonal to the already chosen atoms, and thus an atom cannot be picked more than once. A minimal sketch of this method appears after this list.
  • Stage-wise greedy methods: Improved variations on the above are algorithms that operate greedily while adding two critical features: (i) the ability to add groups of non-zeros at a time (instead of one non-zero per round); and (ii) including a pruning step in each round in which several of the atoms are discarded from the support. Representatives of this approach are the Subspace-Pursuit algorithm and CoSaMP.[10]
  • Basis pursuit solves a convex relaxed version of the problem by replacing the $\ell_0$ with an $\ell_1$-norm. Note that this only defines a new objective, while leaving open the question of which algorithm to use for obtaining the desired solution. Commonly considered algorithms are IRLS, LARS, and iterative soft-shrinkage methods.[11]
  • There are several other methods for solving sparse decomposition problems: the homotopy method, coordinate descent, iterative hard-thresholding, first-order proximal methods (which are related to the above-mentioned iterative soft-shrinkage algorithms), and the Dantzig selector.
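
As a companion to the greedy methods in the list above, here is a minimal orthogonal matching pursuit sketch; the dictionary $D$, the signal $x$, and the sparsity level are synthetic stand-ins chosen only for illustration.

```python
# Minimal orthogonal matching pursuit (OMP) sketch for x = D @ alpha.
# D and x are random stand-ins; real uses would pass in actual data.
import numpy as np

def omp(D, x, k, tol=1e-6):
    """Greedy OMP: pick up to k atoms of D (unit-norm columns) to approximate x."""
    residual = x.copy()
    support = []
    alpha = np.zeros(D.shape[1])
    for _ in range(k):
        # Atom most correlated with the current residual.
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j not in support:
            support.append(j)
        # Least-squares fit on the current support; the residual becomes
        # orthogonal to the span of the chosen atoms.
        coeffs, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coeffs
        if np.linalg.norm(residual) < tol:
            break
    alpha[support] = coeffs
    return alpha

rng = np.random.default_rng(3)
m, n, k = 20, 50, 3
D = rng.standard_normal((m, n))
D /= np.linalg.norm(D, axis=0)
alpha_true = np.zeros(n)
alpha_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
x = D @ alpha_true

alpha_omp = omp(D, x, k)
print("true support:", sorted(np.flatnonzero(alpha_true)))
print("OMP support: ", sorted(np.flatnonzero(alpha_omp)))
```

Each iteration re-solves a least-squares problem on the current support, which is what keeps the residual orthogonal to the chosen atoms and prevents re-selection.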

Applications

Sparse approximation ideas and algorithms have been extensively used in signal processing, image processing, machine learning, medical imaging, array processing, data mining, and more. In most of these applications, the unknown signal of interest is modeled as a sparse combination of a few atoms from a given dictionary, and this is used as the regularization of the problem. These problems are typically accompanied by a dictionary learning mechanism that aims to fit $D$ so as to best match the model to the given data. The use of sparsity-inspired models has led to state-of-the-art results in a wide set of applications.[12][13][14] Recent work suggests that there is a tight connection between sparse representation modeling and deep learning.[15]

from Grokipedia
Sparse approximation is a fundamental problem in signal processing and related fields that involves representing a given signal or vector as a linear combination of the fewest possible atoms (basis elements) selected from a possibly overcomplete dictionary, thereby achieving a sparse coefficient vector with minimal non-zero entries while minimizing reconstruction error. This approach leverages the principle that many signals admit efficient sparse representations in appropriate bases, such as wavelets or Gabor functions, enabling compact encoding and robust feature extraction. The origins of sparse approximation trace back to early work on sparse representations, with significant developments through subset selection methods in statistics, and a surge in the 2000s driven by advances in compressed sensing theory and redundant dictionaries. Key challenges include solving the inherently combinatorial $\ell_0$-minimization problem, which counts non-zero coefficients; it is often relaxed to the tractable $\ell_1$-norm minimization for convex optimization.

Dictionaries are typically characterized by their coherence, a measure of linear dependence between atoms, which influences recovery guarantees; low-coherence dictionaries ensure that sparse signals can be uniquely recovered under certain conditions, such as when the number of non-zeros $K$ satisfies $K < (1 + 1/\mu)/2$, where $\mu$ is the coherence.

Prominent algorithms for sparse approximation fall into two main categories: greedy methods and convex relaxation techniques. Greedy algorithms, such as Matching Pursuit (MP), introduced in 1993, iteratively select the dictionary atom most correlated with the residual signal, while Orthogonal Matching Pursuit (OMP), introduced in 1994, orthogonalizes selections for improved accuracy. Convex methods, exemplified by Basis Pursuit (BP) from 1998, formulate the problem as minimizing the $\ell_1$-norm subject to the constraint that the linear combination exactly reconstructs the signal, enabling efficient solutions via linear programming and exact recovery for sufficiently incoherent dictionaries. These techniques underpin modern applications in compressive sensing, image denoising, machine learning for feature selection, and biomedical signal analysis, where sparsity promotes interpretability and computational efficiency.

Fundamentals

Definition and Motivation

Sparse approximation is a signal representation technique that seeks to express a given signal $y$ as an approximate linear combination $y \approx \Phi x$, where $\Phi$ serves as a dictionary or basis matrix composed of atoms, and $x$ is a sparse coefficient vector with most entries equal to zero or negligible in magnitude. This approach leverages the principle that many real-world signals, such as images or audio, can be efficiently modeled using only a small subset of dictionary elements, promoting parsimony and interpretability in data modeling. Overcomplete dictionaries, which contain more atoms than the dimensionality of the signal, play a crucial role by allowing for sparser representations compared to complete orthogonal bases like Fourier or standard wavelet transforms, as they provide greater flexibility in capturing signal structures.

The motivation for sparse approximation arises from its practical advantages in handling high-dimensional data efficiently. In data compression, sparsity enables the storage and transmission of signals using fewer coefficients, significantly reducing redundancy while preserving essential information; for instance, sparse approximation methods can compress images to under 1,000 bytes without substantial loss in quality. Noise reduction benefits from thresholding small coefficients to suppress artifacts, improving signal-to-noise ratios in applications like image denoising. Additionally, in feature selection, sparse models identify the most relevant atoms or variables, aiding tasks such as pattern recognition by focusing on dominant signal components and discarding irrelevant ones. These efficiencies stem from the observation that natural signals often exhibit inherent sparsity in suitable dictionaries, aligning with principles like Ockham's razor for simpler, more robust models.

An illustrative example is the approximation of a simple 1D piecewise constant signal, such as a step function, using a Haar wavelet basis. In this case, the Haar basis captures the abrupt change with just a few non-zero coefficients corresponding to the scaling and wavelet functions at the discontinuity, yielding a highly sparse representation; in contrast, a Fourier basis would require many coefficients to approximate the same sharp transition, resulting in a denser vector. This demonstrates how sparsity reduces computational and storage demands while maintaining fidelity.

The conceptual foundations of sparse approximation have early roots in 1970s signal processing, particularly in geophysics and statistics, where adaptive bases began to emerge for modeling complex data structures; for example, Claerbout and Muir's 1973 use of the $\ell_1$-norm for sparse seismic deconvolution paved the way for modern developments in sparse modeling and its integration with techniques like dictionary learning.
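
To make the step-function example concrete, the sketch below compares how many significant coefficients a step signal needs in a Haar wavelet basis versus a Fourier basis; it assumes the PyWavelets package (pywt) for the Haar transform and uses an arbitrary threshold to decide which coefficients count as significant.

```python
# Sparsity of a step signal in a Haar wavelet basis vs. a Fourier basis.
# Illustration added here; assumes the PyWavelets (pywt) package is installed.
import numpy as np
import pywt

n = 256
signal = np.zeros(n)
signal[100:] = 1.0                           # piecewise-constant step

# Haar decomposition: only the few coefficients whose supports straddle
# the jump (plus the coarse approximation) are significant.
haar_coeffs = np.concatenate(pywt.wavedec(signal, 'haar'))
fourier_coeffs = np.fft.fft(signal)

def count_significant(c, frac=1e-3):
    """Count coefficients above a small fraction of the largest magnitude."""
    mags = np.abs(c)
    return int(np.sum(mags > frac * mags.max()))

print("significant Haar coefficients:   ", count_significant(haar_coeffs))
print("significant Fourier coefficients:", count_significant(fourier_coeffs))
```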

Historical Development

The origins of sparse approximation trace back to the 1970s and early 1980s, when researchers in signal processing began exploring adaptive signal decomposition techniques to represent signals using fewer components for efficient analysis and compression. These early efforts laid the groundwork for sparsity concepts, particularly through the development of wavelet theory, where Stéphane Mallat contributed foundational work in the mid-1980s by establishing multiresolution frameworks that enabled sparse representations of signals in time-frequency domains.

The field gained momentum in the 1990s with the introduction of greedy algorithms for sparse decomposition over redundant dictionaries. A pivotal milestone was the 1993 paper by Mallat and Zhifeng Zhang, which proposed the matching pursuit algorithm, allowing signals to be iteratively decomposed into atoms from overcomplete time-frequency dictionaries selected to best match local signal structures. This approach marked a shift from fixed orthogonal bases like Fourier or early wavelet transforms toward more flexible, adaptive representations. Another key development came in 1998 with the introduction of basis pursuit by Shaobing Chen, David Donoho, and Michael Saunders, which reformulated sparse approximation as a convex optimization problem to find the sparsest solution in the $\ell_1$-norm sense, promoting stable and unique recoveries.

In the 2000s, sparse approximation evolved significantly through its integration with compressive sensing, pioneered by Emmanuel Candès and Terence Tao, who demonstrated that sparse signals could be accurately recovered from far fewer measurements than traditionally required, linking sparsity directly to undersampled data acquisition. Their 2005 work on decoding via linear programming and subsequent 2006 papers established theoretical guarantees for stable recovery under noise, catalyzing applications in imaging and beyond. Post-2000, the paradigm shifted further from fixed bases to overcomplete dictionaries learned from data, with methods like dictionary learning enabling adaptive, task-specific sparse models that improved approximation quality in diverse signal processing tasks.

Mathematical Formulation

Noiseless Case

In the noiseless case, the sparse approximation problem aims to represent a given signal $y \in \mathbb{R}^m$ exactly as a linear combination of the fewest possible atoms from an overcomplete dictionary $\Phi \in \mathbb{R}^{m \times n}$ (with $n > m$), where the columns of $\Phi$ serve as the atoms. This is formulated as the problem

$$\min_{x \in \mathbb{R}^n} \|x\|_0 \quad \text{subject to} \quad y = \Phi x,$$

where $\|x\|_0$ counts the number of nonzero entries in the coefficient vector $x$, corresponding to the sparsity level. Direct minimization of the $\ell_0$-"norm" is NP-hard, as shown by reduction to known hard problems in sparse linear systems, rendering exact solutions computationally intractable for large-scale instances. To address this, the problem is often relaxed to the convex $\ell_1$-norm minimization known as Basis Pursuit:

$$\min_{x \in \mathbb{R}^n} \|x\|_1 \quad \text{subject to} \quad y = \Phi x.$$

This formulation promotes sparsity by favoring solutions with concentrated energy in few coefficients while remaining solvable via linear programming. Geometrically, a sparse solution selects a minimal subset of dictionary atoms whose linear span contains $y$, ensuring an exact linear reconstruction with the smallest support size (or minimal $\ell_1$-weight in the relaxation). Under suitable conditions on $\Phi$, such as the restricted isometry property (RIP) of order $2k$ with constant $\delta_{2k} < \sqrt{2} - 1$, the $\ell_1$ relaxation recovers every $k$-sparse solution exactly, so Basis Pursuit coincides with the $\ell_0$-minimal solution.
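
To convey the combinatorial nature of the $\ell_0$ problem, the following sketch solves a tiny instance by brute force, enumerating candidate supports of growing size until one reproduces $y$ exactly; $\Phi$ and $y$ are random stand-ins, and the approach is feasible only because the dimensions are deliberately small.

```python
# Brute-force l0 minimization: enumerate supports of growing size and
# accept the first one whose span contains y exactly (up to tolerance).
# Only feasible for tiny problems, which is the point: the general
# problem is combinatorial (NP-hard).  Phi and y are random stand-ins.
import itertools
import numpy as np

rng = np.random.default_rng(4)
m, n, k = 6, 10, 2
Phi = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, k, replace=False)] = rng.standard_normal(k)
y = Phi @ x_true

def l0_bruteforce(Phi, y, tol=1e-9):
    n = Phi.shape[1]
    for size in range(1, n + 1):
        for support in itertools.combinations(range(n), size):
            coeffs, *_ = np.linalg.lstsq(Phi[:, support], y, rcond=None)
            if np.linalg.norm(Phi[:, support] @ coeffs - y) < tol:
                x = np.zeros(n)
                x[list(support)] = coeffs
                return x
    return np.zeros(n)

x_hat = l0_bruteforce(Phi, y)
print("support found:", np.flatnonzero(np.abs(x_hat) > 1e-8))
print("true support: ", np.flatnonzero(x_true))
```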