Quantization (image processing)
Quantization, in image processing, is a lossy compression technique achieved by compressing a range of values to a single quantum (discrete) value. When the number of discrete symbols in a given stream is reduced, the stream becomes more compressible. For example, reducing the number of colors required to represent a digital image makes it possible to reduce its file size. Specific applications include DCT data quantization in JPEG and DWT data quantization in JPEG 2000.
Color quantization
Color quantization reduces the number of colors used in an image; this is important for displaying images on devices that support a limited number of colors and for efficiently compressing certain kinds of images. Most bitmap editors and many operating systems have built-in support for color quantization. Popular modern color quantization algorithms include the nearest color algorithm (for fixed palettes), the median cut algorithm, and an algorithm based on octrees.
It is common to combine color quantization with dithering to create an impression of a larger number of colors and eliminate banding artifacts.
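As an illustration of the fixed-palette (nearest color) approach, the sketch below maps every pixel of an RGB image to its closest entry in a small, arbitrary palette; the eight-color palette and the random test image are illustrative assumptions, and NumPy is assumed to be available.

```python
import numpy as np

# Hypothetical 8-entry RGB palette; real palettes typically hold 16-256 colors.
PALETTE = np.array([
    [0, 0, 0], [255, 255, 255], [255, 0, 0], [0, 255, 0],
    [0, 0, 255], [255, 255, 0], [0, 255, 255], [255, 0, 255],
], dtype=np.float64)

def quantize_to_palette(image, palette=PALETTE):
    """Replace each RGB pixel with the nearest palette color (Euclidean distance)."""
    pixels = image.reshape(-1, 3).astype(np.float64)
    # Distance from every pixel to every palette entry: shape (num_pixels, num_colors).
    dists = np.linalg.norm(pixels[:, None, :] - palette[None, :, :], axis=2)
    nearest = dists.argmin(axis=1)                  # index of the closest palette color
    return palette[nearest].reshape(image.shape).astype(np.uint8)

image = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)  # placeholder image
print(quantize_to_palette(image))
```

Median cut and octree methods differ only in how the palette itself is chosen; once a palette exists, the same nearest-color mapping (optionally combined with dithering) assigns each pixel to a palette entry.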
Grayscale quantization
Grayscale quantization, also known as gray level quantization, is a process in digital image processing that involves reducing the number of unique intensity levels (shades of gray) in an image while preserving its essential visual information. This technique is commonly used for simplifying images, reducing storage requirements, and facilitating processing operations.
In grayscale quantization, an image with N intensity levels is converted into an image with a reduced number of levels, typically L levels, where L<N. The process involves mapping each pixel's original intensity value to one of the new intensity levels.
One of the simplest methods of grayscale quantization is uniform quantization, where the intensity range is divided into equal intervals, and each interval is represented by a single intensity value. Let's say we have an image with intensity levels ranging from 0 to 255 (8-bit grayscale). If we want to quantize it to 4 levels, the intervals would be [0-63], [64-127], [128-191], and [192-255]. Each interval would be represented by the midpoint intensity value, resulting in intensity levels of 31, 95, 159, and 223 respectively.
The formula for uniform quantization is:
Q(x) = Δ ⋅ ⌊x / Δ⌋ + Δ / 2
Where:
- Q(x) is the quantized intensity value.
- x is the original intensity value.
- Δ is the size of each quantization interval.
Let's quantize an original intensity value of 147 to 3 intensity levels.
Original intensity value: x=147
Desired intensity levels: L=3
We first need to calculate the size of each quantization interval:
Δ = 255 / (L − 1) = 255 / 2 = 127.5
Using the uniform quantization formula:
Q(147) = 127.5 ⋅ ⌊147 / 127.5⌋ + 127.5 / 2 = 127.5 + 63.75 = 191.25
Rounding 191.25 to the nearest integer, we get 191.
So, the intensity value 147 quantized to 3 levels is 191.
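A minimal Python sketch of this step (NumPy assumed), using the interval size and mid-rise formula from the worked example above:

```python
import numpy as np

def uniform_quantize(x, levels, max_value=255):
    """Uniform (mid-rise) quantization of an intensity value to `levels` levels."""
    delta = max_value / (levels - 1)                # size of each quantization interval
    q = delta * np.floor(x / delta) + delta / 2     # Q(x) = Δ·⌊x/Δ⌋ + Δ/2
    return int(np.rint(q))                          # round to the nearest integer

print(uniform_quantize(147, levels=3))              # 191, matching the example above
```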
Frequency quantization for image compression
The human eye is fairly good at seeing small differences in brightness over a relatively large area, but not so good at distinguishing the exact strength of a high frequency (rapidly varying) brightness variation. This fact allows one to reduce the amount of information required by ignoring the high frequency components. This is done by simply dividing each component in the frequency domain by a constant for that component, and then rounding to the nearest integer. This is the main lossy operation in the whole process. As a result of this, it is typically the case that many of the higher frequency components are rounded to zero, and many of the rest become small positive or negative numbers.
As human vision is also more sensitive to luminance than chrominance, further compression can be obtained by working in a non-RGB color space which separates the two (e.g., YCbCr), and quantizing the channels separately.[1]
Quantization matrices
A typical video codec works by breaking the picture into discrete blocks (8×8 pixels in the case of MPEG[1]). These blocks can then be subjected to discrete cosine transform (DCT) to calculate the frequency components, both horizontally and vertically.[1] The resulting block (the same size as the original block) is then pre-multiplied by the quantization scale code and divided element-wise by the quantization matrix, with each resulting element rounded to the nearest integer. The quantization matrix is designed to provide more resolution to the more perceivable frequency components (usually the lower frequencies) than to the less perceivable ones, while driving as many components as possible to zero, which can be encoded with the greatest efficiency. Many video encoders (such as DivX, Xvid, and 3ivx) and compression standards (such as MPEG-2 and H.264/AVC) allow custom matrices to be used. The extent of the reduction may be varied by changing the quantizer scale code, which takes up much less bandwidth than a full quantizer matrix.[1]
This is an example of DCT coefficient matrix:
A common quantization matrix is:

| 16 | 11 | 10 | 16 | 24 | 40 | 51 | 61 |
|---|---|---|---|---|---|---|---|
| 12 | 12 | 14 | 19 | 26 | 58 | 60 | 55 |
| 14 | 13 | 16 | 24 | 40 | 57 | 69 | 56 |
| 14 | 17 | 22 | 29 | 51 | 87 | 80 | 62 |
| 18 | 22 | 37 | 56 | 68 | 109 | 103 | 77 |
| 24 | 35 | 55 | 64 | 81 | 104 | 113 | 92 |
| 49 | 64 | 78 | 87 | 103 | 121 | 120 | 101 |
| 72 | 92 | 95 | 98 | 112 | 100 | 103 | 99 |
Dividing the DCT coefficient matrix element-wise with this quantization matrix, and rounding to integers results in:
For example, using −415 (the DC coefficient) and rounding to the nearest integer:
round(−415 / 16) = round(−25.94) = −26
Typically this process will result in matrices with values primarily in the upper left (low frequency) corner. By using a zig-zag ordering to group the non-zero entries and run length encoding, the quantized matrix can be much more efficiently stored than the non-quantized version.[1]
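A minimal sketch of the divide-and-round step (NumPy assumed), using the common luminance quantization matrix shown above; the input block is a placeholder for the DCT coefficients of one 8×8 block, and the quantizer scale factor mentioned above is omitted for simplicity:

```python
import numpy as np

# Common JPEG luminance quantization matrix (as given above).
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize_block(dct_block, q_matrix=Q):
    """Divide each DCT coefficient by its matrix entry and round to the nearest integer."""
    return np.rint(dct_block / q_matrix).astype(int)

# Placeholder block: only the DC coefficient from the example above is filled in.
dct_block = np.zeros((8, 8))
dct_block[0, 0] = -415.0
print(quantize_block(dct_block)[0, 0])   # round(-415 / 16) = -26
```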
References
- ^ Smith, Steven W. (2003). Digital Signal Processing: A Practical Guide for Engineers and Scientists. Demystifying Technology series. Amsterdam; Boston: Newnes. ISBN 978-0-7506-7444-7.
Quantization (image processing)
Fundamentals
Definition and Process
Quantization in image processing refers to the process of mapping a large set of continuous or high-precision input values, such as pixel intensities, to a smaller set of discrete output levels, thereby reducing the precision of the data while aiming to preserve essential visual information. This many-to-one mapping typically decreases the number of bits required to represent each pixel, for instance reducing 8-bit grayscale values (256 levels) to 4-bit values (16 levels), to minimize storage requirements or facilitate efficient transmission over bandwidth-limited channels. As a lossy technique inherent to digital representation, quantization introduces irreversible information loss but is crucial for practical digital imaging systems.[4]
The general process begins with analog-to-digital conversion, where a continuous analog image signal is first sampled spatially to form a discrete grid of pixels, followed by the assignment of discrete intensity levels to those pixel values. The input range is divided into quantization intervals defined by a step size Δ, and each value within an interval is mapped to the nearest representative level, producing the quantized output Q(x). For uniform quantization, this is mathematically expressed as Q(x) = Δ ⋅ round(x / Δ), where round(·) denotes rounding to the nearest integer, ensuring even spacing of levels across the input range. The step size is often determined by the total range of input values divided by the number of desired levels minus one, balancing data reduction against perceptual quality degradation.[5][4]
The origins of quantization in image processing trace back to the early 1960s, when advances in computing and the space race prompted the development of techniques for efficient digital representation of images in resource-constrained environments. Pioneering efforts at NASA's Jet Propulsion Laboratory and Bell Laboratories applied quantization during the processing of lunar and planetary images; for example, the 1964 Ranger 7 mission transmitted the first digital images from the Moon, requiring quantization to convert analog camera signals into manageable digital formats for transmission back to Earth over limited bandwidth. These early applications, building on foundational signal processing concepts from the 1940s and 1950s, established quantization as a core prerequisite for digital imaging, converting real-world analog scenes into discrete data.[6][4]
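A short sketch (NumPy assumed) of the bit-depth reduction mentioned above, applying the step-size rule and rounding formula just described to reduce 256 levels to 16:

```python
import numpy as np

def requantize(image, levels, max_value=255):
    """Map an 8-bit image onto `levels` evenly spaced intensity levels."""
    delta = max_value / (levels - 1)                            # step size: range / (levels - 1)
    return (delta * np.rint(image / delta)).astype(np.uint8)    # Q(x) = Δ·round(x/Δ)

ramp = np.arange(256, dtype=np.uint8)                 # all 256 possible 8-bit values
print(len(np.unique(requantize(ramp, levels=16))))    # 16 distinct output levels
```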
Uniform and Non-Uniform Quantization
Uniform quantization divides the input signal range into equal intervals, assigning each interval to a discrete level with a constant step size Δ. This approach simplifies implementation and computation, making it suitable for hardware and software processing in image systems. However, it assumes uniform perceptual importance across the dynamic range, which often leads to more visible errors in regions where the human visual system is more sensitive, such as low-intensity areas. The quantization error for uniform quantization, modeled as noise uniformly distributed over [−Δ/2, Δ/2], has a mean squared error (MSE) of approximately Δ²/12.[7]
Non-uniform quantization employs variable step sizes to allocate more levels to signal regions with higher perceptual relevance, reducing overall distortion for a fixed number of bits. A common example is logarithmic μ-law companding, which compresses the dynamic range nonlinearly before uniform quantization and expands it afterward. The μ-law companding function is given by F(x) = sgn(x) ⋅ ln(1 + μ|x| / x_max) / ln(1 + μ), where μ is the compression parameter (typically 255 in standard applications) and x_max is the peak signal value; the companded signal is then uniformly quantized. This method has been adapted in image processing to enhance dynamic range compression in restoration tasks, improving signal-to-noise ratios at low input levels.[8]
To compare distortion between schemes, the signal-to-quantization-noise ratio (SQNR) for uniform quantization of a full-scale sinusoidal input with b bits is approximately 6.02b + 1.76 dB, assuming quantization noise uniformly distributed over the Nyquist bandwidth. Non-uniform schemes can achieve higher effective SQNR in perceptually relevant regions by adapting the step sizes to the signal's probability density, though they increase complexity.[9]
Perceptual considerations drive the adoption of non-uniform quantization: the human visual system is considerably more sensitive to changes in luminance than in chrominance, allowing coarser quantization steps for chrominance components with little visible artifact, as implemented in standards like JPEG through tailored quantization tables derived from psychovisual experiments.[10]
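A minimal sketch (NumPy assumed) of the non-uniform scheme described above: the signal is compressed with the μ-law function, quantized uniformly in the companded domain, then expanded; μ = 255, an 8-level quantizer, and a unit peak value are illustrative choices.

```python
import numpy as np

MU = 255.0  # compression parameter mu

def mu_law_compress(x, peak):
    """F(x) = sgn(x) * peak * ln(1 + mu*|x|/peak) / ln(1 + mu)."""
    return peak * np.sign(x) * np.log1p(MU * np.abs(x) / peak) / np.log1p(MU)

def mu_law_expand(y, peak):
    """Inverse of the compression function."""
    return peak * np.sign(y) * ((1.0 + MU) ** (np.abs(y) / peak) - 1.0) / MU

def nonuniform_quantize(x, levels, peak):
    delta = 2 * peak / (levels - 1)                    # uniform step in the companded domain
    compressed = mu_law_compress(x, peak)
    quantized = delta * np.rint(compressed / delta)    # ordinary uniform quantization
    return mu_law_expand(quantized, peak)

x = np.linspace(-1.0, 1.0, 9)
print(nonuniform_quantize(x, levels=8, peak=1.0))
```

Because the companding curve is steep near zero, the reconstructed levels are packed more densely at low signal magnitudes, which is where the scheme allocates its extra precision.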
Spatial Domain Quantization
Grayscale Quantization
Grayscale quantization reduces the bit depth of single-channel intensity images, mapping continuous or high-precision pixel values to a finite set of discrete levels to minimize storage and transmission requirements while aiming to preserve visual fidelity. In an 8-bit grayscale image, pixel intensities range from 0 to 255, representing 256 possible levels; quantization might truncate or round these to fewer levels, such as 16 in a 4-bit representation, by dividing the range into uniform intervals and assigning each pixel to the nearest reproduction value. For instance, with 16 levels spaced at intervals of approximately 17 (0, 17, 34, ..., 255), a pixel value of 200 would be rounded to 204, the nearest level, effectively compressing the dynamic range but potentially introducing visible steps in smooth gradients.
Key techniques for grayscale quantization include posterization, which applies hard thresholds to enforce abrupt transitions between discrete intensity bands, creating a stylized, banded appearance often used for artistic or simplified rendering, and bit-plane slicing, which decomposes the image into eight binary planes (from least to most significant bit) so that lower planes can be selectively discarded to reduce precision. In posterization, thresholds are set to map ranges of intensities to fixed output levels, such as reducing a 256-level image to 8 levels by grouping every 32 values into one, resulting in distinct tonal regions without intermediate shades. Bit-plane slicing, conversely, represents each pixel's value as a binary sum across planes (e.g., 204 = 11001100 in binary, with bits set in planes 3, 4, 7, and 8, counting from the least significant plane); retaining only the higher planes (e.g., the top 5) achieves 5-bit quantization (32 levels) with high compression ratios, as lower planes often contain noise or fine details that can be omitted with minimal perceptual impact.[11][12]
The perceptual quality of quantized grayscale images is commonly evaluated using the peak signal-to-noise ratio (PSNR), which quantifies distortion relative to the original by comparing mean squared differences in pixel intensities. Defined as PSNR = 10 ⋅ log₁₀(MAX² / MSE), where MAX is the maximum possible pixel value (typically 255 for 8-bit grayscale) and MSE is the mean squared error between the original and quantized images, higher PSNR values indicate better fidelity; for example, reducing from 8-bit to 4-bit often yields PSNRs around 30-40 dB for natural images, balancing compression against visible artifacts like contouring. This metric, while objective, correlates reasonably with human perception for grayscale distortions, though it does not fully capture the visibility of artifacts such as contouring.[13]
Applications of grayscale quantization span early monochrome displays, where hardware limitations necessitated reducing intensity levels to achieve feasible resolutions, and medical imaging, where it enables bandwidth savings without compromising diagnostic utility. In early systems, such as those using conventional graphics cards limited to 8-bit output, techniques like video attenuation across the RGB channels of color monitors effectively extended grayscale depth to 12 bits (over 4,000 levels) by combining signals for smoother gradients in vision research and displays.
In medical contexts, quantizing 12-16 bit images to 10-bit levels supports transmission over constrained networks while aligning with human visual discrimination of approximately 700-900 just-noticeable differences, ensuring perceptual linearity via standards such as the DICOM Grayscale Standard Display Function (GSDF) for efficient storage and remote diagnostics.[14][15]
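The bit-plane idea and the PSNR metric described above can be sketched as follows (NumPy assumed; the random test image is a placeholder):

```python
import numpy as np

def keep_top_bit_planes(image, planes):
    """Zero out the (8 - planes) least significant bit planes of an 8-bit image."""
    mask = (0xFF << (8 - planes)) & 0xFF
    return image & mask

def psnr(original, distorted, max_value=255.0):
    """Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((original.astype(np.float64) - distorted.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_value ** 2 / mse)

image = np.random.randint(0, 256, size=(64, 64), dtype=np.uint8)  # placeholder image
reduced = keep_top_bit_planes(image, planes=5)   # keep the top 5 planes: 32 levels
print(round(psnr(image, reduced), 1))            # typically in the 30-40 dB range
```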
Color Quantization
Color quantization is the process of reducing the number of distinct colors in a multi-channel image, typically from millions to hundreds, to facilitate storage, transmission, or display on devices with limited color depth while aiming to minimize perceptual distortion. Unlike scalar quantization of single-channel images, color quantization treats pixels as vectors in a multi-dimensional space, requiring algorithms that cluster colors to form a representative palette. This approach is essential for applications such as palette-based image formats (e.g., GIF, or PNG with reduced colors) and real-time rendering on constrained hardware.
Color space selection plays a critical role in achieving visually faithful results. The RGB color space is device-dependent and not perceptually uniform: equal Euclidean distances do not correspond to equal perceived color differences, potentially leading to noticeable banding in quantized images. Perceptually uniform spaces like CIELAB (L*a*b*) address this by approximating human vision, where the L* component represents lightness and a*, b* capture opponent colors, ensuring that quantization errors are more evenly distributed across perceived color variations. Similarly, YCbCr separates luminance (Y) from chrominance (Cb, Cr), allowing independent quantization; this is advantageous because the human visual system exhibits higher spatial sensitivity to luminance changes than to chrominance, enabling coarser quantization of the chroma channels without substantial quality loss. Quantization in such spaces often involves transforming the image, quantizing channels separately or jointly, and inverse-transforming back to RGB for display.
Several algorithms have been developed for palette-based color quantization, focusing on partitioning the color space to select representative colors. The median-cut algorithm, introduced by Heckbert, operates by recursively subdividing the RGB color space into hyper-rectangular "boxes" based on color population. The process begins by representing all unique colors as points in the 24-bit RGB cube; the box containing the most colors is selected, its longest dimension (the R, G, or B axis with the greatest range) is identified, and the box is split perpendicular to that axis at the median color count to balance the populations on both sides. This splitting continues until the desired number of boxes (e.g., 256) is reached, with each final box's centroid or average color serving as a palette entry. Median cut is computationally efficient and produces palettes that approximate uniform color distribution, but it can struggle with sparse color regions.
The octree quantization method, proposed by Gervautz and Purgathofer, employs a hierarchical tree structure to cluster colors, offering advantages in handling variable color densities. Colors are inserted into an octree where each level divides the RGB cube into eight equal subcubes (octants) based on binary splits along each axis (e.g., R > 128 or ≤ 128). Nodes represent color subcubes, with leaf nodes storing color counts; the tree is built to a fixed depth (typically 8 levels, one per bit of each channel) or until all colors are isolated. To generate a palette of size N, the octree is pruned by repeatedly merging the least populous nodes into their parent, which becomes a representative color weighted by the subtree populations. This approach excels at preserving rare colors and is faster to build than exhaustive clustering, though it may require post-processing for optimal palette quality.
Palette generation typically reduces a 24-bit color space (16,777,216 possible colors) to 256 or fewer entries, after which each original pixel is mapped to the nearest palette color using a distance metric. Error minimization occurs via nearest-neighbor assignment, usually computed with Euclidean distance in the chosen color space; for perceptual accuracy, distances are preferably calculated in CIELAB to align with human sensitivity. This mapping can introduce quantization error, measured as the average distortion between original and quantized colors, but the palette ensures a compact representation suitable for indexed-color formats.
Vector quantization provides a more general framework for color palette creation, treating each RGB pixel as a three-dimensional vector and learning a codebook of prototype vectors (palette colors) that minimizes overall distortion. The seminal Linde-Buzo-Gray (LBG) algorithm, a variant of k-means clustering, initializes a codebook with k randomly selected or subsampled colors, then iteratively assigns each image color vector to the nearest codebook vector (using Euclidean distance) and updates the codevectors as the centroids of their assigned clusters until convergence. In RGB space, Euclidean distance serves as the distortion measure, though transformations to CIELAB or YCbCr are common to incorporate perceptual weighting, reducing the impact of errors in less sensitive channels. This method yields high-quality palettes for arbitrary k but can be computationally intensive, and convergence to a good solution depends on initialization to avoid poor local minima.
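A compact sketch of the LBG/k-means style codebook construction described above (NumPy assumed; initialization by random subsampling, Euclidean distance in RGB, and placeholder pixel data):

```python
import numpy as np

def lbg_palette(pixels, k, iterations=20, seed=0):
    """Learn a k-color palette from an (N, 3) array of RGB pixels via k-means/LBG."""
    rng = np.random.default_rng(seed)
    pixels = pixels.astype(np.float64)
    codebook = pixels[rng.choice(len(pixels), size=k, replace=False)]  # random initial codebook
    for _ in range(iterations):
        # Assignment step: each pixel goes to its nearest codevector (Euclidean distance).
        dists = np.linalg.norm(pixels[:, None, :] - codebook[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each codevector becomes the centroid of its assigned pixels.
        for j in range(k):
            members = pixels[labels == j]
            if len(members) > 0:
                codebook[j] = members.mean(axis=0)
    return np.rint(codebook).astype(np.uint8)

pixels = np.random.randint(0, 256, size=(1000, 3))   # placeholder pixel data
print(lbg_palette(pixels, k=8))
```

Running the same loop in CIELAB rather than RGB only requires converting the pixel array before clustering and converting the learned codebook back afterwards.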
Frequency Domain Quantization
Principles in Compression
In transform-based image compression schemes, quantization plays a pivotal role by reducing the precision of transform coefficients after a transform such as the discrete cosine transform (DCT) or wavelet transform has been applied, thereby discarding less perceptually important frequency components to achieve data reduction.[10] In standards like JPEG, this lossy step follows the DCT on 8×8 blocks of the image, where coefficients representing high-frequency details are often quantized to zero, effectively eliminating fine spatial variations that contribute minimally to overall image fidelity.[10] Similarly, in JPEG 2000, quantization applied to wavelet subbands targets high-frequency components across scales, leveraging the multi-resolution nature of the transform to prioritize low-frequency energy.[16]
The human visual system (HVS) exhibits reduced sensitivity to high spatial frequencies, as characterized by its contrast sensitivity function, which peaks at low to mid frequencies and falls off sharply toward roughly 60 cycles per degree.[17] This perceptual property, rooted in Weber's law, under which the just-noticeable difference in stimulus intensity is proportional to the stimulus magnitude, allows for coarser quantization of the alternating current (AC) coefficients, which capture high-frequency details, while the direct current (DC) coefficients, representing average intensity, receive finer quantization to preserve luminance structure.[10][17]
The quantization process typically divides each transform coefficient by a scalar or matrix-derived value and rounds to the nearest integer, formalized as B(u,v) = round(F(u,v) / Q(u,v)), where F(u,v) denotes the transform coefficient at frequency indices (u,v) and Q(u,v) is the corresponding quantizer value.[10][18] Larger quantizer values for high frequencies cause many coefficients to round to zero, concentrating energy in fewer non-zero terms for efficient entropy coding.[18] Because it is irreversible, quantization introduces information loss proportional to the degree of bit reduction, yet this enables high compression ratios, such as 10:1 in JPEG for visually acceptable quality in color images.[10] The loss is controlled by adjusting the quantizer values, balancing file size against perceptual distortion while exploiting HVS limitations to minimize visible artifacts.[10]
Quantization Matrices and Tables
In frequency domain quantization for image compression, quantization matrices are predefined 8×8 tables applied to blocks of discrete cosine transform (DCT) coefficients in standards like JPEG, where each entry sets the quantization step for a specific frequency component.[19] These matrices ensure that the 64 DCT coefficients of an 8×8 spatial block are divided by the corresponding table values before rounding, reducing precision while prioritizing perceptual quality.[19] For luminance (brightness) components, the baseline JPEG matrix has its smallest values in the upper-left corner, corresponding to the low-frequency coefficients that are most visible to the human eye, so those coefficients receive the finest quantization steps.[19] A standard example luminance matrix, recommended in the JPEG specification for typical viewing conditions, is shown below:

| 16 | 11 | 10 | 16 | 24 | 40 | 51 | 61 |
|---|---|---|---|---|---|---|---|
| 12 | 12 | 14 | 19 | 26 | 58 | 60 | 55 |
| 14 | 13 | 16 | 24 | 40 | 57 | 69 | 56 |
| 14 | 17 | 22 | 29 | 51 | 87 | 80 | 62 |
| 18 | 22 | 37 | 56 | 68 | 109 | 103 | 77 |
| 24 | 35 | 55 | 64 | 81 | 104 | 113 | 92 |
| 49 | 64 | 78 | 87 | 103 | 121 | 120 | 101 |
| 72 | 92 | 95 | 98 | 112 | 100 | 103 | 99 |
The corresponding chrominance quantization table from the JPEG specification uses larger step values overall, reflecting the eye's lower sensitivity to fine color detail:

| 17 | 18 | 24 | 47 | 99 | 99 | 99 | 99 |
|---|---|---|---|---|---|---|---|
| 18 | 21 | 26 | 66 | 99 | 99 | 99 | 99 |
| 24 | 26 | 56 | 99 | 99 | 99 | 99 | 99 |
| 47 | 66 | 99 | 99 | 99 | 99 | 99 | 99 |
| 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
| 99 | 99 | 99 | 99 | 99 | 99 | 99 | 99 |
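A small sketch (NumPy assumed) comparing how the two tables quantize the same row of DCT coefficients, using the divide-and-round rule from the previous subsection; the coefficient values are placeholders:

```python
import numpy as np

# First rows of the baseline luminance and chrominance tables shown above.
LUMA_ROW = np.array([16, 11, 10, 16, 24, 40, 51, 61])
CHROMA_ROW = np.array([17, 18, 24, 47, 99, 99, 99, 99])

def quantize_row(coeffs, steps):
    """Divide each coefficient by its quantization step and round to the nearest integer."""
    return np.rint(coeffs / steps).astype(int)

# Hypothetical first row of DCT coefficients for corresponding luma and chroma blocks.
coeffs = np.array([-415.0, -30.0, -61.0, 27.0, 56.0, -20.0, -2.0, 0.0])

print(quantize_row(coeffs, LUMA_ROW))    # finer luminance steps preserve more precision
print(quantize_row(coeffs, CHROMA_ROW))  # coarser chrominance steps give smaller quantized values
```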
