Depth map
from Wikipedia

In 3D computer graphics and computer vision, a depth map is an image or image channel that contains information relating to the distance of the surfaces of scene objects from a viewpoint. The term is related (and may be analogous) to depth buffer, Z-buffer, Z-buffering, and Z-depth.[1] The "Z" in these latter terms relates to a convention that the central axis of view of a camera is in the direction of the camera's Z axis, and not to the absolute Z axis of a scene.

Examples


Two different depth maps can be seen here, together with the original model from which they are derived. The first depth map shows luminance in proportion to the distance from the camera: nearer surfaces are darker and further surfaces are lighter. The second depth map shows luminance in relation to the distance from a nominal focal plane: surfaces closer to the focal plane are darker, and surfaces further from the focal plane (whether nearer to or further from the viewpoint) are lighter.[citation needed]
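The two encodings described above can be written down directly from a depth image; in the sketch below, z_min, z_max, z_focus and z_range are illustrative normalization parameters, not values taken from the example renders.

```python
import numpy as np

def camera_distance_map(depth, z_min, z_max):
    """Luminance proportional to distance from the camera: nearer surfaces darker."""
    return np.clip((depth - z_min) / (z_max - z_min), 0.0, 1.0)

def focal_plane_distance_map(depth, z_focus, z_range):
    """Luminance proportional to distance from a nominal focal plane: surfaces on
    the plane are dark; surfaces far from it, in front of or behind it, are light."""
    return np.clip(np.abs(depth - z_focus) / z_range, 0.0, 1.0)
```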

Uses

[Figure: Fog effect]
[Figure: Shallow depth of field effect]

Depth maps have a number of uses, including:

  • Simulating the effect of uniformly dense semi-transparent media within a scene - such as fog, smoke or large volumes of water (a minimal fog-compositing sketch follows this list).
  • Simulating shallow depths of field - where some parts of a scene appear to be out of focus. Depth maps can be used to selectively blur an image to varying degrees. A shallow depth of field can be a characteristic of macro photography and so the technique may form a part of the process of miniature faking.
  • Z-buffering and z-culling, techniques which can be used to make the rendering of 3D scenes more efficient. They can be used to identify objects hidden from view and which may therefore be ignored for some rendering purposes. This is particularly important in real-time applications such as computer games, where a fast succession of completed renders must be available in time to be displayed at a regular and fixed rate.
  • Shadow mapping - part of one process used to create shadows cast by illumination in 3D computer graphics. In this use, the depth maps are calculated from the perspective of the lights, not the viewer.[2]
  • To provide the distance information needed to generate autostereograms, and for other related applications intended to create the illusion of 3D viewing through stereoscopy.
  • Subsurface scattering - can be used as part of a process for adding realism by simulating the semi-transparent properties of translucent materials such as human skin.
  • In computer vision, depth maps derived from single-view or multi-view images, or other types of images, are used to model or reconstruct 3D shapes.[3] Depth maps can be generated by 3D scanners[4] or reconstructed from multiple images.[5]
  • In machine vision and computer vision, to allow 3D images to be processed by 2D image tools.
  • Generating and reconstructing 3D shapes from single-view or multi-view depth maps or silhouettes.[3]
  • Making depth image datasets.[6]
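As a minimal sketch of the first use above, depth-driven fog can be approximated by blending each pixel toward a fog color using an exponential falloff in depth; the density value, fog color and image shapes below are illustrative assumptions rather than anything specified in the article.

```python
import numpy as np

def apply_fog(rgb, depth_m, fog_color=(0.7, 0.8, 0.9), density=0.15):
    """Blend each pixel toward a fog color based on its depth.

    Uses the common exponential model fog_factor = 1 - exp(-density * depth),
    so nearer surfaces keep more of their original color and distant surfaces
    fade toward the fog color.
    """
    fog_factor = 1.0 - np.exp(-density * depth_m)              # (H, W), values in [0, 1)
    fog = np.asarray(fog_color, dtype=np.float64)              # (3,)
    return rgb * (1.0 - fog_factor[..., None]) + fog * fog_factor[..., None]

# Synthetic example: a gray image whose depth increases from left to right.
rgb = np.full((240, 320, 3), 0.5)
depth = np.tile(np.linspace(1.0, 30.0, 320), (240, 1))
foggy = apply_fog(rgb, depth)
```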

Limitations

  • Single channel depth maps record the first surface seen, and so cannot display information about those surfaces seen or refracted through transparent objects, or reflected in mirrors. This can limit their use in accurately simulating depth of field or fog effects.
  • Single channel depth maps cannot convey multiple distances where they occur within the view of a single pixel. This may occur where more than one object occupies the location of that pixel. This could be the case - for example - with models featuring hair, fur or grass. More generally, edges of objects may be ambiguously described where they partially cover a pixel.
  • Depending on the intended use of a depth map, it may be useful or necessary to encode the map at higher bit depths. For example, an 8-bit depth map can only represent a range of up to 256 different distances (see the quantization sketch after this list).
  • Depending on how they are generated, depth maps may represent the perpendicular distance between an object and the plane of the scene camera. For example, a scene camera pointing directly at - and perpendicular to - a flat surface may record a uniform distance for the whole surface. In this case, geometrically, the actual distances from the camera to the areas of the plane surface seen in the corners of the image are greater than the distances to the central area. For many applications, however, this discrepancy is not a significant issue.
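As a rough illustration of the bit-depth limitation noted above, the sketch below compares the depth step size of 8-bit and 16-bit linear encodings over an assumed 10 m working range; the range is an illustrative assumption.

```python
# Linear quantization step for a depth map covering an assumed 0-10 m range.
max_range_m = 10.0
for bits in (8, 16):
    levels = 2 ** bits
    step_mm = max_range_m / levels * 1000.0
    print(f"{bits}-bit: {levels} levels, ~{step_mm:.1f} mm per step")
# Prints roughly: 8-bit -> 256 levels, ~39.1 mm per step; 16-bit -> 65536 levels, ~0.2 mm per step.
```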

from Grokipedia
A depth map, also known as a range image, is a two-dimensional image in which the intensity value of each pixel encodes the perpendicular distance from a reference viewpoint, typically a camera, to the corresponding surface point in a three-dimensional scene, thereby providing an explicit representation of the scene's geometry. This structure contrasts with traditional intensity images, which capture only color or brightness, and enables direct quantification of spatial layout without full volumetric modeling.

Depth maps are acquired through diverse methods in computer vision and graphics. In stereo vision, they are generated by computing the disparity, the horizontal pixel shift between corresponding points in image pairs captured by cameras separated by a known baseline, with depth being inversely proportional to this disparity and dependent on the cameras' focal length. Direct range imaging techniques, such as time-of-flight sensors that measure the round-trip time of emitted light pulses, or structured light systems that project patterns and analyze their distortions via triangulation, produce dense depth maps with high accuracy for nearby objects. Additionally, monocular depth estimation leverages models trained on image-depth pairs to infer depth from a single image using cues such as texture gradients and global scene context, often modeled via multiscale Markov random fields for improved precision.

These representations are pivotal in numerous applications, underpinning 3D scene reconstruction from sparse or incomplete data, robotic navigation for obstacle avoidance, and augmented reality systems for occlusive interactions between virtual and real elements. In advanced contexts, depth maps facilitate simultaneous localization and mapping (SLAM), semantic segmentation, and free-viewpoint rendering, enhancing spatial understanding across fields such as autonomous driving and robotics.
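The stereo relationship summarized above (depth inversely proportional to disparity, scaled by the baseline and focal length) can be sketched as follows for a rectified camera pair; the function and parameter names are illustrative, and the focal length is assumed to be expressed in pixels.

```python
import numpy as np

def depth_from_disparity(disparity, focal_px, baseline_m, eps=1e-6):
    """Convert a disparity map (pixels) to a depth map (meters).

    Assumes a rectified stereo pair: Z = f * B / d, where f is the focal length
    in pixels, B the baseline in meters, and d the disparity. Pixels with
    (near-)zero disparity are marked invalid (NaN).
    """
    disparity = np.asarray(disparity, dtype=np.float64)
    depth = np.full_like(disparity, np.nan)
    valid = disparity > eps
    depth[valid] = focal_px * baseline_m / disparity[valid]
    return depth

# Example: a 64-pixel disparity with f = 700 px and B = 0.12 m gives about 1.31 m.
print(depth_from_disparity(np.array([[64.0]]), focal_px=700.0, baseline_m=0.12))
```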

Fundamentals

Definition

A depth map is a two-dimensional image or image channel in which each pixel's intensity or numerical value encodes the distance (depth) from a viewpoint, typically a camera, to the corresponding point on the surface of objects in the scene. This representation captures the geometric structure of a 3D scene in a compact, per-pixel format, often visualized as a grayscale image in which brighter intensities indicate closer surfaces and darker ones indicate surfaces farther away. Unlike color or intensity maps, which record visual properties such as RGB values or luminance, a depth map focuses exclusively on spatial depth information, enabling the reconstruction of 3D structure from 2D projections.

In computer graphics, depth maps are fundamentally related to the Z-buffer (or depth buffer), a buffer that stores the Z-coordinate (depth) for each pixel during rendering to resolve occlusions and hidden surfaces by comparing the depths of overlapping fragments. The Z-buffer algorithm, originally proposed for efficient hidden surface removal, produces a depth map as its output, where the final depth value at each pixel represents the closest surface to the viewpoint along the ray through that pixel.

Depth is defined as the perpendicular distance from the viewpoint to the scene surface or, in camera coordinates, the Z-component along the optical axis, providing a direct measure of distance relative to the imaging plane. In the camera coordinate system, with the viewpoint at the origin, the depth Z(x, y) is the Z-component Z_c of the 3D point in camera coordinates, representing the distance along the optical axis from the camera to the point. The full Euclidean distance from the viewpoint to the point is \sqrt{X_c^2 + Y_c^2 + Z_c^2}, but depth maps commonly store Z_c for perspective-correct rendering and reconstruction. This formulation underpins the utility of depth maps in representing 3D space, although under perspective projection depth is often stored inversely and varies nonlinearly with distance.

Depth maps differ from related concepts such as disparity maps, which instead encode the horizontal offset between corresponding points in a stereo image pair and are inversely related to actual depth via the camera baseline and focal length (i.e., Z \propto 1 / \text{disparity}). While disparity maps facilitate stereo-based depth inference, depth maps provide absolute metric distances directly. Depth maps are commonly generated by sensors or algorithms, serving as a foundational representation in 3D processing.
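A small sketch of the distinction drawn above between the stored Z-depth and the full Euclidean distance, under an assumed pinhole model with normalized image coordinates; the function name is illustrative.

```python
import numpy as np

def z_depth_to_range(z, x_norm, y_norm):
    """Convert perpendicular Z-depth to Euclidean viewpoint-to-point distance.

    x_norm, y_norm are normalized image coordinates, (u - c_x) / f_x and
    (v - c_y) / f_y, so the back-projected point is (x_norm * z, y_norm * z, z)
    and the range is sqrt(X_c^2 + Y_c^2 + Z_c^2) = z * sqrt(x_norm^2 + y_norm^2 + 1).
    """
    return z * np.sqrt(x_norm**2 + y_norm**2 + 1.0)

# A surface point 2 m away along the optical axis but imaged near a corner
# (x_norm = y_norm = 0.5) is about 2.45 m from the camera center.
print(z_depth_to_range(2.0, 0.5, 0.5))
```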

Historical Development

The concept of depth maps emerged in the early 1970s within computer graphics, primarily as a solution to the hidden surface removal problem in rendering three-dimensional scenes. A foundational contribution was the 1972 algorithm by Martin E. Newell, Robert G. Newell, and Tomás L. Sancha, which addressed visibility by sorting polygons based on depth priorities to determine which surfaces were occluded. This work laid the groundwork for depth-based rendering techniques. Building on this, Edwin Catmull introduced the Z-buffer algorithm in his 1974 PhD thesis at the University of Utah, where a per-pixel depth value is stored in a buffer to resolve visibility during rasterization, enabling efficient hidden surface elimination without explicit sorting. Over the following years, depth maps became integral to offline rendering software, supporting anti-aliased hidden surface algorithms and curved surface subdivision.

The transition to digital depth maps accelerated with their application in CGI, shifting from analog depth cues such as optical mattes to precise digital integration. Depth maps also evolved toward structured light methods, with early systems projecting patterns for industrial applications, as detailed in works on active stereo vision. Concurrently, in computer vision, depth maps emerged through stereo vision techniques, with seminal work on computational stereo matching by David Marr and Tomaso Poggio in 1979 enabling depth estimation from image pairs.

The 1990s saw widespread adoption of depth maps for real-time rendering, particularly in video games, as hardware capabilities improved. The Nintendo 64 console, released in 1996, incorporated a Z-buffer in its Reality Co-Processor, allowing developers to handle complex 3D scenes with dynamic depth testing, a significant leap from prior polygon-sorting techniques in software renderers. This era marked the maturation of depth maps for interactive applications. In the 2000s, depth maps extended into new application domains, with consumer accessibility boosted by the Kinect sensor in 2010, which employed structured light to generate real-time depth maps for motion tracking and gesture recognition.

Representation and Formats

Data Structures

Depth maps are typically stored as single-channel images, where pixel intensities represent depth values. Common formats include 8-bit or 16-bit unsigned representations for quantized depth, such as CV_8UC1 or CV_16UC1 in OpenCV, with values often scaled in millimeters for devices like the Microsoft Kinect. For higher precision, floating-point arrays such as CV_32FC1 or CV_64FC1 are used, storing depth in meters without quantization loss. These can be saved in image file formats like PNG, which supports 16-bit channels for efficient storage of depth data, often serialized with metadata for camera intrinsics.

In RGB-D formats, depth maps are integrated with color images, either as separate channels in multi-layer files (e.g., OpenEXR) or as paired files (RGB in JPEG/PNG and depth in 16-bit PNG), enabling combined processing for applications such as scene reconstruction. This integration maintains alignment between color and depth pixels, typically assuming the same resolution and viewpoint.

Encoding schemes for depth maps balance precision and dynamic range, with linear scaling mapping depth Z directly to pixel values (e.g., 0-65535 for 0-65 m in 16 bits). Non-linear schemes, such as inverse depth encoding (D = a/Z + b, where a ensures fidelity up to a reference distance Z₀), allocate more bits to nearer objects for improved precision where quantization errors are most perceptible. Quantization in discrete representations, such as 16-bit integers, introduces errors proportional to depth squared under linear encoding, but inverse methods mitigate this, achieving near-lossless compression with PSNR above 32 dB at bitrates of 1-2 bpp.

Storage considerations emphasize memory efficiency, with single-channel depth matrices (e.g., OpenCV's cv::Mat) using less space than multi-channel RGB equivalents: typically 2 bytes per pixel for 16-bit depth versus 3 bytes for 8-bit RGB. Compatibility with standards such as OpenCV's depth matrices or PNG's single-channel mode ensures interoperability, while scaling factors (e.g., 1000 for millimeter-to-meter conversion) handle unit variations without altering the core structure.

Mathematically, a depth map is represented as a 2D array D[i, j], where i and j are pixel row and column indices, and D[i, j] denotes the depth value Z at that position. To convert to 3D points in the camera coordinate frame, the equations are:

X = \frac{x \cdot Z}{f}, \quad Y = \frac{y \cdot Z}{f}

where (x, y) are normalized image coordinates (pixel offsets from the principal point), Z = D[i, j] is the depth, and f is the focal length. This projection assumes a pinhole camera model, with full intrinsics extending to X = (u - c_x) \cdot Z / f_x, \quad Y = (v - c_y) \cdot Z / f_y.
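A minimal back-projection sketch following the equations above, assuming a 16-bit depth image in millimeters (as produced by Kinect-style sensors) and known pinhole intrinsics f_x, f_y, c_x, c_y; the intrinsic values and synthetic depth image below are illustrative.

```python
import numpy as np

def depth_map_to_points(depth_u16, fx, fy, cx, cy, depth_scale=0.001):
    """Back-project a 16-bit depth map (millimeters) into camera-frame 3D points.

    Implements X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy, with Z = depth * scale.
    Returns an (N, 3) array of points for pixels with valid (non-zero) depth.
    """
    h, w = depth_u16.shape
    z = depth_u16.astype(np.float64) * depth_scale            # millimeters -> meters
    u, v = np.meshgrid(np.arange(w), np.arange(h))             # pixel coordinates
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[z.reshape(-1) > 0]                           # drop invalid pixels

# Illustrative intrinsics roughly in the range of consumer RGB-D sensors.
depth = (np.random.rand(480, 640) * 4000).astype(np.uint16)    # synthetic depth, 0-4 m
pts = depth_map_to_points(depth, fx=570.0, fy=570.0, cx=319.5, cy=239.5)
print(pts.shape)
```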

Visualization Methods

Depth maps are commonly rendered using pseudocolor techniques, where depth values are assigned colors via colormaps to emphasize gradients and facilitate human interpretation. Popular colormaps include jet, which transitions through a rainbow spectrum, and viridis, a perceptually uniform option that maintains consistent luminance changes across blue, green and yellow hues for better accessibility in scientific visualization. These approaches enhance the visibility of depth variations in scalar fields such as depth data.

Additional rendering methods include wireframe overlays, which outline the structural contours derived from depth edges to provide a skeletal view of the 3D geometry, and anaglyph stereo views generated by warping image pixels according to disparities computed from the depth map, enabling red-cyan glasses-based 3D perception. In pseudocolor examples, nearer objects are often depicted as bright or white regions while distant ones appear dark or black, creating an intuitive inverse depth representation; disparity heatmaps similarly use color gradients to illustrate horizontal pixel shifts in stereo pairs, correlating directly to depth cues.

Tools such as MATLAB support depth rendering through functions like pcfromdepth, which converts depth images to point clouds using camera intrinsics and enables visualization via pcshow for interactive 3D displays. Blender facilitates depth-derived point cloud generation and rendering, allowing users to import, manipulate and visualize large datasets in immersive 3D environments.

Interpreting visualized depth maps involves challenges such as occlusions, where foreground objects obscure background depths, resulting in gaps or artifacts in the output, and noise from sensor limitations, which introduces speckles that obscure fine details. To represent the data in 3D space, point clouds are generated from depth maps using the camera intrinsics matrix K, with the 3D point \mathbf{P} at pixel coordinates (u, v) and depth Z computed as:

\mathbf{P} = Z \cdot K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
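A minimal sketch of the pseudocolor rendering described above, using matplotlib's viridis colormap; the masking of non-positive depths and the synthetic ramp image are assumptions for the example.

```python
import numpy as np
import matplotlib.pyplot as plt

def show_depth(depth_m, cmap="viridis"):
    """Render a depth map as a pseudocolor image.

    Invalid pixels (depth <= 0, common with sensor dropouts) are masked so the
    colormap range is set only from valid depths; a colorbar gives metric scale.
    """
    masked = np.ma.masked_less_equal(depth_m, 0.0)
    plt.imshow(masked, cmap=cmap)
    plt.colorbar(label="depth (m)")
    plt.title("Depth map (pseudocolor)")
    plt.axis("off")
    plt.show()

# Synthetic example: depths ramping from 0.5 m to 5 m, with a simulated sensor dropout.
depth = np.linspace(0.5, 5.0, 640)[None, :].repeat(480, axis=0)
depth[200:280, 300:380] = 0.0
show_depth(depth)
```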