Ray casting

from Wikipedia
Ray-cast image of idealized universal joint with shadow

Ray casting is the methodological basis for 3D CAD/CAM solid modeling and image rendering. It is essentially the same as ray tracing for computer graphics where virtual light rays are "cast" or "traced" on their path from the focal point of a camera through each pixel in the camera sensor to determine what is visible along the ray in the 3D scene.

The term "Ray Casting" was introduced by Scott Roth while at the General Motors Research Labs from 1978–1980. His paper, "Ray Casting for Modeling Solids",[1] describes modeling solid objects by combining primitive solids, such as blocks and cylinders, using the set operators union (+), intersection (&), and difference (−). The general idea of using these binary operators for solid modeling is largely due to Voelcker and Requicha's geometric modelling group at the University of Rochester.[2][3] See solid modeling for a broad overview of solid modeling methods.

Before ray casting (and ray tracing), computer graphics algorithms projected surfaces or edges (e.g., lines) from the 3D world to the image plane, where visibility logic had to be applied. The world-to-image plane projection is a 3D homogeneous coordinate system transformation, also known as 3D projection, affine transformation, or projective transform (homography). Rendering an image this way is difficult to achieve with hidden surface/edge removal. In addition, silhouettes of curved surfaces have to be explicitly solved for, whereas they are an implicit by-product of ray casting, so there is no need to solve for them explicitly whenever the view changes.

Ray casting greatly simplified image rendering of 3D objects and scenes because a line transforms to a line. So, instead of projecting curved edges and surfaces in the 3D scene to the 2D image plane, transformed lines (rays) are intersected with the objects in the scene. A homogeneous coordinate transformation is represented by a 4×4 matrix. The mathematical technique is common to computer graphics and geometric modeling.[4] A transform includes rotations around the three axes, independent scaling along the axes, translations in 3D, and even skewing. Transforms are easily concatenated via matrix arithmetic. For use with a 4×4 matrix, a point is represented by [X, Y, Z, 1], and a direction vector is represented by [Dx, Dy, Dz, 0]. (The fourth term is for translation, which does not apply to direction vectors.)

Concept

Demonstration of a ray casting sweep through a video game level

Ray casting is the most basic of many computer graphics rendering algorithms that use the geometric algorithm of ray tracing. Ray tracing-based rendering algorithms operate in image order to render three-dimensional scenes to two-dimensional images. Geometric rays are traced from the eye of the observer to sample the light (radiance) travelling toward the observer from the ray direction. The speed and simplicity of ray casting comes from computing the color of the light without recursively tracing additional rays that sample the radiance incident on the point that the ray hit. This eliminates the possibility of accurately rendering reflections, refractions, or the natural falloff of shadows; however all of these elements can be faked to a degree, by creative use of texture maps or other methods. The high speed of calculation made ray casting a handy rendering method in early real-time 3D video games.

The idea behind ray casting is to trace rays from the eye, one per pixel, and find the closest object blocking the path of that ray—think of an image as a screen-door, with each square in the screen being a pixel. This is then the object the eye sees through that pixel. Using the material properties and the effect of the lights in the scene, this algorithm can determine the shading of this object. The simplifying assumption is made that if a surface faces a light, the light will reach that surface and not be blocked or in shadow. The shading of the surface is computed using traditional 3D computer graphics shading models. One important advantage ray casting offered over older scanline algorithms was its ability to easily deal with non-planar surfaces and solids, such as cones and spheres. If a mathematical surface can be intersected by a ray, it can be rendered using ray casting. Elaborate objects can be created by using solid modelling techniques and easily rendered.

From the abstract for the paper "Ray Casting for Modeling Solids":[5]

To visualize and analyze the composite solids modeled, virtual light rays are cast as probes. By virtue of its simplicity, ray casting is reliable and extensible. The most difficult mathematical problem is finding line-surface intersection points. So, surfaces as planes, quadrics, tori, and probably even parametric surface patches may bound the primitive solids. The adequacy and efficiency of ray casting are issues addressed here. A fast picture generation capability for interactive modeling is the biggest challenge.

Camera models

Light rays and the camera geometry form the basis for all geometric reasoning here. This figure shows a pinhole camera model for perspective effect in image processing and a parallel camera model for mass analysis. The simple pinhole camera model consists of a focal point (or eye point) and a square pixel array (or screen). Straight light rays pass through the pixel array to connect the focal point with the scene, one ray per pixel. To shade pictures, the rays’ intensities are measured and stored as pixels. The reflecting surface responsible for a pixel’s value intersects the pixel’s ray.

When the focal length, the distance between the focal point and the screen, is infinite, the view is called “parallel” because all light rays are then parallel to each other, perpendicular to the screen. Although the perspective view is natural for making pictures, some applications need rays that can be uniformly distributed in space.

Concept model


For modeling convenience, a typical standard coordinate system for the camera has the screen in the X–Y plane, the scene in the +Z half space, and the focal point on the −Z axis.

Camera local coordinate system with the "screen" in the Z=0 plane

A ray is simply a straight line in the 3D space of the camera model. It is best defined in parameterized form as a point vector (X0, Y0, Z0) and a direction vector (Dx, Dy, Dz). In this form, points on the line are ordered and accessed via a single parameter t. For every value of t, a corresponding point (X, Y, Z) on the line is defined:

X = X0 + t · Dx
Y = Y0 + t · Dy
Z = Z0 + t · Dz

If the vector is normalized, then the parameter t is distance along the line. The vector can be normalized easily with the following computation:

Dist = √(Dx² + Dy² + Dz²)
D'x = Dx / Dist
D'y = Dy / Dist
D'z = Dz / Dist
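These formulas can be sketched in Python (function names are illustrative):

```python
import math

def normalize(dx, dy, dz):
    """Scale a direction vector to unit length so that t measures distance."""
    dist = math.sqrt(dx * dx + dy * dy + dz * dz)
    return (dx / dist, dy / dist, dz / dist)

def point_on_ray(x0, y0, z0, dx, dy, dz, t):
    """Point (X, Y, Z) on the ray at parameter t."""
    return (x0 + t * dx, y0 + t * dy, z0 + t * dz)
```

With a normalized direction, `point_on_ray(..., t)` is exactly `t` units from the ray origin.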

Given geometric definitions of the objects, each bounded by one or more surfaces, the result of computing one ray’s intersection with all bounded surfaces in the scene is defined by two arrays:

Ray parameters:   t[1], t[2], …, t[n]
Surface pointers: S[1], S[2], …, S[n]

Where n is the number of ray-surface intersections. The ordered list of ray parameters t[i] denotes the enter–exit points: the ray enters a solid at t[1], exits at t[2], enters another solid at t[3], and so on. Point t[1] is closest to the camera and t[n] is furthest.

In association with the ray parameters, the surface pointers contain a unique address for the intersected surface’s information. The surface can have various properties such as color, specularity, transparency with/without refraction, translucency, etc. The solid associated with the surface may have its own physical properties such as density. This could be useful, for instance, when an object consists of an assembly of different materials and the overall center of mass and moments of inertia are of interest.

Applications


Three algorithms that use ray casting make line drawings, make shaded pictures, and compute volumes and other physical properties. Each algorithm, given a camera model, casts one ray per pixel in the screen. For computing volume, the resolution of the pixel screen to use depends on the desired accuracy of the solution. For line drawings and picture shading, the resolution determines the quality of the image.

Line drawings

Example line drawings made by casting rays. Two are standard plan views. One shows hidden edges as dashed lines.

To draw the visible edges of a solid, generate one ray per pixel moving top-down, left-right in the screen. Evaluate each ray in order to identify the visible surface S[1], the first surface pointer in the sorted list of ray-surface intersections. If the visible surface at pixel location (X, Y) is different than the visible surface at pixel (X−1, Y), then display a vertical line one pixel long centered at (X−½, Y). Similarly, if the visible surface at (X, Y) is different than the visible surface at pixel (X, Y−1), then display a horizontal line one pixel long centered at (X, Y−½). The resulting drawing will consist of horizontal and vertical edges only, looking jagged in coarse resolutions.
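This neighbor-comparison rule can be sketched in Python (`surface_id` is a hypothetical lookup returning the visible surface at a pixel):

```python
def edge_segments(surface_id, width, height):
    """Compare each pixel's visible surface with its left and upper
    neighbors; emit one-pixel-long edge segments where they differ."""
    vertical, horizontal = [], []
    for y in range(height):
        for x in range(width):
            if x > 0 and surface_id(x, y) != surface_id(x - 1, y):
                vertical.append((x - 0.5, y))    # centered at (X-1/2, Y)
            if y > 0 and surface_id(x, y) != surface_id(x, y - 1):
                horizontal.append((x, y - 0.5))  # centered at (X, Y-1/2)
    return vertical, horizontal
```

Because only differing neighbors emit segments, the output consists solely of the horizontal and vertical edges described above.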

Roth’s ray casting system generated the images of solid objects on the right. Box enclosures, dynamic bounding, and coherence were used for optimization. For each picture, the screen was sampled with a density of about 100×100 (i.e., 10,000) rays and new edges were located via binary searches. Then all edges were followed by casting additional rays at one-pixel increments on the two sides of the edges. Each picture was drawn on a Tektronix tube at 780×780 resolution.

Shaded pictures


To make a shaded picture, again cast one ray per pixel in the screen. This time, however, use the visible surface pointer S[1] at each pixel to access the description of the surface. From this, compute the surface normal at the visible point t[1]. The pixel’s value, the displayable light intensity, is proportional to the cosine of the angle formed by the surface normal and the surface-to-light-source vector. Processing all pixels this way produces a raster-type picture of the scene.
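The cosine shading rule can be sketched as follows (vectors assumed unit length; names illustrative):

```python
def diffuse_intensity(normal, to_light):
    """Pixel intensity proportional to the cosine of the angle between
    the unit surface normal and the unit surface-to-light vector."""
    cos_angle = sum(n * l for n, l in zip(normal, to_light))
    return max(0.0, cos_angle)   # surfaces facing away receive no light
```

A surface squarely facing the light gets full intensity; one edge-on or facing away gets none.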

Computing volume and moments of inertia


The volume (and similar properties) of a solid bounded by curved surfaces is easily computed by the “approximating sums” integration method, approximating the solid with a set of rectangular parallelepipeds. This is accomplished by taking an “in-depth” picture of the solid in a parallel view. Casting rays through the screen into the solid partitions the solid into volume elements. Two dimensions of the parallelepipeds are constant, defined by the 2D spacing of rays in the screen. The third dimension is variable, defined by the enter–exit points computed. Specifically, if the horizontal and vertical distance between rays in the screen is S, then the volume “detected” by each ray is:

S × S × (t[2]−t[1] + t[4]−t[3] + … + t[n]−t[n−1]) / L

where L is defined as the length of the direction vector. (If already normalized, this is equal to 1.)

L = √(Dx² + Dy² + Dz²)

Each (t[i]−t[i−1])/L is the length of a ray segment inside the solid.
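A sketch of this per-ray volume contribution, assuming a sorted, even-length list of enter–exit parameters (0-based pairs) and a normalized direction so L = 1 by default:

```python
def ray_volume(t_values, spacing, length=1.0):
    """Volume detected by one parallel ray: S * S * the sum of the
    in-solid segment lengths (t[2]-t[1] + t[4]-t[3] + ...) / L."""
    inside = sum(t_values[i + 1] - t_values[i]
                 for i in range(0, len(t_values), 2))
    return spacing * spacing * inside / length
```

Summing this over every ray in the parallel view approximates the total volume of the solid.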

This figure shows the parallelepipeds for a modeled solid using ray casting, a use of the parallel-projection camera model.

Solid modeled by parallelepipeds

In–out ray classification

Ray in binary solid construction

This figure shows an example of the binary operators in a composition tree using “+” and “−” where a single ray is evaluated.

The ray casting procedure starts at the top of the solid composition tree, recursively descends to the bottom, classifies the ray with respect to the primitive solids, and then returns up the tree combining the classifications of the left and right subtrees.

This figure illustrates the combining of the left and right classifications for all three binary operators.

The three binary operations: union (+), intersection (&), and difference (−)
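The combining step for the three operators can be sketched as a per-point rule (True = in, False = out; a simplification of the interval classification described above):

```python
def combine(op, left_in, right_in):
    """Combine the in/out classifications of the left and right
    sub-solids at one point along the ray."""
    if op == '+':                      # union: in either solid
        return left_in or right_in
    if op == '&':                      # intersection: in both solids
        return left_in and right_in
    if op == '-':                      # difference: in left but not right
        return left_in and not right_in
    raise ValueError(op)
```

Applying this rule over the enter–exit intervals of both subtrees yields the composite solid's classification for the whole ray.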

Realistic shaded pictures


Ray casting is a natural modeling tool for making shaded pictures. The grayscale ray-casting system developed by Scott Roth and Daniel Bass at GM Research Labs produced pictures on a Ramtek color raster display around 1979. To compose pictures, the system provided the user with the following controls:

  • View
    • Viewing direction and position
    • Focal length: wide-angle perspective to parallel
    • Zoom factor
  • Illumination
    • Number of light sources
    • Locations and intensities of lights
    • Optional shadows
    • Intensities of ambient light and background
  • Surface reflectance
Two point light sources produce shadows

This figure shows a table scene with shadows from two point light sources.

Shading algorithms that implement all of the realistic effects are computationally expensive, but relatively simple. For example, the following figure shows the additional rays that could be cast for a single light source.

Follow up rays for effects

For a single pixel in the image to be rendered, the algorithm casts a ray starting at the focal point and determines that it intersects a semi-transparent rectangle and a shiny circle. An additional ray must then be cast starting at that point in the direction symmetrically opposite the surface normal at the ray-surface intersection point in order to determine what is visible in the mirrored reflection. That ray intersects the triangle, which is opaque. Finally, each ray-surface intersection point is tested to determine if it is in shadow. The “shadow feeler” ray is cast from the ray-surface intersection point to the light source to determine if any other surface blocks that pathway.

Turner Whitted named this process of tracing secondary and additional rays “recursive ray tracing”.[6] [A room of mirrors would be costly to render, so limiting the number of recursions is prudent.] Whitted modeled refraction for transparencies by generating a secondary ray from the visible surface point at an angle determined by the solid’s index of refraction. The secondary ray is then processed as a specular ray. For the refraction formula and pictorial examples, see Whitted’s paper.

Enclosures and efficiency


Ray casting qualifies as a brute force method for solving problems. The minimal algorithm is simple, particularly in consideration of its many applications and ease of use, but applications typically cast many rays. Millions of rays may be cast to render a single frame of an animated film. Computer processing time increases with the resolution of the screen and the number of primitive solids/surfaces in the composition.

Tree of enclosures

By using minimum bounding boxes around the solids in the composition tree, the exhaustive search for a ray-solid intersection resembles an efficient binary search. The brute force algorithm does an exhaustive search because it always visits all the nodes in the tree—transforming the ray into primitives’ local coordinate systems, testing for ray-surface intersections, and combining the classifications—even when the ray clearly misses the solid. In order to detect a “clear miss”, a faster algorithm uses the binary composition tree as a hierarchical representation of the space that the solid composition occupies. But all position, shape, and size information is stored at the leaves of the tree where primitive solids reside. The top and intermediate nodes in the tree only specify combine operators.

Characterizing with enclosures the space that all solids fill gives all nodes in the tree an abstract summary of position and size information. Then, the quick “ray intersects enclosure” tests guide the search in the hierarchy. When the test fails at an intermediate node in the tree, the ray is guaranteed to classify as out of the composite, so recursing down its subtrees to further investigate is unnecessary.
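The quick “ray intersects enclosure” test is commonly implemented as a slab test against an axis-aligned bounding box; a minimal sketch (function name and tuple layout are illustrative):

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: intersect the ray with each axis-aligned pair of
    planes and keep the running overlap of the entry/exit intervals."""
    t_near, t_far = 0.0, float('inf')
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:             # ray parallel to this slab
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:             # intervals no longer overlap
            return False
    return True
```

When this returns False at an intermediate node, the whole subtree can be skipped, which is the pruning the text describes.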

Accurately assessing the cost savings for using enclosures is difficult because it depends on the spatial distribution of the primitives (the complexity distribution) and on the organization of the composition tree. The optimal conditions are:

  • No primitive enclosures overlap in space
  • Composition tree is balanced and organized so that sub-solids near in space are also nearby in the tree

In contrast, the worst condition is:

  • All primitive enclosures mutually overlap

The following are miscellaneous performance improvements described in Roth’s paper on ray casting; considerable further improvements have subsequently been made by others.

Early Outs
If the operator at a composite node in the tree is − or & and the ray classifies as out of the composite’s left sub-solid, then the ray will classify as out of the composite regardless of the ray’s classification with respect to the right sub-solid. So, classifying the ray with respect to the right sub-solid is unnecessary and should be avoided for efficiency.
Transformations
By initially combining the screen-to-scene transform with the primitive’s scene-to-local transform and storing the resulting screen-to-local transforms in the primitive’s data structures, one ray transform per ray-surface intersection is eliminated.
Recursion
Given a deep composition tree, recursion can be expensive in combination with allocating and freeing up memory. Recursion can be simulated using static arrays as stacks.
Dynamic Bounding
If only the visible edges of the solid are to be displayed, the ray casting algorithm can dynamically bound the ray to cut off the search. That is, after finding that a ray intersects a sub-solid, the algorithm can use the intersection point closest to the screen to tighten the depth bound for the “ray intersects box” test. This only works for the + part of the tree, starting at the top. With − and &, nearby “in” parts of the ray may later become “out”.
Coherence
The principle of coherence is that the surfaces visible at two neighboring pixels are more likely to be the same than different. Developers of computer graphics and vision systems have applied this empirical truth for efficiency and performance. For line drawings, the image area containing edges is normally much less than the total image area, so ray casting should be concentrated around the edges and not in the open regions. This can be effectively implemented by sparsely sampling the screen with rays and then locating, when neighboring rays identify different visible surfaces, the edges via binary searches.

Anti-aliasing


The jagged edges caused by aliasing are an undesirable effect of point sampling techniques and are a classic problem with raster display algorithms. Linear or smoothly curved edges will appear jagged and are particularly objectionable in animations because movement of the image makes the edges appear fuzzy or look like little moving escalators. Also, details in the scene smaller than the spacing between rays may be lost. The jagged edges in a line drawing can be smoothed by edge following. The purpose of such an algorithm is to minimize the number of lines needed to draw the picture within one-pixel accuracy. Smooth edges result. The line drawings above were drawn this way.

To smooth the jagged edges in a shaded picture with subpixel accuracy, additional rays should be cast for information about the edges. (See Supersampling for a general approach.) Edges are formed by the intersection of surfaces or by the profile of a curved surface. Applying "Coherence" as described above via binary search, if the visible surface at pixel (X,Y) is different than the visible surface at pixel (X+1,Y), then a ray could be generated midway between them at (X+½,Y) and the visible surface there identified. The distance between sample points could be further subdivided, but the search need not be deep. The primary search depth to smooth jagged edges is a function of the intensity gradient across the edge. The cost for smoothing jagged edges is affordable, since:

  • the area of the image that contains edges is usually a small percentage of the total area; and
  • the extra rays cast in binary searches can be bounded in depth (that of the visible primitives forming the edges).
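The binary search between two samples that see different surfaces can be sketched as follows (`visible_at` is a hypothetical function that casts a ray at a horizontal position and returns the visible surface there):

```python
def locate_edge(x_left, x_right, visible_at, depth=8):
    """Halve the interval between two x-positions that see different
    surfaces until the edge is located to sub-pixel accuracy."""
    left_surface = visible_at(x_left)
    for _ in range(depth):
        mid = (x_left + x_right) / 2.0
        if visible_at(mid) == left_surface:
            x_left = mid               # edge lies in the right half
        else:
            x_right = mid              # edge lies in the left half
    return (x_left + x_right) / 2.0
```

Each iteration halves the uncertainty, so a handful of extra rays locates the edge far more precisely than the pixel grid alone.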

History


For the history of ray casting, see Ray tracing (graphics) as both are essentially the same technique under different names. Scott Roth had invented the term "ray casting" before having heard of "ray tracing". Additionally, Scott Roth's development of ray casting at GM Research Labs occurred concurrently with Turner Whitted's ray tracing work at Bell Labs.

Ray casting in early computer games

Game using ray casting rendering, making use of advanced techniques to render floor at multiple height levels

In early first-person games, ray casting was used to efficiently render a 3D world from a 2D playing field using a simple one-dimensional scan over the horizontal width of the screen.[7] Early first-person shooters used 2D ray casting as a technique to create a 3D effect from a 2D world. While the world appears 3D, the player cannot look up or down, or can do so only at limited angles with shearing distortion.[7][8] This style of rendering eliminates the need to fire a ray for each pixel in the frame, as modern engines do; once the hit point is found, the projection distortion is applied to the surface texture and an entire vertical column is copied from the result into the frame. This style of rendering also imposes limitations on the type of rendering which can be performed: for example, depth sorting can be used, but depth buffering cannot. That is, polygons must be fully in front of or behind one another; they may not partially overlap or intersect.

Wolfenstein 3D


The video game Wolfenstein 3D was built from a square based grid of uniform height walls meeting solid-colored floors and ceilings. In order to draw the world, a single ray was traced for every column of screen pixels and a vertical slice of wall texture was selected and scaled according to where in the world the ray hits a wall and how far it travels before doing so.[9]
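A minimal sketch of this per-column ray march (a hypothetical grid of 0/1 cells; a fixed-step march stands in for the exact cell-boundary stepping the engine uses):

```python
import math

def cast_wall_ray(grid, px, py, angle, step=0.01, max_dist=64.0):
    """March a ray through a square grid until it enters a wall cell
    (value 1); return the travel distance, or None if it exits the
    map. Grid is indexed grid[y][x]."""
    dx, dy = math.cos(angle), math.sin(angle)
    t = 0.0
    while t < max_dist:
        ix, iy = int(px + t * dx), int(py + t * dy)
        if not (0 <= iy < len(grid) and 0 <= ix < len(grid[0])):
            return None
        if grid[iy][ix] == 1:
            return t
        t += step
    return None
```

The returned distance would then typically set the height of the wall slice drawn for that screen column (taller when closer, after fisheye correction).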

The grid-based levels served two purposes: ray–wall collisions can be found more quickly, since the potential hits become more predictable, and memory overhead is reduced. However, encoding wide-open areas takes extra space.

ShadowCaster


The Raven Software game ShadowCaster uses an improved Wolfenstein-based engine with added floors and ceilings texturing and variable wall heights.

Comanche series


The Voxel Space engine developed by NovaLogic for the Comanche games traced a ray through each column of screen pixels and tested each ray against points in a heightmap. Then it transformed each element of the heightmap into a column of pixels, determined which are visible (that is, have not been occluded by pixels that have been drawn in front), and drew them with the corresponding color from the texture map.[10]

Beyond raycasting


Later DOS games like id Software's DOOM kept many of ray casting's 2.5D restrictions for speed but switched to alternative rendering techniques (like BSP trees), making them no longer ray casting engines.[11]

Computational geometry setting


In computational geometry, the ray casting problem is also known as the ray shooting problem and may be stated as the following query problem: given a set of objects in d-dimensional space, preprocess them into a data structure so that for each query ray, the initial object hit by the ray can be found quickly. The problem has been investigated for various settings: space dimension, types of objects, restrictions on query rays, etc.[12] One technique is to use a sparse voxel octree.

from Grokipedia
Ray casting is a rendering technique in computer graphics that generates images by projecting rays from a virtual camera (or eye point) through each pixel of the image plane into a 3D scene, identifying the nearest intersection with scene objects to determine visibility, surface properties, and basic shading effects such as shadows and highlights.[1] This image-order algorithm simplifies the simulation of light propagation compared to more complex methods, focusing primarily on primary rays for direct illumination without recursive secondary rays for phenomena like reflections or refractions.[1] The technique was pioneered by Arthur Appel in 1968 as a method to address the hidden surface problem in solid modeling and to incorporate shading and shadows for more realistic machine-generated images of three-dimensional objects.[2] In Appel's approach, rays are traced from the viewing position through the image to detect occlusions and compute tonal variations based on surface normals, light sources, and material properties, enabling automated production of shaded renderings that mimic photographic depth and texture.[2] This foundational work laid the groundwork for subsequent advancements in ray-based rendering, distinguishing ray casting from scanline methods by its pixel-centric processing that naturally handles complex geometries without preprocessing.[3] Ray casting gained prominence in real-time applications during the early 1990s, notably in id Software's Wolfenstein 3D (1992), where it efficiently rendered pseudo-3D labyrinths from 2D grid maps by casting rays to calculate wall distances and apply texture mapping, achieving interactive frame rates on limited hardware. 
Beyond gaming, it is widely applied in volume rendering for medical imaging, where rays traverse voxel-based datasets (e.g., CT or MRI scans) to composite semitransparent volumes and reveal internal structures like organs or tumors with high fidelity.[4] Modern implementations leverage hardware acceleration, such as GPUs, to extend ray casting for interactive visualization in fields including scientific simulation and virtual reality, while serving as a core component in hybrid rendering pipelines.[5]

Fundamentals

Definition and Principles

Ray casting is a rendering technique in computer graphics that simulates the process of viewing three-dimensional scenes by projecting rays from an observer's viewpoint through each pixel of the image plane to identify visible surfaces. This method determines the color of each pixel by computing the nearest intersection point between the ray and the scene's geometry, effectively modeling how light would reach the eye in a synthetic camera setup. Originally proposed for solid modeling, ray casting enables the visualization of complex solids composed of primitive shapes combined through operations like union, intersection, and difference.[6]

The core principles of ray casting involve generating primary rays that originate from the camera or eye position and extend in directions defined by the pixel coordinates on the image plane. Unlike more advanced ray tracing, ray casting employs primary rays and non-recursive secondary rays (such as shadow rays), omitting the recursive rays needed for effects like reflections or refractions, and focuses on resolving visibility and basic shading including direct shadows. This approach decouples the computation for each pixel, allowing independent processing that contrasts with scanline rendering, which processes the image row by row and requires maintaining active edge lists for efficiency. Ray casting's ray-based probing simplifies handling non-planar surfaces and arbitrary solid complexities compared to scanline methods' reliance on planar projections. Basic shading may involve casting additional non-recursive shadow rays from the intersection point to each light source to determine if the point is occluded, enabling direct shadows without recursion.[6]

Key advantages of ray casting include its conceptual simplicity, which facilitates implementation and extensibility for interactive applications, and its computational efficiency for real-time rendering in scenarios with moderate scene complexity. These attributes stem from the direct per-pixel computation and avoidance of global scene preprocessing, making it suitable for early CAD/CAM systems and basic 3D visualizations. However, limitations arise from the absence of recursive secondary rays, preventing the simulation of global illumination effects like indirect shadows or interreflections from multiple light bounces.[6][2]

An illustrative example of the basic ray casting algorithm can be expressed in pseudocode as follows (note: full shading may include shadow ray tests):
for each pixel in the image plane:
    ray = generate_ray_from_camera(pixel_coordinates)
    closest_intersection = None
    min_distance = infinity
    for each object in the scene:
        intersection = intersect(ray, object)
        if intersection and intersection.distance < min_distance:
            min_distance = intersection.distance
            closest_intersection = intersection
    if closest_intersection:
        pixel_color = shade(closest_intersection)  # May involve shadow rays to lights
    else:
        pixel_color = background_color
This loop initializes a ray for each pixel, tests it against scene objects to find the nearest hit, and assigns a color based on local shading, demonstrating the technique's straightforward iteration over pixels and geometry.[7]
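A minimal runnable version of this loop for a single pixel, using a hypothetical Sphere primitive (direction vectors are assumed unit length, so a = 1 in the intersection quadratic; shading is reduced to returning the object's flat color):

```python
import math

class Sphere:
    """Minimal scene object for the loop above (illustrative helper)."""
    def __init__(self, center, radius, color):
        self.center, self.radius, self.color = center, radius, color

    def intersect(self, origin, direction):
        """Smallest positive t along origin + t*direction, or None."""
        oc = [o - c for o, c in zip(origin, self.center)]
        b = 2.0 * sum(o * d for o, d in zip(oc, direction))
        c = sum(o * o for o in oc) - self.radius ** 2
        disc = b * b - 4.0 * c          # direction assumed unit length (a = 1)
        if disc < 0:
            return None
        t = (-b - math.sqrt(disc)) / 2.0
        return t if t > 0 else None

def render_pixel(origin, direction, scene, background=(0, 0, 0)):
    """One iteration of the per-pixel loop: the nearest hit wins."""
    nearest_t, color = math.inf, background
    for obj in scene:
        t = obj.intersect(origin, direction)
        if t is not None and t < nearest_t:
            nearest_t, color = t, obj.color
    return color
```

Running `render_pixel` once per pixel, with a ray generated from the camera for each, reproduces the full image-order loop.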

Mathematical Model

In ray casting, a ray is mathematically parameterized as a half-line originating from a point P (typically the camera position) in the direction of a normalized vector D, given by the equation

R(t) = P + t·D

where t ≥ 0 is a scalar parameter representing distance along the ray.[8] This formulation allows efficient computation of points along the ray for intersection tests.[9]

To generate rays for rendering an image, the viewport is set up using the camera's field-of-view (FOV) angle θ. For an image plane at distance d from the camera (often d = 1), the ray direction D for a pixel at normalized coordinates (u, v) in the range [−1, 1] is derived as

D = normalize(u · tan(θ/2), v · tan(θ/2), −1)

in camera space, where the −1 in the z-component points toward the scene.[10] This setup ensures rays fan out from the camera through the pixel grid, covering the desired viewing frustum.[9]

Intersection testing determines if and where a ray hits scene primitives. For a plane defined by a point Q on the plane and unit normal N, the intersection parameter t solves the equation N · (R(t) − Q) = 0, yielding

t = N · (Q − P) / (N · D)

and a valid hit occurs if t > 0 and N · D ≠ 0.[8] For a sphere centered at C with radius r, substituting the ray equation into the sphere equation ‖R(t) − C‖² = r² results in the quadratic equation a·t² + b·t + c = 0, where a = D · D, b = 2(P − C) · D, and c = (P − C) · (P − C) − r².

The discriminant Δ = b² − 4ac determines the number of intersections: if Δ > 0, the smallest positive root

t = (−b − √Δ) / (2a)

gives the nearest hit point.[11] Scene traversal in ray casting involves, for each generated ray, testing it against all primitives in the scene to find the closest valid intersection. The algorithm iterates over objects, computing intersection parameters t_i > 0 for each, and selects the minimum t among them to identify the nearest hit; if no such t exists, the ray misses the scene.[12] For efficiency with many objects, basic acceleration like bounding volume hierarchies can prune tests by first checking ray-box intersections along primary rays, but the core logic remains selecting the global minimum t > 0. For shadow rays, a similar traversal is performed from the hit point toward each light, checking for any intersection (occlusion).[9]
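The ray–sphere test above translates directly to code; a sketch of the quadratic solution (function name is illustrative):

```python
import math

def intersect_sphere(P, D, C, r):
    """Solve a*t^2 + b*t + c = 0 for the ray R(t) = P + t*D against
    the sphere |X - C|^2 = r^2; return the nearest t > 0, or None."""
    oc = tuple(p - c for p, c in zip(P, C))
    a = sum(d * d for d in D)
    b = 2.0 * sum(o * d for o, d in zip(oc, D))
    c = sum(o * o for o in oc) - r * r
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None                    # ray misses the sphere
    t = (-b - math.sqrt(disc)) / (2.0 * a)
    return t if t > 0 else None
```

Returning only the smaller root gives the entry point nearest the camera, matching the visibility rule of ray casting.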

Applications

Rendering and Visualization

Ray casting serves as a foundational technique for generating 2D images from 3D scenes by projecting rays from a viewpoint through each pixel of the image plane to determine visible surfaces, enabling efficient visualization without full recursion. This method excels in producing line drawings and basic shaded representations, particularly in resource-constrained environments like early computer-aided design (CAD) systems, where it facilitates quick previews of complex geometries. By computing intersections and visibility, ray casting supports the creation of wireframe outlines and simple depth-based visuals, prioritizing conceptual clarity over photorealistic detail.

In line drawings, ray casting is employed for wireframe or silhouette rendering by detecting edges through grazing ray intersections—where rays are tangent to surfaces—or by identifying depth discontinuities between adjacent pixels. These techniques allow the extraction of visible edges while removing hidden ones, essential for outlining object boundaries in technical illustrations. For instance, Arthur Appel's 1968 algorithm pioneered this approach by casting rays to quantify invisibility for line segments, enabling accurate hidden line removal in solid models. This method detects silhouettes as rays that just graze object surfaces, producing clean contours that highlight structural features without surface filling.

For shaded pictures, ray casting applies basic local illumination models after finding the nearest surface intersection, such as Lambertian shading, which models diffuse reflection based on the cosine of the angle between the surface normal $\mathbf{N}$ and light direction $\mathbf{L}$. The intensity is computed as $ I = k_d (\mathbf{N} \cdot \mathbf{L}) $, where $ k_d $ is the diffuse coefficient, providing a simple yet effective way to assign grayscale or color values per pixel for basic tonal variation. This local model, independent of viewer position, yields flat-shaded visuals suitable for quick assessments, as detailed in early shading techniques for machine renderings.

Applications in visualization include hidden line removal for engineering drawings, where ray casting determines edge visibility to produce interpretable 2D projections from 3D CAD models, and early 3D previews in CAD software, allowing designers to inspect assemblies without full rasterization. Ray casting is also widely used in volume rendering for medical imaging, where rays are cast through voxel-based datasets from scans such as CT or MRI to accumulate color and opacity values, compositing semitransparent volumes to visualize internal structures like organs or tumors. This approach enables high-fidelity representation of complex volumetric data for diagnostic and surgical planning purposes.[4]

A specific example is generating depth maps or simple orthographic projections via ray casting, where parallel rays are cast perpendicular to the projection plane to record intersection distances, creating a 2D representation of scene depth for analysis or compositing. This approach, using uniform ray directions, simplifies computation for non-perspective views and supports tasks like occlusion testing in visualization pipelines.
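The Lambertian term $ I = k_d (\mathbf{N} \cdot \mathbf{L}) $ amounts to one dot product per hit point. A minimal sketch, with the usual clamp to zero for lights behind the surface made explicit; `lambert_intensity` is an illustrative name, not taken from any cited system:

```python
def lambert_intensity(normal, light_dir, k_d=1.0):
    """Diffuse (Lambertian) term I = k_d * max(0, N . L), for unit vectors
    N (surface normal) and L (direction from the hit point to the light)."""
    n_dot_l = sum(n * l for n, l in zip(normal, light_dir))
    return k_d * max(0.0, n_dot_l)  # clamp: back-facing light contributes nothing

# Light directly overhead a horizontal surface: full intensity.
print(lambert_intensity((0, 1, 0), (0, 1, 0)))            # 1.0
# Light at 60 degrees from the normal: cos(60 deg) = half intensity.
print(lambert_intensity((0, 1, 0), (0.8660254, 0.5, 0)))  # 0.5
```

Because the result depends only on $\mathbf{N}$ and $\mathbf{L}$, not on the viewer, every ray hitting the same flat face gets the same value, which is exactly the flat-shaded look described above.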

Solid Modeling and Computation

Ray casting serves as a foundational technique in solid modeling for computing geometric properties of complex objects represented through constructive solid geometry (CSG), where primitives are combined via Boolean operations such as union, intersection, and difference. In this context, rays are projected through the model to determine intersection points with boundaries, enabling the classification of space into interior and exterior regions without requiring explicit boundary representations. This approach facilitates efficient evaluation of CSG trees by traversing primitives along each ray and applying set operations to resolve the final occupancy status, a method particularly prominent in CAD/CAM systems during the 1970s and 1980s for verifying and analyzing solid models.[6]

One key application is volume computation, where ray casting uses a grid of parallel rays to estimate the enclosed volume of a solid by summing the lengths of interior segments along each ray, scaled by the projected area per ray: $ V \approx \sum_i l_i \Delta A $, where $ l_i $ is the interior length along the $ i $-th ray and $ \Delta A $ is the cross-sectional area represented by each ray. This method provides an accurate approximation for CSG solids and avoids exhaustive boundary tessellation, making it suitable for irregular or composite shapes. Similarly, mass properties like moments of inertia can be derived from ray representations (ray-reps), which capture interior segments along rays as intervals for numerical integration, supporting engineering analyses such as computation of inertia tensors directly from CSG descriptions.[6]

In CAD/CAM workflows, ray casting enables intersection detection between rays and primitives for Boolean operations, ensuring robust handling of overlaps and differences in solid models. This is critical for tasks like feature recognition and model validation, where rays probe the geometry to identify entry and exit points, facilitating operations on primitives such as blocks and cylinders. The technique extends to simulations in rapid prototyping, where ray casting simulates material deposition paths or verifies printability by detecting self-intersections and support requirements in layered models derived from CSG. By leveraging parallel ray grids, these computations achieve high efficiency, making ray casting integral to early CAD systems for prototyping and manufacturing preparation.[6]
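The estimate $ V \approx \sum_i l_i \Delta A $ can be demonstrated on a shape whose interior chord lengths are known in closed form. The sketch below, using hypothetical names, fires an n-by-n grid of z-parallel rays at a unit sphere and compares the summed chords against the analytic volume $4\pi r^3/3$:

```python
import math

def sphere_chord_length(x, y, r):
    """Length of the interior segment where a z-parallel ray at (x, y)
    passes through a sphere of radius r centered at the origin."""
    d2 = r * r - x * x - y * y
    return 2.0 * math.sqrt(d2) if d2 > 0.0 else 0.0

def volume_by_rays(r, n):
    """Approximate V ~ sum_i l_i * dA using an n x n grid of parallel rays."""
    h = 2.0 * r / n          # grid spacing
    dA = h * h               # cross-sectional area represented by each ray
    total = 0.0
    for i in range(n):
        x = -r + (i + 0.5) * h          # rays at cell centers
        for j in range(n):
            y = -r + (j + 0.5) * h
            total += sphere_chord_length(x, y, r) * dA
    return total

exact = 4.0 / 3.0 * math.pi              # unit-sphere volume
approx = volume_by_rays(1.0, 200)
print(approx, exact)  # the two agree to within a fraction of a percent
```

For a CSG solid the closed-form chord would be replaced by the ray's classified interior intervals, but the summation is identical.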

Techniques

Shading and Image Generation

To achieve realistic shading in ray casting, the Phong illumination model is extended and applied directly at the ray-surface intersection points to compute local lighting effects. This empirical model calculates the outgoing radiance intensity $ I $ as

$ I = I_a k_a + I_d k_d (\mathbf{N} \cdot \mathbf{L}) + I_s k_s (\mathbf{R} \cdot \mathbf{V})^n, $

where $ I_a $, $ I_d $, and $ I_s $ represent the ambient, diffuse, and specular light intensities, respectively; $ k_a $, $ k_d $, and $ k_s $ are the material's ambient, diffuse, and specular reflection coefficients; $ \mathbf{N} $ is the normalized surface normal; $ \mathbf{L} $ is the normalized vector from the intersection to the light source; $ \mathbf{R} $ is the normalized reflection vector of $ \mathbf{L} $ about $ \mathbf{N} $; $ \mathbf{V} $ is the normalized view direction (opposite the incoming ray); and $ n $ is the specular exponent controlling highlight sharpness. The ambient term provides uniform base illumination, the diffuse term models scattering based on the angle between normal and light, and the specular term approximates glossy highlights from viewer-light geometry. This local computation per intersection enhances surface detail without recursion, producing shaded images from basic ray casting.[13]

For scenes with multiple light sources, ray casting sums the local illumination contributions from each light at the intersection point, enabling complex direct lighting while maintaining efficiency. A shadow ray—or "feeler" ray—is cast from the intersection toward each light source to check for occlusions; if the shadow ray intersects another surface before reaching the light, that light's contribution is attenuated or excluded at the point.[2] These shadow rays involve only a single intersection test per light, distinct from recursive paths in advanced tracing, allowing basic hard shadows without global propagation. This per-ray summation supports up to several lights in early implementations, balancing realism and computation.[2]

Despite these advances, ray casting's shading remains limited to direct, local effects, omitting indirect lighting where light reflects multiple times between surfaces and caustics from refractive focusing, leading to perceptually "flat" images lacking subtle environmental interactions. Full ray tracing addresses this by recursively tracing secondary rays for bounced illumination, simulating global effects like color bleeding that ray casting cannot capture without extensions. Consequently, ray-cast images exhibit sharp shadows and surface highlights but miss the depth and inter-object lighting that enhance photorealism.

Texture mapping integrates into ray casting by deriving UV coordinates from the parametric representation of the intersection point on the surface geometry, then sampling a 2D texture image at those coordinates to replace or modulate the base material color in the shading computation. For primitive shapes like spheres or planes, UVs are computed analytically (e.g., via spherical or planar projections); for complex models, precomputed UVs on polygons are interpolated at the hit location. This technique adds surface detail efficiently, as the texture lookup occurs post-intersection without additional rays.[14]
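The Phong sum evaluates to one expression per light at each intersection. A minimal single-light sketch, with negative dot products clamped to zero as is conventional; the coefficient values below are arbitrary placeholders, not from any cited implementation:

```python
def reflect(L, N):
    """Reflection of L about N: R = 2 (N . L) N - L, for unit vectors."""
    d = sum(n * l for n, l in zip(N, L))
    return tuple(2.0 * d * n - l for n, l in zip(N, L))

def phong(N, L, V, ka=0.1, kd=0.6, ks=0.3, n_exp=32,
          Ia=1.0, Id=1.0, Is=1.0):
    """I = Ia*ka + Id*kd*(N.L) + Is*ks*(R.V)^n, clamping negative dot products."""
    n_dot_l = max(0.0, sum(a * b for a, b in zip(N, L)))
    R = reflect(L, N)
    r_dot_v = max(0.0, sum(a * b for a, b in zip(R, V)))
    return Ia * ka + Id * kd * n_dot_l + Is * ks * (r_dot_v ** n_exp)

# Light and viewer both along the normal: full diffuse plus peak specular.
I = phong((0, 0, 1), (0, 0, 1), (0, 0, 1))
print(I)  # ~1.0, i.e. 0.1 + 0.6 + 0.3
```

For several lights, the diffuse and specular terms are summed per light (each gated by its shadow ray), while the ambient term is added once.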

Classification and Optimization

In ray casting, classification of ray-scene interactions is essential for determining whether points along a ray lie inside or outside modeled objects, particularly in polygon filling and constructive solid geometry (CSG) applications. The even-odd rule assesses enclosure by counting the number of times a ray from the test point crosses object boundaries; an odd number of crossings indicates the point is inside, while an even number signifies outside. This parity-based approach simplifies evaluation for non-self-intersecting polygons and CSG unions or intersections, as described in early solid modeling systems where rays traverse primitive solids to compute boundary parities.[6]

For handling complex topologies, such as self-intersecting polygons or multiply-connected shapes in CSG, the winding number rule provides a more robust classification. It calculates the net winding of boundaries around the test point by summing the signed angles or directions of ray crossings (positive for counterclockwise, negative for clockwise); a non-zero winding number denotes an interior point, accommodating oriented surfaces and nested components. This method extends the even-odd rule to preserve topological consistency in advanced modeling scenarios.[15]

Optimization of ray casting focuses on reducing the number of intersection tests through enclosures and acceleration structures. Bounding volumes, such as spheres or axis-aligned boxes, enclose groups of scene primitives, allowing rays to skip detailed tests if they miss the bound, thereby pruning inefficient computations. Hierarchical bounding volume structures, pioneered by Rubin and Whitted in 1980, organize these enclosures in trees to exploit spatial coherence.[16]

Spatial partitioning via octrees or k-d trees further enhances efficiency by subdividing the scene into manageable regions. Octrees recursively partition space into eight equal child volumes, enabling rapid ray traversal through voxel adjacency and intersection culling, as introduced by Glassner for accelerating ray-scene interactions. K-d trees, developed by Kaplan, alternate splits along coordinate axes to create balanced partitions, supporting efficient nearest-neighbor queries and ray stepping.[17] These hierarchies typically reduce intersection complexity from O(n) for n primitives to O(log n) by limiting tests to relevant subregions.

Additional techniques include the slab method for axis-aligned bounding boxes, which computes ray entry and exit parameters by intersecting with pairs of parallel planes along each axis, minimizing branch-heavy computations. Early ray termination on opaque hits further optimizes by halting traversal once a fully blocking surface is found, avoiding redundant downstream tests in primary visibility rays.[18]
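The slab method reduces a ray-box test to interval arithmetic: each axis contributes an entry/exit pair of parameters, and the box is hit only if the three intervals overlap at some nonnegative $t$. A sketch assuming an axis-aligned box given by its minimum and maximum corners:

```python
def ray_aabb(P, D, box_min, box_max):
    """Slab method: intersect the ray P + t*D with the pair of axis-aligned
    planes on each axis and keep the overlap of the [t_near, t_far] intervals.
    Returns (t_near, t_far) on a hit, or None on a miss."""
    t_near, t_far = float("-inf"), float("inf")
    for p, d, lo, hi in zip(P, D, box_min, box_max):
        if d == 0.0:
            if p < lo or p > hi:        # ray parallel to the slab and outside it
                return None
            continue
        t0, t1 = (lo - p) / d, (hi - p) / d
        if t0 > t1:
            t0, t1 = t1, t0             # order entry before exit
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:              # intervals no longer overlap: miss
            return None
    if t_far < 0.0:                     # box entirely behind the ray origin
        return None
    return (t_near, t_far)

# Unit cube ahead of the origin along +x: entry at t=1, exit at t=2.
print(ray_aabb((0, 0.5, 0.5), (1, 0, 0), (1, 0, 0), (2, 1, 1)))  # (1.0, 2.0)
```

In a bounding volume hierarchy, this cheap test gates the expensive per-primitive intersections: subtrees whose boxes return None are skipped entirely.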

Anti-Aliasing

Aliasing in ray casting occurs primarily due to the discrete sampling of continuous scenes at individual pixel centers, resulting in jagged edges known as jaggies on object boundaries.[19] This undersampling also produces moiré patterns when rendering repetitive textures or fine details, where high-frequency scene elements alias into lower-frequency artifacts visible in the final image.[20] To mitigate these issues, supersampling casts multiple rays per pixel—typically in a regular grid—and averages their resulting colors to approximate the integral over the pixel area more accurately.[19] Adaptive sampling extends this by varying the number of rays based on local depth variance, allocating more samples to regions with significant depth discontinuities, such as edges, while using fewer in uniform areas to balance quality and efficiency.[21] Jittered sampling enhances supersampling by introducing random offsets to ray origins within the pixel grid, distributing samples more evenly and reducing patterned artifacts compared to uniform grids.[22] For targeted refinement, edge detection algorithms analyze depth or color gradients in initial low-sample renders to identify boundaries, enabling selective supersampling only at those locations.[23] Post-processing filters integrate with ray casting by applying operations like Gaussian blur directly to the accumulated color or depth buffers after ray intersection and shading, smoothing residual aliasing without additional ray computations.[24] In the standard pixel ray generation process, a single ray per pixel exacerbates these aliasing effects, underscoring the need for such enhancements.[25]
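As an illustration of jittered supersampling, the sketch below places one randomly offset sample inside each cell of an n-by-n grid over a pixel and averages the results; `render_pixel` and the scalar-valued `cast_ray` callback are hypothetical stand-ins for a renderer's actual color pipeline:

```python
import random

def render_pixel(cast_ray, px, py, samples_per_axis=4, rng=random):
    """Jittered supersampling: one random sample inside each cell of an
    n x n grid over the pixel, averaged to approximate the pixel integral."""
    n = samples_per_axis
    total = 0.0
    for i in range(n):
        for j in range(n):
            u = px + (i + rng.random()) / n   # jittered position within the cell
            v = py + (j + rng.random()) / n
            total += cast_ray(u, v)
    return total / (n * n)

# A hard vertical edge at x = 10.5: shade 1.0 to its left, 0.0 to its right.
edge = lambda u, v: 1.0 if u < 10.5 else 0.0
random.seed(0)
print(render_pixel(edge, 10, 0))  # 0.5: the edge pixel is half covered
```

A single center ray for this pixel would return either 0.0 or 1.0; the averaged samples instead recover the fractional coverage, which is what smooths the jaggies.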

History and Evolution

Origins and Early Developments

The concept of ray casting in computer graphics originated in the late 1960s as a method for solving the hidden surface removal problem and simulating shadows in rendered images. In 1968, Arthur Appel at IBM developed an early ray casting algorithm that traced rays from the viewpoint through the image plane to determine visible surfaces and occlusion by other objects, marking one of the first computational approaches to realistic shading in machine-generated pictures.[2] This technique laid foundational principles for later rendering methods by prioritizing ray-object intersections over scanline-based alternatives prevalent in contemporary research.

During the 1970s, academic efforts at institutions like the University of Utah's Computer Graphics Laboratory advanced related visibility algorithms, with researchers such as Martin Newell and colleagues developing scanline rendering techniques that contrasted with emerging ray-based methods, influencing the evolution of efficient image synthesis.[26] These developments highlighted ray casting's potential for handling complex scenes, though computational limitations favored rasterization for real-time applications.

The term "ray casting" itself was formally introduced in 1982 by Scott D. Roth at General Motors Research Laboratories, who applied it to solid modeling systems using constructive solid geometry (CSG), enabling the rendering of combined primitive solids through ray intersection tests.[6] A pivotal advancement bridging ray casting to more sophisticated ray tracing occurred in 1980 with Turner Whitted's work at Bell Labs, which extended primary ray casting with recursive secondary rays for reflections, refractions, and shadows, demonstrating practical image generation with enhanced realism.[18]

By the 1980s, ray casting began to be integrated into specialized graphics hardware pipelines, particularly for volume visualization and scientific rendering, as processing power improved to support intersection computations in industrial and research contexts.[27]

Role in Computer Games

Ray casting emerged as a pivotal technique in early 3D video games during the early 1990s, enabling real-time pseudo-3D rendering on limited hardware such as 286 and 386 PCs. By casting rays from the player's viewpoint for each vertical column of screen pixels, the method determined wall distances and applied texture mapping to vertical strips, creating the illusion of depth without full 3D geometry processing. This approach was computationally efficient, transforming simple 2D maps into navigable 3D environments while respecting the era's memory and processing constraints.[28][29]

A landmark example is Wolfenstein 3D (1992), developed by id Software, which employed ray casting to render maze-like levels for first-person navigation. Rays traced through a grid-based map to detect walls, with textures scaled based on distance and flat-shaded floors and ceilings added via simple interpolation for visual depth. The technique supported enemy sprites as 2D overlays sorted by distance, contributing to the game's immersive feel despite its 2.5D nature.[30][31]

Subsequent titles built on this foundation. ShadowCaster (1993), developed by Raven Software using a licensed id engine variant, incorporated ray casting with rotated sectors to allow non-orthogonal walls and variable heights, enhancing level complexity while maintaining performance. Similarly, NovaLogic's Comanche series (starting 1992) adapted ray casting through its Voxel Space engine to generate dynamic helicopter simulation landscapes, tracing rays vertically through heightmap voxels to render terrain elevations in real time.[32][33]

In these games, optimizations like binary space partitioning (BSP) trees were occasionally integrated for static scenes to streamline ray traversal and visibility culling, though core ray casting handled primary rendering. However, inherent limitations persisted, including the inability to depict sloped floors or ceilings and restricted movement to horizontal planes without true vertical freedom, confining designs to flat, labyrinthine layouts. Ray casting thus pioneered the first-person shooter genre but became outdated by 1996, when id Software's Quake shifted to a fully polygonal 3D engine for unrestricted geometry and dynamics.[28][34]
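The column-casting idea behind these engines can be illustrated with a grid-stepping (DDA-style) traversal: the ray advances from one grid-line crossing to the next until it enters a wall cell, and the returned distance would set the height of the drawn wall strip. This is a simplified sketch, not id Software's actual code; the map is assumed to be bounded by wall cells so the loop always terminates:

```python
import math

def cast_grid_ray(grid, ox, oy, angle, max_dist=64.0):
    """Step a ray through a 2D tile map, crossing one grid line at a time,
    and return the distance to the first wall cell ('#'), or None."""
    dx, dy = math.cos(angle), math.sin(angle)
    x, y = int(ox), int(oy)
    step_x = 1 if dx > 0 else -1
    step_y = 1 if dy > 0 else -1
    # Ray distance between successive vertical / horizontal grid lines.
    delta_x = abs(1.0 / dx) if dx != 0 else float("inf")
    delta_y = abs(1.0 / dy) if dy != 0 else float("inf")
    side_x = ((x + 1 - ox) if dx > 0 else (ox - x)) * delta_x
    side_y = ((y + 1 - oy) if dy > 0 else (oy - y)) * delta_y
    dist = 0.0
    while dist < max_dist:
        if side_x < side_y:   # next crossing is a vertical grid line
            dist, side_x, x = side_x, side_x + delta_x, x + step_x
        else:                 # next crossing is a horizontal grid line
            dist, side_y, y = side_y, side_y + delta_y, y + step_y
        if grid[y][x] == "#":
            return dist
    return None

grid = ["#####",
        "#...#",
        "#...#",
        "#####"]
# From inside the room looking along +x: wall boundary at x=4, origin x=2.5.
print(cast_grid_ray(grid, 2.5, 1.5, 0.0))  # 1.5
```

A full engine repeats this once per screen column with a slightly different angle, then scales a wall texture strip inversely with the (fisheye-corrected) distance.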

Modern Contexts

Computational Geometry

In computational geometry, ray casting serves as a fundamental technique for performing ray shooting queries within arrangements of geometric primitives, such as line segments or hyperplanes, where the goal is to determine the first intersection point along a given ray. This approach preprocesses the arrangement into data structures that enable efficient query resolution, often achieving near-linear preprocessing time and logarithmic query time for simple polygons with n vertices. For instance, geodesic triangulations can be constructed to support ray shooting in O(log n) time after O(n log n) preprocessing, facilitating queries in polygonal domains without explicit visibility computation.[35][36]

Ray casting also underpins the computation of visibility maps in both 2D and 3D settings, representing the regions of space from which certain segments or triangles are visible. In 3D, visibility maps for segments and triangles on polyhedral terrains can exhibit complexities up to O(n^5), where n is the number of features, and ray shooting algorithms help delineate these maps by tracing rays to identify occlusions and visible portions. The 3D visibility complex extends this by organizing all free-space visibility relations among geometric objects, allowing ray casting to query visible elements efficiently in environments with obstacles. These maps are crucial for abstract geometric analysis, distinct from rendering applications.[37][38]

Advanced algorithms leverage Davenport-Schinzel sequences to bound the complexity of ray traversals in dynamic scenes, where geometric elements may move or change over time. These sequences limit alternations in intersection orders, enabling data structures that support updates and queries for ray shooting with near-optimal performance, such as O(n α(n)) size for the lower envelope of motion paths, where α is the inverse Ackermann function. In robotics, ray casting with such sequences aids path planning by simulating ray shots to detect feasible trajectories amid obstacles, as in penetration growth models that expand free space for collision-free paths.[39][40]

Specific applications include collision detection in simulations, where ray casting traces paths between dynamic objects to identify intersections, ensuring geometrically exact contacts in cloth or rigid body interactions using GPU-accelerated methods. A classic use is the point-in-polygon test, employing ray casting to count edge crossings from a query point to infinity; an odd count indicates the point lies inside, based on the Jordan curve theorem and implemented efficiently for convex or simple polygons. Data structures like priority queues manage sorted lists of potential intersections along the ray by distance, allowing sequential processing to retrieve the nearest hit in O(k log m) time, where k is the number of intersections and m the number of objects.[41][42]
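The point-in-polygon test takes only a few lines: a horizontal ray toward positive infinity is tested against each edge, and the parity of crossings decides enclosure. A sketch assuming a simple (non-self-intersecting) polygon given as a vertex list:

```python
def point_in_polygon(x, y, poly):
    """Even-odd rule: cast a horizontal ray toward +infinity and count how
    many polygon edges it crosses; an odd count means the point is inside."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]          # wrap around to close the polygon
        # Edge straddles the ray's y, and the crossing lies to the right of (x, y).
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x_cross > x:
                inside = not inside          # toggle parity on each crossing
    return inside

square = [(0, 0), (4, 0), (4, 4), (0, 4)]
print(point_in_polygon(2, 2, square))   # True
print(point_in_polygon(5, 2, square))   # False
```

The `(y1 > y) != (y2 > y)` half-open test is a common way to handle a ray passing exactly through a vertex, counting it for only one of the two incident edges.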

Extensions and Comparisons

Ray casting serves as a foundational subset of ray tracing, limited to emitting primary rays from the viewpoint to detect the first intersection with scene geometry, without recursion for secondary effects like reflections or refractions.[43] In contrast, ray tracing extends this by recursively tracing additional rays from intersection points to simulate light bounces, enabling global illumination and more physically accurate rendering.[43] This distinction positions ray casting as computationally lighter, suitable for real-time applications where full ray tracing's overhead for multiple ray generations would be prohibitive.[44]

One key extension of ray casting is ray marching, which adapts the technique for rendering implicit surfaces defined by signed distance fields (SDFs), particularly in shader-based implementations.[45] Ray marching iteratively advances rays along their direction by the estimated distance to the nearest surface, as provided by the SDF, until an intersection is approximated, allowing efficient handling of procedural or fractal geometries without explicit meshes.[45] This method has gained traction in GPU shaders for real-time procedural rendering, offering compact representations and support for advanced features like soft shadows through temporal accumulation.[45]

Hybrid approaches further evolve ray casting by integrating it with rasterization in real-time graphics engines, leveraging GPU acceleration structures for selective ray queries.[46] For instance, rasterization handles primary visibility for broad scene coverage, while ray casting—often via hardware-accelerated bounding volume hierarchies—performs targeted operations like object picking or shadow determination, reducing overall computational cost.[46] NVIDIA's OptiX framework exemplifies this, enabling deformable mesh updates and instanced geometry handling for dynamic scenes in games and simulations.[46] In modern game development, ray casting continues to be employed for in-game lighting calculations, such as determining shadows and generating visibility data in systems like Enlighten, benefiting from hardware acceleration in GPUs like the NVIDIA RTX series.[47][48]

In modern virtual and augmented reality (VR/AR) contexts, ray casting facilitates intuitive object selection through 2020s APIs like WebXR, where rays are projected from user devices to detect and interact with virtual elements in mixed reality environments.[49] This enables precise hit-testing against scene geometry, supporting actions such as highlighting or manipulating AR objects based on the closest intersection.[49] Additionally, machine learning-accelerated denoising enhances ray casting outputs, particularly for noisy low-sample renders, by training neural networks on paired clean and noisy images to reconstruct high-quality results in real time.[50] Techniques like those in NVIDIA's denoising pipelines apply convolutional networks to filter variance while preserving edges, making ray casting viable for interactive previews.[50]

Compared to rasterization, ray casting excels in handling complex, non-polygonal geometry—such as volumes or implicits—by directly querying intersections, though it remains prone to aliasing without additional sampling and is slower for dense primary visibility due to per-pixel ray computations.[44] Rasterization, optimized for triangle projections, achieves higher throughput for static scenes but struggles with secondary effects, prompting hybrids where ray casting augments rasterized bases.[44]

Looking ahead, hardware ray tracing (RT) cores in GPUs, like those in NVIDIA RTX series including the GeForce RTX 50 series released in 2025, increasingly blend ray casting with rasterization pipelines, using dedicated intersection units and AI tensor cores for denoising to enable real-time global effects and neural rendering at scale.[48] This convergence supports future trends in high-fidelity interactive graphics, including VR/AR and procedural worlds, with ongoing advancements in acceleration structures for broader adoption.[48]
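The sphere-tracing loop at the heart of SDF ray marching is compact: each iteration evaluates the signed distance at the current point and advances the ray by that amount, a step guaranteed not to overshoot the surface. `sdf_sphere` and `ray_march` are illustrative names, not from any cited framework:

```python
import math

def sdf_sphere(p, center=(0.0, 0.0, 3.0), radius=1.0):
    """Signed distance from point p to a sphere: negative inside, positive outside."""
    return math.dist(p, center) - radius

def ray_march(origin, direction, sdf, max_steps=128, eps=1e-4, max_dist=100.0):
    """Sphere tracing: advance the ray by the SDF value (a safe step) until
    the surface is within eps, or give up after max_steps / max_dist."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        d = sdf(p)
        if d < eps:
            return t            # hit: surface within tolerance
        t += d                  # step forward by the guaranteed-free distance
        if t > max_dist:
            break
    return None                 # miss

t = ray_march((0, 0, 0), (0, 0, 1), sdf_sphere)
print(round(t, 3))  # 2.0: the front of the unit sphere centered at z = 3
```

Because the SDF is the only scene description needed, swapping in a different distance function (or a composition of them) changes the rendered geometry without any mesh data, which is why this style dominates procedural shader demos.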

References
