Feature (computer vision)
from Wikipedia

In computer vision and image processing, a feature is a piece of information about the content of an image; typically about whether a certain region of the image has certain properties. Features may be specific structures in the image such as points, edges or objects. Features may also be the result of a general neighborhood operation or feature detection applied to the image. Other examples of features are related to motion in image sequences, or to shapes defined in terms of curves or boundaries between different image regions.

More broadly a feature is any piece of information that is relevant for solving the computational task related to a certain application. This is the same sense as feature in machine learning and pattern recognition generally, though image processing has a very sophisticated collection of features. The feature concept is very general and the choice of features in a particular computer vision system may be highly dependent on the specific problem at hand.

Definition

There is no universal or exact definition of what constitutes a feature, and the exact definition often depends on the problem or the type of application. Nevertheless, a feature is typically defined as an "interesting" part of an image, and features are used as a starting point for many computer vision algorithms.

Since features are used as the starting point and main primitives for subsequent algorithms, the overall algorithm will often only be as good as its feature detector. Consequently, the desirable property for a feature detector is repeatability: whether or not the same feature will be detected in two or more different images of the same scene.

Feature detection is a low-level image processing operation. That is, it is usually performed as the first operation on an image and examines every pixel to see if there is a feature present at that pixel. If this is part of a larger algorithm, then the algorithm will typically only examine the image in the region of the features. As a built-in pre-requisite to feature detection, the input image is usually smoothed by a Gaussian kernel in a scale-space representation and one or several feature images are computed, often expressed in terms of local image derivative operations.
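
For illustration, the following sketch (not part of the original article) computes such derivative-based feature images after Gaussian smoothing; the scale value and function name are arbitrary choices, using NumPy and SciPy.

```python
# A minimal sketch, not from the article: Gaussian scale-space smoothing followed
# by derivative-based feature images. The scale value and names are illustrative.
import numpy as np
from scipy.ndimage import gaussian_filter

def derivative_feature_images(image: np.ndarray, sigma: float = 2.0):
    """Return gradient-magnitude and orientation feature images at scale sigma."""
    smoothed = gaussian_filter(image.astype(float), sigma)  # scale-space smoothing
    gy, gx = np.gradient(smoothed)                          # first-order derivatives
    magnitude = np.hypot(gx, gy)                            # edge-strength feature image
    orientation = np.arctan2(gy, gx)                        # local-orientation feature image
    return magnitude, orientation
```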

Occasionally, when feature detection is computationally expensive and there are time constraints, a higher-level algorithm may be used to guide the feature detection stage so that only certain parts of the image are searched for features.

There are many computer vision algorithms that use feature detection as the initial step, so as a result, a very large number of feature detectors have been developed. These vary widely in the kinds of feature detected, the computational complexity and the repeatability.

When features are defined in terms of local neighborhood operations applied to an image, a procedure commonly referred to as feature extraction, one can distinguish between feature detection approaches that produce local decisions as to whether there is a feature of a given type at a given image point, and those that produce non-binary data as a result. The distinction becomes relevant when the resulting detected features are relatively sparse. Although local decisions are made, the output from a feature detection step does not need to be a binary image. The result is often represented as sets of (connected or unconnected) coordinates of the image points where features have been detected, sometimes with subpixel accuracy.

When feature extraction is done without local decision making, the result is often referred to as a feature image. Consequently, a feature image can be seen as an image in the sense that it is a function of the same spatial (or temporal) variables as the original image, but where the pixel values hold information about image features instead of intensity or color. This means that a feature image can be processed in a similar way as an ordinary image generated by an image sensor. Feature images are also often computed as an integrated step in algorithms for feature detection.

Feature vectors and feature spaces

In some applications, it is not sufficient to extract only one type of feature to obtain the relevant information from the image data. Instead, two or more different features are extracted, resulting in two or more feature descriptors at each image point. A common practice is to organize the information provided by all these descriptors as the elements of one single vector, commonly referred to as a feature vector. The set of all possible feature vectors constitutes a feature space.[1]

A common example of feature vectors appears when each image point is to be classified as belonging to a specific class. Assuming that each image point has a corresponding feature vector based on a suitable set of features, chosen such that each class is well separated in the corresponding feature space, the classification of each image point can be done using a standard classification method.
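
A minimal sketch of this idea, assuming simple hand-picked features (intensity, gradient magnitude, local variance) and a nearest-centroid classifier rather than any particular published method:

```python
# A hedged sketch, assuming hand-picked per-pixel features and a nearest-centroid
# classifier; the feature choices are illustrative, not a prescribed method.
import numpy as np
from scipy.ndimage import gaussian_filter, generic_filter

def pixel_feature_vectors(image: np.ndarray) -> np.ndarray:
    """Stack a few simple descriptors into one feature vector per pixel."""
    img = image.astype(float)
    gy, gx = np.gradient(gaussian_filter(img, 1.0))
    grad_mag = np.hypot(gx, gy)                      # edge-like feature
    local_var = generic_filter(img, np.var, size=5)  # crude texture feature
    return np.stack([img, grad_mag, local_var], axis=-1)   # shape (H, W, 3)

def classify_pixels(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Assign each pixel to the class whose centroid is nearest in feature space."""
    dists = np.linalg.norm(features[..., None, :] - centroids, axis=-1)  # (H, W, K)
    return dists.argmin(axis=-1)                                         # label per pixel
```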

Simplified example of training a neural network in object detection: The network is trained by multiple images that are known to depict starfish and sea urchins, which are correlated with "nodes" that represent visual features. The starfish match with a ringed texture and a star outline, whereas most sea urchins match with a striped texture and oval shape. However, the instance of a ring textured sea urchin creates a weakly weighted association between them.
Subsequent run of the network on an input image (left):[2] The network correctly detects the starfish. However, the weakly weighted association between ringed texture and sea urchin also confers a weak signal to the latter from one of two features. In addition, a shell that was not included in the training gives a weak signal for the oval shape, also resulting in a weak signal for the sea urchin output. These weak signals may result in a false positive result for sea urchin.
In reality, textures and outlines would not be represented by single nodes, but rather by associated weight patterns of multiple nodes.

Another and related example occurs when neural network-based processing is applied to images. The input data fed to the neural network is often given in terms of a feature vector from each image point, where the vector is constructed from several different features extracted from the image data. During a learning phase, the network can itself find which combinations of different features are useful for solving the problem at hand.

Types

Edges

Edges are points where there is a boundary (or an edge) between two image regions. In general, an edge can be of almost arbitrary shape, and may include junctions. In practice, edges are usually defined as sets of points in the image that have a strong gradient magnitude. Furthermore, some common algorithms will then chain high gradient points together to form a more complete description of an edge. These algorithms usually place some constraints on the properties of an edge, such as shape, smoothness, and gradient value.

Locally, edges have a one-dimensional structure.
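
As a rough illustration of the gradient-magnitude definition above, the following sketch marks strong-gradient points using Sobel derivatives; the threshold is an arbitrary assumption and no edge chaining or thinning is performed.

```python
# A minimal sketch of the gradient-magnitude view of edges; the threshold is an
# arbitrary assumption.
import numpy as np
from scipy.ndimage import sobel

def strong_gradient_points(image: np.ndarray, threshold: float = 50.0) -> np.ndarray:
    img = image.astype(float)
    gx = sobel(img, axis=1)                      # horizontal derivative
    gy = sobel(img, axis=0)                      # vertical derivative
    magnitude = np.hypot(gx, gy)                 # edge strength at every pixel
    return np.argwhere(magnitude > threshold)    # coordinates of candidate edge points
```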

Corners/interest points

The terms corners and interest points are used somewhat interchangeably and refer to point-like features in an image, which have a local two-dimensional structure. The name "Corner" arose since early algorithms first performed edge detection, and then analyzed the edges to find rapid changes in direction (corners). These algorithms were then developed so that explicit edge detection was no longer required, for instance by looking for high levels of curvature in the image gradient. It was then noticed that the so-called corners were also being detected on parts of the image that were not corners in the traditional sense (for instance a small bright spot on a dark background may be detected). These points are frequently known as interest points, but the term "corner" is used by tradition[citation needed].
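
The following sketch illustrates a Harris/Plessey-style corner response built from the local structure tensor of image gradients; the window scale and the constant k are conventional choices, not values prescribed by the article.

```python
# A hedged sketch of a Harris/Plessey-style corner response; sigma and k are
# conventional but arbitrary choices.
import numpy as np
from scipy.ndimage import gaussian_filter, sobel

def harris_response(image: np.ndarray, sigma: float = 1.5, k: float = 0.04) -> np.ndarray:
    img = image.astype(float)
    ix, iy = sobel(img, axis=1), sobel(img, axis=0)
    # Smooth products of derivatives to form the local structure tensor entries.
    a = gaussian_filter(ix * ix, sigma)
    b = gaussian_filter(ix * iy, sigma)
    c = gaussian_filter(iy * iy, sigma)
    det, trace = a * c - b * b, a + c
    return det - k * trace ** 2   # large positive responses indicate corner-like points
```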

Blobs / regions of interest or interest points

Blobs provide a complementary description of image structures in terms of regions, as opposed to corners that are more point-like. Nevertheless, blob descriptors may often contain a preferred point (a local maximum of an operator response or a center of gravity) which means that many blob detectors may also be regarded as interest point operators. Blob detectors can detect areas in an image that are too smooth to be detected by a corner detector.

Consider shrinking an image and then performing corner detection. The detector will respond to points that are sharp in the shrunk image, but may be smooth in the original image. It is at this point that the difference between a corner detector and a blob detector becomes somewhat vague. To a large extent, this distinction can be remedied by including an appropriate notion of scale. Nevertheless, due to their response properties to different types of image structures at different scales, the LoG and DoH blob detectors are also mentioned in the article on corner detection.
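
As an illustration of scale-dependent blob detection, the sketch below applies a scale-normalized Laplacian of Gaussian (LoG) over a small set of scales; the scale range and threshold are arbitrary assumptions.

```python
# A minimal sketch of multi-scale blob detection with a scale-normalized
# Laplacian of Gaussian; the scales and threshold are arbitrary assumptions.
import numpy as np
from scipy.ndimage import gaussian_laplace

def log_blobs(image: np.ndarray, sigmas=(2, 4, 8, 16), threshold: float = 10.0):
    img = image.astype(float)
    blobs = []
    for sigma in sigmas:
        # Multiplying by sigma**2 normalizes the response so scales are comparable.
        response = (sigma ** 2) * np.abs(gaussian_laplace(img, sigma))
        for r, c in np.argwhere(response > threshold):
            blobs.append((r, c, sigma))   # position plus the scale at which it responded
    return blobs
```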

Ridges

For elongated objects, the notion of ridges is a natural tool. A ridge descriptor computed from a grey-level image can be seen as a generalization of a medial axis. From a practical viewpoint, a ridge can be thought of as a one-dimensional curve that represents an axis of symmetry, and in addition has an attribute of local ridge width associated with each ridge point. Unfortunately, however, it is algorithmically harder to extract ridge features from general classes of grey-level images than edge-, corner- or blob features. Nevertheless, ridge descriptors are frequently used for road extraction in aerial images and for extracting blood vessels in medical images—see ridge detection.

Detection

Feature detection includes methods for computing abstractions of image information and making local decisions at every image point whether there is an image feature of a given type at that point or not. The resulting features will be subsets of the image domain, often in the form of isolated points, continuous curves or connected regions.

Feature extraction is sometimes performed over several scales. One such method is the scale-invariant feature transform (SIFT).
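
For example, multi-scale keypoints and descriptors can be obtained with the SIFT implementation in OpenCV, shown here as a brief sketch; the file name is a placeholder.

```python
# A brief sketch using OpenCV's SIFT implementation; "image.png" is a placeholder.
import cv2

image = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(image, None)
# Each keypoint stores its position, scale and orientation; descriptors are 128-D vectors.
print(len(keypoints), descriptors.shape)
```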

Common feature detectors and their classification:
Feature detector                           Edge   Corner   Blob   Ridge
Canny[3]                                   Yes    No       No     No
Sobel                                      Yes    No       No     No
Harris & Stephens / Plessey[4]             Yes    Yes      No     No
SUSAN[5]                                   Yes    Yes      No     No
Shi & Tomasi[6]                            No     Yes      No     No
Level curve curvature[7]                   No     Yes      No     No
FAST[8]                                    No     Yes      No     No
Laplacian of Gaussian[7]                   No     Yes      Yes    No
Difference of Gaussians[9][10]             No     Yes      Yes    No
Determinant of Hessian[7]                  No     Yes      Yes    No
Hessian strength feature measures[11][12]  No     Yes      Yes    No
MSER[13]                                   No     No       Yes    No
Principal curvature ridges[14][15][16]     No     No       No     Yes
Grey-level blobs[17]                       No     No       Yes    No

Extraction

Once features have been detected, a local image patch around the feature can be extracted. This extraction may involve quite considerable amounts of image processing. The result is known as a feature descriptor or feature vector. Among the approaches used for feature description, one can mention N-jets and local histograms (see scale-invariant feature transform for one example of a local histogram descriptor). In addition to such attribute information, the feature detection step by itself may also provide complementary attributes, such as the edge orientation and gradient magnitude in edge detection and the polarity and the strength of the blob in blob detection.
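
A much-simplified sketch of a local-histogram descriptor computed from a patch around a detected feature; the patch size and bin count are arbitrary, and this is far cruder than descriptors such as SIFT.

```python
# A much-simplified local-histogram descriptor, assuming a fixed square patch and
# gradient-orientation bins; real descriptors such as SIFT are far more elaborate.
import numpy as np

def patch_descriptor(image: np.ndarray, row: int, col: int,
                     half: int = 8, bins: int = 16) -> np.ndarray:
    patch = image[row - half:row + half, col - half:col + half].astype(float)
    gy, gx = np.gradient(patch)
    angles = np.arctan2(gy, gx)
    weights = np.hypot(gx, gy)
    # Histogram of gradient orientations, weighted by gradient magnitude.
    hist, _ = np.histogram(angles, bins=bins, range=(-np.pi, np.pi), weights=weights)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist      # normalized descriptor vector
```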

Low-level

Curvature

Image motion

Shape based

Flexible methods

  • Deformable, parameterized shapes
  • Active contours (snakes)

Representation

A specific image feature, defined in terms of a specific structure in the image data, can often be represented in different ways. For example, an edge can be represented as a Boolean variable in each image point that describes whether an edge is present at that point. Alternatively, we can use a representation that provides a certainty measure instead of a Boolean statement of the edge's existence, and combine this with information about the orientation of the edge. Similarly, the color of a specific region can either be represented in terms of the average color (three scalars) or a color histogram (three functions).
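
As a small illustration of the last point, the sketch below computes both region-color representations, the average color and a per-channel histogram; the bin count is an arbitrary choice and the region is assumed to be given as an (N, 3) array of RGB values.

```python
# A small sketch of the two region-color representations mentioned above; the bin
# count is an arbitrary choice and `region` is assumed to be an (N, 3) RGB array.
import numpy as np

def region_color_descriptors(region: np.ndarray, bins: int = 8):
    mean_color = region.mean(axis=0)                   # average color: three scalars
    histograms = np.stack([np.histogram(region[:, ch], bins=bins, range=(0, 256))[0]
                           for ch in range(3)])        # color histogram: three functions
    return mean_color, histograms
```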

When a computer vision system or computer vision algorithm is designed, the choice of feature representation can be a critical issue. In some cases, a higher level of detail in the description of a feature may be necessary for solving the problem, but this comes at the cost of having to deal with more data and more demanding processing. Below, some of the factors which are relevant for choosing a suitable representation are discussed. In this discussion, an instance of a feature representation is referred to as a feature descriptor, or simply descriptor.

Certainty or confidence

Two examples of image features are local edge orientation and local velocity in an image sequence. In the case of orientation, the value of this feature may be more or less undefined if more than one edge is present in the corresponding neighborhood. Local velocity is undefined if the corresponding image region does not contain any spatial variation. As a consequence of this observation, it may be relevant to use a feature representation that includes a measure of certainty or confidence related to the statement about the feature value. Otherwise, it is a typical situation that the same descriptor is used to represent feature values of low certainty and feature values close to zero, with a resulting ambiguity in the interpretation of this descriptor. Depending on the application, such an ambiguity may or may not be acceptable.

In particular, if a feature image will be used in subsequent processing, it may be a good idea to employ a feature representation that includes information about certainty or confidence. This enables a new feature descriptor to be computed from several descriptors, for example, computed at the same image point but at different scales, or from different but neighboring points, in terms of a weighted average where the weights are derived from the corresponding certainties. In the simplest case, the corresponding computation can be implemented as a low-pass filtering of the feature image. The resulting feature image will, in general, be more stable to noise.
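
A minimal sketch of such certainty-weighted averaging, often called normalized averaging, assuming a Gaussian low-pass filter of arbitrary width:

```python
# A minimal sketch of certainty-weighted (normalized) averaging of a feature
# image; the Gaussian width is an arbitrary choice.
import numpy as np
from scipy.ndimage import gaussian_filter

def normalized_average(feature: np.ndarray, certainty: np.ndarray, sigma: float = 2.0):
    weighted = gaussian_filter(feature * certainty, sigma)   # low-pass of weighted values
    weights = gaussian_filter(certainty, sigma)              # low-pass of the certainties
    return weighted / np.maximum(weights, 1e-9)              # certainty-weighted local average
```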

Averageability

In addition to having certainty measures included in the representation, the representation of the corresponding feature values may itself be suitable for an averaging operation or not. Most feature representations can be averaged in practice, but only in certain cases can the resulting descriptor be given a correct interpretation in terms of a feature value. Such representations are referred to as averageable.

For example, if the orientation of an edge is represented in terms of an angle, this representation must have a discontinuity where the angle wraps from its maximal value to its minimal value. Consequently, it can happen that two similar orientations are represented by angles that have a mean that does not lie close to either of the original angles and, hence, this representation is not averageable. There are other representations of edge orientation, such as the structure tensor, which are averageable.
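
The following numeric sketch illustrates the wrap-around problem and an averageable double-angle encoding, with orientation taken modulo 180 degrees; it is an illustration, not a full structure-tensor implementation.

```python
# A numeric sketch of the wrap-around problem and a double-angle encoding that is
# averageable; edge orientation is treated as defined modulo 180 degrees.
import numpy as np

a, b = np.deg2rad(179.0), np.deg2rad(1.0)     # two nearly identical edge orientations

naive_mean = np.rad2deg((a + b) / 2)          # 90 degrees: close to neither input

# Map each orientation to a unit vector at twice the angle, average, then halve.
x = np.cos(2 * a) + np.cos(2 * b)
y = np.sin(2 * a) + np.sin(2 * b)
double_angle_mean = np.rad2deg(np.arctan2(y, x) / 2)   # ~0 degrees, i.e. the shared orientation

print(naive_mean, double_angle_mean)   # 90.0 (misleading) versus ~0.0 (close to both, mod 180)
```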

Another example relates to motion, where in some cases only the normal velocity relative to some edge can be extracted. If two such features have been extracted and they can be assumed to refer to the same true velocity, this velocity is not given as the average of the normal velocity vectors. Hence, normal velocity vectors are not averageable. Instead, there are other representations of motion, using matrices or tensors, that give the true velocity in terms of an averaging operation on the normal velocity descriptors.[citation needed]

Matching

Features detected in each image can be matched across multiple images to establish corresponding features such as corresponding points.

A typical matching algorithm compares and analyzes point correspondences between a reference image and a target image. If any part of a cluttered scene shares more correspondences than a chosen threshold, that part of the scene is considered to contain the reference object.[18]
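
A hedged sketch of such correspondence-based matching using OpenCV, with SIFT descriptors, brute-force matching and a ratio test; the file names and the 0.75 ratio value are placeholders.

```python
# A hedged sketch of correspondence-based matching with OpenCV; file names and
# the 0.75 ratio threshold are placeholders, not values from the article.
import cv2

ref = cv2.imread("reference.png", cv2.IMREAD_GRAYSCALE)
tgt = cv2.imread("target.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp_ref, des_ref = sift.detectAndCompute(ref, None)
kp_tgt, des_tgt = sift.detectAndCompute(tgt, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des_ref, des_tgt, k=2)
        if m.distance < 0.75 * n.distance]        # keep only distinctive correspondences
print(f"{len(good)} candidate point correspondences")
```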

from Grokipedia
In computer vision, a feature is defined as a distinctive element or pattern within an image, such as an edge, corner, ridge, or blob, that encapsulates meaningful information about the image content and serves as a building block for higher-level analysis tasks like detection, recognition, and matching. These features are typically extracted from specific regions of interest and represented either as keypoints (precise locations) or descriptors (numerical vectors capturing local properties such as orientation and gradients). The concept draws from biological vision models, where features mimic how the human visual system identifies salient points, but in computational terms they enable robust processing under variations in illumination, viewpoint, and scale. Features are broadly categorized into low-level types (e.g., edges detected via gradient-based operators like Sobel or Canny) and mid-to-high-level types (e.g., scale-invariant keypoints from methods like SIFT or SURF), with extraction often involving multi-scale analysis to handle image transformations. Early techniques, such as the Harris corner detector introduced in 1988, identify corners by analyzing the autocorrelation matrix of image gradients to find points with high intensity changes in multiple directions, making them suitable for tracking and stereo vision. Subsequent advancements, like David Lowe's scale-invariant feature transform (SIFT) published in 2004, build on difference-of-Gaussian filters for scale-space detection and generate 128-dimensional descriptors for reliable matching across images, achieving invariance to scaling, rotation, and partial occlusion. More recent deep learning approaches, such as convolutional neural networks (CNNs) like ResNet and transformer-based models like Vision Transformers (ViTs), automate feature learning by hierarchically extracting abstract representations from raw pixels, surpassing hand-crafted methods in accuracy for complex scenes. The detection and description of features underpins numerous applications, including simultaneous localization and mapping (SLAM) in robotics, where robustness to noise and variability is paramount. Challenges in feature extraction include the semantic gap between low-level descriptors and high-level understanding, as well as computational efficiency for real-time processing, driving ongoing research into hybrid and learned feature paradigms. Overall, features remain a foundational element in bridging raw image data to interpretable insights in computer vision systems.

Fundamentals

Definition

In computer vision, digital images are represented as two-dimensional arrays of pixels, where each pixel encodes an intensity value derived from the light captured by an image sensor, often addressing challenges like sampling and quantization to approximate continuous scenes. A feature refers to a distinctive portion of an image—such as a point, line, or region—that exhibits notable properties differing from its local neighborhood and conveys semantically meaningful visual information. The origins of features trace back to early computer vision efforts in the 1960s and 1970s, which initially employed manually crafted rule-based techniques to identify basic image elements for tasks like scene interpretation, gradually progressing toward automated computational methods. Seminal work by Roberts in 1965 introduced automated edge detection on line drawings to perceive three-dimensional solids from two-dimensional projections, laying foundational principles for feature-based analysis. Further advancements, such as Marr and Poggio's 1976 cooperative algorithm for stereo disparity computation, enabled automated matching of corresponding features across views to recover depth, exemplifying the shift to robust detection paradigms. Features play a pivotal role in computer vision by distilling the vast pixel-level details of an image into salient structures, thereby reducing the amount of data to be processed and supporting advanced applications like recognition, tracking, and segmentation. This abstraction allows systems to focus on invariant and informative aspects of scenes, enabling efficient higher-level processing such as object detection and reconstruction without exhaustive analysis of every pixel. Features are typically encoded as feature vectors to quantify their characteristics for comparison and matching across images.

Feature Vectors and Spaces

In computer vision, a feature vector is a numerical representation that encapsulates key attributes of a detected feature in an image, such as its spatial position, intensity value, or local gradient information, enabling quantitative analysis and comparison across images. For instance, at a feature location $(x, y)$, a basic feature vector might be constructed as $\mathbf{f} = [x, y, I(x,y), \nabla I(x,y)]$, where $I(x,y)$ denotes the image intensity and $\nabla I(x,y) = [I_x(x,y), I_y(x,y)]$ represents the gradient components, forming a 5-dimensional vector that captures both location and edge-like properties in grayscale images. This vectorization allows features to be processed algorithmically for tasks like matching or classification, with more advanced descriptors expanding to higher dimensions for robustness. The collection of all such feature vectors populates a feature space, a multidimensional space in which each dimension corresponds to a specific attribute and features are treated as points, facilitating similarity assessments. Similarity between two feature vectors $\mathbf{f}_1$ and $\mathbf{f}_2$ is commonly measured using the Euclidean distance $d(\mathbf{f}_1, \mathbf{f}_2) = \|\mathbf{f}_1 - \mathbf{f}_2\|_2$, which quantifies their proximity and supports operations like nearest-neighbor search or clustering. For example, for a 2D edge feature the space might be 4-dimensional (position plus gradient magnitude and direction), where nearby points indicate similar edge structures. High-dimensional feature spaces, however, can suffer from the curse of dimensionality, where increased dimensions lead to sparsity and computational inefficiency, complicating similarity computations and model training. To address this, techniques like principal component analysis (PCA) are applied, which project feature vectors onto a lower-dimensional subspace by retaining principal components that capture the maximum variance, thereby preserving essential information while reducing computation and storage needs. In PCA-SIFT, for instance, the original 128-dimensional SIFT descriptors are compressed to 36 dimensions via PCA trained on natural image gradients, mitigating dimensionality issues and improving matching performance.
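
A brief sketch of these operations, Euclidean distances between descriptors and PCA reduction from 128 to 36 dimensions, using randomly generated stand-in data and scikit-learn:

```python
# A brief sketch of Euclidean distances between descriptors and PCA reduction
# from 128 to 36 dimensions (the PCA-SIFT figures quoted above), using random
# stand-in data and scikit-learn.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
descriptors = rng.random((1000, 128))                  # stand-in for SIFT-like descriptors

d = np.linalg.norm(descriptors[0] - descriptors[1])    # Euclidean distance in feature space

pca = PCA(n_components=36)
reduced = pca.fit_transform(descriptors)               # project onto 36 principal components
print(d, reduced.shape)                                # distance value and (1000, 36)
```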

Types of Features

Edges

In computer vision, edges represent abrupt changes in image intensity that delineate boundaries between distinct regions, such as those separating objects from their backgrounds or indicating variations in material properties. These discontinuities typically arise from changes in the scene's reflectance, illumination, or depth, forming the foundational elements for higher-level image understanding. Edges possess several key properties that characterize their role as features. Orientation describes the direction perpendicular to the edge, such as horizontal or vertical alignments, which helps in analyzing structural patterns within an image. Strength quantifies the magnitude of the intensity transition, with stronger edges corresponding to more pronounced boundaries. Location refers to the precise pixel coordinates where the change occurs, essential for accurate segmentation. Additionally, edges can be classified as thin or thick; thin edges are idealized one-pixel-wide lines, while thick edges span multiple pixels due to gradual intensity shifts or imaging artifacts. Illustrative examples of edges include horizontal and vertical lines in synthetic test images, such as step functions or ramp edges used to evaluate feature extraction robustness. In real-world scenarios, edges manifest as the sharp outlines surrounding text characters in scanned documents, where high-contrast boundaries enable optical character recognition. Detecting and utilizing edges presents notable challenges, particularly their sensitivity to noise, which can amplify minor fluctuations into spurious features and degrade edge quality. Furthermore, textures within uniform regions, like fabric patterns or foliage, often produce false edges that mimic true boundaries, complicating the distinction between meaningful object contours and irrelevant details.

Corners and Interest Points

In computer vision, corners are defined as discrete points in an image where the intensity varies significantly in multiple, non-collinear directions, often arising at the intersections of edges or at high-curvature locations. These points are also known as interest points, a broader term encompassing localized features that exhibit distinctive characteristics suitable for tasks like image matching and alignment. Unlike extended linear structures, corners provide sub-pixel precision and serve as stable anchors for positioning corresponding elements across images. Corners possess inherent properties that make them valuable for robust feature analysis, including the potential for rotation invariance when derived from orientation-independent measures such as eigenvalue-based responses. They also demonstrate stability under small transformations like affine distortions or minor viewpoint shifts, owing to their localization at regions of high gradient magnitude in multiple directions, which minimizes displacement errors in tracking applications. These attributes ensure repeatability across similar views, though sensitivity to scale changes may require additional processing for broader invariance. Representative examples of corners include the sharp junctions at building corners in images of urban scenes, where orthogonal edges meet to form reliable keypoints. In natural scenes, such as landscapes or cluttered environments, interest points manifest as keypoints around textured elements like tree branches or rock edges, enabling correspondence in panoramic stitching. The concept of corners as interest points traces its early applications to photogrammetry in the 1980s, where operators were developed to detect precise locations for matching and geometric reconstruction from aerial imagery. This foundational work emphasized selecting points with high accuracy potential, influencing subsequent techniques for feature-based mapping.

Blobs and Regions of Interest

In computer vision, blobs are defined as connected regions within an image where properties such as intensity or texture remain approximately constant and differ notably from the surrounding background, enabling the identification of compact, homogeneous areas suitable for tasks like object segmentation. These regions are typically brighter or darker than their neighbors, forming isotropic structures that contrast with more elongated features like ridges. Key properties of blobs include their scale, which represents the size of the region; shape, often approximated as elliptical or circular based on boundary contours or second-moment matrices; and centroid, the central point computed as the intensity-weighted average of positions within the region. Multi-scale blobs extend this by linking such regions across different resolution levels in a scale-space representation, allowing detection of structures invariant to variations in size and supporting hierarchical analysis for robust feature handling. This multi-scale approach enhances invariance to imaging conditions by associating blobs at finer scales with larger counterparts at coarser levels. Representative examples of blobs include faces in portrait images, where the skin region forms a homogeneous intensity blob distinguishable from the background, aiding in initial segmentation for recognition tasks, and tumors in medical imaging, such as exudative lesions in retinal scans or cell nuclei in microscopy images, which appear as distinct intensity clusters for diagnostic analysis. In these contexts, the centroid of a blob often serves as a reference point, potentially aligning with interest points for further processing.

Ridges

In computer vision, ridges are defined as elongated curvilinear structures in an image, characterized as loci where the intensity achieves a local maximum in the direction transverse to the ridge, analogous to the crest of a hill in a cross-sectional profile perpendicular to the ridge direction. This definition emphasizes ridges as one-dimensional features embedded in two-dimensional images, distinguishing them from broader discontinuities like edges, which often form the boundaries enclosing ridge-like regions. Key properties of ridges include their curvature, which captures local bending along the feature; length, representing the extent of the continuous curve; and branching, where ridges may split or merge to form networks. Additionally, ridges relate to medial-axis representations, serving as topological skeletons that preserve the overall shape and connectivity of underlying structures in the image. Ridges are particularly relevant in applications involving vascular or road-network analysis, such as detecting blood vessels in retinal images, where they highlight tubular structures critical for medical diagnostics, or identifying coastlines and roadways in aerial photographs for geographic mapping. A primary challenge in working with ridges is their sensitivity to changes in viewpoint, which can alter the apparent width and intensity profile, complicating consistent feature identification across different conditions. Noise and scale variations further exacerbate this, often requiring careful parameter selection to maintain ridge integrity.

Detection Methods

Edge and Ridge Detection

Edge detection identifies boundaries in images where pixel intensities change abruptly, forming linear discontinuities that delineate object silhouettes or material properties. This process fundamentally relies on computing the first-order derivatives of the image intensity function, as these highlight points of maximum rate of change perpendicular to the edge. The magnitude and direction of the gradient vector at each pixel provide the strength and orientation of potential edges, respectively. Ridge detection complements edge detection by focusing on elongated, linear structures such as vessels, roads, or contours where intensity varies minimally along the structure but sharply across it. It employs second-order derivatives, captured via the Hessian matrix, to identify points where the intensity is locally maximal in the direction transverse to the ridge. Eigenvalue analysis of the Hessian distinguishes ridges from other features: for a 2D image, eigenvalues $\lambda_1$ and $\lambda_2$ (with $|\lambda_1| \geq |\lambda_2|$) satisfy $|\lambda_2| \ll |\lambda_1|$ at ridge points, where $\lambda_1$ reflects the curvature perpendicular to the ridge and $\lambda_2$ aligns with it. Gradient approximation for edge detection often uses discrete convolution operators such as the Sobel kernels, which provide a robust estimate of the partial derivatives while incorporating smoothing. The horizontal component is given by $G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * I$, with the vertical component $G_y$ obtained from the transposed kernel; the gradient magnitude is then $\sqrt{G_x^2 + G_y^2}$.
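
A hedged sketch of this eigenvalue-based ridge criterion, using Gaussian-derivative approximations of the Hessian; the scale and the eigenvalue-ratio threshold are illustrative assumptions.

```python
# A hedged sketch of the eigenvalue-based ridge criterion, using Gaussian
# derivatives for the Hessian; the scale and ratio threshold are assumptions.
import numpy as np
from scipy.ndimage import gaussian_filter

def bright_ridge_mask(image: np.ndarray, sigma: float = 2.0, ratio: float = 0.25):
    img = image.astype(float)
    hxx = gaussian_filter(img, sigma, order=(0, 2))    # d^2 I / dx^2
    hyy = gaussian_filter(img, sigma, order=(2, 0))    # d^2 I / dy^2
    hxy = gaussian_filter(img, sigma, order=(1, 1))    # d^2 I / dx dy
    # Closed-form eigenvalues of the 2x2 Hessian at every pixel.
    mean = (hxx + hyy) / 2
    diff = np.sqrt(((hxx - hyy) / 2) ** 2 + hxy ** 2)
    l1, l2 = mean + diff, mean - diff
    dominant = np.where(np.abs(l1) >= np.abs(l2), l1, l2)
    minor = np.where(np.abs(l1) >= np.abs(l2), l2, l1)
    # Bright ridge: strongly negative curvature across, weak curvature along.
    return (dominant < 0) & (np.abs(minor) < ratio * np.abs(dominant))
```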