Machine vision
from Wikipedia
Early Automatix (now part of Omron) machine vision system Autovision II from 1983 being demonstrated at a trade show. Camera on tripod is pointing down at a light table to produce backlit image shown on screen, which is then subjected to blob extraction.

Machine vision is the technology and methods used to provide imaging-based automatic inspection and analysis for such applications as automatic inspection, process control, and robot guidance, usually in industry. Machine vision refers to many technologies, software and hardware products, integrated systems, actions, methods and expertise. Machine vision as a systems engineering discipline can be considered distinct from computer vision, a form of computer science. It attempts to integrate existing technologies in new ways and apply them to solve real world problems. The term is the prevalent one for these functions in industrial automation environments but is also used for these functions in other environments such as vehicle guidance.

The overall machine vision process includes planning the details of the requirements and project, and then creating a solution. During run-time, the process starts with imaging, followed by automated analysis of the image and extraction of the required information.

Definition

Definitions of the term "Machine vision" vary, but all include the technology and methods used to extract information from an image on an automated basis, as opposed to image processing, where the output is another image. The information extracted can be a simple good-part/bad-part signal, or a more complex set of data such as the identity, position and orientation of each object in an image. The information can be used for such applications as automatic inspection and robot and process guidance in industry, for security monitoring and vehicle guidance.[1][2][3] This field encompasses a large number of technologies, software and hardware products, integrated systems, actions, methods and expertise.[3][4] Machine vision is practically the only term used for these functions in industrial automation applications; the term is less universal for these functions in other environments such as security and vehicle guidance. Machine vision as a systems engineering discipline can be considered distinct from computer vision, a form of basic computer science; machine vision attempts to integrate existing technologies in new ways and apply them to solve real world problems in a way that meets the requirements of industrial automation and similar application areas.[3]: 5 [5] The term is also used in a broader sense by trade shows and trade groups such as the Automated Imaging Association and the European Machine Vision Association. This broader definition also encompasses products and applications most often associated with image processing.[4] The primary uses for machine vision are automatic inspection and industrial robot/process guidance.[6][7]: 6–10 [8] In more recent times the terms computer vision and machine vision have converged to a greater degree.[9]: 13  See glossary of machine vision.

Imaging based automatic inspection and sorting

The primary uses for machine vision are imaging-based automatic inspection and sorting and robot guidance;[6][7]: 6–10  in this section the former is abbreviated as "automatic inspection". The overall process includes planning the details of the requirements and project, and then creating a solution.[10][11] This section describes the technical process that occurs during the operation of the solution.

Methods and sequence of operation

The first step in the automatic inspection sequence of operation is acquisition of an image, typically using cameras, lenses, and lighting that has been designed to provide the differentiation required by subsequent processing.[12][13] MV software packages and programs developed in them then employ various digital image processing techniques to extract the required information, and often make decisions (such as pass/fail) based on the extracted information.[14]
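
As a rough sketch of this run-time sequence, the example below assumes a camera reachable through OpenCV's `cv2.VideoCapture` (industrial cameras more often use vendor SDKs or GenICam-based drivers) and a placeholder `analyze()` routine standing in for the application-specific processing; exposure values and the decision rule are illustrative only.

```python
import cv2

def acquire_frame(device_index=0, exposure=-6):
    """Grab a single frame with manual exposure; property support and units are driver-specific."""
    cap = cv2.VideoCapture(device_index)
    cap.set(cv2.CAP_PROP_AUTO_EXPOSURE, 0.25)  # many drivers use 0.25 to request manual mode
    cap.set(cv2.CAP_PROP_EXPOSURE, exposure)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise RuntimeError("image acquisition failed")
    return frame

def analyze(frame):
    """Placeholder for application-specific image processing returning a pass/fail decision."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return gray.mean() > 100  # trivial stand-in decision rule

if __name__ == "__main__":
    print("PASS" if analyze(acquire_frame()) else "FAIL")
```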

Equipment

The components of an automatic inspection system usually include lighting, a camera or other imager, a processor, software, and output devices.[7]: 11–13 

Imaging

The imaging device (e.g. camera) can either be separate from the main image processing unit or combined with it, in which case the combination is generally called a smart camera or smart sensor.[15][16] Inclusion of the full processing function into the same enclosure as the camera is often referred to as embedded processing.[17] When separated, the connection may be made to specialized intermediate hardware, a custom processing appliance, or a frame grabber within a computer using either an analog or standardized digital interface (Camera Link, CoaXPress).[18][19][20][21] MV implementations also use digital cameras capable of direct connections (without a frame grabber) to a computer via FireWire, USB or Gigabit Ethernet interfaces.[21][22]

While conventional (2D visible light) imaging is most commonly used in MV, alternatives include multispectral imaging, hyperspectral imaging, imaging various infrared bands,[23] line scan imaging, 3D imaging of surfaces and X-ray imaging.[6] Key differentiations within MV 2D visible light imaging are monochromatic vs. color, frame rate, resolution, and whether or not the imaging process is simultaneous over the entire image, making it suitable for moving processes.[24]

Though the vast majority of machine vision applications are solved using two-dimensional imaging, machine vision applications utilizing 3D imaging are a growing niche within the industry.[25][26] The most commonly used method for 3D imaging is scanning-based triangulation, which utilizes motion of the product or imaging system during the imaging process. A laser line is projected onto the surfaces of an object. In machine vision this is accomplished with a scanning motion, either by moving the workpiece or by moving the camera and laser imaging system. The line is viewed by a camera from a different angle; the deviation of the line represents shape variations. Lines from multiple scans are assembled into a depth map or point cloud.[27] Stereoscopic vision is used in special cases involving unique features present in both views of a pair of cameras.[27] Other 3D methods used for machine vision are time of flight and grid based.[27][25] One grid-based method uses a pseudorandom structured light pattern, as employed by the Microsoft Kinect system circa 2012.[28][29]
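
A minimal sketch of the triangulation step follows, under simplifying assumptions (a calibrated scale in mm per pixel, a known angle between the laser plane and the camera's viewing direction, and a bright laser line against a darker background); production scanners use full camera and laser-plane calibration instead of these constants.

```python
import numpy as np

def height_from_line_shift(shift_px, mm_per_px, triangulation_angle_deg):
    """Convert the laser line's lateral shift in the image (pixels) to surface height (mm).
    With the camera viewing the laser plane at the given angle, a height change h shifts
    the observed line by roughly h * tan(angle), so h = shift_mm / tan(angle)."""
    shift_mm = np.asarray(shift_px, dtype=float) * mm_per_px
    return shift_mm / np.tan(np.radians(triangulation_angle_deg))

def profile_from_image(gray, baseline_row, mm_per_px, triangulation_angle_deg):
    """One scan position: find the brightest row per column (the laser line) and
    convert its deviation from the flat-surface baseline into a height profile."""
    peak_rows = np.asarray(gray).argmax(axis=0)
    shifts = peak_rows - float(baseline_row)
    return height_from_line_shift(shifts, mm_per_px, triangulation_angle_deg)

# Stacking one profile per scan step builds the depth map / point cloud described above.
```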

Image processing

After an image is acquired, it is processed.[20] Central processing functions are generally done by a CPU, a GPU, an FPGA or a combination of these.[17] Deep learning training and inference impose higher processing performance requirements.[30] Multiple stages of processing are generally used in a sequence that ends with the desired result. A typical sequence might start with tools such as filters which modify the image, followed by extraction of objects, then extraction of data (e.g. measurements, reading of codes) from those objects, followed by communicating that data, or comparing it against target values to create and communicate "pass/fail" results. Machine vision image processing methods include the following (an illustrative code sketch of such a sequence appears after the list):

  • Stitching/Registration: Combining of adjacent 2D or 3D images.[citation needed]
  • Filtering (e.g. morphological filtering)[31]
  • Thresholding: Thresholding starts with setting or determining a gray value that will be useful for the following steps. The value is then used to separate portions of the image, and sometimes to transform each portion of the image to simply black and white based on whether it is below or above that grayscale value.[32]
  • Pixel counting: counts the number of light or dark pixels[citation needed]
  • Segmentation: Partitioning a digital image into multiple segments to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze.[33][34]
  • Edge detection: finding object edges[35]
  • Color Analysis: Identify parts, products and items using color, assess quality from color, and isolate features using color.[6]
  • Blob detection and extraction: inspecting an image for discrete blobs of connected pixels (e.g. a black hole in a grey object) as image landmarks.[36]
  • Neural network / deep learning / machine learning processing: weighted and self-training multi-variable decision making.[37] Around 2019 there was a large expansion of this, using deep learning and machine learning to significantly expand machine vision capabilities. The most common result of such processing is classification. Examples of classification are object identification, "pass/fail" classification of identified objects, and OCR.[37]
  • Pattern recognition including template matching. Finding, matching, and/or counting specific patterns. This may include location of an object that may be rotated, partially hidden by another object, or varying in size.[38]
  • Barcode, Data Matrix and "2D barcode" reading[39]
  • Optical character recognition: automated reading of text such as serial numbers[40]
  • Gauging/Metrology: measurement of object dimensions (e.g. in pixels, inches or millimeters)[41]
  • Comparison against target values to determine a "pass or fail" or "go/no go" result. For example, with code or bar code verification, the read value is compared to the stored target value. For gauging, a measurement is compared against the proper value and tolerances. For verification of alphanumeric codes, the OCR'd value is compared to the proper or target value. For blemish inspection, the measured size of the blemishes may be compared to the maximums allowed by quality standards.[39]
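
As referenced above, the following is a minimal sketch of one such sequence, assuming OpenCV (`cv2`) and an 8-bit grayscale image of a dark, backlit part on a bright background; the threshold method, tolerance value, and the simple hole-area defect rule are illustrative, not a prescribed recipe.

```python
import cv2
import numpy as np

def inspect_backlit_part(gray, max_hole_area_px=50):
    """Illustrative sequence: filter -> threshold -> blob extraction -> gauging -> pass/fail."""
    # Filtering: suppress sensor noise before segmentation.
    smoothed = cv2.GaussianBlur(gray, (5, 5), 0)

    # Thresholding: Otsu's method picks a separating gray value; inverted so the
    # dark part becomes the white foreground.
    _, binary = cv2.threshold(smoothed, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)

    # Blob extraction: take the largest connected region as the part.
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return {"result": "FAIL", "reason": "no part found"}
    part = max(contours, key=cv2.contourArea)

    # Gauging: bounding-box dimensions of the part, in pixels.
    _, _, width_px, height_px = cv2.boundingRect(part)

    # Defect check: pixels inside the part outline that are not part pixels
    # approximate internal holes (e.g. a bright pinhole in the silhouette).
    mask = np.zeros_like(binary)
    cv2.drawContours(mask, [part], -1, 255, thickness=cv2.FILLED)
    hole_area = int(np.count_nonzero(mask) - np.count_nonzero(cv2.bitwise_and(binary, mask)))

    # Comparison against target values: pass/fail result.
    result = "PASS" if hole_area <= max_hole_area_px else "FAIL"
    return {"result": result, "width_px": int(width_px), "height_px": int(height_px),
            "hole_area_px": hole_area}
```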

Outputs

A common output from automatic inspection systems is pass/fail decisions.[14] These decisions may in turn trigger mechanisms that reject failed items or sound an alarm. Other common outputs include object position and orientation information for robot guidance systems.[6] Additionally, output types include numerical measurement data, data read from codes and characters, counts and classification of objects, displays of the process or results, stored images, alarms from automated space monitoring MV systems, and process control signals.[10][13] This also includes user interfaces, interfaces for the integration of multi-component systems and automated data interchange.[42]

Deep learning

The term deep learning has variable meanings, most of which can be applied to techniques used in machine vision for over 20 years. However, its usage in machine vision began in the late 2010s with the advent of the capability to successfully apply such techniques to entire images in the industrial machine vision space.[43] Conventional machine vision usually requires the "physics" phase of an automatic inspection solution to create reliable, simple differentiation of defects; an example of "simple" differentiation is that the defects are dark and the good parts of the product are light. A common reason some applications were not feasible was that this simple differentiation could not be achieved; deep learning removes this requirement, in essence "seeing" the object more as a human does, making it possible to automate those applications.[43] The system learns from a large number of images during a training phase and then executes the inspection during run-time use, which is called "inference".[43]
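
A rough sketch of the inference side only, assuming a classifier already trained offline and exported to ONNX; the model file name, input size, and class labels here are hypothetical, and OpenCV's `dnn` module is just one of several ways to run such a model at run-time.

```python
import cv2
import numpy as np

# Hypothetical artifacts produced by the training phase.
net = cv2.dnn.readNetFromONNX("defect_classifier.onnx")
CLASS_NAMES = ["good", "defect"]

def classify(image_bgr, input_size=(224, 224)):
    """Run one inference pass and return the predicted class name."""
    blob = cv2.dnn.blobFromImage(image_bgr, scalefactor=1.0 / 255.0,
                                 size=input_size, swapRB=True, crop=False)
    net.setInput(blob)
    scores = net.forward().flatten()
    return CLASS_NAMES[int(np.argmax(scores))]
```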

Imaging based robot guidance

Machine vision commonly provides location and orientation information to a robot to allow the robot to properly grasp the product. This capability is also used to guide motion systems simpler than robots, such as a 1- or 2-axis motion controller.[6] The overall process includes planning the details of the requirements and project, and then creating a solution. This section describes the technical process that occurs during the operation of the solution. Many of the process steps are the same as with automatic inspection, except with a focus on providing position and orientation information as the result.[6]
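
A minimal sketch of this guidance step, assuming a planar work surface, a one-time calibration from four image points with known robot coordinates, and a binary image in which the part is the largest blob; all coordinates below are illustrative, and the reported angle is measured in the image plane, so a real cell would also map orientation into the robot frame.

```python
import cv2
import numpy as np

# One-time calibration: four image points (px) and the matching robot coordinates (mm).
img_pts = np.float32([[100, 80], [1180, 90], [1170, 950], [110, 940]])
robot_pts = np.float32([[0, 0], [400, 0], [400, 300], [0, 300]])
H = cv2.getPerspectiveTransform(img_pts, robot_pts)  # image-plane to work-plane homography

def part_pose(binary):
    """Return (x_mm, y_mm, angle_deg) of the largest blob for a pick operation."""
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    part = max(contours, key=cv2.contourArea)
    (cx, cy), _, angle = cv2.minAreaRect(part)               # blob center (px) and orientation (deg)
    x_mm, y_mm = cv2.perspectiveTransform(np.float32([[[cx, cy]]]), H)[0, 0]
    return float(x_mm), float(y_mm), float(angle)
```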

Market

As recently as 2006, one industry consultant reported that MV represented a $1.5 billion market in North America.[44] However, the editor-in-chief of an MV trade magazine asserted that "machine vision is not an industry per se" but rather "the integration of technologies and products that provide services or applications that benefit true industries such as automotive or consumer goods manufacturing, agriculture, and defense."[4]

from Grokipedia
Machine vision, also known as industrial vision, is a technology that enables computers and automated systems to acquire, process, and interpret visual information from the environment using digital cameras, sensors, and algorithms, often to perform tasks such as inspection, sorting, and robot guidance with precision and speed surpassing human capabilities. It integrates hardware components like cameras, lenses, image sensors, and frame grabbers with software for image analysis, typically focusing on real-time processing in controlled industrial settings rather than general scene understanding. The field emerged in the late 1970s and 1980s as part of broader advancements in computer vision and industrial automation, with early systems emphasizing inspection tasks such as defect detection on assembly lines. By the 1990s, machine vision had become integral to industries like automotive and electronics, driven by improvements in computing power and sensor technology. Recent developments since the 2010s incorporate deep learning and machine learning, enhancing accuracy in complex environments and expanding applications beyond traditional inspection.

Key processes in machine vision include image acquisition (capturing visual data via area-scan or line-scan cameras), preprocessing (enhancing images through noise reduction and contrast adjustment), feature extraction (identifying edges, shapes, or patterns), and decision-making (using algorithms for classification or measurement). Systems often employ sensors sensitive to specific wavelengths for imaging tasks in varying lighting conditions, with resolution and speed tailored to applications like high-volume production. Unlike broader computer vision, which aims for human-like scene comprehension, machine vision prioritizes reliability and efficiency in repetitive, deterministic tasks.

Applications span manufacturing for quality inspection (e.g., detecting flaws in semiconductors), robotics for precise part handling, and agriculture for automated harvesting using RGB-D sensors to identify ripe produce. In logistics, it facilitates barcode reading and inventory tracking, while in pharmaceuticals, it ensures label integrity and dosage verification. Emerging uses include integration with Industry 4.0 for smart factories, where machine vision supports predictive maintenance and adaptive process control.

Definition and Fundamentals

Definition

Machine vision is the technology that enables machines to acquire, process, and interpret visual information from the environment to perform automated tasks, primarily in industrial settings such as inspection, quality control, and guidance of manufacturing processes. It integrates imaging devices with software algorithms to replicate human visual capabilities, allowing systems to detect defects, verify assemblies, or guide robotic operations with high precision. This field emphasizes practical implementation in controlled manufacturing environments to enhance efficiency and reduce human error.

Key characteristics of machine vision include real-time processing for high-speed inspection, robustness against variations in lighting or positioning within structured settings, and seamless integration with hardware components like digital cameras, sensors, and lighting systems. These features ensure reliable performance in repetitive tasks, often surpassing human inspectors in consistency and speed, while generating actionable data for process optimization. For instance, machine vision systems can analyze images at rates exceeding thousands per minute, enabling continuous monitoring in production lines.

Unlike computer vision, which broadly encompasses AI-driven perception for diverse applications including autonomous driving or medical imaging, with a focus on adaptability and complex scene understanding, machine vision prioritizes deterministic, rule-based algorithms tailored for industrial reliability and speed in predefined scenarios. Machine vision systems are typically embedded in automation workflows, emphasizing hardware-software integration for immediate operational feedback rather than generalizable learning models.

The terminology "machine vision" originated in the 1970s through academic research at institutions like MIT's Artificial Intelligence Lab, gaining prominence in the 1980s with the commercialization of vision systems for factory automation. Over time, it has evolved to include synonyms such as "industrial vision" for broader automation contexts and "smart cameras" referring to integrated, self-contained imaging devices developed in the late 1990s that combine capture, processing, and output in compact units. These terms reflect the field's shift toward more accessible, embedded technologies.

Historical Development

The origins of machine vision trace back to the early 1960s, when researchers began exploring image analysis and three-dimensional perception using computers. In 1963, Lawrence G. Roberts completed his PhD thesis at MIT titled "Machine Perception of Three-Dimensional Solids," which demonstrated algorithms for extracting 3D geometric information from 2D images, laying foundational concepts for computer-based visual analysis. This work, conducted at MIT's Artificial Intelligence Laboratory, spurred initial experiments in scene understanding and object recognition, marking the inception of machine vision as a distinct field.

The 1970s saw technological advancements that enabled practical implementations, including the invention of the charge-coupled device (CCD) sensor in 1969 by Willard Boyle and George E. Smith at Bell Labs, which revolutionized image capture by providing high-quality digital sensors for low-light conditions. David Marr's theoretical contributions during this decade further advanced the field; his 1982 book Vision outlined a computational theory of visual perception, proposing a hierarchical framework from primal sketches to 3D models that influenced subsequent machine vision algorithms. Early commercial applications emerged, such as manufacturers' use of vision systems in the late 1970s for component assembly inspection, predating off-the-shelf solutions.

The 1980s marked the commercialization and institutionalization of machine vision. Cognex was founded in 1981 by Robert J. Shillman, an MIT lecturer, becoming the first dedicated machine vision company and developing systems for industrial use. In 1984, the Automated Vision Association was established to promote standards and adoption in imaging technology; it was renamed the Automated Imaging Association (AIA) and in 2021 merged with other groups to form part of the Association for Advancing Automation (A3). These developments coincided with the impact of Moore's law, which exponentially increased processing power, allowing more complex image analysis on affordable hardware.

By the 1990s, machine vision transitioned from analog to digital paradigms, with the late decade seeing widespread adoption of PC-based systems and digital cameras that facilitated software-driven processing and reduced costs. The introduction of the Camera Link interface standard in 2000 by the AIA standardized high-speed data transfer between cameras and computers, enabling reliable integration in industrial environments. The 2000s brought the rise of embedded systems, where compact processors and smart cameras allowed vision technology to be integrated directly into machinery, enhancing real-time applications like robotic guidance and inline inspection.

Core Components

Imaging Hardware

Imaging hardware forms the foundational layer of machine vision systems, responsible for capturing high-quality visual data from the environment. These components include sensors, lighting, lenses, and supporting interfaces, each optimized to meet the demands of industrial inspection, measurement, and guidance tasks. Selection of hardware depends on factors such as resolution requirements, speed, environmental conditions, and the need for precise feature extraction.

Sensors

Machine vision sensors primarily consist of charge-coupled device (CCD) and complementary metal-oxide-semiconductor (CMOS) image sensors, each offering distinct advantages in performance and application suitability. CCD sensors excel in applications requiring high image quality, low noise, and uniform sensitivity across the sensor, as they transfer charge across the chip to a single output amplifier, resulting in superior image quality and reduced noise. In contrast, CMOS sensors integrate amplifiers at each pixel, enabling faster readout speeds, lower power consumption, and on-chip processing capabilities, which make them ideal for high-speed imaging and cost-sensitive deployments. Modern CMOS sensors have largely closed the gap in image quality with CCDs due to advancements in pixel design and noise-reduction techniques.

Sensors are also categorized by configuration: area-scan and line-scan cameras. Area-scan cameras capture a two-dimensional image of a defined field in a single exposure, making them suitable for inspecting stationary or discrete objects, such as components on a conveyor or assembled products, where full-frame detail is needed quickly. Line-scan cameras, however, acquire images line by line as the object or camera moves, forming a complete image through continuous scanning; this configuration is preferred for high-resolution inspection of continuous materials like webs, films, or fast-moving production lines, allowing for extended fields of view without resolution loss. Line-scan systems can achieve higher effective frame rates by exposing new lines while transferring previous data, enhancing throughput in dynamic environments.

Lighting Systems

Effective illumination is critical in machine vision to enhance contrast, reduce shadows, and highlight defects or features that might otherwise be invisible under ambient light. Lighting systems employ various sources, including light-emitting diodes (LEDs), halogen lamps, and structured light projectors, each tailored to specific needs. LEDs dominate modern setups due to their long lifespan (often exceeding 50,000 hours), energy efficiency, low heat generation, and ability to provide stable, uniform illumination without flickering, making them versatile for continuous operation in automated lines. Halogen lamps, such as quartz-halogen variants, offer high-intensity illumination for applications requiring deep penetration or color-critical inspections, though their shorter lifespan and higher power draw limit their use compared to LEDs. Structured light systems, often using LED projectors with patterns like stripes or grids, project known geometric shapes onto surfaces to capture 3D information or detect surface irregularities by analyzing distortions in the reflected light. These techniques significantly improve contrast for feature extraction and defect identification, particularly on reflective or uneven materials, enabling sub-millimeter accuracy in measurements.

Lenses and Optics

Lenses and optical components determine the clarity, perspective, and accuracy of captured images, with key parameters including focal length, depth of field, and distortion characteristics. Focal length dictates the field of view and magnification: shorter focal lengths provide wider views for broad-area inspection, while longer ones enable detailed close-ups for precision tasks. Depth of field, the range of distances over which the image remains in acceptable focus, is inversely related to the lens aperture; higher f-numbers yield greater depth but reduce light intake, balancing sharpness across varying object planes. Distortion correction is essential to maintain geometric accuracy, as barrel or pincushion distortions can skew measurements; software post-processing often compensates, but lens design minimizes inherent aberrations.

Telecentric lenses, a specialized optic, feature an entrance or exit pupil at infinity, ensuring constant magnification regardless of object distance within the depth of field, which eliminates perspective errors and is crucial for applications like dimensional gauging where sub-pixel precision is required. These lenses provide an orthographic view, ideal for inspecting flat or cylindrical parts without size variation due to tilt or position shifts.
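
As a worked illustration of how focal length relates to field of view, the sketch below uses the common thin-lens approximation (sensor width × working distance / focal length), assuming the working distance is much larger than the focal length; real lens selection also weighs depth of field, distortion, and resolution, and the example numbers are illustrative.

```python
def field_of_view_mm(sensor_width_mm, working_distance_mm, focal_length_mm):
    """Approximate horizontal field of view for a conventional (non-telecentric) lens."""
    return sensor_width_mm * working_distance_mm / focal_length_mm

def required_focal_length_mm(sensor_width_mm, working_distance_mm, fov_mm):
    """Invert the same approximation to choose a lens for a desired field of view."""
    return sensor_width_mm * working_distance_mm / fov_mm

# Example: an 8.8 mm wide sensor at 300 mm working distance with a 25 mm lens
# sees roughly 8.8 * 300 / 25 ≈ 106 mm across the scene.
print(round(field_of_view_mm(8.8, 300, 25), 1))
```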

Supporting Hardware

Supporting hardware facilitates the reliable transfer and integration of image data into processing pipelines. Frame grabbers are specialized cards or devices that capture and buffer digital images from sensors, synchronizing acquisition with external triggers and enabling real-time processing in high-bandwidth scenarios. Standardized interfaces such as GigE Vision and USB3 Vision ensure interoperability across vendors. GigE Vision leverages Ethernet for cable lengths up to 100 meters, supporting multi-camera synchronization over networks with bandwidths up to 1 Gbps per link, suitable for distributed systems. USB3 Vision provides plug-and-play connectivity with transfer rates exceeding 5 Gbps over shorter distances (up to 5-10 meters), offering low-cost integration without dedicated frame grabbers for most applications. As of 2025, higher-speed options like 10GigE Vision (up to 10 Gbps) and CoaXPress 2.0 (up to 12.5 Gbps) are increasingly adopted for demanding applications requiring ultra-high frame rates and resolutions.

Environmental considerations are paramount for hardware durability in industrial settings, where dust, moisture, vibration, and temperature extremes prevail. IP-rated enclosures, such as IP65 or IP67, protect cameras and optics against ingress of solids and liquids; IP65 shields against dust and low-pressure water jets, while IP67 withstands temporary immersion up to 1 meter. These rugged housings, often with cooling fins or fans, ensure operational reliability in harsh environments like washdown areas or outdoor automation.
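
To make the bandwidth figures above concrete, a back-of-the-envelope sketch (assuming 8 bits per pixel, no compression, and ignoring protocol overhead; the camera resolution and frame rate are illustrative) compares a camera's raw data rate against the nominal link speeds just mentioned.

```python
def data_rate_gbps(width_px, height_px, fps, bits_per_pixel=8):
    """Raw, uncompressed camera data rate in gigabits per second."""
    return width_px * height_px * fps * bits_per_pixel / 1e9

# Nominal link speeds for the interfaces discussed above (Gb/s).
LINKS_GBPS = {"GigE Vision": 1.0, "USB3 Vision": 5.0,
              "10GigE Vision": 10.0, "CoaXPress 2.0": 12.5}

rate = data_rate_gbps(2048, 2048, 60)  # e.g. a 4-megapixel camera at 60 frames/s
for name, capacity in LINKS_GBPS.items():
    verdict = "fits within" if rate < capacity else "exceeds"
    print(f"{name}: {rate:.2f} Gb/s {verdict} ~{capacity} Gb/s")
```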

Image Acquisition and Processing

Image acquisition in machine vision forms the initial stage of the software pipeline, where raw images are captured to ensure high-quality data for analysis. Synchronization of camera triggers is essential to align image capture with dynamic processes, such as object movement on assembly lines, often implemented via hardware signals or network-based commands in protocols like GigE Vision to achieve sub-millisecond precision across multiple cameras. Exposure control dynamically adjusts shutter duration and sensor gain to balance brightness and noise under inconsistent lighting, using algorithms that evaluate scene histograms to prevent saturation or loss of detail in high-contrast environments. Resolution selection optimizes pixel dimensions, typically ranging from VGA to multi-megapixel, based on the trade-off between spatial detail needed for fine measurements and computational efficiency for real-time processing.

Pre-processing refines captured images by mitigating distortions and enhancing relevant features. Noise reduction applies techniques like Gaussian filtering, which convolves the image with a symmetric kernel to suppress additive noise while smoothing uniform areas; the filter response is given by

G(x,y) = \frac{1}{2\pi\sigma^2} \exp\left(-\frac{x^2 + y^2}{2\sigma^2}\right),

where \sigma determines the degree of blurring, effectively reducing Gaussian noise. Edge detection employs operators such as the Sobel operator, which approximates the image gradient through 3×3 convolutions:

G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} * I, \quad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} * I,

where I denotes the input image.
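
A minimal sketch of these two pre-processing steps using OpenCV, where `cv2.GaussianBlur` applies the Gaussian kernel defined above and `cv2.Sobel` computes the G_x and G_y gradient images; the σ value and kernel size are illustrative.

```python
import cv2

def preprocess(gray, sigma=1.5):
    """Denoise with a Gaussian filter, then estimate edge strength with the Sobel operator."""
    smoothed = cv2.GaussianBlur(gray, ksize=(0, 0), sigmaX=sigma)  # kernel size derived from sigma
    gx = cv2.Sobel(smoothed, cv2.CV_32F, 1, 0, ksize=3)            # horizontal gradient G_x
    gy = cv2.Sobel(smoothed, cv2.CV_32F, 0, 1, ksize=3)            # vertical gradient G_y
    magnitude = cv2.magnitude(gx, gy)                              # |∇I| = sqrt(G_x² + G_y²)
    return smoothed, magnitude
```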