Object recognition (cognitive science)
Visual object recognition refers to the ability to identify the objects in view based on visual input. One important signature of visual object recognition is "object invariance", or the ability to identify objects across changes in the detailed context in which objects are viewed, including changes in illumination, object pose, and background context.
Neuropsychological evidence suggests four specific stages in the process of object recognition: (1) processing of basic object components, such as color, depth, and form; (2) grouping of these components on the basis of similarity, providing information about distinct edges and allowing figure-ground segregation; (3) matching the visual representation with structural descriptions stored in memory; and (4) attaching semantic attributes to the visual representation, providing meaning and thereby recognition.
Within these stages, there are more specific processes that take place to complete the different processing components. In addition, other existing models have proposed integrative hierarchies (top-down and bottom-up), as well as parallel processing, as opposed to this general bottom-up hierarchy.
Visual recognition processing is typically viewed as a bottom-up hierarchy in which information is processed sequentially with increasing complexity. In this hierarchy, lower-level cortical processors, such as the primary visual cortex, sit at the bottom, while higher-level cortical processors, such as the inferotemporal cortex (IT), sit at the top, where visual recognition is facilitated. A widely recognized bottom-up hierarchical theory is James DiCarlo's "untangling" account, in which each stage of the hierarchically arranged ventral visual pathway performs operations that gradually transform object representations into an easily extractable format. In contrast, an increasingly popular account of recognition processing is top-down processing. One model, proposed by Moshe Bar (2003), describes a "shortcut" in which early visual inputs are sent, partially analyzed, from the early visual cortex to the prefrontal cortex (PFC). Possible interpretations of the crude visual input are generated in the PFC and then sent to the inferotemporal cortex (IT), activating relevant object representations that are then incorporated into the slower, bottom-up process. This shortcut is meant to minimize the number of object representations required for matching, thereby facilitating object recognition. Lesion studies support this proposal: individuals with PFC lesions show slower response times, suggesting reliance on bottom-up processing alone.
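The bottom-up "untangling" idea can be caricatured as a pipeline of stages, each re-coding the previous stage's output into a form from which identity is easier to read out. The sketch below is purely illustrative: the stages, features, and templates are invented for the example and are not a model of real cortex or of DiCarlo's actual account.

```python
# Illustrative sketch of a bottom-up recognition hierarchy: each stage
# transforms the previous representation; a final readout matches templates.
# All stage definitions here are hypothetical.

def edges(image):
    # "V1"-like stage: crude horizontal and vertical edge-energy counts.
    h = sum(abs(image[r][c] - image[r][c + 1])
            for r in range(len(image)) for c in range(len(image[0]) - 1))
    v = sum(abs(image[r][c] - image[r + 1][c])
            for r in range(len(image) - 1) for c in range(len(image[0])))
    return (h, v)

def shape_code(edge_counts):
    # Intermediate stage: recombine edge responses into a code that is
    # less sensitive to orientation (total energy, orientation imbalance).
    h, v = edge_counts
    return (h + v, abs(h - v))

def classify(code, templates):
    # "IT"-like readout: nearest stored template wins.
    total, diff = code
    return min(templates,
               key=lambda name: abs(templates[name][0] - total)
                                + abs(templates[name][1] - diff))

def recognize(image, templates):
    return classify(shape_code(edges(image)), templates)
```

Note how a vertical and a horizontal bar produce different "V1" responses but the same intermediate code, a toy version of the representation becoming progressively easier to read out.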
A significant aspect of object recognition is object constancy: the ability to recognize an object across varying viewing conditions, including changes in object orientation, lighting, and object variability (size, color, and other within-category differences). To achieve object constancy, the visual system must extract a commonality in the object description across different viewpoints and retinal descriptions.[9] In one study, participants performed categorization and recognition tasks while undergoing functional magnetic resonance imaging (fMRI), which revealed increased blood flow, indicating activation, in specific brain regions. In the categorization task, participants classified objects shown from canonical or unusual views as either indoor or outdoor objects. In the recognition task, participants were presented with images they had viewed previously; half were shown in the same orientation as before, while the other half were shown from the opposing viewpoint. The brain regions implicated in mental rotation, such as the ventral and dorsal visual pathways and the prefrontal cortex, showed the greatest increase in blood flow during these tasks, suggesting that they are critical for recognizing objects from multiple angles. Several theories have been proposed to explain how object constancy may be achieved for the purpose of object recognition, including viewpoint-invariant, viewpoint-dependent, and multiple-views theories.[citation needed]
Viewpoint-invariant theories suggest that object recognition is based on structural information, such as individual parts, allowing recognition to take place regardless of the object's viewpoint. Accordingly, recognition is possible from any viewpoint, as individual parts of an object can be rotated to fit any particular view.[10][citation needed] This form of analytical recognition requires little memory, as only structural parts need to be encoded; these parts, together with their interrelations and mental rotation, can produce multiple object representations.[10][citation needed] In one study, participants were presented with one encoding view of each of 24 preselected objects, as well as five filler images. Objects were then re-presented in the central visual field at either the same orientation as, or a different orientation from, the original image, and participants were asked to report whether the same or a different depth-orientation view of each object was presented. The same procedure was then carried out with images presented to the left or right visual field. Viewpoint-dependent priming was observed when test views were presented directly to the right hemisphere, but not when they were presented directly to the left hemisphere. The results support the model that objects are stored in a viewpoint-dependent manner, because they did not depend on whether the same or a different set of parts could be recovered from the different-orientation views.
This model, proposed by Marr and Nishihara (1978), states that object recognition is achieved by matching 3-D model representations obtained from the visual object with 3-D model representations stored in memory as vertical shape percepts.[clarification needed] Using computer programs and algorithms, Yi Yungfeng (2009) demonstrated the human brain's ability to mentally construct 3D images from only the 2D images that appear on the retina. Their model also demonstrates a high degree of shape constancy conserved between 2D images, which allows the 3D image to be recognized. The 3-D model representations obtained from the object are formed by first identifying its concavities, which separate the stimulus into individual parts. Recent research suggests that an area of the brain known as the caudal intraparietal area (CIP) is responsible for storing the slant and tilt of a planar surface, which allow for concavity recognition. Rosenberg et al. implanted monkeys with a scleral search coil to monitor eye position while simultaneously recording the activity of single neurons within the CIP. During the experiment, monkeys sat 30 cm from an LCD screen that displayed the visual stimuli. Binocular disparity cues were produced by rendering the stimuli as green-red anaglyphs, with tilt values ranging from 0 to 330 degrees. A single trial consisted of a fixation point followed by the presentation of a stimulus for 1 second. Neuronal activity was then recorded through the surgically inserted microelectrodes. The finding that single neurons respond to specific concavities suggests that the axis of each concavity-defined part of an object is held in memory stores. Identifying the principal axis of the object assists in the normalization process, via mental rotation, that is required because only the canonical description of the object is stored in memory.
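Slant and tilt of a planar surface, the quantities the CIP neurons are described as encoding, have standard geometric definitions: slant is the angle between the surface normal and the line of sight, and tilt is the image-plane direction of that normal. A minimal sketch of computing them from a surface normal, using those standard definitions rather than anything taken from the study itself:

```python
import math

def slant_tilt(normal):
    # Slant: angle (degrees) between the surface normal and the line of
    # sight, taken here as the z axis. 0 deg = frontoparallel surface.
    # Tilt: direction (degrees) of the normal's projection in the image
    # plane, wrapped into [0, 360).
    nx, ny, nz = normal
    mag = math.sqrt(nx * nx + ny * ny + nz * nz)
    slant = math.degrees(math.acos(nz / mag))
    tilt = math.degrees(math.atan2(ny, nx)) % 360.0
    return slant, tilt
```

A frontoparallel surface (normal pointing straight at the viewer) has zero slant; tilting the normal sideways by 45 degrees gives a 45-degree slant with tilt along the corresponding image-plane direction.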
Recognition is achieved when the observed object's viewpoint is mentally rotated to match the stored canonical description.[citation needed]
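This normalization-by-rotation step can be sketched computationally: search over candidate rotations of the observed view and keep the one that best matches the stored canonical description. The 2-D version below, with known point correspondences and a brute-force angle search, is a deliberately simplified stand-in for whatever process the theory posits.

```python
import math

def rotate(points, deg):
    # Rotate a list of 2-D points counterclockwise by `deg` degrees.
    a = math.radians(deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

def mismatch(view, canonical):
    # Sum of point-to-point distances (assumes known correspondence).
    return sum(math.dist(p, q) for p, q in zip(view, canonical))

def normalize_by_rotation(view, canonical, step=1):
    # "Mental rotation": try candidate angles, keep the best match.
    best = min(range(0, 360, step),
               key=lambda d: mismatch(rotate(view, d), canonical))
    return best, mismatch(rotate(view, best), canonical)
```

If the observed view is the canonical shape rotated by 90 degrees, the search recovers a 90-degree correcting rotation with near-zero residual mismatch, i.e. the view is recognized as the stored object.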
An extension of Marr and Nishihara's model, the recognition-by-components theory proposed by Biederman (1987), suggests that the visual information gained from an object is divided into simple geometric components, such as blocks and cylinders, known as "geons" (geometric ions), which are then matched with the most similar object representation stored in memory to provide the object's identification (see Figure 1).
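At its simplest, this matching step can be sketched as comparing the set of geons extracted from the image against stored catalogs of geons per object. The geon names and catalog below are invented for illustration, and the sketch omits an essential part of Biederman's theory, the spatial relations between geons (which is how, for example, a mug and a pail can share the same components yet differ in where the handle attaches).

```python
# Toy sketch of recognition-by-components: objects as sets of geons,
# matched by overlap. Geon labels and the catalog are hypothetical.

def geon_similarity(observed, stored):
    # Jaccard overlap between two sets of components.
    observed, stored = set(observed), set(stored)
    return len(observed & stored) / len(observed | stored)

def identify(observed, catalog):
    # Return the stored representation with the greatest overlap.
    return max(catalog, key=lambda name: geon_similarity(observed, catalog[name]))
```

A real implementation would also encode relations (above, side-attached, larger-than) between geon pairs, since the arrangement of components carries much of the identity information.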
