Predictive coding

from Wikipedia

In neuroscience, predictive coding (also known as predictive processing) is a theory of brain function which postulates that the brain is constantly generating and updating a "mental model" of the environment. According to the theory, such a mental model is used to predict input signals from the senses, which are then compared with the actual input signals from those senses. Predictive coding is a member of a wider set of theories that follow the Bayesian brain hypothesis.

Origins


Theoretical ancestors to predictive coding date back as early as 1860 with Helmholtz's concept of unconscious inference.[1] Unconscious inference refers to the idea that the human brain fills in visual information to make sense of a scene. For example, if something is relatively smaller than another object in the visual field, the brain uses that information as a likely cue of depth, such that the perceiver ultimately (and involuntarily) experiences depth. The understanding of perception as the interaction between sensory stimuli (bottom-up) and conceptual knowledge (top-down) continued to be established by Jerome Bruner who, starting in the 1940s, studied the ways in which needs, motivations and expectations influence perception, research that came to be known as 'New Look' psychology. In 1981, McClelland and Rumelhart examined the interaction between processing features (lines and contours) which form letters, which in turn form words.[2] While the features suggest the presence of a word, they found that when letters were situated in the context of a word, people were able to identify them faster than when they were situated in a non-word without semantic context. McClelland and Rumelhart's parallel processing model describes perception as the meeting of top-down (conceptual) and bottom-up (sensory) elements.

In the late 1990s, the idea of top-down and bottom-up processing was translated into a computational model of vision by Rao and Ballard.[3] Their paper demonstrated that there could be a generative model of a scene (top-down processing), which would receive feedback via error signals (how much the visual input varied from the prediction), which would subsequently lead to updating the prediction. The computational model was able to replicate well-established receptive field effects, as well as less understood extra-classical receptive field effects such as end-stopping.

In 2004,[4] Rick Grush proposed a model of neural perceptual processing, the emulation theory of representation, according to which the brain constantly generates predictions based on a generative model (what Grush called an ‘emulator’) and compares that prediction to the actual sensory input. The difference, or ‘sensory residual’, would then be used to update the model so as to produce a more accurate estimate of the perceived domain. On Grush’s account, the top-down and bottom-up signals would be combined in a way sensitive to the expected noise (aka uncertainty) in the bottom-up signal, so that in situations in which the sensory signal was known to be less trustworthy, the top-down prediction would be given greater weight, and vice versa. The emulation framework was also shown to be hierarchical, with modality-specific emulators providing top-down expectations for sensory signals as well as higher-level emulators providing expectations of the distal causes of those signals. Grush applied the theory to visual perception, visual and motor imagery, language, and theory of mind phenomena.

General framework

[Figure: Conceptual schematic of predictive coding with two levels]

Predictive coding was initially developed as a model of the sensory system, where the brain solves the problem of modelling distal causes of sensory input through a version of Bayesian inference. It assumes that the brain maintains active internal representations of the distal causes, which enable it to predict the sensory inputs.[5] A comparison between predictions and sensory input yields a difference measure (e.g. prediction error, free energy, or surprise) which, if it is sufficiently large beyond the levels of expected statistical noise, will cause the internal model to update so that it better predicts sensory input in the future.

If, instead, the model accurately predicts driving sensory signals, activity at higher levels cancels out activity at lower levels, and the internal model remains unchanged. Thus, predictive coding inverts the conventional view of perception as a mostly bottom-up process, suggesting that it is largely constrained by prior predictions, where signals from the external world only shape perception to the extent that they are propagated up the cortical hierarchy in the form of prediction error.

Prediction errors can be used not only for inferring distal causes, but also for learning them via neural plasticity.[3] Here the idea is that the representations learned by cortical neurons reflect the statistical regularities in the sensory data. This idea is also present in many other theories of neural learning, such as sparse coding, the central difference being that in predictive coding not only the connections to sensory inputs (i.e., the receptive field) are learned, but also the top-down predictive connections from higher-level representations. This makes predictive coding similar to some other models of hierarchical learning, such as Helmholtz machines and deep belief networks, which, however, employ different learning algorithms. Thus, the dual use of prediction errors for both inference and learning is one of the defining features of predictive coding.[6]
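This dual use of errors can be made concrete with a small numerical sketch. The following Python snippet is only an illustration of the idea, loosely in the spirit of Rao and Ballard's linear model: a single level infers latent causes by descending the squared prediction error, and the same residual error then drives a small update of the generative (top-down) weights. The dimensions, learning rates, and toy input are assumptions made for the example, not part of any published implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

n_input, n_latent = 16, 4
W = rng.normal(scale=0.3, size=(n_input, n_latent))  # generative (top-down) weights

def infer(x, W, n_steps=100, lr=0.1):
    """Inference: adjust the latent representation r to reduce the prediction error."""
    r = np.zeros(n_latent)
    for _ in range(n_steps):
        error = x - W @ r          # prediction error: input minus top-down prediction
        r += lr * (W.T @ error)    # gradient step on the squared error w.r.t. r
    return r, error

def learn_step(x, W, lr_w=0.01):
    """Learning: nudge the generative weights using the same residual error."""
    r, error = infer(x, W)
    return W + lr_w * np.outer(error, r)   # plasticity driven by the prediction error

x = rng.normal(size=n_input)               # a toy sensory input
_, error_before = infer(x, W)
W = learn_step(x, W)
_, error_after = infer(x, W)
print(np.linalg.norm(error_before), np.linalg.norm(error_after))  # error shrinks
```

The same error term appears in both functions: fast changes to the representation implement inference, while slow changes to the weights implement learning.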

Precision weighting


The precision of incoming sensory inputs reflects their predictability, given signal noise and other factors. Estimates of precision are crucial for effectively minimizing prediction error, as they allow sensory inputs and predictions to be weighted according to their reliability.[7] For instance, the noise in the visual signal varies between dawn and dusk, such that greater conditional confidence is assigned to sensory prediction errors in broad daylight than at nightfall.[8] Similar approaches are successfully used in other algorithms performing Bayesian inference, e.g., for Bayesian filtering in the Kalman filter.
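The effect of precision weighting can be illustrated with a one-step Gaussian update, analogous to the measurement step of the Kalman filter mentioned above. The function and the numbers below are a hedged sketch rather than a model from the cited literature: a top-down prediction and a sensory observation are combined in proportion to their precisions (inverse variances), so the noisier signal contributes less.

```python
def precision_weighted_estimate(prior_mean, prior_var, obs, obs_var):
    """Combine a top-down prediction and a sensory observation,
    each weighted by its precision (inverse variance)."""
    prior_precision = 1.0 / prior_var
    obs_precision = 1.0 / obs_var
    posterior_mean = (prior_precision * prior_mean + obs_precision * obs) / (
        prior_precision + obs_precision
    )
    posterior_var = 1.0 / (prior_precision + obs_precision)
    return posterior_mean, posterior_var

# Daylight: sensory noise is low, so the estimate tracks the input closely.
print(precision_weighted_estimate(prior_mean=0.0, prior_var=1.0, obs=2.0, obs_var=0.1))
# Nightfall: sensory noise is high, so the prior prediction dominates.
print(precision_weighted_estimate(prior_mean=0.0, prior_var=1.0, obs=2.0, obs_var=10.0))
```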

It has also been proposed that such weighting of prediction errors in proportion to their estimated precision is, in essence, attention,[9] and that the process of devoting attention may be neurobiologically accomplished by ascending reticular activating systems (ARAS) optimizing the “gain” of prediction error units. However, it has also been argued that precision weighting can only explain “endogenous spatial attention”, but not other forms of attention.[10]

Active inference


The same principle of prediction error minimization has been used to provide an account of behavior in which motor actions are not commands but descending proprioceptive predictions. In this scheme of active inference, classical reflex arcs are coordinated so as to selectively sample sensory input in ways that better fulfill predictions, thereby minimizing proprioceptive prediction errors.[9] Indeed, Adams et al. (2013) review evidence suggesting that this view of hierarchical predictive coding in the motor system provides a principled and neurally plausible framework for explaining the agranular organization of the motor cortex.[11] This view suggests that “perceptual and motor systems should not be regarded as separate but instead as a single active inference machine that tries to predict its sensory input in all domains: visual, auditory, somatosensory, interoceptive and, in the case of the motor system, proprioceptive.”[11]

Dual process theory


The dual process theory of automatic and conscious cognitive processes lays the groundwork for understanding the operation of the human mind in psychology [12][13][14]. Ideas related to dual process theories can be traced to William James's work in 1890, which distinguished between habitual processes, based on automatic associations formed through experience, and voluntary processes, involving more effortful, hierarchical, and conscious reasoning that reflects a more willful process. These initial concepts of automatic and voluntary cognitive processes map onto modern dual process theory, as conceptualized by many psychologists [14][15].

Some researchers have drawn parallels between dual process theories and predictive coding. In this view, automatic processing (often termed "system 1") is compared to the initial processing of sensory information, whereas deliberate processing ("system 2") is compared to the active maintenance of internal representations and the comparison between those representations and sensory input. Specifically, as noted by Jonathan Evans [16], system 2 of the dual process theory, which is characterized by a reflective process that allows an individual to override the intuitive process (i.e., the initial perception of the input), closely aligns with the computation of the prediction error (i.e., the discrepancy between the expected value and the actual outcome). Although the process of active maintenance of the internal representation, the constant monitoring of the conflict between new experiences and the internal representation, and the shifting of the posterior (i.e., the updated belief after combining the prior belief and the actual evidence) are not uniquely differentiated in the dual-process model, the shared mechanism that requires a more deliberate and effortful cognitive process is captured by system 2.

Mental representation


Representation in the fields of cognitive psychology and philosophy refers to the mental encoding of external stimuli. Specifically, it is defined as a hypothetical internal cognitive symbol that represents external reality or its abstractions (for more information about representation, see Mental Representation). In philosophy, mental representation is treated as a mediator between the real world and the observer (for more information, see Philosophy of Mind). The field of cognitive psychology has attempted to test this concept using neuroimaging methods, with the most common method being functional magnetic resonance imaging (fMRI), which involves measuring people's brain activity in response to external stimuli (such as showing them a picture)[17].

In computational neuroscience, active predictive coding models typically include representations of visual stimuli as well as representations of goal-directed behavior. These representations interact to adapt to and learn new concepts[18]. Some work has attempted to link findings in neuroscience and cognitive psychology by examining how prediction errors change over time. Changes in prediction error have been interpreted as evidence that internal representations are constantly adjusted in response to new sensory input, so that they better match events in the external world.

Neural theory in predictive coding


Much of the early work that applied a predictive coding framework to neural mechanisms came from sensory processing, particularly in the visual cortex.[3][19] These theories assume that the cortical architecture can be divided into hierarchically stacked levels, which correspond to different cortical regions. Every level is thought to house (at least) two types of neurons: "prediction neurons", which aim to predict the bottom-up inputs to the current level, and "error neurons", which signal the difference between the input and the prediction. These are thought to be mainly deep and superficial pyramidal neurons, respectively, while interneurons take up different functions.[19]

Within cortical regions, there is evidence that different cortical layers may facilitate the integration of feedforward and feedback projections across hierarchies.[19] These cortical layers have therefore been assumed to be central in the computation of predictions and prediction errors, with the basic unit being a cortical column.[19][20] A common view is that[19][21]

  • error neurons reside in supragranular layers 2 and 3, since these neurons show sparse activity and tend to respond to unexpected events,
  • prediction neurons reside in deep layer 5, where many neurons exhibit dense responses,
  • precision weighting might be implemented through diverse mechanisms, such as neuromodulators or long-range projections from other brain areas (e.g., thalamus).

However, thus far there is no consensus on how the brain most likely implements predictive coding. Some theories, for example, propose that supragranular layers contain not only error neurons but also prediction neurons.[19] It is also still debated through which mechanisms error neurons might compute the prediction error.[22] Since prediction errors can be both negative and positive, but biological neurons can only show positive activity, more complex error coding schemes are required. To circumvent this problem, more recent theories have proposed that error computation might instead take place in neural dendrites.[23][24] The neural architecture and computations proposed in these dendritic theories are similar to what has been proposed in the Hierarchical temporal memory theory of cortex.
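As a toy illustration of the sign problem mentioned above, one simple scheme (one of several that have been proposed, and not a claim about any specific circuit model) splits each signed error into two rectified, non-negative channels that together preserve the information in the signed quantity:

```python
import numpy as np

def rectified_error_units(x, prediction):
    """Split a signed prediction error into two non-negative 'populations':
    one signals input above the prediction, the other input below it."""
    error = x - prediction
    positive_error = np.maximum(error, 0.0)   # input exceeded the prediction
    negative_error = np.maximum(-error, 0.0)  # prediction exceeded the input
    return positive_error, negative_error

x = np.array([1.0, 0.2, 0.5])
prediction = np.array([0.5, 0.5, 0.5])
pos, neg = rectified_error_units(x, prediction)
print(pos, neg, pos - neg)  # pos - neg recovers the signed error downstream
```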

Applying predictive coding


Perception


The empirical evidence for predictive coding is most robust for perceptual processing. As early as 1999, Rao and Ballard proposed a hierarchical visual processing model in which higher-order visual cortical areas send down predictions and the feedforward connections carry the residual errors between the predictions and the actual lower-level activities.[3] According to this model, each level in the hierarchical model network (except the lowest level, which represents the image) attempts to predict the responses at the next lower level via feedback connections, and the error signal is used to correct the estimate of the input signal at each level concurrently.[3] Emberson et al. established top-down modulation in infants using a cross-modal audiovisual omission paradigm, showing that even infant brains form expectations about future sensory input, carried downstream from visual cortices, and are capable of expectation-based feedback.[25] Functional near-infrared spectroscopy (fNIRS) data showed that infant occipital cortex responded to unexpected visual omission (with no visual information input) but not to expected visual omission. These results establish that in a hierarchically organized perception system, higher-order neurons send down predictions to lower-order neurons, which in turn send back up prediction error signals.

Interoception


There have been several competing models for the role of predictive coding in interoception.

In 2013, Anil Seth proposed that our subjective feeling states, otherwise known as emotions, are generated by predictive models that are actively built out of causal interoceptive appraisals.[26] In relation to how we attribute internal states of others to causes, Sasha Ondobaka, James Kilner, and Karl Friston (2015) proposed that the free energy principle requires the brain to produce a continuous series of predictions with the goal of reducing the amount of prediction error that manifests as “free energy”.[27] These errors are then used to model anticipatory information about what the state of the outside world will be and attributions of causes of that world state, including understanding of causes of others’ behavior. This is especially necessary because, to create these attributions, our multimodal sensory systems need interoceptive predictions to organize themselves. Therefore, Ondobaka posits that predictive coding is key to understanding other people's internal states.

In 2015, Lisa Feldman Barrett and W. Kyle Simmons proposed the Embodied Predictive Interoception Coding model, a framework that unifies Bayesian active inference principles with a physiological framework of corticocortical connections.[28] Using this model, they posited that agranular visceromotor cortices are responsible for generating predictions about interoception, thus, defining the experience of interoception.

Contrary to the inductive notion that emotion categories are biologically distinct, Barrett proposed later the theory of constructed emotion, which is the account that a biological emotion category is constructed based on a conceptual category—the accumulation of instances sharing a goal.[29][30] In a predictive coding model, Barrett hypothesizes that, in interoception, our brains regulate our bodies by activating "embodied simulations" (full-bodied representations of sensory experience) to anticipate what our brains predict that the external world will throw at us sensorially and how we will respond to it with action. These simulations are either preserved if, based on our brain's predictions, they prepare us well for what actually subsequently occurs in the external world, or they, and our predictions, are adjusted to compensate for their error in comparison to what actually occurs in the external world and how well-prepared we were for it. Then, in a trial-error-adjust process, our bodies find similarities in goals among certain successful anticipatory simulations and group them together under conceptual categories. Every time a new experience arises, our brains use this past trial-error-adjust history to match the new experience to one of the categories of accumulated corrected simulations that it shares the most similarity with. Then, they apply the corrected simulation of that category to the new experience in the hopes of preparing our bodies for the rest of the experience. If it does not, the prediction, the simulation, and perhaps the boundaries of the conceptual category are revised in the hopes of higher accuracy next time, and the process continues. Barrett hypothesizes that, when prediction error for a certain category of simulations for x-like experiences is minimized, what results is a correction-informed simulation that the body will reenact for every x-like experience, resulting in a correction-informed full-bodied representation of sensory experience—an emotion. In this sense, Barrett proposes that we construct our emotions because the conceptual category framework our brains use to compare new experiences, and to pick the appropriate predictive sensory simulation to activate, is built on the go.

Human development


From a developmental perspective, predictive coding has been examined in relation to the biological maturation of the brain systems involved in sensation and cognition, highlighting how the brain’s capacity to generate and update predictions evolves across early life. Evidence from neonatal studies demonstrates that prediction error mechanisms emerge very early in life: event-related potential recordings (patterns of brain activity observed in relation to specific events) show that even newborns differentiate between expected and unexpected sounds, suggesting the presence of a very basic form of sensory prediction [31][32][33]. As children grow, these predictive capacities become more sophisticated as their brains mature and they gain experience (see more detailed information in Development of the nervous system). Research across later human developmental stages indicates that the development of abilities to direct and control attention and the inferential reasoning process occur together, as repeated interactions with the environment strengthen the brain’s internal models of sensory regularities [34][35][36](for more information on the sophistication and specialization of neural connections across developmental stages, see Synaptic pruning). This developmental trajectory has been described as a shift from mainly reactive sensory processing in infancy toward proactive, model-based perception in childhood. As networks of connected brain regions mature, they support top-down modulation, in which prior knowledge shapes how sensory information is processed, and precision weighting, which refers to how strongly prior expectations versus new sensory input are taken into account (see Precision Weighting section of this Wikipedia page). Altogether, this line of work has been interpreted as suggesting that predictive coding may contribute to the development of efficient perception, attention, and learning across childhood, providing a computational framework for understanding how experience shapes the developing brain.

Studies of predictive coding in a developmental context often use repetition suppression (a reduction in a specific pattern of brain activity observed when someone is exposed to the same stimulus repeatedly), as it is commonly treated as a measure of reduced prediction error. In other words, diminished prediction error would indicate that the participant has been updating their mental representation (i.e., expectation) to be closer to the presented stimuli. The development of repetition suppression has therefore been treated as a proxy for the development of predictive inference and mental representation. The application of predictive coding to human development is not without its limitations. For instance, most studies testing predictive coding through neural measures (e.g., event-related brain potentials) require responses from the participant, which is not possible for infants. Furthermore, developmental changes in the anatomy and networks of the brain make interpretations of prediction error more complex, which warrants caution when interpreting the current literature.[37]

Neurodevelopmental disorders


Differences in predictive coding processes have been proposed to play a role in neurodevelopmental disorders, such as autism spectrum disorders and Attention-Deficit Hyperactivity Disorder (ADHD). Given the role of predictive coding in guiding the perception of the environment as well as further interaction with the environment, some authors have suggested that differences in attention and cognitive processes related to predictive coding could serve as potential biomarkers, or biological correlates, for understanding neurodevelopmental disorders [38].

Under typical development, the predictive coding framework suggests that perception and interpretation of the perceived information rely on higher-order cognitive processes that minimize prediction error by continuously adjusting expectations to match incoming sensory input. According to predictive coding accounts, individuals with neurodevelopmental disorders, such as autism spectrum disorder (ASD) and ADHD, may show imbalances in how much weight is given to prior expectations versus incoming sensory evidence—a phenomenon sometimes referred to as precision weighting dysfunction[39]. For example, in autism, research studies suggest that prior beliefs may be underweighted, leading to an overreliance on moment-to-moment sensory input and difficulties filtering out irrelevant stimuli, which manifests as sensory hypersensitivity and reduced ability to account for context when processing the stimulus[40]. Across disorders, such differences have been proposed to lead to less accurate internal representations, which may impair the brain’s ability to form accurate predictions about social cues, rewards, and environments. As a result, predictive coding abnormalities have been proposed as a possible cognitive model that could help link diverse symptom profiles in neurodevelopmental conditions to underlying differences in hierarchical information processing and learning.

Psychopathology


Altered predictive coding in psychological disorders has received wide attention as a potential explanation of how symptoms of psychological disorders arise. Below are descriptions of current research examining how problems with predictive coding may contribute to different psychological disorders.

Psychotic Disorders. Psychotic disorders are characterized by symptoms of hallucination (seeing, hearing, feeling, smelling, or tasting something that is not actually there) and delusion (a strongly held false belief that persists despite clear conflicting evidence). In applications of predictive coding, a mismatch between priors and prediction errors may explain these psychotic symptoms[41][42]. There are three ways in which impaired predictive coding might contribute to these symptoms: 1) overweighting of sensory prediction errors, 2) weakened top-down priors, and 3) disrupted hierarchical communication between frontal and sensory regions[43]. However, research disentangling these different contributors to prediction error is limited.

In typical conditions, perception depends on balancing prior expectations (top-down predictions) with sensory evidence (bottom-up input), weighted by their precision, or estimated reliability. Some studies show evidence that this precision weighting may be dysregulated in people with psychosis, leading to either underweighted priors or overweighted sensory prediction errors[44]. Under this account, internal noise may be misinterpreted as meaningful sensory data, which could contribute to hallucinations, and spurious associations may contribute to delusional beliefs that are resistant to updating. Neurophysiological evidence supports this imbalance: individuals with schizophrenia show reduced mismatch negativity (MMN) and impaired prediction-error signaling in frontotemporal circuits of the brain, indicating failures to suppress or appropriately update sensory predictions[45][46].

At higher cognitive levels, some researchers link predictive coding accounts to the concept of aberrant salience, which refers to the attribution of undue importance to stimuli that would typically be considered irrelevant. This mechanism aligns with dopaminergic dysfunction, as dopamine is hypothesized to encode the precision of prediction errors; hyperdopaminergic states amplify noisy error signals, fueling delusional inferences and unstable perception. Together, these findings have been interpreted as consistent with the idea that psychosis may involve a breakdown in hierarchical predictive coding, in which disturbances in both low-level sensory prediction and high-level belief formation interact to produce characteristic symptoms[44].

Eating Disorders. The predictive coding framework has been applied to the study of eating disorders. In this approach, some theorists propose that disordered eating behaviors may partly arise from differences in interoception, the perception of internal bodily signals. Studies of interoception in the eating disorder field have focused on gastrointestinal interoception, defined as the process by which the nervous system detects and integrates signals originating from the gastrointestinal system. Specifically, recent studies have begun focusing on the relationships between different facets of gastrointestinal interoception profiles and various disordered eating behaviors (e.g., binge eating, restrictive eating), supporting the utility of a predictive coding framework for furthering the understanding of the mechanisms that drive disordered eating behaviors [47][48][49][50].

Computer science


With the rising popularity of representation learning, the theory has also been actively pursued and applied in machine learning and related fields.[51][52][53]

Challenges


One of the biggest challenges in testing predictive coding has been imprecision about exactly how prediction error minimization works.[54] In some studies, an increase in BOLD signal has been interpreted as an error signal, while in others it is taken to indicate changes in the input representation.[54] A crucial question that needs to be addressed is what exactly constitutes the error signal and how it is computed at each level of information processing.[19] Another challenge concerns predictive coding's computational tractability. According to Kwisthout and van Rooij, the subcomputation at each level of the predictive coding framework potentially hides a computationally intractable problem, amounting to "intractable hurdles" that computational modelers have yet to overcome.[55]

Future research could focus on clarifying the neurophysiological mechanism and computational model of predictive coding.[according to whom?]

Studies of predictive coding


Electroencephalography (EEG) and event-related potential (ERP) research has been widely used to investigate predictive coding in humans. Within this framework, ERP components are often interpreted as neural markers of prediction-error signaling, generated when sensory input differs from what is expected. For example, the mismatch negativity (MMN), elicited by unexpected sounds, reflects automatic detection of prediction violations and adapts with learning and attention. Components related to performance monitoring, such as the error-related negativity (ERN/Ne) and error positivity, have been associated with the discrepancy between the internal representation of the correct response and the actual response [56][57][58][59]. Later components such as the P300 and feedback-related negativity (FRN) have been linked to higher-order updating of cognitive or reward models. These findings have been interpreted as consistent with predictive coding models in which processing is organized hierarchically, from early perceptual mismatches to more abstract belief revisions.

Despite these insights, linking ERP components uniquely to prediction errors remains challenging. ERPs represent aggregate neural activity from overlapping sources, and their amplitudes are influenced by multiple cognitive processes such as attention, novelty and salience of the stimuli, practice effects, and habituation to the stimuli. For instance, P300 amplitude often reflects general updating or arousal. EEG and ERP paradigms have thus provided important evidence for predictive processing, although alternative explanations remain, and careful experimental design and model-based analyses are required to distinguish genuine prediction-error signals from broader cognitive or perceptual influences.[60]

from Grokipedia
Predictive coding is a specific neural implementation within the broader predictive processing framework, a unifying theory in neuroscience and cognitive science that models perception, cognition, and action as forms of inference driven by prediction error minimization. This framework posits that the brain functions as a prediction machine, generating top-down predictions about sensory inputs from higher cortical levels and comparing them with bottom-up sensory signals to compute and minimize prediction errors, a process akin to the rolling optimization in model predictive control (MPC) from engineering, where future states are iteratively predicted and adjusted to reduce discrepancies. This predictive processing influences perceptions such as pain, fatigue, and movement outcomes. The hierarchical process, rooted in Bayesian inference and connected to the free energy principle, allows the brain to model the causes of sensory signals rather than passively encoding raw input, optimizing for statistical regularities in the environment.

The concept traces its modern origins to early ideas of unconscious perceptual inference proposed by Hermann von Helmholtz in the 19th century, but it was formalized in the late 1990s through computational models. A seminal contribution came from Rajesh P. N. Rao and Dana H. Ballard in 1999, who developed a hierarchical model of the visual cortex in which feedback connections transmit predictions of lower-level activity, while feedforward pathways carry residual errors, explaining phenomena like extra-classical receptive field effects and end-stopping in visual neurons. Karl Friston extended this framework in the 2000s by integrating it with the free-energy principle, framing prediction error minimization as a strategy to reduce surprise or free energy in generative models of the world, applicable across sensory modalities and cognitive functions.

At its core, predictive coding operates through a multi-level hierarchy of cortical areas, where each level represents increasingly abstract features of the sensory world using latent variables. Predictions flow downward to anticipate activity at lower levels, and discrepancies—termed prediction errors—are propagated upward to update the internal model, with precision weighting modulating the influence of errors based on expected reliability. This mechanism aligns with empirical observations, such as repetition suppression in neural responses, where predictable stimuli elicit reduced activity due to fulfilled predictions. Learning occurs by adjusting model parameters to reduce long-term errors, akin to variational Bayesian inference.

Predictive coding has broad implications beyond basic perception, influencing models of attention, where precision adjustments prioritize salient errors, and of action, through active inference, where predictions guide behavior to confirm expectations. It also informs computational psychiatry, linking aberrant prediction error signaling to disorders such as schizophrenia, where excessive or imprecise errors may underlie hallucinations or delusions. In machine learning, predictive coding inspires energy-efficient learning algorithms that mimic cortical hierarchies for tasks like image recognition. Ongoing research continues to test its neural plausibility through neuroimaging and electrophysiological studies, refining its role in unifying diverse brain functions.

Historical Development

Early Concepts in Cybernetics and Signal Processing

The foundational ideas of predictive coding emerged in the mid-20th century within the field of cybernetics, pioneered by Norbert Wiener during the 1940s. Wiener's work on feedback control systems, initially motivated by anti-aircraft fire control during World War II, involved predicting the future positions of targets through extrapolation of stationary time series, using linear filters to minimize prediction errors in noisy environments. This approach emphasized the role of feedback in stabilizing systems by comparing predicted outputs to actual observations and adjusting accordingly. In his seminal 1948 book Cybernetics: Or Control and Communication in the Animal and the Machine, Wiener extended these engineering principles to biological systems, arguing that feedback mechanisms underpin adaptive behavior in living organisms, such as neural regulation and sensory-motor coordination.

Parallel developments in signal processing built on Wiener's theory, integrating it with information theory to address noise in communication channels. In the 1940s, Wiener formalized linear prediction as a method for estimating signal values based on past observations, optimizing filters to reduce mean-squared error and thereby enhance signal detection amid noise. Claude Shannon's 1948 information theory provided the theoretical underpinning by quantifying redundancy—the excess information in signals beyond what is necessary for reliable transmission—and demonstrating how exploiting statistical regularities could minimize bandwidth requirements while reducing errors. These concepts laid the groundwork for error-minimizing systems in communication engineering, where predictions of signal trajectories allowed for efficient encoding by transmitting only deviations from expected patterns.

A key application of these ideas appeared in perceptual models during the 1960s, notably the "analysis-by-synthesis" framework proposed for speech perception. In his 1960 paper, Kenneth N. Stevens introduced a model in which incoming auditory signals are interpreted by generating internal predictions of possible speech sounds, then synthesizing and matching these hypotheses against the input to select the best fit, thereby minimizing perceptual discrepancies. This approach, building on cybernetic feedback and predictive filtering, highlighted how top-down expectations could guide bottom-up analysis in perception, influencing early computational models of human audition.

By the 1970s, these principles manifested in practical data compression techniques, such as differential pulse-code modulation (DPCM), which served as a direct precursor to broader predictive coding applications. DPCM, an extension of pulse-code modulation, predicts the current signal sample from previous ones and encodes only the difference (or prediction error), achieving significant bitrate reductions—often by factors of 2 to 4—for speech and other signals while maintaining fidelity. Pioneered in works like Bishnu S. Atal and Manfred R. Schroeder's 1970 paper on adaptive predictive coding, DPCM demonstrated how error signals could be quantized and transmitted efficiently, inspiring later adaptations in audio and video coding and foreshadowing biological interpretations of prediction in sensory systems.

Key Milestones in Neuroscience

The concept of predictive coding in neuroscience traces its philosophical roots to Hermann von Helmholtz's theory of unconscious inference, proposed in the 1860s, which posited that perception involves the brain making automatic, inferential judgments about sensory inputs to construct a coherent view of the world, often without conscious awareness. This idea laid a foundational precursor for later neuroscientific models by emphasizing top-down influences on perception, though it remained largely qualitative until the late 20th century.

An early neurophysiological application appeared in 1982, when Mandyam V. Srinivasan, Simon B. Laughlin, and Andreas Dubs proposed predictive coding as a mechanism for inhibition in the retina. Their model suggested that retinal neurons predict the activity of neighboring cells based on spatial correlations in natural images, transmitting only prediction errors to reduce redundancy and enhance efficient coding of visual information. This work provided one of the first explicit neural implementations of predictive coding principles in sensory systems. In 1992, David Mumford further developed the idea for higher cortical areas, proposing a computational architecture for the neocortex in which hierarchical layers generate predictions about lower-level features, with error signals propagating upward to refine internal models. This framework explained how the cortex could represent complex scenes through predictive feedback connections.

The formalization of predictive coding as a cortical theory occurred in the 1990s, particularly with Rajesh P. N. Rao and Dana H. Ballard's 1999 paper, which proposed that neurons implement hierarchical predictions where higher-level areas send top-down signals to anticipate lower-level sensory features, thereby minimizing prediction errors and explaining extra-classical receptive field effects observed in experiments. Their model demonstrated how such mechanisms could account for neural responses in areas like V1 and V2, marking a pivotal shift toward viewing the cortex as a predictive system.

From the 2000s onward, Karl Friston advanced predictive coding by integrating it with the free-energy principle in his 2005 paper, arguing that the brain minimizes variational free energy as a proxy for surprise, unifying perception, action, and learning under a framework where prediction errors drive adaptive updating across hierarchical cortical structures. Friston's contributions elevated predictive coding from a perceptual model to a comprehensive theory of brain function, influencing fields such as machine learning and computational psychiatry.

The 2010s saw a surge in predictive coding's adoption within cognitive neuroscience, particularly through its alignment with the Bayesian brain hypothesis, which frames perception as probabilistic inference in which priors and likelihoods update via error signals to optimize predictions. This integration was highlighted by events such as the 2013 conference "The World Inside the Brain: Internal Predictive Models in Humans and Robots," which fostered interdisciplinary discussions on how predictive mechanisms underpin neural computation from perception to action.

Core Principles

Prediction and Error Signals

Predictive processing (PP) serves as a unifying framework in cognitive science and neuroscience, positing that perception, cognition, and action arise from hierarchical Bayesian inference aimed at minimizing prediction errors. Unlike traditional stimulus-response models, which view the brain as passively accumulating bottom-up sensory data for sequential analysis, PP emphasizes active, inferential processing in which the brain constructs representations of the world by generating top-down predictions and updating them based on sensory evidence. This shift from passive reception to predictive construction enhances efficiency and contextual integration, reducing redundancy and improving robustness to noise or ambiguity.

In predictive coding, a key implementation of PP, the brain maintains internal generative models that produce top-down predictions about expected sensory inputs based on prior knowledge and experience. Predictive coding inverts traditional sensory processing: higher cortical areas generate top-down predictions, while lower areas signal only residuals (prediction errors). These predictions are compared against actual bottom-up sensory input, generating prediction errors that quantify the mismatch between what was anticipated and what is observed. Prediction errors serve as the primary signals for updating and refining the internal models, enabling perception and learning without transmitting redundant information upward through the sensory hierarchy.

Core assumptions of PP include the brain functioning as a Bayesian inference engine, in which priors—probabilistic expectations derived from past experiences—guide predictions, and sensory inputs serve as likelihoods to update these beliefs. Prediction errors are weighted by precision, which reflects the reliability or expected variance of the signal; high-precision errors (e.g., from reliable sensory channels) drive stronger model updates, while low-precision ones are discounted. This precision weighting mechanism modulates attention, directing resources to salient or uncertain aspects of the input, and plays a role in emotion, where interoceptive prediction errors can signal affective states like anxiety or surprise.

Prediction errors play a central role in perceptual inference by driving adjustments to the generative models, thereby minimizing discrepancies over time and facilitating accurate representations of the environment. This continuous process of predicting sensory inputs and minimizing prediction errors is analogous to model predictive control (MPC) in engineering, where systems iteratively forecast future states over a rolling horizon and adjust controls based on discrepancies between predictions and observations. When sensory input aligns closely with predictions, errors are suppressed through an "explain away" mechanism, in which accurate predictions suppress activity at lower levels—evidenced by reduced BOLD signals in V1 for predictable stimuli and repetition suppression across modalities. This allows the brain to focus resources on novel or unexpected features; conversely, large errors trigger revisions in higher-level expectations to better anticipate future inputs. This error-driven process underpins efficient perception, as it prioritizes deviations that carry informational value for learning and action, enforcing priors by rejecting unexpected signals.
The core mechanism involves a bidirectional flow of information: forward (bottom-up) propagation of prediction errors from lower sensory levels to higher cortical areas signals surprises that require model revision, while backward (top-down) transmission of predictions from higher to lower levels anticipates and preempts sensory data. This can be conceptualized as follows:
  • Sensory input: data enters at the lowest level.
  • Prediction comparison: top-down predictions meet incoming signals, computing errors (e.g., $e = x - \hat{x}$, where $x$ is the observed input and $\hat{x}$ is the prediction).
  • Error propagation: unsuppressed errors ascend to update higher-level models.
  • Prediction update: revised models send new predictions downward to refine lower-level processing.
Such a loop ensures that only prediction errors, rather than all sensory details, are relayed upward, optimizing neural bandwidth. Unlike classical processing, which relies on unidirectional transmission of sensory signals from periphery to cortex for sequential analysis, predictive coding emphasizes this bidirectional interplay to achieve greater efficiency and contextual integration. Classical models treat perception as passive accumulation of bottom-up signals, often leading to redundant computations, whereas predictive coding actively anticipates inputs, reducing the need for exhaustive signaling and enhancing robustness to noise or ambiguity.
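The enumerated loop can be written out for a toy two-level linear hierarchy. This sketch only illustrates the direction of message flow (errors ascend, predictions descend); the weights, dimensions, and step size are illustrative assumptions rather than a specific published model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Fixed toy weights linking the two levels to the input (assumed, not learned here).
W1 = rng.normal(scale=0.3, size=(8, 4))   # level-1 causes -> predicted input
W2 = rng.normal(scale=0.3, size=(4, 2))   # level-2 causes -> predicted level-1 activity

def infer(x, n_steps=300, lr=0.05):
    mu1 = np.zeros(4)   # level-1 representation
    mu2 = np.zeros(2)   # level-2 representation
    for _ in range(n_steps):
        # Prediction comparison at each level: top-down prediction vs. activity below.
        e0 = x - W1 @ mu1          # sensory prediction error
        e1 = mu1 - W2 @ mu2        # level-1 prediction error
        # Error propagation upward and prediction update downward.
        mu1 += lr * (W1.T @ e0 - e1)
        mu2 += lr * (W2.T @ e1)
    return mu1, mu2, e0, e1

x = rng.normal(size=8)             # only the residual errors drive the updates
mu1, mu2, e0, e1 = infer(x)
print("remaining sensory error:", np.round(np.linalg.norm(e0), 3))
```

Here level 2 explains the residual structure that level 1 leaves in the input, mirroring the hierarchical refinement described above.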

Hierarchical Inference

In predictive coding, hierarchical inference operates through a multi-layer architecture in the cortex, where lower levels process and predict fine-grained sensory details, while higher levels infer more abstract causes of those sensations. This structure allows the brain to build increasingly complex representations of the world by integrating information across scales, with each level contributing to an overall model of sensory inputs. For instance, primary sensory areas like the early visual cortex handle basic features, whereas higher cortical regions interpret contextual or categorical information. In the context of PP, this hierarchy extends to perception, where lower levels predict sensory features and higher levels predict perceptual objects; to emotion, where interoceptive hierarchies model bodily states; and to attention, which is implemented via precision weighting that amplifies relevant prediction errors across levels.

Error propagation in this hierarchy involves residual errors being passed upward from lower to higher levels via feedforward connections, signaling unexplained aspects of the input that require refinement of higher-level representations. In response, higher levels generate predictions and send them downward through feedback connections, which suppress or modulate activity at lower levels to better align with expected sensory patterns. This bidirectional exchange enables efficient inference by minimizing discrepancies across the hierarchy without redundant transmission of all sensory data. Priors at each level incorporate statistical regularities, and precision weighting ensures that updates prioritize high-confidence evidence, facilitating adaptive responses in perception, emotional regulation, and attentional focus.

At each level, the brain constructs hierarchical generative models that approximate the probabilistic structure of the environment, allowing predictions to be generated from abstract causes downward to sensory specifics. These models learn statistical regularities, such as dependencies between features, to form a coherent top-down explanation of bottom-up input. In visual processing, for example, lower levels might predict oriented edges in a scene, while higher levels predict entire objects like a face, with errors from edge mismatches propagating up to adjust object representations and refined predictions flowing back to sharpen perception. Similarly, in emotional processing, hierarchical models predict autonomic responses, with prediction errors contributing to feelings of valence or arousal.
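Under the Gaussian assumptions commonly made in this literature, such a two-level hierarchy can be summarized (as an illustrative formulation rather than a quotation from the sources above) by the generative model $p(x, z_1, z_2) = p(x \mid z_1)\, p(z_1 \mid z_2)\, p(z_2)$, with level-wise prediction errors $\varepsilon_0 = x - g_1(z_1)$ and $\varepsilon_1 = z_1 - g_2(z_2)$, where $g_1$ and $g_2$ are the mappings from each level's causes to the level below. Inference adjusts $z_1$ and $z_2$ to jointly reduce both errors, so higher levels explain the residuals left unexplained by lower levels.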

Mathematical Framework

Bayesian Foundations

The Bayesian brain hypothesis posits that the brain functions as a probabilistic inference machine, maintaining internal representations of the world in the form of probability distributions and continuously updating these representations by integrating prior beliefs with incoming sensory evidence according to Bayesian principles. Under this framework, sensory inputs serve as likelihoods that inform the revision of priors—pre-existing expectations about environmental causes—yielding posterior distributions that best explain observed data. This hypothesis suggests that neural processes approximate optimal inference to handle uncertainty in perception and action, with empirical support from psychophysical studies demonstrating that human performance aligns with Bayesian predictions in tasks involving sensory integration.

In predictive coding, this Bayesian approach is operationalized through a hierarchical generative model, in which the brain posits hidden causes underlying sensory inputs and generates top-down predictions of expected sensations based on prior distributions over those causes. Prediction errors arise when actual sensory data deviate from these predictions, signaling the need to update the model's parameters via approximate Bayesian updates; this process inverts the generative model to infer the most likely hidden states, effectively minimizing surprise or prediction mismatch. The model's hierarchical structure allows priors at higher levels to constrain lower-level inferences, enabling efficient approximation of intractable Bayesian computations in real-time neural processing.

Predictive coding achieves approximate inference through variational methods, which bound and minimize the free energy—a proxy for surprise or the discrepancy between predicted and actual sensory states—to approximate intractable posterior distributions over hidden causes. This variational free energy minimization provides a tractable scheme for the brain to optimize its internal model, ensuring predictions align with sensory evidence while regularizing against overfitting through prior constraints. By iteratively refining approximations, the system converges on Bayes-optimal representations without exhaustive computation.

A central feature of this framework is empirical Bayes, in which hyperparameters governing the priors are not fixed but learned directly from sensory data across hierarchical levels, inducing data-driven empirical priors that adapt the model to environmental statistics. This approach leverages the hierarchical nature of neural architectures to estimate higher-level parameters from aggregated lower-level evidence, enhancing the model's flexibility and accuracy in inferring causes.
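One standard way to write the quantity being minimized (a textbook decomposition rather than anything specific to the sources cited here) is, for an approximate posterior $q(z)$ over hidden causes $z$ and sensory data $x$, $F = \mathrm{KL}[\,q(z)\,\|\,p(z \mid x)\,] - \ln p(x) \ge -\ln p(x)$. Minimizing $F$ therefore simultaneously pushes $q(z)$ toward the true posterior (perceptual inference) and tightens an upper bound on surprise, $-\ln p(x)$. The equivalent rearrangement $F = \mathrm{KL}[\,q(z)\,\|\,p(z)\,] - \mathbb{E}_{q}[\ln p(x \mid z)]$ expresses the same trade-off as complexity minus accuracy.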

Prediction Error Minimization Equations

In predictive coding, the fundamental prediction error at a given level is defined as the difference between the observed sensory input $x$ and the top-down prediction $\mu$ generated from higher-level representations. This error, denoted $\varepsilon = x - \mu$, quantifies the mismatch that drives perceptual inference by signaling discrepancies between expectations and reality. The core objective of predictive coding is to minimize this prediction error, typically formulated as an optimization that reduces the sum of squared errors across observations, $\sum \varepsilon^2$. In a Bayesian framework, this minimization approximates the reduction of variational free energy $F$, which bounds the divergence between an approximate posterior $q(\mu)$ and the true posterior $p(\mu \mid x)$, expressed as $F = \mathrm{KL}[q(\mu) \,\|\, p(\mu \mid x)] \approx \sum \varepsilon^2 / \sigma^2$, where $\sigma^2$ represents sensory variance.

To achieve minimization, predictions are updated iteratively via gradient descent on the free energy. The update rule takes the form $\mu^{t+1} = \mu^t - \partial F / \partial \mu$, where the change in the prediction $\mu$ at time step $t$ is proportional to the gradient of $F$ with respect to $\mu$, effectively adjusting higher-level representations to better explain sensory data.

In hierarchical predictive coding, prediction errors propagate across multiple levels, with the prediction error at level $l$ given by $\varepsilon_l = x_l - g(\mu_{l+1})$, where $g$ is the generative function mapping predictions from the higher level $l+1$ to the representation at level $l$. This enables successive refinement, as errors at lower levels inform updates at higher levels, fostering a unified inference process throughout the hierarchy.
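These update equations can be checked numerically on the simplest possible generative model: a single Gaussian hidden cause with an identity mapping to the data. The snippet below is a hedged sketch with arbitrary example numbers; it performs the gradient descent $\mu^{t+1} = \mu^t - \kappa\,\partial F/\partial\mu$ (with a step size $\kappa$ added for stability) and converges to the precision-weighted compromise between the prior and the observation.

```python
def free_energy_descent(x, mu_prior, var_prior, var_obs, n_steps=200, lr=0.1):
    """Gradient descent on a Gaussian free energy (up to additive constants):
    F(mu) = (x - mu)^2 / (2*var_obs) + (mu - mu_prior)^2 / (2*var_prior)."""
    mu = mu_prior                                # start at the top-down prior prediction
    for _ in range(n_steps):
        eps_obs = (x - mu) / var_obs             # precision-weighted sensory error
        eps_prior = (mu - mu_prior) / var_prior  # precision-weighted prior error
        dF_dmu = -eps_obs + eps_prior            # gradient of F with respect to mu
        mu -= lr * dF_dmu                        # mu_{t+1} = mu_t - lr * dF/dmu
    return mu

estimate = free_energy_descent(x=2.0, mu_prior=0.0, var_prior=1.0, var_obs=0.5)
# The fixed point is the precision-weighted average of prior and observation.
analytic = (2.0 / 0.5 + 0.0 / 1.0) / (1.0 / 0.5 + 1.0 / 1.0)
print(round(estimate, 3), round(analytic, 3))    # both approximately 1.333
```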

Neural Implementations

Cortical Hierarchies and Feedback

In the neocortex, predictive coding is anatomically supported by a hierarchical organization of cortical areas interconnected through reciprocal feedforward and feedback pathways, enabling the flow of prediction errors upward and predictions downward. Feedforward connections, conveying sensory-driven prediction errors, primarily originate from the superficial layers (layers 2 and 3) of lower cortical areas and target layer 4 of higher areas, where initial error computations occur upon integration with incoming thalamic inputs. Conversely, feedback connections, carrying top-down predictions, arise from the deep layers (layers 5 and 6) of higher areas and project to the superficial layers of lower areas, modulating sensory processing by subtracting expected signals from incoming data. This layer-specific segregation aligns with the core mechanics of predictive coding, where superficial layers primarily process and transmit prediction errors upward via feedforward connections, and deep layers generate and transmit top-down predictions downward via feedback connections, facilitating hierarchical inference across the cortical column.

Feedback loops in predictive coding are exemplified by top-down projections from primary visual cortex (V1) to the lateral geniculate nucleus (LGN) in the thalamus, where layer 6 pyramidal neurons in V1 convey predictions to modulate thalamic relay cells before sensory signals reach the cortex. These projections suppress LGN activity for expected stimuli, effectively implementing error minimization at early sensory stages by gating redundant information. Anatomical evidence underscores the dominance of such feedback: in the cat visual system, corticogeniculate synapses from V1 onto LGN relay neurons significantly outnumber retinogeniculate synapses, highlighting the substantial infrastructure for predictive modulation despite weaker individual synaptic strengths compared to direct retinal inputs.

The canonical microcircuit model provides a unified framework for these anatomical features, positing a standardized columnar organization across sensory cortices where reciprocal connections support bidirectional signaling for prediction and error exchange. In this model, layer 4 acts as the primary site for bottom-up error signals derived from sensory discrepancies, which are then routed to superficial-layer output neurons for upward transmission of prediction errors via feedforward connections, while deep-layer neurons generate and disseminate top-down predictions via long-range feedback axons. This architecture, observed consistently in visual, auditory, and somatosensory cortices, ensures efficient hierarchical processing, with empirical support from laminar recordings showing distinct oscillatory patterns—gamma for error-driven feedforward signaling and alpha/beta for predictive feedback—that align with the model's predictions.

Precision Weighting Mechanisms

In predictive coding, precision weighting refers to the process by which the brain assigns importance to prediction errors based on their estimated reliability, effectively modulating the influence of sensory signals and top-down predictions during inference. Precision is formally defined as the inverse of the variance associated with a signal, denoted as $\pi = 1/\sigma^2$, where $\sigma^2$ represents the variance of the prediction error; this metric quantifies the confidence or certainty in a given prediction or sensory input. By weighting errors according to their precision, the system prioritizes more reliable signals in updating internal models, thereby optimizing the balance between bottom-up sensory data and hierarchical priors.

A key distinction in precision weighting arises between sensory precision and the precision of priors. High sensory precision indicates low noise in incoming data, leading the brain to trust bottom-up prediction errors more heavily and adjust generative models accordingly; conversely, low sensory precision, such as in noisy environments, increases reliance on precise priors from higher cortical levels to suppress or reinterpret ambiguous inputs. This dynamic allows predictive coding to adapt to varying levels of uncertainty, ensuring robust perceptual inference even when sensory evidence is unreliable.

Neuromodulatory systems play a crucial role in tuning these precision weights. Acetylcholine, for instance, enhances sensory precision by increasing the gain on prediction error signals in sensory cortices, thereby amplifying the impact of reliable bottom-up inputs during tasks requiring focused attention. Dopamine, on the other hand, modulates the precision of unsigned prediction errors in cortical regions, facilitating learning by selectively weighting errors that signal novelty or salience.

These mechanisms are integrated into the core minimization process of predictive coding through weighted prediction errors, expressed as $\epsilon' = \epsilon / \sigma$, where $\epsilon$ is the raw prediction error; the objective then becomes minimizing the sum of weighted squared errors, $\sum \pi \epsilon^2$, which balances the contributions of precise signals in variational free-energy minimization. This formulation ensures that inference remains statistically efficient, prioritizing errors from sources with high precision while downweighting those from noisy or uncertain origins.
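To see how such gain control plays out in the weighted objective, consider a single cause $\mu$ with a sensory error $\epsilon_s = x - \mu$ of precision $\pi_s$ and a prior error $\epsilon_p = \mu - \mu_p$ of precision $\pi_p$, and let a multiplicative gain $\gamma$ act on the sensory precision (a stand-in for, e.g., cholinergic modulation; the specific mapping is an assumption made for illustration). Minimizing $\gamma\pi_s\epsilon_s^2 + \pi_p\epsilon_p^2$ gives the balance point $\mu^{*} = (\gamma\pi_s x + \pi_p \mu_p)/(\gamma\pi_s + \pi_p)$, so raising $\gamma$ pulls the estimate toward the sensory input, while lowering it defers to the prior.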

Applications in Perception and Cognition

Sensory Processing

In predictive coding frameworks, sensory processing involves the brain generating top-down predictions about incoming exteroceptive signals from the external environment, such as visual and auditory stimuli, and updating these predictions based on bottom-up error signals to minimize discrepancies. This process enables efficient processing by suppressing predictable sensory inputs while amplifying unexpected ones, thereby resolving perceptual ambiguities in real time. In auditory processing, for example, constant sounds generate low prediction errors and are suppressed or habituated to, becoming ignorable, whereas variable or changing sounds produce higher prediction errors, signaling novelty or importance and increasing their salience. In vision, the brain anticipates object locations and features based on prior experience, allowing it to infer the state of the world without processing every detail exhaustively. Within the broader predictive processing framework, perception arises from hierarchical prediction, where multiple levels of the neural hierarchy generate and refine predictions about sensory inputs. This hierarchical structure allows the brain to construct a coherent model of the environment by integrating predictions from higher levels (e.g., object recognition) with lower-level sensory data, minimizing prediction errors across scales.

Perceptual phenomena such as the rubber hand illusion and motion aftereffects illustrate how prediction error resolution shapes exteroceptive perception. In the rubber hand illusion, synchronous visual and tactile stimulation of a fake hand induces feelings of ownership by generating prediction errors between expected and observed multisensory inputs; the brain resolves these errors by updating its internal model to incorporate the artificial limb as part of the body. Similarly, motion aftereffects occur when prolonged exposure to motion in one direction creates a strong prediction for continued movement; upon cessation, the now-static input produces a large error signal, perceived as motion in the opposite direction until predictions adapt. These examples demonstrate predictive coding's role in integrating sensory cues hierarchically to maintain perceptual stability. Attention, in predictive processing, emerges as a mechanism for modulating the precision of prediction errors, whereby the brain selectively weights errors at certain hierarchical levels to prioritize salient or unexpected stimuli. This precision weighting enhances the influence of relevant sensory inputs on perceptual inference, directing cognitive resources toward resolving high-precision errors that signal potential novelty or threat.

In auditory processing, predictive coding facilitates speech perception through anticipatory filling-in of phonemes, where the brain uses contextual priors to predict ambiguous or noisy sounds. When a phoneme is obscured by noise, top-down predictions from higher-level language knowledge generate the expected acoustic pattern, reducing error signals and enabling seamless comprehension without full bottom-up reconstruction. This mechanism enhances robustness in noisy environments, as seen in studies where expected speech tokens elicit reduced neural responses compared to unexpected ones. Empirical support for these processes comes from fMRI studies revealing prediction error signals in early sensory areas. Summerfield et al. (2008) demonstrated that activity in human visual cortex reflects prediction errors during perceptual inference, with reduced responses to expected stimuli and heightened activity for mismatches, consistent with predictive coding's attenuation of fulfilled predictions. Such findings localize error signaling to primary and secondary sensory regions, underscoring the framework's neural plausibility. Finally, predictive coding enhances sensory efficiency by reducing bandwidth demands through prediction of expected inputs, which helps explain the prevalence of sparse coding in sensory neurons. By transmitting only unpredicted (error) signals rather than the full sensory stream, the system minimizes redundancy, allowing sparse neural representations in which only a small fraction of neurons fire to convey rich information about the environment. This aligns with observations in visual and auditory cortices, where predictable stimuli evoke sparser activity, optimizing coding efficiency under neural constraints.
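
The habituation and novelty asymmetry described above can be illustrated with a toy simulation; this is purely illustrative, and the running-average predictor and learning rate are assumptions rather than a model from the literature. Prediction errors shrink as a repeated tone is learned and spike when a deviant arrives.

```python
# Minimal sketch (illustrative): why a constant tone "fades" while a deviant
# pops out.  The prediction tracks the input with a simple running average,
# so the residual error shrinks for repeated stimuli and spikes at the change.
import numpy as np

tones = np.array([1.0] * 20 + [3.0])    # 20 repetitions, then a deviant
prediction, lr = 0.0, 0.3
errors = []
for s in tones:
    err = s - prediction                # bottom-up prediction error
    errors.append(err)
    prediction += lr * err              # top-down model update

print(np.round(errors, 2))  # errors decay toward 0, then jump at the deviant
```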

Interoception and Embodiment

In predictive coding frameworks, the brain generates top-down predictions about interoceptive signals—arising from internal bodily states such as visceral sensations, including heartbeat timing and intensity—to anticipate and maintain physiological homeostasis. These predictions minimize surprise by comparing expected interoceptive inputs against actual afferent signals, thereby enabling efficient regulation of bodily functions like cardiovascular and gastrointestinal activity. For instance, heartbeat-evoked potentials demonstrate how neural responses to cardiac signals are attenuated when aligned with predictions, reflecting active inference in interoceptive processing. A key extension of this process involves allostasis, where predictive coding supports proactive regulation of energy needs and the internal milieu before disruptions occur, rather than merely reacting to homeostatic imbalances. On this account, allostatic regulation integrates interoceptive predictions to forecast and preempt bodily demands, such as metabolic adjustments, fostering adaptive self-regulation. Interoceptive prediction errors further contribute to emotional experience, where discrepancies between predicted and actual bodily states give rise to feelings of anxiety or surprise as signals of potential dysregulation. These errors also influence perceptions of pain and fatigue through interoceptive inference and prediction error minimization; for instance, in pain, mismatches in the neural-endocrine-immune ensemble's predictions about bodily integrity generate subjective pain experience as the system seeks to restore homeostasis, while in fatigue, persistent interoceptive prediction errors lead to a loss of confidence in control predictions, manifesting as both exertional and pathological fatigue. These errors inform the brain's generative models of the embodied self, shaping subjective emotional experience through hierarchical updating. In predictive processing, emotions arise from hierarchical interoceptive inference, where higher-level predictions about bodily causes generate affective states, integrating exteroceptive and proprioceptive signals to form a unified sense of emotion. This process underscores the embodied nature of cognition, as the physical body serves as a primary source of priors and error signals in the brain's predictive models. The implications for embodied cognition highlight how predictive processing unifies perception, emotion, and action within an embodied framework, where the brain's inferences are grounded in sensorimotor interactions with the environment and body. This view posits that cognition is not disembodied computation but emerges from minimizing free energy through embodied predictions, influencing concepts of agency and selfhood. Empirical evidence highlights the insula cortex as a central hub for processing interoceptive prediction errors, integrating ascending signals from the viscera with descending predictions to support error-based learning of bodily states. Functional imaging studies show heightened insula activity during mismatches in interoceptive predictions, underscoring its role in embodiment and interoceptive awareness.

Meditation and Predictive Processing

Hypotheses in cognitive neuroscience suggest that meditation practices may modulate predictive processing by adjusting the precision of predictions and reducing the generation of excess or counterfactual predictions, potentially reducing internal monologue and ego-centric thinking while enhancing mental calmness. For instance, deconstructive meditation techniques are proposed to promote the pruning or dissolution of rigid predictive models, fostering greater plasticity in the brain's generative models and reducing the mental chatter associated with unfulfilled predictions. This framework posits that mindfulness-based interventions facilitate the updating of priors and the minimization of prediction errors, aligning with Bayesian principles of inference to support present-moment awareness and emotional regulation. These connections remain largely theoretical, with empirical support only beginning to emerge from studies of long-term meditators showing altered neural responses consistent with predictive coding mechanisms.

Applications in Action and Decision-Making

Active Inference

Active inference extends the predictive coding framework from passive perception to active engagement with the environment, positing that agents select actions to minimize future prediction errors by sampling sensory data that align with their internal models. Under this formulation, perception updates beliefs to reduce surprise through error minimization, while action actively reshapes the sensory landscape to confirm or fulfill those beliefs, effectively treating behavior as a form of inference. This approach unifies perception and action under the free energy principle, where agents avoid surprise—defined as discrepancy between expected and observed states—by either updating their generative models or intervening in the world.

Central to active inference is the minimization of expected free energy, a quantity that bounds the surprise anticipated under a given policy (sequence of actions). The expected free energy G for a policy \pi decomposes into an epistemic component, which resolves uncertainty by gathering information, and a pragmatic component, which minimizes expected cost by achieving preferred outcomes. Formally,

G(\pi) = -\mathbb{E}_{Q(o \mid \pi)}\left[ D_{\mathrm{KL}}\left[ Q(\mu \mid \pi, o) \,\|\, Q(\mu \mid \pi) \right] \right] + \text{pragmatic terms (e.g., expected cost or risk)},

where the Kullback–Leibler term captures the expected information gain from updating beliefs about hidden states \mu given future observations o, relative to the prior belief under the policy (entering with a negative sign, so informative policies lower G), and the pragmatic terms encode costs or divergences from prior preferences. Policies are selected by choosing the \pi that minimizes G(\pi), balancing exploration to reduce epistemic uncertainty with exploitation to fulfill prior preferences over sensory states. This ensures agents act to make their predictions self-fulfilling, for example by moving toward expected rewarding locations.

A representative example is saccadic eye movements in visual processing, where the agent generates predictions about retinal input based on spatial priors. To minimize expected free energy, the eyes execute rapid saccades toward regions of high predictive uncertainty or salience, effectively testing hypotheses about the visual scene and resolving ambiguities in the generative model. This active sampling reduces surprise by aligning incoming sensory data with anticipated patterns, demonstrating how active inference drives exploratory behavior to refine perceptual inferences. Friston introduced this extension of predictive coding in 2010, framing active inference as the behavioral counterpart to perceptual error minimization within the free energy principle.

Active inference has significant implications for learning, agency, and the sense of self within the predictive processing framework. Learning emerges from the iterative updating of generative models through prediction error minimization during active interaction with the environment, allowing agents to adapt their internal representations and acquire new knowledge over time. The sense of agency is generated by precise predictions of the sensory consequences of one's own actions, fostering a subjective experience of control and intentionality in behavior. Furthermore, active inference contributes to the construction of a coherent sense of self by integrating hierarchical predictions about bodily states, actions, and environmental interactions into embodied self-models, which underpin consciousness and self-awareness.
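
A toy discrete example helps to unpack the expected free energy above. The following sketch is an illustration under simplifying assumptions, not Friston's full variational scheme: the likelihood matrix, preferences, and candidate state beliefs are invented for the example. It scores two candidate policies by a risk-minus-information-gain quantity and favors the one with the lower value.

```python
# Minimal sketch (illustrative): comparing two policies by an expected-free-
# energy-like score = pragmatic risk - epistemic information gain.
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return -np.sum(p * np.log(p))

def expected_free_energy(q_states, likelihood, log_pref):
    """q_states: predicted hidden-state distribution under a policy.
    likelihood[o, s] = P(o | s).  log_pref: log-preferences over outcomes o."""
    q_obs = likelihood @ q_states                    # predicted observations
    # Epistemic term: expected information gain about hidden states
    expected_post_H = 0.0
    for o, p_o in enumerate(q_obs):
        post = likelihood[o] * q_states              # Bayesian posterior given o
        post = post / post.sum()
        expected_post_H += p_o * entropy(post)
    info_gain = entropy(q_states) - expected_post_H
    risk = -np.sum(q_obs * log_pref)                 # pragmatic term
    return risk - info_gain                          # lower is better

likelihood = np.array([[0.9, 0.1],                   # P(o | s)
                       [0.1, 0.9]])
log_pref = np.log(np.array([0.8, 0.2]))              # the agent prefers outcome 0
policies = {"stay":  np.array([0.5, 0.5]),           # uninformative state belief
            "probe": np.array([0.9, 0.1])}           # expected to reach state 0
for name, q in policies.items():
    print(name, round(expected_free_energy(q, likelihood, log_pref), 3))
```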

Motor Control

In motor control, predictive coding manifests through forward models that anticipate the sensory outcomes of actions, enabling the brain to generate movements and correct them based on prediction errors. Predictive processing shapes movement outcomes by generating predictions about the sensory consequences of motor actions and minimizing errors to achieve desired results. These forward models rely on efference copies, internal replicas of motor commands sent to sensory areas to predict the consequences of self-initiated actions, thereby distinguishing self-generated sensory inputs from external stimuli. This mechanism allows for efficient processing by suppressing expected sensations, reducing the computational load during voluntary movements. The cerebellum plays a central role in implementing predictive coding for motor adaptation via error-based learning, using climbing fiber signals to convey prediction errors and refine internal models. For instance, in prism adaptation experiments, where visual feedback is shifted by prism goggles, the cerebellum drives rapid recalibration of reaching movements by minimizing discrepancies between predicted and actual hand positions, as evidenced by impaired adaptation in patients with cerebellar lesions. This process supports fine-tuning of motor outputs through iterative updates to forward models, enhancing accuracy in tasks requiring precise coordination. Motor predictions operate hierarchically, with lower levels handling kinematic details such as joint angles and muscle activations, while higher levels integrate goal-directed intentions and contextual plans. In the motor cortex, this hierarchy is reflected in an agranular architecture that prioritizes descending predictions over ascending error signals, allowing top-down intentions to guide action without constant low-level corrections. Evidence for these mechanisms includes the suppression of self-produced tactile sensations, such as reduced tickle responses during self-touch compared to external touch, mediated by corollary discharges that align predictions with actual feedback.
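
The efference-copy logic can be stated in a few lines of code; this is a schematic sketch with an assumed identity forward model and made-up values, not a model of actual neural gain control. The predicted consequence of the motor command is subtracted from the actual input, so self-generated touch leaves little residual error to perceive.

```python
# Minimal sketch (illustrative): sensory attenuation via an efference copy.
def forward_model(motor_command, gain=1.0):
    return gain * motor_command          # predicted sensory consequence

def perceived_intensity(actual_input, motor_command=None):
    predicted = forward_model(motor_command) if motor_command is not None else 0.0
    return max(actual_input - predicted, 0.0)   # unexplained (error) component

touch = 1.0
print(perceived_intensity(touch, motor_command=1.0))  # self-touch: ~0 (attenuated)
print(perceived_intensity(touch))                     # external touch: full intensity
```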

Applications in Psychiatry and Disorders

Psychosis and Hallucinations

In predictive coding frameworks, disruptions in the balance between top-down predictions and bottom-up sensory evidence are implicated in the generation of psychotic symptoms, particularly through alterations in the precision assigned to prior beliefs versus prediction errors. Within the broader predictive processing framework, which unifies perception, cognition, and action as inference driven by prediction error minimization, these disruptions represent aberrant precision signaling that leads to maladaptive inferences. High precision on internal priors in psychosis can lead to persistent false predictions that are not adequately updated by sensory input, resulting in hallucinations experienced as veridical perceptions. This mechanism posits that overly rigid expectations override ambiguous or noisy sensory data, fostering experiences detached from external reality. A key aspect of this account involves elevated precision weighting of prior beliefs, where the brain fails to attenuate strong internal models in favor of new evidence, thereby sustaining hallucinatory content. In individuals with psychosis, this high prior precision manifests as un-updated predictions that dominate perception, explaining the persistence of hallucinations even in the absence of confirmatory stimuli. Evidence from behavioral and neuroimaging studies supports this, showing increased susceptibility to suggestion-induced hallucinations under conditions of sensory uncertainty. Predictive processing extends this account by emphasizing how such errors in hierarchical inference contribute to the construction of a distorted reality, differing from traditional models by highlighting failures of active Bayesian-like updating.

The dopamine hypothesis of schizophrenia integrates with predictive coding by suggesting that excess dopaminergic activity enhances the salience or precision of prediction errors, promoting aberrant assignment of significance to neutral stimuli and contributing to delusional ideation. This aberrant salience arises when dopamine modulates the gain on unexpected signals, leading to false inferences about environmental relevance and reinforcing psychotic beliefs. Such dysregulation links neurochemical imbalances to the phenomenological experience of heightened motivational pull toward irrelevant cues. Antipsychotic medications have cell type–specific effects that modulate particular neuronal populations and synaptic interactions, linking circuit-level findings to pathophysiological mechanisms in psychosis. Bayesian models of delusions in schizophrenia reveal weakened sensory updating, where patients exhibit reduced flexibility in revising beliefs based on new evidence, favoring priors instead. These computational approaches highlight how imprecise error signaling perpetuates maladaptive inference across psychotic states.

Regarding positive symptoms, predictive coding provides a specific account of auditory verbal hallucinations (AVHs), the most common hallucinatory experience in schizophrenia, affecting up to 70% of patients. In this view, AVHs emerge from deficient predictive suppression of self-generated speech signals, causing internally produced thoughts to be misattributed to external voices due to unmet predictions. Functional MRI studies demonstrate reduced deactivation of auditory cortex during self-generated speech in hallucinating patients, reflecting impaired forward modeling and heightened precision on erroneous external attributions. This failure in hierarchical loops treats endogenous activity as exogenous input, vividly simulating heard speech.
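
The precision imbalance described above can be illustrated with simple Gaussian cue combination; this is a didactic sketch with arbitrary numbers, not a clinical model. When prior precision is inflated relative to sensory precision, the posterior estimate barely moves toward the data, analogous to a prediction that sensory evidence fails to correct.

```python
# Minimal sketch (illustrative): precision-weighted fusion of a prior and an
# observation.  An over-precise prior dominates the resulting percept.
def posterior_mean(prior_mu, prior_sigma, obs, obs_sigma):
    pi_prior, pi_obs = 1 / prior_sigma**2, 1 / obs_sigma**2
    return (pi_prior * prior_mu + pi_obs * obs) / (pi_prior + pi_obs)

print(posterior_mean(prior_mu=1.0, prior_sigma=1.0, obs=0.0, obs_sigma=1.0))  # 0.5: balanced
print(posterior_mean(prior_mu=1.0, prior_sigma=0.1, obs=0.0, obs_sigma=1.0))  # ~0.99: prior dominates
```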

Neurodevelopmental Conditions

Predictive coding impairments in neurodevelopmental conditions, such as autism spectrum disorder (ASD) and attention-deficit/hyperactivity disorder (ADHD), are characterized by atypical processing of prediction errors and priors from early development, leading to altered sensory integration and attention. In ASD, individuals often exhibit reduced precision assigned to top-down priors, resulting in a greater reliance on bottom-up sensory details and a detail-focused perceptual style. This mechanism is thought to stem from inflexible adjustment of prediction error precision, where unexpected sensory inputs fail to update internal models effectively, contributing to sensory sensitivities and challenges in generalizing experiences. For instance, EEG studies in children with ASD show diminished P300 responses to unexpected auditory deviants and enhanced activation to expected stimuli, indicating disrupted hierarchical error signaling.

In ADHD, predictive coding disruptions manifest as an over-reliance on novel sensory details, with reduced neural responses to expected events and heightened activation to surprises, which may underlie attentional deficits and distractibility. This pattern suggests difficulties in modulating precision for anticipated inputs, leading to inefficient filtering of irrelevant stimuli and persistent sampling of the environment. A 2024 neurodevelopmental perspective frames ADHD as involving divergences in predictive model formation and error minimization, particularly in sensory attenuation during action.

Eye-tracking evidence further supports these impairments; in ASD, individuals show fewer anticipatory gaze shifts to predicted locations in social and nonsocial routines, reflecting weakened predictive use of cues such as eye direction or object trajectories. One study found that autistic participants were less likely to direct gaze toward expected outcomes following learned visual associations, with prediction errors eliciting atypical scanning patterns. A 2021 systematic review of empirical evidence on prediction in ASD highlights domain-general differences, including reduced habituation to repeated stimuli and altered frontostriatal responses to errors, which may impair adaptive learning from infancy. A 2025 study on predictive coding in developmental disorders proposes that early predictive impairments contribute to broader cognitive atypicalities in ASD and ADHD, with interventions targeting precision weighting showing promise for enhancing adaptive processing. These findings underscore lifelong developmental trajectories influenced by predictive coding, distinct from acquired disruptions in other psychiatric contexts.

Anxiety and Depression

The predictive processing framework has been applied to understand anxiety and depression through aberrations in prediction error signaling and precision weighting. In anxiety disorders, heightened precision on threat-related priors leads to over-reliance on negative predictions, amplifying perceived uncertainty and resulting in excessive worry and avoidance behaviors. This is evidenced by studies showing that anxiety modulates the gain on interoceptive prediction errors, causing misalignment between predicted and actual bodily signals, which perpetuates symptoms like panic. For major depression, predictive processing posits that negative biases in priors and reduced updating by positive evidence contribute to persistent low mood and anhedonia. Aberrant prediction error minimization favors pessimistic internal models, with empirical support from neuroimaging revealing altered precision weighting in reward-related circuits. This framework links depressive symptoms to failures in hierarchical inference, where top-down expectations suppress bottom-up sensory inputs, reinforcing a constricted sense of agency and self.

Trauma and Post-Traumatic Stress Disorder

In trauma-related disorders such as post-traumatic stress disorder (PTSD), predictive processing accounts for symptoms through disrupted error signaling following exposure to overwhelming events, leading to inflexible priors that resist updating. This results in hypervigilance, flashbacks, and re-experiencing as the brain fails to minimize prediction errors associated with traumatic memories, treating them as ongoing threats. The framework also explains comorbidities like psychosis in trauma survivors, where extreme precision on trauma priors overrides sensory evidence. Complex PTSD (C-PTSD) is particularly illuminated by this approach, highlighting how chronic trauma alters interoceptive predictions and embodied cognition. Empirical evidence from computational models supports interventions that target precision adjustment to facilitate recovery.

Applications in Artificial Intelligence

Predictive Models in Machine Learning

Predictive coding has inspired the development of architectures in machine learning that emphasize hierarchical prediction and error minimization for representation learning. One foundational example is the predictive coding network proposed by Rao and Ballard in 1999, which models visual processing as a generative process in which higher-level neurons predict the activity of lower-level ones, updating representations based on prediction errors to learn features such as oriented edges in natural images. This approach enables efficient feature extraction by focusing computation on discrepancies between predictions and sensory inputs, rather than on exhaustive bottom-up processing.

In applications, predictive coding principles underpin variants of autoencoders, where prediction errors serve as learning signals for tasks such as anomaly detection and denoising. In denoising autoencoders, for instance, the network learns to reconstruct clean inputs from noisy versions by minimizing reconstruction errors, analogous to resolving prediction mismatches in predictive coding; this has been shown to improve robustness in image restoration tasks. Similarly, in anomaly detection, high prediction errors from reconstructions flag outliers, as deviations from the learned generative model indicate unusual data points, with applications in fraud detection and fault monitoring. These methods draw on predictive coding's error-driven learning, promoting sparse and efficient representations without explicit supervision.

Predictive coding also connects to deep learning through variants of Boltzmann machines, which use energy-based formulations for probabilistic representation learning. The Helmholtz machine, an early hierarchical model, employs top-down generative passes akin to predictions and bottom-up inference to approximate posteriors, using stacked hidden layers to learn disentangled features in unsupervised settings. Extensions such as multi-prediction deep Boltzmann machines further integrate multiple predictive objectives to enhance generative capabilities and representation quality. Variational autoencoders, which optimize evidence lower bounds via prediction-like inference, similarly embody these ideas, linking predictive coding to modern deep generative models. A key advantage of these predictive coding-inspired, energy-based models is their ability to reduce computational demands by prioritizing error signals over full forward passes, enabling scalable learning in high-dimensional spaces. For example, by suppressing predictable activity through top-down inhibition, such networks minimize energy expenditure while maintaining accurate inferences, as demonstrated in recurrent architectures where predictive mechanisms emerge from efficiency constraints. This not only lowers training costs but also aligns with biological plausibility, fostering advances in resource-efficient AI.
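
A stripped-down version of a Rao-and-Ballard-style scheme can be written in a few lines. This is a sketch under simplifying assumptions: a single linear layer, toy random data in place of image patches, and no sparsity prior. Latent causes are inferred by descending the prediction error, and the generative weights are then updated with a local, error-driven rule.

```python
# Minimal sketch (illustrative): single-layer linear predictive coding.
# Infer latent causes r by gradient descent on the prediction error, then
# update the generative (top-down) weights U with a local Hebbian-like rule.
import numpy as np

rng = np.random.default_rng(0)
n_input, n_latent = 16, 4
U = rng.normal(scale=0.1, size=(n_input, n_latent))   # generative weights

def infer(x, U, steps=50, lr=0.1):
    r = np.zeros(n_latent)
    for _ in range(steps):
        err = x - U @ r            # bottom-up prediction error
        r += lr * (U.T @ err)      # update latent estimate to reduce the error
    return r

def learn(X, U, epochs=100, lr=0.01):
    for _ in range(epochs):
        for x in X:
            r = infer(x, U)
            err = x - U @ r
            U += lr * np.outer(err, r)   # local weight update from the error
    return U

X = rng.normal(size=(32, n_input))        # toy "patches"
U = learn(X, U)
x = X[0]
print("reconstruction error:", np.linalg.norm(x - U @ infer(x, U)))
```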

Recent Advances in Neural Networks

In the context of neural networks, predictive coding employs hierarchical generative models to perform approximate Bayesian inference, where errors arising from prediction mismatches drive local updates that refine representations. Energy-based formulations further position predictive coding as a potential alternative to backpropagation, enabling local learning rules that improve biological plausibility and efficiency in neuromorphic systems. While this approach holds strong theoretical appeal for bridging neuroscience and artificial intelligence, its practical scaling remains limited by challenges in handling large datasets and complex architectures.

Recent developments in predictive coding have advanced bio-inspired architectures, particularly through spiking and hierarchical models that enhance efficiency and biological plausibility. A prominent example is Predictive Coding Light (PCL), introduced in 2025, which proposes a recurrent hierarchical spiking network designed for unsupervised representation learning. PCL employs excitatory feedforward connections alongside inhibitory recurrent and top-down pathways to suppress predictable spiking activity, thereby minimizing energy consumption on neuromorphic hardware. Trained using spike-timing-dependent plasticity (STDP) on event-based vision data, such as recordings from dynamic vision sensors, PCL develops receptive fields resembling simple and complex cells in primary visual cortex, including orientation tuning and cross-orientation suppression. On the DVS128 Gesture dataset, PCL achieves 89.12% classification accuracy while substantially reducing spiking activity compared to baseline models without inhibition, demonstrating its potential for energy-efficient processing in edge AI applications.

Building on this, a 2025 study explored predictive coding-inspired deep neural networks (DNNs) designed to replicate brain-like responses, positioning them as biologically plausible models of cortical computation. By incorporating predictive coding dynamics into recurrent DNN architectures, the model generates activity patterns that mimic neural responses observed in biological systems, such as error signaling and hierarchical prediction updates. This approach was evaluated on tasks involving sensory prediction, where the networks exhibited emergent properties such as sparse representations, aligning closely with electrophysiological recordings from visual areas. The findings suggest that predictive coding principles can bridge the gap between artificial DNNs and neural realism, offering a framework for more interpretable AI systems that emulate cortical computation.

In the domain of novelty detection, a 2025 model in Neural Computation leverages predictive coding to enable multi-level novelty detection within hierarchical networks. The recurrent predictive coding network (rPCN) and its hierarchical extension (hPCN) use local computations to minimize prediction errors, with error neurons naturally signaling novelty across levels of abstraction, from low-level sensory features to high-level semantic concepts. Tested on image datasets such as MNIST, the hPCN detects pixel-level anomalies (e.g., a separability score of approximately 2) and object-level deviations (scores near 0 at the top layers), while the rPCN matches human capacity with 83% accuracy on 10,000 images. This unified framework integrates novelty detection with associative memory and representation learning, outperforming traditional autoencoders in robustness to correlated inputs and providing a biologically grounded method for scalable anomaly identification in AI.
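
The claim that predictive coding can stand in for backpropagation rests on the observation that, once prediction errors are represented explicitly, every weight update becomes local. The sketch below is illustrative only; the two-layer linear network, relaxation schedule, and toy regression target are assumptions rather than the procedure of any particular paper. It first relaxes the hidden activity to minimize the total squared prediction error and then updates each weight matrix from its own layer's error.

```python
# Minimal sketch (illustrative): training a two-layer linear network with
# predictive-coding-style local updates instead of backpropagation.
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(scale=0.1, size=(8, 4))    # predicts the hidden layer from the input
W2 = rng.normal(scale=0.1, size=(4, 2))    # predicts the output from the hidden layer

def train_step(x, y, W1, W2, relax_steps=30, lr_h=0.1, lr_w=0.05):
    h = x @ W1                              # initial feedforward guess
    for _ in range(relax_steps):            # inference: relax the hidden activity
        e1 = h - x @ W1                     # prediction error at the hidden layer
        e2 = y - h @ W2                     # prediction error at the clamped output
        h += lr_h * (-e1 + e2 @ W2.T)       # gradient descent on the energy w.r.t. h
    W1 += lr_w * np.outer(x, e1)            # local, Hebbian-like weight updates
    W2 += lr_w * np.outer(h, e2)
    return W1, W2

A = rng.normal(size=(8, 2)) * 0.1           # toy target mapping to learn
for _ in range(2000):
    x = rng.normal(size=8)
    W1, W2 = train_step(x, x @ A, W1, W2)

x_test = rng.normal(size=8)
print("output error:", np.linalg.norm(x_test @ W1 @ W2 - x_test @ A))
```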
Finally, recent hybrid models have integrated predictive coding with transformer architectures to improve efficient prediction in large-scale AI, emphasizing bio-plausibility and computational scalability. A 2025 survey highlights generalizations of predictive coding to non-Gaussian distributions, enabling its application in transformer-based systems for tasks such as sequence modeling. For instance, predictive-coding-based transformers, as explored by Pinchetti et al. (2024), approximate standard transformer performance with comparable model complexity, achieving near-equivalent accuracy on benchmarks such as language modeling while incorporating local, Hebbian-like updates for energy efficiency. These hybrids facilitate hierarchical prediction over massive datasets, reducing reliance on backpropagation and promoting neuromorphic compatibility for real-world deployment.

Challenges and Empirical Status

Supporting Evidence and Experiments

Neuroimaging studies have provided substantial evidence for prediction error signals in predictive coding through electroencephalography (EEG) and magnetoencephalography (MEG). The mismatch negativity (MMN), an auditory evoked potential peaking around 150–200 ms post-stimulus, is interpreted as a neural marker of prediction errors elicited when deviant stimuli violate expected sensory patterns. In hierarchical predictive coding models, the MMN reflects bottom-up error signals propagating from primary auditory cortex to higher-order regions, with sources localized to the superior temporal gyrus and frontal areas via dipole modeling. A 2020 review synthesized EEG and MEG data showing that omission of expected stimuli elicits MMN-like responses, supporting generative prediction over mere adaptation effects. More recent laminar recordings in non-human primates confirm that gamma-band activity carries prediction errors upward through cortical layers, while beta-band signals convey top-down predictions.

Behavioral paradigms, such as adaptation and priming experiments, demonstrate how hierarchical predictions shape perception and action. In visual adaptation tasks, repeated exposure to stimuli leads to repetition suppression of fMRI signals, interpreted as fulfilled predictions reducing neural activity, with stronger suppression for expected than for unexpected repetitions. Priming experiments reveal hierarchical organization: local priming effects (e.g., faster responses to repeated low-level features) interact with global predictions, as shown in oddball paradigms where global rule violations elicit late P300 components only when attention is engaged. These findings support multi-level predictive coding, in which lower-level predictions adapt sensory tuning and higher-level ones modulate precision weighting for behavioral relevance. For instance, in auditory local-global paradigms, EEG markers distinguish local deviants (early MMN) from global ones (late positivity), evidencing layered error processing.

Recent studies up to 2025 highlight developmental aspects of predictive coding in infancy. A 2025 review in Developmental Cognitive Neuroscience examines EEG evidence from neonates to children, showing that preterm infants (31–32 weeks gestational age) exhibit repetition suppression and differential omission responses to predictable versus jittered stimuli, indicating early precision-weighted prediction. In 6-month-olds, fNIRS and EEG reveal top-down predictions during visual omissions cued by auditory learning, with stronger cortical responses correlating with later developmental outcomes at 12–18 months. These paradigms, including unimodal oddballs, underscore how attentional modulation enhances prediction-error signals from infancy onward, fostering learning.

Cross-species evidence from non-human primates and computational models reinforces predictive coding in sensory tasks. Primate studies using laminar recordings in visual cortex demonstrate hierarchical error signaling: ascending gamma oscillations encode mismatches between predicted and actual inputs, while descending beta rhythms refine predictions across areas V1 to V4. Computational models trained on natural scenes replicate these dynamics, with model V1 neurons showing orientation-selective error suppression matching empirical data. Such findings across species validate core predictive coding mechanisms in sensory processing.

Criticisms and Limitations

Critics of predictive coding theory argue that it overemphasizes free energy minimization as a core mechanism of brain function, potentially portraying the brain as an overly unified optimizer when diverse biological processes may be at play. In particular, the free-energy principle underlying predictive coding has been challenged for lacking conclusive evidence that the brain consistently optimizes free energy through variational inference, with empirical support remaining inconclusive and the principle possibly functioning more as a formal modeling tool than as a fundamental imperative. Critics contend that this overemphasis risks obscuring the mechanistic details of neural operations and the historical contingencies shaping biological systems, advocating instead for explanatory pluralism that incorporates multiple theoretical perspectives.

Empirical investigations into key components of predictive coding, such as precision weighting of prediction errors, reveal mixed support in human studies, highlighting significant gaps in validation. For instance, while precision weighting can account for certain "contra-vanilla" patterns in which expected stimuli elicit larger neural responses, such as in some perceptual tasks and attentional cueing paradigms, it fails to consistently predict reductions in neural latency under high-precision conditions, as observed in EEG and fMRI studies. These inconsistencies arise partly from overlapping definitions of precision (encompassing attention, uncertainty, and expectation), which limit the ability to disentangle effects, and from sparse evidence for the associated frequency-domain changes or neuromodulatory links in human cortex. Overall, the theory's reliance on precision mechanisms lacks robust, direct intracranial evidence in humans, underscoring the need for more targeted neurophysiological and behavioral experiments.

Predictive coding is often distinguished from the broader framework of predictive processing, the former referring to a specific hierarchical neural implementation involving top-down predictions and bottom-up error signals, the latter encompassing a wider range of prediction-based strategies without committing to a precise neural architecture. This distinction highlights a limitation of predictive coding: its mechanistic specificity may not fully capture the flexibility of predictive processing, potentially restricting its explanatory scope to perceptual and low-level cognitive tasks. Addressing these criticisms will require future research emphasizing causal interventions, such as optogenetic manipulations in animal models to test prediction error pathways, alongside computational simulations that integrate predictive coding with biophysical constraints for more realistic benchmarking against empirical data. Such approaches would help resolve empirical ambiguities and clarify the theory's boundaries relative to alternatives.

References
