In behavioral psychology, reinforcement refers to consequences that increase the likelihood of an organism's future behavior, typically in the presence of a particular antecedent stimulus.[1] For example, a rat can be trained to push a lever to receive food whenever a light is turned on; in this example, the light is the antecedent stimulus, the lever pushing is the operant behavior, and the food is the reinforcer. Likewise, a student who receives attention and praise when answering a teacher's question will be more likely to answer future questions in class; the teacher's question is the antecedent, the student's response is the behavior, and the praise and attention are the reinforcers. Punishment is the inverse of reinforcement, referring to any consequence that decreases the likelihood that a response will occur. In operant conditioning terms, punishment does not need to involve any type of pain, fear, or physical action; even a brief spoken expression of disapproval is a type of punishment.[2]
Consequences that produce appetitive states such as subjective "wanting" and "liking" (desire and pleasure) function as rewards or positive reinforcement.[3] There is also negative reinforcement, which involves taking away an undesirable stimulus; taking an aspirin to relieve a headache is a common example.
Reinforcement is an important component of operant conditioning and behavior modification. The concept has been applied in a variety of practical areas, including parenting, coaching, therapy, self-help, education, and management.
addiction – a biopsychosocial disorder characterized by persistent use of drugs (including alcohol) despite substantial harm and adverse consequences
addictive drug – psychoactive substances that with repeated use are associated with significantly higher rates of substance use disorders, due in large part to the drug's effect on brain reward systems
dependence – an adaptive state associated with a withdrawal syndrome upon cessation of repeated exposure to a stimulus (e.g., drug intake)
drug sensitization or reverse tolerance – the escalating effect of a drug resulting from repeated administration at a given dose
drug tolerance – the diminishing effect of a drug resulting from repeated administration at a given dose
drug withdrawal – symptoms that occur upon cessation of repeated drug use
psychological dependence – dependence that is characterized by emotional-motivational withdrawal symptoms (e.g., anhedonia and anxiety) that affect cognitive functioning
reinforcing stimuli – stimuli that increase the probability of repeating behaviors paired with them
rewarding stimuli – stimuli that the brain interprets as intrinsically positive and desirable or as something to approach
sensitization – an amplified response to a stimulus resulting from repeated exposure to it
substance use disorder – a condition in which the use of substances leads to clinically and functionally significant impairment or distress
In the behavioral sciences, the terms "positive" and "negative", when used in their strict technical sense, refer to the nature of the action performed by the conditioner rather than to the responding operant's evaluation of that action and its consequence(s). "Positive" actions are those that add a factor, be it pleasant or unpleasant, to the environment, whereas "negative" actions are those that remove or withhold from the environment a factor of either type. In turn, the strict sense of "reinforcement" refers only to reward-based conditioning; the introduction of unpleasant factors and the removal or withholding of pleasant factors are instead referred to as "punishment", which in its strict sense thus stands in contradistinction to "reinforcement". Thus, "positive reinforcement" refers to the addition of a pleasant factor, "positive punishment" to the addition of an unpleasant factor, "negative reinforcement" to the removal or withholding of an unpleasant factor, and "negative punishment" to the removal or withholding of a pleasant factor.
This usage is at odds with some non-technical usages of the four term combinations, especially in the case of "negative reinforcement", which is often used to denote what technical parlance would describe as "positive punishment". The non-technical usage interprets "reinforcement" as subsuming both reward and punishment, and "negative" as referring to the responding operant's evaluation of the factor being introduced. By contrast, technical parlance uses "negative reinforcement" to describe encouragement of a given behavior by creating a scenario in which an unpleasant factor is or will be present, but engaging in the behavior results in either escaping from that factor or preventing its occurrence, as in Martin Seligman's experiments in which dogs learned to avoid electric shocks.
B. F. Skinner was a well-known and influential researcher who articulated many of the theoretical constructs of reinforcement and behaviorism. Skinner defined reinforcers according to the change in response strength (response rate) rather than according to more subjective criteria, such as what is pleasurable or valuable to someone. Accordingly, activities, foods, or items considered pleasant or enjoyable may not necessarily be reinforcing (because they produce no increase in the response preceding them). Stimuli, settings, and activities fit the definition of reinforcers only if the behavior that immediately precedes the potential reinforcer increases in similar situations in the future. Consider, for example, a child who receives a cookie when he or she asks for one: if the frequency of "cookie-requesting behavior" increases, the cookie can be seen as reinforcing "cookie-requesting behavior"; if, however, "cookie-requesting behavior" does not increase, the cookie cannot be considered reinforcing.
The sole criterion that determines if a stimulus is reinforcing is the change in probability of a behavior after administration of that potential reinforcer. Other theories may focus on additional factors such as whether the person expected a behavior to produce a given outcome, but in the behavioral theory, reinforcement is defined by an increased probability of a response.
Laboratory research on reinforcement is usually dated from the work of Edward Thorndike, known for his experiments with cats escaping from puzzle boxes.[7] A number of others continued this research, notably B. F. Skinner, who published his seminal work on the topic in The Behavior of Organisms in 1938 and elaborated this research in many subsequent publications.[8] Notably, Skinner argued that positive reinforcement is superior to punishment in shaping behavior.[9] Though punishment may seem to be simply the opposite of reinforcement, Skinner claimed that the two differ immensely, saying that positive reinforcement results in lasting (long-term) behavioral modification whereas punishment changes behavior only temporarily (short-term) and has many detrimental side effects.
A great many researchers subsequently expanded our understanding of reinforcement and challenged some of Skinner's conclusions. For example, Azrin and Holz defined punishment as a "consequence of behavior that reduces the future probability of that behavior,"[10] and some studies have shown that positive reinforcement and punishment are equally effective in modifying behavior.[citation needed] Research on the effects of positive reinforcement, negative reinforcement, and punishment continues today, as those concepts are fundamental to learning theory and underlie many of its practical applications.
The term operant conditioning was introduced by Skinner to indicate that in his experimental paradigm, the organism is free to operate on the environment. In this paradigm, the experimenter cannot trigger the desirable response; the experimenter waits for the response to occur (to be emitted by the organism) and then delivers a potential reinforcer. In the classical conditioning paradigm, by contrast, the experimenter triggers (elicits) the desirable response by presenting a reflex-eliciting stimulus, the unconditioned stimulus (UCS), which is paired with, and preceded by, a neutral stimulus, the conditioned stimulus (CS).
Reinforcement is a basic term in operant conditioning. For the punishment aspect of operant conditioning, see punishment (psychology).
Positive reinforcement occurs when a desirable event or stimulus is presented as a consequence of a behavior and the chance that this behavior will manifest in similar environments increases.[11]: 253 For example, if reading a book is fun, then experiencing the fun positively reinforces the behavior of reading fun books. The person who receives the positive reinforcement (i.e., who has fun reading the book) will read more books to have more fun.
Negative reinforcement increases the rate of a behavior that avoids or escapes an aversive situation or stimulus.[11]: 252–253 That is, something unpleasant is already happening, and the behavior helps the person avoid or escape the unpleasantness. In contrast to positive reinforcement, which involves adding a pleasant stimulus, in negative reinforcement, the focus is on the removal of an unpleasant situation or stimulus. For example, if someone feels unhappy, then they might engage in a behavior (e.g., reading books) to escape from the aversive situation (e.g., their unhappy feelings).[11]: 253 The success of that avoidant or escapist behavior in removing the unpleasant situation or stimulus reinforces the behavior.
Doing something unpleasant to people to prevent or remove a behavior from happening again is punishment, not negative reinforcement.[11]: 252 The main difference is that reinforcement always increases the likelihood of a behavior (e.g., channel surfing while bored temporarily alleviated boredom; therefore, there will be more channel surfing while bored), whereas punishment decreases it (e.g., hangovers are an unpleasant stimulus, so people learn to avoid the behavior that led to that unpleasant stimulus).
Extinction occurs when a given behavior is ignored (i.e., followed by no consequence). Behaviors disappear over time when they consistently receive no reinforcement. During a deliberate extinction procedure, the targeted behavior first spikes (in an attempt to produce the expected, previously reinforced effects) and then declines over time. Neither reinforcement nor extinction needs to be deliberate in order to have an effect on a subject's behavior. For example, if a child reads books because they are fun, then the parents' decision to ignore the book reading will not remove the positive reinforcement (i.e., fun) the child receives from reading books. However, if a child engages in a behavior to get attention from the parents, then the parents' decision to ignore the behavior will cause the behavior to go extinct, and the child will find a different behavior to get their parents' attention.
Reinforcers serve to increase behaviors whereas punishers serve to decrease behaviors; thus, positive reinforcers are stimuli that the subject will work to attain, and negative reinforcers are stimuli that the subject will work to be rid of or to end.[12] The table below illustrates the adding and subtracting of stimuli (pleasant or aversive) in relation to reinforcement vs. punishment.
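                        Stimulus added                   Stimulus removed or withheld
Pleasant stimulus       Positive reinforcement           Negative punishment
                        (behavior increases)             (behavior decreases)
Aversive stimulus       Positive punishment              Negative reinforcement
                        (behavior decreases)             (behavior increases)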
Distinguishing between positive and negative reinforcement can be difficult and may not always be necessary. Focusing on what is being removed or added and how it affects behavior can be more helpful.
An event that punishes behavior for some may reinforce behavior for others.
Some reinforcement can include both positive and negative features, such as a drug addict taking drugs for the added euphoria (positive reinforcement) and also to eliminate withdrawal symptoms (negative reinforcement).
Reinforcement in the business world is essential in driving productivity. Employees are constantly motivated by the ability to receive a positive stimulus, such as a promotion or a bonus. Employees are also driven by negative reinforcement, such as by eliminating unpleasant tasks.
Though negative reinforcement has a positive effect in the short term for a workplace (i.e., it encourages a financially beneficial action), over-reliance on negative reinforcement hinders the ability of workers to act in a creative, engaged way that creates growth in the long term.[13]
A primary reinforcer, sometimes called an unconditioned reinforcer, is a stimulus that does not require pairing with a different stimulus in order to function as a reinforcer and most likely obtained this function through evolution and its role in species' survival.[14][page needed] Examples of primary reinforcers include food, water, and sex. Some primary reinforcers, such as certain drugs, may mimic the effects of other primary reinforcers. While these primary reinforcers are fairly stable through life and across individuals, the reinforcing value of different primary reinforcers varies due to multiple factors (e.g., genetics, experience). Thus, one person may prefer one type of food while another avoids it. Or one person may eat much food while another eats very little. So even though food is a primary reinforcer for both individuals, the value of food as a reinforcer differs between them.
A secondary reinforcer, sometimes called a conditioned reinforcer, is a stimulus or situation that has acquired its function as a reinforcer after pairing with a stimulus that functions as a reinforcer. This stimulus may be a primary reinforcer or another conditioned reinforcer (such as money).
When trying to distinguish primary and secondary reinforcers in human examples, use the "caveman test." If the stimulus is something that a caveman would naturally find desirable (e.g. candy) then it is a primary reinforcer. If, on the other hand, the caveman would not react to it (e.g. a dollar bill), it is a secondary reinforcer. As with primary reinforcers, an organism can experience satisfaction and deprivation with secondary reinforcers.
A generalized reinforcer is a conditioned reinforcer that has obtained its reinforcing function by pairing with many other reinforcers and functions as a reinforcer under a wide variety of motivating operations. (One example is money, because it is paired with many other reinforcers.)[15]: 83
In reinforcer sampling, a potentially reinforcing but unfamiliar stimulus is presented to an organism without regard to any prior behavior.
Socially mediated reinforcement involves the delivery of reinforcement that requires the behavior of another organism; for example, reinforcement provided by another person.
The Premack principle is a special case of reinforcement elaborated by David Premack, which states that a highly preferred activity can be used effectively as a reinforcer for a less-preferred activity.[15]: 123
A reinforcement hierarchy is a list of actions, rank-ordered from the most desirable to the least desirable consequences that may serve as reinforcers. A reinforcement hierarchy can be used to determine the relative frequency and desirability of different activities, and is often employed when applying the Premack principle.[citation needed]
Contingent outcomes are more likely to reinforce behavior than non-contingent responses. Contingent outcomes are those directly linked to a causal behavior, such as a light turning on being contingent on flipping a switch. Note that contingent outcomes are not necessary to demonstrate reinforcement, but perceived contingency may increase learning.
Contiguous stimuli are stimuli closely associated by time and space with specific behaviors. They reduce the amount of time needed to learn a behavior while increasing its resistance to extinction.[citation needed] Giving a dog a piece of food immediately after it sits is more contiguous with (and therefore more likely to reinforce) the behavior than a delay of several minutes between the behavior and food delivery.
Noncontingent reinforcement refers to response-independent delivery of stimuli identified as reinforcers for some behaviors of that organism. However, this typically entails time-based delivery of stimuli identified as maintaining aberrant behavior, which decreases the rate of the target behavior.[16] As no measured behavior is identified as being strengthened, there is controversy surrounding the use of the term noncontingent "reinforcement".[17]
In his 1967 paper, Arbitrary and Natural Reinforcement, Charles Ferster proposed classifying reinforcement into events that increase the frequency of an operant behavior as a natural consequence of the behavior itself, and events that affect frequency by their requirement of human mediation, such as in a token economy where subjects are rewarded for certain behavior by the therapist.
In 1970, Baer and Wolf developed the concept of "behavioral traps."[18] A behavioral trap requires only a simple response to enter, yet once entered, it produces general behavior change that is difficult to resist. It is the use of a behavioral trap that increases a person's repertoire, by exposing them to the naturally occurring reinforcement of that behavior. Behavioral traps have four characteristics:
They are "baited" with desirable reinforcers that "lure" the student into the trap.
Only a low-effort response already in the repertoire is necessary to enter the trap.
Interrelated contingencies of reinforcement inside the trap motivate the person to acquire, extend, and maintain targeted skills.[19]
They can remain effective for long periods of time because the person shows few, if any, satiation effects.
Thus, artificial reinforcement can be used to build or develop generalizable skills, eventually transitioning to naturally occurring reinforcement to maintain or increase the behavior. Another example is a social situation that will generally result from a specific behavior once it has met a certain criterion.
Behavior is not always reinforced every time it is emitted, and the pattern of reinforcement strongly affects how fast an operant response is learned, what its rate is at any given time, and how long it continues when reinforcement ceases. The simplest rules controlling reinforcement are continuous reinforcement, where every response is reinforced, and extinction, where no response is reinforced. Between these extremes, more complex schedules of reinforcement specify the rules that determine how and when a response will be followed by a reinforcer.
Specific schedules of reinforcement reliably induce specific patterns of response, and these rules apply across many different species. The varying consistency and predictability of reinforcement is an important influence on how the different schedules operate. Many simple and complex schedules were investigated at great length by B.F. Skinner using pigeons.
[Chart: the different response rates of the four simple schedules of reinforcement; each hatch mark designates the delivery of a reinforcer.]
Ratio schedule – the reinforcement depends only on the number of responses the organism has performed.
Continuous reinforcement (CRF) – a schedule of reinforcement in which every occurrence of the instrumental response (desired response) is followed by the reinforcer.[15]: 86
Simple schedules have a single rule to determine when a single type of reinforcer is delivered for a specific response.
Fixed ratio (FR) – schedules deliver reinforcement after every nth response.[15]: 88 An FR 1 schedule is synonymous with a CRF schedule.
(e.g., every three times a rat presses a button, it receives a slice of cheese)
Variable ratio schedule (VR) – reinforced on average every nth response, but not always on the nth response.[15]: 88
(e.g., gamblers win on average 1 out of every 10 turns on a slot machine; however, this is an average, and they could hypothetically win on any given turn)
Fixed interval (FI) – reinforced for the first response after a fixed amount of time (n) has elapsed.
(e.g., a rat receives a slice of cheese for pressing a button, but only once every 10 minutes; eventually, the rat will learn to ignore the button until each 10-minute interval has elapsed)
Variable interval (VI) – reinforced on an average of n amount of time, but not always exactly n amount of time.[15]: 89
(e.g., a radio host gives away concert tickets approximately every hour, but the exact minute may vary)
Fixed time (FT) – Provides a reinforcing stimulus at a fixed time since the last reinforcement delivery, regardless of whether the subject has responded or not. In other words, it is a non-contingent schedule.
Variable time (VT) – Provides reinforcement at an average variable time since last reinforcement, regardless of whether the subject has responded or not.
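Because each simple schedule is just a rule for deciding whether a given response is reinforced, the four basic contingencies can be sketched in a few lines of code. The following Python sketch is illustrative only; the class names, the probability-based treatment of VR, and the exponential waits for VI are our own modeling choices, not a standard API:

```python
import random

class FixedRatio:
    """FR n: reinforce every nth response."""
    def __init__(self, n):
        self.n, self.count = n, 0

    def respond(self):
        self.count += 1
        if self.count >= self.n:
            self.count = 0
            return True   # reinforcer delivered
        return False

class VariableRatio:
    """VR n: reinforce each response with probability 1/n (every nth on average)."""
    def __init__(self, n):
        self.p = 1.0 / n

    def respond(self):
        return random.random() < self.p

class FixedInterval:
    """FI t: reinforce the first response after t seconds have elapsed."""
    def __init__(self, t):
        self.t, self.start = t, 0.0

    def respond(self, now):
        if now - self.start >= self.t:
            self.start = now
            return True
        return False

class VariableInterval:
    """VI t: like FI, but the required wait varies around a mean of t seconds."""
    def __init__(self, t):
        self.mean = t
        self.deadline = random.expovariate(1.0 / t)

    def respond(self, now):
        if now >= self.deadline:
            self.deadline = now + random.expovariate(1.0 / self.mean)
            return True
        return False

# Example: a rat on FR 3 is reinforced on every third press.
fr3 = FixedRatio(3)
print([fr3.respond() for _ in range(6)])  # [False, False, True, False, False, True]
```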
Simple schedules are utilized in many differential reinforcement[20] procedures:
Differential reinforcement of alternative behavior (DRA) - A conditioning procedure in which an undesired response is decreased by placing it on extinction or, less commonly, providing contingent punishment, while simultaneously providing reinforcement contingent on a desirable response. An example would be a teacher attending to a student only when they raise their hand, while ignoring the student when he or she calls out.
Differential reinforcement of other behavior (DRO) – Also known as omission training procedures, an instrumental conditioning procedure in which a positive reinforcer is periodically delivered only if the participant does something other than the target response. An example would be reinforcing any hand action other than nose picking.[15]: 338
Differential reinforcement of incompatible behavior (DRI) – Used to reduce a frequent behavior without punishing it by reinforcing an incompatible response. An example would be reinforcing clapping to reduce nose picking
Differential reinforcement of low response rate (DRL) – Used to encourage low rates of responding. It is like an interval schedule, except that premature responses reset the time required between behaviors.
Differential reinforcement of high rate (DRH) – Used to increase high rates of responding. It is like an interval schedule, except that a minimum number of responses are required in the interval in order to receive reinforcement.
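Of these procedures, DRL reduces to a particularly simple timing rule. A minimal Python sketch follows (the names, and the assumption that every response restarts the timer, are our own):

```python
class DRL:
    """Sketch of differential reinforcement of low response rate (DRL):
    a response is reinforced only if at least `wait` seconds have passed
    since the previous response; responding too early resets the clock."""
    def __init__(self, wait):
        self.wait = wait
        self.last_response = None

    def respond(self, now):
        reinforced = (self.last_response is not None
                      and now - self.last_response >= self.wait)
        self.last_response = now  # every response, early or not, restarts the timer
        return reinforced

drl = DRL(10.0)
print(drl.respond(0.0), drl.respond(4.0), drl.respond(15.0))  # False False True
```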
Fixed ratio: activity slows after the reinforcer is delivered, then response rates increase until the next reinforcer delivery (post-reinforcement pause).
Variable ratio: rapid, steady rate of responding; most resistant to extinction.
Fixed interval: responding increases towards the end of the interval; poor resistance to extinction.
Variable interval: steady activity results, good resistance to extinction.
Ratio schedules produce higher rates of responding than interval schedules, when the rates of reinforcement are otherwise similar.
Variable schedules produce higher rates and greater resistance to extinction than most fixed schedules. This is also known as the Partial Reinforcement Extinction Effect (PREE).
The variable ratio schedule produces both the highest rate of responding and the greatest resistance to extinction (for example, the behavior of gamblers at slot machines).
Fixed schedules produce "post-reinforcement pauses" (PRP), where responses will briefly cease immediately following reinforcement, though the pause is a function of the upcoming response requirement rather than the prior reinforcement.[21]
The PRP of a fixed interval schedule is frequently followed by a "scallop-shaped" accelerating rate of response, while fixed ratio schedules produce a more "angular" response.
Fixed interval scallop: the pattern of responding that develops with a fixed interval reinforcement schedule; performance on a fixed interval schedule reflects the subject's accuracy in telling time.
Organisms whose schedules of reinforcement are "thinned" (that is, requiring more responses or a greater wait before reinforcement) may experience "ratio strain" if thinned too quickly. This produces behavior similar to that seen during extinction.
Ratio strain: the disruption of responding that occurs when a fixed ratio response requirement is increased too rapidly.
Ratio run: the high and steady rate of responding that completes each ratio requirement. A higher ratio requirement usually causes longer post-reinforcement pauses.
Partial reinforcement schedules are more resistant to extinction than continuous reinforcement schedules.
Ratio schedules are more resistant than interval schedules and variable schedules more resistant than fixed ones.
Momentary changes in reinforcement value lead to dynamic changes in behavior.[22]
Compound schedules combine two or more different simple schedules in some way using the same reinforcer for the same behavior. There are many possibilities; among those most often used are:
Alternative schedules – A type of compound schedule where two or more simple schedules are in effect and whichever schedule is completed first results in reinforcement.[23]
Conjunctive schedules – A complex schedule of reinforcement where two or more simple schedules are in effect independently of each other, and requirements on all of the simple schedules must be met for reinforcement.
Multiple schedules – Two or more schedules alternate over time, with a stimulus indicating which is in force. Reinforcement is delivered if the response requirement is met while a schedule is in effect.
Mixed schedules – Either of two, or more, schedules may occur with no stimulus indicating which is in force. Reinforcement is delivered if the response requirement is met while a schedule is in effect.
Concurrent schedules – A complex reinforcement procedure in which the participant can choose any one of two or more simple reinforcement schedules that are available simultaneously (i.e., two or more reinforcement schedules are administered at the same time). Organisms are free to change back and forth between the response alternatives at any time.
Concurrent-chain schedule of reinforcement – A complex reinforcement procedure in which the participant is permitted to choose during the first link which of several simple reinforcement schedules will be in effect in the second link. Once a choice has been made, the rejected alternatives become unavailable until the start of the next trial.
Interlocking schedules – A single schedule with two components where progress in one component affects progress in the other component. In an interlocking FR 60 FI 120-s schedule, for example, each response subtracts time from the interval component such that each response is "equal" to removing two seconds from the FI schedule.
Chained schedules – Reinforcement occurs after two or more successive schedules have been completed, with a stimulus indicating when one schedule has been completed and the next has started.
Tandem schedules – Reinforcement occurs when two or more successive schedule requirements have been completed, with no stimulus indicating when a schedule has been completed and the next has started.
Higher-order schedules – completion of one schedule is reinforced according to a second schedule; e.g. in FR2 (FI10 secs), two successive fixed interval schedules require completion before a response is reinforced.
The psychology term superimposed schedules of reinforcement refers to a structure of rewards where two or more simple schedules of reinforcement operate simultaneously. Reinforcers can be positive, negative, or both. An example is a person who comes home after a long day at work. The behavior of opening the front door is rewarded by a big kiss on the lips by the person's spouse and a rip in the pants from the family dog jumping enthusiastically. Another example of superimposed schedules of reinforcement is a pigeon in an experimental cage pecking at a button. The pecks deliver a hopper of grain every 20th peck, and access to water after every 200 pecks.
Superimposed schedules of reinforcement are a type of compound schedule that evolved from the initial work on simple schedules of reinforcement by B. F. Skinner and his colleagues (Ferster and Skinner, 1957). They demonstrated that reinforcers could be delivered on schedules, and further that organisms behaved differently under different schedules. Rather than a reinforcer, such as food or water, being delivered every time as a consequence of some behavior, a reinforcer could be delivered after more than one instance of the behavior. For example, a pigeon may be required to peck a button switch ten times before food appears. This is a "ratio schedule". Also, a reinforcer could be delivered after an interval of time has passed following a target behavior. An example is a rat that is given a food pellet immediately following the first response that occurs after two minutes have elapsed since the last lever press. This is called an "interval schedule".
In addition, ratio schedules can deliver reinforcement following a fixed or variable number of behaviors by the individual organism. Likewise, interval schedules can deliver reinforcement following fixed or variable intervals of time following a single response by the organism. Individual behaviors tend to generate response rates that differ based upon how the reinforcement schedule is created. Much subsequent research in many labs examined the effects of scheduling reinforcers on behavior.
If an organism is offered the opportunity to choose between or among two or more simple schedules of reinforcement at the same time, the reinforcement structure is called a "concurrent schedule of reinforcement". Brechner (1974, 1977) introduced the concept of superimposed schedules of reinforcement in an attempt to create a laboratory analogy of social traps, such as when humans overharvest their fisheries or tear down their rainforests. Brechner created a situation where simple reinforcement schedules were superimposed upon each other. In other words, a single response or group of responses by an organism led to multiple consequences. Concurrent schedules of reinforcement can be thought of as "or" schedules, and superimposed schedules of reinforcement can be thought of as "and" schedules. Brechner and Linder (1981) and Brechner (1987) expanded the concept to describe how superimposed schedules and the social trap analogy could be used to analyze the way energy flows through systems.
Superimposed schedules of reinforcement have many real-world applications in addition to generating social traps. Many different human individual and social situations can be created by superimposing simple reinforcement schedules. For example, a human being could have simultaneous tobacco and alcohol addictions. Even more complex situations can be created or simulated by superimposing two or more concurrent schedules. For example, a high school senior could have a choice between going to Stanford University or UCLA, and at the same time have the choice of going into the Army or the Air Force, and simultaneously the choice of taking a job with an internet company or a job with a software company. That is a reinforcement structure of three superimposed concurrent schedules of reinforcement.
Superimposed schedules of reinforcement can create the three classic conflict situations (approach–approach conflict, approach–avoidance conflict, and avoidance–avoidance conflict) described by Kurt Lewin (1935) and can operationalize other Lewinian situations analyzed by his force field analysis. Other examples of the use of superimposed schedules of reinforcement as an analytical tool are its application to the contingencies of rent control (Brechner, 2003) and problem of toxic waste dumping in the Los Angeles County storm drain system (Brechner, 2010).
In operant conditioning, concurrent schedules of reinforcement are schedules of reinforcement that are simultaneously available to an animal subject or human participant, so that the subject or participant can respond on either schedule. For example, in a two-alternative forced choice task, a pigeon in a Skinner box is faced with two pecking keys; pecking responses can be made on either, and food reinforcement might follow a peck on either. The schedules of reinforcement arranged for pecks on the two keys can be different. They may be independent, or they may be linked so that behavior on one key affects the likelihood of reinforcement on the other.
It is not necessary for responses on the two schedules to be physically distinct. In an alternate way of arranging concurrent schedules, introduced by Findley in 1958, both schedules are arranged on a single key or other response device, and the subject can respond on a second key to change between the schedules. In such a "Findley concurrent" procedure, a stimulus (e.g., the color of the main key) signals which schedule is in effect.
Concurrent schedules often induce rapid alternation between the keys. To prevent this, a "changeover delay" is commonly introduced: each schedule is inactivated for a brief period after the subject switches to it.
When both of the concurrent schedules are variable intervals, a quantitative relationship known as the matching law is found between relative response rates on the two schedules and the relative reinforcement rates they deliver; this was first observed by R. J. Herrnstein in 1961. The matching law is a rule for instrumental behavior which states that the relative rate of responding on a particular response alternative equals the relative rate of reinforcement for that response (relative rate of behavior = relative rate of reinforcement). Animals and humans have a tendency to prefer choice in schedules.[24]
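Writing B1 and B2 for the response rates on the two alternatives and R1 and R2 for the reinforcement rates obtained from them, the matching law in its simplest form is

\[ \frac{B_1}{B_1 + B_2} = \frac{R_1}{R_1 + R_2}, \qquad \text{equivalently} \qquad \frac{B_1}{B_2} = \frac{R_1}{R_2}. \]

For example, a pigeon obtaining 75% of its reinforcers on the left key will, to a close approximation, make about 75% of its pecks on that key.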
Shaping is the reinforcement of successive approximations to a desired instrumental response. In training a rat to press a lever, for example, simply turning toward the lever is reinforced at first. Then, only turning and stepping toward it is reinforced. Eventually the rat will be reinforced for pressing the lever. The successful attainment of one behavior starts the shaping process for the next. As training progresses, the response becomes progressively more like the desired behavior, with each subsequent behavior becoming a closer approximation of the final behavior.[25]
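The criterion-tightening logic of shaping can be caricatured in code. The following Python sketch is a toy model, with all numbers illustrative: response magnitude varies around a mean, reinforcement nudges that mean upward, and the criterion for reinforcement is raised gradually toward the target:

```python
import random

def shape_lever_press(target=10.0, trials=5000, seed=1):
    """Toy model of shaping (all numbers illustrative): responses vary around
    a mean magnitude; reinforcing responses that meet the current criterion
    nudges the mean upward, and the criterion is tightened toward the target."""
    random.seed(seed)
    mean, criterion = 1.0, 1.0
    for _ in range(trials):
        response = random.gauss(mean, 0.5)
        if response >= criterion:                      # successive approximation met
            mean += 0.1 * (response - mean)            # reinforcement shifts behavior
            criterion = min(target, criterion + 0.01)  # raise the bar gradually
    return mean, criterion

print(shape_lever_press())  # both values end at or near the target of 10.0
```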
The intervention of shaping is used in many training situations, and also for individuals with autism as well as other developmental disabilities. When shaping is combined with other evidence-based practices such as Functional Communication Training (FCT),[26] it can yield positive outcomes for human behavior. Shaping typically uses continuous reinforcement, but the response can later be shifted to an intermittent reinforcement schedule.
Shaping is also used for food refusal.[27] Food refusal is a partial or total aversion to food items; it can range from mild picky eating to refusal so severe that it affects the individual's health. Shaping has achieved high rates of success in promoting food acceptance.[28]
Chaining involves linking discrete behaviors together in a series, such that the consequence of each behavior is both the reinforcement for the previous behavior, and the antecedent stimulus for the next behavior. There are many ways to teach chaining, such as forward chaining (starting from the first behavior in the chain), backwards chaining (starting from the last behavior) and total task chaining (teaching each behavior in the chain simultaneously). People's morning routines are a typical chain, with a series of behaviors (e.g. showering, drying off, getting dressed) occurring in sequence as a well learned habit.
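For illustration, the short Python sketch below (the routine and step names are our own) prints the portions of a chain in the order that backward chaining would train them, so the learner always finishes at the naturally reinforced end of the chain:

```python
# Toy sketch of backward chaining: the last step is taught first, so the
# natural reinforcer is reached immediately; earlier steps are then added
# one at a time in front of the already mastered tail of the chain.
MORNING_ROUTINE = ["shower", "dry off", "get dressed"]

def backward_chaining_order(steps):
    """Yield the portions of the chain in the order they are trained."""
    for start in range(len(steps) - 1, -1, -1):
        yield steps[start:]  # train from this step through the end of the chain

for portion in backward_chaining_order(MORNING_ROUTINE):
    print(" -> ".join(portion))
# get dressed
# dry off -> get dressed
# shower -> dry off -> get dressed
```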
Challenging behaviors seen in individuals with autism and other related disabilities have been successfully managed and maintained in studies using a schedule of chained reinforcement.[29] Functional communication training is an intervention that often uses chained schedules of reinforcement to effectively promote the appropriate and desired functional communication response.[30]
There has been research on building a mathematical model of reinforcement. This model is known as MPR, which is short for mathematical principles of reinforcement. Peter Killeen has made key discoveries in the field with his research on pigeons.[31]
Reinforcement and punishment are ubiquitous in human social interactions, and a great many applications of operant principles have been suggested and implemented. Following are a few examples.
Positive and negative reinforcement play central roles in the development and maintenance of addiction and drug dependence. An addictive drug is intrinsically rewarding; that is, it functions as a primary positive reinforcer of drug use. The brain's reward system assigns it incentive salience (i.e., it is "wanted" or "desired"),[32][33][34] so as an addiction develops, deprivation of the drug leads to craving. In addition, stimuli associated with drug use – e.g., the sight of a syringe, and the location of use – become associated with the intense reinforcement induced by the drug.[32][33][34] These previously neutral stimuli acquire several properties: their appearance can induce craving, and they can become conditioned positive reinforcers of continued use.[32][33][34] Thus, if an addicted individual encounters one of these drug cues, a craving for the associated drug may reappear. For example, anti-drug agencies previously used posters with images of drug paraphernalia as an attempt to show the dangers of drug use. However, such posters are no longer used because of the effects of incentive salience in causing relapse upon sight of the stimuli illustrated in the posters.
Animal trainers and pet owners were applying the principles and practices of operant conditioning long before these ideas were named and studied, and animal training still provides one of the clearest and most convincing examples of operant control. Of the concepts and procedures described in this article, a few of the most salient are: availability of immediate reinforcement (e.g. the ever-present bag of dog yummies); contingency, assuring that reinforcement follows the desired behavior and not something else; the use of secondary reinforcement, as in sounding a clicker immediately after a desired response; shaping, as in gradually getting a dog to jump higher and higher; intermittent reinforcement, reducing the frequency of those yummies to induce persistent behavior without satiation; chaining, where a complex behavior is gradually put together.[35]
Providing positive reinforcement for appropriate child behaviors is a major focus of parent management training. Typically, parents learn to reward appropriate behavior through social rewards (such as praise, smiles, and hugs) as well as concrete rewards (such as stickers or points towards a larger reward as part of an incentive system created collaboratively with the child).[36] In addition, parents learn to select simple behaviors as an initial focus and reward each of the small steps that their child achieves towards reaching a larger goal (this concept is called "successive approximations").[36][37] They may also use indirect rewards, such as progress charts. Providing positive reinforcement in the classroom can be beneficial to student success. When applying positive reinforcement to students, it is crucial to individualize it to each student's needs. This way, the student understands why they are receiving the praise, can accept it, and eventually learns to continue the behavior that earned the positive reinforcement. For example, rewards or extra recess time might work better for some students, whereas others might respond better to receiving stickers or check marks indicating praise.
Both psychologists and economists have become interested in applying operant concepts and findings to the behavior of humans in the marketplace. An example is the analysis of consumer demand, as indexed by the amount of a commodity that is purchased. In economics, the degree to which price influences consumption is called "the price elasticity of demand." Certain commodities are more elastic than others; for example, a change in price of certain foods may have a large effect on the amount bought, while gasoline and other essentials may be less affected by price changes. In terms of operant analysis, such effects may be interpreted in terms of the motivations of consumers and the relative value of the commodities as reinforcers.[38]
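In its standard form, the price elasticity of demand E is the proportional change in quantity purchased divided by the proportional change in price:

\[ E = \frac{\Delta Q / Q}{\Delta P / P} \]

For example, if a 10% price increase reduces purchases of a food by 20%, then E = -2 and demand for that food is elastic; if the same price increase reduces gasoline purchases by only 3%, then E = -0.3 and demand is inelastic.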
As stated earlier in this article, a variable ratio schedule yields reinforcement after the emission of an unpredictable number of responses. This schedule typically generates rapid, persistent responding. Slot machines pay off on a variable ratio schedule, and they produce just this sort of persistent lever-pulling behavior in gamblers. Because the machines are programmed to pay out less money than they take in, the persistent slot-machine user invariably loses in the long run. Slot machines, and thus variable ratio reinforcement, have often been blamed as a factor underlying gambling addiction.[39]
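To see why the player loses, assume an illustrative payout ratio of 90% (the actual figure varies by machine and jurisdiction). The expected loss per $1 wagered is then $1 × (1 − 0.90) = $0.10, so a player making 1,000 such wagers should expect to lose about $100, even though the variable ratio schedule delivers frequent intermittent wins along the way.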
The concept of praise as a means of behavioral reinforcement in humans is rooted in B. F. Skinner's model of operant conditioning. Through this lens, praise has been viewed as a means of positive reinforcement, wherein an observed behavior is made more likely to occur by contingently praising said behavior.[40] Hundreds of studies have demonstrated the effectiveness of praise in promoting positive behaviors, notably in the study of teachers' and parents' use of praise with children to promote improved behavior and academic performance,[41][42] but also in the study of work performance.[43] Praise has also been demonstrated to reinforce positive behaviors in non-praised adjacent individuals (such as a classmate of the praise recipient) through vicarious reinforcement.[44] Praise may be more or less effective in changing behavior depending on its form, content, and delivery. In order for praise to effect positive behavior change, it must be contingent on the positive behavior (i.e., only administered after the targeted behavior is enacted), must specify the particulars of the behavior that is to be reinforced, and must be delivered sincerely and credibly.[45]
Acknowledging the effect of praise as a positive reinforcement strategy, numerous behavioral and cognitive behavioral interventions have incorporated the use of praise in their protocols.[46][47] The strategic use of praise is recognized as an evidence-based practice in both classroom management[46] and parenting training interventions,[42] though praise is often subsumed in intervention research into a larger category of positive reinforcement, which includes strategies such as strategic attention and behavioral rewards.
Traumatic bonding occurs as the result of ongoing cycles of abuse in which the intermittent reinforcement of reward and punishment creates powerful emotional bonds that are resistant to change.[48][49]
Another source describes the necessary conditions as follows:[50]
"The necessary conditions for traumatic bonding are that one person must dominate the other and that the level of abuse chronically spikes and then subsides. The relationship is characterized by periods of permissive, compassionate, and even affectionate behavior from the dominant person, punctuated by intermittent episodes of intense abuse. To maintain the upper hand, the victimizer manipulates the behavior of the victim and limits the victim's options so as to perpetuate the power imbalance. Any threat to the balance of dominance and submission may be met with an escalating cycle of punishment ranging from seething intimidation to intensely violent outbursts. The victimizer also isolates the victim from other sources of support, which reduces the likelihood of detection and intervention, impairs the victim's ability to receive countervailing self-referent feedback, and strengthens the sense of unilateral dependency ... The traumatic effects of these abusive relationships may include the impairment of the victim's capacity for accurate self-appraisal, leading to a sense of personal inadequacy and a subordinate sense of dependence upon the dominating person. Victims also may encounter a variety of unpleasant social and legal consequences of their emotional and behavioral affiliation with someone who perpetrated aggressive acts, even if they themselves were the recipients of the aggression."
Most video games are designed around some type of compulsion loop, adding a type of positive reinforcement through a variable rate schedule to keep the player playing the game, though this can also lead to video game addiction.[51]
As part of a trend in the monetization of video games in the 2010s, some games offered "loot boxes" as rewards or as items purchasable with real-world funds, offering a random selection of in-game items distributed by rarity. The practice has been tied to the same methods by which slot machines and other gambling devices dole out rewards, as it follows a variable rate schedule. While loot boxes are widely perceived as a form of gambling, the practice is classified as gambling in only a few countries and is otherwise legal. However, methods of using those items as virtual currency for online gambling, or of trading them for real-world money, have created a skin gambling market that is under legal evaluation.[52]
The standard definition of behavioral reinforcement has been criticized as circular, since it appears to argue that response strength is increased by reinforcement, and defines reinforcement as something that increases response strength (i.e., response strength is increased by things that increase response strength). However, the correct usage[53] of reinforcement is that something is a reinforcer because of its effect on behavior, and not the other way around. It becomes circular if one says that a particular stimulus strengthens behavior because it is a reinforcer, and does not explain why a stimulus is producing that effect on the behavior. Other definitions have been proposed, such as F.D. Sheffield's "consummatory behavior contingent on a response", but these are not broadly used in psychology.[54]
Increasingly, understanding of the role reinforcers play is moving away from a "strengthening" effect toward a "signalling" effect.[55] On this view, reinforcers increase responding because they signal which behaviors are likely to result in reinforcement. While in most practical applications the effect of any given reinforcer will be the same regardless of whether the reinforcer is signalling or strengthening, this approach helps to explain a number of behavioral phenomena, including patterns of responding on intermittent reinforcement schedules (fixed interval scallops) and the differential outcomes effect.[56]
^ abLeaf, Justin B.; Cihon, Joseph H.; Leaf, Ronald; McEachin, John; Liu, Nicholas; Russell, Noah; Unumb, Lorri; Shapiro, Sydney; Khosrowshahi, Dara (June 2022). "Concerns About ABA-Based Intervention: An Evaluation and Recommendations". Journal of Autism and Developmental Disorders. 52 (6): 2838–2853. doi:10.1007/s10803-021-05137-y. ISSN1573-3432. PMC9114057. PMID34132968. Punishment, from a behavior analytic perspective, describes any context in which a response is followed by an event (i.e., stimulus change) that results in a decrease in the probability of similar responses in similar situations.... Absent from this definition are things like pain, fear, discomfort, and the like. Suppose a person parks their car taking up two spaces and a passerby comments, "That's inconsiderate." If the probability of taking up two spaces while parking subsequently decreases, we can reasonably presume that punishment occurred.
^Schultz W (July 2015). "Neuronal Reward and Decision Signals: From Theories to Data". Physiological Reviews. 95 (3): 853–951. doi:10.1152/physrev.00023.2014. PMC4491543. PMID26109341. Rewards in operant conditioning are positive reinforcers. ... Operant behavior gives a good definition for rewards. Anything that makes an individual come back for more is a positive reinforcer and therefore a reward. Although it provides a good definition, positive reinforcement is only one of several reward functions. ... Rewards are attractive. They are motivating and make us exert an effort. ... Rewards induce approach behavior, also called appetitive or preparatory behavior, and consummatory behavior. ... Thus any stimulus, object, event, activity, or situation that has the potential to make us approach and consume it is by definition a reward. ... Intrinsic rewards are activities that are pleasurable on their own and are undertaken for their own sake, without being the means for getting extrinsic rewards. ... Intrinsic rewards are genuine rewards in their own right, as they induce learning, approach, and pleasure, like perfectioning, playing, and enjoying the piano. Although they can serve to condition higher order rewards, they are not conditioned, higher order rewards, as attaining their reward properties does not require pairing with an unconditioned reward.
^Malenka RC, Nestler EJ, Hyman SE (2009). "Chapter 15: Reinforcement and Addictive Disorders". In Sydor A, Brown RY (eds.). Molecular Neuropharmacology: A Foundation for Clinical Neuroscience (2nd ed.). New York: McGraw-Hill Medical. pp. 364–375. ISBN9780071481274.
^Nestler EJ (December 2013). "Cellular basis of memory for addiction". Dialogues in Clinical Neuroscience. 15 (4): 431–443. PMC3898681. PMID24459410. Despite the importance of numerous psychosocial factors, at its core, drug addiction involves a biological process: the ability of repeated exposure to a drug of abuse to induce changes in a vulnerable brain that drive the compulsive seeking and taking of drugs, and loss of control over drug use, that define a state of addiction. ... A large body of literature has demonstrated that such ΔFosB induction in D1-type [nucleus accumbens] neurons increases an animal's sensitivity to drug as well as natural rewards and promotes drug self-administration, presumably through a process of positive reinforcement ... Another ΔFosB target is cFos: as ΔFosB accumulates with repeated drug exposure it represses c-Fos and contributes to the molecular switch whereby ΔFosB is selectively induced in the chronic drug-treated state.41. ... Moreover, there is increasing evidence that, despite a range of genetic risks for addiction across the population, exposure to sufficiently high doses of a drug for long periods of time can transform someone who has relatively lower genetic loading into an addict.
^Volkow ND, Koob GF, McLellan AT (January 2016). "Neurobiologic Advances from the Brain Disease Model of Addiction". New England Journal of Medicine. 374 (4): 363–371. doi:10.1056/NEJMra1511480. PMC6135257. PMID26816013. Substance-use disorder: A diagnostic term in the fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM-5) referring to recurrent use of alcohol or other drugs that causes clinically and functionally significant impairment, such as health problems, disability, and failure to meet major responsibilities at work, school, or home. Depending on the level of severity, this disorder is classified as mild, moderate, or severe. Addiction: A term used to indicate the most severe, chronic stage of substance-use disorder, in which there is a substantial loss of self-control, as indicated by compulsive drug taking despite the desire to stop taking the drug. In the DSM-5, the term addiction is synonymous with the classification of severe substance-use disorder.
^ abcdefgMiltenberger, R. G. "Behavioral Modification: Principles and Procedures". Thomson/Wadsworth, 2008.
^Tucker M, Sigafoos J, Bushell H (October 1998). "Use of noncontingent reinforcement in the treatment of challenging behavior. A review and clinical guide". Behavior Modification. 22 (4): 529–47. doi:10.1177/01454455980224005. PMID9755650. S2CID21542125.
^Droleskey RE, Andrews K, Chiarantini L, DeLoach JR (1992). "Use of fluorescent probes for describing the process of encapsulation by hypotonic dialysis". The Use of Resealed Erythrocytes as Carriers and Bioreactors. Advances in Experimental Medicine and Biology. Vol. 326. pp. 73–80. doi:10.1007/978-1-4615-3030-5_9. ISBN978-1-4613-6321-7. PMID1284187.
^Baer DM, Wolf MM. "The entry into natural communities of reinforcement". In Ulrich R, Stachnik T, Mabry J (eds.). Control of human behavior. Vol. 2. Glenview, IL: Scott Foresman. pp. 319–24.
^Vollmer TR, Iwata BA (1992). "Differential reinforcement as treatment for behavior disorders: procedural and functional variations". Research in Developmental Disabilities. 13 (4): 393–417. doi:10.1016/0891-4222(92)90013-v. PMID1509180.
^Tarbox and Lanagan Bermudez, Jonathan and Taira (2017). Treating Feeding Challenges in Autism. San Diego: Academic Press. pp. 1–6. ISBN978-0-12-813563-1.
^Turner, Virginia R; et al. (2020). "Response Shaping to Improve Food Acceptance for Children with Autism: Effects of Small and Large Food Sets". Research in Developmental Disabilities. 98 103574. doi:10.1016/j.ridd.2020.103574. PMID31982827. S2CID210922007.
^ abcdEdwards S (2016). "Reinforcement principles for addiction medicine; from recreational drug use to psychiatric disorder". Neuroscience for Addiction Medicine: From Prevention to Rehabilitation - Constructs and Drugs. Progress in Brain Research. Vol. 223. pp. 63–76. doi:10.1016/bs.pbr.2015.07.005. ISBN9780444635457. PMID26806771. Abused substances (ranging from alcohol to psychostimulants) are initially ingested at regular occasions according to their positive reinforcing properties. Importantly, repeated exposure to rewarding substances sets off a chain of secondary reinforcing events, whereby cues and contexts associated with drug use may themselves become reinforcing and thereby contribute to the continued use and possible abuse of the substance(s) of choice. ... An important dimension of reinforcement highly relevant to the addiction process (and particularly relapse) is secondary reinforcement (Stewart, 1992). Secondary reinforcers (in many cases also considered conditioned reinforcers) likely drive the majority of reinforcement processes in humans. In the specific case of drug [addiction], cues and contexts that are intimately and repeatedly associated with drug use will often themselves become reinforcing ... A fundamental piece of Robinson and Berridge's incentive-sensitization theory of addiction posits that the incentive value or attractive nature of such secondary reinforcement processes, in addition to the primary reinforcers themselves, may persist and even become sensitized over time in league with the development of drug addiction (Robinson and Berridge, 1993). ... Negative reinforcement is a special condition associated with a strengthening of behavioral responses that terminate some ongoing (presumably aversive) stimulus. In this case we can define a negative reinforcer as a motivational stimulus that strengthens such an "escape" response. Historically, in relation to drug addiction, this phenomenon has been consistently observed in humans whereby drugs of abuse are self-administered to quench a motivational need in the state of withdrawal (Wikler, 1952).
^ abcBerridge KC (April 2012). "From prediction error to incentive salience: mesolimbic computation of reward motivation". The European Journal of Neuroscience. 35 (7): 1124–43. doi:10.1111/j.1460-9568.2012.07990.x. PMC3325516. PMID22487042. When a Pavlovian CS+ is attributed with incentive salience it not only triggers 'wanting' for its UCS, but often the cue itself becomes highly attractive – even to an irrational degree. This cue attraction is another signature feature of incentive salience. The CS becomes hard not to look at (Wiers & Stacy, 2006; Hickey et al., 2010a; Piech et al., 2010; Anderson et al., 2011). The CS even takes on some incentive properties similar to its UCS. An attractive CS often elicits behavioral motivated approach, and sometimes an individual may even attempt to 'consume' the CS somewhat as its UCS (e.g., eat, drink, smoke, have sex with, take as drug). 'Wanting' of a CS can turn also turn the formerly neutral stimulus into an instrumental conditioned reinforcer, so that an individual will work to obtain the cue (however, there exist alternative psychological mechanisms for conditioned reinforcement too).
^ a b c Berridge KC, Kringelbach ML (May 2015). "Pleasure systems in the brain". Neuron. 86 (3): 646–64. doi:10.1016/j.neuron.2015.02.018. PMC 4425246. PMID 25950633. An important goal in future for addiction neuroscience is to understand how intense motivation becomes narrowly focused on a particular target. Addiction has been suggested to be partly due to excessive incentive salience produced by sensitized or hyper-reactive dopamine systems that produce intense 'wanting' (Robinson and Berridge, 1993). But why one target becomes more 'wanted' than all others has not been fully explained. In addicts or agonist-stimulated patients, the repetition of dopamine-stimulation of incentive salience becomes attributed to particular individualized pursuits, such as taking the addictive drug or the particular compulsions. In Pavlovian reward situations, some cues for reward become more 'wanted' more than others as powerful motivational magnets, in ways that differ across individuals (Robinson et al., 2014b; Saunders and Robinson, 2013). ... However, hedonic effects might well change over time. As a drug was taken repeatedly, mesolimbic dopaminergic sensitization could consequently occur in susceptible individuals to amplify 'wanting' (Leyton and Vezina, 2013; Lodge and Grace, 2011; Wolf and Ferrario, 2010), even if opioid hedonic mechanisms underwent down-regulation due to continual drug stimulation, producing 'liking' tolerance. Incentive-sensitization would produce addiction, by selectively magnifying cue-triggered 'wanting' to take the drug again, and so powerfully cause motivation even if the drug became less pleasant (Robinson and Berridge, 1993).
^ McGreevy PD, Boakes RA (2007). Carrots and Sticks: Principles of Animal Training. Cambridge: Cambridge University Press. ISBN 978-0-521-68691-4.
^ Domjan, M. (2009). The Principles of Learning and Behavior (6th ed.). Wadsworth Publishing Company. pp. 244–249.
^ Lozano Bleda JH, Pérez Nieto MA (November 2012). "Impulsivity, intelligence, and discriminating reinforcement contingencies in a fixed-ratio 3 schedule". The Spanish Journal of Psychology. 15 (3): 922–9. doi:10.5209/rev_sjop.2012.v15.n3.39384. PMID 23156902. S2CID 144193503.
^ Baker GL, Barnes HJ (1992). "Superior vena cava syndrome: etiology, diagnosis, and treatment". American Journal of Critical Care. 1 (1): 54–64. doi:10.4037/ajcc1992.1.1.54. PMID 1307879.
^ a b Garland AF, Hawley KM, Brookman-Frazee L, Hurlburt MS (May 2008). "Identifying common elements of evidence-based psychosocial treatments for children's disruptive behavior problems". Journal of the American Academy of Child and Adolescent Psychiatry. 47 (5): 505–14. doi:10.1097/CHI.0b013e31816765c2. PMID 18356768.
^ a b Simonsen B, Fairbanks S, Briesch A, Myers D, Sugai G (2008). "Evidence-based Practices in Classroom Management: Considerations for Research to Practice". Education and Treatment of Children. 31 (1): 351–380. doi:10.1353/etc.0.0007. S2CID 145087451.
^ Dutton; Painter (1981). "Traumatic Bonding: The development of emotional attachments in battered women and other relationships of intermittent abuse". Victimology. 7.
^ McCormack J, Arnold-Saritepe A, Elliffe D (June 2017). "The differential outcomes effect in children with autism". Behavioral Interventions. 32 (4): 357–369. doi:10.1002/bin.1489.
Brechner KC (1974). An experimental analysis of social traps (PhD thesis). Arizona State University.
Brechner KC (1977). "An experimental analysis of social traps". Journal of Experimental Social Psychology. 13 (6): 552–64. doi:10.1016/0022-1031(77)90054-3.
Brechner KC (1987). Social Traps, Individual Traps, and Theory in Social Psychology. Bulletin No. 870001. Pasadena, CA: Time River Laboratory.
Brechner KC, Linder DE (1981). "A social trap analysis of energy distribution systems". In Baum A, Singer JE (eds.). Advances in Environmental Psychology. Vol. 3. Hillsdale, NJ: Lawrence Erlbaum & Associates.
Chance P (2003). Learning and Behavior (5th ed.). Toronto: Thomson-Wadsworth.
Cowie S (2019). "Some weaknesses of a response-strength account of reinforcer effects". European Journal of Behavior Analysis. 21 (2): 1–16. doi:10.1080/15021149.2019.1685247. S2CID210503231.
Harter JK, Schmidt FL, Keyes CL (2002). "Well-Being in the Workplace and its Relationship to Business Outcomes: A Review of the Gallup Studies". In Keyes CL, Haidt J (eds.). Flourishing: The Positive Person and the Good Life. Washington D.C.: American Psychological Association. pp. 205–224.
Reinforcement is a fundamental process in behavioral psychology in which an event or stimulus following a particular behavior strengthens or increases the likelihood of that behavior recurring in similar future situations.[1] This concept is central to operant conditioning, a learning theory that emphasizes how voluntary behaviors are shaped by their consequences rather than by associations between stimuli, as in classical conditioning.[2]

The origins of reinforcement trace back to early 20th-century work by Edward Thorndike, who proposed the law of effect, stating that behaviors followed by satisfying consequences are more likely to be repeated, while those followed by discomfort are less likely.[3] B.F. Skinner later expanded this into a systematic framework in the 1930s and 1940s through his experiments with animals in controlled environments, such as the "Skinner box," where he demonstrated how reinforcement could precisely control behavior rates and patterns.[2] Skinner's approach shifted focus from internal mental states to observable environmental contingencies, establishing reinforcement as a key mechanism for understanding learning across species.

Reinforcement operates through two primary types: positive reinforcement, which involves presenting a desirable stimulus (e.g., food or praise) immediately after a behavior to increase its frequency, and negative reinforcement, which involves removing an aversive stimulus (e.g., noise or pain) to achieve the same effect.[4] Both types strengthen behavior by altering its consequences, but they differ in whether a stimulus is added or subtracted; neither involves punishment, which decreases behavior.[4] Reinforcers can be primary (innate, like food satisfying hunger) or secondary (learned, like money gaining value through association), and their effectiveness depends on factors such as immediacy, intensity, and delivery schedules.[5]

Beyond theory, reinforcement principles have wide applications in education, therapy, and animal training, informing techniques like token economies in classrooms and behavior modification programs for disorders such as autism.[6] Schedules of reinforcement—continuous (every response reinforced) or intermittent (partial reinforcement)—further influence behavior persistence, with intermittent schedules often producing more resistant habits, as seen in gambling.[2] These applications underscore reinforcement's role in shaping everyday human and animal conduct while raising ethical considerations about manipulation and autonomy.[6]
Fundamentals
Definition and Core Concepts
Reinforcement is defined as any consequence of a behavior that increases the probability of that behavior recurring in the future, serving as a fundamental process in associative learning where environmental outcomes shape behavioral patterns.[7] This concept emphasizes the role of consequences in modifying behavior, distinguishing it from antecedent stimuli that elicit responses in other forms of learning.[8]

At its core, reinforcement operates through the association between a voluntary behavior and its subsequent outcome, leading to behavior modification that strengthens adaptive responses over time.[2] Unlike classical conditioning, which pairs neutral stimuli with innate reflexes to produce involuntary responses—such as salivation triggered by a bell—reinforcement focuses on consequences following self-initiated actions, thereby increasing the frequency of those actions.[8] For instance, in controlled laboratory settings, providing a food reward immediately after an animal presses a lever results in higher rates of lever-pressing behavior, illustrating how reinforcement directly boosts response likelihood without relying on prior stimulus pairing.[9]

From an evolutionary perspective, reinforcement functions as an adaptive mechanism that promotes survival by reinforcing behaviors essential for resource acquisition and threat avoidance across species.[10] In phylogenetic terms, this is evident in foraging behaviors observed in diverse animals, where successful food-seeking actions are strengthened by nutritional rewards, enhancing fitness in variable environments.[11] Such processes underscore reinforcement's role in enabling organisms to learn and adapt within their lifetimes, complementing slower genetic evolution.[12]
Terminology
In operant conditioning, a reinforcer is any stimulus or event that follows a specific behavior and increases the likelihood of that behavior occurring again in the future.[9] This functional definition, originating from B.F. Skinner's foundational work, emphasizes the consequence's effect on behavior rather than its inherent qualities.[1] The response, also termed the operant, refers to the voluntary behavior that precedes and produces the reinforcer, distinguishing it from reflexive actions in classical conditioning.[13] A reinforcement schedule describes the specific pattern or timing by which reinforcers are delivered contingent on responses.[9]

Key distinctions clarify common terminological confusions. A reinforcer differs from a reward, as the former is defined objectively by its behavioral impact—increasing response probability—while the latter often implies a subjectively pleasing or valued outcome, which may or may not function as a reinforcer depending on the context.[14] In negative reinforcement scenarios, escape involves terminating an already-present aversive stimulus through a response, whereas avoidance prevents the aversive stimulus from occurring in the first place, both serving to strengthen the response.[15] The discriminative stimulus (often denoted as S^D) is an environmental cue that signals the availability of reinforcement for a given response, setting the occasion for the behavior without eliciting it directly.[16]

Misconceptions frequently arise regarding reinforcement's nature. Reinforcement does not inherently imply positivity or pleasure; it solely denotes any process that elevates behavior frequency, encompassing both the addition of desirable stimuli (positive reinforcement) and the subtraction of undesirable ones (negative reinforcement).[17] For instance, buckling a seatbelt to silence a car's alarm exemplifies negative reinforcement by removing an aversive sound, thereby increasing the buckling response.[1] Another error is equating negative reinforcement with punishment, but the former boosts behavior while the latter suppresses it.[15]
Historical Development
Early Influences
The roots of reinforcement theory can be traced to ancient philosophical ideas on associationism, which posited that mental processes arise from the linking of ideas through experience. Aristotle, in his work De Memoria et Reminiscentia (circa 350 BCE), outlined three fundamental laws of association—similarity, contrast, and contiguity—suggesting that recollections are triggered by related ideas encountered in sequence or resemblance, laying early groundwork for understanding how experiences shape behavior.[18] This associationist framework emphasized experiential connections over innate knowledge, influencing later empiricist philosophers who viewed the mind as a blank slate molded by sensory input.

John Locke further advanced these ideas in his Essay Concerning Human Understanding (1690), arguing that all knowledge derives from experience rather than pre-existing ideas, with simple ideas combining into complex ones through association. Locke's empiricism rejected innate principles, proposing instead that repeated associations between sensations and ideas form the basis of learning, a concept that prefigured reinforcement by highlighting how pleasurable or repeated experiences strengthen mental bonds.[19] These philosophical precursors shifted focus from rationalism to observable experiential learning, setting the stage for scientific investigations into behavior modification.

In the late 19th century, Edward Thorndike formalized these notions through empirical animal studies, introducing the Law of Effect in his 1898 dissertation Animal Intelligence. The law stated that behaviors followed by satisfying consequences are more likely to be repeated, while those followed by discomfort are less likely, as connections between stimuli and responses are strengthened or weakened accordingly.[20] Thorndike demonstrated this via puzzle box experiments with cats, where animals learned to escape enclosures through trial-and-error, gradually reducing errors over trials as successful actions—such as pulling a loop to open the door—were reinforced by freedom and food rewards.

Thorndike's work bridged philosophy and experimental psychology, influencing the emergence of behaviorism by prioritizing measurable behaviors over internal mental states. John B. Watson, in his 1913 manifesto "Psychology as the Behaviorist Views It," explicitly drew on Thorndike's emphasis on observable connections, rejecting introspection and advocating for psychology as the science of behavior shaped by environmental contingencies. This transition solidified reinforcement principles as central to understanding learning through external consequences, paving the way for later developments like B.F. Skinner's operant conditioning.[21]
Key Experiments and Theorists
Burrhus Frederic Skinner, a pivotal figure in behaviorist psychology, developed the operant conditioning chamber—commonly known as the Skinner box—in the 1930s as a controlled laboratory apparatus to systematically study how environmental consequences shape voluntary behaviors in animals, such as rats pressing a lever to obtain food pellets.[22] This device isolated the subject from external distractions and allowed precise measurement of response rates, enabling Skinner to demonstrate that behaviors increase in frequency when followed by reinforcers and decrease when followed by punishers.[7] In his foundational 1938 book The Behavior of Organisms: An Experimental Analysis, Skinner formalized operant conditioning as a distinct mechanism from classical conditioning, arguing that reinforcement strengthens stimulus-response connections through repeated consequences rather than reflexive associations.[23]

A landmark experiment by Skinner, detailed in his 1948 paper "'Superstition' in the Pigeon," illustrated the concept of adventitious reinforcement, where unintended correlations between behavior and reward lead to superstitious responses.[24] In the study, hungry pigeons confined to a chamber received food at fixed intervals regardless of their actions; over time, they exhibited idiosyncratic behaviors—such as circling, head-bobbing, or wing-flapping—that coincidentally occurred just before food delivery, which the birds then repeated ritualistically, mimicking human superstitions and highlighting how random reinforcement can sustain maladaptive habits.[25] This work underscored the power of timing in reinforcement schedules, as the pigeons' responses persisted even after the reinforcement contingency was removed.

Clark Hull, another influential theorist, advanced reinforcement principles through his drive-reduction theory outlined in the 1943 book Principles of Behavior: An Introduction to Behavior Theory, which posited that reinforcement primarily functions by reducing biological drives, such as hunger or thirst, thereby motivating learning and habit formation.[26] Hull integrated this with Pavlovian conditioning by framing drives as internal stimuli that amplify the effectiveness of external cues, suggesting that reinforced behaviors satisfy innate needs and create habit strengths proportional to the drive's intensity and reinforcement frequency.[27] His mathematical approach to habit formation influenced subsequent models, though it emphasized physiological underpinnings more than Skinner's environmental focus.[26]

Following World War II, reinforcement theory saw significant expansion into applied domains, including education, clinical therapy, and organizational behavior management, where techniques like token economies and programmed instruction drew directly from Skinner's and Hull's experimental foundations to modify human conduct in real-world settings.[28]
Mechanisms in Learning
Operant Conditioning Basics
Operant conditioning, developed by B.F. Skinner, is a learning process in which voluntary behaviors are modified through their consequences, such as rewards or punishments that increase or decrease the likelihood of the behavior recurring.[9] Unlike reflexive responses, operant behaviors are emitted by the organism without a specific eliciting stimulus, allowing for the shaping of new actions through environmental feedback. Skinner introduced this paradigm in his 1938 book The Behavior of Organisms, emphasizing that behavior operates on the environment to produce outcomes that, in turn, influence future actions.[29]

At the core of operant conditioning is the three-term contingency, also known as the ABC model, which describes the relationship between an antecedent (a stimulus that sets the occasion for behavior), the behavior itself, and the consequence that follows.[1] This framework posits that antecedents signal opportunities for behavior, while consequences determine whether the behavior strengthens or weakens over time.[9] Positive and negative reinforcement serve as key consequence types within this model, increasing behavior probability by adding or removing stimuli, respectively.[13]

The process begins with the acquisition phase, during which a novel behavior is established through initial reinforcement, gradually increasing its frequency as the organism associates the action with favorable outcomes.[13] Once acquired, maintenance occurs via continued reinforcement delivery, sustaining the behavior's strength even as environmental demands vary.[1] Skinner's experiments with animals in controlled chambers demonstrated how consistent consequences could reliably produce these phases, forming the basis for applied behavior analysis.

In contrast to classical conditioning, which pairs stimuli to elicit involuntary responses without unconditioned stimuli beyond initial reflexes, operant conditioning targets self-initiated behaviors shaped proactively by consequences rather than passive associations.[9] This distinction highlights operant conditioning's focus on purposeful actions in complex environments, as Skinner argued that classical methods alone could not explain the full range of learned behaviors.[1]
Positive and Negative Reinforcement
Positive reinforcement involves the presentation of a desirable stimulus following a behavior, which increases the likelihood of that behavior recurring. In B.F. Skinner's foundational experiments, a hungry rat placed in an operant conditioning chamber, known as a Skinner box, would eventually press a lever, resulting in the delivery of a food pellet; over repeated trials, the rate of lever pressing significantly increased as the food acted as the reinforcing stimulus.[7] This process strengthens the association between the behavior and its consequence, enhancing behavioral frequency in future similar situations.[4]

Negative reinforcement, in contrast, entails the removal or termination of an aversive stimulus after a behavior occurs, similarly increasing the probability of that behavior. For instance, in Skinner's setup, a rat subjected to an electric shock on the chamber floor would learn to press the lever to discontinue the shock, leading to a higher rate of lever pressing over time to avoid the discomfort.[7] Although both types of reinforcement bolster behavior through contingency, negative reinforcement is frequently misconstrued as punishment because it involves unpleasant stimuli; however, unlike punishment, it augments rather than suppresses the targeted response.[4]

Empirical studies with rats demonstrate that positive and negative reinforcement produce comparable strengthening effects on behavior. Skinner's analyses showed that the rate of responding under positive reinforcement (e.g., food delivery) and negative reinforcement (e.g., shock termination) followed similar cumulative response curves, indicating equipotent influences on behavioral acquisition and maintenance.[7] In practical applications, positive reinforcement underpins token economies, where individuals earn symbolic tokens (exchangeable for rewards) for desired behaviors, as pioneered by Ayllon and Azrin in therapeutic settings to boost patient engagement and compliance.[30] Negative reinforcement features prominently in escape and avoidance learning paradigms, where rats in shuttle boxes learn to cross to the safe side to evade or end foot shocks, yielding robust response rates akin to those from appetitive reinforcers.[31]
Extinction and Reinforcement Distinctions
Extinction refers to the gradual weakening and eventual cessation of a previously reinforced behavior in operant conditioning when the reinforcing stimulus is no longer provided following the response.[1] This process occurs as the organism discerns that the behavior no longer yields the expected outcome, leading to a decline in its frequency over repeated trials without reinforcement.[32] Early in extinction, an "extinction burst" often emerges, characterized by a temporary surge in the behavior's intensity, duration, or rate, as the subject intensifies efforts to reinstate the reinforcement.[33] Experimental evidence from animal and human studies confirms this burst, demonstrating its occurrence across various response types when transitioning from reinforcement to non-reinforcement conditions.[34]

If extinction persists without reintroduction of reinforcement, the behavior diminishes substantially, though it may exhibit spontaneous recovery—the sudden reemergence of the response after a period of rest or non-exposure to the context.[32] This recovery typically manifests at a reduced level compared to the original acquisition phase and further weakens with additional extinction trials.[35] Spontaneous recovery highlights the temporary nature of extinction rather than permanent erasure of the learned association, a finding rooted in foundational operant experiments.[36] Additionally, the degree of resistance to extinction—the persistence of the behavior during withholding—varies based on prior reinforcement history, with some schedules fostering greater durability.[1]

In contrast to reinforcement, which strengthens behavior, punishment aims to suppress it by associating the response with undesirable consequences.[1] Positive punishment introduces an aversive stimulus, such as an electric shock or reprimand, immediately after the behavior to decrease its occurrence, while negative punishment removes a positive stimulus, like privileges or attention, achieving a similar suppressive effect.[37] These differ from positive reinforcement (adding a desirable stimulus) and negative reinforcement (removing an aversive one), as punishment targets reduction rather than enhancement of the behavior.[1]

Punishment and reinforcement also diverge in long-term outcomes and ethical implications: reinforcement promotes stable, voluntary behavior changes with fewer side effects, whereas punishment often yields only transient suppression, potentially eliciting fear, resentment, or compensatory avoidance behaviors.[37] Studies show punishment can increase aggression or emotional distress in subjects, undermining its efficacy over time compared to reinforcement strategies.[38] Ethically, punishment raises concerns about inflicting harm or coercion, particularly in human applications like education or therapy, where it may violate principles of autonomy and well-being; thus, experts advocate prioritizing reinforcement to foster positive, lasting modifications.[39]
Reinforcement Schedules
Continuous and Intermittent Schedules
In operant conditioning, continuous reinforcement (CRF) involves delivering a reinforcer immediately after every instance of the target behavior, resulting in the most rapid acquisition of new behaviors.[40] This schedule is particularly effective during the initial stages of learning, as the consistent pairing of response and reward strengthens the association quickly, often leading to high response rates in experimental settings with animals, such as pigeons pecking a key in a Skinner box.[41] However, behaviors established under CRF exhibit low resistance to extinction; once reinforcement ceases, the response rate drops sharply, sometimes within minutes, due to the learner's expectation of immediate reward.[2]

Intermittent reinforcement, by contrast, provides a reinforcer only after some, but not all, occurrences of the behavior, fostering greater persistence and resistance to extinction compared to CRF.[41] This schedule mimics real-world contingencies where rewards are unpredictable, leading to sustained responding even during periods without reinforcement, as the learner continues in anticipation of eventual reward. Intermittent schedules are categorized into ratio-based (dependent on the number of responses, such as fixed-ratio or variable-ratio) and interval-based (dependent on time elapsed since the last reinforcement, such as fixed-interval or variable-interval), each producing distinct behavioral patterns but sharing the advantage of durability over continuous methods.[41]

A common application involves starting with CRF to establish a behavior efficiently, then transitioning to intermittent reinforcement for maintenance, as seen in animal training where initial food rewards for every correct action give way to rewards on a partial basis to build long-term reliability.[2] This shift enhances behavioral stability, reducing the risk of rapid decline if rewards become unavailable, and has been foundational in Skinner's experimental analyses of operant behavior.[41]
Ratio and Interval Schedules
In operant conditioning, ratio schedules of reinforcement deliver a reinforcer based on the number of responses emitted by the organism, independent of the time taken to produce those responses. Fixed-ratio (FR) schedules provide reinforcement after a predetermined number of responses, such as every fifth response in an FR-5 schedule, leading to a pattern where responding pauses briefly after reinforcement before resuming at a high rate to meet the next quota.[41] Variable-ratio (VR) schedules, in contrast, reinforce after an unpredictable number of responses that averages around a specified value, such as a VR-5 schedule where the actual requirement might vary between 1 and 9 responses; this unpredictability results in consistently high and steady response rates, as seen in behaviors like gambling on slot machines.[41][2]

Interval schedules, meanwhile, base reinforcement on the passage of time rather than the sheer number of responses, with the reinforcer delivered contingent on at least one response occurring after the time interval elapses. In fixed-interval (FI) schedules, reinforcement follows the first response after a constant time period, such as every 30 seconds in an FI-30s schedule, which typically produces a scalloped pattern of responding: a pause immediately after reinforcement, followed by an accelerating rate as the interval nears completion.[41] Variable-interval (VI) schedules reinforce the first response after an average time interval that varies across trials, like a VI-30s schedule with intervals ranging from 10 to 50 seconds; this generates moderate but steady response rates without pronounced pauses, as the unpredictability discourages timing-based delays.[41][2]

Overall, ratio schedules generally elicit higher and more persistent response rates compared to interval schedules due to their direct tie to output quantity, while interval schedules introduce temporal constraints that shape more variable temporal patterns in behavior.[41]
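These four contingencies can be made concrete in a short simulation. The Python sketch below implements each basic schedule as a decision rule that answers one question per response: is this response reinforced? The function name, state representation, and parameter values are illustrative assumptions, not drawn from the sources above, and the sketch simplifies by assuming one response per fixed time step.

```python
import random

def make_schedule(kind, value):
    """Return a function response() -> bool reporting whether the current
    response is reinforced. `kind` is "FR", "VR", "FI", or "VI"; `value` is
    the ratio requirement (FR/VR) or the interval in seconds (FI/VI)."""
    state = {"count": 0, "next_n": value, "clock": 0.0, "next_t": value}

    def sample():
        # Draw the next variable requirement, roughly centered on `value`.
        return random.uniform(1, 2 * value)

    def response(dt=1.0):
        state["clock"] += dt   # time elapsed since the last reinforcer
        state["count"] += 1    # responses emitted since the last reinforcer
        if kind == "FR" and state["count"] >= value:
            state["count"] = 0
            return True
        if kind == "VR" and state["count"] >= state["next_n"]:
            state["count"], state["next_n"] = 0, sample()
            return True
        if kind == "FI" and state["clock"] >= value:
            state["clock"] = 0.0
            return True
        if kind == "VI" and state["clock"] >= state["next_t"]:
            state["clock"], state["next_t"] = 0.0, sample()
            return True
        return False

    return response

fr5 = make_schedule("FR", 5)
print([fr5() for _ in range(10)])  # True on the 5th and 10th responses
```

Run repeatedly, the VR and VI variants reinforce at unpredictable points while keeping the same average requirement, which is exactly the property the text links to steady responding.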
Effects on Behavior Persistence
Reinforcement schedules significantly influence the persistence of learned behaviors, particularly their resistance to extinction—the process where responding diminishes after reinforcement cessation. The partial reinforcement extinction effect (PREE) demonstrates that behaviors acquired under intermittent reinforcement schedules exhibit greater persistence than those under continuous reinforcement, as organisms continue responding longer in anticipation of unpredictable rewards. This effect was first systematically observed in studies involving conditioned responses, where partial reinforcement during acquisition led to slower extinction rates compared to continuous schedules.

Among intermittent schedules, variable-ratio (VR) schedules produce the highest resistance to extinction, fostering highly persistent behaviors due to the unpredictability of reinforcement, which mimics gambling-like persistence in real-world scenarios such as slot machine play. In contrast, fixed-interval (FI) schedules yield the lowest persistence, as behaviors weaken more rapidly during extinction because the temporal predictability allows quicker adaptation to non-reinforcement.[41] Variable-interval (VI) and fixed-ratio (FR) schedules fall between these extremes, with VI showing moderate persistence similar to everyday habits like checking email. These differences arise from how schedules shape expectation and response patterns during acquisition, directly impacting long-term behavioral stability.[41]

Compound schedules, which integrate multiple basic schedules, further modulate behavior persistence by creating more complex contingencies that can either enhance or complicate extinction resistance. Conjunctive schedules require the simultaneous or combined fulfillment of multiple criteria for reinforcement, such as completing both an FR 10 and an FI 5-minute requirement before delivery; this setup often increases persistence by demanding sustained high-rate responding across integrated demands, making extinction more challenging as the organism must abandon multiple embedded expectancies.[41] For instance, in animal studies, conjunctive schedules have been shown to prolong post-reinforcement pausing less than pure interval schedules while boosting overall resistance to extinction through the ratio component's influence.

Tandem schedules involve the sequential execution of multiple schedules without discriminative stimuli to signal transitions, requiring the organism to complete one (e.g., FR) before accessing the next (e.g., VI) for reinforcement; this promotes persistent, chained responding as internal tracking of progress sustains motivation, often resulting in greater extinction resistance than simple sequential schedules due to the lack of cues that might signal completion.[41] Behaviors under tandem schedules persist longer in extinction because the absence of transition signals prevents abrupt shifts in response patterns, encouraging continued effort across phases.[41]

Superimposed schedules apply multiple reinforcement contingencies to the same response class simultaneously, such as a progressive-ratio requirement overlaid on a basic interval schedule; this complexity can amplify persistence by escalating response demands, leading to behaviors that resist extinction more robustly as the organism adapts to layered unpredictability, though it may also increase variability in long-term stability.[41] In experimental analyses, superimposed schedules have demonstrated enhanced resistance in steady-state responding, particularly when the primary schedule is variable.[41]

Concurrent schedules present two or more independent reinforcement schedules simultaneously, each associated with a different response or alternative, allowing choice behavior where persistence is determined by relative reinforcement rates and immediacy. Organisms allocate responses proportionally to the richer schedule (matching law), resulting in persistent preference for high-yield options even during extinction, as partial reinforcement in the chosen alternative sustains overall behavioral output longer than in isolated schedules.[41] For example, pigeons in concurrent VI setups continue key-pecking the more reinforcing side disproportionately, showing compounded extinction resistance tied to comparative value.
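One classical explanation of the PREE, the discrimination hypothesis, holds that extinction proceeds only as fast as the organism can detect that the contingency has changed. The minimal Python sketch below captures that idea with a deliberately simple rule (keep responding until a run of unreinforced responses clearly exceeds anything experienced during training); the rule, function names, and parameters are hypothetical illustrations, not a published model.

```python
import random

def longest_dry_run(p_reinforce, n_trials=500):
    """Longest run of unreinforced responses experienced during training,
    where each response is reinforced with probability p_reinforce."""
    longest = current = 0
    for _ in range(n_trials):
        if random.random() < p_reinforce:
            current = 0
        else:
            current += 1
            longest = max(longest, current)
    return longest

def extinction_persistence(p_reinforce):
    """Under the discrimination rule, the subject keeps responding in
    extinction until the unreinforced run exceeds anything seen in
    training, so persistence grows as reinforcement gets leaner."""
    return longest_dry_run(p_reinforce) + 1

print(extinction_persistence(1.0))   # CRF: quits almost immediately
print(extinction_persistence(0.25))  # lean VR-like schedule: many more responses
```

Continuous reinforcement makes the switch to extinction instantly discriminable, while a lean intermittent schedule already contains long dry stretches, so extinction is detected, and responding abandoned, much later.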
Advanced Techniques
Shaping and Chaining
Shaping is a technique in operant conditioning that involves the differential reinforcement of successive approximations to a desired target behavior, gradually guiding the organism toward the final response when the behavior does not occur spontaneously.[43] This method, developed by B. F. Skinner, allows for the establishment of novel behaviors by reinforcing behaviors that become increasingly similar to the target, starting from any initial response in the organism's repertoire.[43] For example, in one classic demonstration, Skinner trained a pigeon to peck a disk by first reinforcing any head movement toward the disk, then only movements closer to it, and progressively requiring pecking actions until the target behavior was achieved.[43] Continuous reinforcement schedules are often used initially during shaping to ensure rapid acquisition, transitioning to intermittent schedules as the behavior strengthens.[41]

Chaining extends shaping by linking multiple discrete behaviors into a cohesive sequence, where each component response serves as a discriminative stimulus for the next, forming a behavioral chain that culminates in reinforcement.[41] In forward chaining, training begins with the first behavior in the sequence, reinforcing it until established, then adding and reinforcing the subsequent behaviors one by one until the entire chain is complete.[41] Backward chaining, conversely, starts with the final behavior, which is immediately reinforced, and works retrospectively to teach preceding links, ensuring the learner experiences success at the chain's end early on.[41] Discriminative stimuli, such as cues signaling when a response will be reinforced, are critical in chaining to control the transition between links and maintain the sequence's integrity.[43]

In training applications, variants like errorless learning integrate shaping and chaining to minimize incorrect responses and frustration, particularly for complex discriminations.[44] Developed by Herbert S. Terrace, this approach fades in the discriminative stimuli gradually, starting with highly salient differences between correct and incorrect options to prevent errors, as demonstrated in pigeon experiments where discrimination learning occurred with zero errors in most cases.[44] By avoiding punishment or extinction of errors, errorless procedures enhance efficiency and reduce emotional side effects, making them suitable for building chains in skill acquisition.[44]
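The logic of shaping reduces to a loop: reinforce any response that meets the current criterion, let reinforced variants become more typical, and keep the criterion a step ahead of current performance until the target is reached. The Python sketch below is one hypothetical rendering of that loop under normally varying responses; the update rule and all numbers are invented for illustration and are not a standard quantitative model of shaping.

```python
import random

def shape(target=10.0, step=1.0, max_trials=5000):
    """Sketch of shaping by successive approximations: responses at or above
    the criterion are reinforced, reinforced variants shift typical behavior
    upward, and the criterion stays one step ahead of the learner."""
    criterion = 1.0   # first approximation, easily within the repertoire
    typical = 1.0     # the learner's current typical response magnitude
    for trial in range(max_trials):
        response = random.gauss(typical, 1.0)          # behavior varies trial to trial
        if response >= criterion:                      # differential reinforcement
            typical += 0.2 * (response - typical)      # reinforced variants are repeated
            criterion = min(typical + step, target)    # demand a closer approximation
        if typical >= target:
            return trial                               # target behavior established
    return None

print(shape())  # typically a few hundred trials under these toy settings
```

Keeping the criterion only slightly ahead of typical performance matters: if it is raised too quickly, reinforcement becomes rare and progress stalls, mirroring the practical advice that shaping steps stay within the learner's current variability.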
Primary and Secondary Reinforcers
Primary reinforcers are stimuli that inherently satisfy biological needs and thus strengthen preceding behaviors without requiring learning, demonstrating unlearned effectiveness across species such as rats, pigeons, and humans.[2] Examples include food, which reduces hunger; water, which quenches thirst; and oxygen, which alleviates deprivation, all of which directly impact survival and reproduction.[7] However, their reinforcing power is limited by satiation, where repeated exposure diminishes effectiveness until deprivation recurs, as observed in experimental settings where food reinforcement ceases after consumption meets physiological needs.[2]

In contrast, secondary reinforcers, often termed conditioned reinforcers, acquire their value through associative learning, specifically by being paired with primary reinforcers in operant or classical conditioning paradigms.[43] This process, first systematically explored in operant contexts, allows neutral stimuli to become motivating; for example, a token or chip gains reinforcing properties when consistently exchanged for food in laboratory token economies with animals.[2] Similarly, in human applications, praise or good grades function as secondary reinforcers after repeated association with tangible rewards like affection or privileges, enabling broader behavioral control without direct biological satisfaction.[43]

The derivation of secondary reinforcers involves principles of generalization, where their effectiveness extends to similar stimuli (e.g., various forms of currency reinforcing spending behaviors), and fading, where prolonged absence of primary pairing can weaken their impact over time.[2] This learned quality makes secondary reinforcers versatile tools in behavioral techniques like shaping, where they bridge incremental steps toward complex behaviors more efficiently than primaries alone.[43]
Natural vs. Artificial Reinforcement
Natural reinforcement occurs through ecological contingencies inherent to an organism's environment, where behaviors are strengthened by naturally occurring consequences that promote survival and adaptation. For instance, in foraging scenarios, the successful discovery of food reinforces search and exploration behaviors in animals, as these outcomes directly satisfy biological needs without external intervention.[45] Such processes are shaped by evolutionary adaptations, where repeated reinforcement of adaptive behaviors over generations enhances fitness in natural settings.[46]

In contrast, artificial reinforcement involves contrived contingencies designed by humans in controlled environments, such as laboratories or therapeutic applications, to isolate and manipulate specific variables for study or behavior modification. The Skinner box, developed by B.F. Skinner, exemplifies this approach, where animals like rats press levers to receive food pellets on schedules determined by the experimenter, allowing precise analysis of reinforcement effects independent of natural variability. In applied settings like behavior therapy, artificial reinforcers—such as tokens or praise—are used to shape behaviors that may not yet contact natural consequences.

Artificial reinforcement can effectively mimic natural contingencies to facilitate learning, as seen in operant simulations that replicate ecological foraging patches to study decision-making under variable rewards.[47] However, over-reliance on artificial systems poses pitfalls, including the development of maladaptive behaviors that fail to generalize to natural environments, necessitating a gradual transition to inherent reinforcers to ensure long-term persistence.[48] Primary reinforcers, such as food or water, often align closely with natural reinforcement due to their biological immediacy.
Mathematical and Theoretical Models
Basic Models of Reinforcement
The drive-reduction model, proposed by Clark L. Hull, posits that reinforcement occurs when a behavior reduces an internal drive arising from a biological need, such as hunger or thirst, thereby restoring homeostasis and strengthening the association between the stimulus and response.[49] For instance, eating alleviates the drive of hunger, reinforcing the behavior of seeking food in the presence of relevant cues. Hull formalized this in a hypothetical-deductive framework, where the strength of learned habits (denoted sHr) develops through repeated reinforced trials, and the overall reaction potential (sEr), which determines the likelihood of a response, is given by the product of habit strength and motivational factors:

sEr = sHr × D × K × J × V

Here, D represents drive strength, K captures incentive motivation, J accounts for the delay between response and reinforcement, and V represents stimulus intensity dynamism, with inhibitory terms subtracted in fuller formulations. This model provided a quantitative basis for understanding reinforcement in operant conditioning paradigms, emphasizing physiological mechanisms over subjective experience.[50]

Subsequent incentive models refined Hull's approach by decoupling reinforcement from strict drive reduction, highlighting the independent role of the reinforcer's hedonic or appetitive value. Kenneth W. Spence, building on Hull's framework, argued that behavior is invigorated not solely by reducing internal drives but by the external incentive's ability to elicit anticipatory excitation, shifting focus toward the goal object's attractiveness as a motivator.[51] This perspective better accounted for behaviors driven by novel or non-homeostatic rewards, such as exploratory actions in non-deprived states. A key extension is the matching law, formulated by Richard J. Herrnstein, which predicts that in situations with multiple reinforcement options, organisms distribute their behavior in proportion to the relative rates of reinforcement obtained from each, reflecting efficient allocation based on incentive value rather than absolute drive levels.[52]

Despite their influence, basic models of reinforcement like drive-reduction and incentive theories have been criticized for overemphasizing biological and mechanical processes while underplaying cognitive factors, such as expectancies or representations of outcomes.[53] These critiques spurred transitions to cognitive-behavioral integrations, where reinforcement is viewed through lenses of information processing and goal-directed agency, though the foundational models remain seminal for explaining core motivational dynamics.[54]
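Because Hull's factors multiply rather than add, the formula makes a sharp prediction: if any factor is zero, reaction potential is zero regardless of how strong the others are. A tiny worked example (all numeric values hypothetical):

```python
def reaction_potential(habit, drive, incentive, delay, intensity):
    """Hull's multiplicative reaction potential, sEr = sHr * D * K * J * V
    (inhibitory terms from fuller formulations omitted). Multiplication
    means a zero on any factor predicts no responding at all."""
    return habit * drive * incentive * delay * intensity

# A well-learned habit with no drive produces no reaction potential:
print(reaction_potential(habit=0.9, drive=0.0, incentive=0.8,
                         delay=0.7, intensity=1.0))  # 0.0
```

This multiplicativity is what lets the model explain why a satiated animal with a strong lever-pressing habit may not press at all: habit strength is intact, but drive is absent.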
Quantitative Approaches
The Rescorla-Wagner model provides a foundational quantitative framework for understanding associative learning in classical conditioning, which has influenced broader models of reinforcement learning. It posits that learning occurs through the adjustment of predictive value based on prediction errors, formalized by the delta rule, which gives the change in associative strength for a stimulus as

ΔV = α(λ − V)

where V is the current associative strength (the predicted value of the unconditioned stimulus, or US), λ is the maximum associative strength the US can support on a given trial, and α is the learning rate parameter reflecting the salience of the stimulus and US.[55] During acquisition, when a conditioned stimulus (CS) is paired with reinforcement (US present, λ > 0), V incrementally approaches λ, simulating the buildup of conditioned responding. In extinction, the absence of reinforcement sets λ = 0, causing V to decay toward zero, which models the decline in responding over unreinforced trials. The model effectively predicts phenomena such as blocking and overshadowing by limiting total associative change across stimuli, with the sum of V values constrained by a total capacity parameter.[55]

The matching law, developed by Herrnstein, quantifies how organisms allocate behavior across multiple response options in proportion to the reinforcement rates available from each. In its basic form, it states that the proportion of responses emitted to one of two alternatives equals the proportion of reinforcements obtained from it:

R₁ / (R₁ + R₂) = r₁ / (r₁ + r₂)

where R₁ and R₂ are the response rates to alternatives 1 and 2, and r₁ and r₂ are the corresponding reinforcement rates. This relation was empirically derived from experiments with pigeons on concurrent variable-interval schedules, where response allocation closely matched reinforcement proportions across a wide range of conditions. Deviations from strict matching often arise due to factors like response effort or reinforcer type, leading to response bias (a constant multiplier b) in the generalized matching law:

R₁ / R₂ = b (r₁ / r₂)^s

where s is a sensitivity parameter (ideally 1 for perfect matching, but typically less than 1, indicating undermatching). Generalizations extend the law to multi-alternative choices and include absolute response levels via additional parameters, such as a baseline response rate, enabling predictions of behavior in complex environments like human choice scenarios.

Optimal foraging theory incorporates reinforcement principles into decision-making models for resource acquisition, with the marginal value theorem specifying conditions for leaving depleting resource patches. The theorem predicts that a forager should depart a patch when the instantaneous marginal rate of energy gain equals the expected overall rate of gain in the environment, formalized as the leaving time t* that solves

R′(t*) = R(t*) / (t* + τ)

where R(t) is the cumulative gain function in the patch after time t, R′(t) is its derivative (the marginal rate), and τ is the travel time between patches.[56] This optimal leaving time t* maximizes the net rate of energy intake across the foraging bout, assuming patches deplete over time and travel costs are fixed. Applications demonstrate its utility in predicting patch residence times in diverse species, such as birds and insects, where empirical data align with model predictions under varying resource distributions and handling times. Generalizations account for patch variability and predation risks by adjusting the equality threshold.[56]
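The delta rule lends itself to direct simulation. The short Python sketch below (parameter values illustrative) traces associative strength V through acquisition and extinction, reproducing the incremental approach to λ and the subsequent decay described above:

```python
def rescorla_wagner(trials, alpha=0.3):
    """Trial-by-trial associative strength under the delta rule
    dV = alpha * (lam - V); lam = 1 on reinforced trials, 0 in extinction."""
    V, history = 0.0, []
    for reinforced in trials:
        lam = 1.0 if reinforced else 0.0
        V += alpha * (lam - V)   # prediction-error update
        history.append(V)
    return history

# 20 acquisition trials followed by 20 extinction trials:
curve = rescorla_wagner([True] * 20 + [False] * 20)
print(round(curve[19], 3))  # ~0.999: near asymptote after acquisition
print(round(curve[39], 3))  # ~0.001: decayed toward zero in extinction
```

The same update, applied to the summed strength of all stimuli present on a trial, is what yields the model's blocking prediction: a pretrained CS leaves little prediction error for a newly added CS to absorb.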
Applications
In Animal and Human Training
In animal training, clicker training employs a distinct clicking sound as a secondary reinforcer to precisely mark desired behaviors, bridging the gap between the action and the delivery of primary rewards like food, thereby facilitating faster learning through operant conditioning.[57] This method has been widely adopted for its non-invasive nature and effectiveness in shaping complex behaviors without physical coercion.[58]

Applications of positive reinforcement extend to zoos, where it enables animals to voluntarily participate in husbandry and veterinary procedures, reducing stress and improving welfare outcomes. For instance, training group-housed sooty mangabeys to shift enclosures achieved over 90% compliance, saving significant time in daily care while minimizing distress.[59] Similarly, service animals, such as guide dogs, are trained using positive reinforcement to perform tasks like alerting to medical needs, enhancing reliability and reducing handler stress through reward-based motivation.[60]

Evidence from dolphin programs underscores the motivational impact of positive reinforcement, with bottlenose dolphins exhibiting heightened anticipatory behaviors—such as increased surface looking—prior to human-animal interactions signaled by conditioned cues, correlating with greater voluntary participation rates (β = 0.274, P = 0.008).[61] These programs demonstrate how reinforcement fosters cooperation in aquatic environments, supporting conservation efforts and cognitive enrichment.

In human applications, positive reinforcement aids skill acquisition in sports by building athlete confidence and motivation; coaches use verbal praise and rewards to reinforce technique mastery, leading to improved performance and reduced anxiety.[62] In therapy settings, it promotes behavioral changes through techniques like those in applied behavior analysis, where rewards immediately follow target skills to encourage repetition and generalization.[63]

Parent management training incorporates reinforcement principles to address child behavior, emphasizing consistent rewards for positive actions alongside negative punishment strategies, such as time-outs, which involve brief removal of attention to decrease undesired conduct without physical harm.[64] Techniques like shaping and chaining are often integrated to build complex skills incrementally.

Meta-analyses indicate that positive reinforcement outperforms punishment for achieving long-term compliance in children; for example, praise increases child compliance, while physical punishment shows no sustained benefits and may exacerbate issues.[65][66] This superiority holds across contexts, promoting enduring behavioral persistence over temporary suppression.[67]
In Addiction and Dependence
In addiction, drugs often function as positive reinforcers by producing euphoria or enhanced pleasure, which strengthens the behavior of drug-seeking and consumption through associative learning mechanisms.[68] For instance, initial exposure to substances like cocaine or alcohol activates reward pathways in the brain, leading to repeated use to recapture these rewarding effects.[69] Conversely, negative reinforcement plays a key role as individuals use drugs to alleviate withdrawal symptoms or reduce stress and anxiety, thereby maintaining dependence by removing aversive states.[70] This dual reinforcement dynamic escalates drug use, transitioning from recreational patterns to compulsive behavior.

Tolerance develops as repeated drug exposure diminishes the euphoric effects, requiring higher doses to achieve the same reinforcement, while sensitization heightens the motivational salience of drug cues, amplifying craving without necessarily increasing the drug's direct rewarding impact.[71] According to the incentive-sensitization theory, this sensitization primarily affects mesolimbic dopamine systems, making environmental cues more potent motivators for drug-seeking over time. In dependence models, variable reinforcement schedules, akin to those in gambling, promote persistent behavior through unpredictable rewards, leading to "chasing losses" where individuals continue use despite negative consequences to pursue intermittent highs.[72] This mechanism extends to interpersonal relationships, where intermittent reinforcement through sporadic rewards such as giggles, encouragement, or thumbs-up maintains engagement without full commitment, fostering behavioral persistence analogous to addictive dependence.[73] Cue-induced relapse is facilitated by secondary reinforcers, where previously neutral stimuli (e.g., drug paraphernalia) acquire reinforcing properties through conditioning, triggering intense cravings and resumption of use even after periods of abstinence.[74]

Interventions like contingency management leverage reinforcement principles by providing tangible rewards, such as vouchers exchangeable for goods, contingent on verified abstinence, effectively countering addictive patterns.[75] In opioid use disorder studies, voucher-based programs have demonstrated significant reductions in illicit opioid use and prolonged abstinence durations compared to standard care, with meta-analyses confirming efficacy across substance types.[76] These approaches mimic controlled positive reinforcement to promote recovery, though sustained effects require ongoing implementation to prevent relapse.[77]
In Economics and Decision-Making
In behavioral economics, reinforcement principles derived from operant conditioning explain how economic choices are shaped by the consequences of prior actions, emphasizing the role of rewards in strengthening preferred behaviors over time. This approach integrates psychological mechanisms with traditional economic models to account for deviations from rational utility maximization, such as suboptimal allocation of resources in response to variable payoffs. Seminal work highlights how positive reinforcement increases the likelihood of repeating value-seeking behaviors, while negative reinforcement or punishment discourages alternatives.[78]

A prominent application involves delay discounting, where individuals systematically prefer smaller immediate reinforcers to larger delayed ones, reflecting the higher reinforcing potency of immediacy. This preference, often characterized by hyperbolic discounting functions, influences intertemporal choices like saving versus spending, as immediate rewards provide quicker feedback that reinforces impulsive decisions. In prospect theory, reinforcement from outcomes further modulates risk preferences: gains act as positive reinforcers promoting conservative choices in the domain of gains, while losses serve as potent negative reinforcers, amplifying risk-seeking to avoid further deprivation and explaining phenomena like loss aversion.[79][80]

In decision-making contexts, melioration describes the dynamic process by which agents adjust behavior toward options yielding higher local rates of reinforcement, frequently leading to over-matching where choices disproportionately favor richer alternatives despite long-term costs. This principle, rooted in operant theory, applies to economic scenarios like labor-leisure trade-offs or investment portfolios, where short-term reinforcements drive suboptimal global outcomes. The matching law quantifies this by predicting that the ratio of time or effort allocated to options matches the ratio of obtained reinforcements, providing a behavioral foundation for analyzing consumer demand elasticity and resource distribution.[81][82]

These concepts extend to practical applications in consumer behavior, where marketing strategies employ variable-ratio reinforcement schedules—similar to slot machines—to sustain engagement and purchases by unpredictably delivering rewards like discounts or loyalty points. In policy design, nudges harness reinforcement contingencies by restructuring choice environments to make beneficial options more salient and immediately rewarding, such as default enrollment in retirement savings plans that reinforce saving through automatic gains. Evidence from laboratory games, including simulated investment tasks, shows that repeated positive reinforcement from risky successes escalates subsequent risk-taking, as participants over-weight recent wins in line with operant strengthening. Economic adaptations of quantitative reinforcement models, like the generalized matching law, further refine predictions of these behaviors by incorporating sensitivity to reinforcement rates.[83][84]
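Delay discounting is commonly modeled with the hyperbolic form V = A / (1 + kD), where A is the reward amount, D the delay, and k an individual discounting parameter. The brief Python sketch below (values hypothetical) shows how this form lets a smaller-sooner reward outweigh a larger-later one:

```python
def hyperbolic_value(amount, delay, k=0.1):
    """Hyperbolic discounting, V = A / (1 + k * D): present value falls
    steeply at short delays and more gradually thereafter. k indexes
    impulsivity; the amounts, delays, and k here are illustrative."""
    return amount / (1 + k * delay)

# A smaller-sooner reward can outweigh a larger-later one:
print(hyperbolic_value(50, delay=0))    # 50.0 -> $50 now
print(hyperbolic_value(100, delay=30))  # 25.0 -> $100 in 30 days, discounted
```

The hyperbolic shape, unlike exponential discounting, also predicts preference reversals: when both rewards are far away the larger one is preferred, but as the smaller one becomes imminent its value overtakes the larger delayed option.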
In Education and Child Behavior
In educational and parenting contexts, reinforcement strategies play a central role in shaping child behavior and promoting learning by increasing the likelihood of desired actions through positive consequences. These approaches draw from behavioral principles to foster self-regulation, academic engagement, and social skills, often emphasizing immediate and consistent rewards to build long-term habits.[85]

Parent-Child Interaction Therapy (PCIT) is an evidence-based intervention for children aged 2 to 7 with disruptive behaviors, where parents are coached in real-time to use positive reinforcement during interactions. In the Child-Directed Interaction phase, caregivers apply PRIDE skills—Praise for appropriate behavior, Reflect the child's statements, Imitate play, Describe actions, and show Enthusiasm—to strengthen the parent-child bond and reduce noncompliance. Studies demonstrate PCIT's effectiveness, with treated children showing significant decreases in disruptive behaviors and improvements in parental discipline skills post-intervention.[85]

Praise serves as a secondary reinforcer in child behavior management, acquiring its reinforcing value through repeated pairing with primary rewards like attention or tangible items, thereby motivating compliance without material costs. Experimental pairings of praise with preferred stimuli have established it as a conditioned reinforcer, increasing task completion and reducing problem behaviors in young children. For instance, behavior-specific praise, such as "Great job sharing your toy," reinforces prosocial actions more effectively than general approval.[86]

Token economy systems in classrooms extend this by providing symbolic reinforcers—such as points, stickers, or tickets—that children exchange for privileges or items, systematically increasing on-task behavior and academic participation. These systems operate on positive reinforcement principles, where tokens are delivered immediately after target behaviors like completing assignments, leading to sustained improvements in classroom conduct. Systematic reviews confirm their utility in reducing disruptions and boosting engagement, particularly when combined with clear rules and varied backups.[87]

In educational settings, mastery learning incorporates immediate feedback as a reinforcement mechanism to ensure students achieve proficiency before advancing, allowing corrective instruction based on formative assessments. Developed by Benjamin Bloom, this model provides targeted reinforcement through retries and praise for progress, resulting in effect sizes of 0.59 on academic outcomes and reduced variability in achievement.[88]

Differential reinforcement of alternative behaviors (DRA) targets skill-building in education by withholding reinforcement for undesired actions while rewarding incompatible, appropriate alternatives, such as praising quiet participation over outbursts. This technique promotes functional replacements, like using words to request help instead of tantrums, and has been shown to decrease problem behaviors in school environments through consistent application.[89]

Empirical evidence highlights reinforcement's impact on children with attention-deficit/hyperactivity disorder (ADHD), where classroom interventions like token systems and praise reduce off-task and disruptive behaviors by up to 50% compared to controls. These strategies enhance focus and compliance without medication, though effects vary by implementation fidelity.[90]

Cultural variations influence praise as reinforcement; for example, American parents more frequently praise independence and achievement, while Arab and Jewish groups emphasize compliance and family harmony, affecting child motivation differently across contexts. In East Asian classrooms, such as in China, teachers use praise and rewards more extensively to build positive relationships, contrasting with Western emphases on individual effort.[91][92]
Contemporary Extensions
Reinforcement in Neuroscience
In neuroscience, reinforcement is fundamentally linked to the mesolimbic dopamine system, where dopamine neurons in the ventral tegmental area (VTA) project to key targets like the nucleus accumbens, encoding signals that drive learning and motivation.[93]

Dopamine release in this pathway serves as a reward prediction error (RPE) signal, representing the discrepancy between expected and actual rewards, which updates value representations to reinforce adaptive behaviors. The nucleus accumbens, a primary recipient of these projections, plays a central role in valuation by integrating sensory and motivational inputs to assign subjective worth to stimuli and actions, thereby facilitating reinforcement-driven choices.[94]

Dopaminergic processes distinguish between phasic and tonic release modes, each contributing uniquely to reinforcement dynamics. Phasic dopamine bursts, occurring in short pulses, primarily signal unexpected rewards or errors, promoting rapid synaptic plasticity and associative learning in downstream circuits.[95] In contrast, tonic dopamine maintains baseline levels, modulating overall arousal, motivation, and the threshold for phasic responses without directly encoding errors.[96] These signals integrate with the prefrontal cortex (PFC), where dopamine modulates executive functions; for instance, D1 and D2 receptors in the PFC regulate decision-making by balancing exploration and exploitation during reinforced tasks, enabling context-dependent action selection.[97]

Optogenetic studies since 2020 have provided causal evidence for dopamine's role in encoding reinforcement, revealing how targeted VTA stimulation drives associative learning and incentive value assignment in rodents.[98] For example, optogenetic activation of dopamine neurons during cue-reward pairings strengthens behavioral preferences, underscoring their sufficiency for reinforcement without external rewards.[99] Studies from 2023-2025 have further advanced this work, including optogenetic manipulations suggesting that dopamine conveys teaching signals of the kind used to train deep networks during decision-making in mice, and stimulation that rescued reinforcement deficits in Alzheimer's models.[100][101] These findings have implications for disorders like Parkinson's disease, where dopamine depletion disrupts RPE signaling, impairing reinforcement learning and contributing to motor and cognitive deficits; therapeutic dopamine restoration partially ameliorates these effects by reinstating value-based decision-making.[102]
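Computationally, this RPE corresponds to the prediction error used in reinforcement learning models of dopamine. The following minimal sketch (the learning rate and reward magnitude are illustrative assumptions) shows the error shrinking as a cue comes to predict its reward, mirroring the well-documented decline of phasic dopamine responses to fully predicted rewards.

```python
# Simplified prediction-error learning: the RPE (delta) is the gap between
# received and predicted reward, and it drives the value update.
# The learning rate and reward magnitude are illustrative assumptions.
alpha = 0.3   # learning rate (assumed)
reward = 1.0  # reward that reliably follows the cue (assumed)
V = 0.0       # learned value of the cue

for trial in range(1, 6):
    rpe = reward - V   # prediction error, analogous to a phasic dopamine burst
    V += alpha * rpe   # the cue's value moves toward the actual reward
    print(f"trial {trial}: RPE = {rpe:.3f}, V = {V:.3f}")

# The RPE shrinks across trials (1.000, 0.700, 0.490, ...) as the cue comes
# to predict the reward, so the phasic teaching signal fades.
```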
Reinforcement Learning in AI
Reinforcement learning (RL) in artificial intelligence involves an agent interacting with an environment to learn optimal behaviors through trial and error, receiving rewards or penalties as it seeks to maximize long-term cumulative reward. This paradigm draws inspiration from behavioral models of reinforcement, adapting them to computational frameworks in which the agent observes states, selects actions, and updates its policy based on outcomes. Central to RL is the Markov decision process (MDP), which formalizes the environment as a tuple of states, actions, transition probabilities, rewards, and a discount factor, enabling the agent to make sequential decisions under uncertainty.[103]

A foundational algorithm in RL is Q-learning, a model-free, off-policy method that estimates the value of state-action pairs to derive an optimal policy. In Q-learning, the agent maintains a Q-function Q(s, a), representing the expected future reward for taking action a in state s. The update rule is:

Q(s, a) ← Q(s, a) + α[r + γ max_{a′} Q(s′, a′) − Q(s, a)]

where α is the learning rate, r is the immediate reward, γ is the discount factor, and s′ is the next state. This temporal-difference update converges to the optimal Q-values under suitable conditions, allowing the agent to select actions greedily via argmax_a Q(s, a) (see the sketch below). Introduced by Christopher Watkins in 1989, with a convergence proof published in 1992, Q-learning has influenced numerous extensions, including deep Q-networks, which combine it with neural networks for high-dimensional state spaces.[104]

In robotics, RL excels in pathfinding tasks, where agents learn to navigate dynamic environments while avoiding obstacles. For instance, deep RL agents trained on simulated maps use range-sensor inputs to optimize trajectories, achieving collision-free paths in real-world mobile robots by balancing exploration and exploitation through reward shaping. In gaming, RL has produced landmark achievements, such as AlphaGo, which defeated world champions in Go using policy gradient methods to refine move probabilities via self-play and Monte Carlo tree search, integrating value and policy networks for evaluation and selection. These policy gradients, derived from the REINFORCE algorithm, enable gradient ascent on expected rewards, scaling to complex combinatorial games. Multi-agent RL extends single-agent methods to cooperative or competitive scenarios, where multiple agents learn joint policies; for example, in traffic simulation, agents coordinate to minimize congestion, addressing non-stationarity through centralized training with decentralized execution.[105][106][107]

Advancements in the 2020s have emphasized model-based RL to enhance sample efficiency: agents learn an explicit dynamics model of the environment to simulate trajectories and plan ahead, reducing reliance on real-world interactions. Techniques like MuZero, building on AlphaGo, integrate model learning with model-free updates to achieve superhuman performance in Atari games and board games without prior knowledge of the rules. These methods have improved planning in resource-constrained settings, such as robotics, by generating synthetic data for policy optimization. By 2024-2025, RL had transformed generative AI, with reinforcement learning from human feedback (RLHF) enabling large language models (LLMs) to align with user preferences, as seen in models like DeepSeek's January 2025 release rivaling ChatGPT in reasoning and task execution.
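To make the Q-learning update rule above concrete, here is a minimal tabular sketch; the toy corridor environment, hyperparameter values, and episode count are illustrative assumptions, not drawn from the cited literature.

```python
import random

# Minimal tabular Q-learning on a toy corridor: states 0..4, actions
# 0 = left, 1 = right. Reaching state 4 yields reward 1 and ends the
# episode. All values here are illustrative assumptions.
N_STATES, ACTIONS = 5, (0, 1)
alpha, gamma, epsilon = 0.1, 0.9, 0.1
Q = [[0.0, 0.0] for _ in range(N_STATES)]

def step(s, a):
    """Deterministic transition; returns (next state, reward, done)."""
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0), s2 == N_STATES - 1

for episode in range(500):
    s, done = 0, False
    while not done:
        # Epsilon-greedy selection (ties broken randomly) balances
        # exploration and exploitation.
        if random.random() < epsilon or Q[s][0] == Q[s][1]:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[s][x])
        s2, r, done = step(s, a)
        target = r if done else r + gamma * max(Q[s2])
        Q[s][a] += alpha * (target - Q[s][a])  # the update rule above
        s = s2

# After training, the greedy policy moves right (action 1) in every
# non-terminal state.
print([max(ACTIONS, key=lambda a: Q[s][a]) for s in range(N_STATES - 1)])
```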
Applications have expanded to personalized healthcare, where RL algorithms help optimize treatments such as chemotherapy, and to supply chain optimization, with the RL industry valued at over $122 billion as of 2025.[108][109][110][111][112][113][114]

However, ethical concerns arise in deploying RL in autonomous systems, including unintended reward hacking, where agents exploit loopholes in their reward specification; bias amplification from training data; and accountability gaps in safety-critical decisions like those of self-driving vehicles. Frameworks for ethical RL advocate incorporating human values through constrained optimization and transparency audits to mitigate risks in real-world applications.
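As a complement to the value-based example above, the REINFORCE policy-gradient method mentioned earlier can be sketched just as compactly. In this illustrative example, a softmax policy over a hypothetical two-armed bandit is improved by gradient ascent on expected reward; the arm payoff probabilities and learning rate are assumptions, not from the cited sources.

```python
import math
import random

# Minimal REINFORCE sketch: gradient ascent on expected reward with a
# softmax policy over a two-armed bandit. Payoff probabilities and the
# learning rate are illustrative assumptions.
prefs = [0.0, 0.0]    # action preferences (the policy parameters)
payoff = [0.3, 0.7]   # P(reward = 1) for each arm (assumed)
lr = 0.1

def softmax(p):
    e = [math.exp(x) for x in p]
    return [x / sum(e) for x in e]

for _ in range(2000):
    probs = softmax(prefs)
    a = random.choices((0, 1), weights=probs)[0]
    r = 1.0 if random.random() < payoff[a] else 0.0
    # REINFORCE update: prefs += lr * r * grad log pi(a), where for a
    # softmax policy, grad log pi(a)[k] = (1 if k == a else 0) - probs[k].
    for k in range(2):
        prefs[k] += lr * r * ((1.0 if k == a else 0.0) - probs[k])

print(softmax(prefs))  # the policy should now strongly favor arm 1
```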
Criticisms and Limitations
Reinforcement theory has been criticized for its overemphasis on external consequences as the primary drivers of behavior, often overlooking the role of internal cognitive processes. Edward C. Tolman's experiments on latent learning demonstrated that rats could form cognitive maps of mazes without immediate reinforcement, suggesting that learning occurs independently of rewards and challenging the stimulus-response reinforcement paradigm central to the theory.[115] This critique highlights how the theory reduces complex behaviors to mechanistic responses, ignoring latent cognitive structures that guide actions in the absence of overt rewards.[116]

Further theoretical limitations arise from the reductionist view of motivation, which simplifies human drives to external reinforcements while neglecting multifaceted internal factors such as emotions, beliefs, and social contexts. Noam Chomsky's analysis of B.F. Skinner's Verbal Behavior argued that applying reinforcement principles to language acquisition fails to account for the innate, creative aspects of human cognition, rendering the approach overly simplistic for explaining generative behaviors. Such reductionism limits the theory's applicability to scenarios involving intrinsic motivations or non-reward-based learning, where behaviors persist despite the absence of external incentives.

Ethical concerns surrounding reinforcement theory center on its potential for manipulation in practical applications, where controlling consequences can undermine individual autonomy. Richard A. Winett and Richard C. Winkler examined classroom behavior modification programs, finding that reinforcement techniques were frequently used to enforce docility and compliance, raising issues of coercive control over students' natural expressions.[117] Additionally, the overjustification effect illustrates how extrinsic reinforcers can erode intrinsic motivation; Edward L. Deci's studies showed that rewarding previously enjoyable tasks led to decreased interest once rewards were removed, potentially fostering dependency on external controls.

Modern criticisms extend to cultural biases embedded in reinforcement research, which predominantly draws from Western, individualistic contexts and may not generalize across diverse societies. Studies on behavior modification with culturally different students reveal that reinforcement strategies often clash with collectivist values, where group harmony and relational dynamics take precedence over individual reward systems, leading to ineffective or insensitive interventions.[118] Integration with neuroscience also poses challenges, as emerging evidence indicates bidirectional influences between reinforcement processes and brain mechanisms, complicating the theory's unidirectional focus on environmental contingencies. Yael Niv's review notes that while dopamine signals align with reinforcement learning predictions, cognitive and affective factors reciprocally modulate these pathways, requiring a more holistic model beyond classical reinforcement principles.[119]