3
How Animals Learn


Haleh Amanieh and Nicole R. Dorey


3.1 Introduction: What Is Learning?


Working with animals daily gives us a great advantage: getting to know their behavior. We get to know their likes, their dislikes, and how they tend to act in certain circumstances. Knowledge about an animal’s motivations to engage in or avoid certain behaviors, and about the consequences that follow those behaviors, allows us to understand the animal deeply. The first step toward understanding animal behavior is understanding behavior in general. Behavior is anything an organism does as it interacts with its environment. From playing catch to reacting to human emotional states, animals exhibit a wide variety of behaviors, all of which can be analyzed.


It is apparent that each animal has a unique set of highly probable behaviors. Some dogs jump when someone new walks in, while others might hide. These differences are due to their individual experiences, also known as their learning history. Learning occurs when an animal’s behavior changes as a result of its experiences. Learning is much more than just the formal acquisition of a new behavior. As long as an animal is experiencing its environment, it is learning. Animals are learning all the time, meaning that their behavior is constantly changing, even if just a little bit. Knowing how easily and often behavior can change raises the question: How do animals learn? Answers to that question can help us effectively teach new behaviors or address behaviors that can be problematic.


Learning can be broken up into two categories: associative and non‐associative learning. As its name suggests, associative learning takes place when two or more events become associated with, or related to, each other. The paired events may be two environmental stimuli, such as the sound of a can opener and the smell of food, or a behavior and a consequence, such as pawing a food bowl and the addition of more food. Non‐associative learning, on the other hand, does not involve a relationship between two events. This type of learning takes place with repeated exposure to a stimulus that is unrelated to any other stimulus. Depending on the salience and timing of the stimulus, this exposure might cause the animal to pay less or more attention, exhibiting habituation or sensitization, respectively.


3.2 Non‐associative Learning


Habituation is a type of non‐associative learning in which an animal stops or reduces its response to a stimulus after repeated exposure to that stimulus. Consider Brutus, a terrier who barks at the sound of a lawn mower buzzing outside. As soon as the lawn mower turns on, Brutus barks wildly at the new sound. However, after a few minutes of the lawn mower continuously buzzing, Brutus calms down. In this example, the lawn mower buzz is the stimulus that elicits the response of barking. The response eventually stops even though the stimulus is still present in the animal’s environment. Brutus habituates to the buzzing. Essentially, he gets used to it. Habituation to the sound occurred without any other stimulus present in Brutus’s environment. The process of habituation is used widely to reduce animals’ fear response to harmless stimuli.


Sensitization is the opposite of habituation in that repeated exposure to a stimulus increases an animal’s response to the stimulus. As a new dog owner, Ruth had no idea that dogs can be so deathly terrified of fireworks. She naively took her Lhasa apso, Scruffy, to see fireworks to celebrate the New Year. When the fireworks started, Scruffy started to nervously pace around and pant heavily. Even though Ruth tried to calm her down by petting her and holding her close, it was no use. As the fireworks continued, she became increasingly nervous. After just a few minutes, Scruffy somehow got out of her collar and ran away into the crowd (Ruth found Scruffy shortly after, of course). Ruth expected Scruffy to habituate to the sound, but instead, she became sensitized to it. Scruffy’s growing sensitivity to the noise did not depend on any other stimulus being present. Her response to the stimulus became more intense as the stimulus continued to be present in her environment.


3.3 Associative Learning


3.3.1 Respondent Conditioning


One way that associative learning takes place is when a stimulus gets paired with another stimulus, a process called classical or respondent conditioning. Pavlov famously demonstrated this process in the early 1900s. While researching the physiology of digestion in dogs, Pavlov observed that dogs salivated in the presence of food. This was no surprise because Pavlov knew that salivation was a reflex elicited by the presence of food. However, Pavlov was puzzled when he noticed that the dogs began to salivate in the presence of the technician who normally fed the dogs. Pavlov began an experiment based on his serendipitous findings to uncover the process that he informally observed. In his experiment, he presented the sound of a metronome (commonly misreported as having been a “bell”) right before giving the dogs food. By itself, the metronome did not elicit salivation. However, after several pairings of the metronome followed by food, the metronome became associated with food and elicited salivation by itself!


The process of classical conditioning can be easily understood if we divide it into three phases: before conditioning, during conditioning, and after conditioning. Before conditioning, a stimulus automatically elicits an unlearned behavior (i.e., produces an involuntary response). This is the unconditioned stimulus because it automatically triggers a response. For similar reasons, the response that is naturally triggered by the stimulus is the unconditioned response. The food in Pavlov’s experiment served as the unconditioned stimulus and the salivation as the unconditioned response. Initially, the metronome was a neutral stimulus because it did not produce a response (yet!).


During conditioning, the neutral stimulus and the unconditioned stimulus are repeatedly presented together. Classical conditioning takes place most effectively when the neutral stimulus precedes the unconditioned stimulus. The association develops less effectively if the neutral stimulus occurs after or during the unconditioned stimulus, or if there is a long period of time between their presentations. During this phase, the metronome and food were repeatedly paired. This is the phase in which associative learning takes place; the metronome and food become related and the animal learns to salivate in the presence of the metronome.


After conditioning, the neutral stimulus becomes a conditioned stimulus and can reliably elicit the response by itself. When a conditioned stimulus elicits the response, the response is called a conditioned response. The conditioned response and unconditioned response are the same response; the difference is in what stimulus caused the response to happen. The associative learning is demonstrated in this phase when the metronome can produce salivation by itself. The metronome is now a conditioned stimulus and salivation produced by the metronome is the conditioned response. This example is a common one, but it can be hard to translate processes discovered in a laboratory to the real world. Instead, let’s look at an example you may have witnessed yourself.
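If it helps to see the three phases as explicit steps, the short Python sketch below walks through them using Pavlov’s metronome example. It is a deliberately simplified illustration, not a model of real conditioning: the class name, the idea that a fixed number of pairings is sufficient, and the threshold of five pairings are all assumptions invented for this sketch.

```python
# A deliberately simplified sketch of the three phases of respondent conditioning.
# The pairing threshold is invented for illustration; real conditioning depends on
# many factors, including timing, salience, and the number of trials.

class ToyPavlovDog:
    def __init__(self, pairings_needed=5):
        self.pairings = 0                     # how many metronome->food pairings so far
        self.pairings_needed = pairings_needed

    def present(self, metronome=False, food=False):
        """Return True if the dog salivates on this trial."""
        if metronome and food:
            self.pairings += 1                # during conditioning: NS paired with US
        conditioned = self.pairings >= self.pairings_needed
        # Food always elicits salivation (unconditioned response); the metronome
        # elicits it only after conditioning (conditioned response).
        return food or (metronome and conditioned)

dog = ToyPavlovDog()
print(dog.present(metronome=True))            # before conditioning: no salivation
for _ in range(5):                            # during conditioning: metronome + food
    dog.present(metronome=True, food=True)
print(dog.present(metronome=True))            # after conditioning: salivation to the metronome alone
```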


When a caregiver enters the kennel area to feed, from a dog’s perspective the person makes a lot of noise, and these sounds are distinct. An animal that is naive to the shelter environment might not notice these sounds. Thus, the sound of the first run door opening is a neutral stimulus: one that elicits no response and thus has no meaning. Food is the unconditioned stimulus. It requires no conditioning to elicit a response (in this case, salivation). The animal’s salivation is the unconditioned response, because if caregivers present the animal with food, it will salivate automatically (without training). After multiple pairings of the first door opening with the daily feeding, the once‐neutral stimulus (the sound of the door opening) is now the conditioned stimulus and causes a conditioned response (salivating at the sound of the run door opening). See another example in Box 3.1 and try to label the neutral stimulus, unconditioned stimulus, unconditioned response, conditioned response, and the conditioned stimulus yourself.


3.3.2 Operant Conditioning


A second form of associative learning occurs when a behavior is paired with a consequence, a process called operant conditioning. Though B. F. Skinner originated the term operant conditioning (also known as instrumental conditioning), his approach to studying animal behavior was largely based on the work of Edward L. Thorndike. As a graduate student, Edward Thorndike studied how success and failure affect behavior (i.e., trial and error learning) by putting cats (among other species) inside a “puzzle box.” The cats had an incentive to leave the box: they were hungry, and there was food outside of the box that entrapped them. The box could be opened from the inside, but only if the cat pressed a lever, pulled a string, and lifted a latch. Naturally, a cat with no experience would struggle haphazardly to get out of the box. During its struggle, it would accidentally press the lever, pull the string, and lift the latch, and, voilà, the door would open. At first, the cats were slow and unsystematic when trying to open the box. However, Thorndike observed that the cats opened the box faster with more practice. Based on these observations, Thorndike developed the “law of effect,” which states that behaviors resulting in a pleasant consequence are likely to be repeated, and those resulting in an unpleasant consequence are likely to stop.


B. F. Skinner found Thorndike’s experimental setup to be lacking, mainly because Thorndike had to place the cat back in the puzzle box after every successful escape. Thus, Skinner set out to create new equipment. The apparatus he built was an operant chamber, a box in which a pigeon could peck an illuminated disk or a rat could press a lever to earn food (see Figure 3.1). With this apparatus, Skinner could control exactly when the animals would be rewarded and did not have to take the animal out after every trial. Furthermore, data from the operant chamber were collected automatically on a device called a cumulative recorder. He ran a series of experiments in which he tested how an animal’s response rate increased or decreased as a result of the frequency of reward. Skinner distinguished the behaviors studied in his operant chamber from reflexes by using the term operant behavior.


Figure 3.1 A pigeon in a modern, touch‐screen‐equipped operant chamber.


Unlike responses learned through respondent conditioning, operant behaviors are those that “operate” or act on their environment to produce consequences. A key distinction between respondent behaviors and operant behaviors is that operant behaviors are strengthened and weakened by consequences. For example, if the key is turned then the car starts; if the tail is pulled then the dog bites; if the target is touched then food is delivered; if a leash is pulled then the dog is choked; if the electric fence is touched then the animal is shocked. With operant conditioning, the consequence only occurs if the animal engages in a particular behavior; the consequence impacts the likelihood that the behavior occurs again.


Through his research, Skinner demonstrated the effects of reinforcement and punishment. He found that behavior can be changed by its consequences and went on to distinguish between two types of consequences based on how they affect behavior. Behaviors that are followed by reinforcement are strengthened and more likely to occur again in the future. Thorndike’s cat that pressed the lever, pulled the string, and lifted the latch to leave the box was likely to repeat that sequence, and even get faster at it, because food was available after escaping. On the other hand, behaviors that are followed by punishment are weakened and less likely to occur again. If, instead of getting food after escaping, the cat experienced an electric shock, the cat would be less likely to repeat the sequence needed to escape the box. It is important to note that reinforcement and punishment are defined functionally. This means that it doesn’t matter what the consequence is; it could be food, a sound, or an object. As long as a stimulus increases behavior, it is reinforcement, and as long as it decreases behavior, it is punishment.


Table 3.1 The four contingencies in operant conditioning.

                                  Increases behavior (reinforcement)    Decreases behavior (punishment)
Stimulus is added (positive)      Positive reinforcement                Positive punishment
Stimulus is removed (negative)    Negative reinforcement                Negative punishment

Skinner (1938, 1953) identified four basic arrangements by which operant conditioning occurs (see Table 3.1). In this context, the words “positive” and “negative” are used in their mathematical sense: “positive” means adding a stimulus to the situation, and “negative” means taking away a stimulus. Adding or removing a stimulus can increase or decrease behavior, depending on the situation. To train a dog to sit, a trainer might offer the dog a treat after she sits down. This would be an instance of positive reinforcement because the consequence consisted of a treat added to the dog’s environment, resulting in an increased likelihood of sitting in the future. A cat owner might describe using a spray bottle to reduce furniture scratching. This would be an instance of positive punishment because the consequence (the water spray) was added to the cat’s environment and decreased scratching.


In negative reinforcement, a response results in the removal of an aversive event, and the response increases. The negative reinforcer is ordinarily something the animal tries to avoid or escape, such as a shock from an electric fence. For example, consider training a dog to sit. Instead of offering the dog a treat, a trainer might put pressure on the dog’s bottom to get the dog to sit and then release the pressure once the dog is sitting. Assuming the behavior of sitting increases, the behavior of sitting was negatively reinforced. The response (sitting) results in the removal of an event (pressure from the trainer’s hand) and the likelihood of the response increases (sitting when hand is on their bottom). A second example of negative reinforcement is a guard dog barking at a fence as a person walks by. If that person leaves the dog’s sight, the dog is likely to bark at the next person that comes to the fence. The response (barking) results in the removal of an event (seeing a person) and the likelihood of the response increases (barking when a person walks by).


The last basic arrangement is negative punishment. In this case, the removal of a stimulus decreases the target behavior. For example, if a dog jumps on their owner to get the person’s attention, the owner might remove that attention by walking away or turning their back to the dog in an attempt to decrease the behavior. If the jumping up behavior decreases when attention is removed, this is an example of negative punishment. Negative punishment occurs when a behavior results in the removal of a pleasant stimulus, causing a decrease in the behavior’s occurrence in the future.
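Because “positive” and “negative” describe the operation (a stimulus added or removed) and “reinforcement” and “punishment” describe the effect on behavior (an increase or a decrease), any of the four contingencies can be identified from those two answers alone. The short Python sketch below illustrates this two-by-two rule using the examples above; the function name and its arguments are invented here for illustration and are not standard terminology.

```python
# A minimal sketch of the four operant contingencies in Table 3.1.
# We describe a consequence by whether a stimulus was added or removed and by
# whether the behavior subsequently increased or decreased.

def classify_contingency(stimulus_added: bool, behavior_increased: bool) -> str:
    sign = "Positive" if stimulus_added else "Negative"                 # added vs. removed
    effect = "reinforcement" if behavior_increased else "punishment"    # increase vs. decrease
    return f"{sign} {effect}"

# The dog gets a treat (stimulus added) and sitting increases:
print(classify_contingency(stimulus_added=True, behavior_increased=True))    # Positive reinforcement
# The cat is sprayed with water (stimulus added) and scratching decreases:
print(classify_contingency(stimulus_added=True, behavior_increased=False))   # Positive punishment
# Hand pressure is removed and sitting increases:
print(classify_contingency(stimulus_added=False, behavior_increased=True))   # Negative reinforcement
# Attention is removed and jumping decreases:
print(classify_contingency(stimulus_added=False, behavior_increased=False))  # Negative punishment
```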


3.4 Effectiveness of Consequences


There are two major factors that can determine the effectiveness of reinforcement and punishment: when and how often the consequences occur. Remember that operant conditioning takes place when a behavior is paired or associated with a consequence. It becomes increasingly difficult for that association to form if the consequence is delayed from the moment the behavior occurs (Wilkenfield et al. 1992). Therefore, timing (the when) is one important factor for the effectiveness of consequences during the acquisition of new behaviors.


Browne et al. (2013) demonstrated the importance of timing by attempting to teach dogs to sniff the inside of one of two containers with either an immediately delivered reinforcer or a reinforcer delayed by 1 second. Most dogs (86%) were able to learn the behavior within 20 minutes when treats were delivered immediately. In contrast, only 40% of dogs learned the behavior when treats were delayed by 1 second. In fact, if a consequence is delayed from the moment of the target behavior, then it is possible that other behaviors get associated with the consequence instead.


The problem of timing is a common one for pet owners. The following scenario might be familiar: many dog owners come home to find that their dog has rummaged through the trash. In an attempt to punish trash‐rummaging behavior, the owner scolds the dog, perhaps by yelling or confining the dog to a crate. The problem, though, is that the dog likely rummaged through the trash hours before the owner came home. So even though the dog was peacefully chewing on its bone upon the owner’s return, it experienced an aversive consequence. As a result, the scolding was associated with appropriate behavior instead of the trash‐rummaging behavior the owner attempted to punish. Timing, or more specifically immediacy, is crucial for the development of a behavior‐consequence association.


The second major factor that determines the effectiveness of a reinforcer or punisher in establishing a new behavior or eliminating an unwanted one is how often the behavior is followed by the consequence. Formally, how often a consequence follows a behavior is called a schedule. If a consequence follows every instance of behavior, then the consequence is on a continuous schedule. In contrast, if a consequence does not follow every occurrence of the behavior, then the consequence is on an intermittent schedule. For a strong association between a behavior and a consequence to develop, the consequence needs to follow the behavior every time it occurs. This is especially true when attempting to teach a new behavior with reinforcement or when attempting to reduce an unwanted behavior with punishment (Zimmerman and Ferster 1963).


Schedules of consequence delivery are usually referred to as reinforcement schedules, though they are relevant to punishment as well. Schedules of reinforcement can differ in two ways. First, they can differ based on whether the reinforcer is delivered after a certain number of responses or after some amount of time has passed. In ratio schedules, reinforcement is delivered following a particular number of responses. Interval schedules deliver reinforcement for the first response made after some amount of time has passed. Second, the response requirement or the time interval can either stay the same every time or vary around an average. Combining these two dimensions yields four basic schedules: fixed ratio, variable ratio, fixed interval, and variable interval (see Table 3.2).


In fixed schedules, the number of responses needed to obtain reinforcement or the amount of time that needs to pass is the same every time. With fixed ratio schedules, the number of responses that need to occur for reinforcement to be delivered stays the same after each delivery. The number of responses can be 1, 10, or more; regardless, the same number of responses is required for reinforcement to occur. For example, in scent detection, dogs might not get reinforced with the target scent until the 10th bag they smell. With fixed interval schedules, the amount of time that must pass before a response is reinforced is the same across deliveries. Whether the interval is one minute or one hour, the same amount of time must pass before a response is reinforced. For example, a dog begging at the table will not be reinforced for the begging behavior until after the owner is done with dinner and gives the dog a handout.


In variable schedules, the number of responses or the interval duration for reinforcement changes around some average. A variable ratio schedule requires a different number of responses each time reinforcement occurs. That is, the number of responses can change from one reinforcement to the next (e.g., 5 responses may occur prior to one reinforcement, while 10 may occur prior to the next reinforcement, but overall the average number of responses to reinforcement is, for instance, 6). Similarly, with a variable interval schedule, the amount of time between reinforcements changes. For instance, on a variable interval schedule of five seconds, reinforcement might be delivered when the animal responds after two seconds has passed this time and not until nine seconds has passed the next time. Box 3.2 explores some examples of variable schedule reinforcement in the shelter.
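For readers who find it easier to compare the four schedules as explicit decision rules, here is a minimal Python sketch. The function names, default values, and the probabilistic shortcut used for the variable ratio rule are assumptions made for this illustration only; they are not taken from any particular training protocol.

```python
# Minimal decision rules for the four basic schedules of reinforcement.
# Each function answers one question: should this response be reinforced?
import random

def fixed_ratio(responses_since_last_reinforcer, ratio=10):
    """FR: reinforce once the response count since the last reinforcer reaches `ratio`."""
    return responses_since_last_reinforcer >= ratio

def variable_ratio(mean_ratio=5):
    """VR (approximated here as a random ratio): reinforce each response with
    probability 1/mean_ratio, so the number of responses per reinforcer varies
    but averages roughly `mean_ratio`."""
    return random.random() < 1 / mean_ratio

def fixed_interval(seconds_since_last_reinforcer, interval=60):
    """FI: reinforce the first response made after `interval` seconds have elapsed."""
    return seconds_since_last_reinforcer >= interval

def variable_interval(seconds_since_last_reinforcer, current_interval):
    """VI: like FI, except `current_interval` is redrawn around an average after
    each reinforcer (e.g., random.uniform(2, 9) for an average near five seconds)."""
    return seconds_since_last_reinforcer >= current_interval

# Example: on a fixed ratio 10 schedule, the 10th response earns reinforcement.
print(fixed_ratio(10))   # True
print(fixed_ratio(7))    # False
```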


Table 3.2 Reinforcement schedules.

Fixed interval. Definition: reinforcement is delivered at a predictable time interval. Example (letting animals out in the play yard): every morning at 9 a.m. the animal caregiver opens the enclosure door, but the animal’s behavior of checking the door to go outside isn’t reinforced until it checks the door after 9 a.m.

Variable interval. Definition: response is reinforced after an interval of time that varies but centers around some average amount of time. Example (animal feedings): the time of feeding an animal may vary from day to day, but on average a caregiver provides food every eight hours. Therefore, the animal’s response of checking the bowl will not be reinforced until an average of eight hours has passed.

Fixed ratio. Definition: response is reinforced only after a specified number of responses. Example (multiple repetitions): a trainer wants an animal to do multiple repetitions of the same behavior. Therefore, the trainer delivers reinforcement after every two correct responses.

Variable ratio. Definition: response is reinforced after an average number of responses. Example (opening the door): an animal might paw at the door several times to be let through. The owner lets the animal in after the animal paws on average five times.

Though intermittent schedules don’t work as well as continuous reinforcement for establishing a new behavior, they work really well in maintaining an already established behavior (Jenkins and Stanley 1950). Typically, after a dog is trained to sit, trainers reduce the number of reinforcers she receives for sitting. The trainer gradually transitions the continuous schedule of reinforcement to an intermittent schedule. As long as the dog receives a treat once in a while, she reliably sits on cue. Changing a continuous schedule of reinforcement to an intermittent one is often called “schedule thinning.” This procedure is beneficial for trainers because not only does it reduce the number of reinforcers needed to maintain behavior, but it also causes the animal to perform consistently. Intermittent schedules result in unpredictable deliveries of reinforcers that essentially teach the animal to be a devoted “gambler.” Without knowing when a response will be reinforced, the animal performs the behavior consistently and reliably! Based on laboratory research, once a behavior is maintained intermittently, it can be very hard to eliminate (Harper and McLean 1992).


The effects of intermittent reinforcement are commonly found in the shelter. Food‐dispensing toys are often provided to create an enriching environment, and caregivers might vary what goes inside, sometimes filling the toy with food and sometimes with scent only. If a dog prefers the food enrichment over the scent enrichment, then after some experience getting the toy with food on some occasions and with scent on others, whether the toy contains food becomes a mystery to the dog! The effect of the intermittent presence of food is evident in the dog’s behavior: the dog is likely to check the toy every time it is placed into its enclosure. The behavior of checking the toy is on an intermittent schedule of reinforcement, leading the behavior to occur reliably whenever the toy is present (even though the food reinforcer occurs only sometimes).
