Does Retrieval Practice Enhance Memorization of Piano Melodies?
Paula Telesco, Meltem Karaca, Hannah Ewing, Kelsey Gilbert, Sarah Lipitz, and Jude Weinstein-Jones
Published online: 15 November 2021
- Issue: Volume 61, No.2
- DOI: https://doi.org/10.18177/sym.2021.61.sr.11525
- PDF: https://www.jstor.org/stable/48645695
Abstract
Any music educator understands the importance of a solid music education. Research has shown that learning and performing music provides cognitive and neuroscientific benefits, such as enhanced speech processing, verbal and visual memory, working memory, mathematical skills, processing speed, and reasoning performance. Considering these cognitive and neuroscientific changes, it is clearly beneficial for individuals to receive musical training on an instrument, including learning to memorize music.
The motivation for this study was to investigate a particular strategy for memorizing music, that of retrieval practice, a study technique whereby novel material is studied and tested afterwards by means of a practice quiz, prior to a final test.
Retrieval practice involves the effortful retrieval of information from long-term memory into working memory. When compared to simply restudying information, the act of retrieving information from memory has been shown to improve long-term retention of that information. This finding is known as the “testing effect.”
Decades of cognitive psychology research have shown retrieval practice to be one of the most effective strategies to optimize learning in verbal domains. However, there are currently no studies that systematically investigate the use of retrieval practice for memorizing music. Hence, the current study provides a starting point, using a standard retrieval practice experimental design in a controlled investigation to focus on the effectiveness of this paradigm in music memorization.
Vincent Benitez
Most pianists would likely agree that the ability to memorize music is essential; it is one of the core abilities necessary for a performance career. Dickinson (2009/2010) points out that memorization is a skill often discussed but not often taught. She recommends utilizing four types of memory: kinesthetic, visual, aural, and analytical, with her analytical approach relying heavily on Schenker’s theory of tonality. Mishra (2010) presents a review of literature (N = 185) of the past 106 years that discusses the memorization of music to determine how much the recommendations have changed, if at all. The majority of the articles she surveyed (60 percent) focused on the memorization of keyboard music, and the methods included at least one of the four types discussed by Dickinson, or some combination of them. Mishra concludes that all writers continue to advocate for some combination of those four methods, yet musicians continue to struggle with memorization during performance. Thus, the motivation for this current study is to investigate another potentially effective strategy for memorizing piano music: retrieval practice.
Retrieval Practice
Retrieval practice is a study technique whereby novel material is studied and tested afterwards by means of a practice quiz, prior to a final test. A simple example of this involves the use of flashcards. Research in cognitive psychology has shown retrieval practice to be one of the most effective strategies to optimize learning in verbal domains (for reviews, see Roediger & Butler, 2011; Roediger & Karpicke, 2006a). Retrieval practice involves bringing information from long-term memory into working memory. When compared to simply restudying information, the act of retrieving information from memory has been shown to improve long-term retention of that information. This is known as the “testing effect” (Roediger & Karpicke, 2006b). From a metacognitive perspective, retrieval practice provides feedback to the student, who can then study more effectively by refocusing on what he or she could not retrieve from memory (e.g., Soderstrom & Bjork, 2014). Similarly, it provides feedback to the teacher, who can focus their teaching on what had not been successfully retrieved by the student (Roediger et al., 2011).
A typical design for a retrieval-practice experiment with text materials uses two learning schedules: study-study (SS) and study-test (ST) (e.g., Eglington & Kang, 2018; Roediger & Karpicke, 2006b; Smith et al., 2016). In both conditions, participants are asked to study some information, such as a list of words or a chapter of a book. After studying this material, participants in the SS condition are asked to restudy the material, whereas participants in the ST condition are asked to type out what they can recall from the study material. This is then followed by a final test after some retention interval. In one such study (Roediger & Karpicke, 2006b), participants took the final test either five minutes, two days, or one week later. After the 5-minute delay, those in the SS condition recalled more than those in the ST condition. However, after the two-day or one-week delay, those in the ST retrieval condition produced better memory performance than those in the SS condition.
The testing effect has been demonstrated in both classroom (McDaniel et al., 2007; Ariel & Karpicke, 2017) and laboratory settings using a wide range of materials, including prose passages (Roediger & Karpicke, 2006b), Swahili-English word pairs (Pyc & Rawson, 2009), face-name pairs (Carpenter & DeLosh, 2005), and word lists (Wheeler et al., 2003). The benefits of testing are present across a range of age groups, including elementary-school children (Goossens et al., 2014; Karpicke et al., 2016), middle-school children (Carpenter et al., 2009; McDaniel et al., 2011), and college students (Weinstein et al., 2016). Extant literature has examined retrieval practice in the context of explicit memory (conscious, intentional recollection of factual information) using free recall, cued recall, or recognition tests (for a meta-analysis, see Rowland, 2014).
While the testing effect has been shown to be robust with verbal materials, there is less research regarding whether testing enhances procedural skills (e.g., motor skills). Procedural skills depend on procedural, or implicit memory, which is acquired through practice and used unconsciously; procedural skills allow us to perform motor tasks, such as riding a bike or playing the piano without having to think about it (Roediger, 1990). Several recent studies have investigated the testing effect on procedural skills with medical and dental students. One study examined whether testing enhanced skill learning in a resuscitation course (Kromann et al., 2009). The findings suggested that testing resuscitation skills, rather than just practicing them, resulted in significantly better learning outcomes for medical students. Another study showed that testing improved skill learning among undergraduate dental students (Sennhenn-Kirchner et al., 2018). In the final assessment of their suturing skills, students in the repeated testing condition performed significantly better than those in the repeated practice condition.
Given that retrieval practice improves long-term retention and has been shown to be effective for learning new skills in different age groups and settings, we hypothesized that retrieval practice may be an effective strategy for learning music. The ability to play piano pieces from memory relies on both implicit and explicit memory (Ghilardi et al., 2009; Ettlinger et al., 2011). Implicit memory allows us to play the piano, while explicit memory allows us to play a specific piece of music from memory. We anticipated finding an advantage of retrieval practice over restudying, because retrieval practice has been shown to be beneficial in implicit and explicit memory tests (Kromann et al., 2009; Sennhenn-Kirchner et al., 2018; for a review, see Roediger & Karpicke, 2006a).
Previous Music Studies
There are currently no studies that systematically investigate the use of retrieval practice for memorizing music. Specific music practice strategies such as mental versus physical practice, and one-hand versus two-hand practice, have been explored experimentally (Barry, 1992; Ross, 1985). Furthermore, some music studies have examined the effects of blocked practice (AAA BBB CCC, etc.), random practice (CAB ACB BAC, etc.), and interleaving, or serial practice (ABC ABC ABC, etc.) to determine the role of contextual interference, which may be defined as cognitive disruption; that is, the more often the task changes, the greater the interference (Stambaugh & Demorest, 2010; Stambaugh, 2011; Abushanab & Bishara, 2013; Wong et al., 2020). In each case, these studies include a study/practice phase, a retention interval, and then a test phase, analogous to the Roediger and Karpicke study (2006b). All of these music studies, to one extent or another, show that while the results at the conclusion of the study/practice phase were often similar across all strategies, blocked practice (SS) tended to yield somewhat better results at the end of the study phase than either random practice or interleaved/serial practice (as in Roediger & Karpicke, 2006b). [2] Paradoxically, random order and interleaved/serial practice (retrieval-like conditions) yielded more positive results than blocked practice after the retention interval (as in Roediger & Karpicke, 2006b).
[2] Callahan (2019), though not conducting a systematic study, tested interleaving on a much larger scale, weaving it through an entire music theory syllabus. His results also favored interleaving over the more standard blocked approach to teaching music fundamentals.
This paradox, whereby fixed-order produces the best result during the study phase, and random-order practice, or interleaving, produces the best result for the final test or performance, is termed the contextual interference hypothesis (Battig, 1978; Shea & Morgan, 1979).
In another study, Chaffin (2007) observed a professional pianist as she learned Debussy’s “Clair de lune” for a performance and noted how she attended to the musical structure and technical, interpretive, and expressive performance cues. Although the investigation of retrieval practice was not the goal of the research, Chaffin concludes that this professional pianist did engage in retrieval practice as a part of the learning process:
Retrieval practice was one of the main activities throughout the 4 3/4 hours needed to prepare the piece for performance. The pianist tried to play from memory almost from the start, used the musical structure to organize practice, and worked on performance cues to speed up retrieval from long-term memory. Performers practice memory retrieval, even when practice time is limited. (Chaffin, 2007, p. 377)
Chaffin notes that the problem with relying exclusively on motor (implicit) and auditory memory is that eventually, something will go wrong, leaving the pianist to have to improvise until they can get the performance back on track. He quotes Leon Fleischer: “Probably the least reliable [form of memory], in terms of public performance, is finger memory, because it’s the finger that deserts one first” (Chaffin, 2007, p. 378).
Thus, retrieval practice may be one of the mechanisms utilized by professional musicians when practicing and learning repertoire, in addition to aural, visual, and motor memory, in order to memorize and recall enormous amounts of literature. Rather than playing a work from start to finish a number of times (equivalent to the SS condition), the musician may work on a passage for a while (attending to the structural components of the work, such as section boundaries, thereby establishing retrieval cues), then try to play it from memory. This is equivalent to taking a practice quiz, in what we refer to as the ST condition. If the musician plays the passage from the score several times, then attempts to play from memory several times, we will refer to this as the blocked ST condition. If the musician alternates between using the music and playing from memory, we will refer to this as the alternating ST condition.
In short, while retrieval practice has the potential to be effective in practicing music, there are no experimental studies to date that systematically focus on the effects of retrieval practice on music memorization.
Current Study
The current study provides a starting point, using a standard retrieval practice experimental design in a controlled investigation to focus on the effectiveness of this paradigm in music memorization. Our two research questions were:
- To what extent does retrieval practice aid music memorization, compared to restudy?
- Which retrieval practice schedule (blocked ST or alternating ST) is more effective for music memorization? [3]
[3] We did not include interleaving in our study.
We conducted a pilot study and two subsequent experiments over the course of three years. In the first experiment, participants practiced three different melodies with three different learning schedules (SS, blocked ST, and alternating ST), made metacognitive judgments regarding their future memory performance, and took a memory test of the piano melodies after 10 minutes. As the effectiveness of retrieval practice might generally become more observable after two days, we also wanted to include a longer retention interval (Roediger & Karpicke, 2006b). Therefore, in our second experiment we manipulated the retention interval so that participants were tested either 10 minutes or two days after the encoding phase. We hypothesized that practicing retrieval (in both the alternating and blocked conditions) would lead to better memory performance compared to the restudy condition. We also expected the 2-day delay to produce a stronger testing effect than the 10-minute delay.
To assess metacognitive monitoring, we asked participants to report judgments of their learning (JOLs), which are metacognitive judgments that individuals make while studying to assess their current state of knowledge (Koriat, 2007). We expected that participants would give higher JOLs for melodies that they practiced in the restudy condition compared to the melodies that they practiced in the retrieval conditions. The literature suggests that students are not metacognitively aware of which learning strategy produces better memory performance on a later test. Specifically, students report that they use restudying rather than retrieval practice for their study practice (Karpicke et al., 2009). Hence, we hypothesized that our participants would likewise believe they would learn the melodies better through restudying them rather than practicing retrieval.
Experiment 1
Recruitment and Method
Although we first attempted to recruit music majors for this study, we were not able to recruit a sufficient number of them. As a consequence, we also recruited from the University of Massachusetts Lowell Undergraduate General Psychology Course Participant Pool. Participants were given research participation credit for their class, and those who completed the experiment also received a $5 University of Massachusetts Lowell Bookstore gift card.
To select eligible participants, we conducted a pre-screening survey of 801 participants (Example 1). Students were asked if they could sight-read simple piano pieces in the treble clef. If they responded “Yes,” they were asked if they could play the sample melody shown in Example 1, and how well they could do so. (We intentionally made the sample melody slightly more difficult than the melodies used in the experiment.) [4] The four possible responses were “Not at all,” “Badly,” “Well,” or “Perfectly.” If students circled either “Well” or “Perfectly,” they were invited by email to participate. We invited 142 eligible students to participate, but it was still challenging to find enough students who could sight-read music well enough to play the melodies perfectly or almost perfectly the first time through. Doing so would enable them to encode the melody correctly, a necessary precondition for retrieval practice to be effective. Ultimately, we identified 32 eligible students, resulting in usable data for 28 of them for Experiment 1 (which was carried out over the course of one year).
[4] In our pilot test, we discovered that our original music was too difficult, so for the experiment we chose four of Abushanab and Bishara’s (2013) simpler melodies, which we edited slightly for our prescreening and study (see Example 3).
Example 1: Pre-Screening Survey
We collected data on the participants through a questionnaire (see Appendix 2). The sample consisted of nine females and 19 males with a mean age of 19.93 (range 18–38). Fifteen participants were music majors, and 13 were not, although 24 of them reported prior training on piano. The mean number of years of piano training was 5.07 (range 0–14). Twenty-three of the participants also reported they had some prior vocal or instrumental training on a range of different instruments, with some students reporting training on as many as three additional instruments. [5] Fifteen participants reported that they considered themselves professional musicians whereas 13 participants did not. [6] Only one participant reported that their primary language was not English.
[5] Instruments as reported included saxophone, clarinet, trumpet, horn, guitar, bass guitar, bassoon, flute, drums, mallet percussion, other percussion, violin, and viola.
[6] Of the 15 participants who considered themselves to be professionals, 11 were music majors and 4 were non-music majors. Of the 13 participants who did not consider themselves to be professionals, 4 were music majors and 9 were non-music majors.
Design and Material
The design was within-subjects, with the learning schedule being the only manipulated variable. The three different learning conditions were SS (playing the melody 10 times while looking at the music: SSSSSSSSSS), blocked ST (playing the melody five times while looking at the music and five times from memory: SSSSSTTTTT), and alternating ST (STSTSTSTST). These are shown in Example 2.
Example 2: Experiment 1 Design
Study Phase: Participant practices each melody in one of the following ways:
- 10 times looking at music (Study/Study)
- 5 times looking at music, 5 times from memory (blocked Study/Practice Test)
- Alternating looking at music/playing from memory (alternating Study/Practice Test)
Final Test: 10 minutes after learning each melody, participant plays it from memory
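The three learning schedules in Example 2 can be written out as sequences of Study (S) and Test (T) trials. The following is a minimal illustrative sketch, not the authors' materials; the function name `build_schedule` and the dictionary `schedules` are hypothetical names introduced here.

```python
def build_schedule(name: str, n_trials: int = 10) -> str:
    """Return the trial sequence for a learning schedule as a string of S/T.

    'S' = play once while looking at the music; 'T' = play once from memory.
    """
    if n_trials % 2:
        raise ValueError("n_trials must be even so S and T trials can be balanced")
    half = n_trials // 2
    if name == "SS":
        return "S" * n_trials            # restudy only
    if name == "blocked ST":
        return "S" * half + "T" * half   # all study trials, then all test trials
    if name == "alternating ST":
        return "ST" * half               # study and test trials alternate
    raise ValueError(f"unknown schedule: {name}")


schedules = {name: build_schedule(name)
             for name in ("SS", "blocked ST", "alternating ST")}
for name, sequence in schedules.items():
    print(f"{name:>14}: {sequence}")
```

Every schedule contains the same number of playthroughs (10); only the presence or absence of the score on each trial differs.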
We used the three melodies shown in Example 3. [7] Each single-line treble-clef melody is four measures long, comprises 14 quarter notes, and ends with a definitive tonic cadence on a half note. These 15-note melodies lie within the limits of what several researchers, cited in Berz (1995), claim is the short-term memory capacity for melodies. The melodies are clearly tonal, outlining simple harmonic progressions. The musical scores (the stimuli) were presented on an iPad, and a Yamaha electronic keyboard was provided for participants to play the melodies. Performances were recorded via MIDI into the program Logic.
[7] Our melodies were based on those used by Abushanab & Bishara (2013), which they in turn based on melodies from the 4th edition of Ottman’s Music for Sight Singing (1996).
Example 3: Test Melodies for Experiment 1
Participants were randomly assigned to a learning schedule, wherein they had to learn one melody in each condition. Every participant played each melody 10 times, but whether they did so with or without the music differed by the condition. Counterbalancing of melodies and learning schedules resulted in nine different versions of the learning phase.
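The text does not specify how the nine versions of the learning phase were constructed. One scheme that yields exactly nine versions crosses the three cyclic orderings of the melodies with the three cyclic orderings of the schedules; the sketch below illustrates that assumption only, and the names `rotations` and `versions` are hypothetical.

```python
from itertools import product

MELODIES = ["Melody 1", "Melody 2", "Melody 3"]
SCHEDULES = ["SS", "blocked ST", "alternating ST"]


def rotations(items):
    """Return every cyclic rotation of a list (a Latin-square row set)."""
    return [items[i:] + items[:i] for i in range(len(items))]


# ASSUMPTION: 3 melody orders x 3 schedule orders = 9 versions.
# Each version lists (melody, schedule) pairs in presentation order.
versions = [list(zip(melody_order, schedule_order))
            for melody_order, schedule_order
            in product(rotations(MELODIES), rotations(SCHEDULES))]

for number, version in enumerate(versions, start=1):
    print(number, version)
```

Under this scheme, each melody is paired with each schedule in exactly three of the nine versions, so melody difficulty is not confounded with learning schedule.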
Procedure
Participants were tested individually in a controlled laboratory environment in a session that lasted about an hour. Participants were read the following instructions: “Today you are going to be learning three different pieces. The pieces should be very easy to sight-read. You will get to play each piece multiple times with one hand; some of the time you will have the music in front of you, while other times you won’t, so you will have to try to play the piece from memory. After you play each piece 10 times, you will take a 10-minute break to do a computer task, and then try to play the piece again from memory. Please do not worry if you cannot remember much of the piece—just try your best.” Participants sat in front of the Yamaha keyboard equipped with MIDI recording capabilities and were presented with the notation for one of the three melodies in a PowerPoint slide on an iPad. On a Study (S) trial, participants were asked to play the melody one time from the music. If they made a mistake, they were instructed to continue playing the melody as best they could, without going back to correct it. On a Test (T) trial, the screen on the iPad was blank, except for an instruction to play the melody back from memory as best they could, again without going back to correct any mistakes. The experimenter changed PowerPoint slides after every trial, and participants attempted to perform the melody from the score or from memory on each trial as dictated by each condition. After every trial, participants also made metacognitive judgments regarding their future memory performance. They were specifically asked how well they thought they would be able to remember the melody on a subsequent memory test.
After completing 10 study and/or test trials on the first melody, participants completed a 10-minute perceptual distractor task, presented electronically on a PC computer using the E-Prime 2.0 software (Psychology Software Tools, Pittsburgh, PA). The distractor task was followed by a memory test on the first melody. After the test, participants completed a 2-minute word search distractor. This was followed by the second study session, in which participants again completed 10 study and/or test trials on the second melody and then completed the 10-minute computerized distractor task. After the task, participants attempted to play from memory this second melody they had just learned. This was followed by a 2-minute word search distractor, after which participants moved on to the third melody. They again completed 10 study and/or test trials, followed by the 10-minute computerized distractor task, followed by the memory test. At the end of the third study session, participants were provided with a questionnaire, where they answered some questions regarding their musical skills, and provided information about their experience with playing these melodies on the keyboard.
Results and Discussion [8]
[8] The complete analysis of the data for each experiment is included in Appendix 1.
1. Performance during the encoding phase. Example 4 shows the performance accuracy of each of the three encoding conditions during the encoding phase (SS, blocked ST, alternating ST). Neither tempo nor rhythmic accuracy was evaluated, as we were focusing on melody memorization only. (The melodies in our pilot study had rhythmic profiles, but those proved to be too difficult.) Scores were tabulated by counting the number of correct notes, minus the number of incorrect notes. The results indicated that there was a significant difference in performance accuracy during the encoding phase across different learning schedules. On average, participants performed best in the SS condition (M = .95, SD = .08), followed by the blocked ST condition (M = .90, SD = .16), and then the alternating ST condition (M = .84, SD = .13). The advantage of the SS condition was expected since participants had the music in front of them for all 10 trials. Although the average performance difference between the SS and blocked ST conditions did not reach statistical significance, participants in the alternating ST condition performed significantly worse compared to the SS and blocked ST conditions.
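The scoring rule described above (correct notes minus incorrect notes) can be sketched as follows. Because the reported means are proportions, this sketch assumes the raw score was divided by the 15 notes in each melody; that normalization, and the function name `accuracy`, are assumptions made here rather than details given in the text.

```python
def accuracy(n_correct: int, n_incorrect: int, n_notes: int = 15) -> float:
    """Score one performance as (correct - incorrect) / total notes.

    ASSUMPTION: normalizing by the melody length (15 notes) is inferred from
    the reported proportions; the text states only the subtraction step.
    """
    return (n_correct - n_incorrect) / n_notes


# A flawless playthrough versus one with a single wrong note.
print(accuracy(15, 0))
print(accuracy(14, 1))
```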
Example 4: Experiment 1 Results: Performance Accuracy during Learning Phase (N = 28)
Note, however, that the performance accuracy of the alternating ST retrieval condition started out far below the SS and blocked ST conditions, but ultimately surpassed the blocked ST condition; by the last play through, it approached the final play through of the SS condition, as did the blocked ST condition (Example 5). We should also note that since participants did not move on to a new melody until after they had taken the final test on the previous melody, there was less contextual interference than there would be if they had learned all three melodies before taking the final test.
Example 5: Experiment 1 Results: Final Test 10 Minutes after Learning Each Melody (N = 28)
2. Performance on the final test. The results, which can be seen in Example 5, showed that there was no significant difference in performance on the final test between SS (M = .85, SD = .28), blocked ST (M = .87, SD = .20), and alternating ST (M = .89, SD = .19) conditions. Nevertheless, the results were in the predicted direction: the SS schedule produced the lowest memory accuracy on the final test compared to both retrieval practice schedules. However, we should note that the overall performance was very high, possibly due to the fact that participants were tested on each melody just 10 minutes after learning it, before learning the next melody. In fact, we found that 71.4% of participants in the SS schedule, 50% of participants in the blocked ST schedule, and 60.7% of the participants in the alternating ST condition achieved the highest possible score. Thus, the distribution of scores across the three encoding conditions was negatively skewed, showing the possibility of a ceiling effect. To address this, we compared the differences in the means across the learning schedules in the lowest-performing group (n = 14). The differences in means between the learning schedules were larger in the lowest-performing group, though they were not significant (SS schedule = .71, blocked ST schedule = .76, and alternating ST schedule = .79). Thus, the differences in means between the learning schedules appeared to be larger when high performers (top half) were removed from the analysis.
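The ceiling-effect figures can be checked with simple arithmetic. In the sketch below, the participant counts (20, 14, and 17 out of 28) are not reported in the text; they are inferred here as the counts that reproduce the stated percentages. The means are taken directly from the paragraph above.

```python
N = 28

# ASSUMPTION: counts inferred from the reported percentages, not stated in the text.
ceiling_counts = {"SS": 20, "blocked ST": 14, "alternating ST": 17}
ceiling_pct = {schedule: round(100 * count / N, 1)
               for schedule, count in ceiling_counts.items()}
print(ceiling_pct)

# Reported final-test means for the full sample and the lowest-performing half.
full_sample = {"SS": 0.85, "blocked ST": 0.87, "alternating ST": 0.89}
lower_half = {"SS": 0.71, "blocked ST": 0.76, "alternating ST": 0.79}


def spread(means):
    """Gap between the best and worst schedule means, rounded to 2 places."""
    return round(max(means.values()) - min(means.values()), 2)


print(spread(full_sample), spread(lower_half))
```

The gap between the best and worst schedule is twice as large in the lower-performing half (.08 versus .04), which is consistent with a ceiling effect compressing the differences in the full sample.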
3. Prediction accuracy. The results showed that there was no significant difference in overconfidence/underconfidence between SS (M = -.06, SD = .26), blocked ST (M = -.12, SD = .22), and alternating ST (M = -.09, SD = .18) learning schedules. The participants appeared to be generally underconfident in all three learning schedules about their subsequent memory performance. [9] When we added self-report for those who considered themselves professional musicians vs. nonprofessional ones as a between-subjects factor into the analysis, the pattern of results did not change. Specifically, there was no significant main effect of profession self-report in prediction accuracy, and there was also no significant interaction between profession self-report and learning schedule in prediction accuracy. Thus, both self-reported professional musicians (n = 15) and self-reported nonprofessional musicians (n = 13) were underconfident across all three learning schedules.
[9] Calculation for miscalibration of overconfidence and underconfidence is explained in Appendix 1.
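The exact miscalibration formula is given in Appendix 1 and is not reproduced in this section. A common convention in the metacognition literature is to subtract actual performance from the mean prediction, so that negative values indicate underconfidence. The sketch below illustrates that convention only; it assumes JOLs and test scores are both expressed as proportions between 0 and 1, and the function name `calibration_bias` and the example values are hypothetical.

```python
def calibration_bias(jols, final_score):
    """Mean judgment of learning minus actual final-test performance.

    ASSUMPTION: both are proportions on a 0-1 scale.
    Negative result = underconfidence; positive result = overconfidence.
    """
    return sum(jols) / len(jols) - final_score


# Hypothetical participant: predicts about 80% recall but actually scores 90%.
bias = calibration_bias([0.7, 0.8, 0.9], 0.9)
print(round(bias, 2))
```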
Summary of Experiment 1
We had expected that participants’ JOLs would be higher for the melodies they learned in the SS condition compared to those learned in the blocked and alternating ST retrieval practice conditions. However, our results showed that participants were generally underconfident about their subsequent memory performance in all three conditions. In addition, we hypothesized that retrieval practice would be effective in memorizing piano melodies, but we did not observe the expected benefit of retrieval practice on memory performance. We therefore hypothesized that the benefit of retrieval practice in piano memorization would emerge after a longer retention interval because testing effects are more observable when there is a longer delay between the initial encoding phase and the testing one (Roediger & Karpicke, 2006b). Hence, we designed a second study incorporating a longer retention interval.
Experiment 2
In our second experiment, we manipulated the retention interval by extending the delay of the final test to two days for some participants. We also had all participants learn all three melodies, one after another, before taking the final memory test after either 10 minutes or two days. We predicted this longer retention interval would allow for the beneficial effects of retrieval practice to emerge, such that participants in the 2-day delay condition would demonstrate better memory performance than those in the 10-minute delay condition on the final memory test.
Recruitment and Method
The recruitment protocol for Experiment 2 (across two years) was identical to that of Experiment 1, and we encountered the same difficulties identifying eligible participants. We identified 69 eligible participants from the University of Massachusetts Lowell human subject pool and, as in Experiment 1, most of the participants reported that they had some piano training, or training on other instruments. Participants were randomly assigned to either the 10-minute delay condition (35 participants) or 2-day delay condition (34 participants). Unfortunately, as in Experiment 1, some students showed up and could not read music or play the piano, and so had to be excluded. In addition, some of the students in the 2-day delay condition did not return on the second day. Consequently, we ended up with usable data for 28 participants in the 10-minute delay condition and 21 participants in the 2-day delay condition, for a total of 49 participants. [10]
[10] Unfortunately, demographic data for 24 students are no longer available to us due to reasons beyond our control. Nevertheless, of the 25 participants for whom we do have demographic data, 12 participants were male, 13 were female, 11 were music majors, 14 were non-music majors. Nineteen participants reported some training on piano, while 23 had training on one or more other instruments, including piano, flute, clarinet, saxophone, voice, trumpet, tuba, trombone, guitar, bass guitar, mallet percussion, bells, other percussion, violin, viola, and cello. Seven participants identified themselves as professional musicians (6 were music majors, 1 was a non-music major). Sixteen participants identified themselves as nonprofessional musicians (4 were music majors, 12 were non-music majors). Two participants, including one music major, did not respond to that question.
Design and Material
The design was a 2 × 3 mixed-model factorial design:
- We employed two delay types (10 minutes or two days) manipulated between subjects;
- We employed the same three study schedules as in Experiment 1 (SS, blocked ST, and alternating ST), manipulated within subjects.
As in Experiment 1, participants were assigned to a study schedule, learning one melody in each condition, and the counterbalancing of melodies and learning schedules again resulted in nine different versions of the study phase. Since participants in the 2-day delay condition would not return for two days, we needed to change the format of the test trials and the final test so that participants could be cued to recall specific melodies after two days. Consequently, the final test for all participants (those in the 10-minute delay group and those in the 2-day delay group) was a cued-recall format, whereby participants were shown the first four notes of each melody; hence, participants had to recall only 11 out of the 15 notes from memory (Example 6).
Example 6: Experiment 2 Design
Study Phase: Participant practices each melody in one of the following ways:
- 10 times looking at music (Study/Study)
- 5 times looking at music, 5 times from memory (blocked Study/Practice Test)
- Alternating looking at music/playing from memory (alternating Study/Practice Test)
Final Test (cued):
10 minutes after learning all 3 melodies
OR
2 days after learning all 3 melodies
This cued-recall format necessitated changing slightly the first measures of melodies 1 and 2 from those used in Experiment 1. Since participants would have to be cued, each melody needed a unique first measure. Originally, melodies 1 and 2 both began on the same note, and melodies 2 and 3 had the same scale-step pattern for the first three notes (compare Examples 3 and 7). The musical scores were again presented in PowerPoint slides on an iPad, and a Yamaha electronic keyboard was used for playing the melodies. A MacBook Air running the program GarageBand was used to record the performances.
Example 7: Test Melodies for Experiment 2
Procedure
As in Experiment 1, participants were tested individually in a controlled laboratory environment where they sat in front of the electronic keyboard. During the encoding phase, all participants learned all three melodies, one after another, with a 2-minute crossword puzzle distraction activity between each melody. After the encoding phase, participants assigned to the 10-minute delay condition were given a distractor task consisting of general knowledge questions before taking their final test. For logistical reasons, we did not use the distractor task from the first experiment, because that required an additional computer. Participants in the 2-day delay condition were not given the distractor task; rather, they were dismissed and told to return to the laboratory in two days.
For the final test, all participants were given the cue for the first melody and attempted to play it from memory. Once they stopped, they were given the cue for the second melody, and so on. After the final test, participants in both conditions were given the questionnaire wherein they provided information about their previous musical experience and their general comments on the experiment, such as which learning schedule they would prefer when learning music.
Results
In Experiment 2, we focused on memory accuracy results on the final test. We predicted that retrieval practice would enhance memory performance for those in the 2-day condition because of the longer retention interval. On the other hand, we did not have a directional a priori hypothesis about the performance of those in the 10-minute condition as a function of learning schedule.
After the 10-minute delay, the alternating ST condition (M = .62, SD = .34) showed the highest memory accuracy, which was followed by the SS condition (M = .49, SD = .35) and the blocked ST condition (M = .46, SD = .33), although these differences in memory performance did not reach statistical significance. In addition, memory performance in all three conditions was much less accurate than in Experiment 1. This was likely due to the fact that in Experiment 1, students would learn one melody at a time and take their final test before moving on to the next melody, whereas in Experiment 2, they learned all three melodies in succession, then had to perform all three from memory, leading to greater interference.
After a 2-day delay, the melodies that participants played in the alternating ST (M = .48, SD = .36) and blocked ST (M = .50, SD = .38) conditions had higher memory accuracy than those in the SS condition (M = .33, SD = .25). Although these differences were not statistically significant, the pattern appears to be somewhat consistent with retrieval practice theories. Previous research has shown that retrieval practice is more effective when the retention interval is longer (Roediger & Karpicke, 2006b). In our study, the two retrieval conditions produced similar results, and both numerically outperformed the SS condition after a 2-day delay. Specifically, the results for the retrieval practice strategies begin to diverge from those for the study-only strategy on the final test, as shown in Example 8. Thus, retrieval practice may be effective for memorizing music when participants are tested after a longer delay, but more research is needed to confirm this.
Example 8: Experiment 2 Results: Final Test after Learning all 3 Melodies (N = 49)
On the other hand, restudying appeared to be a reasonably effective strategy when the delay was short: SS and blocked ST produced similar memory accuracy when participants were tested after 10 minutes. This is consistent with earlier theories that suggest restudying can be effective when the retention interval is shorter. Nevertheless, alternating retrieval practice produced the numerically highest memory accuracy after a 10-minute delay, although, as noted above, this advantage was not statistically significant.
Summary and Conclusions
The objective of this study was to investigate whether retrieval practice could be used as an effective strategy to improve music memorization. In particular, we wanted to examine whether retrieval practice was a more effective strategy than simply restudying when participants were learning a melody on the piano. Our results were in the predicted direction, but they were less robust than we had predicted and did not reach statistical significance. One reason may be our small sample size; unfortunately, we were not able to recruit the number of participants for which we had planned. We encourage researchers to conduct future studies with larger sample sizes. Furthermore, we also examined the effect size to understand the practical significance of our results. We focused on Experiment 2, given that Experiment 1 had limitations because of the ceiling effect. In addition, we focused on the group comparisons for the 2-day delay condition, given that we expected to find a larger testing effect because of the longer retention interval. We found a small to medium effect size in favor of both blocked (d = 0.37) and alternating (d = 0.44) retrieval practice groups when contrasted with the study-study group.11 This suggests that the use of retrieval practice might have an effect in music memorization compared to simply restudying. Moreover, an earlier meta-analysis (Rowland, 2014) suggests that the testing effect is a robust one compared to restudying (g = 0.50). Thus, although we did not find a statistically significant difference across the learning schedules, the identified effect size in our study is comparable to the mean effect size suggested in the retrieval practice literature. Therefore, we suggest that retrieval practice might be a helpful learning strategy for music memorization, and we again encourage further investigation with larger sample sizes.
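The benchmarks in note 11 can be made concrete with a short sketch. The Python below is an illustration only: it computes Cohen's d with the pooled-standard-deviation formula for two independent samples, which is one common variant and an assumption here, since the article does not state which formula was applied to its within-subjects comparisons. The function names, the "negligible" label for values below 0.2, and the sample scores are hypothetical and are not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample SD (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


def label_effect(d):
    """Map |d| onto Cohen's (1988) benchmarks: 0.2 small, 0.5 medium, 0.8 large."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # hypothetical label for values below the smallest benchmark


# Hypothetical proportion-correct scores, for illustration only.
retrieval_scores = [0.6, 0.4, 0.8, 0.2, 0.5]
restudy_scores = [0.3, 0.5, 0.2, 0.4, 0.1]

d = cohens_d(retrieval_scores, restudy_scores)
print(f"d = {d:.2f} ({label_effect(d)})")

# The effect sizes reported for the 2-day delay fall between the small and medium benchmarks.
for reported_d in (0.37, 0.44):
    print(reported_d, label_effect(reported_d))
```

Under these thresholds, both reported values (0.37 and 0.44) sit above the small benchmark and below the medium one, which matches the "small to medium" description in the text.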
We also wanted to examine the extent to which participants could accurately judge which strategies were most effective for learning music. The results from Experiment 1 revealed that participants made lower JOLs than their actual memory scores, specifically about their memory performance on a later test. Although participants were underconfident in all three learning conditions, we observed that the underconfidence was larger in retrieval-practice conditions than in the restudy condition. While this difference was not significant, the pattern appears to be consistent with the previous literature. Because of the effortful processing associated with practicing retrieval, participants may believe they are not learning the material very well. Conversely, when participants restudy the material, the information is processed more fluently because restudying is less effortful compared to retrieval practice (Koriat & Bjork, 2005; Kornell & Son, 2009). Accordingly, the difference in ease of processing can lead to different metacognitive judgments when participants use different learning strategies.
These results potentially have implications ranging from teaching music fundamentals (for instance, learning to read notes in different clefs, spelling intervals and triads, learning scales, etc.) and basic technical performance skills (playing scales, arpeggios, etc.) to possibly memorizing complex instrumental and vocal music for performance. Furthermore, we believe these retrieval strategies may be used with younger students as well as college students. It is important to recognize that students feel most confident using an SS learning strategy. However, over a longer span of time than just a few minutes, a retrieval-practice strategy may yield greater results in memory and overall learning, helping to solidify that information in long-term memory.
The current findings serve as preliminary results on the effectiveness of retrieval practice in memorizing music. Additional studies could investigate simple but more musical tasks than our study did, such as playing melodies with rhythmic profiles and playing music with both hands. For these studies to be successful, one needs more skilled pianists who are willing to participate. Such studies may also include longer retention intervals (for example, a week) to assess the effectiveness of retrieval practice after a longer delay. In addition, future studies may test the effectiveness of retrieval practice on other instruments in order to determine whether potential domain-specific learning strategies can be identified for particular instruments. Lastly, additional studies may examine whether other evidence-based learning strategies can be effective for learning music.12 Considering the cognitive benefits associated with learning music, this line of research should be welcomed. Identifying the most effective learning techniques can help both professional and avocational musicians in their musical careers and their life-long enjoyment of musical performance.
Notes
1. This research was funded by an internal seed grant through the University of Massachusetts Lowell. Portions of this paper were presented at the 2018 Annual Meeting of the Association for Psychological Science in San Francisco, and at the 2019 International College Music Society Conference in Belgium.
2. Callahan (2019), though not conducting a systematic study, tested interleaving on a much larger scale, weaving it through an entire music theory syllabus. His results also favored interleaving over the more standard blocked approach to teaching music fundamentals.
3. We did not include interleaving in our study.
4. In our pilot test, we discovered that our original music was too difficult, so for the experiment we chose four of Abushanab and Bishara’s (2013) simpler melodies, which we edited slightly for our prescreening and study (see Example 3).
5. Instruments as reported included saxophone, clarinet, trumpet, horn, guitar, bass guitar, bassoon, flute, drums, mallet percussion, other percussion, violin, and viola.
6. Of the 15 participants who considered themselves to be professionals, 11 were music majors and 4 were non-music majors. Of the 13 participants who did not consider themselves to be professionals, 4 were music majors and 9 were non-music majors.
7. Our melodies were based on those used by Abushanab & Bishara (2013), which they in turn based on melodies from the 4th edition of Ottman’s Music for Sight Singing (1996).
8. The complete analysis of the data for each experiment is included in Appendix 1.
9. Calculation for miscalibration of overconfidence and underconfidence is explained in Appendix 1.
10. Unfortunately, demographic data for 24 students are no longer available to us due to reasons beyond our control. Nevertheless, of the 25 participants for whom we do have demographic data, 12 participants were male, 13 were female, 11 were music majors, 14 were non-music majors. Nineteen participants reported some training on piano, while 23 had training on one or more other instruments, including piano, flute, clarinet, saxophone, voice, trumpet, tuba, trombone, guitar, bass guitar, mallet percussion, bells, other percussion, violin, viola, and cello. Seven participants identified themselves as professional musicians (6 were music majors, 1 was a non-music major). Sixteen participants identified themselves as nonprofessional musicians (4 were music majors, 12 were non-music majors). Two participants, including one music major, did not respond to that question.
11. We calculated effect size using Cohen’s d, small effect = 0.2, medium effect = 0.5, large effect = 0.8 (Cohen, 1988).
12. For a review of the science of learning, see Weinstein et al., 2018.
References
Abushanab, B., & Bishara, A. J. (2013). Memory and metacognition for piano melodies: Illusory advantages of fixed- over random-order practice. Memory & Cognition, 41, 928–937. https://doi.org/10.3758/s13421-013-0311-z
Ariel, R., & Karpicke, J. D. (2017). Improving self-regulated learning with a retrieval practice intervention. Journal of Experimental Psychology: Applied, 23. https://doi.org/10.1037/xap0000133
Barry, N. H. (1992). The effects of practice strategies, individual differences in cognitive style, and gender upon technical accuracy and musicality of student instrumental performance. Psychology of Music, 20(2), 112–123. https://doi.org/10.1177/0305735692202002
Battig, W. F. (1978). The flexibility of human memory. In L. S. Cermak & F. I. M. Craik (Eds.), Levels of processing in human memory (pp. 23–44). Lawrence Erlbaum.
Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H., & Ullén, F. (2005). Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience, 8(9), 1148–1150. https://doi.org/10.1038/nn1516
Bergman Nutley, S., Darki, F., & Klingberg, T. (2014). Music practice is associated with development of working memory during childhood and adolescence. Frontiers in Human Neuroscience, 7, 926. https://doi.org/10.3389/fnhum.2013.00926
Berz, W. L. (1995). Working memory in music: A theoretical model. Music Perception: An Interdisciplinary Journal, 12(3), 353–364.
Bjork, R.A. (1975). Retrieval as a memory modifier: An interpretation of negative recency and related phenomena. In R.L. Solso (Ed.), Information processing and cognition: The Loyola symposium (pp. 123–144). Erlbaum.
Callahan, M. (2019). What happens when music theory pedagogy is interleaved? Pedagogy into Practice conference, 2019, https://jmtp.appstate.edu/conference/past-conferences/2019-santa-barbara
Carpenter, S. K., & DeLosh, E. L. (2005). Application of the testing and spacing effects to name learning. Applied Cognitive Psychology: The Official Journal of the Society for Applied Research in Memory and Cognition, 19(5), 619–636. https://doi.org/10.1002/acp.1101
Carpenter, S. K., & DeLosh, E. L. (2006). Impoverished cue support enhances subsequent retention: Support for the elaborative retrieval explanation of the testing effect. Memory & Cognition, 34(2), 268–276. https://doi.org/10.3758/BF03193405
Carpenter, S. K., Pashler, H., & Cepeda, N. J. (2009). Using tests to enhance 8th grade students’ retention of US history facts. Applied Cognitive Psychology: The Official Journal of the Society for Applied Research in Memory and Cognition, 23(6), 760–771. https://doi.org/10.1002/acp.1507
Carrier, M., & Pashler, H. (1992). The influence of retrieval on retention. Memory & Cognition, 20(6), 633–642. https://doi.org/10.3758/BF03202713
Chaffin, R. (2007). Learning Clair de Lune: Retrieval practice and expert memorization. Music Perception: An Interdisciplinary Journal, 24(4), 377–393. https://doi.org/10.1525/mp.2007.24.4.377
Coffman, D. D. (1990). Effects of mental practice, physical practice, and knowledge of results on piano performance. Journal of Research in Music Education, 38(3), 187–196. https://doi.org/10.2307/3345182
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Routledge.
Cull, W. L. (2000). Untangling the benefits of multiple study opportunities and repeated testing for cued recall. Applied Cognitive Psychology: The Official Journal of the Society for Applied Research in Memory and Cognition, 14(3), 215–235. https://doi.org/10.1002/(SICI)1099-0720(200005/06)14:3<215::AID-ACP640>3.0.CO;2-1
Dickinson, S. (2009/2010). A multi-level approach to more secure memorization. College Music Symposium 49/50, 271–283.
Eglington, L. G., & Kang, S. H. (2018). Retrieval practice benefits deductive inference. Educational Psychology Review, 30(1), 215–228. https://doi.org/10.1007/s10648-016-9386-y
Ericsson, K. A., Krampe, R. T., & Tesch-Römer, C. (1993). The role of deliberate practice in the acquisition of expert performance. Psychological Review, 100(3), 363. https://doi.org/10.1037/0033-295X.100.3.363
Ettlinger, M., Margulis, E. H., & Wong, P. C. (2011). Implicit memory in music and language. Frontiers in Psychology, 2, 211. https://doi.org/10.3389/fpsyg.2011.00211
Franklin, M. S., Sledge Moore, K., Yip, C. Y., Jonides, J., Rattray, K., & Moher, J. (2008). The effects of musical training on verbal memory. Psychology of Music, 36(3), 353–365. https://doi.org/10.1177/0305735607086044
Gaser, C., & Schlaug, G. (2003). Brain structures differ between musicians and non-musicians. Journal of Neuroscience, 23(27), 9240–9245. https://doi.org/10.1523/JNEUROSCI.23-27-09240.2003
Ghilardi, M. F., Moisello, C., Silvestri, G., Ghez, C., & Krakauer, J. W. (2009). Learning of a sequential motor skill comprises explicit and implicit components that consolidate differently. Journal of Neurophysiology, 101(5), 2218–2229. https://doi.org/10.1152/jn.01138.2007
Ginsborg, J. (2004). Strategies for memorizing music. In A. Williamon (Ed.), Musical excellence: Strategies and techniques to enhance performance (pp. 123–140). Oxford University Press.
Goossens, N. A., Camp, G., Verkoeijen, P. P., Tabbers, H. K., & Zwaan, R. A. (2014). The benefit of retrieval practice over elaborative restudy in primary school vocabulary learning. Journal of Applied Research in Memory and Cognition, 3(3), 177–182. https://doi.org/10.1016/j.jarmac.2014.05.003
Hallam, S. (2010). The power of music: Its impact on the intellectual, social and personal development of children and young people. International Journal of Music Education, 28(3), 269–289. https://doi.org/10.1177/0255761410370658
Hewitt, M. P. (2001). The effects of modeling, self-evaluation, and self-listening on junior high instrumentalists’ music performance and practice attitude. Journal of Research in Music Education, 49(4), 307–322. https://doi.org/10.2307/3345614
Hopkins, R. F., Lyle, K. B., Hieb, J. L., & Ralston, P. A. (2016). Spaced retrieval practice increases college students’ short- and long-term retention of mathematics knowledge. Educational Psychology Review, 28(4), 853–873. https://doi.org/10.1007/s10648-015-9349-8
Hyde, K. L., Lerch, J., Norton, A., Forgeard, M., Winner, E., Evans, A. C., & Schlaug, G. (2009). Musical training shapes structural brain development. Journal of Neuroscience, 29(10), 3019–3025. https://doi.org/10.1523/JNEUROSCI.5118-08.2009
Jakobson, L. S., Lewycky, S. T., Kilgour, A. R., & Stoesz, B. M. (2008). Memory for verbal and visual material in highly trained musicians. Music Perception: An Interdisciplinary Journal, 26(1), 41–55. https://doi.org/10.1525/mp.2008.26.1.41
Kang, S. H., McDermott, K. B., & Roediger III, H. L. (2007). Test format and corrective feedback modify the effect of testing on long-term retention. European Journal of Cognitive Psychology, 19(4–5), 528–558. https://doi.org/10.1080/09541440601056620
Karpicke, J. D. (2009). Metacognitive control and strategy selection: Deciding to practice retrieval during learning. Journal of Experimental Psychology: General, 138(4), 469. https://doi.org/10.1037/a0017341. https://pubmed.ncbi.nlm.nih.gov/19883131/
Karpicke, J. D., Blunt, J. R., & Smith, M. A. (2016). Retrieval-based learning: Positive effects of retrieval practice in elementary school children. Frontiers in Psychology, 7(350). https://doi.org/10.3389/fpsyg.2016.00350
Karpicke, J. D., Butler, A. C., & Roediger III, H. L. (2009). Metacognitive strategies in student learning: Do students practise retrieval when they study on their own? Memory, 17(4), 471–479. https://doi.org/10.1080/09658210802647009
Koriat, A. (2007). Metacognition and consciousness. In P. D. Zelazo, M. Moscovitch, & E. Thompson (Eds.), Cambridge handbook of consciousness (pp. 289–325). Cambridge University Press.
Koriat, A., & Bjork, R. A. (2005). Illusions of competence in monitoring one’s knowledge during study. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31(2), 187–194. https://doi.org/10.1037/0278-7393.31.2.187
Kornell, N., & Son, L. K. (2009). Learners’ choices and beliefs about self-testing. Memory, 17(5), 493–501. https://doi.org/10.1080/09658210902832915
Kromann, C. B., Jensen, M. L., & Ringsted, C. (2009). The effect of testing on skills learning. Medical Education, 43(1), 21–27. https://doi.org/10.1111/j.1365-2923.2008.03245.x
McDaniel, M. A., & Fisher, R. P. (1991). Tests and test feedback as learning sources. Contemporary Educational Psychology, 16(2), 192–201. https://doi.org/10.1016/0361-476X(91)90037-L
McDaniel, M. A., Agarwal, P. K., Huelser, B. J., McDermott, K. B., & Roediger III, H. L. (2011). Test-enhanced learning in a middle[-]school science classroom: The effects of quiz frequency and placement. Journal of Educational Psychology, 103(2), 399. https://doi.org/10.1037/a0021782
McDaniel, M. A., Roediger, H. L., & McDermott, K. B. (2007). Generalizing test-enhanced learning from the laboratory to the classroom. Psychonomic Bulletin & Review, 14(2), 200–206. https://doi.org/10.3758/BF03194052
Mishra, J. (2010). A century of memorization pedagogy. Journal of Historical Research in Music Education, 32(1), 3–18.
Ottman, R. W. (1996). Music for Sight Singing (4th ed.). Prentice Hall.
Oxendine, J. B. (1984). Psychology of motor learning. Prentice Hall.
Pascual-Leone, A. (2001). The brain that plays music and is changed by it. Annals of the New York Academy of Sciences, 930(1), 315–329. https://doi.org/10.1111/j.1749-6632.2001.tb05741.x
Psychology Software Tools Inc. (2012). E-Prime 2.0. http://www.pstnet.com
Pyc, M. A., & Rawson, K. A. (2009). Testing the retrieval effort hypothesis: Does greater difficulty correctly recalling information lead to higher levels of memory? Journal of Memory and Language, 60(4), 437–447. https://doi.org/10.1016/j.jml.2009.01.004
Roediger, H. L. (1990). Implicit memory: Retention without remembering. American Psychologist, 45(9), 1043–1056.
Roediger III, H. L., & Butler, A. C. (2011). The critical role of retrieval practice in long-term retention. Trends in Cognitive Sciences, 15(1), 20–27. https://doi.org/10.1016/j.tics.2010.09.003
Roediger III, H. L., & Karpicke, J. D. (2006a). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1(3), 181–210. https://doi.org/10.1111/j.1745-6916.2006.00012.x
Roediger III, H. L., & Karpicke, J. D. (2006b). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17(3), 249–255. https://doi.org/10.1111/j.1467-9280.2006.01693.x
Roediger III, H. L., Putnam, A. L., & Smith, M. A. (2011). Chapter One – Ten benefits of testing and their applications to educational practice. Psychology of Learning and Motivation, 55, 1–36. https://doi.org/10.1016/B978-0-12-387691-1.00001-6
Ross, S. L. (1985). The effectiveness of mental practice in improving the performance of college trombonists. Journal of Research in Music Education, 33, 221–230. https://doi.org/10.2307/3345249
Rowland, C. A. (2014). The effect of testing versus restudy on retention: A meta-analytic review of the testing effect. Psychological Bulletin, 140(6), 1432. https://doi.org/10.1037/a0037559
Rubin-Rabson, G. (1940). Studies in the psychology of memorizing piano music: II. A comparison of massed and distributed practice. Journal of Educational Psychology, 31(4), 270. https://doi.org/10.1037/h0061174
Schmithorst, V. J., & Wilke, M. (2002). Differences in white matter architecture between musicians and non-musicians: A diffusion tensor imaging study. Neuroscience Letters, 321(1–2), 57–60. https://doi.org/10.1016/S0304-3940(02)00054-X
Sennhenn‐Kirchner, S., Goerlich, Y., Kirchner, B., Notbohm, M., Schiekirka, S., Simmenroth, A., & Raupach, T. (2018). The effect of repeated testing vs repeated practice on skills learning in undergraduate dental education. European Journal of Dental Education, 22(1), e42–e47. https://doi.org/10.1111/eje.12254
Shea, J. B., & Morgan, R. L. (1979). Contextual interference effects on the acquisition, retention, and transfer of a motor skill. Journal of Experimental Psychology: Human Learning and Memory, 5, 179–187. https://doi.org/10.1037/0278-7393.5.2.179
Sikes, P. L. (2013). The effects of specific practice strategy use on university string players’ performance. Journal of Research in Music Education, 61(3), 318–333. https://doi.org/10.1177/0022429413497225
Smith, A. M., Floerke, V. A., & Thomas, A. K. (2016). Retrieval practice protects memory against acute stress. Science, 354(6315), 1046–1048. https://doi.org/10.1126/science.aah5067. https://pubmed.ncbi.nlm.nih.gov/27885031/
Soderstrom, N. C., & Bjork, R. A. (2014). Testing facilitates the regulation of subsequent study time. Journal of Memory and Language, 73, 99–115. https://doi.org/10.1016/j.jml.2014.03.003
Stambaugh, L. A., & Demorest, S. M. (2010). Effects of practice schedule on wind instrument performance: A preliminary application of a motor learning principle. Update: Applications of Research in Music Education, 28(2), 20–28. https://doi.org/10.1177/8755123310361768
Stambaugh, L. A. (2011). When repetition isn’t the best practice strategy: Effects of blocked and random practice schedules. Journal of Research in Music Education, 58(4), 368–383. https://doi.org/10.1177/0022429410385945
Standley, J. M. (2008). Does music instruction help children learn to read? Evidence of a meta-analysis. Applications of Research in Music Education, 27(1), 17–32. https://doi.org/10.1177/8755123308322270
Toppino, T. C., & Pagano, M. J. (2020). Metacognitive control over the distribution of retrieval practice with and without feedback and the efficacy of learners’ spacing choices. Memory & Cognition, 49, 467–479. https://doi.org/10.3758/s13421-020-01100-x
Tryon, W. W. (2001). Evaluating statistical difference, equivalence, and indeterminacy using inferential confidence intervals: An integrated alternative method of conducting null hypothesis statistical tests. Psychological Methods, 6(4), 371. PMID: 11778678
Vaughn, K. (2000). Music and mathematics: Modest support for the oft-claimed relationship. Journal of Aesthetic Education, 34(3/4), 149–166. https://doi.org/10.2307/3333641
Weinstein, Y., Madan, C. R., & Sumeracki, M. A. (2018). Teaching the science of learning. Cognitive Research: Principles and Implications, 3(1), 2. https://doi.org/10.1186/s41235-017-0087-y
Weinstein, Y., Nunes, L. D., & Karpicke, J. D. (2016). On the placement of practice questions during study. Journal of Experimental Psychology: Applied, 22(1), 72–84. https://doi.org/10.1037/xap0000071. https://pubmed.ncbi.nlm.nih.gov/26950160/
Wheeler, M., Ewers, M., & Buonanno, J. (2003). Different rates of forgetting following study versus test trials. Memory, 11(6), 571–580. https://doi.org/10.1080/09658210244000414
Wong, S. S. H., Chen, S., & Lim, S. W. H. (2020). Learning melodic musical intervals: To block or to interleave? Psychology of Music. https://doi.org/10.1177/0305735620922595
Appendix 1: Analysis of Experiment Results
Experiment 1: Results and Discussion
Performance during the encoding phase. We used an alpha level of .05 for all statistical tests. To examine whether learning performance differed across the learning schedules, a one-way repeated measures ANOVA was conducted. The results indicated that there was a significant difference in performance during the encoding phase across different learning schedules, F (2, 54) = 15.38, p < .01, η2 = .363. A post hoc pairwise comparison using the Bonferroni correction revealed that there was no significant difference between the study-only (M = .95, SD = .08) and blocked retrieval practice (M = .90, SD = .16) schedules. However, learning performance during the alternated retrieval practice schedule (M = .84, SD = .13) significantly differed from both study-only and blocked retrieval practice schedules. These results indicate that participants’ learning performance was comparable in study-only and blocked retrieval practice learning schedules. On the other hand, participants’ learning performance was lowest in the alternated retrieval practice schedule.
Memory accuracy on the final test. To examine whether there was a significant difference in memory performance across different learning schedules, a one-way repeated measures ANOVA was conducted. The results showed that there was no significant difference in memory accuracy on the final test across study-only (M = .85, SD = .28), blocked retrieval practice (M = .87, SD = .20), and alternated retrieval practice (M = .89, SD = .19) learning schedules, F (2, 54) = 0.24, p > .05, η2 = .009. Even though there was no significant difference on the final test across different learning schedules, the results were in the predicted direction, in that the study-only schedule produced the lowest memory accuracy on the final test compared to both retrieval practice schedules. In addition, we compared the differences in the means across the learning schedules in the lowest-performing group (n = 14) to address the possible ceiling effect. Although there was again no significant difference across the learning schedules, F (2, 26) = 0.30, p > .05, η2 = .023, we observed that the differences in means between the learning schedules were larger in this group.
Prediction accuracy. We also investigated whether there was a significant difference in participants’ overconfidence/underconfidence regarding their future memory performance across different learning schedules. First, calibration was estimated by subtracting final memory performance scores from participants’ JOL (prediction) scores. This produced a bias score, with positive values indicating overconfidence and negative values indicating underconfidence. Because the data were collected on different scales (0–10 for predictions, given that a 0–10 scale is used in previous literature, and 0–15 for memory accuracy, given that each melody had 15 notes), both scores were rescaled to range from 0 to 1 in order to compare final memory scores to JOLs (Karpicke, 2009; Toppino & Pagano, 2020). After computing bias scores, a one-way repeated measures ANOVA was conducted to determine whether there was a significant difference across different learning schedules in terms of participants’ confidence in their subsequent memory performance. Since Mauchly’s test showed that the assumption of sphericity had been violated, χ2 = 10.78, p < .01, a Greenhouse-Geisser correction was used. The results showed that there was no significant difference in participants’ confidence across study-only (M = -.06, SD = .26), blocked retrieval practice (M = -.12, SD = .22), and alternating retrieval practice (M = -.09, SD = .18) learning schedules, F (1.49, 40.32) = 0.59, p > .05, η2 = .021. In general, the participants appeared to be underconfident about their future memory performance in all three learning schedules. Thus, the participants’ metacognitive judgments were comparable across conditions regardless of which learning schedule they used. In addition, we included self-report for those who considered themselves professional musicians (n = 15) vs. nonprofessional musicians (n = 13) as a between-subjects factor and performed the analysis.
Results showed that there was no significant main effect of profession self-report in prediction accuracy, F (1, 26) = 0.02, p > .05, η2 < .001. There was also no significant interaction between profession self-report and learning schedule in prediction accuracy, F (1.49, 38.83) = 2.13, p > .05, η2 = .076. This shows that both self-reported professional musicians (M = -.09) and self-reported nonprofessional musicians (M = -.09) appeared to be underconfident across all three learning schedules.
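The bias-score calculation described above can be sketched as follows. This is a minimal illustration, assuming that rescaling means dividing each score by its scale maximum (10 for JOLs, 15 for notes recalled); the function names and the sample values are hypothetical and are not data from the study.

```python
def rescale(score, scale_max):
    """Rescale a raw score to the 0-1 range by dividing by the scale maximum (assumed method)."""
    if not 0 <= score <= scale_max:
        raise ValueError("score must lie between 0 and the scale maximum")
    return score / scale_max


def bias_score(jol, notes_correct, jol_max=10, notes_max=15):
    """Prediction minus performance: positive = overconfident, negative = underconfident."""
    return rescale(jol, jol_max) - rescale(notes_correct, notes_max)


# Hypothetical participant: predicts 7 out of 10, then recalls 12 of 15 notes.
bias = bias_score(jol=7, notes_correct=12)
direction = "underconfident" if bias < 0 else "overconfident" if bias > 0 else "calibrated"
print(f"bias = {bias:.2f} ({direction})")
```

In this hypothetical case the prediction (.70) falls below the performance (.80), so the bias score is negative, which is the direction of the group means reported for all three learning schedules.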
Experiment 2: Results and Discussion
Significance was set at .05 for all statistical tests. The results of the two-way mixed ANOVA showed that there was no significant main effect of learning schedule on memory accuracy, F (2, 94) = 2.84, p > .05, η2 = .057, suggesting that participants in all three learning schedules performed similarly on the memory test regardless of their delay condition. This means that overall, when we ignore whether the participant took the final test after 10 minutes or after two days, the type of learning schedule did not significantly influence their memory accuracy. Numerically, memory accuracy was highest in the alternating retrieval practice condition (M = .56), followed by the blocked retrieval practice condition (M = .48) and the study-study condition (M = .42). However, none of these differences reached statistical significance.
In addition, there was no significant main effect of delay on memory accuracy, F (1, 47) = 1.61, p > .05, η2 = .033, suggesting that participants in both delay conditions had similar memory performance regardless of their learning schedule. This means that overall, when we ignore which learning schedule the participant used to learn the melody, the delay before the final test did not significantly influence their memory accuracy. Numerically, participants had higher memory accuracy after 10 minutes (M = .53) than after two days (M = .44), though the difference between delay conditions was not significant. Finally, there was no significant interaction between learning schedule and delay, F (2, 94) = 1.66, p > .05, η2 = .034. Thus, in contrast to our predictions, a significant benefit of testing did not emerge after a longer delay (two days).
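The effect sizes reported above are numerically consistent with partial eta squared recovered from each F ratio and its degrees of freedom, η2 = (F × df effect) / (F × df effect + df error). The text labels the statistic only as η2, so treating it as partial eta squared is an assumption; the short Python sketch below (with a hypothetical helper name) simply checks the arithmetic.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F ratios and degrees of freedom as reported for Experiment 2.
print(round(partial_eta_squared(2.84, 2, 94), 3))  # 0.057 (learning schedule)
print(round(partial_eta_squared(1.61, 1, 47), 3))  # 0.033 (delay)
print(round(partial_eta_squared(1.66, 2, 94), 3))  # 0.034 (interaction)
```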
Appendix 2: Instructions and Questionnaire
Introduction and Instructions from Experiment 1 (read for all participants)
“Thank you for coming to participate in the piano music memorization experiment. Please take a seat and read through this information sheet. Let me know if you have any questions.”
“In this experiment, I will be recording what you play via MIDI into Logic. I will be recording only what you play, not what you say. There will be no way to identify who you are from the recording.”
“Today you are going to be learning three different pieces. The pieces should be very easy to sight-read. You will get to play each piece multiple times with one hand; some of the time you will have the music in front of you, while other times you won’t, so you will have to try to play the piece from memory. After you play each piece 10 times, you will take a 10-minute break to do a computer task, and then try to play the piece again from memory. Please do not worry if you cannot remember much of the piece—just try your best.”
Main Script
“Now we will start with the first piece.”
==========
Piano Questionnaire
1. If you had to learn similar melodies in the future, which practice schedule would you prefer to use?
- Repeatedly playing from the music
- Alternating playing from the music and playing without the music
- A block of playing from the music, followed by a block of playing without the music
2. Which of the three melodies do you believe you performed best on?
- The one you repeatedly played from the music
- The one you alternatingly played from the music and without the music
- The one you first played from the music 5 times and then without the music 5 times
3. Why do you believe this?
4. How many years of formal piano training have you had? _____
5. Have you had formal training on any other instruments or on voice, and if so, then how many years for each?
6. Would you consider yourself to be a professional musician? YES NO
7. On a scale of 1–5 (1 being the least, 5 being the most), how skilled are you in the following areas?
- Sight-Reading – being able to easily play any written music that someone puts in front of you
1 2 3 4 5
- Memorization – being able to easily play a tune without written music
1 2 3 4 5
- Performance – polishing pieces to a performance level
1 2 3 4 5
Gender ________________ Age ______
Is English your primary language? ______ No ______ Yes
If no, then how many years have you been speaking English? ______
What is your major? __________________
Do you require any special accommodations for exams? _____No ______Yes
If yes, please specify: ___________________________________________________________
Last modified on Monday, 18/04/2022
Paula Telesco (PhD, The Ohio State University) is an Associate Professor of Music Theory and Aural Skills at the University of Massachusetts Lowell. Her research interests are music theory and aural-skills pedagogy, music cognition, the history of music theory, and musical enharmonicism. https://www.uml.edu/FAHSS/music/faculty/telesco-paula.aspx
Meltem Karaca is a PhD student at the University of Massachusetts Lowell. Her research interests are memory, metacognition, and self-perceptions of aging.
Hannah Ewing earned her BA in Psychology from the University of Massachusetts Lowell.
Kelsey Gilbert is a PsyD candidate in clinical psychology at the University of Hartford. She completed her undergraduate education at the University of Massachusetts Lowell where she was a student researcher in cognitive psychology. Her current research interests are eating disorders and weight stigma.
Sarah Lipitz is a doctoral candidate in Biobehavioral Health at Penn State University.
Jude Weinstein-Jones, PhD, is a former cognitive psychologist and co-author of “How We Learn: A Visual Guide.” They now work in community health.