Graduate Versus Undergraduate Experimental Scholarship: What Should Be Our Expectations?

October 1, 1982

Graduate students in almost every discipline are required to do independent research within their respective fields. Undergraduates, however, are generally deemed incapable of producing consequential experimental research or scholarly papers on their own. Sometimes undergraduates who are exceptionally gifted write an "honors paper" or give evidence of unusual academic prowess in other endeavors. Yet it is still considered uncommon for the general undergraduate student to become actively involved in producing scholarship judged suitable for publication or presentation at professional meetings.

Music students studying to become Registered Music Therapists at both the undergraduate and equivalency (graduate) level represent a unique and interesting population for direct comparisons concerning experimental scholarship. The basic core course requirements for subsequent national registration include a good deal of independent library and research work, generally subsumed in a series of courses taken by both undergraduate and graduate students; this common sequence provides the basis for direct comparisons of scholarship. This study was designed to determine if undergraduates are indeed capable of producing consequential scholarly products comparable to those completed by graduates.

Investigations within this area are limited, although results of a project by Farmer revealed "that undergraduate students can assume responsibility for identifying and solving problems and that they can formulate hypotheses, obtain data, organize data, apply statistical tests, and present appropriate interpretations and implications."1 Michel and Madsen reported several case studies by undergraduates that contained rather sophisticated research techniques.2 Also, in a study that compared graduate and undergraduate research reports, a panel of experts judged no significant difference between the two groups (Hanser and Madsen).3


Method

Graduate and undergraduate students enrolled in the last course of the Psychology of Music sequence at the Florida State University served as subjects (graduate n = 22; undergraduate n = 23). Four work products were selected for comparison: (1) a test covering concepts and terminology necessary to understand the experimental literature; (2) an individual change project documented and written in research style; (3) a scholarly book review of the text used in the course; and (4) a final research project.

The test covering concepts and terminology had been developed over the past several years and was judged to be both valid and reliable (KR-20 = .887). The test was scored on the following criteria: 1 point for each of 20 true/false questions; 2 points for each of 10 multiple choice questions; 2 points for each of 10 fill-in questions; and 40 points for an essay question based upon the use of specific behavioral definitions, criteria for reaching a terminal objective, at least five sequential shaping steps to teach the selected objective, and the inclusion of an appropriate contingency contract in relationship to the objective. Allowances were made for multiple answers.

The individual change project was scored on the following criteria: 20 points for specific analysis of the behavior to be changed (Pinpoint) on the basis of subject description, history of the problem, behavioral description of the problem, and the desired direction of change; 20 points for precise record keeping (Record) on the basis of the unit of behavior being recorded, the unit of time of observation, the precise observation procedure, and computation of reliability; 20 points for the specific strategy for change (Consequate) on the basis of precise descriptions concerning what the subject must do to effect any change (e.g. to earn rewards or punishers); and 40 points for an evaluation of subsequent change (Evaluate) on the basis of correct set-up of graph, number of specified days strategy was implemented, evidence of behavioral change, appropriate use of consequences, and use of any additional explanations if applicable.

The scholarly book review covering the class text was scored on the following criteria: (1) content follows standards for published reviews: material was subjectively graded and assigned scores of 50 points (below average), 60 points (average), 70 points (good), or 75 points (excellent), for a content maximum of 75% of the total score; and (2) mechanical procedures follow standards for published reviews. Two typographical errors were allowed; one point for each remaining error (typographical, punctuative, or grammatical) was deducted from the mechanical score. Mechanical procedures accounted for 25% of the total score.
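The book review rubric above can be sketched as a small scoring function. This is a hypothetical illustration only; the function name, the rating labels used as inputs, and the example values are assumptions, not part of the original study.

```python
# Hypothetical sketch of the book-review rubric described above.
# Content: 50 (below average), 60 (average), 70 (good), or 75 (excellent)
# points, a maximum of 75% of the total. Mechanics: 25 points, with two
# typographical errors allowed and one point deducted per further error.

CONTENT_POINTS = {"below average": 50, "average": 60, "good": 70, "excellent": 75}

def review_score(content_rating, error_count):
    content = CONTENT_POINTS[content_rating]
    deductions = max(0, error_count - 2)   # the first two errors are free
    mechanics = max(0, 25 - deductions)    # mechanics is 25% of the total
    return content + mechanics

print(review_score("good", 5))  # 70 content + 22 mechanics = 92
```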

Reliability was computed on each aspect for consistency of evaluating procedures. Reliability concerning the first three measures indicated: (1) examination scores, .98; (2) individual change project, .91; and (3) book review, .87 (agreements/(agreements + disagreements)).
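The reliability figures above follow the simple agreement formula, agreements/(agreements + disagreements). A minimal sketch follows; the counts shown are illustrative, not the study's raw tallies.

```python
def agreement_reliability(agreements, disagreements):
    """Interobserver reliability as agreements / (agreements + disagreements)."""
    return agreements / (agreements + disagreements)

# Illustrative counts only: 98 agreements and 2 disagreements yield .98,
# the reliability reported for the examination scores.
print(round(agreement_reliability(98, 2), 2))  # 0.98
```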

The research papers were scored by two experts on the basis of criteria specified by the editorial board for publication in the Journal of Music Therapy. These experts (both presently serving on editorial boards) independently rated the research papers on the basis of specified criteria including grammar, organization, clarity, research design, data analysis, and style. Ratings were then assigned: (1) publish (quality suitable for scholarly publication or presentation at a professional meeting), (2) publish (or present) with modifications, and (3) reject (quality not suitable for publication). Raters did not know that papers were submitted by students. These experts were asked "to evaluate research which had been submitted for publication as a reliability check for the editorial board."


Results

Each subject received an individual score on the basis of specific criteria for each of the first three projects (i.e., the examination covering concepts and terminology, the individual change report, and the scholarly book review). The final research product was judged as (1) publish, (2) publish with modifications, or (3) reject, do not publish. Undergraduate subject scores over the first three aspects were compared to the graduate subject scores. Additionally, two independent t tests, the Mann-Whitney U test, and χ² for independent samples were computed for statistical comparisons.
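The two main comparisons named above, the independent t test and the Mann-Whitney U test, can be sketched with SciPy. The score lists below are invented placeholders for illustration; they are not the study's data, and the variable names are assumptions.

```python
# Sketch of the statistical comparisons named above, using SciPy.
# The score lists are invented placeholders, not the study's data.
from scipy import stats

undergrad = [68, 72, 65, 70, 66, 71, 69, 64, 73, 67]
graduate  = [84, 80, 86, 82, 85, 88, 81, 83, 87, 79]

t, p_t = stats.ttest_ind(undergrad, graduate)        # independent t test
u, p_u = stats.mannwhitneyu(undergrad, graduate,
                            alternative="two-sided")  # Mann-Whitney U test

print(f"t = {t:.2f}, p = {p_t:.4f}")
print(f"U = {u:.1f}, p = {p_u:.4f}")
```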

Data indicate that on the examination covering concepts and terminology the undergraduates were indeed significantly different from the graduates. The undergraduate average was x̄ = 68.30; the graduate average was x̄ = 84.50 (t = 5.26, df = 43, p < .01). Analysis of scores on the individual change project did not evidence significant differences: undergraduates x̄ = 96.09; graduates x̄ = 93.86 (t = 1.3, df = 43, p > .05). Also, analysis of the book review scores did not indicate significant differences (undergraduates x̄ = 86.22; graduates x̄ = 87.59; U = 228.5, p > .05).

An analysis of the final research reports also indicated that there were no significant differences (undergraduates x̄ = 2.478; graduates x̄ = 2.522; χ² = .12). It seems consequential that seven of the undergraduate papers and eight of the graduate papers were judged as suitable for publication (although approximately half with modifications). Nevertheless, since the scoring criteria for these papers were so stringent, it would appear that the quality level was quite high, especially considering that seven of the 23 undergraduate papers were so judged. Reliability for judges of the final research projects was .80 (agreements/(agreements + disagreements)).
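A 2 × 2 χ² test on the publishable counts reported above (seven of 23 undergraduate papers, eight of 22 graduate papers) can be sketched as follows. This is a sketch under the assumption that the remaining papers were judged not publishable; it does not attempt to reproduce the exact χ² reported, which was computed on the judges' ratings.

```python
# 2x2 chi-square sketch on the publish / not-publish counts reported above.
# Rows: undergraduates (7 publishable of 23), graduates (8 of 22).
from scipy.stats import chi2_contingency

table = [[7, 23 - 7],
         [8, 22 - 8]]

chi2, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}, dof = {dof}")
```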


Discussion

It is difficult to control experimental variables, especially in an educational setting. In this study, a number of steps were taken to reduce the influence of extraneous variables. It is assumed that the motivation of the students could have differed in untested ways perhaps related to this project. However, it should be remembered that the undergraduates took the same course as the graduates (in different sections during the same academic term); thus the undergraduates had a structured class environment very similar to that of the graduates. As regards controlling for subject history and motivation, when the large difference between the two groups in age and experience is considered, other small disparities seem inconsequential.

This study attempted to isolate the research prowess of undergraduates as compared to graduates while controlling traditional contingencies (i.e., grades). Undergraduates followed the same point system for grades as did the graduates. It should be noted that all students completed these four projects during their structured class but were not given specific "feedback" concerning how they could improve. Thus the two groups constituted comparable units.

The significant difference between the two groups on the examination covering concepts and terminology seems interesting, especially in relationship to the other dependent measures. This test had been developed over a period of several years and represented a very traditional measure of student achievement; that is, it was judged to be both valid and reliable and served to discriminate among students. The other measures used in this study were criterion referenced in that very specific directions were given for their execution and completion. Perhaps traditional differentiations attributed to undergraduates and graduates depend more on "norm referenced" measures as opposed to "criterion referenced" tasks. This study was not designed to test this issue, and much more research seems needed to test this supposition.

It also seems obvious that the similarities between graduates and undergraduates might reflect not the superior acumen of the undergraduates but a lack of scholarship among the graduates. However, analysis of the final projects would appear to suggest that both groups produced work of very high quality. Even if this is the case, much more research seems warranted. Since the judges rating the final research reports were told that the papers had been submitted for publication, perhaps the research was rated differently than if the judges had been told that the papers were produced by students (i.e., a halo effect). This aspect also needs more research.

It is assumed that many contingencies (uncontrolled variables) were operating within this study. It is also assumed that these contingencies may have had something to do with explaining why graduate research efforts were not significantly better than undergraduate products. While much more research needs to be done in this area, the foremost question would appear to be: What measures can be taken to encourage both undergraduates and graduates in music to gain as much independent scholarship and research prowess as possible? Indeed, what should be our expectations?

This study was sponsored by the Center for Music Research, School of Music, The Florida State University.

1G.C. Farmer, "Research Experiences and Methods Courses," Improving College and University Teaching, Vol. 16 (Spring 1968), p. 148.

2D.E. Michel and C.K. Madsen, "Examples of Research in Music Therapy as a Function of Undergraduate Education," Journal of Music Therapy, Vol. 6, No. 1 (1969), p. 22.

3S.B. Hanser and C.K. Madsen, "Comparisons of Graduate and Undergraduate Research in Music Therapy," Journal of Music Therapy, Vol. 9, No. 2 (1972), pp. 88-93.
