The following presentation is meant as a contribution supporting the view that music theory is an autonomous science. I deal with the methodological preconditions which, in my view, have to be fulfilled by practitioners of such a science. As a suitable testing ground for the alleged autonomy of music theory as a science I consider the problem of verification. By verification I mean the empirical verification of analytical statements about music. Since it is my view that a theory of music is a part of cognitive science, I discuss claims of autonomy with regard to the cognitive validity of music-analytical statements. That is, I am concerned with the question of whether and to what extent analytical statements about music are cognitively valid descriptions of actual listening situations. In positive terms, I am dealing with the nature of a comprehensive analysis of music.

A theory of music qualifies as a part of cognitive science if in its formulations it is taken into account that music comes into existence only through mental processes, and that understanding such processes is crucial for understanding a particular music. Thinking music cognitively means, in plain terms, not to take the existence of music for granted. While a musicologist can take it for granted that something is music, a music theorist cannot do so. He is obliged to take his own analytical processes into account when presenting something as music. As a listener who thinks aloud by talking, the music theorist must account for the mental processes of the human subject who functions as the ideal and authentic listener of a particular music. If the music theorist is not fulfilling this obligation, he will find it impossible to show explicitly that actual listeners indeed hear the music he analyzes as they are thought to do; he cannot even explicitly show that listeners can be taught to hear a music according to his analysis.

In this presentation I am concerned with the causes on account of which present-day music theories, in spite of their often formidable sophistication, lack cognitive validity as descriptions of actual listening situations. The lack of cognitive validity these theories exhibit is for me proportional to the lack of scientific autonomy of the discipline they represent. The lack of scientific autonomy of music theory is strikingly conveyed by the frequent usage that equates music theory with music analysis. The term "music analysis" designates both the performance of a musical task and the result of such a performance as communicated in terms of declarative knowledge. Evidently, a task performance is not a theory (although it certainly embodies one), nor are the results of an analytical process by definition a theory. A theory is a set of explicit hypotheses of which it can be shown that they account for some set of empirical data. It remains to be clarified what music-analytical hypotheses are hypotheses about, and what is the relevant set of empirical data with regard to which they can be verified.

Due to limitations of memory, every human being listening to a particular music performs a music analysis. The analysis renders a theory concerning the features on account of which the music makes sense to the listener. Therefore, every human listener can be said to possess a common-sense or pop-theory of music. It is not evident to me why this pop-theory of music, if it works in actual performances of listening tasks, should be any less substantial than a scientific theory of music. I am rather inclined to think that usually pop-theories of music are far more substantial than scientific theories of music, since there is no more substantial proof of the truthfulness of a theory of music than the fact that it works in actual listening situations. The difficulty of formulating a scientific music theory having the cognitive validity that usually characterizes a pop-theory of music has to do with the real-time demands a music theory has to fulfill in order to be a working theory. My contention is that it would be most valuable for the academic music theorist to understand the structure and implementation of music-analytical pop-theories. I consider pop-theories that are documented by scientific observations of actual listeners as crucial data for verifying scientific music theories.

The difference between pop-theories and scientific theories of music is threefold. First, pop-theories concern an individual, real listener of music while scientific theories concern a generalized, ideal listener. Second, pop-theories usually strive for efficiency with regard to the music concerned, while scientific theories strive for authenticity. Third, pop-theories are behavioral programs assembled for the purpose of understanding a music, while scientific theories of music state explicit assumptions concerning the musical competence a listener is required to possess in order to understand a music. Roughly then, the relationship of a scientific to a pop-theory of music is that of a hypothesis concerning the knowledge of an ideal musical cognizer to a procedure for actually utilizing that knowledge in a concrete listening task. That is, the two theories relate as do hypothesis and verification procedure.

The main deficiency of pop-theories of music is their implicitness, and sometimes their lack of authenticity with regard to the music concerned. The main deficiency of scientific theories of music to the present day has been their merely hypothetical character. I see no justification for considering music theory as an autonomous science as long as its hypotheses do not possess the efficiency of pop-theories of music. The efficiency of a cognitive theory depends on whether the theory can be empirically verified as being a mechanism actually used in some human task performance.

The verification problem here referred to concerns the gulf that separates the ideal and the real listener of music. This problem comes into existence because musical minds in academe set up hypotheses concerning an ideal listener of music and never get down to confronting their hypotheses with data obtainable through an observation of actual listening processes. I consider data documenting such processes as empirical, in contrast to data such as scores which are symbolic data. Symbolic data are either visual (graphic) or verbal and symbolize musical structures held in long-term memory. The use of symbolic data in musical verification procedures is dubious and inconclusive for two reasons. First, symbolic data are interpretive of the music they document and therefore themselves in need of an explication. Second, the use of these data is by necessity self-referential or, in epistemological parlance, subjective. Until recently it has been a valid justification for using symbolic data in music theory that empirical data needed to verify music-analytical hypotheses were not available. I contend that this justification has been eliminated by advances in the cognitive sciences. This fact forces the music theorist to reconsider what it is he is actually doing.

The scientific analysis of some music is a systematic description of those musical features that a music comprehender (known as the analyst) considers as cognitively relevant for comprehending some music. The analyst is not explicating empirical differences in the understanding of music by different listeners but is generalizing on the basis of his own understanding; he is hypothesizing an ideal listener for some music. Hypotheses concerning the ideal listener can be utilized in two essentially different ways. First, one can use such hypotheses as an aid in teaching musical novices how to become musical experts. This is the applied-music interpretation of music-analytical hypotheses, which never goes beyond the domain of symbolic data. This interpretation condemns the theory to being a teaching aid, however sophisticated. (Evidently, one cannot base a scientific theory on a set of teaching aids unless the cognitive conditions of their application in teaching are elaborately specified.) Second, one can utilize the codification of an ideal listener as a hypothesis guiding the formulation of a music-understanding system. A music-understanding system is a comprehensive analysis of an actual listening task with regard to some music.

In the second application, a music analysis represents a step in theory building that is usually called hypothesis formulation. Here the music theorist acts as a cognitive scientist whose task it is to formulate, implement and test an ideal listener for some music in the form of a performance system for understanding music. The software to be defined for such a system is a proto-theory that remains to be tested against empirical data documenting human performances of listening tasks. The stark limitations of conventional music analyses for purposes of the second, strictly scientific, application are easily pointed out.

Conventional analyses of some music only describe the final cognitive state of a music comprehension process, viz. its goal-state. They are goal-state analyses of music. This entails that such analyses are incomplete descriptions of the music-understanding system, or listener, they refer to. In particular, goal-state analyses of music leave the initial state, or general competence, of a listener unexplicated (except if they concentrate on this state as an alleged "musical system" of which a music is considered an instantiation); nor do conventional analyses explicate the set of intermediary knowledge states through which a listener proceeds to the goal-state of understanding a music. Therefore, in terms of cognitive system design, music analyses fail to reach their goal; they define a musical understanding but provide no data or procedures for verifying that understanding. In what follows I outline the problems posed by a comprehensive analysis of music.



I refer to a comprehensive analysis of some music as a process model. The model is a description of musical structures as activated by the mental processes of some actual listener. A process model describes a musical situation in which the music under discussion is actually used, or experienced. The model presupposes three complementary analyses of some music, viz. an initial-state, an intermediary-state, and a goal-state, analysis. The states here referred to are knowledge states of a music-understanding system called a listener. Regarding the music he deals with, the listener needs a certain knowledge base upon which to operate during listening. To define this knowledge-base is a task pursued in the initial-state analysis of the music. The analysis explicitly states the kinds of knowledge the listener must have recourse to in order to understand the music. Given a defined initial state, it must be shown how the knowledge base attributed to the listener, i.e. his competence, is actually used during his performance of the listening task. To describe this process is the purpose of the intermediary-state analysis. This analysis explicates the structure of actual listening performances. Listening performances can only be described if their goal-state is at least hypothetically known; if this condition is fulfilled one can postulate a mechanism that is sufficient for reaching the goal-state from the initial state. Goal-state descriptions are provided by most conventional music analyses. The notion of an initial-state analysis of music has been pioneered by the Princeton and Yale Schools in the form of a description of so-called "musical systems." It remains to be clarified what is the purpose and format of an intermediary-state analysis of music. Such an analysis shows how the initial and the goal-state of the music-understanding process are linked to each other. 
It is to be clarified, then, in what sense the initial-state analysis serves as an input to, and the goal-state analysis as an output of, the intermediary-state analysis.

From the vantage point of an intermediary-state analysis of music, the ideal listener stipulated by a goal-state analysis is a procedure for musical understanding that has been defined outside of the context of a well-defined, real-time task. The codification of an ideal listener claims to be true regardless of the real-time constraints and the memory limitations characterizing actual listeners. The hallmark of an ideal listener is authenticity, not efficiency. However, knowledge elements and procedures ascribed to an ideal listener that are provably beyond the performance capabilities of real listeners belong to a world of make-believe. I take the view that no scientific purpose is served by proclaiming in an analysis what listeners should hear; such proclamations belong to the domain of music education and should be clearly designated as such. Music analyses claiming scientific status can only be concerned with what listeners actually hear, or with what listeners can be shown to be able to learn to hear in a music. (In the latter case, the analysis must include the explication of the cognitive processes required for comprehending what the analysis stipulates can be learned.)

The first step in putting an ideal listener to the test is to conceptualize the components of a music-understanding system that is engaged in a specific task. A listener attending a concert, for example, has a certain set of knowledge elements at his disposal. I would group these elements into three distinct components, entitled the Knowledge System, K, the Understand System, U, and the Performance System, P, as shown in Figure 1.


Figure 1. Model of an ideal listener.



The Knowledge System incorporates the knowledge base the listener draws upon during his performance; it represents the listener's general competence, that is, his task-independent musical knowledge. The knowledge represented in the K-System is two-fold. It comprises music-structural knowledge in declarative form, on the one hand, and largely unverbalizable procedural knowledge concerning the use of music-structural information, on the other. (To know what is a major scale is to possess declarative musical knowledge, while to be able to play a major scale on the piano requires procedural musical knowledge.) Music analyses only deal with the first of the two kinds of knowledge, viz. declarative knowledge.

The listener's Understand System receives input from the Knowledge System and outputs, at performance time, a behavioral program that the Performance System can execute. In order to output an efficient performance program for some task, the Understand System must bring about a transformation of the general knowledge in the Knowledge System into the specific knowledge elements required by the Performance System. Verification problems in music theory concern the question of whether the knowledge base stipulated in some goal-state analysis is sufficient for an actual listener's understanding of the music concerned. A set of knowledge elements is sufficient for a listening task if, first, it comprises adequate knowledge sources for comprehending the music, and second, if the use of the knowledge can be shown to be within the cognitive (real-time) capabilities of an actual listener. In order to show that a given description of an ideal listener has cognitive validity, one needs to formulate and test a model of an actual listener.
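The two-part sufficiency criterion stated above can be illustrated with a small sketch. Everything in it is an assumption introduced for illustration only: the names `KnowledgeElement`, `understand` and `is_sufficient`, the numeric `cost_ms` values, and the idea of expressing real-time capability as a single time budget are hypothetical and are not part of the model as described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnowledgeElement:
    """One task-independent item in the Knowledge System (hypothetical encoding)."""
    name: str
    cost_ms: int  # assumed time needed to apply this element while listening


def understand(knowledge, required):
    """Understand System: select from general knowledge what a specific task needs.

    Returns the task-specific program handed to the Performance System,
    plus the names of required elements that the Knowledge System lacks.
    """
    by_name = {k.name: k for k in knowledge}
    program = [by_name[r] for r in required if r in by_name]
    missing = [r for r in required if r not in by_name]
    return program, missing


def is_sufficient(knowledge, required, budget_ms):
    """Check both conditions: adequate knowledge sources and real-time feasibility."""
    program, missing = understand(knowledge, required)
    adequate = not missing
    feasible = sum(k.cost_ms for k in program) <= budget_ms
    return adequate and feasible


K = [KnowledgeElement("major_scale", 40), KnowledgeElement("cadence", 120)]
print(is_sufficient(K, ["major_scale", "cadence"], budget_ms=250))  # True
print(is_sufficient(K, ["major_scale", "cadence"], budget_ms=100))  # False: too slow
print(is_sufficient(K, ["twelve_tone_row"], budget_ms=250))         # False: knowledge missing
```

The second and third calls fail for different reasons, which mirrors the distinction between inadequate knowledge sources and knowledge whose use exceeds a listener's real-time capabilities.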

Following precedents in cognitive psychology, I suggest that in order to comprehend the behavior of an actual listener, one needs to perform an empirical analysis of the memory system the listener activates during listening. The following is a model of an actual listener as a distributed memory system.


Figure 2. The empirical listener as a distributed memory system (linked to the schema representing an ideal listener).



The diagram implies that the listener's musical memory is distributed among several submemories, each of which has a different lifetime. The buffers composing the listener's total memory form a chain along which successively longer time-periods for storing musical information become available. The chain leads from a time constant of a quarter-second for echoic (preperceptual) memory (EM) via perceptual (PM), short-term (STM) and contextual memory (CM) to long-term memory (LTM) in which musical information is stored indefinitely. Both echoic and perceptual memory store musical information in the form of perceptual traces, while short-term memory is "syntactic," storing information in labeled chunks, that is, categorically. Long-term memory stores both perceptually encoded (PM) and categorized information (STM); it has an active portion called contextual memory (CM) in which a model of the current musical world, or context, is stored and continually updated.

The actual working memory of the listener (WM) comprises items of information from the perceptual, short-term, and contextual memories. The procedural relevance of working memory lies in the fact that the executive monitor of the listener (CPU) has direct access to this memory. This entails that the listener's actions are dependent, in terms of memory, upon the transitory contents available at any time in working memory (WM).
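The memory chain and the composition of working memory can be written down as a simple data structure. This is a minimal sketch under stated assumptions: the `Buffer` class, the `working_memory` function and the example items are hypothetical. Only the quarter-second lifetime of echoic memory and the indefinite lifetime of long-term memory are taken from the text; the other lifetimes are left unspecified (`None`) rather than guessed.

```python
import math
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Buffer:
    name: str
    encoding: str                 # how information is stored, as described in the text
    lifetime_s: Optional[float]   # None = lifetime not specified in the text
    items: list = field(default_factory=list)


# The chain from echoic memory to long-term memory, in order.
CHAIN = [
    Buffer("EM", "perceptual trace", 0.25),
    Buffer("PM", "perceptual trace", None),
    Buffer("STM", "labeled chunk", None),
    Buffer("CM", "model of current context", None),
    Buffer("LTM", "trace and chunk", math.inf),
]


def working_memory(chain):
    """WM comprises items currently held in perceptual, short-term and contextual memory."""
    return [item for b in chain if b.name in {"PM", "STM", "CM"} for item in b.items]


memory = {b.name: b for b in CHAIN}
memory["EM"].items.append("raw onset")                 # illustrative contents only
memory["PM"].items.append("pitch contour trace")
memory["STM"].items.append("chunk: rising third")
memory["CM"].items.append("context: tonic established")
memory["LTM"].items.append("primitive: major scale")

wm = working_memory(CHAIN)
print(wm)
```

In this sketch the contents of echoic and long-term memory do not appear in `wm`, reflecting the claim that the executive monitor has direct access only to working memory.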

The model of an actual listener stated above can be linked to the model of an ideal listener by an observation concerning the sources from which the Knowledge, Understand, and Performance Systems receive input. The K-System takes its input from long-term memory, while the P-System takes its input from working memory. The Understand System, which receives input from long-term memory, transforms this input into contents of working memory which the Performance System can access via short-term memory.

I can now be more specific about the nature and function of an intermediary-state analysis (or, as it is also called, an empirical task analysis) of music. Such an analysis is an explication of an actual listening experience regarding some music in terms of the distributed memory system of an actual listener. The analysis shows how the listener has used his Knowledge System in order to reach some goal-state of comprehension. To document the intermediary steps of comprehension taken by the listener one proceeds in two phases. First, one formulates a conceptual analysis of the listening task in question and proposes a mechanism (if possible, a programmable one) that is sufficient for carrying out the task. Second, one verifies the sufficiency hypothesis formulated during the first phase against empirical data documenting task performances of actual listeners. Empirical data are usually obtained in the form of problem-solving protocols. Protocols document the responses and actions of a human subject who expresses himself graphically, verbally, or by manipulating some sounding repertory. Protocols are analyzed in terms of the memory processes their responses and actions presuppose, and such an analysis can be given in terms of a programming language of so-called production systems. As I cannot go into the details of protocol analysis here, I have to refer you to the appropriate sections of my Music, Memory, and Thought¹ in which the technique is described in some detail.
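For readers unfamiliar with production systems, the following is a minimal sketch of the general technique: condition-action rules fire against the contents of working memory, and the sequence of resulting states is recorded. It is not the notation used in Music, Memory, and Thought. The function `run_productions`, the rule format and the three example rules are assumptions made purely for illustration.

```python
def run_productions(rules, working_memory, max_cycles=10):
    """Fire the first rule whose condition holds and whose result is new; repeat.

    Each rule is a pair (condition, addition): if every fact in `condition`
    is in working memory, `addition` is deposited there.
    Returns the trace of successive working-memory states.
    """
    wm = set(working_memory)
    trace = [frozenset(wm)]
    for _ in range(max_cycles):
        for condition, addition in rules:
            if condition <= wm and addition not in wm:
                wm.add(addition)
                trace.append(frozenset(wm))
                break
        else:
            break  # no rule fired: the system has come to rest
    return trace


# Hypothetical rules, invented for this example.
RULES = [
    ({"heard C", "heard E"}, "chunk: major third"),
    ({"chunk: major third", "heard G"}, "chunk: C major triad"),
    ({"chunk: C major triad"}, "context: C as tonic"),
]

trace = run_productions(RULES, {"heard C", "heard E", "heard G"})
for step, state in enumerate(trace):
    print(step, sorted(state))
```

The trace lists one working-memory state per rule firing, which is the kind of step-by-step record against which a problem-solving protocol could be compared.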

Assume now that the theorist wishes to formulate an intermediary-state description of the listening process required for comprehending some music. His problem is then to find a way of formulating the analysis of the music concerned in a format that is compatible with the format in which listening processes are explicated. The solution to this problem requires that a meta-language be found in which to formulate a goal-state analysis of the music. The goal-state analysis stipulates the state of comprehension that the empirical listening process is attempting to reach. Clearly, in formulating this analysis one must avoid attributing to the ideal listener cognitive structures and procedures that are provably beyond the memory capacity and procedural know-how of actual listeners.

I suggest that a meta-language for formulating musical goal-state analyses in the context of an empirical task analysis of listening is available. As Joseph Kunst has shown in a recent article in Interface,² musical goal-states can be formulated in the form of networks using the operators of modal logic. To make use of Kunst's network language, one must take into consideration that musical goal-states (states of comprehension) are sets of transitory knowledge states that have been built up in contextual memory (CM) and deposited in long-term memory (LTM). Musical long-term memory is an agency in which the past of a music is accumulated by a listener. A music makes sense to a listener if it behaves as it always did in the past.³ A music's past has two aspects. It is, first, the long-term past of the tradition of which the music forms an integral part; it is, second, the immediate or current past of the music as it accumulates during the act of listening. The first aspect of the musical past concerns the competence a listener is required to have for understanding a music. The current past of a music concerns a model of the current musical world, or context, which at any moment determines the listener's interpretation of the music. The music's long-term past is stored in long-term memory in the form of syntactic and semantic primitives. The music's current past is stored in contextual memory, viz. in terms of conceptual structures defining a current model of the listener's musical world. As I have shown in Music, Memory, and Thought,⁴ following J. Kunst, a model of the current musical world of a listener can be formulated in terms of music-semantic networks. Space restrictions prevent me from discussing such networks in any detail.

The crucial point here to be made is that a goal-state analysis, when formulated in the format of a music-semantic network, can be embedded in an intermediary-state analysis of music. The network is an accumulative set of partial goal-states which yield a definitive state of comprehension. As an hypothesis, the network represents the set of all musical interpretations an ideal listener can possibly generate on his way to final comprehension (given some defined initial state). A network hypothesis formulated for an ideal listener can be said to possess cognitive validity only if it can be empirically verified, or at least falsified, by scrutinizing data obtained from actual listeners.



I briefly summarize my conclusions concerning a process model of musical structures, referring throughout to the diagram of Figure 3. The diagram shows the three kinds of analysis required for formulating a process model for music; each of these analyses emphatically relates to a particular part of memory.


Figure 3. Process model of musical structures.



The initial-state analysis stipulates the syntactic and semantic primitives required for generating musical interpretations during listening; ultimately, the analysis is concerned with that part of long-term memory which might be called semantic memory. The goal-state analysis is a codification of transitory knowledge states that have accrued in contextual memory (CM) and have been deposited in long-term memory (LTM). The analysis describes the epistemic structures supposedly comprehended by the (ideal) listener. The intermediary-state analysis explicates the listener's operations upon network structures held in contextual memory. The analysis is formulated in terms of an explicit description of successive states of working memory (WM), stated in terms of production systems. The process model is synthesized from all three analyses, most directly from the goal-state and the intermediary-state analysis. (The initial-state analysis is a precondition of both of these analyses.) It is stated in the model what production programs have been executed by the listener in order to establish, maintain, and update contextual memory (network memory).
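How the three analyses combine into one process model can be sketched as a record holding all three, together with a check that the observed listening process arrives at the hypothesized goal-state. This is only an illustration under stated assumptions: the `ProcessModel` class, its field names, the representation of knowledge states as sets of strings, and the example contents are hypothetical and do not reproduce the network or production-system formats referred to in the text.

```python
from dataclasses import dataclass


@dataclass
class ProcessModel:
    initial_state: frozenset  # primitives assumed in long-term (semantic) memory
    goal_state: frozenset     # knowledge states hypothesized for the ideal listener
    wm_trace: list            # successive working-memory states of an actual listener

    def reaches_goal(self):
        """True if the last observed state contains every hypothesized goal element."""
        return bool(self.wm_trace) and self.goal_state <= self.wm_trace[-1]

    def unmet_goals(self):
        """Goal elements that the observed listening process never established."""
        last = self.wm_trace[-1] if self.wm_trace else frozenset()
        return sorted(self.goal_state - last)


# Illustrative contents only.
model = ProcessModel(
    initial_state=frozenset({"interval", "triad", "tonic"}),
    goal_state=frozenset({"C major triad", "C as tonic"}),
    wm_trace=[
        frozenset({"C", "E", "G"}),
        frozenset({"C", "E", "G", "C major triad"}),
        frozenset({"C", "E", "G", "C major triad", "C as tonic"}),
    ],
)
print(model.reaches_goal())   # True
print(model.unmet_goals())    # []

# A listener whose protocol stops one step earlier does not verify the hypothesis.
partial = ProcessModel(model.initial_state, model.goal_state, model.wm_trace[:2])
print(partial.reaches_goal())  # False
print(partial.unmet_goals())   # ['C as tonic']
```

In this sketch the initial state is carried along as documentation of the assumed competence; a fuller model would also have to show that each step in the trace draws only on that competence.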

The process model is a procedural description of a listener as a music-understanding system for some specific music. The model embodies an hypothesis as to what a listener can (ideally) comprehend of a music and also shows how he actually goes about comprehending the music. In demonstrating how comprehension takes place in terms of a distributed memory system, the model provides a verification of the ideal listener hypothesized in the goal-state analysis. Being a joint description of a listener and his object of comprehension, the process model can be said to realize a comprehensive music analysis.

¹ Otto Laske, Music, Memory, and Thought: Explorations in Cognitive Musicology (Ann Arbor: University Microfilms, 1977), chapters 3, 4, and 10.

² Joseph Kunst, "Making Sense in Music: The Use of Mathematical Logic," Interface (Amsterdam: Swets & Zeitlinger), V, 2, 1976.

³ Kunst, p. 9.

⁴ Laske, chapters 4 and 5.
