When you listen to a male speaking voice reproduced by your hi-fi stereo set with the bass range well enhanced, you will hear the lowest Fourier component of the uttered vowels, i.e., the so-called fundamental, rising and falling in pitch, just as if it were a boosted bass tone in music. It indeed is a bass tone, as the oscillation frequency of a typical adult male speaking voice ranges from about 70 to 150 Hz. This is one of the rare cases in which you may consciously perceive the fundamental of a male speaking voice, and its pitch. Especially for the reproduction of speech it is neither necessary nor desirable to boost the low-frequency spectral components, because intelligibility suffers from it, and the voice pitch (intonation) is perceived very well without acoustic reproduction of the fundamental. This is why, e.g., the telephone channel does not distort the pitch of speech, although transmission is confined to the frequency range from about 300 to about 3400 Hz. When, in the above "experiment", you suppress bass reproduction using the equalizer of your stereo set, you will notice that the fundamental becomes inaudible, while the speaker's voice pitch continues to be reproduced well.
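The telephone example can be made concrete with a little arithmetic. The sketch below, assuming an idealized passband with hard edges at 300 and 3400 Hz and illustrative fundamentals of 100 and 120 Hz (values chosen from the 70 to 150 Hz range, not taken from any measurement), lists which harmonics of a voice survive the channel. The function name `harmonics_in_band` is hypothetical.

```python
import math


def harmonics_in_band(f0, low=300.0, high=3400.0):
    """Return the harmonic numbers n for which n * f0 lies within [low, high] Hz.

    The band edges are an idealized assumption; real channels roll off gradually.
    """
    first = max(1, math.ceil(low / f0))  # lowest harmonic at or above the lower edge
    last = math.floor(high / f0)         # highest harmonic at or below the upper edge
    return list(range(first, last + 1))


for f0 in (100.0, 120.0):  # illustrative male-voice fundamentals in Hz
    h = harmonics_in_band(f0)
    print(f"f0 = {f0:.0f} Hz: harmonics {h[0]}..{h[-1]} pass ({len(h)} components)")
```

For a 100-Hz voice the fundamental and second harmonic are lost, yet harmonics 3 through 34 remain, which is ample spectral information from which the voice pitch can be derived.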
The pitch of the fundamental that you may hear, provided the fundamental is strong enough, is the pitch of a sine tone; it is of the spectral-pitch type. The pitch that you ordinarily hear, however, does not depend on the fundamental being audible; the auditory system extracts it from a range of the Fourier spectrum that lies above the fundamental. This latter type of pitch is termed virtual pitch [14], [17].
From the above example it can be concluded that, as an attribute of auditory sensation, virtual pitch is fundamentally different in type from spectral pitch. This conclusion is strongly suggested by the fact that one can hear both types of pitch at the same time, with the same height. Evidently, one and the same pitch (in terms of pitch height) can be communicated through two drastically different perceptual "channels": spectral pitch is communicated directly, i.e., by the frequency of a Fourier component, while virtual pitch is communicated by providing the auditory system with information about the oscillation frequency of a complex signal, information that is implicit in the Fourier spectrum as a whole.
Moreover, the above example illustrates that in real life the perception of virtual pitch is far from being an exotic, rare phenomenon, let alone a kind of illusion. On the contrary, perception of virtual pitch is the rule rather than the exception.
In the acoustics laboratory, the above example can be mirrored by the following demonstration. A harmonic complex tone is generated that includes only, e.g., the 5th, 6th, 7th, and 8th harmonics of 200 Hz, i.e., components at 1000, 1200, 1400, and 1600 Hz. (Schouten (1940a) termed this type of sound a "residue".) That sound, especially when presented at a low intensity, elicits a virtual pitch corresponding to 200 Hz. Now, while listening to that "residue", switch on a second sound, namely a 200-Hz sine tone. The latter is heard as a separate sound object, i.e., it can be distinguished from the "residue". Each of the two objects has its own pitch, independent of the other. Yet the two pitches are equal. One hears two different types of pitch (virtual and spectral) at the same time, and it does not matter that they are equal in height.
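The following sketch synthesizes such a residue and checks two of its properties: the spectrum contains no energy at 200 Hz, yet the waveform repeats every 5 ms, i.e., at the 200-Hz rate. The sample rate of 8000 Hz, the equal component amplitudes, the zero phases, and the 0.1-s duration are assumptions made for illustration; the helper names `make_residue` and `dft_magnitude` are hypothetical.

```python
import math

FS = 8000  # sample rate in Hz (an assumption made for this sketch)


def make_residue(f0=200.0, harmonics=(5, 6, 7, 8), duration=0.1, fs=FS):
    """Sum of equal-amplitude cosine harmonics of f0; the fundamental itself is absent."""
    n_samples = round(duration * fs)
    return [sum(math.cos(2 * math.pi * h * f0 * n / fs) for h in harmonics)
            for n in range(n_samples)]


def dft_magnitude(x, freq, fs=FS):
    """Magnitude of the length-normalized DFT of x evaluated at a single frequency (Hz)."""
    re = sum(v * math.cos(2 * math.pi * freq * n / fs) for n, v in enumerate(x))
    im = -sum(v * math.sin(2 * math.pi * freq * n / fs) for n, v in enumerate(x))
    return math.hypot(re, im) / len(x)


x = make_residue()
period = round(FS / 200.0)  # 40 samples = 5 ms
print(f"DFT magnitude at 200 Hz:  {dft_magnitude(x, 200.0):.6f}")
print(f"DFT magnitude at 1000 Hz: {dft_magnitude(x, 1000.0):.6f}")
print("repeats every 5 ms:", all(abs(x[n + period] - x[n]) < 1e-9
                                 for n in range(len(x) - period)))
```

Because the 0.1-s window holds a whole number of cycles of every component, the 200-Hz value comes out numerically zero and the 1000-Hz value is 0.5 (half the unit cosine amplitude), while the periodicity check succeeds.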
While a considerable number of observations make it evident that spectral pitch must be explained by a kind of "place theory" (see topics diplacusis binauralis and spectral pitch), it is likewise evident that virtual pitch requires another explanation. So spectral pitch and virtual pitch differ not only in the phenomenological aspects described above; they must also be explained by basically different mechanisms. Since the basic outline of an explanation of spectral pitch is apparent, possible mechanisms of virtual-pitch formation remain to be discussed, as follows.
An explanation of virtual pitch that at first sight seems highly plausible is the "time-domain solution", i.e., measurement of the period of the tone signal. (This, of course, requires that the signal actually be periodic.) That kind of solution was suggested as early as the 19th century, in particular by Seebeck (1841a). Schouten (1940c, 1970a) strongly promoted it. However, new observations made in the 1950s and 1960s had already indicated that the time-domain model in its original design was not adequate.
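To show what "measuring the period" means computationally, here is a minimal sketch of one common way to do it, autocorrelation. This is a swapped-in modern technique, not Seebeck's or Schouten's own procedure, and it illustrates the idea only; the arguments against time-domain models still apply. The search range of 50 to 400 Hz, the sample rate, and the names `make_residue` and `autocorrelation_period_pitch` are assumptions of the sketch.

```python
import math


def make_residue(f0=200.0, harmonics=(5, 6, 7, 8), duration=0.1, fs=8000):
    """Residue tone: equal-amplitude cosine harmonics of f0 without the fundamental."""
    n_samples = round(duration * fs)
    return [sum(math.cos(2 * math.pi * h * f0 * n / fs) for h in harmonics)
            for n in range(n_samples)]


def autocorrelation_period_pitch(x, fs, fmin=50.0, fmax=400.0):
    """Time-domain pitch estimate: fs divided by the lag of maximum autocorrelation.

    The unnormalized (biased) sum covers fewer samples as the lag grows, so for an
    exactly periodic signal the peaks at multiples of the period decrease, and the
    shortest of them, the period itself, wins.
    """
    min_lag = int(fs / fmax)
    max_lag = int(fs / fmin)

    def r(lag):
        return sum(x[n] * x[n + lag] for n in range(len(x) - lag))

    best_lag = max(range(min_lag, max_lag + 1), key=r)
    return fs / best_lag


# The period (40 samples, 5 ms) is found although no 200-Hz component is present.
print(autocorrelation_period_pitch(make_residue(), fs=8000))  # 200.0
```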
Indeed, there are a number of obstacles to time-domain modeling of virtual pitch, such as the following.
In spite of these difficulties with explaining virtual pitch in the time domain, a considerable number of attempts of that type have been described in the literature up to the present day. However, I am not aware of any satisfactory solution.
Returning to the notion that spectral pitch and virtual pitch require two separate explanations: so far, the question of how separate these two mechanisms are has been left open. The time-domain model of virtual pitch discussed above, if it actually worked, would be totally separate from, i.e., independent of, the spectral-pitch mechanism; the two models would operate "in parallel". There is, however, the alternative that they operate "in series", more precisely, in hierarchical steps of processing. This is indeed how the virtual-pitch theory works.
The virtual-pitch theory [14], [17], [18], which I worked out in 1969/70, explicitly and strictly pursues the latter kind of approach. Virtual pitch is conceptualized as a percept of "higher order" than spectral pitch [10]. The relationship between virtual pitch and spectral pitch is conceptualized as analogous to the relationship between virtual and primary visual contours. Just as visual virtual contours (or, as they have unfortunately also been termed, "illusory" contours) are based on primary visual contours, virtual pitches are based on spectral pitches.
Formation of virtual pitch can essentially be described as a process of subharmonic matching: the tonal aspects of any sound are primarily represented by a set of spectral pitches, and the pertinent virtual pitches are "inferred" on the presumption that they must in any case be subharmonics of those spectral pitches.
For example, when three Fourier components with the frequencies 600, 800, and 1000 Hz are heard, the auditory system is wise enough to know that "in real life nothing is just what it appears to be". That is, not only are the spectral pitches corresponding to 600, 800, and 1000 Hz apprehended; it is also supposed that these components are likely to be harmonics of a complex tone whose lower harmonics have been attenuated or even removed by linear distortion along the sound path. So it is advisable to look out for the fundamental pitch of that prospective complex tone. This is done by treating the subharmonics of each spectral pitch as candidates for the unknown fundamental pitch. In terms of frequency, those subharmonics are obtained simply by dividing every component frequency by each of the integers 1, 2, 3, and so on, up to about 12. Then the test of subharmonic match comes into play. In the above example, a first "full match" is obtained at the fundamental frequency 200 Hz, as this is the third subharmonic of 600 Hz, the fourth subharmonic of 800 Hz, and the fifth subharmonic of 1000 Hz. By definition, such a match increases the relative prominence assigned to the virtual-pitch candidate concerned, so that the above "full match" yields a well-pronounced virtual pitch at 200 Hz. However, another "full match" is obtained at 100 Hz (the sixth, eighth, and tenth subharmonics), and since this also is an oscillation frequency that frequently occurs in real life, it is considered an alternative virtual pitch that competes in prominence with the former one. This is how both the multiplicity of virtual pitch and the pitch-commonality (see topics tone affinity, octave equivalence) of harmonic complex tones come about.
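The matching step in this example can be sketched in a few lines. The sketch below is a simplified illustration, not the quantitative theory itself: it only reports "full matches", ignores prominence weighting, and assumes a relative tolerance of 1% for two subharmonics to count as coinciding. The divisor limit of 12 follows the text; the name `subharmonic_matches` is hypothetical.

```python
def subharmonic_matches(spectral_pitches, max_divisor=12, tol=0.01):
    """Return (candidate_hz, divisors) for every candidate that is a subharmonic
    of all given spectral pitches (a "full match"), highest candidate first.

    tol is an assumed relative tolerance for two subharmonics to coincide.
    """
    first, *others = spectral_pitches
    matches = []
    for m in range(1, max_divisor + 1):
        candidate = first / m  # subharmonic of the first spectral pitch
        divisors = [m]
        for f in others:
            k = round(f / candidate)  # nearest divisor for this component
            if not 1 <= k <= max_divisor or abs(f / k - candidate) > tol * candidate:
                break  # this component has no subharmonic near the candidate
            divisors.append(k)
        else:
            # Full match: average the coinciding subharmonics.
            subs = [f / k for f, k in zip(spectral_pitches, divisors)]
            matches.append((sum(subs) / len(subs), tuple(divisors)))
    return matches


for freq, divisors in subharmonic_matches([600.0, 800.0, 1000.0]):
    print(f"{freq:.1f} Hz via divisors {divisors}")
```

For 600, 800, and 1000 Hz this reports exactly the two full matches named in the text: 200 Hz (divisors 3, 4, 5) and 100 Hz (divisors 6, 8, 10). A candidate such as 66.7 Hz is rejected because 1000 Hz would need the divisor 15, beyond the limit of 12.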
This simple example should also suffice to illustrate that the formation of virtual pitch basically cannot fail for any type of sound, provided that a few spectral pitches are elicited. (Strictly, even a single spectral pitch can be a sufficient cue to virtual pitch, see below.) That is, any type of sound, no matter how it was created, will stimulate the auditory virtual-pitch mechanism to look out for subharmonic pitch matches, and this will yield a number of more or less prominent virtual pitches. This is how the theory explains not only the pitch of harmonic complex tones of any kind, but also phenomena such as the strike note of bells, the root of musical chords, and the "acoustic bass" of the organ. Moreover, as virtual pitch depends on spectral pitches, any deviations and/or shifts of the spectral pitches must be reflected in the resulting virtual pitches, in a manner that is predetermined by the process of subharmonic pitch matching. The theory can predict pitch shifts of virtual pitch of a size and kind that have indeed been observed [9], [11], [13] (see also topic diplacusis binauralis).
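As a simplified worked illustration with hypothetical numbers (not taken from the text), consider three components shifted uniformly by 50 Hz away from harmonic positions. Their nearest subharmonics no longer coincide exactly, but they still cluster within about 1.5%, so subharmonic matching still yields a definite virtual pitch, now displaced upward from 200 Hz. Simply averaging the three subharmonics is an assumption made here for brevity; the quantitative theory involves additional factors.

```latex
\begin{aligned}
\text{harmonic:}\quad & \tfrac{1000}{5} = \tfrac{1200}{6} = \tfrac{1400}{7} = 200\ \text{Hz} \\
\text{shifted by } 50\ \text{Hz:}\quad & \tfrac{1050}{5} = 210,\quad \tfrac{1250}{6} \approx 208.3,\quad \tfrac{1450}{7} \approx 207.1\ \text{Hz} \\
& \bar{f} \approx \tfrac{210 + 208.3 + 207.1}{3} \approx 208.5\ \text{Hz}
\end{aligned}
```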
So the virtual-pitch mechanism deals with "harmonic" and "inharmonic" sounds alike, although internally it strictly adheres to the presumption that every virtual-pitch candidate must be a subharmonic of a spectral pitch. Why is this so?
The first part of the answer is that this is the method by which the auditory system detects periodicity, even when several sound objects are present. Periodicity of a sound signal is invariably represented by harmonicity of its Fourier components. Thus any harmonic series of spectral pitches that can be found in a given set indicates that the complex sound as a whole contains a periodic sound signal.
The second part of the answer refers to the next question, which the first immediately raises: why is detection of periodicity so important to the auditory system? The answer is that periodicity is a strong clue to a sound object. In real life, periodicity of an ear signal (i.e., the sound signal at the eardrum) is highly unlikely to occur when that ear signal is composed of sound waves that emanate from two or more independent sources. Correspondingly, harmonicity of frequencies (and thus of spectral pitches) that originate from independent sound sources is highly unlikely to occur in real life. Hence any harmonic series of spectral pitches occurring in a complex spectral-pitch pattern can safely be interpreted as emerging from a periodic portion of the auditory stimulus and can be ascribed to one particular sound source. The virtual pitch inferred from that harmonic series by subharmonic matching plays the role of a label attached to that periodic sound object. This is how the virtual-pitch theory accounts for both pitch determination and sound-object segregation.
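The segregation idea can be illustrated with a toy grouping procedure. The sketch below assumes an idealized set of spectral pitches from two hypothetical periodic sources (harmonics 3 to 6 of 200 Hz and harmonics 2 to 5 of 310 Hz). It repeatedly picks the subharmonic candidate that explains the most remaining components, labels that group with the candidate, and removes it. The greedy rule, the 1% tolerance, the minimum group size of three, and the tie-break toward higher candidates are all assumptions of this sketch, not parts of the published theory; the names `explains` and `segregate` are hypothetical.

```python
def explains(candidate, f, max_divisor=12, tol=0.01):
    """True if f has a subharmonic f/m (1 <= m <= max_divisor) within tol of candidate."""
    m = round(f / candidate)
    return 1 <= m <= max_divisor and abs(f / m - candidate) <= tol * candidate


def segregate(components, max_divisor=12, tol=0.01, min_count=3):
    """Greedily split spectral pitches into harmonic groups, each labeled by a
    virtual-pitch candidate. Returns a list of (candidate_hz, member_frequencies)."""
    remaining = list(components)
    groups = []
    while remaining:
        # Every subharmonic of every remaining component is a candidate.
        candidates = {f / m for f in remaining for m in range(1, max_divisor + 1)}

        def members(c):
            return [f for f in remaining if explains(c, f, max_divisor, tol)]

        # Most components explained wins; ties go to the higher candidate.
        best = max(candidates, key=lambda c: (len(members(c)), c))
        group = members(best)
        if len(group) < min_count:
            break  # no further harmonic series worth labeling
        groups.append((best, group))
        remaining = [f for f in remaining if f not in group]
    return groups


mixture = [600.0, 620.0, 800.0, 930.0, 1000.0, 1200.0, 1240.0, 1550.0]
for label, group in segregate(mixture):
    print(f"virtual pitch {label:.0f} Hz labels {group}")
```

Although the two sources' components are interleaved, the procedure recovers one group labeled 310 Hz and one labeled 200 Hz. The 100-Hz and 155-Hz candidates explain the same components but lose the tie-break; in the theory proper, such octave-related alternatives would instead compete in prominence.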
These notions include an answer to yet another question, namely, why does the auditory system create virtual pitches at all? The answer is that virtual pitch is a label that characterizes a certain type of sound object, namely periodic ones, including "supposedly" periodic ones. And the segregation and identification of sound objects is naturally one of the fundamental functions of any hearing organ.
All this implies that the auditory system possesses "knowledge" about (sub)harmonic pitch intervals. That knowledge is evidently available to the system as a kind of template on which the intervals between subharmonic pitches are engraved. As suggested by the existence of stretch of harmonic pitch intervals, that template appears to be at least influenced, perhaps even entirely acquired, by exposure to natural samples of periodic sound signals. For humans, the most obvious and most important type of periodic sound is speech, i.e., its voiced elements. Acquisition of the template very probably takes place in the earliest stage of life, possibly even before birth [18], [22].
After its foundation in [17], [18], the virtual-pitch theory was elaborated further, not only to account quantitatively for virtual and spectral pitches as such, but also to predict their relative prominence. Among other factors, the prominence of pitches depends on the frequency region in which the Fourier components occur (see topic dominant spectral region) [22], [38], [55], [56]. The entire process of pitch evaluation sketched above can be carried out automatically on a computer.
The virtual-pitch theory hardly needs to be defended in view of its success in explaining and quantitatively predicting a great variety of pitch phenomena. However, there is one aspect that deserves consideration. According to the virtual-pitch theory in its current design, no virtual pitch can occur unless at least one spectral pitch exists. (That a single spectral pitch can indeed be sufficient was demonstrated by Houtgast 1976a.) It has occasionally been claimed that (virtual) pitch can be observed in sounds that could not be conceived as eliciting any spectral pitch. To my knowledge, however, no direct experimental proof has so far been offered for the total absence of spectral pitch in those test sounds. As I have pointed out on various occasions, it is almost impossible to find any sound that does not elicit even the faintest, and perhaps temporary, spectral pitch ([104] p. 323, see also topic spectral pitch). It may well be impossible to strictly prove the total absence of spectral pitch in any sound. Thus the evidence cited above appears questionable, and it may likewise prove impossible to disprove, on that basis, the assumption that at least one spectral pitch is required for the formation of virtual pitch.
However, one should not entirely dismiss the possibility that the auditory percept of "rattling", or roughness, which a periodic sound signal ordinarily evokes provided that its period is no less than about 3 ms, may suffice to evoke a faint and poorly defined sensation of virtual pitch without any spectral pitch being present. In my original layout of the virtual-pitch theory [17], [18] I included this factor, though I assumed that roughness may become useful as a clue to virtual pitch only in combination with at least one spectral pitch. If there are any cases at all in which roughness could be proven sufficient as a clue to virtual pitch (i.e., without any spectral pitch), I can conceive of such a virtual pitch only as a faint and poorly defined sensation. Moreover, the periodicity information contained in roughness can be available only for a single, isolated periodic stimulus. Auditory segregation of multiple sound signals on that basis does not appear conceivable.
In summary: When the votes provided by experimental evidence are counted and evaluated in a "democratic" manner, the hierarchical approach (spectral pitch determines virtual pitch) wins by a vast majority. Evidence in favor of the idea that virtual pitch emerges from an immediate analysis of an audio signal's time structure is rare and weak.