Within the field of audio communication there is agreement on the term pitch insofar as it basically refers to the tonal height of a sound object, e.g. a musical tone or the human voice. The use of the term is, however, often inconsistent in that it is used both for a stimulus parameter (i.e., synonymous with frequency) and for an attribute of auditory sensation. Those concerned with speech processing mostly use the term in the former sense, meaning the fundamental frequency (oscillation frequency) of the glottal oscillation (vibration of the vocal folds). In psychoacoustics (and so in the present discussion) the term is used throughout in the latter sense, i.e. meaning an auditory (subjective) attribute. The ANSI definition of psychoacoustical terminology says that pitch is that auditory attribute of sound according to which sounds can be ordered on a scale from low to high. To date this definition is still a useful basis, though it must be complemented by taking account of certain additional aspects.
As it turns out, the perception of pitch is a complicated matter. Therefore, it is wise to make every effort to achieve a concept and a definition of pitch that is as clear-cut, simple and rigorous as possible. This is why I do not follow the majority of cognitive psychologists. While they agree with the definition that pitch is a sensory (subjective) attribute, they tend toward a concept of pitch that includes not only the aspect of perceived height but in addition one or more aspects of tones that are relevant in music. The most prominent of these additional aspects is octave equivalence - the notion that tones an octave apart are somehow similar and so, in certain musical respects, "equivalent". Referring to that notion it has been asserted (Revesz 1912a, Idson & Massaro 1978a) that pitch must be regarded as a two-dimensional attribute such that height is only one of two dimensions. The second "dimension" is ordinarily termed chroma. According to this concept a pitch is said to have both a certain height and a certain musical-categorical value (chroma), e.g. "c-ness", "d-ness", etc. This is often illustrated by a helix model. In that model, pitches are represented by points on an ascending helix such that the vertical height of their position reflects pitch height, while the rotational angle of the position corresponds to chroma. On the helix, pitches with one and the same chroma (in music denoted by the same letter, c, d, e, etc.) are situated vertically above or below one another.
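The geometry of the helix model can be made concrete with a minimal Python sketch. It is an illustration only, under assumptions that are not part of the text: the function name `helix_position`, the unit radius, and the 440-Hz reference are hypothetical, and the frequency of a sine tone is used as a crude stand-in for its pitch, with height measured in octaves and one full turn of the helix per octave.

```python
import math

def helix_position(frequency_hz, reference_hz=440.0):
    """Return (x, y, z) for a tone on a unit-radius pitch helix.

    Assumptions (illustrative, not from the text): height z is the
    distance from the reference in octaves, and chroma is the
    fractional part of that distance, mapped to one full turn.
    """
    octaves = math.log2(frequency_hz / reference_hz)  # height in octaves
    chroma = octaves % 1.0                            # position within the octave
    angle = 2.0 * math.pi * chroma                    # rotational angle on the helix
    return (math.cos(angle), math.sin(angle), octaves)

# Tones an octave apart share x and y (same chroma) and differ only in z.
for f in (220.0, 440.0, 880.0):
    print(f, helix_position(f))
```

In this sketch, tones an octave apart land directly above one another, which is all the model expresses: one coordinate for height and one angle for octave equivalence.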
The misconception implied in this model lies in the arbitrary assignment of octave equivalence to pitch, while in fact octave equivalence is a feature of tones. I am not aware of any phenomenon that is explained by that model. This is not surprising, as the pitch helix is in fact merely a visualization of two well-known aspects of tones, i.e., pitch and octave equivalence. As a concept of pitch the model makes things unnecessarily complicated and therefore should be discarded.
So we can, and should, basically stick to the aforementioned ANSI definition of pitch. However, we cannot take it too literally, because there is no such thing as the pitch of a sound. Practically any real-life sound - including the tones of musical instruments - evokes several pitches at a time, though often (in particular for the harmonic complex tones produced by conventional musical instruments) one of them is most prominent and then is said to be the pitch. So the weak point of the ANSI definition is that there is no guarantee that any sound having pitch(es) can indeed be positioned unambiguously on the low-high dimension.
There is only one type of sound that for normal-hearing listeners evokes one and only one pitch (at least to a sufficiently large extent): the sine tone. Sine tones of different frequency can indeed be unambiguously positioned on the low-high dimension in terms of their pitch - just as demanded by the ANSI definition. (The fact that one and the same sine tone, when successively presented first to the right and then to the left ear, may produce slightly different pitches [binaural diplacusis] can be left out of consideration here.)
As a scientist one should take nothing for granted. So one may ask how the uniqueness of sine-tone pitch can be experimentally verified. The answer is: Present (preferably monaurally) to a listener two successive sine tones of which one frequency is fixed, the other variable, and have the listener adjust the variable frequency such that the two tones match in pitch. Do that with many listeners and repeat the experiment until everybody gets sufficiently bored. You will invariably get a narrow, single-peaked distribution of matching frequencies around the fixed frequency. When, on the other hand, you replace the fixed sine tone by a complex tone and do the same experiment with a sufficient number of repetitions and listeners, you will get a multi-peaked distribution. If the complex tone is a harmonic complex tone (such as that of a musical instrument) the majority of matches will occur at the complex tone's fundamental frequency. However, additional peaks of the distribution will occur at frequencies that may be both harmonic and subharmonic with respect to the complex tone's fundamental frequency.
This provides us with an appropriate operational definition of pitch. According to this definition, pitch is primarily defined as that auditory attribute in terms of which sine tones can be ordered on the low-high dimension. This is just a narrower version of the ANSI definition. Its advantage is that it gives us a fairly rigorous definition of "pitch as such", i.e., as an auditory attribute. On that basis, the ANSI definition can now be complemented by further defining that pitches of complex sounds are taken to exist if each of them can be matched by listeners to the pitch of a sine tone by adjusting the latter's frequency. The positions of the peaks in the distribution of matching frequencies thus obtained are then a measure of the individual pitches. As the various pitches in general differ in prominence, the height of the corresponding peaks in the matching-frequency distribution can be taken as a measure of that prominence ([104], pp. 312-316). As the pitch of a sine tone is somewhat dependent on intensity, in such a measurement the sine tone must be given a fixed, defined sound-pressure level. A good choice for that level is 60 dB.
It is this kind of definition of pitch that I employ throughout when discussing pitch perception.
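The evaluation of a matching experiment described above (peak positions as pitches, peak heights as prominence) might be sketched as follows in Python. This is a minimal sketch under stated assumptions, not the procedure of the cited measurements: the function name `pitch_peaks`, the semitone-wide bins on a logarithmic frequency axis, and the threshold `min_share` are hypothetical choices, and the sample data are invented purely to exercise the code.

```python
import math
from collections import Counter

def pitch_peaks(matches_hz, bins_per_octave=12, min_share=0.05):
    """Estimate pitches and their prominence from matching frequencies.

    matches_hz: sine-tone frequencies (Hz) that listeners adjusted to
    match a pitch of the test sound. Returns a list of
    (bin_centre_hz, share_of_matches), most prominent first.
    Binning and threshold are illustrative assumptions.
    """
    # Histogram on a logarithmic frequency axis.
    counts = Counter(round(bins_per_octave * math.log2(f)) for f in matches_hz)
    total = len(matches_hz)
    peaks = []
    for b, n in counts.items():
        # A peak is a bin strictly higher than both neighbouring bins
        # that also holds at least min_share of all matches.
        higher_than_neighbours = n > counts.get(b - 1, 0) and n > counts.get(b + 1, 0)
        if higher_than_neighbours and n / total >= min_share:
            peaks.append((2.0 ** (b / bins_per_octave), n / total))
    # Peak height (share of matches) serves as the measure of prominence.
    return sorted(peaks, key=lambda p: p[1], reverse=True)

# Invented example: most matches at 200 Hz, fewer at 400 Hz and 100 Hz.
example = [200.0] * 6 + [400.0] * 3 + [100.0] * 1
for centre_hz, share in pitch_peaks(example):
    print(round(centre_hz, 1), share)
```

The reported positions are bin centres, so they are accurate only to within half a bin (half a semitone with the default setting); a finer analysis would narrow the bins or average the matches within each peak.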