INTRODUCTION

Tests of medial olivocochlear efferent (MOC) effects in humans are important both for understanding efferent function and for interpreting otoacoustic emissions in the clinic. Animal experiments have clarified the cellular mechanisms by which medial efferents produce their effects, but the function of efferents in hearing is still controversial (reviewed by Guinan 1996). Psychophysical tests to uncover the role of efferents in hearing are most readily done in humans, but, to interpret such results, we need accurate measurements of efferent activation in humans. In clinical work, otoacoustic emissions (OAEs) are widely used to test the peripheral hearing apparatus, but it is not known to what extent differences in efferent activation across individuals affect the results of these tests. Thus, both scientific and clinical issues would benefit from greater understanding of efferent measurements in humans.

In humans, the effects of MOC efferent activation are often investigated using OAE tests because they are noninvasive and relatively easy to perform (Collet et al. 1990). MOC fibers act through synapses on outer hair cells to reduce the gain of the cochlear amplifier and thereby reduce basilar membrane motion and change OAE amplitudes. OAE-based efferent tests typically use a “probe stimulus” to produce an OAE and an “elicitor stimulus” to evoke MOC activity. Commonly used probe stimuli include clicks, tone pips, and tone pairs. These produce click-evoked OAEs (CEOAEs), transient-evoked OAEs (TEOAEs), and distortion-product OAEs (DPOAEs), respectively. In most tests, the efferent elicitor stimulus is delivered to the contralateral ear (re the probe stimulus) to avoid direct acoustic contamination of the OAE by the elicitor stimulus (e.g., Collet et al. 1990; Veuillet et al. 1991; Norman and Thornton 1993; Maison et al. 1997, 1998, 1999, 2000).

A great deal has been learned by using OAEs to study medial efferent activation, but several unresolved methodological issues cloud the interpretation of much of the data produced: (1) Most studies use only contralateral sound which may miss the greater part of the MOC reflex. This is suggested by animal studies which indicate that the crossed efferent reflex is mediated by about one-third of the total medial efferent population (Guinan et al. 1983; Robertson and Gummer 1985; Liberman 1988). (2) The probe sound, by itself, may elicit MOC efferent activity. Although probe-evoked efferent activity can be made use of in OAE adaptation paradigms (Liberman et al. 1996), in most OAE-based efferent assays, probe-evoked efferent activity is not considered and may distort the effect and/or amount of efferent activity elicited by the intended elicitor sound. (3) The elicitor and/or probe sounds may elicit middle-ear muscle (MEM) contractions that also change OAE amplitudes.

The purpose of this article is twofold: (1) to consider the underlying methodological issues involved when using any OAE test to measure MOC effects, specifically the three problems listed above, and (2) to introduce methods for measuring MOC effects based on stimulus frequency otoacoustic emissions (SFOAEs). First, we present methods for measuring MOC-induced changes in SFOAEs (ΔSFOAE) and show results using contralateral, ipsilateral, and bilateral elicitors. We then use our SFOAE-based assay to show that the most commonly used OAE probe sounds elicit MOC activity, while low-level tones, like the ones used in our SFOAE assay, elicit little or no MOC activity. Next, we describe a test using the SFOAE assay that distinguishes MOC effects from MEM effects and show that MEM reflex thresholds vary considerably across individuals. Finally, in the discussion we consider how well our SFOAE methods can be adapted for use with other types of OAEs, as well as other issues relevant to human efferent assays.

METHODS

Measuring changes in SFOAEs

To produce an SFOAE, we normally used a 40 dB SPL probe tone. Similar results were obtained with probe tones of 30–50 dB SPL. We settled on 40 dB as a compromise between increasing the signal/noise ratio (which is bigger for higher-level probes) and the desire for a low-level probe. SFOAE amplitudes can vary widely with small changes in stimulus frequency, presumably because SFOAEs are due to reflections from random variations along the cochlea (Zweig and Shera 1995). To ensure SFOAEs of adequate amplitude, in each subject we chose a probe frequency within 10% of the frequency of interest (1 kHz, unless stated otherwise) that produced easily measurable SFOAEs. For simultaneous bilateral tests, the probe was presented bilaterally at the same frequency in each ear. A probe frequency was chosen only if, in each ear tested, the probe was at least 50 Hz away from any spontaneous otoacoustic emission (SOAE) with an amplitude greater than −10 dB SPL. No attempt was made to align the test frequencies relative to the subject’s threshold microstructure; however, by avoiding SOAE frequencies and by choosing frequencies with large ΔSFOAEs, we may have produced an alignment.

To monitor changes in SFOAEs, ear-canal sound pressures were obtained from ER10c acoustic assemblies that were calibrated in each ear. Efferent activation produced small changes in the amplitude and phase of the sound pressure at the probe frequency (Fp) that, in our early experiments (the MOC vs. MEM experiments), were extracted using a lockin amplifier (EG&G 5206, 10 ms time constant). In all other experiments the changes in the probe-tone sound pressure were extracted using a digital heterodyne method (Proakis and Manolakis 1996) that includes the following steps: (1) Sample the ER10c outputs at 20 kHz to obtain “raw waveforms” which are averaged over 4–8 responses. (2) Compute a discrete Fourier transform of each averaged raw waveform. (3) Select the (complex valued) positive frequency part of the transform (the analytic signal transform), multiply it by 2, and shift it down the frequency axis so that frequencies that were centered around F = Fp are now centered at F = 0 Hz. (4) Low-pass filter the resulting complex frequency function. We used a recursive, zero-phase-delay exponential filter (Shera and Zweig 1993) with a 1/e amplitude point at 90 Hz. This filter provides a sharp cutoff that attenuates frequencies greater than 110 Hz by more than 60 dB but has minimal time splatter (e.g., 30 ms after the end of a signal, the amplitude has decayed by 60 dB). Zero phase delay was achieved because the filter function has no imaginary component. (5) Compute an inverse Fourier transform to obtain the complex “heterodyned signal” P(t). (6) Convert P(t) from cosine and sine components to amplitude and phase. The resulting “heterodyned waveform” has a small fraction of the time points of the original raw waveform and gives the amplitude and phase changes of the ear-canal sound pressure at the probe sound frequency as functions of time, as a lockin amplifier does. A major difference is that the digital heterodyne method avoids the slow drifts of lockin amplifiers and allows the use of better filtering algorithms.
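As a rough illustration of steps 2–6, the following Python sketch heterodynes a synthetic waveform using only the standard library. It is not the code used in this study. The function name `heterodyne`, the Gaussian low-pass (a simple zero-phase stand-in that does not reproduce the sharp cutoff of the recursive exponential filter), and the `span_hz` truncation are assumptions made for the example, and the output is not decimated.

```python
import cmath
import math


def heterodyne(samples, fs, f_probe, cutoff_hz=90.0, span_hz=300.0):
    """Return the complex heterodyned signal P(t) for a real waveform.

    Assumptions for this sketch: a Gaussian low-pass exp(-(f/cutoff)^2)
    stands in for the exponential filter, and only DFT bins within
    span_hz of the probe are computed, because the filter makes all
    other bins negligible.
    """
    n = len(samples)
    df = fs / n                      # DFT bin spacing (Hz)
    k_probe = round(f_probe / df)    # bin closest to the probe frequency
    k_span = int(span_hz / df)

    baseband = {}                    # offset from probe bin -> coefficient
    for dk in range(-k_span, k_span + 1):
        k = k_probe + dk
        if k <= 0 or k >= n // 2:    # keep positive frequencies only
            continue
        w = -2j * math.pi * k / n
        x_k = sum(s * cmath.exp(w * i) for i, s in enumerate(samples))
        gain = math.exp(-((dk * df) / cutoff_hz) ** 2)  # real gain => zero phase
        baseband[dk] = 2.0 * x_k * gain / n             # times 2, shifted to 0 Hz

    # Inverse DFT of the shifted, filtered spectrum.
    return [
        sum(c * cmath.exp(2j * math.pi * dk * i / n) for dk, c in baseband.items())
        for i in range(n)
    ]


# Synthetic test waveform: a steady 1 kHz tone, amplitude 0.5, phase 0.3 rad.
fs, n, f_probe = 20000.0, 2000, 1000.0
wave = [0.5 * math.cos(2 * math.pi * f_probe * i / fs + 0.3) for i in range(n)]
p = heterodyne(wave, fs, f_probe)
amplitude, phase = abs(p[n // 2]), cmath.phase(p[n // 2])
```

For a steady tone at the probe frequency, the recovered amplitude and phase are constant in time; an efferent-induced change would appear as a slow variation of P(t).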

To understand the meaning of the changes in ear-canal sound pressure, consider how SFOAEs and the sound from the sound source combine to produce the sound measured in the ear canal. In the absence of efferent stimulation (i.e., during a “baseline” measurement), the total pressure in the ear canal consists of a pressure due to the source acting on the passive impedance of the tympanic membrane (the impedance that would be measured with the cochlea not active), plus the pressure from sound emitted by the active cochlea at the stimulus frequency, the SFOAE. These two sounds add vectorially to produce the total pressure in the ear canal, as shown by the solid lines in Figure 1. With efferent stimulation, the SFOAE is changed and the resulting ear-canal sound pressure is changed, as shown by the wide-dash lines in Figure 1. The difference between the two sound pressures is the change in the SFOAE (ΔSFOAE—the narrow-dash line in Fig. 1). Our technique measures ΔSFOAE as a function of time.

Figure 1

A vector diagram showing that the total ear-canal sound pressure is made of a sound source component and a stimulus frequency otoacoustic emission (SFOAE) component (greatly exaggerated here). The solid lines show the “baseline” (i.e., before efferent stimulation) pressures (labeled 1). Deviations from the “baseline” condition are produced by efferent activity changing the SFOAE, as shown by the long-dash lines (labeled 2). As long as the source pressure remains constant, the change in the total pressure (ΔP) is the same as the efferent-induced change in the SFOAE (ΔSFOAE—the short-dashed line).

Strictly speaking, we measure the change in ear-canal sound pressure (ΔP). If the impedance of the ear seen at the tympanic membrane is constant, then the source pressure is constant and ΔP = ΔSFOAE (Fig. 1). However, if the MEM contract, then the impedance at the tympanic membrane and the source pressure can change. When we are confident that middle-ear-muscle contractions were not involved, we call the measured change ΔSFOAE. If MEM contractions might have been involved, we call the change ΔP because the change might also include a change in the sound-source pressure.
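The vector relation of Figure 1 can be checked with complex arithmetic. The sketch below uses made-up pressures in arbitrary linear units (none are measured values) and a hypothetical helper `polar`; it shows that ΔP equals ΔSFOAE when the source pressure is constant, and that the two differ once the source pressure changes, as it could with an MEM contraction.

```python
import cmath
import math


def polar(amplitude, phase_deg):
    """Build a complex pressure from an amplitude and a phase in degrees."""
    return cmath.rect(amplitude, math.radians(phase_deg))


# Hypothetical values chosen only for illustration.
source = polar(1.00, 0.0)            # sound-source component
sfoae_baseline = polar(0.10, 60.0)   # SFOAE before efferent stimulation
sfoae_efferent = polar(0.07, 80.0)   # SFOAE during efferent stimulation

total_baseline = source + sfoae_baseline
total_efferent = source + sfoae_efferent

delta_p = total_efferent - total_baseline      # what is measured
delta_sfoae = sfoae_efferent - sfoae_baseline  # what is wanted

# If the source pressure also changes (hypothetical MEM effect),
# the measured change no longer equals the change in the SFOAE.
source_mem = polar(1.02, 1.0)
delta_p_mem = (source_mem + sfoae_efferent) - total_baseline
```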

The measurement paradigm

Our standard measurement used a continuous bilateral probe tone during which a 2.5 s elicitor stimulus (normally a noise burst) was presented every 5 s (Fig. 2, bottom). Raw waveforms from 4 to 8 responses were averaged and the results heterodyned to show the average amplitude and phase at the probe frequency as a function of time (e.g., Fig. 2A). For each average, the elicitor stimulus was presented with one of four lateralities: left ear, right ear, both ears, or neither ear (i.e., no elicitor). When multiple lateralities were measured, their order was randomized. With a bilateral probe tone, responses from both ears were recorded simultaneously so that a left ear elicitor produced an ipsilateral measurement from the left ear and a contralateral measurement from the right ear. During the collection of data, response signals with unusually large values (e.g., “artifacts” from movement of the subject) were automatically excluded from the averages. Additionally, runs were excluded during which the operator noted there were movement artifacts that were not rejected by the computer (i.e., unpresented noises heard on audio monitors of the subject microphones). To determine the signal-to-noise ratio of a ΔSFOAE measurement set, we used the no-elicitor (neither ear) response as a measure of the background fluctuation level. We do not call this “background fluctuation” level a “noise” level to avoid confusion with the elicitor noise.

Figure 2

Examples of a measurement of total ear-canal sound pressure at the probe-tone frequency (A) and computed efferent-induced changes in the SFOAE (ΔSFOAE) for three elicitor lateralities (B–D). For a contralateral elicitor, A to B illustrates the transformation from total sound pressure (A) to the ΔSFOAE (B). The left panels are amplitudes and the right panels are the corresponding phases. Row A was obtained by digitally heterodyning measurements originally sampled at 20 kHz. Row B was obtained from the data of row A by calculating the vector average of the data in the baseline window of A and vector subtracting this, at each time point, from the data in A. The ΔSFOAEs in C and D were obtained in a similar way from ear-canal sound pressures that are not illustrated. As shown at the bottom, a continuous probe tone (40 dB SPL, 1.1 kHz) was presented in the measurement (right) ear and a 2.5 s broadband noise elicitor (60 dB SPL) was presented in the contralateral ear (rows A and B), the ipsilateral ear (row C), and both ears (row D). The analysis window shows the time during which efferent-induced ΔSFOAE responses were averaged; it begins 50 ms after the termination of the elicitor and lasts for 100 ms. Subject 84, right ear.

Figure 3

A single measurement set showing changes in stimulus frequency otoacoustic emissions (ΔSFOAEs) measured simultaneously in both ears. The elicitors were 60 dB SPL broadband noise bursts; the probes were 40 dB SPL, 1.1 kHz tones. The bars labeled “Background Fluctuation” are averages from runs with no elicitor. Note: Amplitudes are in dB so, in linear terms, in the left ear the difference between the ipsilateral and bilateral responses is about 10 times the amplitude of the background fluctuation. Subject 84.

Figure 4

Efferent activity elicited by common probe sounds. Shown are time courses of the changes in stimulus frequency otoacoustic emissions (ΔSFOAEs) elicited by various sounds normally used as probe sounds. A. The tone pips were 900 Hz, with 1 period rise and fall times and two periods of plateau, and were presented 1 pip every 20 ms. B. The clicks were produced by 0.1 ms square electrical pulses, presented every 20 ms. C. The two-tone stimulus had a lower-frequency primary (f1) at 900 Hz, and a higher-frequency primary (f2 = 1.3 × f1) which was 10 dB lower in level than the f1 level listed in panel C. D. The single-tone stimulus was 900 Hz. The sound levels of the stimuli are given in peak equivalent SPL (pSPL) for tone pips and clicks, and normal (RMS) SPL for single tones and for the f1 tone of the two-tone stimuli. All elicitors were in the left ear. The probe tone was 40 dB SPL, 900 Hz in the right ear of subject 82. Each trace is the magnitude of the synchronously averaged heterodyne waveforms from four sets of data, each with six 5 s response periods. The abbreviations in parentheses indicate the type of otoacoustic emission (OAE) elicited by each probe sound: transient-evoked OAEs (TEOAEs), click-evoked OAEs (CEOAEs), distortion-product OAEs (DPOAEs), and stimulus frequency OAEs (SFOAEs).

Figure 5

Efferent activity evoked by sounds often used as OAE-generating probe sounds. Each panel shows ΔSFOAE-normalized magnitudes in the ipsilateral ear for the last second of elicitor stimulation versus elicitor sound level in the contralateral ear, for the elicitor type listed at the top of the panel. The type of OAE produced by the elicitor is shown in parentheses. The ΔSFOAE was normalized by dividing the ΔSFOAE magnitude by the total SFOAE magnitude (which was obtained by suppression, see Methods). In B, the two lines that go off-scale are most likely due to elicited middle-ear-muscle contractions (see text). The three off-scale points are, from line 1: 90 at 63 dB SPL, 154 at 73 dB SPL; from line 2: 101 at 73 dB SPL. Group B1 data from the right ear of subject 82 (4 level functions), and the left ears of subjects 85, 87, 88 (2 level functions each). Probe frequencies between 0.9 and 1.1 kHz.

Ear-canal sound pressure was measured as a function of time before, during, and after the elicitor stimulus using the heterodyne method explained above (Fig. 2, top). Baseline values for the amplitude and phase of the ear-canal sound pressure were obtained from the vector average (i.e., sine and cosine parts were averaged separately) of the sound pressure in the 0.5 s period before the elicitor stimulus (Fig. 2). The ΔSFOAE at each measurement time point was then obtained by vectorially subtracting the baseline sound pressure from the measured sound pressure at that time point (see Figs. 1 and 2A vs. 2B).

Measuring effects in the ipsilateral ear

While measurements of efferent effects due to contralateral elicitors are straightforward and reveal their entire time course (Fig. 2A, B), effects due to ipsilateral elicitors are usually confounded by acoustic interference and cochlear suppression caused by the elicitor stimulus. The problem of acoustic interference was removed by reversing the elicitor polarity on alternate stimuli and averaging. This does not, however, remove the elicitor-induced suppression of OAEs caused by the process of two-tone suppression. (In this article, the term “suppression” is reserved for two-tone suppression and “inhibition” is used for the effects of efferents.) Fortunately, suppression of OAEs is very fast (~10 ms or less), whereas efferent effects decay with time constants of 50–100 ms or more (Guinan 1990, 1996; Tavartkiladze et al. 1996). Thus, to measure the efferent-induced ΔSFOAE in an ear that is receiving an elicitor stimulus, we used the vector-averaged ΔSFOAE in a time window 50–150 ms after the termination of the elicitor stimulus (Fig. 2). The measurement window was delayed 50 ms after cessation of the elicitor to allow for filter settling time and for decay of suppression in the SFOAE. Although measurements with a contralateral elicitor do not require use of this restricted time window, to make comparable contralateral, ipsilateral, and bilateral measurements, we use the same postelicitor time window for all.
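The baseline subtraction and the postelicitor window can be sketched as follows. This is a minimal illustration, not the analysis code of this study: the function names are hypothetical, and the synthetic response (a constant effect during the elicitor that decays with a 100 ms time constant afterwards) is an assumption chosen only to exercise the code.

```python
import math


def vector_mean(values):
    """Average complex values; real and imaginary parts are averaged separately."""
    return sum(values) / len(values)


def in_window(times, values, start, stop):
    """Return the values whose time stamps fall in [start, stop)."""
    return [v for t, v in zip(times, values) if start <= t < stop]


def postelicitor_delta(times, pressure, elicitor_on, elicitor_off):
    """Vector-averaged change in pressure 50-150 ms after elicitor offset."""
    # Baseline: vector average over the 0.5 s before the elicitor.
    baseline = vector_mean(in_window(times, pressure, elicitor_on - 0.5, elicitor_on))
    # Vector subtraction of the baseline at every time point.
    delta = [p - baseline for p in pressure]
    window = in_window(times, delta, elicitor_off + 0.050, elicitor_off + 0.150)
    return vector_mean(window)


# Synthetic heterodyned pressure sampled every 1 ms (assumed values).
elicitor_on, elicitor_off = 0.5, 3.0
times = [i / 1000.0 for i in range(5000)]
pressure = []
for t in times:
    if t < elicitor_on:
        effect = 0.0
    elif t < elicitor_off:
        effect = 0.1
    else:
        effect = 0.1 * math.exp(-(t - elicitor_off) / 0.1)
    pressure.append(complex(1.0, effect))  # constant source plus changing part

result = postelicitor_delta(times, pressure, elicitor_on, elicitor_off)
```

Because the assumed effect has already decayed for 50 ms when the window opens, the windowed value is smaller than the 0.1 reached during the elicitor.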

Eliciting efferent activity

For most tests of efferent effects, including searches for frequencies with large efferent effects and tests of possible MEM effects, broadband noise bursts were used to elicit efferent activity. Noise bursts were digitally synthesized as a sum of random-phase sinusoids (separately randomized for each ear). In early experiments the spectral level of the electrical signal to the acoustic driver was flat. In later experiments, the spectral amplitudes were adjusted according to the acoustic calibration of the individual ear to produce noise with a flat sound pressure spectrum in the ear canal across the frequency region of interest (0.1–10 kHz). Cosine-shaped rise–fall times (10 ms for the tests in Figs. 7 and 8, 5 ms everywhere else) were used to prevent frequency splatter and to preserve the flat spectra. The elicitor phase was reversed on alternate trials, and only even numbers of trials were averaged, so that the elicitor acoustic signal would cancel out in the averages and not obscure the probe-tone signal.
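A shortened sketch of this kind of noise synthesis, and of the cancellation obtained by reversing polarity on alternate trials, is given below. The function name, the 100 Hz component spacing, the 50 ms duration, and the probe amplitude are assumptions made to keep the example small; the per-ear spectral flattening is not modeled.

```python
import math
import random


def noise_burst(fs, duration, f_lo, f_hi, df, ramp, rng):
    """Sum of equal-amplitude, random-phase sinusoids with cosine-shaped ramps."""
    n = int(round(fs * duration))
    n_ramp = int(round(fs * ramp))
    freqs = [f_lo + i * df for i in range(int((f_hi - f_lo) / df) + 1)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in freqs]
    burst = []
    for i in range(n):
        t = i / fs
        x = sum(math.cos(2.0 * math.pi * f * t + ph) for f, ph in zip(freqs, phases))
        edge = min(i, n - 1 - i)     # distance in samples to the nearer end
        if edge < n_ramp:            # raised-cosine rise and fall
            x *= 0.5 * (1.0 - math.cos(math.pi * edge / n_ramp))
        burst.append(x)
    return burst


fs = 20000.0
rng = random.Random(1)               # fixed seed so the example is repeatable
burst = noise_burst(fs, 0.05, 100.0, 9900.0, 100.0, 0.005, rng)
probe = [0.01 * math.cos(2.0 * math.pi * 1000.0 * i / fs) for i in range(len(burst))]

# Two trials with opposite elicitor polarity; their average keeps only the probe.
trial_a = [p + x for p, x in zip(probe, burst)]
trial_b = [p - x for p, x in zip(probe, burst)]
average = [(a + b) / 2.0 for a, b in zip(trial_a, trial_b)]
residual = max(abs(avg - p) for avg, p in zip(average, probe))
```

This cancellation removes only the acoustic elicitor signal from the average. It does not remove any effect the elicitor has on the emission itself.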

Figure 6

Efferent activity elicited by common probe sounds and wideband noise bursts for ipsilateral (○), contralateral (△), and bilateral (□) elicitors. The X’s show the average background fluctuation measurement for each elicitor and level (derived from the “no-elicitor” runs). Top. Normalized ΔSFOAE magnitudes from the postelicitor window, averaged across subjects, as a function of elicitor sound level. Error bars extend ±1 SEM from each average. For clarity, the error bars were slightly displaced to the right for bilateral elicitors and to the left for ipsilateral elicitors (see upper-right panel). Bottom. Triangles show measured versus estimated contralateral efferent effects. Estimated contralateral |ΔSFOAE| = bilateral |ΔSFOAE| − ipsilateral |ΔSFOAE|. X’s show the corresponding background fluctuation points plotted at zero estimated contralateral effect. Subject Group B2, 9 ears from 6 subjects (subject No. and ears are: 61R, 68LR, 85L, 87LR, 93R, 109LR). Probe frequencies between 0.9 and 1.1 kHz. The off-scale point for clicks had an amplitude of 63%. The results at 70 dB, particularly for clicks, may have been affected by evoked middle-ear muscle contractions included in these results. As judged by t-tests of the distributions across subjects at each elicitor level and laterality, points were significantly above the corresponding no-elicitor runs at the 0.05 level (*), the 0.01 level (**), or the 0.001 level (***) as follows (ipsi = ipsilateral, contra = contralateral, bi = bilateral): For Pips: at 50 dB, bi*; at 60 dB, ipsi*, contra*, bi***; at 70 dB, ipsi**, contra***, bi**. For Clicks: at 50 dB, contra**, bi*; at 60 dB, ipsi*, contra**, bi***; at 70 dB, ipsi**, contra**, bi***. For Tones: at 50 dB, ipsi*, bi**; at 60 and 70 dB, ipsi***, bi***. For Noise: at 50 dB, ipsi**, contra***, bi***; at 60 dB, all***.

Figure 7

Example group delay tests for medial efferent (MOC) versus middle-ear muscle (MEM) activity. Shown are ΔSFOAE phase versus probe-frequency plots for contralateral, ipsilateral, and bilateral broadband (nonflattened) noise elicitors in one subject (No. 31, subject Group C). The points were every 20 Hz (at the frequency tick marks). Rows B–E had a probe-tone level of 30 dB SPL; row A had a probe-tone level of 50 dB SPL. The long group delays (i.e., high slopes in the phase vs. frequency plots) for 45–65 dB SPL elicitors indicate that MOC effects dominated at these levels (the straight line in the row D phase plot shows the slope for a 10 ms delay). The short group delays for the 75 dB elicitor and 50 dB probe tone (row A) indicate that MEM effects dominated the response. The group delays for the 75 dB elicitor and 30 dB probe tone (row B) show evidence of both long and short group delays and are designated “Mixed” (see text).

Figure 8

Medial efferent vs. middle-ear muscle test results for 7 subjects (Group C). Symbols indicate the elicitor levels at which the response was dominated by middle-ear muscle activity (MEM), medial efferent activity (MOC), or showed evidence for both (Mixed - see text for details). Probe frequencies between 1 and 2 kHz. Elicitors: unflattened broadband noise. The numbers of runs done at each level for each subject (in order of highest to lowest level) were: S21: 2,3,2; S24: 3,1,1; S27: 2,1,1,3,2; S31: 2,3,5,3,6,2; S35: 2,1,1,3; S36: 1,1,1,1; S37: 1,1,1,1.

For tests to determine whether commonly used probe stimuli elicit efferent activity, one of these probe stimuli was used in place of the noise-burst elicitor. Four stimuli were used: tone pips, clicks, tone pairs, and single tones. The tone pips were at the probe frequency, had a 1-cycle rise time, a 2-cycle plateau, and a 1-cycle fall time, and were presented at a 50 Hz repetition frequency (20 ms between pip onsets) (as in Veuillet et al. 1991; Maison et al. 2000; and many others). The clicks were produced by 100 µs electrical pulses and were presented at a 50 Hz repetition frequency (similar to the 80 µs pulses at 50 Hz used by Veuillet et al. 1991; Berlin et al. 1995; and many others). The two-tone stimulus consisted of a lower-frequency tone (F1) which was set to the probe frequency and 10 dB higher in level than the higher-frequency tone (F2) which was 1.3 times F1. Finally, the single-tone stimulus was a tone burst at the probe frequency. Sound-level specification of the single- and two-tone stimuli was in dB SPL according to normal (RMS) measurement of their amplitudes. Sound-level specification for clicks and tone pips was in peak-equivalent SPL. For pips, this was the SPL that would have been obtained if the same stimulus had been continuous. For clicks, this was the SPL of the tone that gave the same peak pascal level as the peak of the click waveform. For the first set of measurements (Figs. 4 and 5), the click waveform peak was calculated from a measurement in a cavity. For the second set of measurements (Fig. 6), click waveforms were obtained for each subject by an inverse convolution of the click waveform monitored by the ER10c microphone and the microphone impulse response calculated from the microphone frequency response in a small cavity. All “probe-used-as-elicitor” stimuli were presented for the normal duration of our noise elicitor (2.5 s duration, 5 s repetition period), with the polarity reversed on alternate stimuli. 
The presence of efferent activity elicited by these stimuli was assessed with our standard SFOAE test.
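The level and timing conventions above reduce to a few lines of arithmetic. In the sketch below, the function names are hypothetical and the 20 µPa reference pressure is the conventional SPL reference rather than a value stated in this article; peak-equivalent SPL is taken as the RMS SPL of a continuous tone with the same peak pressure.

```python
import math

P_REF = 20e-6  # conventional SPL reference pressure in pascals (assumed)


def rms_spl(rms_pa):
    """Sound pressure level in dB SPL from an RMS pressure in pascals."""
    return 20.0 * math.log10(rms_pa / P_REF)


def peak_equivalent_spl(peak_pa):
    """SPL of a continuous tone whose peak pressure equals peak_pa."""
    return rms_spl(peak_pa / math.sqrt(2.0))


def tone_pip_timing(f_probe_hz, repetition_hz=50.0):
    """Durations (s) of a pip with 1-cycle rise, 2-cycle plateau, 1-cycle fall."""
    period = 1.0 / f_probe_hz
    return {
        "rise": period,
        "plateau": 2.0 * period,
        "fall": period,
        "total": 4.0 * period,
        "onset_interval": 1.0 / repetition_hz,
    }


# A click whose peak equals the peak of a 60 dB SPL tone is 60 dB peak-equivalent SPL.
tone_peak = math.sqrt(2.0) * P_REF * 10.0 ** (60.0 / 20.0)
click_level = peak_equivalent_spl(tone_peak)
pip = tone_pip_timing(900.0)
```

For a 900 Hz pip the whole pip lasts about 4.4 ms, so at a 50 Hz repetition rate most of each 20 ms interval is silent.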

Two sets of tests were done to determine if commonly used probe stimuli elicit efferent activity. The first set of tests was done on four subjects (Group B1), with a run (i.e., a data gathering block) being a randomized level series for a given elicitor, averaging 6–8 responses per level. The ipsilateral (measurement) ear was chosen as the ear that produced the biggest efferent response to broadband noise. In three subjects, two level series of each elicitor were obtained; in one subject, four level series of each elicitor were obtained. The second set of tests was done on six subjects (Group B2), with a run being a randomized presentation of right, left, bilateral, or no elicitor for a single level of one elicitor type, averaging four responses per laterality. We attempted to obtain averages of at least 32 responses for each condition, but fewer for noise elicitors, which were included only for comparison (noise is a robust elicitor of efferent activity and required fewer responses). The amount of data actually obtained was strongly influenced by subject availability to do repeated runs. For pip, click, and tone elicitors, 16–56 (mean = 27) responses were averaged for each condition, and for noise elicitors, 4–16 (mean = 11) responses were averaged. In the subsequent data analysis, three ears were found to have average background fluctuation values for one or more conditions that were greater than 1/8 of that ear’s |ΔSFOAE| and were excluded from further analysis. This rejection criterion was chosen because it removed those ears with frequent nonmonotonic level functions (presumably due to large background fluctuations). Thus, the data for Group B2 are from 9 ears in 6 subjects.

Normalization

When combining data across subjects, we used normalized ΔSFOAE amplitudes. A given amount of efferent activity is expected to produce a larger ΔSFOAE in a subject with a larger original SFOAE. To account for this, ΔSFOAE amplitudes were expressed as a percentage of the baseline SFOAE amplitude. The SFOAE amplitude was determined by the suppression method (Shera and Guinan 1999). The suppressor tones were 100–110 Hz lower in frequency than the probe tone and 20 dB higher in level (i.e., 60 dB SPL) and were 495 ms (subject groups A and C) or 200 ms (subject groups B1 and B2), both presented at 1/s. The SFOAE was calculated from the vector difference of averages in windows with and without the suppressor. The suppressor tone was assumed to suppress all, or almost all, of the SFOAE so that the vector difference would approximately equal the SFOAE. In determining the SFOAE, it was important to interleave the measurements of the baseline and the suppressed sound pressures because multiple SFOAE measurements in the same session (insertion and removal of the ER10c acoustic assemblies mark the beginning and end of a session) showed variations in the baseline that were several times larger than the amplitude of the SFOAE (presumably from small variations in the position of the acoustic assembly in the ear canal). Despite the baseline variations, the SFOAE remained little changed throughout a session.
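The normalization amounts to two vector operations, sketched below with made-up pressures in arbitrary linear units. The function names are hypothetical, and the example assumes, as in the text, that the suppressor removes essentially all of the SFOAE.

```python
def sfoae_from_suppression(p_probe_alone, p_with_suppressor):
    """Estimate the SFOAE as the vector difference of the two averaged pressures."""
    return p_probe_alone - p_with_suppressor


def normalized_delta_percent(delta_sfoae, sfoae):
    """Express the magnitude of the change as a percentage of the SFOAE magnitude."""
    return 100.0 * abs(delta_sfoae) / abs(sfoae)


# Hypothetical values: a source pressure plus an SFOAE of magnitude 0.05.
source = complex(1.0, 0.0)
true_sfoae = complex(0.03, 0.04)
p_probe_alone = source + true_sfoae
p_with_suppressor = source            # complete suppression assumed

sfoae = sfoae_from_suppression(p_probe_alone, p_with_suppressor)
delta_sfoae = complex(0.006, 0.008)   # assumed efferent-induced change, magnitude 0.01
percent = normalized_delta_percent(delta_sfoae, sfoae)
```

Because both pressures are differenced within the same interleaved measurement, slow drifts in the baseline pressure cancel in the SFOAE estimate.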

Subjects

There were four subject groups: Group A for introducing the SFOAE assay and describing some reflex properties: 11 ears in 7 subjects (subject Nos. and ears: 68LR, 82LR, 84LR, 85L, 87L, 88L, 109LR), 5/7 female, average age = 24 years (range = 20–29); Group B1 for probe sounds acting as contralateral elicitors: 4 ears in 4 subjects (82R, 85L, 87L, 88L), 3/4 female, average age = 23 years (range = 20–29); Group B2 for probe sounds acting as ipsilateral, contralateral, and bilateral elicitors: 9 ears in 6 subjects (61L, 68LR, 85L, 87LR, 93R, 109LR), 5/6 female, average age = 26 years (range = 21–30); and Group C for MOC vs. MEM tests: 7 ears in 7 subjects (21R, 24L, 27R, 31R, 35R, 36L, 37R), 3/7 female, average age = 39 years (range = 26–57).

Hearing was tested on each subject using 1/3 octave bands of noise centered at octave frequencies re 1 kHz. For the study of MOC vs. MEM activation (Group C), the subjects had thresholds at frequencies from 0.5 to 4 kHz that were not higher than 10 dB re ANSI standard for tones. For all other studies, the subjects had thresholds at frequencies from 0.25 to 4 kHz that were not higher than 15 dB re ANSI standard for tones.

Subjects were chosen because they had easily measurable efferent effects, low ΔSFOAE background fluctuation, and good hearing. Prospective subjects were screened by taking a series of measurements at closely spaced probe frequencies (e.g., 0.9–1.1 kHz in 20 Hz steps, when 1 kHz was of interest). For this screening, a 60 dB SPL broadband noise elicitor of a single laterality (usually bilateral because that elicited the largest responses) was used. To be included in the data pool, a subject had to have a ΔSFOAE signal-to-background fluctuation ratio of 10 dB or more. Usually this was from a ΔSFOAE >5 dB SPL and a background fluctuation (the ΔSFOAEs obtained from no-elicitor runs) below −5 dB SPL. The closely spaced frequency steps of the screening measurement allowed us to calculate the group delays of the measured change in ear-canal sound pressure, ΔP, and from this to apply a latency test for medial efferent vs. MEM dominance in producing the ΔP (see Results). If the ΔP was dominated by middle-ear muscles, then the elicitor level was lowered by 5–10 dB and the screen was done again.
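The two screening computations, the 10 dB inclusion criterion and a group delay estimated from phase at closely spaced frequencies, might be sketched as follows. The function names are hypothetical. The sign convention (a delay appears as phase lag that grows with frequency) and the least-squares fit are assumptions of this example, not a description of the analysis actually used.

```python
import cmath
import math


def passes_screen(delta_db_spl, background_db_spl, criterion_db=10.0):
    """True if the signal-to-background-fluctuation ratio meets the criterion."""
    return (delta_db_spl - background_db_spl) >= criterion_db


def unwrap(phases_rad):
    """Remove 2*pi jumps between successive phase values."""
    out = [phases_rad[0]]
    for ph in phases_rad[1:]:
        step = ph - out[-1]
        step -= 2.0 * math.pi * round(step / (2.0 * math.pi))
        out.append(out[-1] + step)
    return out


def group_delay(freqs_hz, phases_rad):
    """Delay in seconds from the least-squares slope of phase versus frequency."""
    ph = unwrap(phases_rad)
    n = len(freqs_hz)
    f_mean = sum(freqs_hz) / n
    p_mean = sum(ph) / n
    slope = sum((f - f_mean) * (p - p_mean) for f, p in zip(freqs_hz, ph))
    slope /= sum((f - f_mean) ** 2 for f in freqs_hz)
    return -slope / (2.0 * math.pi)


# Synthetic wrapped phases for a pure 10 ms delay, 0.9-1.1 kHz in 20 Hz steps.
freqs = [900.0 + 20.0 * i for i in range(11)]
phases = [cmath.phase(cmath.exp(-2j * math.pi * f * 0.010)) for f in freqs]
delay = group_delay(freqs, phases)
included = passes_screen(5.0, -5.0)
```

With 20 Hz steps, a 10 ms delay changes the phase by 0.4π per step, which is small enough for the simple unwrapping used here; delays approaching 25 ms would need finer steps.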

RESULTS

Measuring with contralateral, ipsilateral, and bilateral elicitors

Heterodyne waveforms of responses measured with contralateral, ipsilateral, and bilateral elicitors are shown in Figure 2. The complete time course of the efferent-induced ΔSFOAE can be seen when a contralateral elicitor is used (Fig. 2B). In contrast, with ipsilateral and bilateral elicitors, the response during the elicitor presentation is obscured by two-tone suppression (Fig. 2C, D). That there is large two-tone suppression in Figures 2C and D can be appreciated by noting that the ΔSFOAE begins immediately after the elicitor is turned on. This immediate effect is one of the hallmarks of two-tone suppression (Cannon 1976). In contrast, efferent effects require tens to hundreds of milliseconds to build up (e.g., Fig. 2B).

Using a postelicitor analysis window to quantify ΔSFOAE amplitude, we were able to measure efferent-induced changes from contralateral, ipsilateral, or bilateral elicitors (Fig. 2). However, it is evident from the contralateral response that ΔSFOAE is smaller in the postelicitor window than during the elicitor (Fig. 2B). As an estimate of this reduction, for the contralateral responses from 11 ears in which 60 dB SPL broadband noise was used as an elicitor (subject Group A), we compared the ΔSFOAE in the postelicitor window with the ΔSFOAE in a window in the last second of the response (placed 50 ms from the elicitor offset to avoid time smearing by the heterodyne filter). These showed that responses in the postelicitor measurement window were about 0.7 (SD = 0.1) of the amplitude (in pascals) of responses during the elicitor (i.e., about a 3 dB decrease in the ΔSFOAE). Since two-tone suppression obscured the responses during ipsilateral and bilateral noise elicitors, for these lateralities we have no comparable measures of response decay to the postelicitor window. However, preliminary analysis of ΔSFOAE decay time constants suggests that they are about the same after contralateral, ipsilateral, and bilateral broadband noise elicitors (Backus et al. 2003), which suggests that in all cases the response in the postelicitor window is about 3 dB lower than it had been during the elicitor. No correction has been made in this article for response decay from the elicitor to the postelicitor window.
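The conversion between the 0.7 amplitude ratio and the quoted 3 dB decrease is a one-line calculation, shown here for reference (the function name is hypothetical):

```python
import math


def amplitude_ratio_to_db(ratio):
    """Convert a ratio of pressure amplitudes to decibels."""
    return 20.0 * math.log10(ratio)


decay_db = amplitude_ratio_to_db(0.7)   # about -3.1 dB
lower_db = amplitude_ratio_to_db(0.6)   # ratio one SD below the mean
upper_db = amplitude_ratio_to_db(0.8)   # ratio one SD above the mean
```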

The postelicitor measurement technique enables simultaneous measurements in both ears. The heterodyne waveforms shown in Figure 2 were recorded from the right ear during a set of measurements done with bilateral probe tones and elicitor lateralities of right, left, bilateral, and none. Similar measurements were made simultaneously in the left ear. The amplitudes obtained from vector averages of the responses within the postelicitor windows from both ears are shown in Figure 3. The averages from the “None” runs provide estimates of the background fluctuation levels and are labeled “Background Fluctuation” in Figure 3. Note that by using a vector average (sine and cosine parts of the response are averaged separately) instead of a simple average of the amplitudes, the average background fluctuation level is reduced below the background fluctuations of the individual points of the response. This can be seen by comparing the background fluctuation level shown for the right ear in Figure 3 with the background fluctuations shown in the last second of the responses in Figures 2B–D.
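The difference between a vector average and an average of amplitudes can be seen in an idealized case. In the sketch below, a constant response of 0.5 is contaminated by unit-amplitude fluctuations whose phases are evenly spread around the circle; these values are assumptions chosen so that the cancellation is exact, whereas real fluctuations cancel only partially.

```python
import cmath
import math

n = 8
# Idealized fluctuation: unit amplitude, phases evenly spaced over one cycle.
fluctuation = [cmath.rect(1.0, 2.0 * math.pi * k / n) for k in range(n)]
response = complex(0.5, 0.0)
points = [response + f for f in fluctuation]

# Vector average: average the complex values, then take the magnitude.
vector_average = abs(sum(points) / n)

# Simple average of amplitudes: take magnitudes first, then average.
amplitude_average = sum(abs(p) for p in points) / n
```

The vector average recovers the underlying 0.5, while the amplitude average is inflated by the fluctuation because magnitudes cannot cancel.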

The data in Figures 2 and 3 show that simultaneous bilateral measurements can be made on an individual subject. In this case (from a better-than-average subject), responses from four successive presentations of each elicitor laterality were averaged and the overall measurement set took about 2 min. With this degree of averaging and selection of “good frequencies” (see Methods), ΔSFOAE/background fluctuation ratios on individual subjects were adequate (i.e., ratios of 10 dB or more) to measure efferent effects elicited by 60 dB broadband noise in about half of the ears for a probe frequency near 1 kHz. For example, for subject Group A, 10 subjects were screened and 11 of the 20 ears had ΔSFOAE/background fluctuation ratios of 10 dB or more in averages of 6 responses.

For a fixed probe frequency near 1 kHz, ΔSFOAE phase was sufficiently consistent in a given subject that synchronous averaging of the signal in the response window could be done across trials or sessions. Data showing the degree of phase consistency were obtained from the Group A subjects using measurements with ΔSFOAE/background fluctuation levels of 10 dB or more. Phase coherence (obtained from the phases of all points in the response window by summing unit vectors with each phase and dividing by the number of points) averaged 0.95 (SD = 0.11), indicating little variation of phase across runs. Being able to do synchronous averaging is important because it allows data runs to be redone and averaged together in a way that reduces the background fluctuation, thereby increasing the percentage of subjects with good ΔSFOAE/background fluctuation ratios. In contrast, across ears ΔSFOAE phase is less consistent and, while magnitude averaging can be used, this process reduces variation but does not increase the ΔSFOAE/background fluctuation ratio.
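The phase-coherence calculation described above (summing unit vectors at each phase and dividing by the number of points) can be sketched as follows. This sketch assumes that the reported value is the magnitude of the resulting mean vector and that phases are given in radians; the function name and example phases are illustrative.

```python
import cmath
import math

def phase_coherence(phases_rad):
    """Sum unit vectors at each phase, divide by the number of points,
    and return the magnitude (1 = identical phases, 0 = no net alignment)."""
    resultant = sum(cmath.exp(1j * p) for p in phases_rad)
    return abs(resultant) / len(phases_rad)

aligned = [0.3] * 8                            # identical phases
spread = [k * math.pi / 2 for k in range(4)]   # phases evenly spaced around the circle

print(phase_coherence(aligned))
print(phase_coherence(spread))
```

Identical phases give a coherence of 1, and phases spread evenly around the circle give a coherence near 0, so a value of 0.95 indicates closely clustered phases.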

The data of Figure 3 show one example of the laterality of human efferent reflexes for probes near 1 kHz and broadband noise elicitors. Although bilateral elicitors almost always evoked the largest response, the pattern of responses for ipsilateral vs. contralateral elicitors varied across subjects. In addition to variations across subjects, response laterality may be a function of many variables, including elicitor level and probe frequency. So that the issues involved can be properly introduced and discussed, medial efferent response laterality across subjects will be dealt with in a future article.

Do common probe stimuli elicit efferent activity?

To determine whether commonly used probe stimuli elicit efferent activity by themselves, we ran tests using these probe stimuli as elicitors in place of our normal noise-burst elicitor stimuli. The first set of tests was done with these stimuli only in the ear contralateral to the measurement ear. We wanted to include all of the sounds used as efferent probe stimuli (i.e., clicks, tone pips, single tones, and two-tone stimuli that evoke distortion products) but our stimulus system did not allow us to do tests with two-tone elicitors while using a separate probe tone that was also in the ipsilateral ear. For these tests, randomized level functions were run over five sound levels on four subjects (Group B1).

All of the probe stimuli, except the pure tones used for SFOAEs, elicited substantial efferent activity. Average heterodyne waveforms of the ΔSFOAEs from one subject are given in Figure 4. These show that at 60–70 dB SPL all of the probes, except the single tones, elicited efferent activity within the first half second of stimulation and that the different probes elicited efferent activity with similar time courses. Level functions of the responses of all four subjects are shown in Figure 5. Each point in Figure 5 was the vector average over the last second of elicitor stimulation of the heterodyne waveform of one run (typically with six 5 s response periods) in one subject. These data show that all of the probe stimuli, except for the single tones, are potent elicitors of efferent activity, at least in the contralateral ear.

Figure 5 shows that in one subject clicks elicited particularly large changes (the off-scale lines in panel B). We suspect that these changes were due to the clicks eliciting MEM responses, but this was not specifically tested. All of the subjects used for Figure 5 were tested for MEM effects with a 60 dB SPL wideband noise, and this stimulus elicited responses that were dominated by medial efferent effects, not MEM effects (as explained in the next subsection). It is possible, however, that clicks (which have spectra that are shaped by the acoustic system in the ear) are more potent in eliciting MEM responses than wideband noise bursts (which have flattened spectra). Whether the extra-large responses in Figure 5B are due to efferents or to middle-ear muscles, the results reveal problems with using clicks as probe stimuli for measuring efferent effects.

Although Figures 4 and 5 show that all of the probe stimuli, except for single tones, are potent elicitors of efferent activity in the contralateral ear, efferent activity in the ipsilateral ear is of greater interest because the ipsilateral ear is where these probe stimuli are used to monitor efferent effects. To explore how much efferent activity was elicited in the ipsilateral ear, we performed additional tests on six subjects (Group B2). These tests had the elicitor stimuli presented in left, right, both, or neither ears, with assessment of the evoked efferent activity from measurements in the postelicitor window of ΔSFOAEs using 40 dB SPL, ~1 kHz probe tones in both ears. Because of the limitations of our measuring system, we were unable to present the three simultaneous stimuli required to do these tests with tone-pair stimuli in the ipsilateral ear; thus, the tests were done using clicks, tone pips, and single tones. For comparison, we also used wideband noise elicitors (although, unlike the other stimuli, these are never used as probe stimuli).

Results for the four elicitors are shown in the four columns of Figure 6. The top row shows normalized ΔSFOAE magnitudes, averaged across subjects, for ipsilateral (○), contralateral (Δ), bilateral (□), and none (X) elicitor stimuli as a function of elicitor sound level. To save time, 40 dB stimuli were not used for pips and clicks. For noise bursts, 70 dB stimuli were not used because our system could not deliver flattened noise bursts at this level in some subjects, and because 70 dB noise bursts often evoked MEM responses (see the next subsection).

The results of Figure 6 show that stimuli normally used as probe sounds evoke significant efferent activity in both ipsilateral and contralateral ears, as shown by the ΔSFOAEs produced. Except for the lowest level of each elicitor type and the contralateral tone responses, all of the responses in Figure 6 were significantly different at the 0.05 level from the corresponding background fluctuation measurements (details in Fig. 6 caption). Clicks and tone pips were similar to broadband noise in that the efferent effects ipsilateral and contralateral to the elicitors were approximately the same magnitude, and bilateral elicitors evoked efferent effects that were about twice as large. Curiously, the effects evoked by tones did not fit this pattern. As found earlier (Figs. 4 and 5), contralateral tones evoked little, if any, ΔSFOAE. In contrast, for sound levels of 50 dB SPL and above, substantial ΔSFOAEs were elicited by ipsilateral and bilateral tones. Close examination of these ΔSFOAEs revealed that (1) their phases were usually different from the phases of other ΔSFOAEs and (2) during the elicitor, the average ΔSFOAE magnitude was greater for ipsilateral than for bilateral tones, whereas the opposite was true for pip, click, and noise elicitors. These observations suggest that the ΔSFOAEs evoked by high-level ipsilateral and bilateral tones do not have the same origin as ΔSFOAEs evoked by the other stimuli. The origin of these ΔSFOAEs is considered further in the Discussion section. It should be noted that at 40 dB SPL, the level at which tones are used as probe stimuli, contralateral, ipsilateral, and bilateral tones produced only small ΔSFOAEs that were not significantly different from background fluctuations.

Distinguishing medial efferent effects from MEM effects

With the SFOAE efferent assay, we distinguished MOC efferent effects from MEM effects by using the large difference in their group delays. SFOAEs have long group delays (Kemp and Chum 1980; Shera and Guinan 1999, 2003) that can be thought of as the time required for a cochlear traveling wave to propagate to its resonant place and return to the ear canal. Because of this, MOC-induced ΔSFOAEs also have long group delays. In contrast, MEM-induced changes in ear-canal sound pressure (ΔP) have short group delays because the middle-ear impedance changes produced by the muscle contractions affect ear-canal sound pressure with little or no group delay. In terms of Figure 1, the SFOAE vector has a long group delay while the source–pressure vector has a short group delay and changes in a vector have the group delay of that vector.

We determined group delays by measuring the ΔP evoked by an elicitor, as the probe tone was swept over a narrow frequency range. With such a measurement, the group delay (GD) is the negative of the slope of the ΔP-phase vs. probe-frequency function, i.e., GD = −dΦ/df, where Φ is the phase of ΔP in periods and f is the probe frequency in Hz. Thus, shallow phase slopes signify short group delays and steep phase slopes signify long group delays.
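The relation GD = −dΦ/df, with Φ in periods and f in Hz, can be illustrated with a minimal sketch that estimates the slope by a least-squares line fit. The fit method, the function name, and the synthetic phase values are assumptions made for illustration; the sketch also assumes the phases have already been unwrapped.

```python
def group_delay_seconds(freqs_hz, phases_periods):
    """Group delay as the negative least-squares slope of phase (periods)
    versus probe frequency (Hz). Phases are assumed to be unwrapped."""
    n = len(freqs_hz)
    mean_f = sum(freqs_hz) / n
    mean_p = sum(phases_periods) / n
    num = sum((f - mean_f) * (p - mean_p)
              for f, p in zip(freqs_hz, phases_periods))
    den = sum((f - mean_f) ** 2 for f in freqs_hz)
    return -num / den

# Synthetic example: probe frequencies near 1 kHz (hypothetical values).
freqs = [980.0, 990.0, 1000.0, 1010.0, 1020.0]
steep_phase = [-0.010 * f for f in freqs]     # steep slope: 10 ms delay
shallow_phase = [-0.0001 * f for f in freqs]  # shallow slope: 0.1 ms delay

gd_steep = group_delay_seconds(freqs, steep_phase)
gd_shallow = group_delay_seconds(freqs, shallow_phase)
print(f"steep slope:   {gd_steep * 1000:.2f} ms")
print(f"shallow slope: {gd_shallow * 1000:.2f} ms")
```

Because the phase is in periods and the frequency in Hz, the slope has units of seconds, so no factor of 2π is needed.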

Data from one subject in which the sound pressure of the elicitor noise bursts was varied from 45 to 75 dB SPL are shown in Figure 7. At elicitor levels 45–65 dB SPL, ΔP had steep phase slopes indicating that MOC effects dominated (Fig. 7C–E). The first run at 75 dB SPL (Fig. 7B) showed a “mixed” response with two lateralities showing relatively flat phase slopes and one showing a high phase slope. Another run was done with an elicitor level of 75 dB SPL and with the probe tone increased from 30 to 50 dB SPL; this showed a dominant MEM effect (Fig. 7A). These data are consistent with the interpretation that the 75 dB SPL elicitor evoked both MOC and MEM reflexes. By increasing the probe tone, the ΔP produced by the MEM reflex is expected to increase more than the ΔP produced by the MOC reflex because the MEM-induced change is an impedance change which would produce a linear increase in its ΔP, whereas the MOC-induced change is a change in the ΔSFOAE which grows compressively over the range 30–50 dB SPL. Thus, with the 50 dB probe tone, the MEM reflex was emphasized more and dominated the response. This shows that the 75 dB response contained both MEM and MOC components.
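The argument about probe level can be made concrete with a small sketch. It assumes that the MEM-induced ΔP grows linearly with probe level (1 dB per dB) and, purely for illustration, that the MOC-induced ΔSFOAE grows at a compressive 0.5 dB per dB. The compressive slope is a hypothetical value, not a measured one.

```python
def delta_p_growth_db(probe_increase_db, growth_slope_db_per_db):
    """Growth of a ΔP component (in dB) for a given increase in probe level."""
    return probe_increase_db * growth_slope_db_per_db

PROBE_INCREASE_DB = 50 - 30   # probe tone raised from 30 to 50 dB SPL (from the text)
MEM_SLOPE = 1.0               # linear growth: impedance change scales with probe pressure
MOC_SLOPE = 0.5               # hypothetical compressive growth of the ΔSFOAE

mem_growth = delta_p_growth_db(PROBE_INCREASE_DB, MEM_SLOPE)
moc_growth = delta_p_growth_db(PROBE_INCREASE_DB, MOC_SLOPE)
relative_shift = mem_growth - moc_growth

print(f"MEM component grows {mem_growth:.0f} dB, MOC component grows {moc_growth:.0f} dB")
print(f"MEM gains {relative_shift:.0f} dB relative to MOC")
```

Under these assumptions the MEM component gains 10 dB relative to the MOC component, which is the direction of change needed for the MEM reflex to dominate at the higher probe level.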

In 7 subjects (Group C) we measured group delays as a function of closely spaced frequencies and over a range of elicitor levels using unflattened broadband noise elicitors. In some subjects, multiple runs were done at one level and one set of frequencies. The resulting data are summarized in Figure 8. Whether MOC or MEM effects dominated ΔP values depended strongly on elicitor level, and in some cases varied across runs using the same stimulus. Points were scored as MOC dominated or MEM dominated only if all runs at that elicitor level gave consistent results (including all three lateralities and any multiple runs at that elicitor level), otherwise they were scored as mixed. As shown in Figure 8, 50 dB SPL elicitors always elicited ΔP values that were dominated by MOC effects. Elicitors of 55 dB SPL or above sometimes elicited ΔP values that were dominated by MEM effects. Thus, MEM thresholds in some of our subjects were as low as 55 dB SPL. Based on these group delay measures, elicitor noise levels must be 50 dB SPL or less to be free of MEM-dominated effects in all subjects.
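The scoring rule described above can be expressed as a short sketch. It takes as input a label already assigned to each run and laterality at one elicitor level; how an individual run is judged to be MOC or MEM dominated from its phase slope is not modeled here, and the function and label names are hypothetical.

```python
def score_elicitor_level(run_labels):
    """Score one elicitor level from per-run labels ("MOC" or "MEM").

    The level is scored as MOC or MEM dominated only if every run
    (all lateralities and any repeated runs) agrees; otherwise it is "mixed".
    """
    labels = set(run_labels)
    if labels == {"MOC"}:
        return "MOC"
    if labels == {"MEM"}:
        return "MEM"
    return "mixed"

print(score_elicitor_level(["MOC", "MOC", "MOC"]))         # prints "MOC"
print(score_elicitor_level(["MEM", "MEM", "MOC"]))         # prints "mixed"
print(score_elicitor_level(["MEM", "MEM", "MEM", "MEM"]))  # prints "MEM"
```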

DISCUSSION

Common probe stimuli elicit efferent activity

The results reported here provide strong evidence that sounds commonly used to produce OAEs for efferent tests (clicks, tone pips, and pairs of tones) also elicit efferent activity by themselves, at least when used at moderately high sound levels. It is not surprising that efferent activity is elicited by DPOAE-producing tone pairs. Such sounds have been used intentionally as combined elicitor and probe stimuli in both animal and human studies (Liberman et al. 1996; Kim et al. 2001). In contrast, clicks and tone pips are not described as eliciting efferent activity in the publications where they are used as probe stimuli. Nonetheless, our results show they do elicit efferent activity and these results are consistent with previous reports. Many papers have shown that the bandwidth of a sound is one of the most important features in determining potency for eliciting efferent activity (Berlin et al. 1993; Norman and Thornton 1993; Maison et al. 1997, 1999, 2000; Lilaonitkul et al. 2002). Narrowband sounds elicit little efferent activity while wideband sounds are potent elicitors of efferent activity. Consistent with this, click elicitors at levels as low as 17.5 dB SL evoke efferent activity (Veuillet et al. 1991; Berlin et al. 1993). Although Liberman and Brown (1986) reported that clicks do not elicit efferent activity in anesthetized cats, this statement was for clicks at 10/s. Veuillet et al. (1991) found that efferent effects varied considerably with click presentation rate and that there was little or no efferent activation for rates less than 20/s. Thus, existing data are consistent in indicating that clicks, at the 50/s rate typically used for probe stimuli, are potent elicitors of efferent activity.

While existing data indicate that clicks are potent elicitors of efferent activity, it might be thought that tone pips are narrowband and therefore do not elicit much efferent activity. However, the very short tone pips typically used (1 cycle rise and fall times and 2 cycle plateaus) have considerable spectral splatter which, evidently, is enough to make them elicit efferent activity. Also, the high (50/s) presentation rate of the tone pips is probably important. Note that 50/s was not a particularly efficacious modulation rate for eliciting efferent activity when used to amplitude modulate tones and noise (Maison et al. 1997, 1999). Perhaps the fact that 50/s stimuli were used as probe sounds in these experiments influenced the efficacy of the 50/s modulation rate. Whatever the reasons, 50/s clicks and tone pips, as well as tone-pair stimuli, clearly elicit efferent activity when used as probe sounds. Thus, measurements using these probes may be significantly influenced by the efferent activity elicited by the probes.

Tones evoke little or no efferent activity in the contralateral ear, as judged by the ΔSFOAEs produced (Figs. 4, 5, and 6), but evoke ΔSFOAEs in the ipsilateral ear whose origin is not yet clear. The issues involved in interpreting changes in OAEs from any ipsilateral elicitor are discussed in the next subsection. As outlined there, it seems most likely that the ΔSFOAEs from high-level ipsilateral tones are due to “intrinsic cochlear effects,” not efferent effects. Our finding that there is little or no efferent activity evoked by contralateral tones is consistent with other human data. In humans, tones used as efferent elicitors (mostly in the contralateral ear) had high thresholds and produced weak efferent effects (Berlin et al. 1993; Maison et al. 1997, 2000). Even when tones were amplitude or frequency modulated, which increases their potency as elicitors, the threshold for eliciting contralateral efferent effects was ≥44 dB SPL (Maison et al. 1997, 1998). In contrast, results from animals show some tuning curves from MOC fibers with thresholds that were not far above the thresholds of auditory nerve fibers (Robertson and Gummer 1985; Liberman and Brown 1986; Liberman 1988). The possible difference between human and animal results may be because (1) the animal efferents fired at low rates in response to low-level tones so that they would change OAEs very little, or (2) perhaps tones are much less potent in humans because cochlear tuning is much sharper in humans than in animals (Shera et al. 2002), so that in humans a tone excites only a small number of auditory nerve fibers, which in turn excite only a small number of efferent fibers.

When considering clicks, tone pips, tone pairs, and single tones as probe sounds, the most relevant question is: “Do these probes evoke efferent activity when presented at the levels at which they are used as probes?” In our efferent assay, tones are used at 40 dB SPL and at this level no statistically significant ΔSFOAEs are produced ipsilaterally, contralaterally, or bilaterally (Figs. 4, 5, and 6). In contrast, clicks, tone pips, and two-tone stimuli are normally used as probes at considerably higher levels, 55–70 dB SPL (or pSPL). Clicks appear to be the worst probe in that they evoke the most efferent activity at any level (and perhaps MEM contractions too); furthermore, they tend to be used at higher levels than the other probes. Note that both clicks and pips evoked significant efferent activity at 60 dB SPL for all elicitor lateralities. Ironically, click and tone pip stimuli elicited more efferent activity than DPOAE-producing tone pairs, at least in the contralateral ear (Fig. 5), although DPOAE-producing tone pairs are the only probe stimuli that have been intentionally used as elicitor stimuli in humans (Kim et al. 2001).

If probe stimuli evoke efferent activity, a relevant question is: How much does this efferent activity change the measurement being made? The data of Figure 6 provide one estimate of this. When clicks or tone pips are used as probe tones, the “apparent contralateral efferent effect” is the change produced by the addition of the contralateral elicitor, with the ipsilateral probe already present. In the context of Figure 6, this increase can be calculated as the ΔSFOAE from the bilateral elicitor minus the ΔSFOAE from the ipsilateral elicitor. The actual contralateral effect is shown directly by the contralateral points. These two metrics for contralateral efferent effects are plotted against each other in the bottom row of Figure 6. If the two metrics were equal, the points would fall on the diagonal line. Figure 6 (bottom) shows that for clicks and tone pips at the highest level, there is a substantial difference, i.e., an error in using (bilateral − ipsilateral) as a substitute for contralateral, but relatively little error at other levels. In contrast, the data from noise elicitors (noise is never used as a probe sound) show little error even when large efferent effects are produced. Another way to look at these data is that for results with little “error,” the ipsilateral and contralateral responses summate almost by simple addition, but with high-level clicks and pips, the summation is not simple addition. One possibility is that this difference is produced by included MEM contractions that show facilitation for bilateral stimuli.
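The comparison between the two metrics can be sketched as follows, working with normalized ΔSFOAE magnitudes. The numerical values are hypothetical and chosen only to illustrate an additive case and a nonadditive case; they are not data from Figure 6.

```python
def apparent_contralateral_effect(bilateral, ipsilateral):
    """Change attributed to the contralateral elicitor when the ipsilateral
    probe is already present: bilateral minus ipsilateral ΔSFOAE."""
    return bilateral - ipsilateral

def substitution_error(bilateral, ipsilateral, contralateral):
    """Error from using (bilateral - ipsilateral) in place of the
    directly measured contralateral ΔSFOAE."""
    return apparent_contralateral_effect(bilateral, ipsilateral) - contralateral

# Hypothetical normalized ΔSFOAE magnitudes (illustrative only).
additive_error = substitution_error(bilateral=2.0, ipsilateral=1.0, contralateral=1.0)
nonadditive_error = substitution_error(bilateral=3.0, ipsilateral=1.0, contralateral=1.0)

print(additive_error)     # prints 0.0: point falls on the diagonal
print(nonadditive_error)  # prints 1.0: apparent effect overestimates the actual one
```

When ipsilateral and contralateral responses summate by simple addition the error is zero; any departure from additivity appears directly as a nonzero error.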

The above analysis indicates clear problems in using clicks and tone pips as probe sounds at 70 dB SPL but fewer problems when they are used at 60 dB SPL. However, these results must be interpreted with a few caveats. The data in Figure 6 apply directly only for cases in which the contralateral elicitor is exactly the same sound as the ipsilateral probe. The results may be different if the elicitor and probe are presented at different sound levels, are pips at different frequencies, or are not the same kind of sound. For instance, a tone pip probe may have more (or less) effect when used with elicitors at frequencies near the probe frequency than with elicitors at distant frequencies (which would change the apparent bandwidth summation for elicitors of different bandwidths). Efferent activity evoked by the probe might distort the effect of the elicitor by making the elicitor evoke more efferent activity (e.g., by summation of the probe and elicitor signals in the brain stem) or perhaps by making the elicitor evoke less efferent activity (e.g., by brain stem inhibition or saturation). In addition, efferent activity elicited by the probe might affect the OAE change evoked by the elicitor through nonlinearities in the cochlea (e.g., the probe-elicited activity may already saturate the efferent-effect mechanism in the cochlea). The relative weight of these effects is unknown and is likely to depend on the circumstances. Thus, no matter whether efferent-evoking probes increase or decrease the apparent contralateral effect, such probes certainly increase the uncertainty in interpreting the results and decrease the ability to compare efferent effects from different elicitors. To reduce such problems, these probes should be used at as low a sound level as is practical, never at 70 dB or higher.

Measuring ipsilateral and bilateral efferent effects

It might be thought that the main difficulty in measuring ipsilateral efferent effects is acoustic interference between the high-level elicitor and the low-level OAE. However, our results show that the biggest problem is two-tone suppression. Acoustic interference can be removed by alternating the sign of the elicitor across trials so that the elicitor sound cancels out in the average. In contrast, there is no way to remove the effects of two-tone suppression. For an ipsilateral elicitor, two-tone suppression can be prevented by having the frequency content of the elicitor be far from the probe frequency. However, this excludes the most interesting ipsilateral elicitor–probe combinations. The property that allows ipsilateral and bilateral efferent effects to be measured in the postelicitor window is the difference in time course between suppression and efferent effects (Guinan 1990, 1996). Suppression is almost simultaneous (Cannon 1976; Tavartkiladze et al. 1996) but is spread out in time somewhat by measurement-system filtering and OAE travel time. In contrast, efferent effects have decay times on the order of 100 ms (and are also affected by filtering and OAE travel time). Thus, by using fast filtering and a properly delayed postelicitor window (e.g., Fig. 2), ipsilateral and bilateral as well as contralateral efferent effects can be measured, although with some loss of amplitude. Use of a postelicitor measurement window is not restricted to SFOAEs; it can be used with any probe stimulus to obtain measurements with contralateral, ipsilateral, and bilateral elicitors (e.g., Berlin et al. 1995).
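The cancellation of acoustic interference by alternating elicitor polarity can be illustrated with a minimal sketch. It assumes that each recorded trial is a linear sum of the elicitor pressure and the emission, and that the emission does not depend on elicitor polarity; the waveforms are short synthetic lists and the function name is hypothetical. Suppression, being a nonlinear cochlear effect, is not modeled and would not be removed by this averaging.

```python
def average_waveforms(trials):
    """Point-by-point average across trials of equal length."""
    n = len(trials)
    return [sum(values) / n for values in zip(*trials)]

# Synthetic waveforms (arbitrary units); values chosen to be exact in binary.
elicitor = [0.5, -0.25, 0.75, -0.5]    # elicitor pressure in the ear canal
emission = [0.125, 0.25, -0.125, 0.0]  # emission, assumed polarity independent

# Alternate the sign of the elicitor on successive trials.
trials = []
for sign in (+1, -1, +1, -1):
    trials.append([sign * e + o for e, o in zip(elicitor, emission)])

averaged = average_waveforms(trials)
print(averaged)  # the elicitor terms cancel, leaving the emission waveform
```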

A second factor to be considered when measuring efferent effects with an elicitor in the ipsilateral ear is whether the elicitor produces changes in the OAE by “intrinsic cochlear processes,” i.e., by processes that are not mediated by efferents or due to systemic changes. Changes in 2f₁ − f₂ DPOAEs produced by intrinsic cochlear processes were demonstrated by Liberman et al. (1996) using a DPOAE adaptation paradigm. Cutting the efferents in the brain stem removed the fast adaptation (time constant ~100 ms) due to efferent activity evoked by the DPOAE primary stimuli, but it did not remove a much slower adaptation (time constant ~1 s) also elicited by the DPOAE primary stimuli. Since this slow adaptation was present without efferents, it must have been due to intrinsic cochlear processes (although it might be modified by efferent effects as are most cochlear processes). A somewhat similar adaptation of f₂ − f₁ DPOAEs has also been shown by brain stem cuts to be due to nonefferent mechanisms (Kujawa et al. 1995; Lowe and Robertson 1995). Another elicitor-induced reduction in OAE amplitude has been seen with very high rate clicks (maximum length sequences); this reduction is not due to efferents (Hine et al. 1997) and may also be due to intrinsic cochlear processes and/or to suppression. The origin of the intrinsic cochlear processes that produce these effects is unknown but two classes of candidates can be suggested: (1) local changes in K+ concentrations (Johnstone et al. 1989) or levels of ATP, calcium, or other chemicals in the organ of Corti (Chen et al. 1998) and (2) sound-evoked synaptic actions in the nonefferent neural network beneath OHCs in humans, particularly the reciprocal synapses on OHCs which may be formed by processes from type II spiral ganglion cells (Thiers et al. 2002a, b).

Any ipsilateral OAE-evoking sound might produce intrinsic cochlear changes. If such changes are due to local cochlear phenomena produced by the sound activation, then, presumably, they would be greater when the local cochlear response to sound is greater. Thus, SFOAEs and DPOAEs from primaries at 60–70 dB might experience the largest intrinsic cochlear changes because they are evoked by continuous tones which produce concentrated activation patterns in the cochlea. In contrast, clicks and pips may have higher peak SPLs than SFOAE and DPOAE primary tones, but they have much lower duty cycles and less concentrated activity patterns in the cochlea and, therefore, might produce less intrinsic cochlear change. Broadband noise elicitors are continuous sounds (i.e., have 100% duty cycle) but their energy is also spread throughout the cochlea so they might be expected to produce relatively little intrinsic cochlear effect at any one cochlear place.

In light of these possibilities, we now consider the origin of the substantial ΔSFOAEs produced by moderate-to-high level tones in the ipsilateral ear. Although the evidence is not conclusive, these ΔSFOAEs seem more likely to be due to intrinsic cochlear processes than to efferent effects. First, note that ΔSFOAEs from high-level ipsilateral tones were different in several ways from other ΔSFOAEs: (1) Their laterality was different during the elicitor. For pip, click, and noise elicitors, bilateral elicitors produced slightly larger ΔSFOAEs during the elicitor than ipsilateral elicitors (during the elicitor, these ΔSFOAEs are from suppression plus efferent effects). In contrast, for tones after about 0.5 s of stimulation, the average ΔSFOAE was larger for ipsilateral than for bilateral elicitors. (2) Their laterality was different in the postelicitor window. For pip, click, and noise elicitors, ipsilateral and contralateral sounds evoked ΔSFOAEs in the postelicitor window that were similar in magnitude, and bilaterally evoked ΔSFOAEs were about two times larger. In contrast, for high-level tones the ipsilateral ΔSFOAE was much greater than the contralateral ΔSFOAE and the bilateral ΔSFOAE was almost identical to the ipsilateral ΔSFOAE (Fig. 6). Finally, (3) for large ΔSFOAEs, ΔSFOAE phase was different for ipsilateral and bilateral tone elicitors than for pips, clicks, and noise elicitors. These differences indicate that the ΔSFOAE produced by high-level ipsilateral tones has characteristics that set it apart from the ΔSFOAEs produced by other elicitors, and the differences are not easily explained by supposing that these ipsilateral tones evoke a large amount of efferent activity. Second, there are several other reasons for thinking that ipsilateral tones do NOT evoke particularly high levels of efferent activity. In animals, where the efferent activity evoked by sounds has been measured directly, there is no indication that efferent activity from ipsilateral tones is very much greater than for contralateral tones, particularly for 1 kHz. Across all frequencies ipsilateral is about two times larger than contralateral in medial efferent innervation and the number of responding neurons, but at low frequencies (e.g., near 1 kHz) the ipsi/contra ratio is almost equal (Guinan et al. 1983; Liberman 1988). Finally, a wide range of human data indicate that the potency of elicitors decreases as their bandwidths decrease (Berlin et al. 1993; Norman and Thornton 1993; Maison et al. 1997, 1999, 2000), including ipsilateral bands of noise (Lilaonitkul et al. 2002), which is consistent with the conclusion that tones are a weak elicitor of efferent activity.

One possible explanation for the substantial ΔSFOAEs produced by moderate-to-high level tones in the ipsilateral ear is that they are due to intrinsic cochlear processes. However, intrinsic cochlear processes that produce long-lasting changes do not appear to be candidates because the ΔSFOAEs produced by ipsilateral tones decay relatively fast, similar to the decays of ΔSFOAEs from other elicitors (average waveforms from Group B2 data at 60 dB SPL show a decay time constant of 194 ms for ipsilateral tones and 147 ms for ipsilateral broadband noise; see Backus et al. 2003). To understand what might cause decays in this range, we need to consider the processes by which efferents produce their effects.

Medial efferents produce their effects by synapses that release acetylcholine (ACh) onto OHCs thereby activating a molecular cascade that leads to the opening of calcium-activated potassium channels in the OHC cell membrane (Housley and Ashmore 1991; Evans 1996). After MOC efferents stop firing and releasing ACh, the (“fast”) efferent effect decays with a time constant on the order of 100 ms (Wiederhold and Kiang 1970; Sridhar et al. 1995). The decay time constant appears to be the time required to stop the OHC processes that activate the potassium channels. A second efferent effect that builds up and decays over tens of seconds (the “slow” efferent effect) has been found in the high-frequency region of guinea pigs (Sridhar et al. 1995; Cooper and Guinan 2003). While this “slow” efferent effect may be present in humans, our measurements are on a 5 s time base (Fig. 2) which is too short to show efferent “slow” effects. Thus, for present purposes, we need to consider only efferent “fast” effects.

In addition to MOC synapses, there are nonefferent synapses on OHCs, some of which are reciprocal synapses (Thiers et al. 2002a, b). These nonefferent OHC synapses have efferent-like parts with presynaptic vesicles and postsynaptic cisternae similar to MOC synapses on OHCs. It is not known, however, if acetylcholine, the transmitter in MOC synapses, is also the efferent-direction transmitter in the nonefferent OHC synapses. Release of transmitter by these nonefferent synapses onto OHCs may activate the same OHC processes that are activated by medial efferents. If so, then the effects of these nonefferent synapses should decay with the same time constant as efferent effects. Thus, an interesting hypothesis for the origin of the ΔSFOAEs evoked by high-level ipsilateral tones is that they are due to local activation of the reciprocal and other non-olivocochlear-efferent synapses on OHCs. The hypothesis presumes that activation of the nonefferent synapses does not require efferent activation, but it does not rule out that efferent activation might influence these synapses. This hypothesis appears to account for our ipsilateral tone data, but it must be considered speculative until more work is done to establish it. If the hypothesis is correct, then ΔSFOAEs produced by high-level tones would be expected to be found in animals that have a rich endowment of reciprocal and other non-olivocochlear-efferent synapses on OHCs (e.g., primates; Thiers et al. 2002a, b; Francis and Nadol 1993) but not in animals with few or none of these synapses (perhaps nonprimate mammals). However, OHC reciprocal synapses may be common across mammalian species, in which case invasive experiments could be done more easily, e.g., measuring ΔSFOAEs from high-level tones after efferents are cut or drugs are perfused through the cochlea. Whether or not this hypothesis proves correct, it seems likely that the ΔSFOAEs evoked by high-level ipsilateral tones are not due to evoked efferent activity. These ΔSFOAEs are different from ΔSFOAEs evoked by other sounds, sounds for which there is ample reason to believe that they produce ΔSFOAEs by evoking medial efferent activity.

Distinguishing MOC effects from MEM effects

The data in Figure 8 show that the lowest level at which noise-burst-elicited responses were affected by MEM responses varied across subjects, which shows the need for MOC vs. MEM tests to be done on each subject. Most OAE-based studies of efferent effects do not employ any such test, although all OAEs (except spontaneous OAEs) provide relatively easy ways to test for MOC effects vs. MEM effects. For all evoked OAE measurements, the ear-canal sound is composed of (1) the “primary sound,” due to the sound source acting on the passive impedance of the middle ear at the tympanic membrane, and (2) the OAE, which originates from processes in the cochlea. In general, MOC efferents act only on the OAE, while MEMs act on the primary sound by changing the impedance of the middle ear and also on the OAEs by changing transmission through the middle ear. This means that the primary sound is changed only by MEMs and not by MOCs. Thus, to determine whether MEMs are having a substantial effect on the measurement, one only needs to measure whether there is a change in the primary sound (i.e., for clicks: the initial wave of the click; for tone pips: the pip sound; for DPOAE measurements: the two tones). If there is no change in the primary sound, then there is likely to be little or no change in the OAEs due to MEMs. The test is not perfect. It is possible that a weak MEM contraction, particularly a stapedius contraction, changes middle-ear transmission while making a negligible change in the impedance of the ear and therefore a negligible change in the primary sound. Nonetheless, it seems likely that most MEM contractions would be detected by this method and that its use would help clarify many studies in which it is not clear whether the measured changes were due to MOC or MEM effects. Note that for illustration purposes the SFOAE vector is greatly exaggerated in Figure 1. Actually, the source pressure vector is much larger than the SFOAE vector. Thus, small fractional changes in the source pressure vector can swamp large fractional changes in the SFOAE vector. This means that MEM-induced changes can far outweigh any MOC-induced change.
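
The arithmetic behind this point can be sketched with a short Python calculation. The numbers are illustrative assumptions, not measurements from this study: the source pressure is taken to be 30 dB above the SFOAE, the MEM effect is taken as a 1% change in the source pressure vector, and the MOC effect as a 30% change in the SFOAE vector. The function names are hypothetical.

```python
def db_to_ratio(level_db):
    """Convert a level difference in dB to a pressure (amplitude) ratio."""
    return 10 ** (level_db / 20)


def change_magnitudes(source_re_sfoae_db, source_fraction, sfoae_fraction):
    """Return (|change in source vector|, |change in SFOAE vector|).

    Pressures are expressed in units of the SFOAE magnitude, so the SFOAE
    vector has magnitude 1 and the source vector has magnitude
    db_to_ratio(source_re_sfoae_db).
    """
    source_magnitude = db_to_ratio(source_re_sfoae_db)
    sfoae_magnitude = 1.0
    return source_fraction * source_magnitude, sfoae_fraction * sfoae_magnitude


# Assumed, illustrative values (not taken from the data in this paper):
# source 30 dB above the SFOAE, 1% source change, 30% SFOAE change.
mem_change, moc_change = change_magnitudes(30.0, 0.01, 0.30)
print(f"MEM-induced change: {mem_change:.3f}  MOC-induced change: {moc_change:.3f}")
```

Under these assumed values, the 1% change in the source vector is slightly larger in absolute terms than the 30% change in the SFOAE vector, which is the sense in which a small MEM effect can swamp a large MOC effect.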

The “change in primary sound” test is somewhat harder to interpret for DPOAE measurements than for transient probe stimuli because each primary tone of a DPOAE measurement also has an accompanying SFOAE that can be affected by MOCs. Transient probe stimuli (clicks and pips) do not have this problem because their primary sounds are separated in time from their OAEs. SFOAEs at the primary frequencies should be a minor problem in DPOAE tests because these SFOAEs will be small compared with the primary tones for typical DPOAE primaries. The lower-frequency primary (F1) is normally presented at a high level (60 dB SPL) at which the SFOAE is a very small percentage of the primary amplitude. For the higher-frequency primary (F2), the SFOAE will be partly suppressed by the higher-level F1 primary so that, again, the SFOAE is a small percentage of the primary amplitude.

An alternate approach to determining the involvement of MOC vs. MEM effects for DPOAE stimuli involves looking at the DPOAE phase change as a function of frequency (Buki et al. 2000). While this method may provide useful insights, it is complicated to apply and requires measurements at many frequencies. It should be noted that the DPOAE phase change as a function of frequency obtained with a fixed F2/F1 ratio, as in Buki et al. (2000), is not expected to have a long group delay because it arises from a distortion source (in contrast, SFOAEs arise from reflection sources and have a long group delay; see Shera and Guinan 1999). It should be possible to apply a group delay test to DPOAE data using phase gradients obtained while keeping F1 or F2 fixed and varying the other frequency, but the “delays” calculated from these phase gradients should be interpreted with caution (see Shera et al. 2000).

The results in Figures 7 and 8 were obtained with our original testing system which used unflattened noise elicitors. With our current system, in which the elicitor noise spectrum is flattened over the 0.1–10 kHz frequency range, we routinely use 60 dB SPL noise elicitors and usually find MOC-dominated effects by the group delay test. We have not explored the difference between these two sets of results in detail, but the difference appears to be adequately explained by the level changes involved in flattening the elicitor noise. In an ear, the ER10c acoustic assembly typically delivers lower sound levels at frequencies above 5 kHz, so flattening (while keeping the overall SPL the same) raises the level of the high frequencies and lowers the level of the low frequencies. There are few data on human MEM reflex thresholds for frequencies over 5 kHz, but existing data suggest that frequencies near 1 kHz are more effective in eliciting MEM responses than frequencies in the range of 4–10 kHz (Wilson and McBride 1978; Gelfand 1984; Guinan and McCue 1987). If sound frequencies above 4–5 kHz are less effective in eliciting MEM contractions, then flattening will lessen the effectiveness of the stimulus in eliciting MEM responses. Thus, the applicability of the pattern shown in Figure 8 for eliciting MOC vs. MEM activity at various noise levels depends on the detailed properties of the broadband noise in the ear canal.
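
The level trade-off involved in flattening can be illustrated with a minimal Python sketch. It assumes the noise is split into a few equal-width, incoherent bands whose powers add; the band levels are hypothetical and are not measurements of the ER10c output. The function names are likewise illustrative.

```python
import math


def overall_level_db(band_levels_db):
    """Overall level of incoherent bands: power sum expressed in dB."""
    return 10 * math.log10(sum(10 ** (level / 10) for level in band_levels_db))


def flatten_keeping_overall(band_levels_db):
    """Return equal band levels whose power sum matches the original overall level."""
    n_bands = len(band_levels_db)
    per_band = overall_level_db(band_levels_db) - 10 * math.log10(n_bands)
    return [per_band] * n_bands


# Hypothetical unflattened band levels (dB SPL), ordered from low to high
# frequency and falling off in the highest bands.
unflattened = [55.0, 55.0, 55.0, 45.0, 40.0]
flattened = flatten_keeping_overall(unflattened)

for before, after in zip(unflattened, flattened):
    print(f"{before:5.1f} dB SPL -> {after:5.1f} dB SPL")
```

With the overall level held fixed, the bands that started above the flattened level (here the low-frequency bands) come down and the bands that started below it (the high-frequency bands) go up, which is the redistribution described above.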

For distinguishing MOC efferent vs. MEM effects, tones producing SFOAEs offer few advantages over clicks, tone pips, or tone pairs as probe stimuli, and in some respects are worse. The group delay test (see Fig. 7) provides a good way to determine whether an elicitor effect is dominated by a MOC-induced ΔSFOAE or by a MEM-induced ΔP. However, this test determines only which factor was dominant in producing the measured response change (because changes in SFOAE and sound source components add, and the change with the largest amplitude captures the phase). Thus, the finding that a response is MOC dominated does not exclude a minor component due to MEM contractions. In contrast, with clicks and tone pips, the primary sound and OAE are separated in time and a small change in the primary should be detectable even if there is a large change in the OAE. Such a finding would suggest that both MEM and MOC effects were present. The technique of looking for small changes in the primary sound is limited mostly by its sensitivity to small changes in the subject ear (e.g., those caused by small movements of the acoustic assembly), something which is not a problem with the group delay test. A final advantage of MOC vs. MEM tests with pips, clicks, and DPOAEs is that they require only one measurement while the SFOAE group delay test requires measuring at a series of closely spaced frequencies.
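
The statement that the larger component "captures the phase" can be illustrated with a minimal Python simulation. This is a sketch under stated assumptions, not the analysis used for Figure 7: the MOC-induced ΔSFOAE is modeled as a phasor with an assumed 10 ms delay, the MEM-induced change in the source pressure as a phasor with an assumed 0.1 ms delay, and the delay of their sum is estimated from the slope of unwrapped phase versus frequency. All names, delays, amplitudes, and frequencies are hypothetical.

```python
import cmath
import math


def phase_gradient_delay(freqs_hz, phasors):
    """Estimate delay (s) as -(1/(2*pi)) times the least-squares slope of
    unwrapped phase (rad) versus frequency (Hz)."""
    unwrapped = [cmath.phase(phasors[0])]
    for z in phasors[1:]:
        step = cmath.phase(z) - unwrapped[-1]
        step -= 2 * math.pi * round(step / (2 * math.pi))  # wrap step into [-pi, pi]
        unwrapped.append(unwrapped[-1] + step)
    n = len(freqs_hz)
    mean_f = sum(freqs_hz) / n
    mean_p = sum(unwrapped) / n
    num = sum((f - mean_f) * (p - mean_p) for f, p in zip(freqs_hz, unwrapped))
    den = sum((f - mean_f) ** 2 for f in freqs_hz)
    return -(num / den) / (2 * math.pi)


def simulated_change(freq_hz, moc_amp, mem_amp, moc_delay_s=0.010, mem_delay_s=0.0001):
    """Sum of a long-delay (MOC-like) and a short-delay (MEM-like) change phasor.
    Both delays are assumed values chosen for illustration."""
    moc = moc_amp * cmath.exp(-2j * math.pi * freq_hz * moc_delay_s)
    mem = mem_amp * cmath.exp(-2j * math.pi * freq_hz * mem_delay_s)
    return moc + mem


# Closely spaced probe frequencies (20 Hz steps keep phase steps below pi).
freqs = [1000.0 + 20.0 * k for k in range(11)]
moc_dominated = phase_gradient_delay(freqs, [simulated_change(f, 1.0, 0.1) for f in freqs])
mem_dominated = phase_gradient_delay(freqs, [simulated_change(f, 0.1, 1.0) for f in freqs])
print(f"MOC-dominated delay estimate: {moc_dominated * 1e3:.2f} ms")
print(f"MEM-dominated delay estimate: {mem_dominated * 1e3:.2f} ms")
```

In both cases the estimated delay stays close to that of the larger component, so the smaller component leaves only a slight ripple in the phase. This is why the test identifies the dominant factor but cannot exclude a minor contribution from the other.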

Types of OAE-based efferent assays

Efferent assays can be divided into two types: (1) methods that use separate elicitor and probe sounds (“SEAPS” methods), and (2) methods that use the same sound for the elicitor and for the probe and that measure adaptation of the resulting OAE (“adaptation” methods). The SEAPS method can be done with any probe sound that evokes a measurable OAE, combined with any elicitor sound. So far, the adaptation method has been used only with DPOAEs. However, since clicks and tone pips are efficient elicitors of efferent activity (Figs. 5 and 6), and also work well as OAE-evoking probe sounds, they should work well in an adaptation paradigm. In such a paradigm, pips and clicks would be presented as fast repeating stimuli, not continuously as can be done with DPOAE primary tones. However, this is not a fundamental difference. In fact, the stimulus originally used to demonstrate DPOAE adaptation in Liberman et al. (1996) was a series of discontinuous primary tones. Thus, at their commonly used sound levels, clicks, tone pips, and DPOAE primaries can be used in an adaptation paradigm efferent assay. In contrast, single-tone SFOAEs, at the 40 dB SPL level of their normal use, evoke little or no efferent activity and cannot be used in an adaptation paradigm. Whether single tones at higher sound levels could be used successfully in an adaptation paradigm is unknown and depends on gaining further understanding of the processes by which high-level tones change SFOAEs in the ipsilateral ear.

At first glance it might seem that an adaptation paradigm is just as powerful as a SEAPS paradigm in that both can be used with ipsilateral and bilateral stimuli. However, the lack of separation of probe and elicitor in the adaptation paradigm is a distinct disadvantage. With the adaptation paradigm it is not possible to change the elicitor level while holding the probe level constant (or vice versa). Another disadvantage is that the adaptation method is suitable only for stimuli that make good OAE-evoking probe sounds. An adaptation method cannot be used, for instance, with broadband noise. However, the effects of broadband noise can be assessed in the context of an adaptation paradigm by adding it during the relatively steady part of the response (e.g., as in Fig. 6 of Liberman et al. 1996) in what amounts to a combined adaptation and SEAPS paradigm. A final disadvantage for stimuli that evoke considerable intrinsic cochlear effect (i.e., tones and tone pairs) used in an adaptation paradigm (or whenever tones are used as elicitors) is that efferent effects and intrinsic cochlear effects both change the OAE over time and are difficult to distinguish. This may be much less of a problem for these same stimuli used as continuous probes. As probes, if the stimuli are on long enough that the intrinsic cochlear effects asymptote, the influence of any intrinsic cochlear effects on the measurement may be negligible.

One possible advantage of adaptation paradigms is in measuring the time course of efferent effects elicited by ipsilateral stimuli. In a SEAPS paradigm, measurement of the time course of an ipsilateral efferent effect is difficult because two-tone suppression caused by the elicitor obscures the probe-evoked OAE during the elicitor. An adaptation paradigm allows measurements to be made during the elicitor, but this advantage is offset (1) for a DPOAE adaptation paradigm by the complexity of the DPOAE production which involves three cochlear frequency places, cancellations that can make the “adaptation” go in either direction, and known problems with slow adaptation being produced by intrinsic cochlear properties; and (2) for pip or click adaptation paradigms by the fact that the stimuli are presented at discrete times. In summary, the SEAPS paradigm allows the greatest flexibility but may not be the best in all circumstances.

The advantages of using SFOAEs to measure efferent effects

The principal advantage of using stimulus frequency emissions as an assay of medial efferent activity is that SFOAEs require only a single low-level probe tone. This is advantageous because (1) it allows the use of a probe stimulus that elicits little or no efferent activity and causes little or no intrinsic cochlear change, and (2) it allows measurements of efferent effects on low-level stimuli where medial efferents have the largest effects on OAEs. Other advantages are: SFOAEs are large in humans and offer good signal/noise ratios, SFOAEs provide as frequency-specific a test of efferent effects as is possible, the production of low-level SFOAEs by the process of coherent reflection is better understood than the process by which distortion product OAEs are produced (Shera and Zweig 1993; Zweig and Shera 1995; Kemp 2002), and, unlike DPOAEs, the interpretation of SFOAEs (at least at low sound levels) is not complicated by mixing of emission types (Shera and Guinan 1999). SFOAEs also provide a good test of MOC vs. MEM domination of putative efferent effects, but SFOAEs have no advantage in this. Finally, while there is no currently available commercial system that measures efferent effects on SFOAEs, once implemented, the SFOAE assay is easy and fast to use and, most importantly, produces a result that is not clouded by efferent activity evoked by the probe sound.