Phase-Synchronized Stimulus Presentation Augments Contingency Knowledge and Affective Evaluation in a Fear-Conditioning Task

Abstract Memory often combines information from different sensory modalities. Animal studies show that synchronized neuronal activity in the theta band (4–8 Hz) binds multimodal associations. Studies with human participants have likewise established that theta-phase synchronization augments the formation of declarative video–tone pair memories. Another form of associative learning, classical fear conditioning, models nondeclarative, emotional memory with distinct neuronal mechanisms. Typical fear-conditioning tasks pair a conditioned stimulus (CS) in one modality with an aversive unconditioned stimulus (US) in another. The present study examines the effects of CS–US synchronization in the theta band on fear memory formation in humans. In a fear generalization procedure, we paired one of five visual gratings of varying orientation (CS) with an aversive auditory US. We modulated the luminance of the CS and the volume of the US at a rate of 4 Hz. To manipulate the synchrony between visual and auditory input during fear acquisition, one group (N = 20) received synchronous CS–US pairing, whereas the control group (N = 20) received the CS–US pairs out of phase. Phase synchronization improved CS–US contingency knowledge and facilitated CS discrimination in terms of rated valence and arousal, resulting in narrower generalization across the CS gratings compared with the out-of-phase group. In contrast, synchronization did not amplify conditioned responding in physiological arousal (skin conductance) and visuocortical engagement (steady-state visually evoked potentials) during acquisition, although both measures demonstrated tuning toward the CS+. Together, these data support a causal role of theta-phase synchronization in affective evaluation and contingency report during fear acquisition.


Introduction
Phase synchronization of brain oscillations has been proposed as a mechanism supporting neuronal communication and plasticity (Fell and Axmacher, 2011). A theoretical perspective holds that the ongoing oscillatory phase reflects the excitability of a neural population and therefore determines a window for successful long-term potentiation (LTP), a cellular process underlying learning and plasticity (Lynch, 2004). Research in rodents has shown that the induction of LTP or long-term depression (LTD) critically depends on oscillatory phases and the stimulation or recording site: while LTP was induced in behaving rats when the hippocampal CA1 was stimulated at the oscillatory peak, LTD resulted from stimulation at the trough (Hyman et al., 2003). Using trace eyeblink conditioning in rabbits and recordings in the hippocampal fissure, CS presentation in the trough induced phase-locked, regular (theta) oscillations that were in turn associated with better learning, whereas CS presentation to the peak impaired regularity and learning (Nokia et al., 2015). Note that the theta phase reverses between the hippocampal fissure and the CA1 region. Since LTP requires precise timing between presynaptic and postsynaptic activation in the millisecond range (Markram et al., 1997), orchestrating activity by phase synchronization of neuronal oscillations is a potential mechanism supporting LTP. Among other oscillatory phenomena, oscillations in the theta range (4-8 Hz in primates, 4-12 Hz in rodents) and their synchronization among memory-related brain sites are linked to memory performance (Headley and Paré, 2017). Rodent research (Benchenane et al., 2010;Place et al., 2016) and human EEG studies (Weiss and Rappelsberger, 2000;Summerfield and Mangels, 2005) found increased theta synchronization among brain regions during different episodic memory tasks, suggesting that theta synchronization facilitates communication (Fell and Axmacher, 2011).
Intriguingly, studies in humans have causally linked theta-phase synchronization to episodic associative memory. Repetitive, rhythmic sensory stimulation eliciting steady-state evoked potentials (Clouter et al., 2017; Wang et al., 2018) enables experimental control over response frequency in a sensory region and corresponding phase synchrony between regions (Thut et al., 2011; Herrmann et al., 2016; Hanslmayr et al., 2019). Synchronizing the oscillatory phase evoked by periodically modulated visual and auditory stimuli facilitated the encoding of an episodic audiovisual memory (Clouter et al., 2017; Wang et al., 2018), suggesting theta-phase synchronization as a mechanism for binding multisensory episodic memories. The synchronized input is assumed to increase temporally organized neuronal firing, which in turn may result in LTP (Buzsáki, 2002; Fries, 2015).
Although LTP is best understood in the hippocampus, its associative and synapse-specific properties make it a potential mechanism for plasticity in other regions (Maren and Fanselow, 1995; Orsini and Maren, 2012; Bliss et al., 2018). For example, fear conditioning, a paradigm of associative emotional memory, involves associative plasticity within the lateral nucleus of the amygdala (Kim and Cho, 2017), but also in other structures processing the conditioned stimulus (CS) and the unconditioned stimulus (US; Herry and Johansen, 2014). Sensory information from both stimuli (typically of different modalities) converges onto the same neuronal populations in the lateral amygdala (LA; Romanski et al., 1993). Activating weak CS synapses in temporal proximity to strong US synapses initiates a cascade of cellular reorganization, strengthening CS synapses and enabling the CS to elicit fear responses (Blair et al., 2001; Orsini and Maren, 2012). Importantly, theta synchronization among medial prefrontal cortex, amygdala, and hippocampus plays a role during fear conditioning (Seidenbecher et al., 2003; Karalis et al., 2016; Taub et al., 2018; Zheng et al., 2019; for review, see Bocchio et al., 2017; Çalişkan and Stork, 2018). However, its causal role in forming CS-US associations is unknown.
The current study asked whether synchronized sensory input helps the formation of a multisensory CS-US association in aversive learning. Using rhythmic external stimulation (Clouter et al., 2017; Wang et al., 2018), we presented the visual CS and auditory US "in-phase" or "out-of-phase" in a 2-day fear conditioning procedure with generalization (five similar CSs). We hypothesized that theta-band (4 Hz) synchronization of two distinct sensory systems promotes the CS-US association. Specifically, we expected in-phase presentation to facilitate fear acquisition and out-of-phase presentation to result in poor fear conditioning. Synchronizing the multisensory input is expected to orchestrate neuronal activity in the sensory cortices (so-called entrainment). If synchronization in the theta range provides a window for successful LTP, it should optimize conditions for synchronous afferent signals reaching further structures within the fear network, especially the lateral amygdala (Romanski et al., 1993; LeDoux, 2000). To assess different response systems in human fear conditioning, we measured conditioned responses in physiological arousal, affective evaluation of arousal and valence, contingency knowledge of CS and US, and visuocortical engagement.

Participants
[...] hormone fluctuations, female participants were included only if they used monophasic oral contraceptives (pill) and were examined between the 6th and 20th day of pill intake (i.e., in the pill-on phase). Participants were screened for inclusion and exclusion criteria via a self-report questionnaire and a structured interview in a screening session that was always conducted on a different day than the main experiment. Students with acute or chronic physical and/or psychiatric disorders (e.g., migraine, epilepsy, cardiovascular diseases, and phobias) were not eligible.
Further exclusion criteria encompassed hearing and/or uncorrected vision impairments, tinnitus, acute medication, drug abuse, average alcohol consumption exceeding 20 or 40 g ethanol/d (for women and men, respectively), and smoking more than five cigarettes per day. Volunteers were screened for post-traumatic stress disorder (PTSD) using a translated version of the Posttraumatic Stress Diagnostic Scale (Foa, 1995; Steil and Ehlers, 2000) and excluded if they met the criteria of the DSM-IV (Diagnostic and Statistical Manual of Mental Disorders, fourth edition) for PTSD. Of 64 volunteers (34 women), 46 (25 women) were eligible to participate, 3 women did not appear for their appointment, and 3 volunteers (2 women) discontinued the main experiment because of the aversive nature of the conditioning paradigm, leading to our final sample of 40 participants. Within the female and male subsamples, participants were randomly assigned to one of two groups, the in-phase or the out-of-phase group, with the same number of men and women in each group (in-phase group, 10 women; out-of-phase group, 10 women).
The study was approved by the ethics committee of the University of Osnabrück and conducted in accordance with the Declaration of Helsinki guidelines. Written informed consent was obtained from all participants after adequate understanding of the explained procedures. Each participant was free to choose between participation credits (four credits) or a corresponding amount of money (€32) for finishing the screening and days 1 and 2 of our conditioning procedure.

Experimental design and stimuli
We used a 2-day fear conditioning procedure, including habituation, acquisition, immediate extinction on day 1, and a 24 h delayed recall on day 2 (Fig. 1B). Our study comprised a 5 × 2 mixed factorial design within each learning phase, with the five CS orientations of the Gabor gratings characterized below (25°, 35°, 45°, 55°, and 65°) as the within-subject factor and synchronization (in-phase, 0° phase shift, vs out-of-phase, 90°, 180°, and 270° phase shifts) as the experimental between-subject factor.
Five high-contrast, black-and-white Gabor gratings (i.e., a sinusoid grating filtered with a Gauss function) with a low spatial frequency served as the visual CS. The five CSs differed only in orientation (25°, 35°, 45°, 55°, and 65°, relative to vertical 0°; Fig. 1A). The CSs were presented for 5 s centrally on a dark gray background (100% black setting on the monitor). During the presentation, the experimental chamber was lit only by the CS on the screen. Technical failure forced us to exchange the monitor from a 19 inch (model P911, Acer) to a 17 inch (model CPD-E220E, Sony) cathode ray tube (CRT) after examining the first 12 participants. We matched stimulus properties as closely as possible with the new monitor. The relevant parameters were comparable: 85 Hz refresh rate, low spatial frequency (0.96 vs 0.81 cycles/°), large central CS presentation (5.70° vs 5.73° visual angle), and high contrast (96% Michelson for both monitors).
As the US, we used a 2 s, broadband white noise (20 Hz to 22 kHz, 44,100 samples/s, 16 bits/sample), presented binaurally at a maximum of 96.5 dB(A) over two loudspeakers positioned 0.7 m left and right behind the participant. For an additional unimodal audio task (at the end of the session on day 2), we presented the same white noise for 4 s at a nonaversive sound-pressure level with a maximum of 70.4 dB(A).
The intensity of the visual CS, the aversive auditory US, and the nonaversive auditory noise (unimodal task) was modulated at 4 Hz (see also Clouter et al., 2017). The luminance of the visual CS was sinusoidally modulated at 4 Hz, with luminance changing at each screen refresh, resulting in 21 steps per cycle (0-100% luminance). The amplitude of the auditory signal was sinusoidally modulated (0-100%) at 4 Hz by multiplying the signal vector with a 4 Hz sine wave at the native 44.1 kHz audio sampling rate. Presentation of each 4 Hz modulated stimulus (auditory and visual) always started at 0% intensity, increasing to 100% in the first half cycle.
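The modulation scheme described above can be sketched as follows. This is an illustrative sketch only, not the original MATLAB/Psychophysics Toolbox code: the raised-cosine form of the envelope is an assumption chosen so that the signal starts at 0% and reaches 100% after half a cycle, and all function names are hypothetical.

```python
import math

MOD_HZ = 4.0        # modulation frequency (from the text)
REFRESH_HZ = 85.0   # CRT refresh rate (from the text)
AUDIO_FS = 44100    # audio sampling rate in Hz (from the text)


def envelope(t, freq=MOD_HZ):
    """Assumed raised-cosine envelope: 0 at t = 0, 1 after half a cycle."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * freq * t))


def luminance_frames(duration_s):
    """Relative luminance (0..1), one value per screen refresh."""
    n_frames = int(round(duration_s * REFRESH_HZ))
    return [envelope(i / REFRESH_HZ) for i in range(n_frames)]


def modulate_audio(samples):
    """Multiply an audio vector sample by sample with the 4 Hz envelope."""
    return [s * envelope(i / AUDIO_FS) for i, s in enumerate(samples)]


frames = luminance_frames(5.0)          # one 5 s CS presentation
steps_per_cycle = REFRESH_HZ / MOD_HZ   # 21.25, i.e. about 21 steps per cycle
```

At 85 Hz, one 4 Hz cycle spans 21.25 refreshes, which matches the roughly 21 luminance steps per cycle reported above.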

Conditioning procedure
Our procedure included habituation, acquisition, and immediate extinction on day 1, as well as a 24 h delayed recall on day 2 (Fig. 1B). During habituation, each 5 s, 4 Hz modulated CS was presented 12 times in pseudorandom order. Before acquisition, participants were instructed that only one of the five CS orientations would be followed by an aversively loud noise, without specifying which orientation. During the acquisition phase, each 4 Hz modulated CS was again presented 12 times. However, the 45° CS (CS+) was always paired (12 times) with the 2 s, 4 Hz modulated aversive noise US (reinforcement rate, 100%), while the other orientations were not (25°, 35°, 55°, and 65° gratings as CS−; Fig. 1B). Previous work has used this same generalization paradigm, with 45° gratings serving as CS+, while also establishing that there are no systematic preconditioning differences between different grating orientations on the measures used here (McTeague et al., 2015). Previous work has also demonstrated that conditioning is seen across orientations and with counterbalancing (Moratti and Keil, 2005). Together, to facilitate interpretation and comparison with prior work, this led us to adopt a fixed contingency between the 45° grating and the US. Each CS presentation was prolonged for the duration of the US, adding 2 s (i.e., 7 s duration for the CS+ and the CS− gratings during acquisition). For the 45° CS+, the last 2 s of visual CS overlapped with the auditory US presentation.
Since retinal phototransduction was shown to be slower than auditory transduction (~50 ms for visual stimuli vs 10 ms for auditory stimuli; Lennie, 1981; King and Palmer, 1985), the onset of the auditory US had a 40 ms delay relative to the CS onset (Clouter et al., 2017). The slower transduction of visual stimuli is also in line with recordings in the amygdala after visual versus auditory stimulation. The earliest activity occurred between 40 and 80 ms (up to 316 ms, depending on the pathway to the amygdala that differs in length) after visual stimulation (Luo et al., 2010; Silverstein and Ingvar, 2015; for review, see McFadyen et al., 2017). In contrast, auditory information was recorded as early as 10-40 ms in single units of anesthetized as well as freely moving rats (Romanski et al., 1993; Quirk et al., 1995). This temporal difference in processing from receptors to the afferent neurons in the CNS has to be considered when entraining the brain with multisensory information in a synchronous way. Thus, adjusting for a temporal delay in one modality is necessary to achieve theta synchronization of visual CS and auditory US in the sensory cortices and to facilitate synchrony of both inputs in the LA. CS presentation followed one of two random sequences (Extended Data Fig. 1-1), with trial order constrained to not allow more than two consecutive CSs of the same orientation. Additionally, the acquisition phase started with a booster sequence, where five of the first seven trials were CS+-US pairings.
Figure 1. Experimental design: stimuli, procedure, and the operationalization of the in-phase group versus the out-of-phase group. A, Gabor gratings used as CSs. The 45° grating served as CS+ (paired with the US during acquisition). The other four served as CS− (never paired with the US). The luminance of each CS was sinusoidally modulated at 4 Hz. The US was a broadband white noise, amplitude modulated at 4 Hz and presented at a maximum of 96.5 dB(A). B, Fear-conditioning procedure with the learning phases habituation, fear acquisition, and extinction (day 1) and delayed recall (day 2). Each CS grating was presented 12 times in each learning phase. The US was only presented during fear acquisition (12 times coterminating with the CS+). At the end of day 2, the unimodal audio task comprised 75 presentations of the 4 Hz modulated white noise (4 s each) at a nonaversive volume [maximum = 70.4 dB(A)]. Vertical lines below the timeline indicate the rating time points. Extended Data Figure 1-1 shows the specific trial orders 1 and 2 that were used. C, Operationalization of the in-phase group versus the out-of-phase group. Fear conditioning for both groups was identical, with the only exception that the in-phase group received the 12 CS+-US pairings during acquisition without a phase shift (0°) and the out-of-phase group received the CS+-US pairings with phase shifts of 90°, 180°, and 270° (four trials each). In C, the top row shows a simplified depiction of a CS changing luminance at 4 Hz for 750 ms. The bottom part of C shows the first 750 ms of an overlapping CS+-US presentation for the two groups. The light gray curve shows the luminance of the CS+ (each vertical line shows one step following the 85 Hz refresh rate of the monitor). The black (0° phase shift), dark gray (90°), yellow (180°), and blue (270°) graphs show a downsampled representation of the 4 Hz modulated, white noise US.
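The run-length constraint on trial order can be illustrated with a short sketch. The study used two fixed sequences (Extended Data Fig. 1-1) and a booster sequence at the start of acquisition; the code below does not reproduce either. It only shows, under the assumption of simple rejection sampling and with hypothetical function names, how an order with at most two consecutive presentations of the same orientation could be generated.

```python
import random

ORIENTATIONS = [25, 35, 45, 55, 65]  # grating orientations in degrees (from the text)
N_PER_CS = 12                        # presentations per CS and learning phase


def max_run_length(seq):
    """Length of the longest run of identical consecutive items."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest


def make_trial_order(rng, max_run=2):
    """Reshuffle until no orientation occurs more than max_run times in a row."""
    trials = ORIENTATIONS * N_PER_CS
    while True:
        rng.shuffle(trials)
        if max_run_length(trials) <= max_run:
            return list(trials)


order = make_trial_order(random.Random(1))  # seed chosen arbitrarily
```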
The subsequent immediate extinction and the 24 h delayed recall phase comprised only CS presentations (12 times each, no US), as in the habituation phase. The aversive US was not presented at any point except in the 12 CS+-US pairings in the acquisition phase. However, participants were not informed, either before immediate extinction learning or before 24 h delayed recall, that no US would occur in the following stimulation phase.
Between the end of one CS presentation and the beginning of the next one, a black screen was shown for 3-5 s (random from a uniform distribution) during learning phases (1.5-3 s during the unimodal audio task), followed by a white fixation cross at the center of the screen for 1.5 s, resulting in an intertrial interval (ITI) between 4.5 and 6.5 s.
In accordance with the study by Clouter et al. (2017), the 12 pairings of the 4 Hz modulated CS+ and US were realized with either a 0° (i.e., 0 ms) phase shift (in-phase group), or 90° (62.5 ms), 180° (125 ms), and 270° (187.5 ms) phase shifts for four USs each (out-of-phase group; Fig. 1C). Accounting for the 40 ms delay between rapid auditory and later visual processing times, input with a phase lag of 0° causes phase-synchronized cortical activity in the visual and auditory cortex (Clouter et al., 2017). This synchronized activity at the primary cortices is expected to increase the synchronized afferent signals reaching the amygdala, where CS-US convergence occurs, hence supporting associative plasticity in the lateral amygdala (Blair et al., 2001; Bocchio et al., 2017). In contrast, phase lags between 90° and 270° (i.e., timing shifts of 62.5-187.5 ms) should result in a suboptimal level of excitability and therefore decrease the likelihood of synaptic changes. The experiment was conducted in an electromagnetically shielded and sound-attenuated experimental chamber, where participants were seated in a comfortable chair positioned centrally in front of the monitor. The experiment, including all stimuli, was created in MATLAB (version 2019b; RRID:SCR_001622) using the Psychophysics Toolbox (RRID:SCR_002881; Brainard, 1997; Kleiner et al., 2007).
Figure 2. Processing steps and validation of in-phase versus out-of-phase stimulation. A, Processing example (one trial of one participant) of our audio (microphone in front of the participant's speakers) and video signal (photodiode attached to the participant's monitor). Data were segmented relative to the onset of a US (i.e., 12 segments per subject). Before analysis, video data were shifted 40 ms forward in time to account for the 40 ms time shift programmed into the stimulus presentation. Data were rectified, bandpass filtered between 3 and 5 Hz, and subjected to a Hilbert transform. Instantaneous phase information at 4 Hz was extracted from the imaginary part of the analytic signal. B, Visualization of in-phase (left column) and out-of-phase (right column) CS+-US stimulation for all CS+-US trials and all participants (12 × 20 trials per group). Each thin orange line shows the video signal of one participant and one trial. Each thin blue line shows the audio signal (one participant and trial). In B, the top rows show bandpass-filtered data; the middle row shows the extracted phase information; and at the bottom, polar histograms show the clustering of all phase differences per group.
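The correspondence between phase shifts and timing shifts at 4 Hz follows directly from the 250 ms cycle length and can be checked with a few lines. In this sketch, the function names are hypothetical, and treating the 40 ms auditory onset delay as simply additive to the phase-dependent shift is an assumption made for illustration.

```python
MOD_HZ = 4.0             # modulation frequency (from the text)
AUDIO_DELAY_MS = 40.0    # auditory onset delay relative to the CS (from the text)


def phase_to_ms(phase_deg, freq_hz=MOD_HZ):
    """Convert a phase shift in degrees into a time shift in milliseconds."""
    cycle_ms = 1000.0 / freq_hz      # 250 ms per cycle at 4 Hz
    return phase_deg / 360.0 * cycle_ms


def audio_offset_ms(phase_deg):
    """Total audio offset, assuming delay and phase shift simply add up."""
    return AUDIO_DELAY_MS + phase_to_ms(phase_deg)


shifts = {deg: phase_to_ms(deg) for deg in (0, 90, 180, 270)}
```

The resulting values (0, 62.5, 125, and 187.5 ms) agree with those stated for the in-phase and out-of-phase groups.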

Sensory stimulation validation
To validate the temporal fidelity of the stimulation, we analyzed data from two sources: a photodiode (photo sensor; Brain Products) attached to the participant's monitor; and a microphone (built into a StimTrak, BrainVision) positioned in front of the participant's speakers. Both signals were recorded at 1000 Hz with a recorder (BrainVision). The photodiode was placed over the upper right corner of the CRT monitor, where a smaller version of the Gabor gratings appeared during the same monitor refresh cycle (and far outside of the CS presentation area) with the same sinusoidal luminance modulation as the original CS gratings. Pilot studies using photo sensors at both the upper corner (small test Gabor) and the central screen (actual CS grating) showed excellent synchrony of both stimuli. The onset of the central grating was consistently 0.5 refresh cycles after the onset of the miniature grating in the top left corner (i.e., ~5.9 ms). Using Analyzer (BrainVision), data from the photo sensor and the microphone were segmented from −320 to 2500 ms relative to the onset of a US (i.e., 12 segments per subject) and visually inspected for artifacts. A subset of segments was excluded because microphone data were corrupted or missing following failure of the StimTrak batteries (in 5 of 40 participants). For visualization, the remaining data were exported to MATLAB and rescaled from −π to +π. Further analysis was computed over 7.02 cycles of microphone and video data (1755 ms), disregarding the last cycle of audio stimulation. Video data from −40 to +1715 ms relative to US onset were used, whereas microphone data from 0 to 1755 ms entered the analysis. This effectively shifts video data 40 ms forward in time to account for the 40 ms time shift programmed into the stimulus presentation. Microphone data were first rectified (square root of the signal squared).
Both the photo sensor and microphone channels were bandpass filtered between 3 and 5 Hz, using the bandpass function of the MATLAB Signal Processing Toolbox with an IIR (infinite impulse response) filter (60 dB attenuation at the edge frequencies) and a steepness of 0.95. Instantaneous phase information at 4 Hz for the audio and video signals was extracted from the imaginary part of the analytic signal after a Hilbert transform (Fig. 2A).
This analysis (Fig. 2B) also demonstrates that there was very little variability in the timing of sensory stimulation within a trial, as well as between trials and between participants of one group.
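The validation logic (rectify, restrict to 3-5 Hz, take the instantaneous phase of the analytic signal, compare audio and video) can be sketched on synthetic data. This sketch makes several assumptions that differ from the reported pipeline: the bandpass is a simple DFT mask rather than the MATLAB IIR filter, the phase is taken as the angle of the analytic signal, the 40 ms video shift is omitted, and the audio signal is a toy 50 Hz carrier instead of white noise. All names are hypothetical.

```python
import cmath
import math

FS = 1000      # recording rate in Hz (from the text)
MOD_HZ = 4.0   # modulation frequency (from the text)


def envelope(t, shift_s=0.0):
    """4 Hz raised-cosine envelope, optionally delayed by shift_s seconds."""
    return 0.5 * (1.0 - math.cos(2.0 * math.pi * MOD_HZ * (t - shift_s)))


def band_analytic(x, fs, lo_hz, hi_hz):
    """Analytic signal restricted to lo_hz..hi_hz, built with a DFT mask."""
    n = len(x)
    bins = [k for k in range(1, n // 2) if lo_hz <= k * fs / n <= hi_hz]
    coef = {k: sum(x[i] * cmath.exp(-2j * math.pi * k * i / n) for i in range(n))
            for k in bins}
    return [2.0 / n * sum(c * cmath.exp(2j * math.pi * k * i / n)
                          for k, c in coef.items())
            for i in range(n)]


def mean_phase_diff_deg(a, b, fs=FS):
    """Circular mean of the instantaneous phase difference a - b, in degrees."""
    za = band_analytic(a, fs, 3.0, 5.0)
    zb = band_analytic(b, fs, 3.0, 5.0)
    resultant = sum(p * q.conjugate() / (abs(p) * abs(q)) for p, q in zip(za, zb))
    return math.degrees(cmath.phase(resultant))


t = [i / FS for i in range(FS)]                       # 1 s = four full cycles
video = [envelope(ti) for ti in t]                    # photodiode-like trace
carrier = [math.sin(2.0 * math.pi * 50.0 * ti) for ti in t]
audio_raw = [envelope(ti, 0.0625) * c for ti, c in zip(t, carrier)]  # 90 deg lag
audio = [math.sqrt(v * v) for v in audio_raw]         # rectification
diff_deg = mean_phase_diff_deg(video, audio)
```

In this toy case, the recovered phase difference corresponds to the 62.5 ms (90°) delay built into the synthetic audio signal.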

Dependent variables
Steady-state visually evoked potentials (ssVEPs; via EEG), skin conductance responses (SCRs), and subjective ratings served as dependent outcomes. Further, horizontal and vertical eye movements were recorded by electrooculography (EOG) with a bipolar BrainAmp ExG amplifier (Brain Products) to detect and eliminate artifacts in the EEG recordings.

EEG parameters
EEG recording and preprocessing. A 64-channel EEG was recorded on both days with two 32-channel BrainAmp DC amplifiers with a resolution of 0.1 µV (Brain Products). The 64 active electrodes (Ag/AgCl, actiCAP, Brain Products) were filled with electrolyte gel (Super-Visc 10% NaCl, EasyCap) and positioned according to the extended international 10-20 system. Efforts were made to keep impedances <5 kΩ (manufacturer recommendation, <25 kΩ). FCz served as the recording reference, and AFz served as the ground. A sampling rate of 1000 Hz and a high-pass filter at 0.016 Hz were used. In addition to the EEG, EOG was recorded with four Ag/AgCl electrodes (Ø, 4 mm) to control for eye movements. Two electrodes were placed on the lateral canthus of each eye for horizontal movements, and two electrodes were placed infraorbital and supraorbital, in line with the pupil of the right eye, for vertical movements. An electrode on the forehead was attached as the ground.
Offline preprocessing was done with Analyzer 2 Software (version 2.1.2.327; BrainVision). Raw data were bandpass filtered between 1 and 100 Hz using Butterworth (zero phase shift) filters with a 3 dB low cutoff at 1 Hz (time constant, 0.1592; order 8) and a 3 dB high cutoff at 100 Hz (order 4). Additional 50 and 100 Hz (bandwidth, 1 Hz; order 4) notch filters were applied to eliminate line noise. Data were segmented from −1250 to 7500 ms relative to CS onset, and an ocular correction independent component analysis (ICA), as implemented in BrainVision Analyzer, was applied. After visual inspection of the resulting factors and factor topographies, factors related to horizontal and vertical eye movements, blinks, as well as strong cardiac or muscular artifacts were removed from the reconstructed data. ICA-corrected data were rereferenced to an average reference, and the recording reference was reincluded in the data as a 65th channel at position FCz. The segments were cut to an interval between −1000 and 5000 ms relative to CS onset. With this segmentation, the US intervals were excluded from further analyses to avoid contamination of our EEG data. After another visual inspection, we rejected segments with remaining artifacts. On average, 3.93 segments were rejected per participant (0-15 rejected of 240 segments for each participant). Data were downsampled to 512 Hz, in accordance with Clouter et al. (2017), and were exported to MATLAB (MathWorks; RRID:SCR_001622). To increase spatial specificity, reduce volume conduction effects, and obtain reference-free data, we conducted a scalp current source density (CSD) transform (Junghöfer et al., 1997). The CSD values (as estimates of cortical surface potentials) are represented on a sphere, approximating a cortical surface. For scalp-level analyses and topographical mapping, the CSD was projected back onto the original electrode space. Analyses were performed on CSD-transformed data, and CSD data are shown throughout the figures.
Validation of visual and auditory entrainment (unimodal). To validate the visual cortical entrainment at 4 Hz, we first averaged CSD-transformed data over all habituation trials at each sensor of a participant in the time domain [disregarding CS orientations (i.e., averaging 5 × 12 trials per subject)]. Habituation trials were not only strictly unimodal visual stimulation but also preceded any pairing of the CS with the aversive US. To avoid early event-related potentials entering the frequency domain analysis, the Fourier transform was applied to data from 750 to 5000 ms (i.e., containing 17 cycles of 4 Hz) post-CS onset. These data were windowed with a cosine-square window (20 point rise/fall) and subjected to a discrete Fourier transform (MATLAB) with a frequency resolution of 0.24 Hz. We extracted the absolute values of the Fourier coefficients at 4 Hz and transformed the resulting power values to signal-to-noise ratios (SNRs), using the average of five frequency bins <4 Hz and four frequency bins >4 Hz.
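The frequency resolution and the SNR estimate described above can be illustrated with a small worked example. Two points are assumptions rather than statements from the text: the noise bins are taken to be the immediately adjacent ones, and the SNR is computed as the ratio of the 4 Hz value to the mean of the noise bins. The function names and the toy spectrum are hypothetical.

```python
def freq_resolution_hz(start_ms, end_ms):
    """Frequency resolution of a DFT over the given analysis window."""
    return 1000.0 / (end_ms - start_ms)


def snr_at_bin(spectrum, target, n_below=5, n_above=4):
    """Ratio of the target bin to the mean of its neighbouring bins."""
    below = spectrum[target - n_below:target]
    above = spectrum[target + 1:target + 1 + n_above]
    noise = below + above
    return spectrum[target] / (sum(noise) / len(noise))


res = freq_resolution_hz(750, 5000)     # about 0.235 Hz for the 4250 ms window
target_bin = round(4.0 / res)           # 4 Hz falls exactly on bin 17
toy_spectrum = [1.0] * 40               # flat toy noise floor
toy_spectrum[target_bin] = 5.0          # toy 4 Hz peak
toy_snr = snr_at_bin(toy_spectrum, target_bin)
```

Because the 4250 ms window contains exactly 17 cycles of 4 Hz, the driving frequency coincides with a single DFT bin, which keeps the 4 Hz estimate free of leakage into the neighbouring noise bins.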
To validate auditory entrainment, we used the unimodal audio task at the end of day 2, as it reflected 4 Hz unimodal auditory stimulation, without concurrent visual stimulation. EEG data from the audio-only task were subjected to the same preprocessing pipeline as CS-related data. As for the visual unimodal data described above, CSD data were segmented (here from −1000 to 4100 ms, relative to audio stimulus onset) and averaged per participant and sensor over the 75 audio-only trials. The Fourier transform was applied to windowed data (cosine-square, 20 point rise/fall) from 500 to 4000 ms after audio onset (i.e., containing 14 cycles of 4 Hz), resulting in a frequency resolution of 0.29 Hz. As for the visual stimulation, we converted the power at 4 Hz to SNRs, using the average of the five frequency bins below and four above the frequency of interest as noise estimates. Figure 3 shows the scalp distribution of the 4 Hz SNR averaged over participants (N = 40), for the visual 4 Hz stimulation (Fig. 3A) and the auditory 4 Hz stimulation (Fig. 3B). The average topographies are consistent with typical visual and auditory steady-state evoked potentials at 4 Hz, respectively. Specifically, the relatively low driving frequency of 4 Hz has traditionally been shown to prompt a larger spread of the ssVEP topography, reflective of longer individual stimulation cycles, which allow spreading across the visual hierarchy (Skrandies, 2007). However, the topographies of the 4 Hz EEG signal showed some variation between subjects. Therefore, for subsequent analyses in the frequency domain, including single-trial analyses of CS-related activity, we selected the six individual sensors for each participant showing the highest SNR at 4 Hz.
ssVEP single-trial analysis. We conducted a single-trial analysis to show the temporal evolution of visual cortical engagement over the course of learning trials. For single-trial analysis, we used data segments between −1000 and 5000 ms, relative to CS onset as 0 (in sample points at 512 Hz sample rate, this is 1-3072 sample points with zero being sample point 512). First, we upsampled the data from 512 to 1536 Hz. Upsampling the data ensured an integer number of sampling points per cycle for 4 Hz as well as its harmonics (up to 16 Hz). At 1536 Hz, one cycle of the driving frequency (4 Hz) is 384 samples (instead of 128 at 512 Hz). By subtracting the mean of the 1000 ms prestimulus interval, the data were baseline corrected. The power extraction of single trials was based on the analysis window between 750 and 5000 ms (relative to 0 ms = CS onset). Over this analysis window, a moving average procedure was conducted. We obtained averages by shifting a window with a length of four cycles of the frequency of interest (i.e., 4 Hz) across the detrended data segments in steps of one cycle and averaging the contents of the window with each step (12 steps, last four-cycle step starting at 3000 ms after CS onset). We then transformed the single-trial estimates from the time into the frequency domain using discrete Fourier transform (DFT) and extracted the power at the driving frequency as the absolute of the Fourier coefficients, normalized by the length of FFT (here, 1536 sample points). Interindividual variance in response strength and preexperimental bias was corrected by calculating a habituation ratio for each CS (via division by habituation mean over all 60 trials of each participant, disregarding the different CS orientations), with values >1 describing an enhancement and <1 describing a decrease of ssVEP power compared with habituation.
In addition, single-trial data were smoothed with a moving average along the 12 trials (5 point symmetrical, shrinking at the end points) within each learning phase and CS orientation (5 orientations × 4 learning phases with 12 trials each). For plots showing the temporal evolution of ssVEP over trials, we pooled data over sensors as the average of the individually defined six maximal SNR sensors for each participant. The individual sensors entering this six-sensor cluster were defined as the six sensors showing the highest SNR at 4 Hz during habituation trials for a participant [see above, Validation of visual and auditory entrainment (unimodal)]. Of note, while single-trial data are interesting and informative, we have no prior evidence allowing us to formulate specific hypotheses about group differences (in phase vs out of phase) in the temporal dynamics of ssVEP. Therefore, these data were averaged over all trials of a learning phase before statistical testing for group effects.
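The core of the single-trial estimate (four-cycle sliding-window average, DFT at the driving frequency, habituation ratio) can be sketched on a synthetic trial. This is a simplified illustration with hypothetical function names: upsampling, baseline correction, and detrending are omitted, and the toy trial is a noise-free 4 Hz sine covering the 17 cycles of the 750-5000 ms analysis window.

```python
import cmath
import math

FS = 1536                 # sampling rate after upsampling, in Hz (from the text)
MOD_HZ = 4                # driving frequency in Hz
CYCLE = FS // MOD_HZ      # 384 samples per cycle
WIN = 4 * CYCLE           # four-cycle window = 1536 samples


def sliding_window_average(segment):
    """Average all four-cycle windows, shifted in steps of one cycle."""
    starts = range(0, len(segment) - WIN + 1, CYCLE)
    acc = [0.0] * WIN
    for s in starts:
        for i in range(WIN):
            acc[i] += segment[s + i]
    return [v / len(starts) for v in acc]


def amplitude_at(avg, freq_hz=MOD_HZ):
    """Absolute DFT coefficient at freq_hz, normalized by the window length."""
    n = len(avg)
    k = freq_hz * n // FS     # bin index (4 for a 1536-sample window)
    coef = sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(avg))
    return abs(coef) / n


def habituation_ratio(value, habituation_values):
    """Values above 1 indicate enhancement relative to habituation."""
    return value / (sum(habituation_values) / len(habituation_values))


# Toy trial: 17 cycles of a 4 Hz sine with amplitude 2
trial = [2.0 * math.sin(2 * math.pi * MOD_HZ * i / FS) for i in range(17 * CYCLE)]
amp = amplitude_at(sliding_window_average(trial))
```

Averaging windows that are shifted by whole cycles preserves activity phase-locked to the 4 Hz drive while attenuating activity that is not, which is why an integer number of samples per cycle matters here.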
Skin conductance responses and electrocardiography, and blood pressure. In addition to ssVEP power tunings toward specific CS gratings, we used SCRs as a common measure of learning-induced changes in physiological arousal to the CS. As our laboratory is configured for stress-associated questions by default, we also recorded electrocardiography (EKG) and blood pressure (BP) as control parameters only. EKG and BP will not be reported in the Results section. We used a Brainamp ExG amplifier (Brain Products) and a 0.5 V constant voltage coupler to record SCRs with a sampling rate of 1000 Hz and a resolution of 0.0061 mS. We attached two Ø 10 mm (inner diameter) electrodes, filled with 0.05% NaCl paste (TD-246) on the thenar and hypothenar of the left hand (nondominant) of each participant (Boucsein et al., 2012). No additional filters were applied. Data were downsampled to 200 Hz in BrainVision Analyzer 2.1 and exported to MATLAB. Responses with an onset latency between 1 and 4 s and a minimum amplitude of 0.02 mS were automatically scored using Ledalab (Benedek and Kaernbach, 2010). If more than one response met the criteria, single responses were summed up. Responses that did not meet the criteria were scored as zero. After Ledalab scoring, we used an additional visual inspection of heat maps of single trials and corrected 42 values (of 240 trials Â 40 participants = 9600 total values) that were overscored or underscored by Ledalab. To further correct for interindividual differences and push distribution toward normal, we calculated z-values using the means and SDs of CS and US responses of all learning phases (habituation, acquisition, immediate extinction, and delayed recall) per participant. In accordance with ssVEPs, single-trial data were smoothed with a moving average along the 12 trials (5 point symmetrical, shrinking at the end points) within each learning phase and CS orientation (5 orientations Â 4 learning phases with 12 trials each). 
The z-standardized SCRs were then averaged across the 12 trials of each learning phase, and the averages were used in all statistical analyses.
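The standardization and smoothing steps can be sketched in a few lines. This is an illustration under assumptions, not the original MATLAB pipeline: the function names, the array layout, the use of the sample SD (`ddof=1`), and the reading of "shrinking at the end points" as a centered window that narrows symmetrically near the first and last trials are all hypothetical. Phase means are computed here from the unsmoothed z-values, which is likewise an assumption.

```python
import numpy as np


def zscore_scr(cs_scr, us_scr):
    """z-standardize one participant's CS responses.

    Mean and SD are taken over all CS and US responses of all learning
    phases. Using the sample SD (ddof=1) is an assumption.
    """
    cs_scr = np.asarray(cs_scr, dtype=float)
    us_scr = np.asarray(us_scr, dtype=float)
    pooled = np.concatenate([cs_scr.ravel(), us_scr.ravel()])
    return (cs_scr - pooled.mean()) / pooled.std(ddof=1)


def shrinking_moving_average(x, span=5):
    """Centered moving average whose window narrows near the end points."""
    x = np.asarray(x, dtype=float)
    half = span // 2
    n = x.size
    out = np.empty_like(x)
    for i in range(n):
        # Largest symmetric half-width that still fits inside the series.
        k = min(half, i, n - 1 - i)
        out[i] = x[i - k:i + k + 1].mean()
    return out


def smooth_and_average(z_scr):
    """Smooth along trials and compute per-phase means.

    z_scr: array of shape (n_phases, n_orientations, n_trials).
    Returns the smoothed single-trial data (for plotting) and the
    trial-averaged values per phase and orientation (for statistics).
    """
    z_scr = np.asarray(z_scr, dtype=float)
    smoothed = np.apply_along_axis(shrinking_moving_average, -1, z_scr)
    phase_means = z_scr.mean(axis=-1)
    return smoothed, phase_means
```

With this window rule, the first and last trials are left unchanged, the second and second-to-last trials are averaged over three points, and all remaining trials over five.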
For the recording of EKG, we positioned three Ø 8 mm (inner diameter) electrodes (filled with 5% NaCl EKG paste, GE Medical Systems Information Technologies) under the right collarbone, the left shinbone, and (as ground electrode) on the right shinbone.
Systolic and diastolic BP were measured at discrete measurement points using a semiautomatic electronic sphygmomanometer (bosotron 2, Bosch + Sohn). For this purpose, an inflatable cuff was placed around the left upper arm, with the sensor plate positioned over the brachial artery at heart level.
Subjective ratings: valence, arousal, and US expectancy
A paper-and-pencil version of the 9 point pictorial Self-Assessment Manikin (SAM; Bradley and Lang, 1994) scale was used to evaluate each CS orientation for its valence (from negative to positive) and arousal (from excited to calm). Ratings were conducted after habituation, acquisition, and immediate extinction, as well as before and after 24 h delayed recall. In addition, we asked the participants to rate their expectancy that a US would occur with the depicted grating, with answers ranging from −5 (very certain, no), to 0 (uncertain), to 5 (very certain, yes). Except for after habituation, paper-and-pencil US expectancy ratings were conducted together with our SAM ratings.

Overall procedure
The study comprised two parts: a screening session lasting ~1 h, in which we explained the general procedure of the main session, tested inclusion and exclusion criteria, and obtained informed consent (for description, see the Participants section); and the main conditioning study. Screening and the main study were scheduled on different days.

Main conditioning study
The main session was conducted on two consecutive days, starting at 10:00 A.M., 2:00 P.M., or 5:30 P.M. Day 1 and day 2 of the main session lasted 2 and 1 h, respectively. At the beginning of day 1 and day 2, we attached EEG, EKG, EOG, and SCR electrodes, and positioned the inflatable cuff for BP measures. Habituation, acquisition, and immediate extinction took place on day 1, while a 24 h delayed recall took place on day 2. After each learning phase on day 1 (i.e., after habituation, acquisition, and immediate extinction) as well as before and after delayed recall on day 2, we conducted resting periods, SAM and US expectancy ratings (except after habituation, where SAM ratings were collected without US expectancy ratings because no US had yet occurred), and EKG, SCR, and BP measurements (Fig. 1B). Before starting the computer task, we read the standardized "general information about the experiment," including a description of the procedure and the stimuli we were about to present. Subjects were instructed to sit comfortably and avoid any movements (except eye blinking) for the entire duration of the computer tasks and the explicitly announced measurement periods. During the resting phases, subjects were allowed to move, but were asked to do so carefully to avoid detaching the electrodes. At the beginning of habituation, subjects were asked to fixate on an upcoming white cross in the center of the screen, which would be followed by black and white "flickering" gratings. Before acquisition, we informed the participants that a loud flickering noise would be presented with only one of the gratings. However, we did not specify which of the five orientations would predict the aversive noise. Before immediate extinction (day 1) and delayed recall (day 2), participants were asked to remember the instructions; they were not informed that no aversive noise would be presented.
As this is the first attempt to compare a synchronized versus nonsynchronized condition with a fear generalization design, we hypothesized that group differences may manifest in one of the following three ways: (1) synchronized CS-US presentation may lead to major increases in CS responding not limited to the CS+ (this would be evident in a main effect of group in the ANOVA); (2) synchronized CS-US presentation may dramatically change the pattern of responding over the five different CSs (this could be evident in an orientation × group interaction in the ANOVA); and (3) synchronized CS-US presentation may alter the width of the generalization curve. The third possibility could reflect optimized learning, resulting in narrower generalization and thus better discrimination among the five CSs, without changing overall response levels or dramatically changing the response pattern. The omnibus ANOVA interaction would not be able to detect this. Therefore, we designed a custom contrast for the group × orientation interaction, using the LMATRIX command for contrast coefficient matrices in SPSS. We obtained the contrast weights by subtracting a broader generalization profile (Fig. 4B, orange line and font) from a narrower generalization profile (Fig. 4B, blue line and font), resulting in a form resembling a "Mexican Hat" (weights: 0.142, −0.489, 0.694, −0.489, and 0.142). We expected group differences to manifest during (or directly after) acquisition. However, to explore the longevity of potential group effects, we repeated our 5 × 2 ANOVA and the custom Mexican Hat group × orientation contrast for immediate extinction and delayed recall on day 2.
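To illustrate how such a custom contrast operates, the following sketch applies the reported weights to per-participant response profiles and compares the resulting contrast scores between two groups. It is not the SPSS syntax used in the study. The function names are hypothetical, and the sketch assumes that, with two groups, the contrast test is equivalent to a pooled-variance two-sample t test on the contrast scores (F = t², with 1 and N − 2 degrees of freedom).

```python
import numpy as np

# Contrast weights as reported in the text; they sum to zero.
MEXICAN_HAT = np.array([0.142, -0.489, 0.694, -0.489, 0.142])


def contrast_scores(responses, weights=MEXICAN_HAT):
    """Project each participant's five-orientation profile onto the contrast.

    responses: array of shape (n_participants, 5), ordered by orientation
    with the CS+ in the middle column.
    """
    return np.asarray(responses, dtype=float) @ weights


def group_contrast_f(scores_a, scores_b):
    """F statistic for a group difference in contrast scores.

    Assumption: pooled-variance two-sample t test, reported as F = t**2
    with (1, n_a + n_b - 2) degrees of freedom.
    """
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    df_error = a.size + b.size - 2
    pooled_var = ((a.size - 1) * a.var(ddof=1)
                  + (b.size - 1) * b.var(ddof=1)) / df_error
    t = (a.mean() - b.mean()) / np.sqrt(pooled_var * (1 / a.size + 1 / b.size))
    return t ** 2, (1, df_error)
```

A profile that peaks only at the CS+ yields a positive score, whereas a profile that is equally elevated at the CS+ and its two nearest neighbors yields a negative one, so higher scores indicate narrower generalization. For two groups of 20 participants, the error term has 38 degrees of freedom.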

Results
Table shows statistical analyses including p value and effect size for each memory outcome measure, separated by learning phase. For each outcome measure, we calculated repeated-measures ANOVAs with the CS orientation as the within-subject factor and the group (in-phase group vs out-of-phase group) as the between-subject factor. Successful conditioning (i.e., increased response toward the CS+ irrespective of group) was validated by the main effects of orientation (noted in the column effects as ME: o). To account for the specific symmetric generalization pattern (CS+ in the middle), additional generalization contrast fits were used (noted as GEN). The main effects of group (ME: g) and group × orientation interactions (o × g INT) addressed differences between in-phase and out-of-phase conditioning. Better grating discrimination versus stronger generalization across orientations is described by a Mexican Hat contrast fit for the group × orientation interactions (MEX). ANOVA, Mixed repeated-measures ANOVA; ME, main effect; o, orientation; ηp², partial η²; g, group; MEX, Mexican Hat contrast fit of orientation × group interaction; INT, interaction; GEN, generalization fit; Val, valence; Aro, arousal.

Phase synchronization causes a better discrimination between CS+ and neighboring CS− gratings in the US expectancy ratings
We found an effect of orientation on US expectancies collected immediately after acquisition, during which the CS+ was repeatedly paired with the aversive US (F(3,109) = 12.491, p = 6.764E-7, partial η² = 0.247; Table 1, a). The resulting pattern reflected generalization around the CS+ (Fig. 5), with the CS+ and the most similar gratings receiving the highest US expectancy scores (generalization contrast fit: F(1,38) = 28.360, p = 0.000005, partial η² = 0.427; Table 1, b). In addition, data revealed a main effect of group (F(1,38) = 7.310, p = 0.010, partial η² = 0.161; Table 1, c), but no group × orientation interaction (Table 1, d). Here, the out-of-phase group showed broader generalization of the US expectancy ratings, while the in-phase group had a narrower generalization pattern with more discrimination between the CS+ and the four CS− (Fig. 5). This was supported by a significant orientation × group interaction in the form of a Mexican Hat (F(1,38) = 4.796, p = 0.035, partial η² = 0.112; Table 1, e). As a comprehensive index of CS discrimination (i.e., CS+ vs average of all CS−), we calculated discrimination indices by subtracting the weighted average of the CS− responses from the CS+ responses (Extended Data Fig. 5-1). To account for the fact that the 35° and 55° CS− orientations differ from the CS+ by only 10° and are thus harder to discriminate, these orientations were multiplied with a weight of 0.33[...] before averaging. The more dissimilar orientations (25°, 65°) differ by 20° from the CS+ and are easier to discriminate. Therefore, these two were weighted with 0.166[...] (i.e., half of the weight of the more similar orientations). Although the CS− weights account for the perceptual difference, they are not directly derived from a psychophysics curve. Extended Data Figure 5-1 depicts estimation statistics for the discrimination indices within each learning phase by presenting individual values as well as the effect sizes (Hedges' g) with a bootstrap 95% confidence interval (5000 samples; Ho et al., 2019). To increase transparency, Extended Data Figure 5-2 shows the same for a discrimination index computed with the unweighted average of the four CS− values.
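The weighted discrimination index described above amounts to the following computation. This is a minimal sketch rather than the original analysis code: the function name and array layout are assumptions, and the placement of the CS+ at 45° is inferred from the stated 10° and 20° differences.

```python
import numpy as np

# Orientation order assumed for the last axis: 25, 35, 45 (CS+), 55, 65 degrees.
CS_PLUS_INDEX = 2
CS_MINUS_INDEX = [0, 1, 3, 4]
# Weights 0.166..., 0.33..., 0.33..., 0.166... for 25, 35, 55, 65 degrees.
CS_MINUS_WEIGHTS = np.array([1 / 6, 1 / 3, 1 / 3, 1 / 6])


def discrimination_index(responses, weighted=True):
    """CS+ response minus the (weighted) average of the four CS- responses.

    responses: array whose last axis holds the five orientations.
    weighted=False gives the unweighted variant with equal weights of 0.25.
    """
    r = np.asarray(responses, dtype=float)
    weights = CS_MINUS_WEIGHTS if weighted else np.full(4, 0.25)
    return r[..., CS_PLUS_INDEX] - r[..., CS_MINUS_INDEX] @ weights
```

Because the weights sum to 1, the subtracted term remains a proper average. The weighted version is the stricter index whenever responses to the nearest CS− gratings exceed those to the distant ones.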
For US expectancy ratings collected after extinction, we found neither a main effect of group nor a group × orientation interaction (Table 1, f, g). However, even after extinction trials, the in-phase group showed a narrower generalization pattern than the out-of-phase group (Fig. 5; Mexican Hat contrast fit for the orientation × group interaction: F(1,38) = 6.660, p = 0.014, partial η² = 0.149; Table 1, h). On day 2, 24 h later (Fig. 5, Extended Data Fig. 5-1, day 2 before delayed recall), we found no group differences in US expectancy ratings (no main effect of group, no orientation × group interaction; Table 1, i, j), and the generalization was no longer significantly narrower in the in-phase group (Mexican Hat orientation × group interaction; Table 1, k).
Synchronization leads to a narrower rating pattern toward the CS+ in valence and arousal ratings after fear acquisition
For both valence and arousal ratings after acquisition (Fig. 6), we found a prioritization of the CS+ similar to that for US expectancy (main effect of orientation: valence: F(3,96) = 7.756; p = 0.000272; partial η² = 0.170; [...] Table 1, r, s). However, in both measures the in-phase group showed a narrower generalization than the out-of-phase group (Fig. 6). This was evident in significant orientation × group interactions in the form of a Mexican Hat for valence (F(1,38) = 9.228; p = 0.004; partial η² = 0.195; Table 1, t) and arousal (F(1,38) = 7.325; p = 0.010; partial η² = 0.162; Table 1, u). The discrimination indices (CS+ vs averaged CS−) as well as estimation plots, including individual values and effect sizes, are additionally presented in Extended Data Figure 6-1.
After extinction, there were no effects of synchronization in valence (group main effect or orientation × group interaction; Table 1, v, w) or arousal (group main effect or orientation × group interaction; Table 1, x, y). The same was true for valence and arousal ratings on day 2 (group main effect and orientation × group interaction: valence, [...]

Figure 5. US expectancy ratings separated for each measurement point: after acquisition, after extinction on day 1, and before delayed recall on day 2 in the in-phase and the out-of-phase groups. US expectancy was rated per CS on a scale ranging from −5 (very certain, no US after this CS) through 0 (uncertain) to 5 (very certain, a US will follow this CS). Each data point presents the mean US expectancy rating for each CS orientation (averaged over participants per group and measurement point); error bars show 1 SEM. Extended Data Figure 5-1 shows discrimination indices (CS+ minus the weighted average of all CS−) and estimation statistics for US expectancy ratings. For transparency, Extended Data Figure 5-2 shows discrimination indices that result when subtracting the unweighted average of the CS− from the CS+.

Figure 6. A, B, Valence ratings (A) and arousal ratings (B) separated for each measurement point: after habituation, after acquisition, after extinction (day 1), and before delayed recall (day 2). Valence was rated with the Self-Assessment Manikin on a 9-point scale from 1 (unpleasant) to 9 (pleasant). For better comparability with arousal ratings, valence ratings were recoded, changing the scale from 1 (pleasant) to 9 (unpleasant). Arousal was also rated with the Self-Assessment Manikin, here ranging from 1 (calm) to 9 (arousing). Each data point presents valence or arousal ratings, respectively, for each CS orientation (averaged over participants per group and measurement point); error bars show 1 SEM. Note: for better visualization, the y-axis is scaled from 3 to 8 instead of showing the full range from 1 to 9. Extended Data Figure 6-1 shows discrimination indices (CS+ minus the weighted average of all CS−) and estimation statistics of valence and arousal data. Extended Data Figure 6-2 additionally shows the discrimination indices that use the unweighted average of all CS− values for subtraction.

SCRs showed the typical increase toward the reinforced CS+ but were unaffected by the synchronization conditions
Figure 7, A and B, depicts the SCRs on a trial-by-trial basis to visualize the temporal dynamics of moving-averaged and z-transformed SCRs. In addition, z-transformed SCRs without the moving average are presented in Extended Data Figure 7-1. However, as single trials are subject to noise, SCRs were analyzed using averaged data (Fig. 7C), as described in the Materials and Methods section. Pairing the CS+ orientation with the aversive US within acquisition led to the predicted increase of SCR toward the reinforced grating (main effect of orientation: F(3,96) = 14.856, p = 3.1057E-7, partial η² = 0.281; Table 1, dd). The response pattern was described by generalization around the CS+ (generalization fit: F(1,38) = 31.987, p = 0.000002, partial η² = 0.457; Table 1, ee). However, this was independent of group (main effect of group and orientation × group interaction; Table 1, ff, gg; see Extended Data Fig. 7-2 for discrimination indices and estimation statistics). Looking at Figure 7 (Extended Data Fig. 7-1), it is unusual that SCRs toward the CS+ seem already increased on the very first trial of acquisition, independent of the applied smoothing procedure (Extended Data Fig. 7-1, unsmoothed data). However, exploratory analyses of group differences without the first trial did not change the results (i.e., there was still no overall difference between groups and no significant orientation × group interaction).
During extinction, there was no difference between groups (main effect of group and orientation × group interaction; Table 1, hh, ii), and on day 2 synchronization likewise had no effects (main effect of group and orientation × group interaction; Table 1, jj, kk).
ssVEP power revealed a tuning toward the visual CS+ that was similar in both groups
Figure 8, A and B, depicts ssVEPs on a trial-by-trial basis to visualize temporal dynamics, and Extended Data Figure 8-1 shows ssVEP ratios without a moving average. However, as for the SCRs, ssVEPs were analyzed using averaged data (Fig. 8C), as described in the Materials and Methods section.
ssVEPs during acquisition revealed a conditioned power increase toward the CS+ and neighboring gratings (main effect of orientation: F(4,137) = 5.696, p = 0.000479, partial η² = 0.130; Table 1, ll). It was described by a generalization pattern around the CS+ (generalization fit: F(1,38) = 8.447, p = 0.006, partial η² = 0.182; Table 1, mm). However, this prioritization was not affected by group (orientation × group interaction; Table 1, nn). Similar to the SCRs during acquisition, Figure 8, A and B, indicates an increased ssVEP ratio toward the CS+ on the very first trial. However, considering the unsmoothed data in Extended Data Figure 8-1, the power increase here seems to be an artifact of the applied smoothing procedure. As depicted in Figure 8C, synchronization also had no effects on ssVEPs in extinction: we found neither a significant main effect of group nor an orientation × group interaction (Table 1, oo, pp). Intriguingly, the ssVEP power during delayed recall on day 2 was generally higher in the in-phase group than in the out-of-phase group (F(1,38) = 5.354, p = 0.026, partial η² = 0.123; Table 1, qq), although this effect was independent of orientation (orientation × group interaction; Fig. 8C, Table 1, rr). As for the ratings and the SCRs, discrimination indices (CS+ minus the weighted average of the CS− gratings) and estimation plots depicting individual values and effect sizes are presented in Extended Data Figure 8-2.

Figure 7. A-C, Single-trial (A, B) and averaged (C) skin conductance responses. Single-trial SCRs are separated by the synchronization condition into the in-phase group (0° phase offset; A) and the out-of-phase group (90°, 180°, and 270° phase offset; B). Single-trial data are z-transformed SCRs, averaged over participants per group for each trial and CS orientation. Before averaging, data were smoothed over the 12 trials of a learning phase using a moving average (5 points long, symmetrical, shrinking at the end points). C depicts averaged data over 12 trials of habituation, acquisition, extinction, and delayed recall to visualize the response patterns within each learning phase. Here, each data point presents z-transformed SCRs of each CS orientation averaged over participants and trials per group. The z-transformation was calculated with the means and SDs over CS and US responses of all learning phases (habituation, acquisition, immediate extinction, delayed recall) per participant. Error bars show ±1 SEM. Extended Data Figure 7-1 shows single-trial SCR data without smoothing (i.e., no moving average). Extended Data Figure 7-2 shows discrimination indices (CS+ minus the weighted average of all CS−) for SCR and estimation statistics. Extended Data Figure 7-3 depicts discrimination indices without weighting the averaged CS− values.

Discussion
The formation of associative memories is an elemental aspect of human behavior, but its underlying neurocomputations are largely unknown. One group of theoretical notions has emphasized the role of phase-synchronized oscillations for binding representations of conditioned cues to behavioral outcomes (Headley and Paré, 2017). Recent research has increasingly used external rhythmic stimulation to test the role of phase relations in specific frequency bands for the formation of working memory (Polanía et al., 2012; Violante et al., 2017) and audiovisual associations (Clouter et al., 2017; Wang et al., 2018). Based on these previous findings, we applied this method for the first time in a fear-conditioning paradigm. Modulating the phase offset between a visual CS and an aversive auditory US, both presented at a memory-relevant theta frequency, allowed us to test the causal role of phase synchronization in fear conditioning. To assess the various response systems that are important in fear learning (Lang et al., 2000), we measured skin conductance responses, indexing the physiological arousal of fear; the ratings of valence and arousal to capture the subjective evaluation of each stimulus; and US expectancy, which assesses the participant's knowledge of the CS-US association. Additionally, ssVEPs provided information about visuocortical engagement and tuning patterns in sensory processing.

Figure 8. A-C, Single-trial (A, B) and averaged (C) power of the 4 Hz ssVEPs for each learning phase (habituation, acquisition, extinction, and delayed recall). Single-trial data are separated by the synchronization condition into the in-phase group (0° phase offset; A) and the out-of-phase group (90°, 180°, and 270° phase offset; B). The ssVEP power is shown as the SNR at 4 Hz, corrected for habituation-level responding. Correction was performed by dividing individual SNR values by the average SNR from habituation (mean over all 60 trials of each participant, disregarding the different CS orientations). Therefore, values >1 describe an enhancement, and values <1 describe a decrease of ssVEP-SNR at 4 Hz relative to habituation. Single-trial data were smoothed over trials via a moving average along the 12 trials of each learning phase (5 point symmetrical, shrinking at the end points). Each data point in A and B represents habituation-corrected SNR for each trial and CS orientation, averaged over participants per group. C depicts data averaged over the 12 trials of habituation, acquisition, extinction, and delayed recall to visualize the response patterns within each phase. Error bars show ±1 SEM. Note: habituation data in C are nearly "flat" at ~1 because of the habituation correction, as described above and in the Materials and Methods section. Extended Data Figure [...]

The measures we collected in the current study reflect different facets of the associative conditioning process and as such responded differently to the experimental manipulations. In accordance with our hypothesis, synchronized CS-US presentation facilitated the ability to identify the CS+ as the grating that was most likely followed by the aversive US. Remarkably, participants who received synchronized CS-US presentation discriminated the CS+ more precisely from the neighboring CS− gratings (which differed only by an orientation shift of 10°). Participants in the out-of-phase group, in contrast, generalized across the CS+ and the most similar CS− gratings. We therefore conclude that the synchronous input of two stimuli from different sensory modalities strengthens the cognitive representation of the CS-US association.
Consistent with the US expectancies, the subjective valence and arousal ratings reflected the effects of phase synchronization. While participants who received in-phase stimulation were more sensitive to changes in the perceived valence and arousal of the CS+, participants in the out-of-phase group reported generalized arousal and unpleasantness across the CS+ and neighboring CS− gratings. Hence, synchronous input not only sharpens the cognitive representation of CS-US contingency but seems to have a similar influence on the affective evaluation.
Surprisingly, there were no corresponding effects in the SCR or ssVEP data. In the SCR data, both groups showed the strongest response toward the CS+ grating during acquisition, independent of synchronization. Especially in the single-trial data, however, the in-phase group appears to respond more strongly to the CS+, which seems to be more pronounced in the first trials. We therefore tested the potential group difference in an exploratory analysis by segmenting the trials into trial blocks (three blocks with four trials per block). However, adding this within-subject factor to our statistical analysis did not reveal any significant differences between the groups in different phases of acquisition. One possible explanation of the higher SCRs in the beginning of acquisition might be the booster sequence that was used (i.e., five of the first seven gratings were CS+ gratings). The booster and the applied criterion of not allowing more than two consecutive CSs of the same orientation might also be the reason for another unusual observation within our SCR results: irrespective of the factor group and independent of the applied smoothing procedure, SCR toward the CS+ was already increased on the very first trial. While we applied the booster sequence for better comparability with previous findings (McTeague et al., 2015; Antov et al., 2020), future studies should consider a different approach to minimize such trial-order effects. Nevertheless, there were no differences between the in-phase and out-of-phase groups; thus, this observation does not change the following interpretations.
An effect of phase synchronization was also missing in the ssVEP data. Although we were able to detect a tuning pattern with the greatest power for the reinforced CS+ grating for 4 Hz stimulus presentation, as previously described for stimulation in the low beta range (McTeague et al., 2015; Antov et al., 2020), the pattern did not differ between the in-phase and out-of-phase groups.
A possible explanation for the observed discrepancies among the different variables could be that our measures assessed different memory types. Although fear conditioning is a well established and widely used paradigm, it is difficult to strictly distinguish the mechanisms behind each response system. For example, skin conductance responses measure physiological arousal during fear learning (although they are not restricted to fear conditioning) and are often considered a measure of the unaware fear reaction, especially dependent on the amygdala (Knight et al., 2003, 2006; Christopoulos et al., 2019; but see also Lovibond and Shanks, 2002; Sevenster et al., 2014). US expectancy ratings, in contrast, are considered to specifically reflect declarative knowledge of the CS-US contingency (Boddez et al., 2013), which is known to involve additional brain structures like the hippocampus. Bechara et al. (1995) observed a neural dissociation between implicit and explicit aspects of a fear-conditioning procedure. While patients with bilateral amygdala lesions did not acquire conditioned SCRs but had an intact memory for the declarative facts, patients with bilateral lesions of the hippocampus showed the exact opposite effects (i.e., they acquired SCRs but failed to recall declarative facts). If we speculate that the effects of visual-auditory stimulation are not restricted to early sensory cortices, but influence deeper brain regions in the course of rhythmic processing, our results might be explained by these distinct systems: theta-phase synchronization might especially modulate the pathway involved in forming declarative facts about the CS-US pairing (i.e., US expectancy ratings), probably including the hippocampus, without influencing the emotional conditioning that involves the amygdala.
One possible mechanism could be that the phase-synchronous visual CS+ and auditory US simultaneously arrive at neural populations in the hippocampus, increasing the likelihood of long-term potentiation and thereby enhancing synaptic strength (Fell and Axmacher, 2011). Although the EEG method used here does not allow conclusions to be drawn about mechanisms at the synaptic level in subcortical structures, one might also speculate why the heightened CS-US association in the rating data is not reflected in metrics thought to reflect limbic processing (i.e., the SCRs). There are two potential explanations that we highlight in this context. (1) In various species, theta-phase synchronization has predominantly been examined in the context of LTP in the hippocampus (Huerta and Lisman, 1995; Buzsáki, 2002; Hyman et al., 2003; Lega et al., 2012), which is specifically relevant in the formation of declarative memory (Eichenbaum, 1999; Clouter et al., 2017; Wang et al., 2018). Thus, one may speculate that theta-phase synchronization is linked to hippocampus-dependent processes, whereas the exact timing of CS and US may play a lesser role in amygdala-dependent fear learning. However, some studies have found theta-phase synchronization between the amygdala and other important structures of the fear circuit (e.g., hippocampus, ventrolateral PFC, anterior gyrus cinguli) as well as within the subnuclei of the amygdala (Seidenbecher et al., 2003; Karalis et al., 2016; Taub et al., 2018; Zheng et al., 2019; for review, see Bocchio et al., 2017; Çalişkan and Stork, 2018) during different stages of the fear-conditioning process, supporting the general influence of theta synchronization during fear memory formation. As such, future work may characterize the role of synchronization within and between specific brain regions in establishing and maintaining fear memories. (2) More important, however, is the question of whether synchronized theta rhythms propagate to the amygdala.
Sensory information reaches the amygdala via multiple pathways, some of which are faster, subcortical ("low") routes, whereas others are slower, cortical ("high") routes (Silverstein and Ingvar, 2015). Since our synchronized (vs asynchronous) visual and auditory theta stimulation is delivered globally and is unlikely to target one specific pathway, the timing might not have been suitable to enable locally specific synchronization. Considering that we used a generalization paradigm with similar CS gratings, we speculate that the challenging discrimination of the CS+ requires more demanding processing via the slow, cortical route, while the simple aversive US reaches the amygdala via the fast, subcortical pathway. Thus, the 40 ms we added to the US might have been insufficient to achieve theta synchronization when the CS and US reach the lateral amygdala (LA). Because of the relatively long CS-US overlap of 2 s, we additionally cannot rule out that our synchronized stimulation reached the amygdala via the thalamic route first, but then also via cortical routes, leading to cancellation of the first CS-US phase synchronization and hence minimizing the suggested effects.
Another interesting consideration in this context is the role of theta synchronization between the amygdala and hippocampus for pattern separation of emotional images. Examining presurgical epilepsy patients, Zheng et al. (2019) found that bidirectional theta synchronization between both structures was associated with the ability to discriminate an encoded image and a new, but similar, "lure" image in a test phase. Considering that most of our results consist of a better discrimination between the aversive CS+ and the most similar CS− gratings, synchronized CS-US presentation might be beneficial for the amygdalo-hippocampal communication, associated with enhanced discrimination of emotional content. However, further research with additional outcome measures is needed to pinpoint the underlying neurophysiological processes. For example, future studies may attempt to experimentally untangle declarative and nondeclarative memory processes involved in fear conditioning, including their reactivity to synchronized presentation. Measuring amygdala and hippocampus activity via fMRI or in experimental animals may also help to clarify the influence of synchronized presentation on distinct subprocesses of fear conditioning and their associated neural substrates.
One important consideration when interpreting the current results is the fact that group differences were mostly restricted to the encoding phase of fear (acquisition), although we expected that improved fear learning after synchronous presentation would prompt greater extinction resistance. Contrary to expectations, we did not find extinction-resistant patterns in the in-phase group during immediate extinction or delayed recall. However, using a reinforcement rate of 100% is known to cause rapid extinction (Haselgrove et al., 2004; Dunsmoor et al., 2007), which could make it harder to detect between-group effects. Moreover, because extinction leads to the formation of a new (i.e., CS-no-US) memory trace that inhibits the original fear memory, future research may use a second CS+ stimulus that is not extinguished, which will aid in assessing the long-lasting effects of theta-phase synchronization on fear memory recall (i.e., the trace that was causally manipulated by theta-synchronized stimulation). Additionally, animal and human work suggests that prolonged stimulus-free periods during encoding are associated with the more effective production of long-term memory (Philips et al., 2013; Jiang et al., 2020). Increasing the ITIs might help to form more robust memory traces that persist over time. Another possible cause for the absence of long-term effects of stimulation phase is that the externally modulated CS-US stimulation only affects short-term or working memory processes but does not have any effects on actual long-term memory. In two comparable studies focusing on declarative memory, Clouter et al. (2017) and the follow-up study by Wang et al. (2018) separated the encoding and recall of the learned video-tone associations with a distractor task that lasted only 30 s, likely too short a time to be informative about the formation of long-term memory.
Although the current study provided evidence of a causal role of theta-phase synchronization in the context of fear conditioning, there are some limitations to consider. First, our sample size was chosen to detect medium to strong effect sizes, based on previous studies (Clouter et al., 2017;Wang et al., 2018). Therefore, we cannot rule out the possibility that small effects went undetected. This is especially relevant for the SCR data, where the responses were descriptively stronger after in-phase CS-US presentation, but the differences were not statistically significant. Increasing the statistical power via a larger sample size might help to detect even small effect sizes. Second, we cannot conclude that the stimulation effects observed here are specific to the theta band, because we did not test other frequencies. However, both animal model studies that examined theta-phase synchronization in the fear network (Seidenbecher et al., 2003;Taub et al., 2018) and entrainment studies that focused on working and declarative memory (Alekseichuk et al., 2016;Clouter et al., 2017;Violante et al., 2017) support the current conclusion that synchronization in the theta-frequency band is specifically important for fear memory formation. Third, we did not explicitly ask whether participants were able to detect the synchronous or asynchronous timing between CS and US, and therefore we cannot rule out that out-of-phase or in-phase stimulation exerted effects based on phenomenological, perceptual differences. Nevertheless, we used the exact time lags used by both Clouter et al. (2017) and Wang et al. (2018), who did not observe any interference with perceptual judgments or decline in performance. In addition, Clouter et al. (2017) conducted a control experiment with static stimuli, which, on a perceptual basis, represent the best-case scenario for perceptual binding, and still found better results after theta-synchronized video-audio presentation. Fourth, although EEG data showed a group-independent tuning toward the CS+ grating, supporting the conditioned effects on sensory processing, the hypothesized sharpening in the in-phase group was not confirmed. What we found instead is a general increase in ssVEP power in the in-phase group during day 2, suggesting a stronger engagement of the sensory cortex. However, based on the present data, we cannot establish to what extent this effect was caused by the theta-phase synchronization on day 1 as opposed to arising as an epiphenomenon (e.g., of the cognitive changes induced by the synchronization). Fifth, it is possible that theta synchronization does not facilitate learning, but that desynchronized stimulation instead disrupts ongoing oscillatory processes, resulting in less precise (i.e., more generalized) fear responses (Alekseichuk et al., 2017). To test this possibility, future work may include a third group in which participants are presented with nonflickering CS and US stimuli.
Finally, an important limitation is that we were not able to show that participants' auditory and visual EEG responses were synchronized or desynchronized as intended. This was because of the limited number of trials and noisy US data. In contrast to previous work with innocuous stimuli, the US in a fear-conditioning experiment has to be highly aversive. Inherently, this means that the US period (the only period in which auditory and visual stimulation overlap) will produce noisy EEG data with many movements, a startle response, and other artifacts. This is why EEG studies of human fear conditioning (regardless of the US used and the number of trials) typically do not analyze any data during the US presentation window. Because the US is aversive, we did not want to expose participants to more noise than absolutely necessary. Thus, based on previous experiments, we limited the duration of a single US to 2000 ms and the number of US trials to 12 per participant. This small number of trials, in combination with inherently noisy EEG data during US presentation, precluded us from localizing and analyzing phase differences in the brain response. This should be addressed with a modified design in future studies. Nevertheless, we did verify the precise nature of the bimodal stimulation on a single-trial and single-subject level (Fig. 2). Importantly, earlier work (Clouter et al., 2017;Wang et al., 2018) with 4 Hz audio-video synchronization using larger trial numbers and nonaversive audio stimulation has shown that precise audio-video stimulation results in synchronized responding in the auditory and visual cortices. Future studies may also consider extending the temporal gap between the acquisition phase and the delayed recall, because previous work has indicated that theta-band synchronization between the amygdala and sensory cortices affects the storage of fear information in remote, but not recent, fear retrieval (Sacco and Sacchetti, 2010;Do-Monte et al., 2015).
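To make the first limitation (sample size and detectable effect size) more concrete, the following minimal sketch approximates the statistical power of a simple two-group comparison with 20 participants per group, the group size used here. It is an illustration under stated assumptions, not the power analysis underlying the present study: it uses a normal approximation to a two-sided, two-sample test at an assumed alpha of 0.05, takes Cohen's conventional benchmarks (d = 0.2, 0.5, 0.8) as example effect sizes, and ignores the repeated-measures structure of the actual design, which generally changes sensitivity for within-subject and interaction effects. The function names `approx_power` and `n_for_power` are hypothetical helpers introduced only for this example.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate two-sided power for comparing two independent groups.

    Uses a normal approximation (assumption), so values are slightly
    optimistic compared with an exact t-test calculation.
    d           : standardized mean difference (Cohen's d)
    n_per_group : participants in each group (equal group sizes assumed)
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    ncp = d * sqrt(n_per_group / 2)        # expected test statistic under H1
    # Probability of exceeding the critical value in either tail
    return z.cdf(ncp - z_crit) + z.cdf(-ncp - z_crit)


def n_for_power(d, target=0.80, alpha=0.05):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while approx_power(d, n, alpha) < target:
        n += 1
    return n


if __name__ == "__main__":
    # Conventional small, medium, and large benchmarks (assumed examples)
    for d in (0.2, 0.5, 0.8):
        power = approx_power(d, n_per_group=20)
        needed = n_for_power(d)
        print(f"d = {d}: power with n = 20 per group = {power:.2f}; "
              f"n per group for 80% power = {needed}")
```

Under these simplifying assumptions, power at a fixed group size falls steeply as the effect size decreases, which is consistent with the caution above that small between-group effects (e.g., in the SCR data) may have gone undetected.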
In conclusion, the current study represents an initial step toward establishing the causal effects of theta-phase synchronization on fear memory formation. Our results replicate the importance of synchronization for acquiring new cognitive representations, measured via US expectancy ratings, and extend it to affective evaluation (subjective valence and arousal ratings). By contrast, the present evidence was mixed at the level of sympathetic (skin conductance) and visuocortical (ssVEP) engagement. Future studies may further explore the differences between response systems in the context of fear conditioning. Leveraging the potential of rhythmic stimulation and synchronization while taking into account the evolution of fear acquisition across the learning phases will ultimately improve our understanding of the mechanisms behind the acquisition of learned fear responses.