Abstract
Current research strives to investigate cognitive processes under natural conditions. Virtual reality (VR) and EEG are promising techniques that combine naturalistic settings with close experimental control. However, many questions and technical challenges remain, e.g., are saccade onsets a suitable replacement for fixation onsets as key events in continuous gaze trajectories (Amme et al., 2024), and consequently, can VR capture differences across stimulus categories associated with varying saccade durations? To address both questions, we investigated the N170 face effect in humans (14 males, 19 females, zero diverse) in a free-viewing, free-movement immersive VR study that contained houses, various background stimuli, and, notably, static and moving pedestrians, allowing us to study face perception under naturalistic conditions. Our results show that aligning trials to saccade onsets leads to better-defined ERPs than aligning to fixation onsets, especially for the P100 component, demonstrating that saccade-onset ERPs are the better-suited analysis method for this type of experiment. Furthermore, we observe an evolution of category-based differences, i.e., between face and background saccade-onset ERPs, compatible with previous reports but extending over a large temporal window and including all electrode sites at different points in time. In summary, employing VR, EEG, and eye-tracking to investigate differences across fixation categories provides insights into the relevance of saccade onsets as event triggers and enhances our understanding of cognitive processes in naturalistic settings.
Significance Statement
In the effort to investigate and understand cognitive processes under naturalistic conditions, combining virtual reality (VR) and EEG is a fruitful way to implement free-viewing studies. The current work combines these technologies to explore key challenges in the context of face perception in an immersive virtual environment. Our results show that, when analyzing continuous eye-tracking data, saccade-onset ERPs yield more precise measurements than fixation-onset ERPs. Furthermore, when processing face compared with background stimuli, distinct temporal patterns encompassing all electrode sites can be observed, offering new insights into face perception. Overall, this work highlights the potential of integrating VR and EEG to advance our understanding of cognitive processes in naturalistic settings.
Introduction
In recent years, research has taken a step toward studying and understanding cognitive processes under natural conditions, capturing them in dynamic, real-world environments (Tromp et al., 2018; Shamay-Tsoory and Mendelsohn, 2019; Rounds et al., 2020; Gert et al., 2022; Stangl et al., 2023). Central to this approach are free-viewing paradigms, where subjects move their eyes and actively choose where to direct their gaze (Gert et al., 2022; Amme et al., 2024), allowing us to study the spontaneous and adaptive nature of real-world visual behavior (Shamay-Tsoory and Mendelsohn, 2019; Stangl et al., 2023). For these studies, virtual reality (VR) is emerging as a powerful tool, combining the high experimental control of laboratory setups with the free-viewing experience of real life (Bohil et al., 2011; Pan and Hamilton, 2018; Bell et al., 2020). Supporting the potential of VR, recent studies demonstrated that VR can yield findings similar to real life (Nolte et al., 2025) and is suitable for analyzing eye-tracking data in naturalistic environments (Clay et al., 2019; Llanes-Jurado et al., 2020; Nolte et al., 2024). Beyond eye movements, integrating VR with electroencephalography (EEG) allows for exploring neural responses to naturalistic visual behavior (Tromp et al., 2018; Rounds et al., 2020; Stangl et al., 2023) and measuring fixation-onset event-related potentials (ERPs; Nolte et al., 2024). This highlights the potential of combining VR and EEG to study cognitive processes under naturalistic, free-viewing conditions.
While VR has proven helpful in investigating vision and neural processes under naturalistic conditions, many questions and technical challenges remain. For instance, although neural processes can be studied with VR–EEG setups (Tromp et al., 2018; Rounds et al., 2020; Nolte et al., 2024), the feasibility of using this combination to examine fixation-onset ERP differences across experimental conditions remains to be explored. Notably, a recent magnetoencephalography (MEG) study investigated fixation- and saccade-onset ERP differences during naturalistic viewing of pictures and found saccade-onset ERPs better suited for studying early visual components (Amme et al., 2024). Building on these findings, a question arises: Do saccade onsets provide the optimal alignment for ERP analysis in free-viewing studies? Furthermore, if saccade onsets are the preferred alignment, can VR capture differences across stimulus categories varying in saccade characteristics?
To address the first question, we can assess the timing of a saccade-onset P100 in an immersive free-viewing study. Employing these saccade-onset ERPs to study a well-established effect, such as the N170 face effect (Rossion and Jacques, 2008; Eimer, 2011), can tackle the second question. The N170 effect, described as a stronger ERP response to faces (Rossion and Jacques, 2008; Eimer, 2011) and bodies (Hietanen and Nummenmaa, 2011) compared with other stimuli, has been replicated in free-viewing picture setups (de Lissa et al., 2019; Auerbach-Asch et al., 2020; Gert et al., 2022) and with virtual humans (Wheatley et al., 2011). Thus, the N170 effect is ideal for investigating whether saccade onsets provide superior temporal alignment and whether ERPs can reveal differences between experimental conditions in an immersive free-viewing study.
The current experiment was designed as a three-dimensional virtual city populated with avatars. Participants explored the city center ad libitum while we recorded their eye movements and EEG signals. We explored the temporal alignment of fixation- and saccade-onset ERPs in line with previous research (Amme et al., 2024) to determine the more suitable option. Furthermore, to investigate differences between stimulus categories, we split our data into head, body, and background stimuli (Gert et al., 2022). We hypothesized the highest N170 amplitude for heads, followed by bodies, and the smallest for background stimuli. Our results supported a saccade-onset alignment. Noise-level differences across stimulus categories prevented a direct test of the N170 effect; instead, a mass univariate analysis revealed differences between all three categories, partially supporting our hypothesis. Overall, these results underline the suitability of combining EEG with VR but highlight new methodological challenges of free-viewing studies.
Materials and Methods
Subjects
Overall, 61 subjects were invited to the lab for the experiment. For two subjects, the recording could not be started due to technical difficulties. Of the remaining 59 subjects, 26 were excluded: 5 quit due to motion sickness, 3 did not follow the task instructions and left the central square for >10% of the time, and 18 had to be excluded due to data issues that occurred during or after the recording, including 8 whose recordings showed unsynchronized drifts between or within multiple data streams. After applying this conservative approach to data inclusion to maintain high data quality, the final dataset included 33 subjects (14 males, 19 females, zero diverse; mean age, 22.63 ± 2.48 years). All subjects had normal or corrected-to-normal vision, did not report any neurological disorders, gave written informed consent before participating, and were rewarded with monetary compensation or participation hours. The ethics commission of the University of Osnabrück approved the study.
Experimental setup
A detailed description of the data and experimental design can be found in previous publications (Nolte et al., 2024, 2025). Below, we provide the essential aspects relevant to the current study. The experiment was developed using Unity3D (Unity Technologies, 2021) version 2019.4.21f1, employing the built-in Universal Render Pipeline/Unlit with one central light source. To maintain perceptual consistency, we minimized shaded areas. The virtual environment was displayed at a constant 90 Hz frame rate via the HTC Vive Pro Eye head-mounted display (HMD; 110° field of view, resolution 1,440 × 1,600 pixels per eye, refresh rate 90 Hz; HTC Corporation, 2018a). The advantage of the HTC Vive Pro Eye HMD is the integrated Tobii eye-tracker (0.5–1.1° accuracy, 110° field of view), allowing us to actively record the subject's eye movements. Eye-tracking was facilitated using the SRanipal SDK (v1.1.0.1; HTC Corporation, 2018b), and spatial tracking was provided by the HTC Vive Lighthouse 2.0 system (HTC Corporation, 2018c). Participants moved within the virtual city using HTC Vive controllers 2.0 (HTC Corporation, 2018d; sensory feedback disabled), with the direction of movement determined by the head orientation. The data were recorded on an Alienware Aurora Ryzen computer (Windows 10, 64-bit, build 19044, 6553 MB RAM; Nvidia RTX 3090 GPU, driver version 31.0.15.2698; AMD Ryzen 9 3900X 12-Core CPU). Simultaneously, EEG data using a 10/20 64-channel Ag/AgCl-electrode system with a Waveguard cap (ANT Neuro) and a Refa8 (TMSi) amplifier were recorded using the OpenVIBE acquisition server (v2.2.0; Renard et al., 2010) on a Dell Precision 5820 Tower (Windows 10, 64 bit, build 19044; Nvidia RTX 2080 Ti GPU, driver version 31.0.15.1694; Intel Xeon W-2133 CPU). The EEG data were collected at 1,024 Hz with an average reference and a ground electrode under the left collarbone. Impedances were kept below 10 kΩ. 
Synchronization between the EEG and VR systems was achieved using the LabStreamingLayer (LSL; Kothe, 2014). Throughout the experiment, participants were seated on a swivel chair to allow full 360° body rotation (Fig. 1A).
Experimental setup. A, Participants were seated on a swivel chair, wearing an EEG cap and VR glasses. The EEG equipment, specifically the amplifier, was stored on the back of the chair. B, The walkable area was confined to the beige floor and comprised the center of the VR scene. C, Different pedestrians were distributed throughout the city square.
The experimental procedure
The entire experiment lasted 2.5 h. At first arrival, participants filled out informed consent sheets and received instructions about the experiment. Following this, participants underwent a 1 min motion sickness test in the same virtual environment but in an unreachable part of the city. They were instructed to move toward a red sphere at the end of a street. Only participants who reported no discomfort or motion sickness after this test proceeded to the main experiment. Following this initial test, the EEG system was set up (see section “EEG preprocessing” for details), which took most of the time. Finally, the main experiment began with the eye-tracker's calibration and subsequent five-point validation.
The main experimental phase lasted ∼40 min. Participants had 30 min to explore the central city square (Fig. 1B, beige tiles) ad libitum under the instruction to behave naturally, as if waiting for a friend. Every 5 min, the exploration was paused for eye-tracker validation and recalibration and for the participant to take a break if needed. After each pause, participants were returned to their previous location in the city.
The virtual environment
The virtual environment was modeled to resemble a city center, populated with various background objects (e.g., buildings, foliage; Fig. 1B) and 140 pedestrians (Fig. 1C). Pedestrians were sourced from the Adobe Mixamo collection (Mixamo, 2008), displaying varied activity and animation levels ranging from stationary and static to actively moving throughout the city. The pedestrians were designed to represent typical behaviors such as shopping, meeting friends, or relaxing on benches. They did not react to the participants other than avoiding movement collisions. The actively moving pedestrians walked along predefined paths. Each object in the virtual environment had a collider attached to it, an invisible box or sphere marking the object's outline, with pedestrians having separate colliders for their heads and the rest of their bodies. This allowed us to separately investigate the neural responses toward the heads and bodies of the virtual avatars. Participant movements within the virtual environment were programmed to mimic real-life displacements, controlled by the participants’ head orientation and matched in speed to the moving pedestrians. The dimensions of the virtual city matched the real world, with one Unity unit corresponding to 1 m, allowing us to indicate distances in meters.
Using gaze events to determine EEG trial onsets
We recorded EEG and eye-tracking simultaneously to use the timing of gaze events (fixations or saccades) as trial markers. Due to the absence of external stimulus onsets (or comparable events), we consider the data recorded during a fixation (or saccade) and its immediate temporal context as a “trial.” This allowed us, for example, to investigate fixation ERPs (Dimigen, 2020; Gert et al., 2022). To this end, accurate detection of event onsets (fixations and saccades) in the eye-tracking data was essential. Therefore, we employed a velocity-based eye-tracking algorithm for free-viewing and free-exploration in a virtual environment, which corrects for translational movement information superimposed on the eye movement data (Nolte et al., 2024; based on Voloh et al., 2020; Dar et al., 2021; Keshava et al., 2023). Applying this algorithm allowed us to differentiate between gazes (eye-stabilization movements, from now on simply referred to as fixations) and saccades. In detail, the continuous eye-tracking data were segmented into smaller intervals (Dar et al., 2021), and a data-driven threshold was calculated for each of these intervals (Voloh et al., 2020; Keshava et al., 2023). Consecutive samples exceeding this threshold were classified as a saccade, and samples below the threshold were classified as fixations. This process resulted in a sequential identification of saccades and fixations throughout the entire recording. These events could then be used as trial onsets for the EEG analysis. Specifically, we compared ERPs aligned to fixation and to saccade onset, where the latter used the saccade onset preceding each fixation as the trial onset (Amme et al., 2024). This approach allowed us to compare identical trials, differing by a time shift: time point zero in saccade-onset trials occurred several milliseconds before the corresponding time point of fixation-onset trials.
Consequently, if events, such as small saccades and their matching subsequent fixation onsets, were not detected, they would be present in and affect both types of ERPs similarly. The trials for both fixation- and saccade-onset ERPs were split into three distinct stimulus categories: heads (fixations on the heads of pedestrians), bodies (fixations on the rest of their bodies), and background stimuli (everything that was not a pedestrian), allowing us to investigate the presence of an N170 effect in a free-viewing experiment conducted in VR. For saccade-onset ERPs, we used the stimulus category of the fixation directly succeeding the saccade. A sequence of a participant's walking path and a few selected fixations can be seen in Figure 2A.
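The segment-wise, data-driven thresholding described above can be sketched in a few lines. This is a minimal illustration in Python (the published pipeline used MATLAB), assuming a precomputed angular-velocity trace and a simple median + MAD threshold; the actual algorithm (Nolte et al., 2024) additionally corrects for translational movement:

```python
import numpy as np

def classify_gaze(velocity, n_segments=4, k=3.0):
    """Label each gaze sample as saccade (True) or fixation (False)
    using a per-segment, data-driven velocity threshold."""
    labels = np.zeros(len(velocity), dtype=bool)
    for seg in np.array_split(np.arange(len(velocity)), n_segments):
        v = velocity[seg]
        # data-driven threshold: median + k * median absolute deviation
        mad = np.median(np.abs(v - np.median(v)))
        labels[seg] = v > np.median(v) + k * mad
    return labels

def event_onsets(labels):
    """Return sample indices where saccades and fixations begin."""
    change = np.flatnonzero(np.diff(labels.astype(int)) != 0) + 1
    sacc_on = [i for i in change if labels[i]]
    fix_on = [i for i in change if not labels[i]]
    return sacc_on, fix_on
```

Here, `n_segments` and `k` are illustrative parameters, not values from the study; the saccade onsets returned by `event_onsets` are what would serve as EEG trial markers.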
Distribution of gaze events. A, An example of different fixations on different objects is plotted on top of the corresponding image of the city center. As a note, the data were slightly adjusted for visualization purposes only. The black line corresponds to the participant's movement path, and the arrows correspond to a few selected fixations during this duration. A fixation on the face of a pedestrian is highlighted in red. The blue arrows correspond to fixations directed at different background objects. B–D, The distribution of background (B), body (C), and head (D) fixations over time, displayed as cumulative distribution functions. Each line corresponds to one participant.
Temporal alignment of EEG and eye-tracking data
Aligning the EEG and eye-tracking data was generally successful; however, visual inspection indicated a small constant linear drift between the overall EEG and eye-tracking (Unity) timelines. To correct this drift, we calculated the difference between the first EEG and eye-tracking timestamps and between the last ones, computed the deviation between these differences, and applied it linearly to the eye-tracking timeline (Nolte et al., 2024). Twenty-one subjects displayed a more substantial drift, requiring us to apply the start–end deviation up to four times, adjusting the timeline by one (11 ms) or two (22 ms) samples, according to the 90 Hz sampling rate, over the course of a 30 min experimental session. Notably, this drift correction was identical for fixation and saccade onsets. The final dataset only included subjects for which we were confident in aligning the two data streams (see above, Subjects).
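For intuition, the start–end drift correction amounts to a linear rescaling of the eye-tracking timeline so that its first and last timestamps match the EEG clock. A small sketch with hypothetical variable names (the actual correction follows Nolte et al., 2024):

```python
import numpy as np

def correct_linear_drift(et_times, eeg_start, eeg_end):
    """Linearly rescale eye-tracking timestamps so that the start-end
    interval matches the EEG clock (start/end anchor fit)."""
    et_times = np.asarray(et_times, float)
    et_start, et_end = et_times[0], et_times[-1]
    # ratio of the two clocks' elapsed time over the whole recording
    scale = (eeg_end - eeg_start) / (et_end - et_start)
    return eeg_start + (et_times - et_start) * scale
```

A constant drift of, say, 0.5 s over a 100 s recording is thus spread proportionally across every intermediate timestamp rather than applied as a single offset.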
EEG preprocessing
Preprocessing was performed in MATLAB (R2024a) using the EEGLab software (Delorme and Makeig, 2004; version 2020.0). EEG data were first loaded into MATLAB, channels were renamed according to the 10-5 BESA standard system, and empty channels were removed. We then imported a separate trigger file containing all relevant fixation- or saccade-onset events derived from our eye-tracking data (see above, Using gaze events to determine EEG trial onsets, for a detailed explanation). We applied a low-pass filter at 128 Hz and a high-pass filter at 0.5 Hz (pop_eegfiltnew, using a Hamming window; Widmann et al., 2015). Following the recommendation of Klug and Kloosterman (2022), we downsampled the EEG data from 1,024 to 500 Hz to apply a line noise filter from the “zapline plus” plugin (Klug and Kloosterman, 2022, based on de Cheveigné, 2020). We conducted this procedure to automatically remove spectral peaks around 50 Hz and, separately, around 90 Hz. Then, ensuring the data were referenced to the average reference, we applied automated cleaning of noisy channels and data segments using the “clean_rawdata” plugin (Kothe et al., 2019). Our data included active movement and thus contained more noise than expected in a classic stationary laboratory setup. Therefore, we chose to apply a conservative burst criterion of 20, referring to the standard deviation cutoff for the removal of bursts via artifact subspace reconstruction. Removed noisy segments were saved to be used by the unfold toolbox (Ehinger and Dimigen, 2019; for a detailed description, see below, EEG analysis). After channel removal, the clean dataset was rereferenced to the average reference once more. Using the AMICA plugin (version 15; Palmer et al., 2012), we performed an independent component analysis (ICA) on the cleaned data to identify and remove muscle, eye, heart, or remaining line or channel noise. For this step only, we high-pass filtered the data at 2 Hz (Dimigen, 2020).
Components labeled with 80% muscle activity or above (mean, 16 components; SD, 7.407) or >90% of other noise (ocular movement, mean, 2.121; SD, 0.331; channel noise, mean, 0.303; SD, 0.529; cardiac artifact, mean, 0.060; SD, 0.242; line noise, mean, 0.030; SD, 0.174), as identified by ICLabel (Pion-Tonachini et al., 2019), were removed automatically. ICA weights were then transferred to the dataset filtered at 0.5 Hz. Finally, we interpolated the missing channels (spherical interpolation). The described procedure was repeated for all subjects before we applied further statistical analysis.
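The automated component-rejection rule can be sketched as simple thresholding of ICLabel-style class probabilities. The column layout and exact threshold handling below are assumptions for illustration; the actual selection was performed with ICLabel inside EEGLab:

```python
import numpy as np

# Assumed class order for this sketch (one probability column per class)
LABELS = ["brain", "muscle", "eye", "heart", "line", "channel", "other"]

def components_to_reject(probs, muscle_thresh=0.80, other_thresh=0.90):
    """Return indices of ICA components to remove: >=80% muscle
    probability, or >=90% for eye/heart/line/channel noise."""
    probs = np.asarray(probs, float)
    reject = probs[:, LABELS.index("muscle")] >= muscle_thresh
    for cls in ("eye", "heart", "line", "channel"):
        reject |= probs[:, LABELS.index(cls)] >= other_thresh
    return np.flatnonzero(reject)
```

The rejected components' weights would then be zeroed out before back-projecting the ICA solution onto the 0.5 Hz high-pass-filtered data.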
EEG analysis
First, we analyzed and compared ERPs aligned to fixation and saccade onsets, investigating the ERP waveforms for −300 to 500 ms surrounding each event for individual subjects and the averaged ERPs across subjects.
Next, to account for and correct the effect of overlapping events due to our free-viewing paradigm, we used a linear model implemented by the unfold toolbox (Ehinger and Dimigen, 2019) with the current event factor and the levels of background, body, and head. This overlap correction was applied for −500 up to 1,000 ms surrounding saccade onsets (Gert et al., 2022). As we investigate differences in saccade-onset ERPs, we did not model saccade amplitudes due to their high correlation with saccade durations (Harris and Wolpert, 2006; Guadron et al., 2022).
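Conceptually, this overlap correction time-expands each event type into its own set of lagged indicator regressors and estimates all responses in one joint least-squares fit. A toy numpy sketch (not the unfold toolbox's implementation) that recovers overlapping response kernels:

```python
import numpy as np

def deconvolve_erps(eeg, onsets_by_cond, win):
    """Overlap-corrected ERPs via time-expansion regression:
    one indicator column per (condition, lag), solved jointly."""
    n = len(eeg)
    lags = list(range(win[0], win[1]))
    cols = []
    for onsets in onsets_by_cond:
        for lag in lags:
            col = np.zeros(n)
            idx = np.asarray(onsets) + lag
            idx = idx[(idx >= 0) & (idx < n)]
            col[idx] = 1.0  # this event contributes at this lag
            cols.append(col)
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, np.asarray(eeg, float), rcond=None)
    return beta.reshape(len(onsets_by_cond), len(lags))
```

With non-overlapping events this reduces to ordinary trial averaging; with overlap, the joint fit disentangles the contributions of temporally adjacent events.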
To investigate differences across categories at all electrodes and time points (−500 to 1,000 ms surrounding saccade onset), we conducted a one-factor repeated-measures ANOVA (1 × 3: head, body, and background), with the alpha level set at 0.05. To account for the multiple-comparison problem, we applied a cluster-based permutation test incorporating threshold-free cluster enhancement (TFCE), as implemented in the ept_TFCE MATLAB toolbox (Mensen and Khatami, 2013). We performed 10,000 permutations, randomizing data across the three factor levels for each permutation, followed by a one-factor repeated-measures ANOVA. The resulting F values were enhanced using TFCE (parameters E = 0.666; H = 1), based on recommendations for F statistics (Mensen and Khatami, 2013). This process generated an empirical null distribution (H0) of TFCE-enhanced F values, with the maximum F value across channels and time points recorded for each permutation. The observed TFCE-enhanced F values were then compared with this empirical distribution, with statistical significance determined as values exceeding the 95th percentile of the null distribution.
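The TFCE enhancement itself integrates cluster extent and height over all threshold levels. A 1D sketch with an assumed integration step (the actual analysis used the ept_TFCE toolbox on full channel × time data):

```python
import numpy as np

def tfce_1d(stat, e=0.666, h=1.0, dh=0.1):
    """TFCE for a 1D statistic map: at each height step, every
    supra-threshold sample gains extent**e * height**h * dh."""
    stat = np.asarray(stat, float)
    out = np.zeros_like(stat)
    for height in np.arange(dh, stat.max() + dh, dh):
        # pad with zeros so run starts/ends are detected at the borders
        above = np.concatenate(([0], (stat >= height).astype(int), [0]))
        edges = np.diff(above)
        starts = np.flatnonzero(edges == 1)
        ends = np.flatnonzero(edges == -1)
        for s, t in zip(starts, ends):
            out[s:t] += (t - s) ** e * height ** h * dh
    return out
```

A broad, tall cluster thereby accumulates far more enhancement than an isolated spike of moderate height, which is exactly what makes TFCE sensitive to extended effects without a fixed cluster-forming threshold.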
Assessment of face stimuli characteristics
To validate our stimuli, we conducted an online survey with a separate group of 12 participants (eight females, zero diverse; mean age, 31.62 ± 13.30 years). A total of 40 randomly selected facial images were shown: 10 of our avatars and 30 images selected from Gorlini et al. (2023), consisting of 10 each from three categories: unrealistic, semirealistic, and realistic faces. Participants rated each image on three indices using a validated questionnaire developed by Ho and MacDorman (2010): humanness (six items), eeriness (eight items), and attractiveness (four items), with semantic differential items assessed using a five-point Likert scale. For all stimulus categories, we calculated average scores for each participant for each index. Statistical differences were evaluated using a separate Friedman test per index.
Code accessibility
The code described in the paper is freely available online at https://github.com/debnolte/saccade-onset_ERPs_of-face_perception_free-viewing_VR. The code is available as Extended Data.
Data 1
The code used for the creation of this manuscript. Download Data 1, ZIP file.
Results
Gaze events
Before analyzing ERPs, we first compared the different gaze events by examining the median and median absolute deviation (MAD) across various aspects of each category. One clear difference between the three categories—background stimuli, bodies, and heads—was the number of trials. Background stimuli had the most trials, with a median of 3,968 ± 492. Notably, the body and head categories had ∼10 times fewer trials than the background category, with medians of 741 ± 221 trials for bodies and 151 ± 133 trials for heads. Although there was considerable between-subject variation in the number of trials for both the body (min, 290; max, 1,296) and head (min, 17; max, 913) categories, the number of fixations directed at bodies and heads was not significantly correlated across participants (r = 0.184; p = 0.305). This indicated that it was not simply that some subjects gazed at pedestrians more or less overall; instead, some subjects focused more on heads, while others directed more fixations toward bodies. Despite the differences in trial counts, fixations in all three categories were equally distributed across the entire experimental duration (see cumulative distribution functions, Fig. 2B–D). This balanced distribution was essential, enabling comparison across the three categories without adjusting for differences in experimental duration or participant fatigue. When examining the median and MAD of event durations, fixation durations were similar across all three categories: background stimuli (0.186 s ± 0.013), bodies (0.200 s ± 0.016), and heads (0.178 s ± 0.023). In contrast, saccade durations differed, with background stimuli having the longest saccade durations (0.076 s ± 0.003), followed by bodies (0.066 s ± 0.001) and heads with the shortest saccade durations (0.056 s ± 0.015).
A similar pattern emerged for saccade amplitudes, where saccades toward background stimuli had the highest amplitudes (12.119° ± 2.104), followed by bodies (7.212° ± 1.618) and heads (5.35° ± 1.491). These findings highlighted that, despite similarities in fixation durations and their temporal distribution, the different stimulus categories were associated with different saccade patterns and varying numbers of events.
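The robust summary statistic used throughout this section, the median with its median absolute deviation, is straightforward to compute; a minimal sketch:

```python
import numpy as np

def median_mad(x):
    """Median and median absolute deviation (MAD), the robust
    summary statistics used to describe each gaze-event category."""
    x = np.asarray(x, float)
    med = np.median(x)
    return med, np.median(np.abs(x - med))
```

Unlike the mean and standard deviation, this pair is insensitive to the occasional extreme trial count or saccade amplitude, which matters given the skewed distributions typical of free-viewing data.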
Next, to investigate whether fixation and saccade onsets are subject to a bias inherent in the eye movement classification algorithm, we analyzed the distribution of eye movement velocities at event onsets (Fig. 3). Specifically, we examined the variability across trials. If event onsets are well defined, we would expect low variability of velocities around these onsets; in contrast, if they are not clearly defined, we would anticipate higher variability. By comparing the variability of velocities at fixation and saccade onsets, we aimed to determine whether our classification algorithm was more precise in defining one type of event over the other. As shown in Figure 3, the velocities at fixation onsets exhibit relatively low variability within and across subjects. The velocities at saccade onsets display a similar but time-shifted distribution, with low variability within and across subjects one sample before the saccade onset, followed by high variability at the saccade onset. Furthermore, the distributions for the two event types do not overlap. These observations indicate that the classification of fixations and saccades is not influenced by an obvious bias.
Variation of velocities around event onsets. The deviation of angular velocities surrounding event onsets is displayed for (A) background, (B) body, and (C) head trials. The x-axis displays samples around event onsets, with the fixation and saccade onsets aligned and marked by a black line. Velocities aligned to fixation onsets are shown in red, while those aligned to saccade onsets are displayed in blue. Each line corresponds to one participant, displaying the standard deviation across all trials.
Comparing fixation- and saccade-onset ERPs
In investigating ERPs, we first examined single subjects to compare the difference between fixation- and saccade-onset ERPs, following the approach of Amme et al. (2024). For visualization purposes, we selected three representative subjects (Fig. 4). The first subject (Fig. 4A,B) had 4,620 trials, split into 3,779 background, 747 body, and 194 head trials. The second subject (Fig. 4C,D) had 3,802 background, 609 body, and 151 head trials, while the third subject (Fig. 4E,F) had 3,797 background, 792 body, and 361 head trials. To compare the difference between fixation- and saccade-onset ERPs, we sorted each subject's fixation-onset trials by the duration of the preceding saccade, in line with Amme et al. (2024). The EEG data were aligned and epoched using fixation onsets and then ordered based on saccade durations. Figure 4, A,C, and E, shows the results: fixation onsets are marked by the straight black lines at zero, while saccade onsets are indicated by the preceding curved black lines. If fixation onsets were the optimal alignment points, we would expect the P100 amplitude peaks to form a straight line 100 ms after the fixation onset. However, across all three subjects, the P100 amplitude peaks followed the curved saccade-onset trajectory, suggesting that saccade onsets provide more suitable time points for aligning individual trials in our free-viewing experiment. Interestingly, trials with very short saccades visually differ from longer saccades, potentially due to smaller changes of the visual input, smoothing, or an overlap of saccadic and fixation activity. The notion that saccade-onset ERPs were more temporally precise was further supported by time-shifted but higher P100 amplitudes compared with the smaller, more smeared-out fixation-onset ERPs (Fig. 4B,D,F). Notably, the saccade-onset ERP waveform of the head category had higher noise levels than those of the background and body categories.
Overall, the single-subject results supported the idea that saccade-onset ERPs might be a better-suited analysis method than fixation-onset ERPs for this type of experiment.
Fixation- and saccade-onset ERPs for a single subject. A, All trials of one subject at electrode PO7, aligned to fixation onset, are sorted according to saccade duration, with the first trials (top of the y-axis) having the longest saccade durations. Red indicates positive, and blue indicates negative amplitudes. The trials are plotted over time. The black vertical line corresponds to the fixation and, therefore, trial onset. The trials are smoothed for visualization with a Gaussian filter of 2 in the y direction and a Gaussian filter of 5 in the x direction. The black-dotted line represents the saccade onset in each trial. B, Fixation- (dotted lines) and saccade-onset (solid lines) ERPs of the same subject for electrode PO7. The different categories, background, body, and head, are indicated by the different colors. C, D, The same plots for a second and (E, F) for a third subject.
Next, we investigated the difference between fixation- and saccade-onset ERPs across subjects. For this, we first averaged within subjects to account for the high variability of head trials and then averaged across subjects. Fixation-onset ERPs (Fig. 5A, dotted lines) show a broad P100 component across all three stimulus categories, with background stimuli evoking the highest P100 peak and head stimuli the lowest. In comparison, saccade-onset ERPs (Fig. 5A, solid lines) are shifted in time but exhibit higher amplitudes across all stimulus categories and P100 peaks that are more temporally focused. Interestingly, the differences in P100 peaks between the stimulus categories seen in fixation-onset ERPs disappear with saccade-onset alignment. As in the single-subject results, the head category appears to be the noisiest, regardless of the alignment. The topographical analysis across all channels (Fig. 5B,C) supported this observation, highlighting that the preference for saccade-onset ERPs is not restricted to a single, selected electrode. ERPs aligned to the saccade onset elicit higher amplitudes across occipital electrodes than those aligned to the fixation onset. These findings support the use of saccade-onset over fixation-onset alignment in EEG analyses due to its impact on the overall ERP curve. Saccade onsets lead to better-defined ERPs, especially concerning the P100 component.
Fixation- versus saccade-onset ERPs. A, Across-subject ERPs at channel PO7 for all three categories for the two different onsets: fixation-onset ERPs shown with dotted lines, saccade-onset ERPs with solid ones. B, C, Topoplots across all trials irrespective of the stimulus category (the average of all background, body, and head trials) of (B) fixation-onset and (C) saccade-onset ERPs, shown for three distinct time points: before the P100, around the P100, and at the N170. For visualization purposes, we selected time intervals based on the visual inspection of fixation- and saccade-onset ERPs, maintaining a constant difference between fixation and saccade onsets.
To statistically compare fixation- and saccade-onset ERPs, we followed the approach of Amme et al. (2024), grouping the data into 10 equally sized bins based on saccade duration. Within each bin, ERPs were averaged first within participants and then across participants. We identified the half-maximum point of the P100 slope for each binned ERP waveform. We then calculated the standard deviation of the half-maximum time points across bins, separately for fixation- and saccade-aligned conditions. The standard deviation was 19.82 ms for fixation-onset ERPs and 8.32 ms for saccade-onset ERPs. A paired-sample t test confirmed that this difference was statistically significant (t(9) = 7.828; p < 0.0001), indicating that saccade-onset alignment yields a more temporally stable estimate of the P100 component.
Comparing head, body, and background trials of saccade-onset ERPs
Before investigating the presence of an N170 effect, typically associated with the perception of faces (Rossion and Jacques, 2008; Eimer, 2011), we verified that aggregating all fixations on heads, irrespective of viewing angle, does not obscure any potential differences. To this end, inspecting the viewing angle distribution across participants (Fig. 6A) revealed that most fixations are directed toward pedestrians’ faces. Specifically, computing the circular mean within participants, followed by the circular mean and the circular standard deviation across participants, resulted in average viewing angles of 13.210 ± 30.868°. Additionally, to exclude viewing distance as a potential influence, we computed the median and MAD across participants (5.209 ± 2.591 m; Fig. 6B), indicating that most heads are viewed at close range. When we compared the saccade-onset ERPs of fixations on heads (151 ± 133 trials) with those on faces only (72 ± 64 trials), no visible differences other than a slight increase in noise levels emerged, suggesting that aggregating all head fixations is valid. Inspecting only frontal faces viewed at a close distance (<5 m) resulted in fewer trials (39 ± 38 trials) and an ERP with high noise levels, making the ERP curve challenging to interpret. Overall, our results indicate no discernible differences elicited by viewing angles or distances, confirming that aggregating head fixations is appropriate.
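The circular statistics used here (circular mean and circular standard deviation of viewing angles) can be computed as follows; this is an illustrative sketch (SciPy's `circmean` and `circstd` offer equivalent functionality), not the study's code:

```python
import numpy as np

def circ_mean_deg(angles_deg):
    """Circular mean of angles given in degrees."""
    a = np.deg2rad(angles_deg)
    return np.rad2deg(np.arctan2(np.sin(a).mean(), np.cos(a).mean()))

def circ_std_deg(angles_deg):
    """Circular standard deviation (in degrees), derived from the
    mean resultant length R of the unit vectors."""
    a = np.deg2rad(angles_deg)
    R = np.hypot(np.sin(a).mean(), np.cos(a).mean())
    R = min(R, 1.0)  # guard against rounding R slightly above 1
    return np.rad2deg(np.sqrt(-2.0 * np.log(R)))
```

Averaging angles this way avoids the wrap-around artifact of the arithmetic mean: 350° and 10° average to 0°, not 180°. In the analysis above, these functions would be applied first within and then across participants.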
Rotation and distance effects on head saccade-onset ERPs. A, The distribution of viewed orientations of pedestrians’ heads is displayed as percentages; the top of the plot corresponds to face fixations, while the bottom refers to fixations on the back of the head. The distribution of the individual participants is displayed by blue bars, and the black line indicates the median distribution across participants. B, The distribution of distances is shown for each participant in blue bars, and the black line represents the median distance across participants. C, The ERP for all head fixations is displayed in blue. Frontal head fixations, corresponding to those within 90° of the front of the head, are shown in green. Frontal fixations on pedestrians at a distance of 5 m or less are depicted in red. Each line and the corresponding confidence intervals represent the average across participants.
To examine category-specific differences of saccade-onset ERPs, we first focused on channels discussed in previous literature (Gert et al., 2022), particularly PO7 (Fig. 7A) and PO8 (Fig. 7B). Comparing the deconvoluted potentials across categories at these channels revealed no visible discrepancies, with only a minimal, if any, negative deflection following the initial positive peak (a time-shifted P100). In contrast, other electrodes displayed notable differences between categories. For instance, at electrode P2 (Fig. 7C), the head trace diverged from the body and background traces after the saccade onset until reaching peak amplitudes ∼150 ms. Similarly, at frontal sites such as F7 (Fig. 7D), distinctions between the head and both the background and body categories were visible around the fixation onset, ∼50–80 ms after the saccade onset. Notably, across all four electrodes, the head category exhibited a higher noise level and greater between-subject variability than the other two categories, most likely caused by the lower number of trials. This variability of the head category and the observed topographical distinctions required a statistical approach beyond traditional measures such as peak-to-peak comparisons.
Saccade-onset ERPs for all three categories. A–D, The across-subject average saccade-onset and deconvoluted ERPs were separated for background, body, and head trials. The average ERPs and corresponding confidence intervals indicating the average across participants are shown for four different electrodes: PO7 (A), PO8 (B), P2 (C), and F7 (D). The stimulus categories are indicated by different colors. E–G, Difference plots of two categories at distinct time points after saccade onset: 20, 106, 150, 200, and 240 ms. The topographic plots are shown for the average difference between (E) background minus body, (F) background minus head, and (G) body minus head trials.
Accordingly, we employed a mass univariate analysis to examine category-specific differences across electrodes and time. Specifically, testing for significant differences using TFCE (correction for multiple comparisons with α < 0.05) revealed a significant cluster most compatible with an effect spanning from 18 to 246 ms and encompassing all electrodes. The cluster started at channel F7 at 106 ms (see Fig. 7D and the second difference plot in Fig. 7E–G) shortly after fixation onsets, with a median difference between background and body of −0.174 μV, a median difference between background and head of 0.505 μV, and between body and head of 0.658 μV at this time point and electrode. Of note, while this cluster effect starts quite early, it does not reach negative times. Inspecting temporal differences between categories further highlighted this effect (Fig. 7E–G), revealing a gradual shift in differences between categories, primarily driven by the head category (Fig. 8A). Notably, while the maximum contribution is primarily due to differences between the head and either of the other two categories (Fig. 8A,B), the timing of each maximum contribution differs little for all three contrasts. Specifically, it can be observed that frontal electrodes have their highest contribution on average around the saccade onset, while occipital electrodes have their highest contribution slightly later (Fig. 8C,D). Additional smaller clusters were found. As we wish to avoid a multiple-comparison problem, we do not report this information here but include it as Extended Data Figure 8-1. Overall, these findings support the presence of category-based differences throughout the entire temporal window and all electrode sites, with the highest differences visible around the saccade onset.
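TFCE avoids choosing an arbitrary cluster-forming threshold by integrating cluster extent and height over all thresholds. In practice, such analyses are run with an EEG toolbox (e.g., MNE-Python's cluster permutation functions support TFCE thresholds), but the enhancement step itself can be illustrated for a single channel's time course, using the standard parameters E = 0.5 and H = 2 from Smith and Nichols (2009). This sketch is for illustration only, not the analysis code used here:

```python
import numpy as np

def tfce_1d(stat, dh=0.1, E=0.5, H=2.0):
    """Threshold-free cluster enhancement of a nonnegative 1-D statistic map.

    Each sample's score integrates cluster extent**E * height**H over all
    thresholds h below that sample's value (Smith and Nichols, 2009).
    """
    out = np.zeros_like(stat, dtype=float)
    for h in np.arange(dh, stat.max() + dh / 2, dh):
        padded = np.concatenate(([0], (stat >= h).astype(int), [0]))
        d = np.diff(padded)
        starts = np.flatnonzero(d == 1)   # supra-threshold run begins here
        stops = np.flatnonzero(d == -1)   # run ends before this sample
        for s, e in zip(starts, stops):
            out[s:e] += ((e - s) ** E) * (h ** H) * dh
    return out
```

A broad, sustained deflection thereby accumulates a higher score than an isolated spike of the same height, and significance is then assessed by permutation on the enhanced maps.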
Measures of the effect size of saccade-onset ERPs. For each of the three differences (background–body, background–head, body–head), the results of the TFCE are displayed. A, The maximum contribution averaged over participants is displayed for each electrode. The results are given in microvolts. B, The corresponding standard errors of the mean for each maximum contribution are displayed. C, The time points for the maximum contributions of each electrode are shown. The color bar corresponds to the time interval of the significant cluster, lasting from 18 to 246 ms. D, The corresponding standard error of the mean for the time of each maximum contribution is displayed. In addition, smaller clusters can be found in the Extended Data Figure 8-1.
Figure 8-1
Description of additional clusters found using TFCE. Download Figure 8-1, DOCX file.
Assessment of face stimuli characteristics
To assess whether the visual appearance of our virtual avatars may have contributed to reduced face fixations, we conducted a supplementary face rating study. A Friedman test confirmed significant differences between stimulus categories for humanness (χ2(3) = 24.3; p < 0.001), eeriness (χ2(3) = 15.4; p = 0.001), and attractiveness (χ2(3) = 27.5; p < 0.001; see Fig. 9 and Extended Data Figs. 9-1, 9-2 for details). Post hoc tests revealed differences between our avatars and the unrealistic category in terms of attractiveness; they aligned most closely with the unrealistic category in terms of humanness and eeriness. These findings suggest that our avatars do not strongly induce an uncanny valley effect.
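For reference, the Friedman statistic (χ² with k − 1 degrees of freedom for k conditions) compares within-subject rank sums; `scipy.stats.friedmanchisquare` implements it, and a minimal NumPy version (ignoring ties, for illustration only) looks as follows:

```python
import numpy as np

def friedman_chi2(ratings):
    """Friedman test statistic for an (n_subjects, k_conditions) matrix.

    Ranks conditions within each subject, then compares rank sums.
    Assumes no within-row ties (true for continuous rating data).
    """
    n, k = ratings.shape
    # double argsort yields ranks 1..k within each row (no ties)
    ranks = ratings.argsort(axis=1).argsort(axis=1) + 1
    R = ranks.sum(axis=0)  # rank sum per condition
    return 12.0 / (n * k * (k + 1)) * np.sum(R ** 2) - 3.0 * n * (k + 1)
```

With the four rating categories used here, the statistic would be compared against a χ² distribution with 3 degrees of freedom, matching the χ²(3) values reported above.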
Assessing the uncanny valley effect. For each image (10 per category—40 in total), we computed the average score of eeriness and humanness across participants. The four categories (VR avatars, unrealistic, semirealistic, and realistic) are displayed in different colors. The mean of each category is indicated by a star. The statistical results can be found in Extended Data Figures 9-1 and 9-2.
Figure 9-1
Average ratings of the different picture categories on three indices. Download Figure 9-1, DOCX file.
Figure 9-2
Results of the Friedman tests and post hoc analysis for the three indices. Download Figure 9-2, DOCX file.
Discussion
This study explored methodological considerations for using VR and EEG to investigate ERPs in a free-viewing task. We designed a three-dimensional virtual city populated with virtual pedestrians, allowing participants to navigate and visually explore as they would in a real-world setting. Our findings align with a recent MEG study (Amme et al., 2024) showing that saccade-onset–locked ERPs offered more consistent and interpretable results than traditional fixation-onset ERPs, particularly during time windows associated with P100 responses. This finding suggests that saccade onsets serve as more physiologically meaningful triggers for understanding visual processing, marking the initiation of critical neural mechanisms that shape ERP responses (Katz et al., 2020; Amme et al., 2024; Gordon et al., 2024). The underlying processes may be influenced by preparatory activity or predictive planning (Crapse and Sommer, 2008; Wurtz et al., 2011; Katz et al., 2020) or may be guided by attentional shifts (Hoffman and Subramaniam, 1995; Kowler et al., 1995; Deubel and Schneider, 1996; Rolfs et al., 2011), which are linked to enhanced visual discrimination (Deubel and Schneider, 1996). These mechanisms can be associated with the brain preparing for anticipated visual input before initiating the movement (Rao and Ballard, 1999; Friston, 2010), suggesting that early neural activity related to visual selection and saccade preparation might occur before the fixation onset (Henderson, 2017). In contrast, fixation-onset triggers might capture neural activity after the initial processing of visual input. As such, utilizing fixation onsets for self-initiated eye movements could lead to misinterpretations of visual processing timing and dynamics.
These results support the critical role of self-initiated eye movements in priming the brain for incoming visual information (Amme et al., 2024; Gordon et al., 2024) and advocate for a methodological shift in free-viewing paradigms toward saccade-onset ERPs to improve our understanding of visual processing.
The selection of saccade onsets versus fixation onsets as triggers for ERPs carries significant methodological implications, particularly concerning the timing and interpretation of early components like the P100. Aligning ERPs to the saccade onset may result in a delayed P100 response compared with classical stimulus-locked ERPs, with the exact timing of this component in relation to saccade events requiring further investigation. Furthermore, the inherent variability in saccade characteristics, such as increased saccade amplitudes over time (Pannasch et al., 2008) or different saccade durations for different stimulus categories, could influence the analysis and interpretation of results. In contrast, later ERP components, dominated by lower-frequency dynamics, may be less affected by alignment but pose their own challenges for isolating saccade- versus fixation-related activity. Recent findings (Amme et al., 2024) show that saccade-onset ERPs contain residual fixation-locked activity, underscoring this complexity. We therefore do not argue against fixation-locked analyses in free-viewing designs but suggest that saccade-locked approaches provide greater temporal precision for early components and should be used alongside, rather than instead of, fixation-based alignment. Investigating whether the methodological shift will allow the comparison of neural responses to target categories associated with varying saccade durations is therefore essential.
To evaluate whether saccade-onset ERPs derived from VR eye movements would allow us to discern differences between various stimulus categories, we aimed to replicate the well-established N170 face effect (Rossion and Jacques, 2008; Eimer, 2011). Our results partially supported our hypothesis, showing a significant cluster of differences among the three experimental categories: heads, bodies, and background stimuli. These differences spanned much of the trial duration, including the P100 and N170 time windows across all electrodes, consistent with previous research (Gert et al., 2022), implying that neural processes associated with face processing (Rossion and Jacques, 2008; Freiwald et al., 2016; Gao et al., 2019) likely contribute to these differences. These results demonstrate the feasibility of using VR for examining category-based differences.
While our results indicate that combining EEG and eye-tracking in VR allows for studying differences between stimulus categories, several aspects must be examined to interpret our results. First, the observed face effect may be influenced by residual noise, particularly given that the cluster peak is located near the cluster's margin. While we aimed to minimize these factors through preprocessing, individual participant or trial variability may still contribute to the effect. Relatedly, we note that our decision not to model saccade amplitudes despite their known influence on the P100 effect may also be relevant in this context (Henderson et al., 2013; Ehinger and Dimigen, 2019; Guadron et al., 2022). Instead, we investigated differences in saccade-onset ERPs, aligning neural responses across categories while accounting for saccade durations, which strongly correlate with saccade amplitudes (Harris and Wolpert, 2006; Guadron et al., 2022). Given this correlation, we believe our approach sufficiently controlled for saccade effects. However, further testing should rule out a contribution of saccade amplitudes to the observed face effects. Here, a study contrasting categories with comparable numbers of trials would be essential.
In the current study, despite expecting interindividual differences in gaze behavior (Guy et al., 2019; Guy and Pertzov, 2023; Rubo et al., 2023), the high between-subject variability and the low number of head fixations for some subjects were surprising, given the movement changes of the pedestrian stimuli, a feature known to capture attention (Abrams and Christ, 2003). The differences in trial counts and the resulting varying noise levels made it impossible to test our hypothesis regarding the N170 effect directly. However, increasing the trial count might not have led to conclusive results, as visual inspection of our ERP traces revealed no observable negative peaks in the occipital–temporal electrodes typically implicated in N170 research (Rossion and Jacques, 2008; Eimer, 2011).
Several factors may explain the absence of a visible face-selective component in this study. First, many free-viewing studies show smaller or absent ERP components compared with stimulus-locked studies (Auerbach-Asch et al., 2020; Gert et al., 2022; Ladouce et al., 2022; Spiering and Dimigen, 2025), a difference that may be even more pronounced in VR-based research. As we did not account for previous fixation types, potential adaptation effects cannot be ruled out (Gert et al., 2022; Spiering and Dimigen, 2025). Future studies should therefore contrast saccade-onset ERPs with stimulus-locked ones to disentangle those effects (Amme et al., 2024; Spiering and Dimigen, 2025). Relatedly, overlapping (micro)saccade responses, undetectable at our eye-tracker's sampling rate, could temporally coincide with and mask the true N170 component, leading to its apparent absence (Spiering and Dimigen, 2025). Second, methodological aspects could play a role. While virtual faces have been shown to elicit an N170 effect, it is typically weaker than in two-dimensional studies (Kirasirova et al., 2025). As our stimulus set has not been systematically evaluated in a stimulus-locked experiment, future studies should investigate the presence of an N170 effect in such a setup using our or similar avatars. Beyond this, to our knowledge, this study is the first to examine face perception in a free-viewing, free-movement VR setup, introducing new variables, such as attentional shifts or stimulus variability in a virtual environment, that may have obscured the effect. Additionally, other methodological factors, such as the virtual environment or the density of visual stimuli, may contribute to the observed findings. Third, individual differences and hemispheric variations in the N170 response (De Vos et al., 2012) could introduce variability, rendering an across-subject investigation difficult.
Finally, although fixation-locked ERPs to front-facing faces have previously revealed meaningful condition differences (Auerbach-Asch et al., 2020; Gert et al., 2022), we consider it unlikely that the absence of a clear N170 effect in our data is due to the use of saccade-onset ERPs. Visual inspection of fixation-locked traces likewise did not reveal a distinct N170 component, and given the predominance of front-facing head stimuli, it seems unlikely that fixation-aligned analyses would have produced a markedly different pattern for rotational effects. Nonetheless, future studies explicitly designed to compare fixation- and saccade-onset ERPs (Amme et al., 2024) are needed to disentangle their respective contributions. Overall, determining exactly why an N170 face effect was not observable remains complex and warrants future investigation. These results highlight challenges when adapting lab-based methodologies to free-viewing studies and suggest the need for a methodological shift when moving toward experiments with higher ecological validity.
Several challenges have been associated with recording multimodal mobile data in a VR setting (Stangl et al., 2023). One of them, the alignment of various data streams, could be tackled by employing state-of-the-art amplifiers with a built-in connection or direct support for data synchronization tools such as LSL to further improve the temporal accuracy of recordings (Gramann, 2024). Effective and reliable synchronization would reduce the alignment and drift issues that led to the exclusion of subjects’ data in the present study. Additionally, technological advancements could lead to the creation of even more realistic VR environments. With increasing visual realism and accuracy, undesired side effects of VR setups, such as motion sickness, could be reduced (Keshavarz et al., 2011; Kim et al., 2018; Clay et al., 2019; Rebenitsch and Owen, 2021). At the same time, increasing realism will also increase the possibility of an uncanny valley effect of virtual agents (Schindler et al., 2017), something future studies should carefully examine. In the current study, even though we have no indication that the faces of our avatars elicited an uncanny valley effect, we cannot exclude the possibility that virtual faces were avoided due to their appearance. Nonetheless, we recommend using artificial avatars, as they offer a high degree of interaction, modification (Schindler et al., 2017), and specific positioning (Sánchez Pacheco et al., 2025) compared with static images or video recordings. A further recommendation is to record in smaller segments, allowing for higher control over the recorded data quality.
Importantly, while facing challenges, setups combining VR, tracking, and brain imaging techniques offer great potential to provide novel insights into the interplay of action and perception by enabling dynamic, precise, and synchronous measurements across multiple domains and measurement devices (Baceviciute et al., 2022; Kim et al., 2023; Larsen et al., 2024). With these considerations in mind, we recommend using similar setups to address and re-evaluate research questions in settings with higher ecological validity (Parsons, 2015; Kothgassner and Felnhofer, 2020).
In conclusion, our results demonstrated the potential of VR-based, free-viewing paradigms combined with EEG to capture meaningful neural data, with saccade-onset–locked ERPs offering advantages over fixation-onset alignment for ERP analysis. Employing saccade-onset ERPs, we observed significant differences between stimulus categories, providing a suitable approach for analyzing free-viewing data. However, variability in noise levels across categories and among subjects underscores the complex nature of free-viewing studies. While combining free-viewing in VR with EEG recordings is feasible and yields valuable data, significant methodological challenges remain. With this work, we highlight the need to further develop and refine measurement and analysis approaches to suit the needs of advanced experimental designs. Thereby, we aim to contribute to laying a foundation for future studies to advance the field of free-viewing and exploration paradigms in visual 3D environments.
Footnotes
The authors declare no competing financial interests.
We thank everyone who contributed to the project, specifically John Madrid-Carvajal, Jakob Litsch, Eva von Butler, Anneke Büürma, Marketa Becevova, Reem Hjoj, and Marie Bensien for their help in collecting the data. Furthermore, we thank Marc Vidal De Palol, Jakob Litsch, and Anna L. Gert for their support in designing the study, Artur Czeszumski for his input while developing the automated EEG preprocessing pipeline, Jessica Simon for investigating different preprocessing parameters, Moritz Lönker for implementing the saccade amplitude calculations, and finally, Tracy Sánchez Pacheco for her input and feedback on the visualizations and TFCE analysis. Funding was provided by the German Federal Ministry of Education and Research for the project SIDDATA (Individualization of Studies through Digital, Data-Driven Assistants), FKZ 16DHB2123, and by the EU Horizon 2020 (MSCDA) research and innovation program under Grant Agreement Number 861166 (INTUITIVE). This research was also supported by the University of Osnabrück in cooperation with the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) in the context of funding the Research Training Group “Situated Cognition” under the Project Number GRK 274877981.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.