Elsevier

Biological Psychology

Volume 68, Issue 3, March 2005, Pages 201-213

Instrumental and test–retest reliability of saccadic measures

https://doi.org/10.1016/j.biopsycho.2004.06.005

Abstract

Little is known about the reliabilities of the various measures of saccade control that can be derived from pro- and anti-saccade tasks. This paper presents correlational results from two studies comprising a total of 446 psychiatrically and neurologically healthy participants aged 6–88 years. Saccades were elicited under different stimulation conditions and during task blocks of 100 or 200 trials. Odd–even and split-half correlations determined for study 1 (N = 327, age 9–88 years) were found to be good to excellent (.60 ≤ r ≤ .97) for most measures and generalisable over the entire life-span. The 19-month test–retest correlations obtained in study 2 (N = 117, age 6–18 years) ranged between .43 and .66 after controlling for age, and suggest moderate stability of individual differences over time during childhood and adolescence. Hence, these parameters are very useful for concurrent validity studies at every age, but less so for predictive validity studies with children and adolescents.

Introduction

The study of eye movements is a popular approach in neurology and psychiatry, especially in schizophrenia research. While there has been extensive research on smooth pursuit eye movement distortions in schizophrenic patients (Levy et al., 1993), considerably less research has been conducted for saccadic eye movements. This may be related to the fact that delayed visually guided or pro-saccades cannot reliably be observed in schizophrenic patients (Clementz et al., 1994, Iacono et al., 1981, Klein et al., 2000), except for certain stimulation conditions (Clementz et al., 1994, Currie et al., 1993, Sereno and Holzman, 1993). Instruction-based or anti-saccades, by contrast, have more frequently been investigated in schizophrenic patients since the late eighties (Fukushima et al., 1988). When, following presentation of the central fixation point, a peripheral cue is presented to the left or right of fixation, subjects are instructed to look at it during the pro-saccade task, but to look straight at its mirror image position during the anti-saccade task. Schizophrenic patients exhibit augmented rates of erroneous pro-saccades to the cue (e.g., Fukushima et al., 1988, Sereno and Holzman, 1995) and may show augmented latencies of correct anti-saccades (e.g., Crawford et al., 1998, Fukushima et al., 1990). This rather consistent result lends support to the “hypofrontality” hypothesis of schizophrenia, because anti-saccades are considered as “executive functions” (Klein and Foerster, 2001), sensitive to frontal damage (Pierrot-Deseilligny et al., 1991, Rivaud et al., 1994, Walker et al., 1998). Furthermore, the healthy first-degree relatives of schizophrenic patients exhibit impaired anti-saccade task performance (e.g., Clementz et al., 1994, Curtis et al., 2001, McDowell and Clementz, 1997), suggesting that the deficit may reflect an enduring, trait-like manifestation of genetic vulnerability for schizophrenia. 
An overview of basic research and clinical aspects of the anti-saccade task has been given by Everling and Fischer (1998); the neurophysiology of this task is explained in Munoz and Everling (2004).

The aforementioned individual differences research points to the fact that parameters of saccade control are also investigated in order to provide validation of clinical constructs like “schizophrenia” or “hypofrontality” (by specifying correlates of schizophrenic disorders or reduced frontal metabolism). Here, two types of reliabilities (instrumental, test–retest) and two types of criterion validities (concurrent, predictive) must be distinguished, with the former being required in order to meaningfully interpret the latter.

Differential psychology is concerned with differences between individuals or groups of individuals (Asendorpf, 1999). Accordingly, a measurement is said to be reliable to the extent that its repeated application in a sample of subjects yields consistent differences between individuals, or consistency across measurements in the individuals’ scores relative to the group's average. Two kinds of influences may affect reliability measurements: state fluctuations (e.g., from alterations in motivation or alertness) and trait changes (e.g., due to development or because of a chronic disease) that differ between individuals. A measurement's instrumental reliability is assessed when a measurement series is split into trials with odd or even numbers (“odd–even” reliability). In this case, neither state fluctuations nor trait changes affect the reliability estimate. If, on the other hand, the measurement series is split into its first and second half (“split-half” reliability), individual differences in state fluctuations may reduce the reliability estimate when compared to the first procedure. Finally, if the two to-be-correlated measurement series are separated by a considerable period of time (months or years), individual differences in both state fluctuations and trait changes of the measured feature between the first and the second measurement may adversely affect the stability of individual differences. This last-mentioned reliability estimate is the test–retest reliability, which is closely related to the trait concept of differential psychology. To the extent that for similar situations individual differences are stable over time, the respective feature is said to be a trait (Asendorpf, 1999). While the last-mentioned reliability estimate is also called “stability”, the first and the second reliability estimates will be referred to as “instrumental reliabilities”, because both are devoid of trait fluctuations.
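The difference between the two instrumental estimates can be illustrated with a minimal sketch. It is not the authors' analysis code: the function names, the simulated reaction times, and the linear within-block drift used to stand in for a state fluctuation are all assumptions made for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean(values):
    return sum(values) / len(values)


def odd_even_reliability(trials):
    """trials: one list of per-trial scores per participant."""
    odd = [mean(t[0::2]) for t in trials]   # 1st, 3rd, 5th, ... trial
    even = [mean(t[1::2]) for t in trials]  # 2nd, 4th, 6th, ... trial
    return pearson(odd, even)


def split_half_reliability(trials):
    """Correlate the mean of the first half with the mean of the second half."""
    first = [mean(t[: len(t) // 2]) for t in trials]
    second = [mean(t[len(t) // 2:]) for t in trials]
    return pearson(first, second)


# Simulated data (hypothetical values, not taken from either study):
# each participant has a stable mean reaction time plus an individual
# drift across the block that mimics a state fluctuation.
rng = random.Random(0)
n_trials = 100
trials = []
for _ in range(200):
    trait = rng.gauss(200, 30)  # stable individual level (ms)
    drift = rng.gauss(0, 60)    # individual change over the block (ms)
    trials.append([
        trait + drift * (i / (n_trials - 1) - 0.5) + rng.gauss(0, 40)
        for i in range(n_trials)
    ])

r_odd_even = odd_even_reliability(trials)
r_split_half = split_half_reliability(trials)
print(f"odd-even: {r_odd_even:.2f}, split-half: {r_split_half:.2f}")
```

Because the simulated drift affects odd and even trials almost equally but shifts the two block halves in opposite directions, the split-half estimate in this sketch comes out lower than the odd–even estimate, in line with the reasoning above.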

The term “criterion validation” refers to the question whether individual differences in a given feature covary with individual differences in other features that reflect the same or related construct(s) (convergent validity; Campbell and Fiske, 1959). The establishment of criterion validities provides an answer to one of the basic questions of differential psychology, namely which features “go together” within a population, either simultaneously (concurrent validity; e.g., schizophrenics who exhibit a low performance in the Wisconsin Card Sorting Test also show low frontal metabolism; Weinberger et al., 1980) or subsequently (predictive validity; e.g., a high proportion of anti-saccadic direction errors is associated with an augmented risk of developing psychopathological symptoms at a later point in time). Generally, the degree of reliability of a measurement limits the degree of its criterion validities. Accordingly, a high concurrent validity necessitates a high instrumental reliability, while a high predictive validity necessitates both high instrumental reliability and stability.
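The limiting role of reliability can be made explicit with the attenuation relation of classical test theory. The formula is not given in the paper and is added here as a textbook illustration. The symbols are assumptions for this sketch: r_xy is the observed correlation between a measure x and a criterion y, ρ is the correlation between their true scores, and r_xx and r_yy are the respective reliabilities.

```latex
r_{xy} = \rho_{T_x T_y}\,\sqrt{r_{xx}\, r_{yy}}
\quad\Longrightarrow\quad
\lvert r_{xy} \rvert \le \sqrt{r_{xx}\, r_{yy}}
```

As a rough illustration, if a stability coefficient of .43 (the lower end of the range reported in the abstract) is taken as r_xx and the criterion is assumed to be perfectly reliable (r_yy = 1), the observable validity coefficient could not exceed √.43 ≈ .66.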

So far, only a few saccade studies have addressed these reliability issues. For pro-saccadic reaction times, within-session and 2-year retest correlations of .54–.58 (Iacono and Lykken, 1981; 46 healthy subjects), 1–2 week retest correlations of about .61–.69 (Roy-Byrne et al., 1995; 8 healthy subjects), 4-week retest correlations of .78 (Klein and Berg, 2001; 20 healthy subjects), and 2-month retest correlations of .79 (Ettinger et al., 2003; 21 healthy subjects) have been reported. In addition, Currie et al. (1993) reported retest correlations of up to r_tt = .94 for the percentage of express saccades, i.e. visually guided saccades with latencies of about 80–130 ms (Fischer and Weber, 1993), in a sample of 15 participants. Concerning anti-saccades in healthy subjects, Roy-Byrne et al. (1995) reported 1–2 week r_tt of .78–.80 (saccadic reaction times) and −.30 to .22 (percentage of erroneous pro-saccades), and Klein and Berg (2001) found individual differences in anti-saccadic reaction times to be stable at r_tt = .88 over a 4-week period. Similar results were reported by Ettinger et al. (2003), who found a 2-month test–retest correlation of .69 for anti-saccadic reaction times and .89 for the proportion of direction errors. Thaker et al. (1989) reported 1-year test–retest correlations of more than .75 for anti-saccadic direction errors and reaction times in 12 schizophrenic patients. Finally, Calkins et al. (in press) determined an r_tt of .73 for anti-saccadic direction errors and reaction times in a small sample of 15 schizophrenic patients and their relatives retested after 1.8 years. Among the shortcomings of some of the cited studies, which limit the generalisability of their results, are their sample size and composition as well as the length of the retest interval. Except for the study of Iacono and Lykken (1981), samples of 21 or even fewer participants were tested. While Klein and Berg (2001) and Ettinger et al. (2003) tested age-homogeneous samples, which may yield somewhat “conservative” retest correlations, Calkins et al. (in press) and Thaker et al. (1990) tested mixed samples of schizophrenic patients and first-degree relatives or healthy controls. Differences between the subgroups in the dependent measures may thus have inflated variance within the composite group and, hence, covariances and correlations. Furthermore, restrictions of range, which limit the size of the correlations that can be obtained, were found for the proportion of direction errors in other studies (Klein and Berg, 2001, Roy-Byrne et al., 1995). Finally, except for the studies of Iacono and Lykken (1981), Thaker et al. (1989), and Calkins et al. (in press), only short retest intervals spanning 2 months at most were realized.
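The point about mixed samples can be shown with a small numerical sketch. The scores below are invented for illustration and come from none of the cited studies; the variable names are likewise hypothetical. Two subgroups share the same moderate within-group retest pattern but differ in mean level, and pooling them raises the correlation.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical test and retest scores for one subgroup.
group_a_test = [1, 2, 3, 4]
group_a_retest = [2, 1, 4, 3]

# Second subgroup: identical pattern, shifted upwards by 10 units
# (an assumed mean difference between the subgroups).
group_b_test = [v + 10 for v in group_a_test]
group_b_retest = [v + 10 for v in group_a_retest]

r_within = pearson(group_a_test, group_a_retest)
r_pooled = pearson(group_a_test + group_b_test,
                   group_a_retest + group_b_retest)
print(f"within subgroup: {r_within:.2f}, pooled sample: {r_pooled:.2f}")
```

Read in the other direction, the same numbers give a feel for restriction of range: confining the analysis to a narrow band of scores (one subgroup) yields a lower coefficient than the full range does.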

Based on these considerations, the studies presented here aimed at systematically investigating the test–retest and instrumental reliabilities of nine variables that can be derived from the pro- and anti-saccade tasks in large samples of heterogeneous participants, using a long test–retest interval for the stability estimates.

Methods

Data from two studies comprising altogether 446 participants were re-analyzed for our reliability estimates. Sample compositions, procedural details, and hardware are described in greater detail in Fischer et al. (1997) and Klein (2001) for samples 1 and 2, respectively.

Results

The various reliability estimates derived from two different studies are documented in Table 1. Most saccade parameters exhibited good to excellent odd–even (.71 ≤ r ≤ .97) and split-half (.67 ≤ r ≤ .92) coefficients. The best coefficients were obtained for saccadic reaction times and their intraindividual variability, as well as the proportions of express saccades and direction errors. Lower, but still good, coefficients were found for those parameters that are related to processes following

Discussion

The correlation analyses presented here revealed four main results. First, instrumental reliability estimates were substantially higher than stability coefficients. Second, the stability of individual differences, as assessed by z-score differences, was unrelated to the participants’ ages. Third, except for parameter PES, all coefficients became smaller when the cross-sectional age effects were statistically controlled. Fourth, three groups of parameters could be distinguished, with the first

Conclusions

All reliability estimates presented here are based on large samples and are, by and large, independent of age. While the instrumental reliability estimates exhibit a high degree of generalizability over the entire life-span, the test–retest results apply to children and adolescents only. That most of the saccadic parameters examined here exhibited good to excellent instrumental reliabilities makes these parameters very suitable for convergent validity studies. This will be important, for

Acknowledgement

Research was supported by the Deutsche Forschungsgemeinschaft (DFG; Kl 985/6-1).

References (45)

  • B.E. Snitz et al. Neuropsychological and oculomotor correlates of spatial working memory performance in schizophrenia patients and controls. Schizophrenia Research (1999)
  • R. Walker et al. Saccadic eye movements and working memory deficits following damage to human prefrontal cortex. Neuropsychologia (1998)
  • J. Asendorpf. Psychologie der Persönlichkeit. Grundlagen (1999)
  • Calkins, M.E., Iacono, W.G., Curtis, C.E., in press. Smooth pursuit and antisaccade performance evidence trait...
  • D.T. Campbell et al. Convergent and discriminant validation by the multitrait–multimethod matrix. Psychological Bulletin (1959)
  • B.A. Clementz et al. Saccadic system functioning among schizophrenia patients and their first-degree biological relatives. Journal of Abnormal Psychology (1994)
  • T.J. Crawford et al. Saccadic eye movements in families multiply affected with schizophrenia: the Maudsley Family Study. American Journal of Psychiatry (1998)
  • L.J. Cronbach. The two disciplines of scientific psychology. American Psychologist (1957)
  • J. Currie et al. Selective impairment of express saccade generation in patients with schizophrenia. Experimental Brain Research (1993)
  • C.E. Curtis et al. Saccadic disinhibition in acute and remitted schizophrenia patients and their first-degree biological relatives. American Journal of Psychiatry (2001)
  • U. Ettinger et al. Reliability of smooth pursuit, fixation, and saccadic eye movements. Psychophysiology (2003)
  • B. Fischer et al. Characteristics of “anti” saccades in man. Experimental Brain Research (1992)