Table 1

Overview of definitions, applicability and calculations of repeatability, reliability, and reproducibility

Parameter | Definition | When to use | How to calculate

Repeatability
Definition: Variation in repeat measurements made on the same subject under identical conditions: same method, same observer, and measurements taken in quick succession (Bartlett and Frost, 2008). Variation is ascribed to errors in the measurement process (Bartlett and Frost, 2008; Bland and Altman, 1999).
When to use: The coefficient of repeatability (CR) can be used to study measurement precision (Bartlett and Frost, 2008). It is used when decisions are made on an individual basis. The CR is the value below which the absolute difference between two repeat measurements made on the same subject is expected to lie on 95% of occasions (Bartlett and Frost, 2008). Thus, the higher the measurement error, the higher the CR.
How to calculate: $\mathrm{CR} = 1.96\sqrt{2\sigma_w^2} = 1.96\sqrt{2}\,\sigma_w \approx 2.77\,\sigma_w$, where $\sigma_w^2$ is the within-subject variance and $\sigma_w$ the within-subject standard deviation (Bartlett and Frost, 2008).

Reliability
Definition: Ratio of the subject variation to the total variation, where the total variation is the sum of the subject variation and the measurement error (variation in the measurement process; Bartlett and Frost, 2008). A reliability of 1 indicates no measurement error and 0 indicates that all variation stems from measurement error (Koo and Li, 2016).
When to use: The intraclass correlation coefficient (ICC) can be used to study the amount of measurement error in measurements made on the same subjects by different observers (interobserver reliability) or by a single observer (intraobserver reliability; Bartlett and Frost, 2008). The ICC measures how well subjects maintain their position within the group across repeated measurements (Streiner and Norman, 2008). This is important for sample size and power calculations in interventional studies (Fleiss, 1999; Brown et al., 2017) and gives some indication of the discriminative value of a test (de Vet et al., 2006). Because the ICC is a dimensionless ratio, it can be used to compare methods whose measurements are on different scales (Koo and Li, 2016).
How to calculate: $\text{Reliability} = \dfrac{(\text{SD of subjects' true values})^2}{(\text{SD of subjects' true values})^2 + (\text{SD of measurement error})^2}$ (Bartlett and Frost, 2008).

Reproducibility
Definition: Variation in measurements made on the same subject under changing conditions: different methods or instruments, different observers, or measurements made at different timepoints, between which the "true" underlying variable could undergo non-negligible changes (Bartlett and Frost, 2008).
When to use: Reproducibility can be studied when measurements are made by different observers, with different methods or instruments, or at different timepoints (Bartlett and Frost, 2008).
How to calculate: Different statistical analysis methods rest on different assumptions. The choice of statistical analysis depends on study design, measurement scale, etc. Repeated-measures ANOVA (rmANOVA) was used to study differences between timepoints. Paired t tests (including correction for multiple comparisons) were used to study interobserver differences.
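
The repeatability and reliability calculations in Table 1 can be illustrated with a minimal Python sketch. It assumes a balanced design (every subject measured the same number of times under identical conditions) and estimates the variance components with a one-way ANOVA, which is one common choice and not necessarily the estimator used in this study. The function names and the example values are hypothetical and serve illustration only.

```python
from math import sqrt
from statistics import mean, variance


def variance_components(data):
    """Estimate (between-subject, within-subject) variance.

    data: list of per-subject lists of repeat measurements.
    Assumption: balanced design, at least 2 subjects and 2 repeats each.
    """
    k = len(data[0])
    if len(data) < 2 or k < 2 or any(len(row) != k for row in data):
        raise ValueError("need a balanced design with >= 2 subjects and >= 2 repeats")
    # Within-subject variance: mean of the per-subject sample variances
    # (equals the within-subject mean square in a balanced design).
    var_within = mean(variance(row) for row in data)
    # Between-subject mean square: k times the sample variance of subject means.
    ms_between = k * variance([mean(row) for row in data])
    # Method-of-moments estimate, truncated at zero.
    var_between = max((ms_between - var_within) / k, 0.0)
    return var_between, var_within


def repeatability_coefficient(var_within):
    """CR = 1.96 * sqrt(2 * within-subject variance)."""
    return 1.96 * sqrt(2.0 * var_within)


def reliability(var_between, var_within):
    """Subject variance divided by total variance (one-way ICC)."""
    return var_between / (var_between + var_within)


# Hypothetical example: four subjects, two repeat measurements each.
example = [[10.0, 12.0], [20.0, 22.0], [30.0, 28.0], [40.0, 42.0]]
vb, vw = variance_components(example)
print("within-subject variance:", vw)          # 2.0
print("between-subject variance:", vb)         # 160.0
print("CR:", repeatability_coefficient(vw))    # 3.92
print("ICC:", round(reliability(vb, vw), 3))   # 0.988
```

In this made-up example the repeat measurements differ by 2 units for every subject, so the within-subject variance is 2 and the CR is 1.96 × √4 = 3.92. The subjects differ far more from one another than the repeats do, which is why the reliability is close to 1.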