Test-Retest Reliability of Short-Interval Intracortical Inhibition Assessed by Threshold-Tracking and Automated Conventional Techniques

Abstract Two novel short-interval intracortical inhibition (SICI) protocols, assessing SICI across a range of interstimulus intervals (ISIs) using either parallel threshold-tracking transcranial magnetic stimulation (TT-TMS) or automated conventional TMS (cTMS), were recently introduced. However, the test-retest reliability of these protocols has not been investigated, which is important if they are to be introduced in the clinic. SICI was recorded in 18 healthy subjects using TT-TMS (T-SICI) and cTMS (A-SICI). All subjects were examined at four identical sessions, i.e., morning and afternoon sessions on 2 d, 5–7 d apart. Both SICI protocols were performed twice at each session by the same observer. In one of the sessions, another observer performed additional examinations. Neither intraobserver nor interobserver measures of SICI differed significantly between examinations, except for T-SICI at ISI 3 ms (p = 0.00035) and A-SICI at ISI 2.5 ms (p = 0.0103). Intraday reliability was poor-to-good for A-SICI and moderate-to-good for T-SICI. Interday and interobserver reliabilities of T-SICI and A-SICI were moderate-to-good. Although between-subject variation constituted most of the total variation, SICI repeatability in an individual subject was poor. The two SICI protocols showed no considerable systematic bias across sessions and had a comparable test-retest reliability profile. Findings from the present study suggest that both SICI protocols may be reliably and reproducibly employed in research studies, but should be used with caution for individual decision-making in clinical settings. Studies exploring reliability in patient cohorts are warranted to investigate the clinical utility of these two SICI protocols.


Introduction
Conventional transcranial magnetic stimulation (cTMS) uses magnetic stimulation to measure cortical excitability by applying constant stimulus intensities. If stimulation is applied over the motor cortex, a motor evoked potential (MEP) can be recorded (Kujirai et al., 1993). Cortical excitability can then be measured as changes in averaged MEP (Kujirai et al., 1993).
Threshold-tracking TMS (TT-TMS) is an unconventional TMS method, which also measures cortical excitability (Fisher et al., 2002). In contrast to cTMS, the MEP amplitude is predefined and kept constant by adjusting the stimulus intensity, thus enabling continuous tracking of the motor threshold and allowing for fluctuations in cortical excitability (Fisher et al., 2002). The method was introduced to overcome the limitations imposed by cortical excitability fluctuations and MEP variability (Fisher et al., 2002; Groppa et al., 2012).
Short-interval intracortical inhibition (SICI) measures cortical inhibition and is a TMS protocol in which two stimuli are delivered with an interstimulus interval (ISI) of 1-7 ms (Kujirai et al., 1993). The first subthreshold stimulus (conditioning stimulus, CS) is followed by a second suprathreshold stimulus (test stimulus; Kujirai et al., 1993), which in TT-TMS is continuously adjusted based on the recorded MEP amplitude (Fisher et al., 2002). In conventional amplitude SICI (A-SICI), cortical inhibition is measured as the relative change in MEP amplitude (Kujirai et al., 1993). In T-SICI (SICI measured by TT-TMS), cortical inhibition is measured as the relative change in stimulus intensity (Fisher et al., 2002).
The precise physiological mechanisms behind SICI are unknown, but SICI at an ISI of 1 ms (SICI 1ms) is thought to reflect neuronal refractoriness or extrasynaptic GABA-A signaling (Fisher et al., 2002; Roshan et al., 2003; Stagg et al., 2011), whereas SICI at an ISI of 2.5 ms (SICI 2.5ms) and at an ISI of 3 ms (SICI 3ms) are thought to reflect synaptic GABA-Aergic inhibition (Ziemann et al., 1996). Earlier studies have shown that T-SICI may be used for diagnosing amyotrophic lateral sclerosis (ALS), and it has been suggested as a biomarker ([...] Kiernan, 2006, 2008; Menon et al., 2015; Vucic and Rutkove, 2018). These studies applied a serial tracking protocol that estimated T-SICI at successively increasing ISIs (Matamala et al., 2018). A slightly different tracking strategy was applied in a recent study, in which comparability and reliability of T-SICI and automated A-SICI were explored at ISI 2.5 ms at four different CS intensities in healthy subjects (Samusyte et al., 2018). The CS intensities were tracked in parallel in a pseudorandomized order, a commonly used approach in cTMS (Samusyte et al., 2018). A good correlation of SICI obtained by the two techniques was found across the whole range of CS intensities (Samusyte et al., 2018).
More recently, a study in healthy subjects demonstrated a good correlation between automated A-SICI and T-SICI at a single CS intensity and ISIs of 1-7 ms using a parallel tracking strategy, as well as important limitations of serial tracking (Tankisi et al., 2021a). It was proposed that because of a smaller between-subject variability among healthy individuals, A-SICI may be better at demonstrating a pathologic loss of inhibition, which has been observed as an early feature of motor neuron disease. A recent study has demonstrated that both techniques performed well at discriminating ALS patients from patient controls, with T-SICI being most reduced before the upper motor neuron signs become apparent (Tankisi et al., 2021b). However, none of the studies assessed the test-retest reliability of the methods, which is important if an investigation is to be used for diagnostic purposes or interventional studies. A trend for improved reproducibility of T-SICI 2.5ms has been reported (Samusyte et al., 2018), but it remains unclear whether this applies to other SICI ISIs. Further studies are needed to explore the utility of T-SICI and A-SICI in diagnostic decision-making in ALS, but the reproducibility and reliability of the two methods should first be compared in healthy subjects before implementation in clinics. Moreover, T-SICI and A-SICI may potentially investigate different motor neuron pools (Samusyte et al., 2018), and may therefore supplement each other in diagnostics and intervention studies. Therefore, the present study aimed to explore the repeatability and observer reproducibility of the two novel automated conventional and threshold-tracking SICI protocols across ISIs 1-7 ms in healthy subjects. The study also aimed to assess intraday and interday reliability in relation to diurnal variations, which may affect the reliability of SICI measurements (Matamala et al., 2018).
No previous study has examined these parameters of T-SICI parallel and automated A-SICI on such an extensive scale.

Subjects
The study was conducted at the Department of Clinical Neurophysiology, Aarhus University Hospital, Denmark, from February 2018 to August 2018. Inclusion criteria were: age above 18 years; absence of neurologic or psychiatric disorders. Exclusion criteria were: pregnancy; use of medication known to affect the nervous system; metal implants. All participating subjects were screened using a modified TMS safety questionnaire (Rossi et al., 2011) and a gross neurologic examination.
Twenty healthy subjects (nine females) were recruited. One subject (S5) was excluded because of undetectable MEP, another subject (S7) because of inability to relax the hand muscles. Eighteen subjects (eight females, mean age: 56.9 years, SD: 12.3; range: 41-77 years) were included in the study. One subject (S4, male, age: 46 years) completed all nine TT-TMS examinations, but only seven cTMS examinations by observer 1 because of time constraints. This subject was excluded from the analysis of reliability of cTMS data.
Written informed consent was obtained from all subjects in accordance with the Declaration of Helsinki II. The project was approved by The Central Denmark Region Committees on Health Research Ethics (case: 1-10-72-201-17) and the Danish Data Protection Agency.

Study design
To investigate intraobserver test-retest reliability, each subject was investigated by the same observer (observer 1) on two separate days, 5-7 d apart. On the first examination day, two sessions were conducted: a morning session (10-11:30 A.M.) and an afternoon session (1-2:30 P.M.). On the second examination day, a morning session (10-11:30 A.M.) and an afternoon session (1-2:30 P.M.) were conducted again (Fig. 1). Each session consisted of four TMS examinations: two A-SICI examinations and two T-SICI examinations. On each examination day, each subject underwent four A-SICI examinations and four T-SICI examinations. Thus, eight A-SICI examinations and eight T-SICI examinations were conducted on each subject in total by observer 1 (Fig. 1).
All examinations were executed at the same time of day for each subject. Each examination lasted on average 15 min, giving approximately 1 h in total for each session.
To investigate interobserver reliability and reproducibility, a second observer (observer 2) performed an additional TT-TMS and cTMS examination on each subject in continuation of one of the recording sessions. Because of practical limitations, the observer 2 examinations were performed either at the morning session of day 1 (n = 3), the afternoon session of day 1 (n = 7), the morning session of day 2 (n = 4) or the afternoon session of day 2 (n = 4).
Subjects were instructed to refrain from coffee (12 h), alcohol (24 h), and exhaustive exercise (48 h) before TMS examination.

Experimental setup
The subjects were comfortably seated during the examinations, with their right arm resting in a relaxed position on a pillow placed on their lap. The subjects were instructed to stay relaxed but vigilant. MEP responses were recorded from the relaxed first dorsal interosseous (FDI) muscle of the right hand using Ag/AgCl ECG electrodes (Ambu WhiteSensor 40713). The active electrode was placed on the belly of the FDI muscle, the reference electrode on the second metacarpophalangeal joint. The ground electrode was placed on the dorsum of the hand. Skin temperature of the subject's right hand was measured before and after each examination, and a constant temperature was ensured with a heating lamp throughout the examination.

TMS
Cortical function was assessed using a 70-mm figure-of-eight coil (D70 Remote Coil, reference number 3190-00) connected to two Magstim 200² stimulators in Bistim mode (Magstim Co Ltd). Posterior-anterior current flow in the subject's motor cortex was induced by placing the coil over the left hemisphere with the coil handle angled 45° postero-laterally to the midsagittal line.
The hand motor hotspot was located by moving the coil in anterior-posterior and medial-lateral directions to induce an MEP of 0.2 mV using a minimal stimulus intensity. Once located, coil positioning over the hand motor hotspot was kept constant by drawing the coil outline onto a swimming cap worn by the subject. This procedure was repeated before each examination. A spring balancer (SiraFlex, type B) helped steady the coil. Stimulation frequency was 0.2 Hz.
Figure 1. Study design. Each subject was examined on four sessions on two separate days, 5-7 d apart: two sessions in the mornings and two sessions in the afternoons. A-SICI and T-SICI protocols were repeated twice in each session. These examinations were performed by the same observer (observer 1). Two TMS examinations (one A-SICI examination and one T-SICI examination) were additionally performed by a second observer (observer 2) on each subject. The examinations were done in continuation of one of the sessions. In total, nine A-SICI and nine T-SICI examinations were performed on each subject.
Automated stimulator control, stimulus delivery, data acquisition and calculation of TMS parameters were managed by the computer software QTRACW (Institute of Neurology, University College London, United Kingdom, distributed by Digitimer Ltd.) using bespoke recording protocols (QTMS-2017).

Resting motor threshold (RMT)
RMT estimates cortical excitability by measuring the lowest stimulation intensity required to elicit a predefined target MEP. The RMT is estimated before initiation of the SICI protocol and is used as a baseline for calculating CS intensities in SICI. In TT-TMS, the RMT is continuously estimated, "tracked," during the paired-pulse protocol, as opposed to automated cTMS.
After localization of the motor hotspot, but before SICI protocol initiation, the lowest stimulus intensities (measured in percentage of maximum stimulator output; % MSO) required for eliciting a peak-to-peak target MEP of 0.2 mV (RMT 0.2mV ) and a peak-to-peak target MEP of 1 mV (TS 1mV ) were estimated by threshold-tracking (Fig. 2). The size of the MEPs was analyzed online by QTRACW, which then automatically adjusted the Magstim 200 2 stimulator output. A proportional tracking mode with a maximum step of 2% MSO was used: the stimulation intensities were adjusted depending on the percentage error of a single MEP (decreased, increased or unchanged if the MEP was above, below, or on target, respectively). A 20% tracking error (on a logarithmic scale) was allowed, and the threshold estimate was considered valid if the MEP hit or bracketed the target line. The RMT 0.2mV and TS 1mV tracking was deemed stable when six valid estimates had been obtained. RMT 0.2mV and TS 1mV were automatically calculated by applying a weighted logarithmic regression (Fig. 2). This approach is based on work by Fisher et al. (2002), who found that the stimulus-response curve between the stimulus intensity for single pulse TMS and the MEP amplitude was approximately exponential over a 100-fold range of responses. When plotted on a logarithmic scale, the relationship between stimulus intensity and MEP amplitude was approximately linear in the interval from 0.02 to 2 mV. Target MEP was set at 0.2 mV, the midpoint in this log-linear interval (Fisher et al., 2002). Furthermore, the MEP distributions are often skewed (Nielsen, 1996), and the variability in MEP size, which could represent excitability fluctuations in cortical pyramidal cells and spinal motor neurons, may differ across stimulus intensities (Kiers et al., 1993). 
Thus, logarithmic transformation of amplitude data has been proposed to ensure normal distributions and to reduce, at least partly, the spread in variability across stimulus intensities (Nielsen, 1996; Goetz et al., 2014). It was deduced that by setting a target MEP at the midpoint of the log-linear interval, the variability in MEP amplitude translates to smaller changes in stimulus intensity, potentially overcoming these limitations.
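The threshold estimation described above can be illustrated with a minimal sketch. It assumes a weighted least-squares fit of log10(MEP) against stimulus intensity and reads off the intensity at which the fitted line crosses the target MEP. The function name, the default uniform weights and the example data are hypothetical; the weighting scheme actually used by QTRACW is not reproduced here.

```python
import math

def estimate_threshold(intensities_mso, meps_mv, target_mv=0.2,
                       weights=None, lo_mv=0.02, hi_mv=2.0):
    """Return the stimulus intensity (% MSO) at which the fitted
    log10(MEP)-versus-intensity line crosses the target MEP.

    Responses outside [lo_mv, hi_mv] are excluded, mirroring the
    log-linear interval of 0.02-2 mV described in the text.
    Weights are an assumption; uniform weights are used by default.
    """
    if weights is None:
        weights = [1.0] * len(intensities_mso)
    points = [(x, math.log10(y), w)
              for x, y, w in zip(intensities_mso, meps_mv, weights)
              if lo_mv <= y <= hi_mv]
    if len(points) < 2:
        raise ValueError("need at least two responses inside the log-linear range")
    sw = sum(w for _, _, w in points)
    mean_x = sum(w * x for x, _, w in points) / sw
    mean_y = sum(w * y for _, y, w in points) / sw
    sxx = sum(w * (x - mean_x) ** 2 for x, _, w in points)
    sxy = sum(w * (x - mean_x) * (y - mean_y) for x, y, w in points)
    if sxx == 0 or sxy == 0:
        raise ValueError("cannot fit a sloped line to these responses")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Solve log10(target) = intercept + slope * intensity for intensity.
    return (math.log10(target_mv) - intercept) / slope
```

Calling `estimate_threshold(intensities, meps, target_mv=1.0)` on the same kind of data would give the corresponding TS 1mV estimate under the same assumptions.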
The tracked RMT 0.2mV estimate was then used to set the CS for both T-SICI and A-SICI protocols to ensure a comparable intensity between the techniques. The SICI protocol was then initiated (Fig. 3).
Figure 2. Estimation of RMT 0.2mV and TS 1mV. Scatter plots of recorded MEP against stimulus intensity (in % MSO). RMT 0.2mV is the test stimulus (in % MSO) required to evoke a peak-to-peak MEP of 0.2 mV, whereas TS 1mV is the test stimulus (in % MSO) required to evoke a peak-to-peak MEP of 1 mV. The figure conceptually shows how RMT 0.2mV and TS 1mV are estimated. The MEPs were recorded during the RMT 0.2mV (orange triangle) and TS 1mV (black circle) estimation, before initiation of SICI protocols. RMT 0.2mV and TS 1mV estimates (solid black vertical lines) were calculated from the intersect of target MEP (dotted horizontal lines) and weighted semi-logarithmic regressions (orange and gray lines), shown by arrows in the figure. Low (<0.02 mV) or high (>2 mV) MEP responses (open circles and open triangles) were excluded from the calculation.

T-SICI protocol
The T-SICI protocol was initiated after RMT 0.2mV estimation and was executed by QTRACW. The conditioning and test stimulus pairs were given at nine different conditions, with each condition comprising a different ISI. The ISIs were 1, 1.5, 2, 2.5, 3, 3.5, 4, 5, and 7 ms (Fig. 3D-F). For adequate monitoring of the control condition, RMT 0.2mV was continuously tracked on three independent channels using 1% MSO steps. The control and paired stimuli were delivered in a pseudorandomized order (Fig. 3D). The protocol stopped after each of the nine ISI conditions had been delivered ten times (giving 90 paired stimuli), and RMT 0.2mV estimation had been conducted 30 times (giving 30 single stimuli).
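The stimulus counts given above can be checked with a small sketch of one possible pseudorandomization. It assumes that each cycle contains the nine ISI conditions and the three control channels once, shuffled within the cycle, and that ten cycles are delivered. The function name and the within-cycle shuffle are illustrative assumptions, not the QTRACW implementation.

```python
import random

ISIS_MS = [1, 1.5, 2, 2.5, 3, 3.5, 4, 5, 7]   # nine paired-pulse conditions
N_CONTROL_CHANNELS = 3                        # independent control channels
N_CYCLES = 10                                 # each condition delivered ten times

def build_schedule(seed=None):
    """Return a list of (kind, value) stimuli in pseudorandomized order.

    kind is "paired" (value = ISI in ms) or "control" (value = channel index).
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(N_CYCLES):
        cycle = [("paired", isi) for isi in ISIS_MS]
        cycle += [("control", ch) for ch in range(N_CONTROL_CHANNELS)]
        rng.shuffle(cycle)          # assumed: order is shuffled within each cycle
        schedule.extend(cycle)
    return schedule
```

Under these assumptions the schedule holds 90 paired stimuli and 30 single control stimuli, matching the totals stated in the text.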
The CS and the test stimulus were adjusted continuously. The CS intensity was set to 70% of RMT 0.2mV. Stimulus adjustments were based on changes in the average RMT 0.2mV estimate obtained from three control channels after each pseudorandomization cycle. The test stimulus intensity was initially set to 120% of RMT 0.2mV. It was then adjusted continuously using a step size of 1% MSO (doubled if two consecutive stimuli "missed" the target), to maintain a target MEP of 0.2 mV (Fisher et al., 2002). This was done independently for each SICI ISI condition, resulting in nine conditioned thresholds. To calculate the estimates of control and conditioned thresholds, the recorded log MEP response was plotted against the corresponding test stimulus intensity (Fisher et al., 2002). The test stimulus corresponding to the target MEP of 0.2 mV was then estimated by regression analysis (Fisher et al., 2002). The regression analysis was weighted to account for the position of the data points, so that data points lying closer to the estimated regression line contributed more. Data points lying outside the linear part (0.02-2 mV) of the semi-logarithmic regression were excluded (Fisher et al., 2002). T-SICIs were calculated as T-SICI = (conditioned threshold − RMT 0.2mV)/RMT 0.2mV × 100% (Fisher et al., 2002). Negative values reflect cortical facilitation, and positive values reflect cortical inhibition. Variables of interest were RMT 0.2mV, T-SICI 1ms, T-SICI 2.5ms, T-SICI 3ms, T-SICI averaged at ISI from 1 to 3.5 ms (T-SICI 1-3.5ms) and T-SICI averaged at ISI from 1 to 7 ms (T-SICI 1-7ms). RMT 0.2mV was chosen as it reflects cortical excitability threshold. T-SICI 1-3.5ms and T-SICI 1-7ms were investigated as predictors of general cortical inhibition.
Figure 3. A-SICI (A-C) and T-SICI recorded with TT-TMS (D-F) for subject 1 of the first examination in morning session 1 from QTRACW. Each color depicts either a different ISI condition (ISI = 1-7 ms) or a control stimulus (ISI = 0 ms). After location of the motor hotspot, RMT 0.2mV and TS 1mV were estimated, and the A-SICI protocol was initiated (A-C). Likewise, after RMT 0.2mV estimation, the T-SICI protocol was initiated (D-F). For T-SICI, the nine different conditions and control stimuli (D) were pseudorandomized and tracked in parallel (E). Paired stimulus intensities were adjusted continuously to obtain a target MEP of 0.2 mV (F). Likewise, for A-SICI nine different conditions and control stimuli (A) were pseudorandomized (B). However, the paired stimulus intensities were kept constant for A-SICI (C).
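As a worked illustration of the T-SICI calculation, the sketch below applies the formula from the text to a set of conditioned thresholds and averages across an ISI range. The function names and the example thresholds are hypothetical.

```python
def t_sici_percent(conditioned_threshold_mso, rmt_mso):
    """T-SICI in % of RMT 0.2mV: positive = inhibition, negative = facilitation."""
    return (conditioned_threshold_mso - rmt_mso) / rmt_mso * 100.0

def averaged_t_sici(conditioned_by_isi, rmt_mso, isi_min_ms, isi_max_ms):
    """Mean T-SICI over all ISIs within [isi_min_ms, isi_max_ms]."""
    values = [t_sici_percent(thr, rmt_mso)
              for isi, thr in conditioned_by_isi.items()
              if isi_min_ms <= isi <= isi_max_ms]
    return sum(values) / len(values)

# Hypothetical conditioned thresholds (% MSO) keyed by ISI (ms), with RMT = 50% MSO.
example = {1: 55.0, 1.5: 52.0, 2: 53.0, 2.5: 56.0, 3: 55.0, 3.5: 53.0,
           4: 51.0, 5: 50.0, 7: 48.0}
t_sici_1_to_3_5 = averaged_t_sici(example, 50.0, 1, 3.5)
t_sici_1_to_7 = averaged_t_sici(example, 50.0, 1, 7)
```

In this made-up example, a conditioned threshold of 55% MSO against an RMT of 50% MSO corresponds to 10% inhibition, while the threshold of 48% MSO at ISI 7 ms corresponds to 4% facilitation.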

A-SICI protocol
The A-SICI protocol was also managed by QTRACW and was initiated after RMT 0.2mV and TS 1mV had been estimated at the beginning of the examination (Fig. 3A-C). The same ISI conditions as in the T-SICI protocol were used. The same procedure for pseudorandomization of the nine SICI and three control conditions as in the T-SICI protocol was applied (Fig. 3A). The A-SICI protocol also stopped after each of the nine ISI conditions had been delivered ten times, and control stimuli at TS 1mV had been delivered 30 times.
As opposed to the continuous RMT 0.2mV estimation and continuous adjustment of the paired stimulus intensities at each ISI condition in T-SICI, in A-SICI the paired stimulus intensities for each condition remained fixed throughout the examination (Fig. 3C). The CS intensity was fixed at 70% of the RMT 0.2mV , and the test stimulus intensity was fixed at TS 1mV .
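For comparison with T-SICI, a minimal sketch of an A-SICI calculation is given below. It assumes that A-SICI is expressed as the conditioned MEP amplitude in percent of the control (test-alone) MEP amplitude, so that 100% means no change and values below 100% indicate inhibition. Using geometric means of the sweeps is an assumption motivated by the log-normal distribution of A-SICI data; the function names are hypothetical.

```python
import math

def geometric_mean(values):
    """Geometric mean of strictly positive values."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def a_sici_percent(conditioned_meps_mv, control_meps_mv):
    """Conditioned MEP as % of control MEP (100% = no inhibition or facilitation)."""
    return geometric_mean(conditioned_meps_mv) / geometric_mean(control_meps_mv) * 100.0
```

With this convention, conditioned responses that average half the size of the control responses give an A-SICI of 50%.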

Exclusion of spontaneous muscle contraction
For both protocols, the online gating threshold during recording was set to 15 µV to exclude traces with contamination from spontaneous muscle contraction.
To exclude responses when the target muscle was not completely relaxed, online gating of prestimulus activation was used in both protocols. Sweeps in which negative EMG peaks exceeding 0.015 mV were detected 270 ms before the magnetic stimuli were automatically discarded from the analysis.
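The gating rule can be sketched as a simple filter over the prestimulus EMG of each sweep. The sketch assumes that each sweep provides its prestimulus samples in mV and that a sweep is discarded when any negative deflection exceeds 0.015 mV in magnitude. The data layout and function names are hypothetical.

```python
GATE_MV = 0.015  # gating threshold from the text (0.015 mV)

def is_contaminated(prestim_emg_mv, gate_mv=GATE_MV):
    """True if any prestimulus sample is a negative peak beyond the gate."""
    return any(sample < -gate_mv for sample in prestim_emg_mv)

def keep_relaxed_sweeps(sweeps, gate_mv=GATE_MV):
    """Return only the sweeps whose prestimulus EMG stays within the gate."""
    return [s for s in sweeps if not is_contaminated(s["prestim_emg_mv"], gate_mv)]
```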

Statistical analysis
Microsoft Excel version 16.16.15 was used for calculation of repeated measures ANOVA (rmANOVA) and Student's paired t test. Other statistical analyses were performed using the statistical software program R version 3.6.2 (2019-12-12) and OriginPro 2017 (OriginLab).
Normality was checked by quantile-quantile plots (QQplots) and histograms, and log-normally distributed data were log10-transformed.
Simple linear regression and calculation of correlation coefficients were applied for intraobserver method correlation analysis (Bland and Altman, 2003). RMT 0.2mV and SICI variables for each subject were calculated by averaging, either arithmetically (for normally distributed data) or geometrically (for log-normally distributed data), all measurements across all sessions. A one-sample t test against the control condition (0% RMT 0.2mV for T-SICI, 100% test MEP for A-SICI) was used to determine significant inhibition or facilitation at each ISI. The relationship between the two SICI methods across all ISIs was described by fitting a linear curve [$y = a + b \times \log_{10}(x)$, where a = intercept, b = slope, x = mean group A-SICI, non-transformed].
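The linear fit against log-transformed A-SICI can be sketched with ordinary least squares. The sketch assumes that y is the group mean T-SICI and x the non-transformed group mean A-SICI at each ISI; the function name is hypothetical and the analysis software used in the study is not reproduced.

```python
import math

def fit_log_linear(a_sici_means, t_sici_means):
    """Least-squares fit of y = a + b * log10(x); returns (a, b)."""
    log_x = [math.log10(x) for x in a_sici_means]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(t_sici_means) / n
    sxx = sum((x - mean_x) ** 2 for x in log_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(log_x, t_sici_means))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b
```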
For normally distributed data, CR and Bland-Altman plots can be interpreted in absolute terms: the absolute difference between any two future replicate measurements is estimated to be no greater than CR on 95% of occasions. However, this is not the case for CRs and Bland-Altman plots calculated from log-transformed data. These become dimensionless ratios on back-transformation (Bland and Altman, 2003) and are interpreted in relative terms: the relative difference between any two future replicate measurements is estimated to be no greater than CR on 95% of occasions. A two-way random-effects, single-rating, absolute-agreement model [ICC(2,1)] was used to calculate the intraclass correlation coefficient (ICC; Fleiss, 1999; de Vet et al., 2006; Streiner and Norman, 2008; Koo and Li, 2016; Brown et al., 2017). Reliability was defined as poor (ICC < 0.50), moderate (ICC 0.50-0.749), good (ICC 0.75-0.90) or excellent (ICC > 0.90; Koo and Li, 2016). rmANOVA was also used to estimate between-subject and within-subject variance. If the assumption of sphericity for rmANOVA was violated (Mauchly's sphericity test, p < 0.05), Greenhouse-Geisser correction was applied. If significant effects were identified (p < 0.05), pairwise post hoc analysis with Bonferroni correction for multiple comparisons was applied.
Interobserver statistical parameters were calculated by applying the first examination by observer 1 in the corresponding session. If significant effects were identified (p < 0.05), pairwise post hoc analysis with Bonferroni correction for multiple comparisons was applied. Interobserver reliability was estimated by ICC. Student's t test for paired data was applied for comparison of interobserver measurements. 95% CIs for interobserver estimates are not given, because two observers are too few to give useful estimates.
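To make the ICC(2,1) definition concrete, the sketch below computes it from the mean squares of a two-way layout (subjects in rows, repeated examinations in columns) using the standard formula (MSR − MSE) / (MSR + (k − 1)·MSE + k·(MSC − MSE)/n), and applies the reliability categories given in the text. It is a plain-Python illustration with hypothetical function names, not the R code used in the study.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single rating.

    data is a list of rows (subjects), each with k measurements (columns).
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in data for v in row)
    ss_error = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)                 # between-subject mean square
    msc = ss_cols / (k - 1)                 # between-examination mean square
    mse = ss_error / ((n - 1) * (k - 1))    # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def classify_reliability(icc):
    """Categories as defined in the text (after Koo and Li, 2016)."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.90:
        return "good"
    return "excellent"
```

Because the between-examination mean square enters the denominator, a systematic shift between examinations lowers ICC(2,1) even when subjects keep their rank order, which is what "absolute agreement" refers to.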

Data characteristics
The cTMS and TT-TMS RMT 0.2mV were in general below 60% MSO, except for subjects S8 and S11. TS 1mV was on average 119.3% (range 109.9-144.7%) of RMT 0.2mV for cTMS. Most of the subjects exhibited cortical inhibition at SICI 1ms, SICI 2.5ms, and SICI 3ms with cTMS and TT-TMS (Figs. 4, 5). Subject S14 (male, age: 75 years) exhibited cortical facilitation with both TMS methods throughout all sessions at SICI 1ms, SICI 2.5ms, and SICI 3ms. Average skin temperature for both methods was 35.4°C (range 34.5-36.4°C). A-SICIs were log-normally distributed. All other TMS parameters were normally distributed. A description of the data is given in Table 2.

Method correlation (intraobserver measurements)
On a group level, significant inhibition was seen at ISIs 1-3 ms with TT-TMS and at ISIs 1-4 ms with cTMS, with two distinct peaks observed at 1 and 2.5 ms with both techniques (Fig. 6). Meanwhile, there was significant facilitation at ISI 7 ms with both methods.
At individual or averaged ISIs, intraobserver RMT 0.2mV and SICI measurements obtained with cTMS and TT-TMS all correlated significantly (Fig. 7). On a group level, a strong linear relationship between T-SICI and log-transformed A-SICI was observed across the whole range of tested ISIs, and this relationship was maintained across experimental days and times of day (Fig. 8). However, there was considerable variability in this relationship in individual subjects, with some discordance (i.e., one technique showing inhibition, the other facilitation) between the techniques at some ISIs (Fig. 9).
The CR can be used to study measurement precision (Bartlett and Frost, 2008). It is used when decisions are made on an individual basis. CR indicates how much two or more measurements made on the same subject will vary on 95% of occasions (Bartlett and Frost, 2008). Thus, the higher the measurement error, the higher the CR. $\mathrm{CR} = 1.96\sqrt{2 s_w}$, where $s_w$ is the within-subject variance (Bartlett and Frost, 2008).
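The CR formula can be illustrated with a short sketch. It assumes that the within-subject variance is estimated by pooling each subject's variance around their own mean over replicate measurements, and that a CR computed on log10-transformed data is back-transformed with a power of 10 to give a ratio. The function names and the pooling approach are assumptions for illustration.

```python
import math

def within_subject_variance(replicates):
    """Pooled within-subject variance from per-subject replicate lists."""
    ss, df = 0.0, 0
    for reps in replicates:
        mean = sum(reps) / len(reps)
        ss += sum((v - mean) ** 2 for v in reps)
        df += len(reps) - 1
    return ss / df

def repeatability_cr(replicates):
    """CR = 1.96 * sqrt(2 * s_w), with s_w the within-subject variance."""
    return 1.96 * math.sqrt(2.0 * within_subject_variance(replicates))

def repeatability_cr_ratio(log10_replicates):
    """CR for log10-transformed data, back-transformed to a fold difference."""
    return 10.0 ** repeatability_cr(log10_replicates)
```

For normally distributed variables such as T-SICI, the result is in the units of the measurement; for log-transformed variables such as A-SICI, the back-transformed value reads as a fold difference.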

Reliability
Ratio of the subject variation compared with the total variation: subject variation and measurement error (variation in the measurement process; Bartlett and Frost, 2008). A reliability of 1 indicates no measurement error and 0 indicates that all variation stems from measurement error (Koo and Li, 2016).
The ICC can be used to study the amount of measurement error in measurements made on the same subjects by different observers (interobserver reliability) or by a single observer (intraobserver reliability; Bartlett and Frost, 2008). ICC measures how well subjects maintain their position within the group with repeated measurements (Streiner and Norman, 2008). This is important for sample size and power calculations in interventional studies (Fleiss, 1999;Brown et al., 2017) and provides some indication on a discriminative value of a test (de Vet et al., 2006). As reliability ICC is a dimensionless ratio, ICC can be used to compare methods, whose measurements are on different scales (Koo and Li, 2016).
Reproducibility
Variation in measurements made on the same subject under changing conditions: different methods or instruments, different observers, or measurements being made at different timepoints, within which the "true" underlying variable could undergo non-negligible changes (Bartlett and Frost, 2008).
Reproducibility can be studied when measurements are made by different observers, with different methods or instruments, or at different timepoints (Bartlett and Frost, 2008).
[...] (Table 3). Thus, the difference between any two future examinations is estimated to be no greater than a 2.6- to 7.2-fold difference on 95% of occasions for A-SICI 1ms, A-SICI 2.5ms, and A-SICI 3ms. CR for T-SICI 1ms, T-SICI 2.5ms, and T-SICI 3ms ranged from 9.4% to 17.3% RMT 0.2mV on both days (Table 3), meaning that the absolute difference between any two future examinations is estimated to be no greater than 9.4-17.3% RMT 0.2mV on 95% of occasions for T-SICI 1ms, T-SICI 2.5ms, and T-SICI 3ms. CRs tended to be lower for averaged SICIs (SICI 1-3.5ms and SICI 1-7ms) compared with the non-averaged SICIs (SICI 1ms, SICI 2.5ms, and SICI 3ms). Also, CRs tended to be lower for morning SICI measurements compared with afternoon SICI measurements on both days. CRs for TS 1mV tended to be higher than for RMT 0.2mV in cTMS (Table 3).

Intraday and interday reliability (intraobserver measurements)
On day 1, A-SICI morning reliability ranged from moderate to good, afternoon reliability ranged from poor-to-moderate, and intraday reliability was moderate-to-good (Fig. 10A). On day 2, A-SICI morning reliability was good, and afternoon and intraday reliability both ranged from moderate to good (Fig. 10B).
On day 1, T-SICI morning reliability was good, and afternoon and intraday reliability ranged from moderate to good (Fig. 10C). T-SICI reliability ranged from moderate to good on day 2 (Fig. 10D).
Interday reliability of all SICI measurements was moderate-to-good (Fig. 11). Intraday and interday reliability of all RMT 0.2mV and TS 1mV ranged from good to excellent (Figs. 10, 11).

Reproducibility (intraobserver measurements)
None of the Bland-Altman plots for A-SICI measurements revealed ratios significantly different from 1, except for the A-SICI 2.5ms ratio (A-SICI 2.5ms examination 7 / A-SICI 2.5ms examination 8) = 0.73 (95% CI [0.56-0.94]) in the afternoon session on day 2. rmANOVA revealed that between-subject variation accounted for the largest part of total intersession variation, and that between-subject differences were significant for [...]
Measurements at ISIs of 1, 2.5, 3, 1-3.5, and 1-7 ms are depicted in different colors. Each subfigure denotes measurements from one individual subject. Subject number 5 (undetectable MEP) and subject number 7 (unrelaxed hand muscles) were excluded.
Measurements of TMS parameters by observer 1. Sample size n = 18 for TT-TMS and n = 17 for cTMS. Each subject was examined eight times by each method. One subject (S4) was examined seven times with cTMS and eight times with TT-TMS. SICI was measured with the A-SICI and the T-SICI parallel protocol. Only values measured at ISIs 1 ms (SICI 1ms), 2.5 ms (SICI 2.5ms), 3 ms (SICI 3ms), 1-3.5 ms (SICI 1-3.5ms), and 1-7 ms (SICI 1-7ms) are depicted. Each TMS parameter was measured twice in each session. a: Point estimates for each subject were calculated as geometric means of their measurements. Data are displayed as medians with interquartile ranges [IQR] of subjects' point estimates. b: Normally distributed data for each subject were arithmetically averaged to calculate group mean (±SE) and SD.

Discussion
The present study examines multiple aspects of the test-retest reliability of two emerging TMS protocols, including intraobserver and interobserver reliability, intraday and interday reliability, and diurnal influences on reliability. The present work extends current knowledge of the utility of these two novel SICI protocols.

Comparability between the techniques
Earlier studies have compared parallel TT-TMS with cTMS for SICI with either multiple CS intensities and single ISI (Samusyte et al., 2018) or a single CS intensity and multiple ISIs (Tankisi et al., 2021a).
Despite some technical differences in the parallel threshold-tracking paradigm (such as tracking mode or maximum tracking step size), a good correlation between A-SICI and T-SICI measurements was observed in these studies both within and across tested SICI conditions, suggesting that both techniques reflect largely similar underlying physiological mechanisms, at least in healthy volunteers.
In the present study, a good correlation between SICI measurements obtained with the two protocols both at individual ISIs and averaged SICI was observed across subjects. Furthermore, a linear relationship between T-SICI and log-transformed A-SICI was found across the whole range of tested ISIs. On a group level, this relationship appeared to remain stable throughout the day or different experimental days. Nevertheless, considerable interindividual variability in A-SICI/T-SICI slopes was observed, which is similar to earlier findings comparing SICIs at multiple conditioning intensities (Samusyte et al., 2018). In several subjects (Fig. 9), a discrepancy between the methods was seen at some ISIs, which is probably reflected in the slightly different duration of inhibition in the group (1-3 ms for T-SICI vs 1-4 ms for A-SICI; Fig. 6). This cannot be explained by differences in CS intensity as it was set to 70% of tracked RMT 0.2mV for both techniques. Although it is common to adjust CS based on active motor threshold or conventional RMT of 0.05 mV (Rossini et al., 2015), RMT 0.2mV was used in the present study to ensure activation of comparable inhibitory neuron populations with both methods as SICI is known to vary depending on conditioning intensity (Kujirai et al., 1993;Vucic et al., 2009). However, because of the intrinsic differences in the techniques (predetermined test MEP size dependent on a constant test stimulus in cTMS versus predetermined conditioned MEP size resulting in varying test stimuli across ISIs in TT-TMS), the upper motor neuron populations tested by the two techniques may differ.

Repeatability
CR can be used to estimate measurement error (Bartlett and Frost, 2008). In statistics, "measurement error" refers to the inherent continuous natural variation that occurs with repeated measurements of the same biological quantity in a subject. The measurement error may include natural biological variability in the subject and variability in the measurement method (Bland, 2015). Thus, measurement error does not refer to a mistake made during the examination, e.g., when an estimate is written down incorrectly (Bland, 2015).
One way to report measurement error is to estimate how much any two future measurements made on the same subject are expected to differ. This estimate may also be called "within-subject variation": second measurements on the same subject are not expected to differ systematically from the first measurement, as this would indicate the values were not true replicates (Bland and Altman, 1999). Hence, the possibility of bias between measurements is excluded, and the measurement error depends only on the within-subject variation. Within-subject variation is therefore the same as the variation of the measurement error (Bartlett and Frost, 2008). CR estimates measurement error as the value below which the absolute difference between any two future measurements made on the same subject is expected to lie on 95% of occasions (Bland and Altman, 1999; Bartlett and Frost, 2008). Thus, the higher the CR, the higher the measurement error.
Measurement error in the present study may be ascribed to variation in coil placement and coil angling during the examination. The TMS coil was held manually in place, and it was observed that even a slight change in coil position or angling, by a few millimeters or degrees, either because of head movement by the subject or coil movement by the observer, could stimulate nearby muscles on the same hand, thereby possibly introducing measurement error. Likewise, the location of the motor hotspot at the beginning of each measurement might have been subject to variation. The hotspot was located manually, without a navigation system. It is possible that exactly the same coil position and coil angling were not achieved in each consecutive examination, which might have contributed to the observed measurement error.
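The definition above can be made concrete with a short sketch. It assumes the common formulation CR = 1.96 × √2 × s_w, where s_w is the within-subject SD estimated from duplicate measurements with no systematic bias between them. The function name and the numbers are hypothetical, and the exact computation used in the study is described in its Methods, not here.

```python
import math

def coefficient_of_repeatability(pairs):
    """CR from duplicate measurements (one pair per subject).

    For duplicates, the within-subject variance is estimated as
    sum(d^2) / (2n), where d is the difference within each pair.
    CR = 1.96 * sqrt(2) * within-subject SD.
    """
    n = len(pairs)
    within_var = sum((a - b) ** 2 for a, b in pairs) / (2 * n)
    return 1.96 * math.sqrt(2) * math.sqrt(within_var)

# Hypothetical duplicate measurements (arbitrary units), not study data.
pairs = [(10.0, 12.0), (8.0, 7.0), (15.0, 15.0), (11.0, 9.0)]
cr = coefficient_of_repeatability(pairs)  # about 2.94 for these values
```

Under these assumptions, two future measurements on the same subject would be expected to differ by less than `cr` on 95% of occasions.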
Measurement error in the present study may also be ascribed to biological variation in the subjects. Although the examinations were done in quick succession, the long duration of each session may have decreased subject alertness. A recent study demonstrated that spontaneously occurring fluctuations in alertness modulate cortical reactivity and MEP amplitude over relatively short durations in awake subjects (Noreika et al., 2020). If these findings can be extrapolated to the present study, then fluctuations in subject alertness may have contributed to biological variability, and thus measurement error in the present study. Furthermore, underlying oscillations in brain activity may affect TMS measurements (Zrenner et al., 2018; De Goede and Van Putten, 2019) and further contribute to the measurement error.
Figure 8. Relationship between mean group A-SICI and T-SICI curves. Group means of T-SICI (arithmetic) are plotted against A-SICI (geometric) at matching ISIs. x-axis (log10 scale): A-SICI obtained by cTMS. y-axis (linear scale): T-SICI obtained by TT-TMS. T-SICI and A-SICI group means were calculated by averaging all 18 individual subjects' means across all measurements (all sessions), across measurements taken at the same time of day (morning and afternoon sessions), or on the same experimental day (day 1 and day 2). Error bars represent SEM (±SEM for T-SICI, ×/÷SEM for A-SICI). A linear relationship (denoted by black solid line, blue dashed line 95% CI for regression line, and orange dashed line 95% prediction intervals for regression line) was observed between T-SICI and log-transformed A-SICI across ISIs (as indicated by navy numbers in the top panel), and was maintained across the experimental days as well as at different times of the day.
In the present study, the repeatability of RMT 0.2mV is in line with studies that used probabilistic methods to determine the conventional RMT with a 0.05 mV cutoff (Beaulieu et al., 2017). Meanwhile, the intraday CRs of RMT 0.2mV and TS 1mV are comparable to those of a previous study, which employed threshold-tracking and reported CRs of 5.5% and 10% MSO, respectively (Samusyte et al., 2018). The repeatability of A-SICI in the present study cannot be directly compared with previous studies, which reported a wide range of CRs (17-147% test MEP; Fleming et al., 2012; Ngomo et al., 2012; Schambra et al., 2015; Samusyte et al., 2018). Because of non-normality, log-transformed A-SICI was used to calculate CRs. The back-transformed CRs are dimensionless and therefore it is not straightforward to apply them in clinical practice (Schambra et al., 2015). However, conceptually CRs are similar to Bland-Altman's limits of agreement, which, if calculated with log-transformed values and then back-transformed, indicate a ratio to the value on the x-axis (Bland and Altman, 1999). Thus, measurement error of log-transformed values should be interpreted as a relative difference or fold-change between any two future measurements.
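The fold-change interpretation can be sketched as follows. The example assumes that the CR is computed on log10-transformed values as 1.96 × √2 × s_w and then back-transformed by exponentiation; the function name and the numbers are hypothetical and are not taken from the study.

```python
import math

def fold_change_cr(pairs):
    """Back-transformed CR for duplicate measurements analysed on a log10 scale.

    The CR is computed on log10 values (1.96 * sqrt(2) * within-subject SD)
    and then exponentiated, which gives a dimensionless ratio.
    """
    n = len(pairs)
    within_var = sum(
        (math.log10(a) - math.log10(b)) ** 2 for a, b in pairs
    ) / (2 * n)
    cr_log = 1.96 * math.sqrt(2) * math.sqrt(within_var)
    return 10 ** cr_log

# Hypothetical A-SICI duplicates (% test MEP); each pair differs by a factor of 2.
pairs = [(50.0, 100.0), (40.0, 20.0), (30.0, 60.0)]
fold = fold_change_cr(pairs)  # equals 2 ** 1.96, roughly 3.9
```

A back-transformed CR of about 3.9 would mean that, on 95% of occasions, the larger of two repeated measurements is expected to be less than about 3.9 times the smaller one, whatever the absolute size of the measurement.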
Meanwhile, the CRs for T-SICI were high when compared with their respective means. This shows that in an average subject with inhibition on the initial recording, some degree of facilitation could be observed with repeated testing, representing an expected variation (be it technical or biological). This is in keeping with previous reports of rather poor repeatability of T-SICI measurements in younger healthy volunteers (Matamala et al., 2018; Samusyte et al., 2018). The repeatability of SICI was improved by averaging multiple ISIs for both techniques, an observation which was also reported in another study (Matamala et al., 2018). This likely reflects the variation (biological and/or technical) of SICI versus ISI curves within subjects, which can be reduced by averaging across ISIs. Nevertheless, averaging is not sufficient to allow a confident use of these measurements for individual decision-making.
Figure 9. Relationship between A-SICI and T-SICI curves in individual subjects. Individual subject means of T-SICI (arithmetic) are plotted against A-SICI (geometric) at matching ISIs (calculated by averaging individual subjects' means across all measurements). Black dashed lines indicate control conditions (0% RMT 0.2mV for T-SICI, 100% test MEP for A-SICI). x-axis (log10 scale): A-SICI obtained by cTMS. y-axis (linear scale): T-SICI obtained by TT-TMS. In many subjects, the relationship between log-A-SICI and T-SICI appeared to be linear or near-linear. However, in some there seemed to be a "floor effect" with cTMS that was overcome by TT-TMS (e.g., subject 12, subject 15); in others, no apparent correlation between the techniques was seen (e.g., subject 11, subject 14).
For intraobserver (observer 1) measurements. Sample size n = 17 for cTMS and n = 18 for TT-TMS. SICI measured with the A-SICI protocol (A-SICI, a dimensionless ratio; see Materials and Methods, Statistical analysis, for further explanation) and the T-SICI protocol (T-SICI, in % RMT). The unit of RMT 0.2mV and TS 1mV is % MSO. CR for day 1 (morning and afternoon session 1) and day 2 (morning and afternoon session 2) were calculated to estimate repeatability.
Figure 10. Intraday reliability of intraobserver cTMS and TT-TMS measurements. Intraobserver (observer 1) intraday reliability, estimated by ICC, of cTMS (A, B) and TT-TMS (C, D) parameters from day 1 (A-C) and day 2 (B-D). Sample size was n = 17 for cTMS and n = 18 for TT-TMS. y-axis: intraday ICC (2,1) with 95% CI for day 1 (A-C) and 2 (B-D). x-axis: cTMS and TT-TMS parameters. ICC intervals < 0.5, between 0.5 and 0.749, between 0.75 and 0.9, and > 0.9 indicated poor, moderate, good, and excellent reliability, respectively. Morning reliability (blue squares), afternoon reliability (yellow triangles), and morning and afternoon reliability (gray circles) are depicted from each measurement day. ICC calculations were based on measurements from the session of interest, i.e., morning session ICCs were based on the two morning measurements. Calculation of "morning and afternoon" reliability was based on data from the first morning measurement and first afternoon measurement.

Intraobserver and interobserver reliability
Overall, RMT 0.2mV and TS 1mV showed good-to-excellent reliability compared with poor-to-good intraobserver reliability and poor-to-moderate interobserver reliability of paired-pulse measurements. SICI 3ms and estimates averaged across ISIs tended to have higher interday ICCs, but no consistent pattern was observed for same-day recordings. Neither technique demonstrated superior reliability in the present study, in contrast to the previous report (Samusyte et al., 2018). This could be related to methodological differences, but it is also important to note that a direct comparison of ICCs between the studies cannot be made without taking into account the heterogeneity of the samples (Bartlett and Frost, 2008; Streiner and Norman, 2008; Beaulieu et al., 2017). There was no significant bias in the TMS parameters obtained by different observers and the observer reliability of RMT 0.2mV and TS 1mV was similar. However, the interobserver ICCs for SICI were generally lower than intraobserver ICCs with both techniques, suggesting that longitudinal measurements should ideally be obtained by the same observer.
Figure 11. Interday reliability of intraobserver cTMS and TT-TMS measurements. Intraobserver (observer 1) interday reliability, estimated by ICC, of cTMS (A) and TT-TMS (B) parameters. Sample size was n = 17 for cTMS and n = 18 for TT-TMS. y-axis: interday ICC (2,1) with 95% CI for cTMS and TT-TMS. x-axis: cTMS and TT-TMS parameters. ICC intervals < 0.5, between 0.5 and 0.749, between 0.75 and 0.9, and > 0.9 indicated poor, moderate, good, and excellent reliability, respectively. Morning reliability (blue squares), afternoon reliability (yellow triangles), and morning and afternoon reliability (gray circles) are depicted. ICC was calculated to estimate interday reliability. Calculation of reliability was done by using data from the first measurement in each session from day 1 and day 2. Calculation of "morning and afternoon" (gray) reliability was based on data from the first morning measurement on day 1 and first afternoon measurement on day 2.
Reproducibility of interobserver measurements of A-SICI and T-SICI. Sample size n = 18 for cTMS and TT-TMS for observer 2. Calculation of statistical parameters is based on TMS measurements from observer 2 (from one TMS measurement for each subject) and the corresponding examination (by session, first examination in the session) by observer 1. A significant difference between observers was seen in A-SICI 2.5ms (p = 0.0103) only. Differences were interpreted as significant at p < 0.05. a Normally distributed data were arithmetically averaged to calculate mean (±SE) and SD. b Point estimates for each subject were calculated as geometric means of their measurements. Data are displayed as medians with interquartile ranges [IQR] of subjects' point estimates.
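For readers unfamiliar with the ICC (2,1) used for the reliability estimates in this section, the sketch below computes it from a subjects-by-measurements table using the standard two-way random-effects, absolute-agreement, single-measurement formula, and applies the poor/moderate/good/excellent cutoffs given in the figure legends. The function names and the tiny data set are hypothetical; the study's own software and settings may differ.

```python
def icc_2_1(data):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    data: list of rows (one per subject), each with k repeated measurements.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-subject mean square
    msc = ss_cols / (k - 1)              # between-measurement mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)

def classify(icc):
    """Reliability categories as defined in the figure legends."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"

# Hypothetical data: the second measurement is always 1 unit higher than the first.
icc = icc_2_1([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])  # 2/3
label = classify(icc)                                 # "moderate"
```

In this made-up example the subjects are ranked identically in both measurements, yet the constant offset between measurements lowers the ICC to 2/3, because the absolute-agreement form penalizes systematic differences between sessions or observers. The ICC also depends on the between-subject variance (`msr`), which is why ICCs from samples with different heterogeneity are not directly comparable.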

Observer and intersession reproducibility
In general, intraobserver cTMS and TT-TMS measurements were reproducible and no significant differences between examinations were found, except for A-SICI 2.5ms (between examinations 7 and 8), and two intersessional differences for T-SICI 3ms (between examinations 4 and 6) and for cTMS TS 1mV (between examinations 1 and 4). Likewise, no significant interobserver differences were observed, except for the A-SICI 2.5ms between observer 1 and observer 2. This suggests that interobserver measurements were reproducible.
As A-SICI 2.5ms examination 7 by observer 1 was used in both A-SICI comparisons that turned out to be statistically significant, it is likely that a bias was introduced in this examination, since it differed both from the same observer's A-SICI 2.5ms examination 8 and from observer 2's A-SICI 2.5ms examination. One can only speculate as to how and why a bias was introduced into only one parameter in just one of the examinations.
Regarding the statistically significant differences seen with T-SICI 3ms and with cTMS TS 1mV, it is unclear why exactly these two parameters differed significantly when none of the other parameters from the same examinations did. It is possible that, even when controlling for familywise error rate (FWER) using the Bonferroni procedure, one or both of the observed statistically significant differences were due to chance, as a total of four "families" of comparisons were made (Motulsky, 2010).
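The reasoning about multiple families can be illustrated numerically. The sketch below shows a Bonferroni decision within one family and the chance of at least one false-positive family when several families are each controlled at the same level. It assumes independent families and a per-family level of 0.05; the function names and the p-values in the example are hypothetical and do not reproduce the study's comparisons.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Bonferroni procedure within one family: compare each p to alpha / m."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]

def chance_of_any_false_positive(alpha, n_families):
    """Probability that at least one of n independent families, each
    controlled at FWER = alpha, yields a false positive."""
    return 1 - (1 - alpha) ** n_families

# Hypothetical family of three comparisons (threshold = 0.05 / 3).
flags = bonferroni_significant([0.001, 0.02, 0.04])

# Four families, each controlled at 0.05 (independence is an assumption).
overall = chance_of_any_false_positive(0.05, 4)  # about 0.185
```

Under these assumptions, roughly one in five such studies would show at least one spurious significant difference even with Bonferroni correction applied within each family.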

Is the time of day important?
There are limited data on the stability of TMS parameters throughout the day. No significant shift in RMT, MEP amplitude or conventional SICI has been previously observed in the awake state during the day (Koski et al., 2005; Lang et al., 2011; ter Braack et al., 2019). Our findings are also consistent with those of a previous study, which found no significant effect of time for SICI when tested at 9 A.M. and 4 P.M. (Doeltgen and Ridding, 2010). However, it has been proposed that SICI measurements obtained by TT-TMS may be more reliable if performed in the morning on different days compared with different times on the same day (Matamala et al., 2018). Indeed, it was observed in the present study that most SICI measurements obtained in the morning sessions tended to have better test-retest reliability indices (i.e., higher ICCs and/or lower CRs) compared with the afternoon sessions, both when measured on the same and different experimental days (Figs. 10, 11). Such a pattern was seen with both techniques, though more consistently with TT-TMS. Incidentally, it was observed in the present study that subjects found it more difficult to remain alert during the afternoon sessions. Although a nonlinear modulation of corticospinal excitability because of fluctuations in alertness has recently been described (Noreika et al., 2020), it is unclear whether this could have contributed to the increased variability in the afternoon in the present study. The intraday and interday reliability of SICI was largely comparable in the present study.

Strengths and limitations
Automated recording protocols allow the observer to concentrate on coil positioning and minimize observer bias, which is crucial for longitudinal assessments and multicentre studies. No considerable systematic bias in the SICI measurements across multiple ISIs obtained by the same or different observers was found in the present study, supporting the use of such protocols.
Figure 12. Interobserver reliability of cTMS and TT-TMS measurements. Interobserver (observer 1 and 2) reliability, estimated by ICC, of cTMS (A) and TT-TMS (B) parameters. Sample size was n = 18. y-axis: interobserver ICC (2,1). Calculation of statistical parameters is based on TMS measurements from observer 2 (from one TMS measurement for each subject) and the corresponding examination (by session, first examination in the session) by observer 1. x-axis: cTMS (green squares) and TT-TMS (black squares) parameters.
The numerous observer examinations are an important strength of the present study as they provide reliability data for different experimental and clinical scenarios: (1) the "immediate" reliability when measurements are repeated in quick succession (e.g., in studies of interventions with short-lasting effects or a repetition of a test to improve diagnostic certainty in clinics); (2) intraday reliability (e.g., interventional studies in which the effects are measured over the course of the day); (3) interday reliability (e.g., longitudinal assessments in clinical trials).
Although SICI reliability tended to be better in the morning than in the afternoon sessions, these findings were not statistically significant because of broad and overlapping 95% CI. This could be explained by a relatively small sample (Rankin and Stokes, 1998;Bonett, 2002). However, improved precision would require much larger samples (Bonett, 2002), which may not be practical given that ICCs cannot be easily generalized between different samples or populations with different variances (e.g., from healthy volunteers to patients).
While in many fields CRs and ICCs are used to define measurement error, a distinction between technical and biological variability cannot be made for TMS measurements. For example, the increased measurement error in the afternoon may be related to fluctuations in both the subject's state and in the observer's vigilance. Further studies using a robotic arm for coil positioning and thus eliminating the observer factor would shed more light on the biological variability of cortical excitability.
In addition to the different ISIs, SICI ISI averages of 1-3.5 and 1-7 ms were analyzed for consistency with earlier studies (Matamala et al., 2018;Menon et al., 2018;Ørskov et al., 2021;Tankisi et al., 2021a). Given the interindividual and intraindividual variability of SICI versus ISI curves, average measures may provide a more reliable parameter. Indeed, SICI averaged across ISIs of 1-7 ms has shown the best reproducibility (Matamala et al., 2018) and diagnostic utility for ALS in earlier studies with serial tracking (Menon et al., 2015), although some overlap with intracortical facilitation is likely reflected in this measure. Another SICI variable, an average across ISIs of 1-3.5 ms, has earlier been reported (Ørskov et al., 2021;Tankisi et al., 2021a). It represents intervals with maximum inhibition. However, one should remember that SICI at different ISIs have different underlying physiological mechanisms related to the refractory period, extrasynaptic and synaptic inhibition and overlap with short interval intracortical facilitation (Ziemann et al., 1996;Peurala et al., 2008;Stagg et al., 2011).
The potential differences as well as advantages and disadvantages of the two techniques have been discussed earlier (Samusyte et al., 2018). Briefly, threshold-tracking may allow a better evaluation of the full inhibitory potential as it overcomes the "floor effect" seen with cTMS. Meanwhile, A-SICI may be more suitable if one is interested in a particular subset of motor neurons. Threshold-tracking protocols may be preferred when the MEP amplitude is low, since under these conditions the conventional method is difficult to perform successfully. In contrast, in subjects with high RMT, the stimulator power may not be sufficient to capture full inhibition with threshold-tracking. As conventional and threshold-tracking protocols may potentially examine different neuron pools in healthy subjects (Samusyte et al., 2018) and patients (Tankisi et al., 2021b), they may have different sensitivity in pathologic conditions or respond differently to drugs. Future head-to-head comparisons in patient populations and interventional studies are warranted.
In conclusion, good correlations between SICI measurements obtained by cTMS and TT-TMS across a full range of ISIs were observed. The two techniques showed similar test-retest reliability profiles in healthy subjects, with poor repeatability on the individual level and satisfactory reliability on the group level. This suggests that the two automated SICI protocols may be reliably employed in research studies, but should at this moment be used with caution for individual decision-making in clinical settings. Further studies exploring reliability in different disease cohorts, such as motor neuron diseases or stroke, are warranted to investigate the diagnostic and clinical utility of the two automated SICI protocols.