The purpose of the present article is to supplement and extend a previous review of sensorimotor synchronization (SMS) studies using finger-tapping paradigms (Repp, 2005; referred to hereafter as R05). Although R05 was reasonably comprehensive, it is rapidly becoming outdated as new research pours in, and more than enough new material has now accumulated to warrant another review. However, this article does not replace R05; rather, it is intended as a continuation and expansion.

The research covered in the present review is diverse, reflecting a broadening of the field of investigation brought about by new ideas and technologies. However, all of the research concerns SMS, defined as the coordination of rhythmic movement with an external rhythm. For space reasons, the review cannot consider such closely related topics as self-paced rhythm production, synchronization of covert internal processes to external rhythms, perception of synchrony between external signals, intrapersonal synchronization of limb movements, temporal coordination of single or nonrhythmic actions with external events, or synchronization among nonanimate agents. An exception is made in Part 4, in which neuroscience studies of rhythm perception are reviewed briefly before plunging into the neuroscience of SMS. Because of the large number of studies reviewed here, references to relevant literature cited in R05 are kept to a minimum.¹ However, some older studies not cited in R05 are included.

This review is divided into four parts. Part 1 is concerned with tapping studies, which continue to occupy an important place in SMS research, and in which the data are discrete time series. Part 2 reviews SMS studies in which various forms of continuous movement were carried out in synchrony with external rhythms. Part 3 covers the rapidly growing area of research on interpersonal synchronization. Part 4 surveys new findings in the neuroscience of SMS. Note that Parts 1–3 are concerned with behavioral studies only.

Tapping with an external rhythm

Finger tapping in synchrony with an external (usually computer-controlled) rhythm, often an isochronous metronome, remains a popular paradigm because of its simplicity and long history. An increasing number of researchers have equipment for conducting tapping studies, and some have written and made available special software for experimental control and data analysis (e.g., Elliott, Welchman, & Wing, 2009b; Finney, 2001a, b; Kim, Kaneshiro, & Berger, 2012). The basic mechanisms of SMS are still studied most conveniently with the finger-tapping paradigm, and the discrete nature of the taps makes the results particularly relevant to music performance. (Studies of the kinematics of rhythmic finger movement are reviewed later, in section 2.1.)

Asynchronies and their variability

Asynchronies (also called synchronization errors) are the basic data in any SMS study using tapping as the response. Asynchrony is defined as the difference between the time of a tap (the contact between the finger and a hard surface) and the time of the corresponding event onset in the external rhythm. The mean asynchrony is typically negative and, if so, is referred to as negative mean asynchrony (NMA). The standard deviation of asynchronies (SD asy) is an index of stability. Some studies instead employ circular statistics that yield a mean vector (angular deviation) and a circular variance (see Fisher, 1993); this approach is useful when synchronization is poor. The mean and the variability of intertap intervals (ITIs) are often reported as well, but they are of secondary importance in most SMS tasks. The tempo of the external rhythm, usually measured in terms of interonset interval (IOI) duration (or interbeat interval duration, in the case of nonisochronous rhythms or music), is an important independent variable.
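
To make these measures concrete, here is a minimal computational sketch (not from the review; the function name, the units, and the assumption of exactly one tap per stimulus are illustrative only):

```python
import numpy as np

def sms_metrics(taps, stimuli):
    """taps, stimuli: 1-D arrays of onset times (ms), one tap per stimulus."""
    asy = taps - stimuli                      # asynchronies: tap minus event onset
    ioi = np.diff(stimuli)                    # interonset intervals of the rhythm
    iti = np.diff(taps)                       # intertap intervals
    # Circular statistics: express each asynchrony as a phase within its IOI.
    phase = 2 * np.pi * asy[:-1] / ioi
    r = np.abs(np.mean(np.exp(1j * phase)))   # mean vector length (1 = perfect locking)
    return {
        "mean_asy": asy.mean(),               # typically negative (the NMA)
        "sd_asy": asy.std(ddof=1),            # SD asy, the stability index
        "mean_iti": iti.mean(),
        "circular_variance": 1.0 - r,         # preferable when synchronization is poor
    }
```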

Development, enhancement, and impairment of SMS ability

SMS ability takes years to develop (see also section 2.5.1). A large study by McAuley, Jones, Holub, Johnston, and Miller (2006) was primarily concerned with changes in preferred (self-paced) tapping tempo across the life span (ages 4–95), but in an electronic appendix they reported median produced ITIs during synchronization with metronomes whose IOIs ranged from 150 to 1,709 ms. Although a match between the median ITI and IOI does not necessarily imply good synchronization, a mismatch does reflect poor synchronization. It is clear from these data that 4- and 5-year-olds did not synchronize well, if at all, whereas 6- and 7-year-olds performed much better, almost at the adult level. Elderly participants retained good synchronization ability. (See also the mention of Turgeon, Wing, & Taylor, 2011, in section 2.2.) In a study using visual metronomes with IOIs ranging from 500 to 2,000 ms, Kurgansky and Shupikova (2011) found children 7–8 years of age to be generally more variable than adults, and often unable to synchronize at the fastest tempo.

Van Noorden and De Bruyn (2009) reported an extensive developmental SMS study with 600 children ranging in age from 3 to 11 years. The children listened to familiar music played at five different tempi and, after watching an animated figure demonstrate the task of synchronizing with the musical beat by tapping with a stick on a drum, continued the task on their own. The youngest children usually tapped at a rate of 2 Hz and did not adapt to the tempo of the music, but increasing adaptation was evident from 5 years and up. Synchronization performance, as measured by the circular variance of tapping, improved throughout the age range, especially between 3 and 7 years. The authors interpreted their results in terms of the resonance theory of van Noorden and Moelants (1999), suggesting that young children have a narrow resonance curve centered near 2 Hz, which enables them to synchronize only at their preferred tempo. The resonance curve broadens with increasing age, especially toward lower frequencies.

Among adults, variability (SD asy) is generally lower for highly trained musicians than for nonmusicians (Repp, 2010b; Repp & Doggett, 2007). Krause, Pollok, and Schnitzler (2010) reported variability of ITIs, which was lowest in drummers (about 2.5 % of the mean ITI of 800 ms), and lower in professional pianists than in amateur pianists, singers, and nonmusicians. For professional percussionists playing a rhythm on a drum kit in synchrony with a metronome, Fujii et al. (2011) found a mean SD asy of about 16 ms (1.6 %) when the metronome IOI was 1,000 ms, and an SD asy of about 10 ms (2 %) when the IOI was 500 ms. They found no difference in variability among the three limbs (two hands and one foot) involved, even though the limbs were required to move at different rates. When the IOI was 300 ms, the variability approached 4 %, probably due to the biomechanical difficulty of tapping the hi-hat cymbal at twice that rate with the right hand.

Repp (2010b) found no difference in SD asy between musical amateurs and nonmusicians (who had no music training at all) who tapped in synchrony with short metronome sequences having IOIs of 500 ms. Likewise, Hove, Spivey, and Krumhansl (2010) did not find any effects of music training in a group of college students who synchronized with an auditory metronome and with various visual stimuli at several tempi (see section 1.4). However, Repp, London, and Keller (2013) did find lower ITI variability in percussionists than in other musicians synchronizing with nonisochronous rhythms. Thus, it seems that only a high level of rhythmic expertise reduces the variability of tapping in SMS. Bailey and Penhune (2010) found that early-trained musicians performed better than late-trained ones in a paced rhythm reproduction task, but this difference was significant only in terms of reproduction accuracy, not synchronization. Pecenka and Keller (2009) measured participants’ auditory imagery ability in a pitch-matching task and found that it predicted mean absolute asynchrony and SD asy in tapping with a metronome that was either isochronous or gradually changed in tempo, even when musical experience was factored out.

Individuals with motor disorders have been found to be impaired in rhythm tasks, including SMS (see also section 2.4). For example, Whitall et al. (2008) found that 7-year-old children with developmental coordination disorder (DCD) performed more variably than a control group of normally developing children, who in turn were more variable than adults. The task was tapping with a metronome at four different tempi (IOI = 313–1,250 ms), using the two hands in alternation. The authors suggested that children with DCD are deficient in their auditory–motor coupling. A few seemingly quite different disorders also lead to impaired SMS performance. For example, 7- to 11-year-old children with speech/language impairments showed greater variability than did a control group in an SMS task, though only at a stimulus rate of 2 Hz (Corriveau & Goswami, 2009). Likewise, dyslexic undergraduates showed higher variability of ITIs than did controls when tapping in synchrony with a metronome at three tempi (Thomson, Fryer, Maltby, & Goswami, 2006). Impaired speech/language ability and impaired SMS might both be associated with cerebellar dysfunctions (Nicolson & Fawcett, 2011). Individuals with bipolar disorder have also been found to show slightly more variable ITIs than do controls in a synchronization task (Bolbecker et al., 2011). Analysis of their continuation tapping according to the model of Wing and Kristofferson (1973) suggested a difference in internal timekeeper variability, which according to the authors might also originate from deficient cerebellar functions. These findings seem to indicate a common neural underpinning—cerebellar dysfunction—for the SMS impairment observed in various disorders, though deficiency in other relevant cortical–subcortical networks and/or in auditory–motor coupling within the brain cannot be excluded (see section 4.2).

The negative mean asynchrony

The NMA in tapping with simple metronomes is a ubiquitous but still not fully explained finding. A frequently encountered statement in recent articles is that the NMA proves that participants anticipate rather than react to pacing stimuli, but in fact any positive asynchrony shorter than the shortest possible reaction time (about 150 ms) is still evidence of anticipation; the tap need not literally precede the stimulus (a point already made by Mates, Radil, & Pöppel, 1992, p. 701). At the same time, anticipatory responses are reactions, but to preceding stimuli: They are reactions timed so as to coincide approximately with the next target stimulus.

Białuńska, Dalla Bella, and Jaśkowski (2011) addressed a prediction made by the sensory accumulation model (Aschersleben, 2002), according to which the NMA is due to faster accumulation of sensory evidence from auditory or visual pacing stimuli than from tactile and kinesthetic feedback from taps. Białuńska et al. argued that the rate of sensory accumulation should depend on stimulus intensity, and therefore expected the NMA to decrease as the intensity of the pacing stimuli was decreased. However, varying stimulus intensity had no effect on the NMA, while it did affect simple reaction times to unpredictable stimuli in a separate condition. The authors speculated that, in the SMS task, an effect of slower sensory accumulation was canceled by lowering of a sensory threshold, controlled by attentional processes.

An alternative hypothesis is that the NMA is due to perceptual underestimation of the metronome IOI (Wohlschläger & Koch, 2000). Extra tones or movements occurring in the IOIs between pacing stimuli reduce the NMA, and such subdivision presumably reduces the underestimation of the IOI. Flach (2005) found a positive correlation between the mean asynchrony during SMS and the mean ITI of continuation tapping after the metronome had stopped, suggesting that the degree of IOI underestimation was reflected in the tempo of continuation tapping. Zendel, Ross, and Fujioka (2011) also obtained results consistent with this hypothesis: When they varied metronome IOI duration while keeping the ITI constant (1:n tapping), the NMA of musicians decreased as the IOI decreased (i.e., the more subdivision tones occurred between taps), whereas varying the ITI while keeping the metronome IOI constant (n:1 tapping) had little effect on the NMA. Repp (2008a), however, found exactly the opposite in a group of highly trained musicians: The NMA decreased as the ITI decreased, but IOI duration had little or no effect on the NMA when the ITI was constant. The reason for this difference in results is not clear. Loehr and Palmer (2009) also reported relevant results that did not conform to the IOI underestimation hypothesis. Pianists had to play isochronous melodies on an electronic piano in synchrony with a metronome (IOI = 500 ms) while they (1) heard additional subdivision tones between the beats, (2) played additional notes without hearing them, or (3) played and heard additional notes. Contrary to expectations, the NMA increased following a passively heard subdivision tone and, to a lesser extent, following an actively played subdivision tone. These findings cast doubt on the hypothesis that subdivision reduces IOI underestimation.

The NMA tends to be smaller for musicians than for nonmusicians.² Krause, Pollok, and Schnitzler (2010) compared the NMAs of drummers, professional pianists, amateur pianists, singers, and nonmusicians tapping with a metronome (IOI = 800 ms). Drummers showed the smallest NMA (about –20 ms), whereas the others had NMAs in the vicinity of –50 ms. Fujii et al. (2011) reported that professional drummers playing three percussion instruments simultaneously in synchrony with a metronome had mean asynchronies ranging from –13 to 0 ms, depending on the instrument and tempo. A positive mean asynchrony was observed at a fast tempo close to the biomechanical limit of execution. Stoklasa, Liebermann, and Fischinger (2012) reported that musicians playing their own brass or string instrument in synchrony with a metronome showed a negligible NMA (–2 ms), unlike their tapping (–13 ms).

The NMA typically increases as the metronome IOI increases. For example, Fujii et al. (2011) found an increase in drummers’ NMAs as the metronome IOI increased from 300 to 1,000 ms. In that study, the NMA was significantly smaller for the right hand, which always moved at twice the rate of the metronome (2:1), than for the left hand and the right foot, each of which moved at half the metronome rate (1:2). Musicians performing a 1:1 tapping task have also been found to show an increase in their NMAs as IOI duration increased from 600 to 1,000 ms (Repp, 2008a) and from 260 to 1,560 ms (Zendel et al., 2011). However, other conditions in Repp’s (2008a) study indicated that this increase was mainly due to the simultaneous increase in ITI duration, whereas Zendel et al. found little effect of the ITI variable. Thus, it remains unclear whether the NMA depends mainly on metronome tempo or on tapping tempo; both may play a role. Repp and Doggett (2007) examined 1:1 tapping at slow metronome tempi with IOIs ranging from 1,000 to 3,500 ms. Nonmusicians’ NMAs increased linearly as the IOI increased, whereas musicians’ NMAs were small and nearly constant (see also section 1.1.3 and note 2).

Boasson and Granot (2012) investigated the effect of pitch changes on SMS with isochronous melodies. Because performing musicians tend to accelerate when the pitch rises, it was predicted that tapping might accelerate as well. This was indeed found, with asynchronies becoming more negative during a pitch rise. Sugano, Keetels, and Vroomen (2012) used the NMA as an indicator of sensorimotor temporal recalibration. Participants tapped at a designated tempo while receiving auditory or visual feedback at one of two delays (see also section 1.3.1). Following exposure to the longer delay, participants showed an increased NMA when synchronizing with an auditory or visual pacing sequence. This adaptation effect was larger in the auditory than in the visual modality, and it transferred from the visual to the auditory modality, but not vice versa, possibly because rhythmic visual stimuli engage auditory processing, whereas the reverse may not occur.

When the task is to tap in synchrony with every tone of a nonisochronous cyclic rhythm, ITIs often exhibit characteristic distortions, similar to those observed in self-paced reproduction of the same rhythm. These distortions affect the asynchronies with individual tones in the rhythm cycle, such that a tap terminating an ITI that is too short will have a larger NMA than will a tap that terminates an ITI that is too long (Fraisse, 1966/2012). Recent examples can be found in Repp, London, and Keller (2005, 2008, 2011) for musicians tapping in synchrony with two- and three-interval rhythms. While rhythmic distortions affect local asynchronies, a global NMA tends to persist.

Variability

In 1:1 synchronization, the variability of asynchronies (SD asy) increases with both IOI and ITI duration (Repp, 2012; Zendel et al., 2011), but what kind of function describes this increase? Is it linear, reflecting some form of Weber’s law, or nonlinear? Linearity tends to hold over narrow ranges of IOIs (e.g., 500–950 ms; Lorås, Sigmundsson, Talcott, Öhberg, & Stensdotter, 2012), but wide ranges extending to long IOIs reveal nonlinearities. Repp and Doggett (2007) asked musicians and nonmusicians to tap with slow auditory metronomes whose IOIs ranged from 1,000 to 3,500 ms. For both groups, SD asy increased linearly up to 2,500 or 2,750 ms, and then increased more steeply, so that the complete functions had significant nonlinear (quadratic) trends, contrary to Weber’s law. The musicians in this study were also asked to tap in antiphase with the metronome, and their mean SD asy again increased nonlinearly with IOI duration, but with a shallower slope. This increasing stability advantage of antiphase over in-phase tapping was attributed to subdivision of the metronome IOIs by the antiphase taps. In a follow-up study with musicians, Repp (2010a) added two further conditions, one requiring mental subdivision of IOIs while tapping in phase with the metronome, and the other requiring 2:1 tapping, which can be considered a conflation of in-phase and antiphase tapping. In all four conditions, SD asy increased smoothly but nonlinearly with IOI duration, and the differences among conditions were small initially but increased with IOI duration. Variability was highest for in-phase tapping (in which the instructions discouraged mental subdivision), slightly lower for in-phase tapping with mental subdivision, and clearly lower for 2:1 and antiphase tapping. When the 2:1 taps were separated into in-phase and antiphase taps, the antiphase taps were found to be less variable. This was attributed to anchoring of antiphase taps to the preceding tone, whereas in-phase taps seemed to be anchored more to the preceding antiphase tap than to the (more distant) preceding tone. This interpretation was supported by a constant high positive correlation between the asynchronies of successive antiphase and in-phase taps, whereas the correlation between the asynchronies of successive in-phase and antiphase taps was smaller and decreased as IOI duration increased.
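
The linear-versus-nonlinear question is, at bottom, a simple model comparison. Here is a sketch with invented numbers (not Repp and Doggett’s data) of how a quadratic trend in SD asy might be tested against a linear one:

```python
import numpy as np

iois = np.array([1000, 1500, 2000, 2500, 3000, 3500], float)   # metronome IOIs (ms)
sd_asy = np.array([30, 45, 62, 80, 115, 160], float)           # illustrative SD asy values (ms)

lin = np.polyfit(iois, sd_asy, 1)
quad = np.polyfit(iois, sd_asy, 2)

def rss(coefs):
    """Residual sum of squares of a polynomial fit."""
    return np.sum((np.polyval(coefs, iois) - sd_asy) ** 2)

print(f"linear RSS = {rss(lin):.0f}, quadratic RSS = {rss(quad):.0f}")
# A markedly smaller quadratic RSS, with a positive coefficient on IOI**2,
# is the signature of the steeper-than-Weber increase described above.
```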

A reduction in variability, previously termed the subdivision benefit (Repp, 2003), is also effected by computer-controlled physical subdivision of metronome IOIs. Unlike overt or covert subdivision by the participant, this kind of subdivision normally has no variability. Repp (2008a) confirmed that musicians’ mean SD asy decreases when the IOIs separating the metronome beats are subdivided into equal parts by one, two, or three additional tones, but only as long as those parts are at least 200–250 ms long. Zendel et al. (2011) likewise reported a subdivision benefit in musicians, and also noted a relative increase in variability for 1:3 as compared to 1:2 and 1:4 tapping (cf. Repp, 2003, 2007b).

On the basis of the distribution of asynchronies that underlies SD asy, it has been argued that a lower rate (upper IOI) limit exists for 1:1 SMS (see Repp, 2006b, for a review). The distribution has been observed to become increasingly bimodal when the IOI exceeds about 1,800 ms, due to the emergence of positive asynchronies that represent reactions to stimuli, rather than anticipations. However, when Repp and Doggett (2007) instructed participants always to predict the next stimulus and not to adopt a reactive strategy, positive asynchronies were only about as frequent as would be expected given a normal distribution with an increasingly large SD, up to IOIs of 3,500 ms. Despite greater variability, nonmusicians actually showed fewer positive asynchronies at long IOIs than did musicians, because their NMAs increased (as mentioned in section 1.1.1). No indication of an upper IOI limit for predictive SMS appeared in these data. However, it remains true that SMS becomes subjectively difficult when IOIs exceed about 1,800 ms (Bååth & Madison, 2012), and reacting to pacing tones is an effective strategy for reducing variability (while forsaking true SMS, which requires a strategy of prediction that may or may not lead to an NMA).

In a different vein, Keller, Ishihara, and Prinz (2011) asked whether the variability of tapping on one’s own body in synchrony with a metronome depends on the tactile sensitivity of the tapped-upon body part. Unexpectedly, SD asy, as well as movement amplitude and its variability, was larger for tapping on the (sensitive) left index fingertip than on the (less sensitive) left forearm. The authors attributed this to possible ambiguity about the source of sensory feedback, created by overlap of the neural representations of the two index fingers; an increase in the amplitude and timing variability of the tapping finger may facilitate disambiguation. Interestingly, SD asy was also higher when participants tapped on another person’s index finger than when they tapped on that person’s forearm. This may have been due to empathic experience of sensory feedback, as control conditions showed that the relative softness of the surface being tapped on was not responsible.

Error correction

Error correction is essential to SMS, even in tapping with an isochronous, unperturbed metronome. Two independent processes have been postulated: phase correction, a largely automatic process that does not affect the tempo of tapping, and period correction, which is usually intentional and changes the tempo. Phase correction may be based either on perceived asynchronies or on a mixture of phase resetting to the preceding stimulus and to the preceding tap, with much evidence favoring the second interpretation (see R05). Period correction may be based on a comparison of the perceived IOI duration with an internal timekeeper period (Mates, 1994) or on perceived asynchronies (Schulze, Cordes, & Vorberg, 2005).

Modeling and parameter estimation

Several two-parameter models of SMS have been proposed in the literature, with the parameters not necessarily representing phase and period correction directly. Jacoby and Repp (2012) analyzed the formal structure of four such models and showed that three (those of Hary & Moore, 1987; Michon, 1967; and Schulze et al., 2005) are mathematically equivalent instances of a general linear model, whereas one (Mates, 1994) is different and has a restricted parameter space because it contains a nonlinearity due to its assumption regarding period correction. Using newly collected data from musicians’ SMS with sequences containing tempo changes, Jacoby and Repp showed that the Mates model is contradicted by part of the data. The data, together with earlier results by Schulze et al. (2005), thus support the hypothesis that period correction is based on the most recent asynchrony. Jacoby and Repp also described and applied a new efficient method for estimating the model parameters, called bounded general least squares (bGLS), which relies on well-established matrix algebra formulations.
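
The shared core of these linear models, and the idea of estimating α by regression, can be sketched in a few lines. This is a deliberate simplification, not the actual bGLS procedure: it omits the motor-noise correction and the parameter bounds that bGLS adds, which is precisely why naive regression of this kind is biased on real data.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha_true, n = 0.5, 60                  # assumed coupling strength; 60 taps per trial
asy = np.zeros(n)
for i in range(n - 1):
    # First-order linear phase correction with an isochronous metronome:
    # each asynchrony is a fraction (1 - alpha) of the previous one, plus noise.
    asy[i + 1] = (1 - alpha_true) * asy[i] + rng.normal(0, 10)

# Regressing A[i+1] on A[i] gives a slope of (1 - alpha).
slope = np.polyfit(asy[:-1], asy[1:], 1)[0]
print(f"alpha estimate = {1 - slope:.2f}")   # close to 0.5 over many trials
```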

Using a more cumbersome iterative parameter estimation method for the Schulze et al. (2005) model, Repp and Keller (2008) simulated data obtained from musicians’ SMS with “adaptively timed” sequences. In this task, the computer controlling the metronome is programmed to carry out phase correction (and, if desired, period correction), which results in a pacing sequence whose timing is continuously modulated in response to the participant’s taps. Human and computer phase correction are additive (Vorberg, 2005). The computer’s phase correction parameter (α) can be set at positive or negative values, so that it either augments or cancels the human phase correction. Repp and Keller were able to show that the human α remained constant as the computer’s α varied between 0 and 1, even though this resulted in overcorrection (combined α > 1) when the computer’s α was large. (Repp, Keller, & Jacoby, 2012, replicated this interesting finding using the bGLS estimation method.) When the computer’s α was negative, however, SMS became rather unstable, and the simulations suggested that the human participants not only increased their α but also engaged period correction to counteract the computer’s uncooperative behavior. The adaptive-timing paradigm can be seen as a preliminary step toward an investigation of interpersonal synchronization (see section 3.1). Computer implementation of an elaborated adaptation and anticipation model (ADAM) has recently been reported by van der Steen and Keller (2012). A related study by Kelso, de Guzman, Reveley, and Tognoli (2009) is described in section 3.2.1.
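
The additivity of human and computer phase correction (Vorberg, 2005) follows directly from the linear model, as this simulation sketch with assumed parameter values illustrates:

```python
import numpy as np

rng = np.random.default_rng(1)
ioi, alpha_h, alpha_c, n = 500.0, 0.4, 0.3, 500   # assumed parameter values (ms, unitless)
tone, tap, asys = 0.0, -30.0, []                  # start with a typical NMA
for _ in range(n):
    a = tap - tone                                # current asynchrony
    asys.append(a)
    tap += ioi - alpha_h * a + rng.normal(0, 10)  # human phase correction
    tone += ioi + alpha_c * a                     # adaptive timing: tone shifts toward tap

asys = np.asarray(asys)
slope = np.polyfit(asys[:-1], asys[1:], 1)[0]
print(f"combined alpha = {1 - slope:.2f}")        # close to alpha_h + alpha_c = 0.7
```

Each asynchrony is reduced by both agents before the next tap, so the effective correction gain is alpha_h + alpha_c; setting alpha_c negative in the same sketch drives the combined gain toward zero and synchronization becomes unstable, as in the behavioral data.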

A simple way to estimate α is to introduce unpredictable local phase perturbations (phase shifts or event onset shifts; see Figs. 2 and 3 in R05) of different magnitudes in an isochronous metronome and to measure the participant’s phase correction response (PCR), which is the shift of the immediately following tap from its expected time point (see the present Fig. 1a). Linearly regressing the PCR on perturbation magnitude yields a PCR function whose slope is the estimate of α (see Fig. 1b). Note that this estimate is solely based on PCRs to perturbations, not on other intervening taps.
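
In code, this perturbation-based estimate is a one-line regression; the data below are simulated for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
shifts = np.tile([-50, -30, -10, 10, 30, 50], 10).astype(float)  # phase shifts (ms)
pcr = 0.6 * shifts + rng.normal(0, 8, shifts.size)               # simulated PCRs (ms)

alpha = np.polyfit(shifts, pcr, 1)[0]   # slope of the PCR function
print(f"alpha = {alpha:.2f}")           # recovers the simulated value of 0.6
```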

Fig. 1

(a) Schematic illustration of the phase correction response (PCR) to a negative phase shift (PS) in a tone sequence. S = stimulus, R = response. (b) The mean PCR as a linear function of PS magnitude, ranging here from −10 % to 10 % of the metronome baseline interonset interval (IOI), at four IOI durations. The slope of the PCR function is an estimate of α, and it increases with IOI. Small deviations from linearity are not significant in these data. (c) Alpha estimates for one group of participants as a function of IOI in three SMS conditions: metronome with phase shifts (PCR only), regular metronome (RM), and adaptively timed metronome (AT). The open circles represent the slopes from panel b. The filled circles are the corresponding bounded general least squares (bGLS; Jacoby & Repp, 2012) estimates; they are slightly smaller because bGLS regresses the PCR on the preceding (noisy) asynchrony, not on the PS. The α values for RM and AT are bGLS estimates, too. All α estimates increase rather linearly with IOI, but they differ significantly between conditions (PCR > RM > AT). Note the overcorrection (α > 1) in the PCR at IOI = 1,300 ms. Error bars represent ±1 standard error. The data in panels b and c are reproduced from “Quantifying Phase Correction in Sensorimotor Synchronization: Empirical Comparison of Different Paradigms and Estimation Methods,” by B. H. Repp, P. E. Keller, and N. Jacoby, 2012, Acta Psychologica, 139, p. 285. Copyright 2012 by Elsevier. Adapted with permission.

Repp, Keller, and Jacoby (2012) used the bGLS method as well as previously developed algorithms to estimate α for musically trained participants synchronizing with (1) an isochronous metronome, (2) a perturbed metronome containing local phase shifts, and (3) an adaptively timed metronome, each at four different base tempi (IOIs of 400–1,300 ms). The estimates obtained with the different algorithms showed reasonable agreement, and all of the α estimates increased with IOI duration (see also section 1.2.2). Remarkably, however, the PCR-based α estimates from Condition 2 above were significantly larger than the estimates from Condition 1, which in turn were larger than those from Condition 3 (see Fig. 1c). Application of the bGLS method to selected taps in Condition 2 revealed that α increased immediately following a phase shift (i.e., just for the PCR) and then dropped back to levels characteristic of unperturbed sequences. Moreover, the PCR-based α estimates were uncorrelated with estimates derived from Conditions 1 and 3, while the estimates from those two conditions were highly correlated with each other. Interestingly, however, the lower α estimates derived from post-PCR taps in Condition 2 correlated with the PCR-based α estimates, not with the similarly low α estimates from the other two conditions. Although they are based on a small sample of participants, these recent results suggest that SMS with locally perturbed sequences engages a different phase correction process than does SMS with unperturbed or continuously perturbed sequences. The results also suggest that continuous timing modulations due to adaptive timing weaken sensorimotor coupling, as indexed by α. Similarly, Launay, Dean, and Bailes (2011) reported that phase correction is less vigorous in synchronization with continuously but randomly perturbed sequences than with isochronous sequences.

At a recent conference, Vorberg (2011) described a revised version of his linear phase correction model (Vorberg & Schulze, 2002) in which the internal timekeeper does not trigger responses (Wing & Kristofferson, 1973) but rather sets up temporal goal points for anticipated action effects (Drewing, Hennings, & Aschersleben, 2002). At the same meeting, Schulze, Schulte, and Vorberg (2011) reported applying a modified linear phase correction model to synchronization with nonisochronous rhythms (see also section 1.2.2). A new investigation of error correction in antiphase tapping was presented by Launay, Dean, and Bailes (2012).

A new model of SMS containing two linear and three nonlinear terms has been proposed by Laje, Bavassi, and Tagliazucchi (2013). These authors showed that their model predicts the behavioral response to all types of temporal perturbation (phase shift, event onset shift, and step change; see R05) reasonably well. However, their data were limited to a single tempo (IOI = 500 ms) and to small perturbations (<50 ms), knowledge of the type of perturbation was assumed, and the first tap following the perturbation (the PCR) was omitted from modeling. Therefore, the generality of this particular model and its superiority to linear models remain to be firmly established.

Although playing a melody on a piano is more complex than tapping, phase and period correction in synchronization with a metronome probably operate similarly. Loehr, Large, and Palmer (2011) compared linear and nonlinear models of SMS in piano playing: Pianists played a melody, composed of beat and subdivision levels, while being paced by a beat-level metronome that started out isochronously but then accelerated or decelerated, with linear changes in IOI duration. The pianists’ adaptation to the tempo changes was modeled using the linear model of Schulze et al. (2005) and a coupled nonlinear oscillator model (Large & Jones, 1999). The oscillator model was found to be superior.³ The behavioral data showed better adaptation to decelerating than to accelerating metronomes, but in each case the asynchrony (expressed as relative phase) changed substantially, becoming negative during deceleration and positive during acceleration. Little evidence emerged of accurate prediction of tempo changes. Research by Pecenka and Keller (2011a) has indicated that prediction of gradual tempo changes is not automatic, as it is impaired in a dual-task situation that puts a load on working memory.
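
For a flavor of the nonlinear alternative, the sketch below implements a generic sine-circle map with period adaptation tracking a decelerating metronome; it is a toy model with assumed coupling values, not the Large and Jones (1999) oscillator that Loehr et al. actually fitted:

```python
import numpy as np

k_phi, k_p = 0.6, 0.2                 # assumed phase- and period-coupling strengths
iois = np.linspace(500, 600, 40)      # linearly decelerating metronome (ms)
p, t_tap, t_tone = 500.0, 0.0, 0.0    # oscillator period; first tap and tone coincide
rel_phase = []
for ioi in iois:
    phi = (t_tap - t_tone) / ioi      # relative phase of the tap in the current cycle
    rel_phase.append(phi)
    # Sine-circle map: nonlinear phase coupling pulls the next tap toward the beat.
    t_tap += p - (k_phi * ioi / (2 * np.pi)) * np.sin(2 * np.pi * phi)
    p += k_p * (ioi - p)              # period adaptation lags behind the tempo change
    t_tone += ioi
# rel_phase drifts negative during deceleration, mirroring the behavioral finding.
```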

Taking a dynamic-systems approach, Stepp and Frank (2009) described a procedure for obtaining simultaneous estimates of coupling strength and of the amount of stochastic noise from asynchrony time series. They demonstrated the method in several simulations, but apparently it has not yet been applied in empirical studies.

All of the models mentioned so far assume that the noise in the data is Gaussian. However, asynchrony time series often exhibit positive long-term serial dependencies, also known as fractal noise. Torre and Delignières (2008b) proposed a model of SMS in which an internal timekeeper generates fractal noise while phase correction remains linear (Vorberg & Schulze, 2002). Spectral power analyses of long series of taps synchronized with a metronome (IOI = 500 ms) confirmed the presence of fractal noise, and simulations using a fractal-noise algorithm adapted by Delignières, Torre, and Lemoine (2008) yielded a reasonable approximation to the statistical properties of the data. Delignières, Torre, and Lemoine (2009) extended the fractal-noise model to antiphase (“syncopated”) tapping by assuming that participants estimate the midpoints of IOIs and use them as SMS targets. This estimation process was modeled as another source of fractal noise, which further increased the serial dependencies of asynchronies. The statistical properties of long series of in-phase and antiphase taps collected with an IOI of 800 ms were approximated well by the new model. However, this fractal-noise modeling has not yet yielded a generally applicable algorithm for estimating α.

Methods that do not take fractal noise into account will underestimate α. However, due to participants’ fatigue and attentional fluctuations, fractal noise may be more evident in the long time series (typically more than 1,000 taps) that are required to carry out spectral power analyses. Fractal noise implies high positive autocorrelations of asynchronies that decrease gradually as the lag is increased. Autocorrelations can be assessed in short trials that are presented repeatedly. Repp (2011a, Fig. 8) displayed average autocorrelation functions for musicians tapping with a regular metronome having IOIs ranging from 400 to 1,300 ms, where each individual time series (trial) encompassed just 60 taps. The lag-1 autocorrelation was positive for short IOIs but decreased as IOI increased, and reached zero at the longest IOI, indicating that significant fractal noise was present only at relatively fast tempi.⁴ Moreover, Lorås et al. (2012) found hardly any lag-1 autocorrelation for tapping in a triangular spatial pattern at IOIs ranging from 500 to 950 ms.
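
A sketch of this short-trial approach (the function name and trial structure are assumed for illustration):

```python
import numpy as np

def mean_acf(trials, max_lag=8):
    """Average autocorrelation of asynchronies over repeated short trials
    (e.g., 60 taps each), as in the analysis described above."""
    acf = np.zeros(max_lag + 1)
    for asy in trials:
        a = asy - asy.mean()
        denom = np.sum(a * a)
        for lag in range(max_lag + 1):
            acf[lag] += np.sum(a[:len(a) - lag] * a[lag:]) / denom
    return acf / len(trials)

# Slowly decaying positive autocorrelations signal fractal noise; a lag-1
# value near zero (as at long IOIs) suggests little of it.
```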

The phase correction response

A number of recent studies from author B.H.R.’s laboratory have focused on the PCR—the immediate, largely automatic response to a metronome perturbation, which occurs even when the perturbation is not perceived consciously (see R05). Repp (2010b) investigated effects of music training on the mean PCR, using short sequences with a base IOI of 500 ms and containing phase shifts within ±10 % of the IOI.⁵ Unexpectedly, the mean PCR of highly trained musicians was smaller than that of musical amateurs and nonmusicians, but this difference was found to be due to three musicians who had participated in many previous tapping experiments. When the musicians were retested later, after all had participated in various tapping experiments, the mean PCR of the previously inexperienced tappers had dropped significantly, whereas that of the experienced tappers was unchanged. These results suggest that sensorimotor coupling strength does not depend on music training, but rather decreases with task experience.⁶ By reacting less vigorously to perturbations and thereby spreading out the phase correction over several taps, experienced tappers decrease the variability of their asynchronies and ITIs, thus achieving smoother performance. Unlike specific task experience, age (19–98 years) does not seem to affect the efficiency of phase correction in response to phase shifts (Turgeon et al., 2011).

Several studies—all with musicians as participants—have demonstrated that the mean PCR increases with the metronome IOI. Using phase perturbations whose magnitude increased proportionally with IOI, Repp (2008c) found the increase of the PCR to be linear between IOIs of 300 and 1,200 ms. Repp (2011c) replicated this result with phase shifts of fixed size that became imperceptible as IOI increased. In both studies, significant overcorrection (mean PCR > 100 %) was observed at the longest IOIs (see also Fig. 1c). In a separate experiment, Repp (2011c) found consistent overcorrection in the IOI range between 1 and 2 s, but the mean PCR increased only slightly with IOI duration, which suggests a nonlinear increase overall. It remains unclear why overcorrection occurs at slow tempi.⁷ Overcorrection is problematic for the mixed-phase-resetting hypothesis (see R05 and the beginning of section 1.2), as mere phase resetting should not result in overcorrection. Moreover, the fact that overcorrection occurs even in response to subliminal perturbations (Repp, 2011c), together with the suggestion of a special phase correction process for the PCR (Repp et al., 2012), raises the intriguing possibility that the PCR is driven nonlinearly by highly accurate subconscious registration of temporal expectancy violations.

When the task is to tap in synchrony with perturbed nonisochronous rhythms, a dependency of the mean PCR on the preceding IOI duration is less clear. Repp, London, and Keller (2008) used cyclic two- and three-interval rhythms with IOIs of 400 and 600 ms in all combinations and permutations, and introduced small event onset shifts at certain points. They found a significantly larger mean PCR near the end of the longer IOI in two-interval rhythms, but not in three-interval rhythms. Repp, London, and Keller (2011) used two-interval rhythms with various IOI durations (360–840 ms) but did not find a dependency of the mean PCR on preceding IOI duration, though the mean PCR was larger with some rhythms than with others. In both studies, the mean PCR was about as large as in SMS with isochronous rhythms. The PCR is clearly reduced, however, when the preceding IOI is shorter than 300 ms (Repp, 2011d). In Repp’s (2011d) study, musicians tapped in synchrony with cyclic two-interval rhythms in which the tone initiating the shorter IOI was shifted occasionally. As that IOI was increased from 100 to 300 ms, the mean PCR increased gradually from zero to full magnitude, reaching asymptote by 250 ms.⁸ Additional experiments in which either one tap or one tone per rhythm cycle was omitted (1:2 or 2:1 tapping) demonstrated that, for this increase in the PCR to occur, it was necessary neither to tap with the shifted tone nor for the tap exhibiting the PCR to have a synchronization target (see also R05).

The PCR function, which relates PCR magnitude to perturbation magnitude, is usually strongly linear (with slope α) for perturbations within ±10 % of the IOI (see Fig. 1b).⁹ When the range of perturbation magnitudes is extended up to ±50 % of the IOI, the PCR function is typically sigmoid-shaped, with a steeper slope in the center. In other words, participants immediately correct a smaller percentage of a large perturbation than of a small one. Repp (2011c) investigated whether this sigmoid shape is governed by the absolute or the relative magnitude of the phase shifts. This required varying IOI duration while holding either absolute (ms) or relative (% of IOI) phase shift magnitude constant. The results did not provide a simple answer, even though the PCR functions were consistently sigmoid. However, he did find clear evidence of an asymmetry, with the mean PCR being smaller for negative phase shifts (advances) than for positive ones (delays). An interesting secondary result was that large negative phase shifts, in which a metronome tone occurred much earlier than expected, elicited an “early PCR” of the tap that was intended to coincide with the unexpectedly shifted tone. The early PCR started to emerge when the phase shift exceeded –150 ms, and reached a magnitude of about –100 ms when the phase shift was as large as –400 ms (i.e., about 25 %). The PCR of the next tap was reduced accordingly.
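
Fitting the sigmoid shape is straightforward; this sketch uses a tanh parameterization and invented mean PCRs, chosen only to illustrate the flattening at large shifts:

```python
import numpy as np
from scipy.optimize import curve_fit

shifts = np.array([-50, -40, -30, -20, -10, 0, 10, 20, 30, 40, 50], float)  # % of IOI
pcr = np.array([-22, -21, -19, -15, -8, 0, 9, 16, 20, 22, 23], float)       # invented means

def sigmoid(s, a, w):
    """a: asymptotic correction, w: width of the steep central region."""
    return a * np.tanh(s / w)

(a, w), _ = curve_fit(sigmoid, shifts, pcr, p0=(25.0, 20.0))
print(f"central slope = {a / w:.2f}")  # steeper than the overall mean slope, i.e.,
                                       # proportionally less correction of large shifts
```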

Making use of the fact that a PCR can be elicited by a shift in intervening subdivision tones when taps are synchronized with “beat” tones (Repp, 2008b), Repp and Jendoubi (2009) demonstrated that the PCR is triggered by a violation of temporal expectations established by preceding context, not by the temporal location (relative phase) of the shifted event in the IOI. The relative phase of cyclically repeated subdivisions did affect the asynchronies of taps, but these adaptations to a particular rhythm were clearly distinct from (and sometimes contrary to) the PCR, which occurred as an immediate response to a deviation from the current rhythm, whatever it was. A novel finding was that omission of an expected subdivision tone occurring at 2/3 of the IOI elicited a large positive PCR (i.e., a delay of the subsequent tap). This was attributed to perceptual grouping of the subdivision tone with the following beat tone.

Repp (2009a, b) attempted to use the PCR as an indirect measure of auditory stream segregation (Bregman, 1990). Musicians tapped with every third tone (the “beat”) of an isochronous sequence, and perturbations were introduced either in the beats (Repp, 2009b) or in the intervening two subdivision tones (Repp, 2009a), which had a different pitch. The results of a perceptual perturbation detection task indicated that the beat and the subdivision tones were integrated into a single stream when their pitches were two semitones apart, but not when they were 48 or 46 semitones apart. Remarkably, however, pitch separation had no effect at all on the PCR to perturbed beats: Relative to a control condition consisting only of beat tones, intervening subdivision tones reduced the mean PCR (see Repp, 2008b) regardless of pitch separation. Similarly, at the slower of two beat tempi used (IOI = 600 ms), pitch separation had no effect on the mean PCR to shifted subdivision tones; only at the faster tempo (IOI = 450 ms) was the mean PCR reduced substantially when the pitch separation was large. The conclusion was that perceptually segregated streams often still function as integrated rhythms at a sensorimotor level.

The PCR has also been used to measure perceptual centers (P-centers; Morton, Marcus, & Frankish, 1976)—the time points at which auditory events are perceived to occur, especially in a rhythmic context. The traditional paradigm for measuring the relative P-centers of two sounds is to play them cyclically in alternation and to adjust the timing of one of them until the sequence sounds isochronous. Using a set of speech syllables, Villing, Repp, Ward, and Timoney (2011) compared results obtained with the traditional method to results from a new method based on the PCR. One syllable was played in an isochronous sequence, and another syllable was substituted from time to time, with a variety of timings (i.e., phase shifts). Participants tapped in synchrony with the sequence, and the PCR to the inserted syllable was measured. The relative P-center of the inserted syllable could be inferred from the x-axis intercept of the PCR function, where PCR = 0. The results were highly similar to those obtained with the traditional method, but the PCR method offers some advantages, such as requiring no perceptual judgments. One interesting secondary finding, seen most clearly in homogeneous control sequences, was that the mean PCR decreased as the acoustic complexity of the syllable increased, reflecting increased uncertainty about the temporal location of the P-center.
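
The P-center readout then reduces to the x-intercept of a fitted line; the values below are illustrative, not Villing et al.’s data:

```python
import numpy as np

offsets = np.array([-60, -40, -20, 0, 20, 40, 60], float)   # timing of inserted syllable (ms)
mean_pcr = np.array([-28, -17, -8, 4, 14, 26, 35], float)   # invented mean PCRs (ms)

m, b = np.polyfit(offsets, mean_pcr, 1)
print(f"relative P-center = {-b / m:.1f} ms")   # the offset at which PCR = 0
```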

Anticipatory phase correction

Anticipatory phase correction (APC) requires that a participant have advance knowledge of an upcoming perturbation. Repp and Moseley (2012) showed that advance information about the position and direction of a phase shift in an isochronous sequence enables musicians to advance or delay the tap intended to coincide with the shifted tone, and thereby to reduce the asynchrony occurring at that point. The mean PCR to the residual asynchrony was enhanced slightly relative to the mean PCR to unexpected phase shifts. In a second experiment, Repp and Moseley varied the time available for APC by varying IOI duration (400–1,200 ms) and the time at which complete advance information was supplied (one or two beats before the phase shift). From the data, APC (analogous to PCR) functions could be derived whose slope was a measure of the effectiveness of APC, comparable to α. When the advance information was supplied late, the slope increased with IOI duration up to about 1 s, and only at that point matched the slope for APC following early information, which was much less affected by IOI duration. Thus, up to 1 s was required to prepare and execute an effective APC response. There was also a clear asymmetry, with APC being less effective for negative than for positive phase shifts. These data provided evidence that phase correction can be under intentional control, even though normally it is an automatic response. As there were no carryover effects onto the next tap, it seemed that APC did not engage period correction.

Some researchers committed to a dynamic-systems perspective have promoted the idea of “strong anticipation,”¹⁰ which they tested in a task requiring tapping in synchrony with a chaotically timed visual sequence whose IOIs varied in the range between 1 and 1.5 s (Stephen, Stepp, Dixon, & Turvey, 2008).¹¹ The chaotic signal was assigned three different levels of fractal structure, which were found to be mirrored by the structure of the ITI time series. The authors interpreted this correlation as evidence for strong anticipation, described as a dynamic adaptation to the global statistical structure of the environment (see also Stepp & Turvey, 2010).¹² Stephen and Dixon (2011) further expanded this account by analyzing and describing it as an instance of multifractality, “a reactive, feedforward coordination of preexisting fluctuations of very many sizes across multiple time scales” (p. 167).

The movement trajectory of synchronized tapping

The tapping cycle consists of flexion and extension phases, typically with a motionless phase in between, occurring either at the contact point (dwell time) or at the maximal extension (hold time). Accordingly, two tapping styles, “legato” and “staccato,” can be distinguished, though this distinction is rarely made in the literature or considered in instructions to participants.

Krause, Pollok, and Schnitzler (2010) found that participants moved their finger faster when tapping with an auditory metronome than with a flashing circle, and that this difference occurred mainly in the flexion phase (downward movement) of the finger, presumably reflecting stronger sensorimotor coupling. Drummers moved their fingers significantly faster during both flexion and extension than did other musicians or nonmusicians.

Hove and Keller (2010) recorded participants’ finger motions as they tapped in synchrony with a flashing square or with a display of a finger exhibiting apparent up–down motion at one of two amplitudes. Tap amplitudes were higher with the high-amplitude than with the low-amplitude finger display, suggesting an involuntary influence of perception on action. Flexion times were shorter than extension times and depended less on tempo, indicating a ballistic movement toward the contact target. The lag-1 autocorrelation of the ITIs was zero in tapping with flashes but negative (though small) in tapping with finger displays, probably reflecting better phase correction in the latter case. Small negative correlations were also observed between the asynchrony of one tap and the amplitude, extension time, and dwell time of the following tap, but not with its flexion time or movement velocity. This suggested that phase correction was implemented through adjustments of tap amplitude, extension time, and dwell time. Torre and Balasubramaniam (2009) observed a similar negative correlation between asynchrony and the following extension phase when participants tapped without contact (“in the air”) in synchrony with an auditory metronome. While there was no dwell time here, a pronounced slowing, amounting to a hold phase, occurred at the end of the extension phase, which resulted in a strong asymmetry of the extension and flexion phases. A strong negative correlation was observed between the degree of this asymmetry and the SD asy, indicating more stable synchronization when the movement was less sinusoidal.

Elliott, Welchman, and Wing (2009a) compared three forms of finger action in synchrony with an auditory metronome: standard tapping, isometric force pulses applied to a sensor while maintaining contact with it, and smooth quasi-sinusoidal pressure variation applied to the sensor. Phase correction in response to an unpredictable phase shift was significantly slower in the smooth movement than in the two more discrete movements, and SD asy was also greater. The authors concluded that discrete movements provide more salient sensory information on which phase correction is based.¹³

Pseudo-synchronization, feedback, and feeling in control

Pseudo-synchronization and feedback

Pseudo-synchronization occurs when participants believe that they are synchronizing with an externally controlled rhythm, but actually they are controlling (“producing”) the tones with their own taps. In other words, the tones provide “auditory feedback” about the taps, in particular about their tempo and variability.¹⁴ Fraisse and Voillaume (1971/2009) found that, when participants were switched suddenly from SMS to pseudo-SMS without being aware of it, they accelerated their tapping progressively in the belief that it was the metronome that was accelerating. Because asynchronies during pseudo-SMS are normally close to zero, it seems that participants vainly tried to restore their typical NMA by getting ahead of the metronome. Participants who were informed about the switch showed smaller but still substantial acceleration.

Flach (2005) did not replicate these dramatic findings, perhaps because he used only a single metronome tempo (IOI = 800 ms). He found only a small and abrupt acceleration of tapping immediately after the transition from SMS to pseudo-SMS, regardless of whether or not participants were informed about the transition. The magnitude of this change in the ITI was positively correlated with the mean asynchrony preceding the transition. Thus it can be interpreted as a PCR to the sudden change in IOI (equivalent to a negative phase shift) and asynchrony (from negative to zero), and the maintenance of a slightly faster tapping tempo during pseudo-SMS can be attributed to repeated PCRs to deviations from the expected NMA. Importantly, Flach also manipulated the feedback delay during pseudo-SMS. The change in ITI after the transition depended strongly and positively on the feedback delay: When the negative asynchrony caused by delayed feedback exceeded the pretransition NMA, participants slowed down rather than sped up after the transition.

In a similar study, Takano and Miyake (2007) also varied feedback delay, but in addition varied sequence IOIs over a wide range (450–1,800 ms). Furthermore, they introduced a secondary task, silent reading, that diverted attention from the tapping task. When the tempo was slow and the feedback delay was small or zero, some participants accelerated much more than others. This tendency was absent, however, when participants were engaged in the secondary task. The authors therefore considered the acceleration a cognitively controlled form of phase correction.

In another clever experiment, Flach (2005) attempted to dissociate participants’ knowledge of (not) being in control of the tones from their behavioral responses to the transition and to event onset shifts in its vicinity. He gave participants a pitch cue to the transition from SMS to pseudo-SMS, but the actual transition occurred two tones earlier or later. He also shifted the onset of one tone before or after the transition, and that tone either did or did not coincide with the actual transition. The results, while somewhat complex, essentially indicated that knowledge of control has no influence on behavior. This is an important finding, as it suggests that SMS (and phase correction in particular) is independent of whether the timing of an external rhythm is externally or self-controlled, and also independent of participants’ belief about the locus of control.¹⁵

Flach’s (2005) findings are relevant to a recent study by Drewing (2013) in which participants tapped in a self-paced manner while hearing feedback tones contingent on the taps. Every other feedback tone was delayed by a fixed amount, so that isochronous tapping resulted in nonisochronous feedback. Drewing found that participants tapped nonisochronously, partially compensating for the feedback delay, which resulted in more nearly (but not perfectly) isochronous feedback tones. He interpreted these results as support for the hypothesis that self-paced tapping involves the timing of integrated sensory (including auditory) consequences of movements (Drewing et al., 2002). However, if feedback tones function like pacing tones and automatically engage phase correction, a nonisochronous tapping pattern just like the one found would be predicted, with the ITIs echoing the IOIs at a lag of 1. Therefore, even though Drewing’s hypothesis is plausible, his findings do not seem to provide unambiguous support for it.
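
The lag-1 “echo” prediction invoked here can be checked with a sketch of the linear phase-correction account, under assumed parameter values:

```python
import numpy as np

alpha, p, D, n = 0.4, 500.0, 60.0, 40   # assumed coupling, period, and delay (ms)
taps, tones = [0.0], []
for i in range(n):
    tones.append(taps[-1] + (D if i % 2 else 0.0))  # every other feedback tone delayed
    asy = taps[-1] - tones[-1]                      # tap relative to its own feedback tone
    taps.append(taps[-1] + p - alpha * asy)         # ordinary phase correction
itis, iois = np.diff(taps), np.diff(tones)
# Steady state: ITIs alternate p and p + alpha*D, echoing the long/short feedback
# IOIs at a lag of 1, and the feedback IOIs become more nearly (but not perfectly)
# isochronous -- qualitatively the pattern Drewing (2013) reported.
```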

Feeling in control

Repp and Knoblich (2007) studied participants’ ability to discern whether or not they were in control of tones that they heard. Musicians tapped in synchrony with an isochronous metronome that at some point switched to feedback mode (pseudo-SMS), or the reverse. The participants knew how each trial started and had to report when the mode of control changed. The probability of detecting the change increased steeply over about six serial positions following the transition and then continued to increase more gradually until the end of the trial, where it was still well below 1. Sensorimotor cues—that is, the presence of variable asynchronies during SMS or their absence during pseudo-SMS—were important, for participants performed more poorly in a condition in which they listened passively to identical tone sequences without tapping, so that only perceptual cues of variability were available. It emerged that the transition from variability to constancy was much more difficult to detect than the opposite. This asymmetry was shown even more clearly by a group of nonmusicians (Knoblich & Repp, 2009). Participants also exhibited a bias to judge themselves as being in control. Knoblich and Repp then devised a simpler paradigm, in which nonmusicians listened to and then tried to reproduce a brief isochronous sound sequence. The reproduction taps were accompanied by sounds, which the participants had to judge as being externally controlled or self-controlled. In passive-listening conditions, the same sound sequences were played back and the same judgments had to be made. The participants were able to discriminate the two types of sequences better in the active than in the passive condition, and again showed a bias toward self-attribution of control. The difference in performance between the active and passive conditions was even more pronounced when the tempo of the externally controlled tones was varied somewhat during the reproduction phase; this increased sensorimotor cues to control (asynchronies) but decreased the salience of perceptual cues (IOI variability). Hauser et al. (2011) subsequently used this paradigm in a study of prodromal and diagnosed schizophrenics, who were expected, and indeed were found, to have an even stronger self-attribution bias than normal controls. The paradigm thus is potentially suitable for clinical assessments of the feeling of control, but it requires both an ability and a willingness to follow instructions precisely.¹⁶

Many studies have been conducted to determine the effects of delayed or altered auditory feedback on speech production and music performance. However, Couchman, Beasley, and Pfordresher (2012) were the first to ask whether manipulated feedback affects participants’ feeling of being in control of their actions, and whether that feeling may in part be responsible for any impairment in performance. Using altered auditory feedback during performance of simple melodies on an electronic piano, they found that altered feedback did affect judgments of control as well as performance, and that disruption of performance was greatest when the feeling of control was ambiguous. However, on the basis of correlational analyses, the authors concluded that participants’ feelings of control did not affect their performance directly.

Tapping with composite auditory, visual, tactile, and multimodal rhythms

A composite auditory rhythm consists of several superimposed auditory sequences differing in pitch or some other acoustic attribute. In multimodal rhythms, sequences from different modalities are combined. In either case, the task may be to synchronize taps with all sequences simultaneously or to single out a target sequence and regard the other sequences as distractors.

Composite auditory rhythms

Keller and Repp (2008) investigated the effect of melodic pitch feedback on performance of a difficult SMS task. Musicians were required to tap in antiphase with a metronome using the two hands in alternation. The pitch of feedback tones controlled by the taps of each hand was manipulated to be the same as, close to, or far from the metronome pitch, with the higher feedback pitch being assigned to either the right or the left hand. The task was easiest when the feedback pitches were different from but close to the metronome pitch, which made the tones easier to integrate with the metronome into a composite melody/rhythm while maintaining a distinction between pacing and feedback tones. Performance was also better when the right hand generated a higher pitch than did the left hand, consistent with the “piano in the head” effect described by Lidji, Kolinsky, Lochy, Karnas, and Morais (2007).

Asynchronies occur naturally in the performance of musical chords. To investigate whether such asynchronies facilitate synchronization, Hove, Keller, and Krumhansl (2007) required participants with or without musical training to tap in synchrony with isochronous sequences of two-tone complexes in which tones of different pitches were either synchronous or asynchronous by a small fixed amount (25–50 ms). The NMA (measured relative to the onset of the leading tone) was indeed smaller when there was an asynchrony. However, this can be interpreted as an attraction of taps toward the lagging tones, which functioned like distractors (see R05) if the leading tones are regarded as targets (though no targets had been designated by the instructions). A tendency to tap closer to the lower of the two tones was also observed. A perceptual task that estimated the P-centers of the chords (see section 1.2.2) yielded differences paralleling the changes in NMA, so that the results also could be described as participants synchronizing with P-centers. Participants with musical training showed lower variability when tapping with asynchronous than with synchronous chords, whereas nonmusicians showed the opposite result.

Visual versus auditory rhythms

The variability of taps is typically greater when synchronizing with visual than with auditory metronomes, regardless of the musical training of participants (see R05). Although musical training is most relevant to SMS with auditory stimuli, Krause, Pollok, and Schnitzler (2010) recently found drummers and professional pianists to be significantly less variable than nonmusicians in tapping with visual stimuli (a flashing circle). Kurgansky (2008) studied SMS with similar visual metronomes covering a wide range of IOIs (500–2,200 ms), paying special attention to the initial “tuning-in” phase after the metronome started, for which he observed several different strategies. He also demonstrated a decrease in the lag-1 autocorrelation of asynchronies and a corresponding increase in α with increasing IOIs during steady-state SMS. Furthermore, he found an increase in positive asynchronies at long IOIs, though they remained shorter than reaction times to unpredictable visual stimuli. In a follow-up study, Kurgansky and Shupikova (2011) observed higher variability and less effective phase correction in 7- to 8-year-old children than in adults, while general performance characteristics were similar.

Lorås et al. (2012) compared tapping with auditory and visual (flashing) metronomes at IOIs ranging from 500 to 950 ms. In this study, participants tapped in a spatial pattern, moving around the corners of a virtual triangle. SD asy was much smaller with auditory than with visual pacing stimuli and, in each case, increased linearly with IOI. Surprisingly, the NMA for auditory stimuli was very small and independent of IOI, whereas the mean asynchrony for visual stimuli was positive and increased with IOI. By contrast, Sugano et al. (2012) found a substantial NMA with both types of pacing stimuli (IOI = 750 ms), though it was larger with auditory stimuli, whereas the variability was only slightly larger with visual than with auditory stimuli.

Static visual metronomes are difficult to synchronize with when their IOIs get shorter than 500 ms (Repp, 2003). It was long suspected that the critical IOI duration (and variability) might be lower for moving visual stimuli. This was first demonstrated by Hove and Keller (2010), who compared SMS with a flashing square to SMS with alternating images of a finger in raised and lowered positions, exhibiting apparent movement. Subsequently, Hove et al. (2010) and Ruspantini, D’Ausilio, Mäki, and Ilmoniemi (2011) reported results implying lower critical IOI durations for a bar or a finger exhibiting real up–down movement than for flashes. Moreover, Hove et al. (2010) found that an upright finger whose motion was compatible with a participant’s finger motion yielded better performance than an inverted finger that moved up when the participant’s finger moved down. However, none of these moving visual stimuli yielded SMS performance approaching that with auditory stimuli. Iversen, Patel, Nicodemus, and Emmorey (unpublished) reported that a video of a bouncing ball yielded SMS variability (SD asy) close to that observed with an auditory metronome, but Hove, Iversen, Zhang, and Repp (2013) still found a significant difference in favor of the latter. Even more effective visual stimuli for SMS than a bouncing ball may yet be found.

When participants synchronize with a target sequence of visual flashes that is accompanied by an auditory distractor sequence, their taps veer in the direction of the distractor tones and react to perturbations in them (see R05). By contrast, visual distractors have hardly any effect on synchronization with auditory targets. Because some of these earlier results could have been due to misperception of the timing of visual stimuli when they occurred in the vicinity of auditory stimuli (“temporal ventriloquism”), Kato and Konishi (2006) presented target and distractor sequences in antiphase, roughly 500 ms apart, which made perceptual interactions unlikely. Nevertheless, temporal jittering of the auditory distractor sequence greatly increased SD asy for tapping with the visual target, whereas jittering the visual distractor sequence barely affected tapping with the auditory target, thus replicating the earlier results.

Hove et al. (2013) pitted a bouncing ball against an auditory metronome in a target–distractor paradigm, varying the phase difference between the two sequences. For a group of musicians, auditory distractors tended to attract taps more than did visual distractors. A group of visual experts (video gamers and ball players), however, showed the opposite pattern, even though they synchronized better with unimodal auditory than with visual sequences, as did the musicians. Overall, the bouncing ball proved to be an effective competitor for an auditory metronome.

Multimodal rhythms

To compare SMS with unimodal and bimodal stimuli, Wing, Doumas, and Welchman (2010) presented tones and haptic stimuli (passive movements of a nontapping finger) individually or simultaneously. As would be predicted by a model of optimal multisensory integration (Ernst & Bülthoff, 2004), SD asy was lower in the bimodal condition than in either of the unimodal conditions, which exhibited similar variability. Adding temporal jitter to the auditory metronome increased variability much more in the unimodal auditory than in the bimodal condition, because in the latter condition participants relied in part on the unperturbed tactile stimuli. Elliott, Wing, and Welchman (2011) tested elderly participants (63–80 years) in a similar paradigm in which the phase offset between the modalities was also varied. While these participants showed greater variability than a younger comparison group, they reacted similarly to bimodal stimuli, suggesting intact multisensory integration. Elliott, Wing, and Welchman (2010) extended the paradigm to three modalities, using auditory, tactile, and visual (flashing) metronomes. All three pairwise bimodal combinations were presented, as well as unimodal conditions. In the bimodal conditions, two degrees of jitter were applied to the modality with the lower unimodal variability (auditory < tactile < visual). An optimal-integration model predicted the results for isochronous and lightly jittered conditions well, but when jitter was high, variability was larger than predicted in two of the three bimodal conditions, albeit still lower than in the relevant unimodal jittered condition. Thus, participants were able to avoid some of the effect of the jitter by relying more on the isochronous stimuli in the other modality, but not as effectively as would be predicted on the basis of optimal integration. This deviation from predictions was attributed to jitter-generated asynchronies between bimodal stimuli, whose magnitude may often have exceeded the optimal sensory integration window.
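
The optimal-integration prediction invoked in these studies can be stated compactly. The following sketch (with hypothetical variances, not values from the studies cited) weights each modality by its inverse variance, so that the predicted bimodal variance never exceeds the better unimodal variance:

```python
# Sketch of the optimal (inverse-variance-weighted) integration prediction
# (Ernst & Bülthoff, 2004). The SD values below are hypothetical.
def bimodal_variance(var_a: float, var_b: float) -> float:
    # Optimal cue combination: var_bi = var_a * var_b / (var_a + var_b)
    return var_a * var_b / (var_a + var_b)

sd_auditory, sd_tactile = 20.0, 24.0                 # hypothetical SD asy (ms)
var_bi = bimodal_variance(sd_auditory**2, sd_tactile**2)
print(round(var_bi**0.5, 1))  # ~15.4 ms, below either unimodal SD
```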

Tapping with a metrical beat

In this section, we review studies in which tapping was used primarily to indicate the most salient beat (tactus) of a rhythm. This task can be seen as a mixture of synchronized and self-paced tapping. In a nonisochronous rhythm, common in music, the beat may not always be marked explicitly by event onsets in the stimulus; such a rhythm is called syncopated. In general, the beat of a rhythm is never represented unambiguously in the sound pattern, but has to be determined by the participant, by the experimenter’s instructions, by preceding context, or by music notation (time signature). While taps must always be temporally coordinated with the external rhythm, they are synchronized with an internal periodic process that marks the beat. Consequently, taps are also synchronized with external events that happen to coincide with the beat.

Tapping with induced or imposed beats

Snyder, Hannon, Large, and Christiansen (2006) presented isochronous melodies with pitch patterns that favored either a 2–2–3 or a 3–2–2 grouping of notes—hence, a nonisochronous beat. Participants were required to tap in synchrony with a 2–2–3 or 3–2–2 drumbeat that initially accompanied the melodies, and then to continue tapping the same beat pattern, in synchrony with the melody if it continued. Participants systematically distorted the 2:3 interval ratio in the direction of 1:2, which affected the asynchronies (see section 1.1.1). Ratio production was more accurate, but SD asy was larger, in the 3–2–2 than in the 2–2–3 tapping pattern. A mismatch of melodic and drumbeat patterns increased variability, but only when 3–2–2 was the pattern being tapped. After the drumbeat stopped, the variability of ITIs was greater when the melody continued than when it did not, probably due to the phase correction required to stay in synchrony.

Fitch and Rosenfeld (2007) used nonisochronous rhythms varying in degree of syncopation. Participants had to tap along with an isochronous induction beat and then to maintain the beat in coordination with the rhythm after the induction beat stopped. As expected, measures of performance accuracy decreased as the degree of syncopation increased, regardless of tempo. Highly syncopated rhythms often made participants shift the phase of their beat, because this made the rhythms less syncopated. Using a similar synchronization–continuation paradigm, Repp, Iversen, and Patel (2008) presented highly trained musicians with rhythms adapted from Povel and Essens (1985), some of which strongly induced the feeling of a beat with an 800-ms period. An 800-ms induction beat was initially superimposed in one of four possible phases relative to the rhythm. Surprisingly, SD asy (relative to the imposed beat) tended to be lowest when the imposed beat was in antiphase with the favored beat, perhaps because the tones marking the favored beat served as effective subdivisions. However, when participants were instructed to tap in antiphase with the imposed beat, variability tended to be lowest when the imposed beat was in phase with the favored beat.

Rankin, Large, and Fink (2009) asked participants with limited music training to tap with the beat of piano music by Bach and Chopin at two designated metrical levels. The music was played either metronomically or with expressive timing. A cross-correlation analysis of IOIs and ITIs suggested that the participants were actively predicting the expressive timing in at least one of the two pieces. The expressive timing (IOI) patterns were found to exhibit fractal properties that were also reflected in the ITIs.Footnote 17

Finding the beat of music

The study by Repp, Iversen, and Patel (2008) included a beat-finding task in which participants listened to the rhythms and started tapping with their preferred beat as soon as they had decided on it. While the beat favored by the temporal structure was often chosen in strongly beat-inducing rhythms, there was also a bias to select a beat that started in phase with the first tone of the rhythm. Su and Pöppel (2012) investigated whether moving along with a rhythm facilitates the discovery of its beat. Musicians and nonmusicians were presented with nonisochronous rhythms, with the task being to discover their beat. Half of the participants were told to sit still while listening, whereas the other half was instructed to move along in any way that they liked as soon as the sequence started. When participants felt that they had found the beat, they were to start tapping it in synchrony with the continuing rhythm. While musicians performed equally well in both conditions, nonmusicians who moved were able to find a stable beat on 80 % of trials, whereas those who sat still found a beat on only 30 % of trials. Movement also reduced the time that it took the nonmusicians to find a beat.Footnote 18 The authors concluded that nonmusicians “seemed to be lacking an effective internal motor simulation that entrained to the pulse when it was not regularly present at the rhythmic surface” (p. 379).

Choosing a preferred beat (tactus) can be considered tantamount to judging the tempo of a musical rhythm. McKinney and Moelants (2006) asked participants with varying degrees of music training to tap with the beat of musical excerpts drawn from ten different genres. Martens (2011) conducted a similar study with musical excerpts taken from classical music recordings. In both studies, a resonance model (van Noorden & Moelants, 1999) did not account well for the distribution of chosen beat tempi. McKinney and Moelants found that tempo choices varied over a wide range, even for the same excerpt, and were genre-dependent: Classical music was more often associated with slow beats, whereas metal/punk music elicited fast beats. Acoustic analysis of the materials indicated that periodic dynamic accents were often responsible for a choice of beat outside the resonance curve, especially if it was a slow beat. Martens found little relation between spontaneous tapping tempo and the chosen tactus. He distinguished three groups of participants on the basis of their preferred beat level(s): “surface tappers” (often nonmusicians), who generally tap with the fastest pulse in the music and sometimes fail to synchronize; “variable tappers,” who choose beats of various rates; and “deep tappers,” who most often tap with a slow metrical level. Participants’ exposure to different musical styles may have been responsible for these different preferences. Moelants (2010) also used a tapping task to investigate the metrical ambiguity of musical excerpts (binary vs. ternary meter).
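
For orientation, the resonance model referred to in both studies predicts a preference curve over beat tempi that peaks near 2 Hz. The sketch below uses a generic damped-oscillator response purely for illustration; the exact formulation and parameters of van Noorden and Moelants (1999) differ in detail:

```python
import numpy as np

# Illustration only: a generic damped-oscillator resonance curve with a
# natural frequency f0 near 2 Hz (~120 beats per minute). The actual curve
# of van Noorden and Moelants (1999) differs in detail.
def resonance(f: np.ndarray, f0: float = 2.0, beta: float = 1.0) -> np.ndarray:
    return 1.0 / np.sqrt((f0**2 - f**2)**2 + beta * f**2)

beat_freqs = np.linspace(0.5, 4.0, 8)      # candidate beat tempi in Hz
print(np.round(resonance(beat_freqs), 2))  # peaks near 2 Hz among these values
```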

Madison and Paulin (2010) and London (2011) pursued the alternative idea that music has a subjective speed that is not necessarily identical with the tempo of the chosen tactus. Madison and Paulin asked listeners to rate the perceived speed of musical excerpts whose tempo had first been “measured” by having two individuals tap with the perceived beat.Footnote 19 The speed ratings indeed often deviated from the measured tempo. In one of several experiments, London asked participants to tap along with the perceived beat of artificially constructed simple rhythms, and found this not to have any effect on subjective speed ratings. In a commentary on London’s study, Repp (2011b) suggested that participants probably tapped with the beat that constituted the basis for their speed ratings; he argued against the idea that music has a speed independent of metrical structure, and in favor of a particular metrical level serving as the indicator of tempo, more in accord with McKinney and Moelants (2006) and Martens (2011). Because multiple metrical levels could serve as the tactus, musical tempo may often be ambiguous, and this ambiguity may be reflected in speed ratings.

The “groove” of music

Two studies not involving movement have been concerned with the properties of music that make listeners want to move in synchrony with it. Madison (2006) asked nonmusicians to rate a large number of musical excerpts from different genres on 14 scales, one of which was labeled “groove,” defined as “inducing movement.” A factor analysis yielded four factors, one of which was defined by high loadings of the “groove” and “driving” scales. In a further attempt to determine what musical characteristics might predict nonmusicians’ subjective ratings of groove, Madison, Gouyon, Ullén, and Hörnström (2012) conducted detailed acoustic analyses of a large number of musical excerpts from five different genres. The predictors varied with genre, but beat salience and event density were the best (positive) predictors overall. Interestingly, deviations from temporal regularity (expressive timing) had no impact on groove ratings, nor did beat tempo (as determined by two of the authors tapping along). For jazz, no significant predictors of groove emerged.

Janata, Tomic, and Haberman (2012) conducted an extensive study of groove that included SMS. Groove ratings of musical excerpts were consistent across participants, varied across genres, were higher for fast than for slow music (tempo being here determined by an automatic algorithm; Tomic & Janata, 2008), and were highly correlated with enjoyment ratings. When participants tapped along, they reported feeling more “in the groove” and found tapping easier with high- than with low-groove excerpts, chosen according to the earlier ratings. When participants were instructed to sit still during the music, they exhibited more spontaneous body movement (especially of feet and head) when listening to high-groove music. Application of the resonator model of Tomic and Janata indicated that sensorimotor coupling strength was higher with high- than with mid- or low-groove music.

Finding a conductor’s beat

A series of studies involving SMS has investigated where in the trajectory of a conductor’s movement the beat is located. Luck and Toiviainen (2006) recorded the gestures of a student conducting an ensemble with a baton, and found that the ensemble’s playing was most closely related to points of maximal deceleration and to points of high vertical velocity, both of which precede the lowest point of the trajectory. However, the ensemble lagged behind those points, and thus came closer to actually synchronizing with the lowest point.

Luck and Nte (2008) recorded single-beat manual conducting gestures and then displayed the trajectories of the fingertip marker on a screen. Conductors, other musicians, and nonmusicians were asked to press a key in synchrony with the perceived single beat. Conductors were most consistent in locating the beat along the trajectory, while there was no difference between the other two groups. Using three-beat conducting patterns, Luck and Sloboda (2009) subsequently showed that acceleration was the best predictor of the perceived beat location, more so when the radius of curvature was small and the tempo fast. However, they could not determine whether participants anticipated or lagged behind the perceived beat, as there was no point in the trajectory that represented the beat in any objective sense. Luck and Sloboda (2008) further demonstrated that participants tended to locate the beat at points of high deceleration or acceleration when the radius was small, whereas they tended to prefer points of high velocity when the radius was large. When the trajectories were straightened, thereby eliminating curvature as a factor, velocity was the strongest predictor, though acceleration made a contribution as well. When the original trajectories were maintained but velocity was held constant, radius and its derivative did not predict participants’ responses at all.

Wöllner, Deconinck, Parkinson, Hove, and Keller (2012) investigated whether it is easier to synchronize taps with a prototypical conducting beat pattern than with individual conductors’ patterns. Point-light displays derived from 12 conductors’ recorded movements, performed in synchrony with an auditory metronome, were morphed together to form a grand average pattern, as well as separate averages for more and less experienced conductors. The average patterns were significantly smoother than the individual patterns. The circular variance of participants’ taps was smaller with the average patterns than with individual conductors’ patterns, and the mean asynchrony with the original metronome beat was smallest with the grand average and with experienced conductors’ average patterns.
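
For readers unfamiliar with the circular measures used here and elsewhere in this review (cf. Fisher, 1993), the following sketch shows one standard computation: asynchronies are mapped to phase angles on the pacing cycle, and the circular variance is one minus the length of the mean resultant vector. The asynchrony values below are hypothetical:

```python
import numpy as np

# Sketch of circular variance for tap asynchronies (cf. Fisher, 1993).
# Asynchronies are mapped to angles on the pacing cycle; circular variance
# is 1 - R, where R is the length of the mean resultant vector.
def circular_variance(asyn_ms: np.ndarray, ioi_ms: float) -> float:
    angles = 2.0 * np.pi * asyn_ms / ioi_ms
    resultant_length = np.abs(np.mean(np.exp(1j * angles)))
    return 1.0 - resultant_length

asyn = np.array([-30.0, -12.0, -45.0, 5.0, -20.0])  # hypothetical taps (ms)
print(circular_variance(asyn, ioi_ms=800.0))        # near 0 = tight phase locking
```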

Tapping with an external rhythm: what have we learned since 2005?

Although at the time of R05 a good deal was known about SMS, primarily from tapping studies, the recent research reviewed in Part 1 of this article extends this knowledge in several important ways. Linear models of error correction have been consolidated and equipped with an improved parameter estimation method. Some nonlinear models have been proposed and await further validation. The behavior of the PCR as a function of experimental variables has been mapped out in considerable detail, with one study suggesting that the response to local phase perturbations is both quantitatively and qualitatively different from the phase correction that occurs in SMS with isochronous or continuously perturbed sequences. Moving visual stimuli have been shown to be far superior to static flashes as rhythmic pacers of SMS. Results have demonstrated that pacing and “feedback” stimuli have similar behavioral effects, and that it is difficult to tell them apart subjectively. The tendency of music to elicit movement has been investigated thoroughly, and beginnings have been made in understanding how a conductor conveys the beat to an orchestra. We also have learned more about many other topics, including the development and impairment of SMS, its stability in old age, the beneficial effects of music training, the dependence of variability on interval duration, the possibility of anticipatory phase correction, multimodal integration in SMS, and the discovery and maintenance of the beat of a rhythm. One issue on which there has been little progress, however, is finding an explanation of the NMA. The perceptual basis of error correction is also not yet fully understood.

Moving continuously with an external rhythm

In this section, we review research in which various forms of continuous periodic movement have been synchronized with an external rhythm, usually a metronome or the beat of music. The wide availability of motion capture technology has led to a sharp increase in the number of studies in this area. As this kind of research was not covered in R05, a few older references are included.

Limb movements

Event-based versus emergent timing

Torre and Delignières (2008a) compared asynchrony and ITI time series obtained, in very long trials, from two tasks performed in synchrony with an auditory metronome (IOI = 500 ms): tapping versus oscillating a joystick in the frontal plane. The joystick oscillation involved a forearm rotation in which each maximal pronation was to be synchronized with a metronome beat; thus, the oscillation was rather fast. The asynchronies in both tasks exhibited positive autocorrelations that decayed slowly as a function of lag, indicative of fractal noise, but they were larger and decayed more slowly for joystick oscillation than for tapping (see Fig. 2). Tapping showed a negative lag-1 autocorrelation of the ITIs, whereas oscillation did not. (See also Lorås et al., 2012, where the oscillation was a circular hand movement.) The log power spectrum of the ITIs increased linearly as a function of log frequency, indicating a preponderance of high frequencies (at least in part due to phase correction), whereas that of the oscillation periods increased only up to a point, and then stayed flat or decreased slightly at higher frequencies (see Fig. 2). The authors attributed these different patterns to different modes of timing control: event-based versus emergent. According to the well-known event-based timing model of Wing and Kristofferson (1973), which also applies to SMS (Vorberg & Schulze, 2002; Vorberg & Wing, 1996), each ITI contains the difference between two variable motor delays associated with the delimiting taps. This “differenced noise” causes rapid variation of ITI durations and a negative correlation of the adjacent ITIs, which is further increased by phase correction. By contrast, the variability of emergent timing arises in a continuous dynamic process of movement control, and thus represents simple (not differenced) noise, resulting in a flat or decreasing high-frequency spectrum.
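
The differenced-noise argument can be made concrete with a brief simulation of the Wing–Kristofferson model (the standard deviations below are hypothetical): because each motor delay enters two adjacent intervals with opposite signs, the ITIs acquire a negative lag-1 autocorrelation, whereas simple, undifferenced noise would produce none:

```python
import numpy as np

# Simulation of the Wing-Kristofferson (1973) event-based timing model:
# ITI(n) = C(n) + M(n+1) - M(n), with timekeeper intervals C and motor
# delays M. The differenced motor noise yields the negative lag-1
# autocorrelation characteristic of tapping. SD values are hypothetical.
rng = np.random.default_rng(1)
n, sd_c, sd_m = 20_000, 10.0, 5.0
C = 500.0 + sd_c * rng.standard_normal(n)   # timekeeper intervals (ms)
M = sd_m * rng.standard_normal(n + 1)       # motor delay at each tap (ms)
iti = C + np.diff(M)                        # intertap intervals
lag1 = np.corrcoef(iti[:-1], iti[1:])[0, 1]
print(lag1)  # close to -sd_m**2 / (sd_c**2 + 2 * sd_m**2) = -1/6
```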

Fig. 2 Power spectra and autocorrelation functions for periods (PER) and asynchronies (ASYN) of tapping and joystick oscillation in synchrony with a metronome (IOI = 500 ms). Numbers in the panels indicate slopes of the linear fits. The periods show antipersistent correlations (positive slopes) in the low-frequency range for both tapping and oscillation, but in the high-frequency range for tapping only; this is attributed to “differenced noise.” Correspondingly, tapping shows a negative lag-1 autocorrelation, but oscillation does not. The asynchronies show persistent correlations (negative slopes and positive autocorrelations)—that is, fractal properties—for both tapping and oscillation, though they are more pronounced in the latter task. From “Distinct Ways of Timing Movements in Bimanual Coordination Tasks: Contribution of Serial Correlation Analysis and Implications for Modeling,” by K. Torre and D. Delignières, 2008, Acta Psychologica, 129, p. 289. Copyright 2008 by Elsevier. Reprinted with permission.
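
The spectral slopes reported in the figure are obtained by fitting a straight line to log power as a function of log frequency. Here is a minimal sketch of such an estimate (it omits the windowing and detrending choices that the published analyses handle more carefully):

```python
import numpy as np

# Minimal sketch of log-log spectral slope estimation. A negative slope
# indicates persistent (fractal) correlations; a positive slope indicates
# antipersistence; white noise gives a slope near zero.
def spectral_slope(series: np.ndarray) -> float:
    x = series - series.mean()
    power = np.abs(np.fft.rfft(x)) ** 2
    freq = np.fft.rfftfreq(x.size)
    keep = freq > 0                           # discard the DC component
    return float(np.polyfit(np.log(freq[keep]), np.log(power[keep]), 1)[0])

rng = np.random.default_rng(0)
print(spectral_slope(rng.standard_normal(4096)))             # ~0: white noise
print(spectral_slope(np.cumsum(rng.standard_normal(4096))))  # ~-2: random walk
```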

Torre and Balasubramaniam (2009) further examined joystick oscillation with regard to phase correction. They divided each oscillation cycle into two halves: “to” and “away from” the metronome beat. In contrast to their findings for tapping (see section 1.2.4), oscillation asynchronies were negatively correlated with the duration of the next “to” phase, but not with that of the immediately following “away from” phase. This indicated that phase correction is implemented late in the oscillatory cycle, whereas it is evident early in the tapping cycle. The authors attributed these differences to different control regimes: continuous sensorimotor coupling in oscillation versus discrete error correction in tapping.Footnote 20 Asynchronies were more variable in oscillation than in tapping, which suggests less effective phase correction. Further analyses and a coupled-oscillators model of SMS were presented in Torre, Balasubramaniam, and Delignières (2010).

Finger, hand, knee, and whole-body oscillations

Lee (1998) proposed a theory of temporal movement control according to which SMS is like repeated temporal interception: Each movement closes a “motion gap” by being coupled to a quantity (tau) that indexes the time remaining before the next target and serves as a guide for movement timing. In the case of synchronizing with a metronome, tau is internally generated and the action is “intrinsically guided,” whereas in the case of synchronizing with a continuously varying stimulus, tau is provided explicitly and the action is “extrinsically guided.” Inspired by Lee’s theory, Rodger and Craig (2011) compared SMS with an auditory metronome to SMS with auditory stimuli that increased continuously in either pitch or amplitude but periodically reset themselves to the starting values, with IOIs of 1, 2.5, or 4 s. The rhythmic action was quasi-continuous and involved moving the finger horizontally (with brief contact) between two barriers that were 20 or 70 cm apart. As compared to SMS with a simple metronome, synchronization with resetting sounds led to a smaller SD asy that also increased less steeply with IOI duration, as well as to smoother finger movement. The most striking finding, though, was that the NMA with the resetting stimuli increased dramatically with IOI duration, leading to anticipation of the resetting points by as much as 350 ms. The authors attributed this to underestimation of the time-to-arrival of the “looming” sounds, while no such anticipation was observed in synchronizing with a metronome, nor was there evidence of a reactive strategy (see section 1.1.3).
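
For reference, the core quantities of Lee's theory can be written out; the notation below is generic and assumed here, since the studies above do not reproduce the equations:

```latex
% Lee's (1998) tau of a motion gap x(t), up to sign conventions:
\tau(t) = \frac{x(t)}{\dot{x}(t)}
% Tau coupling: the movement's tau is kept proportional to a guiding tau,
% whether intrinsically generated (metronome) or extrinsically provided:
\tau_{\mathrm{movement}}(t) = k \, \tau_{\mathrm{guide}}(t)
```

Here tau estimates the time remaining until the gap closes, and the constant k determines how closely the movement is yoked to the guide.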

Varlet, Marin, Issartel, Schmidt, and Bardy (2012) investigated SMS of unimanual rigid pendulum-swinging with auditory and visual stimuli that were either discrete (tone, flash) or continuous (frequency-modulated, fading between two colors) and either unimodal or bimodal (in phase), at three frequencies (0.5–1 Hz). Participants’ wrist adductions lagged behind the discrete stimuli but led the continuous ones. The circular variance was lowest with continuous and highest with discrete visual stimuli; no difference was found between discrete and continuous auditory stimuli. These results show that continuous visual stimuli without a spatial component can also be effective pacers, at least for continuous movement. Variability tended to be lower in the bimodal than in the unimodal conditions, and relative phases were intermediate between those found with the unimodal component stimuli; there was no indication of auditory dominance (see section 1.4.2). Interestingly, participants’ movement cycles showed a significant negative lag-1 autocorrelation at the slowest tempo (0.5 Hz), suggesting event-based timing (see section 2.1.1).

When asked to synchronize taps or hand movements with a metronome, participants typically prefer to move downward on the beat. Lagarde and Kelso (2006) had participants move a finger rhythmically in the air, such that flexion occurred in phase with an auditory metronome and extension occurred in phase with tactile stimuli that were presented in antiphase with the metronome. When the frequency was increased, switches to the opposite coordination mode were observed, indicating that flexion to auditory stimuli (and/or extension to tactile stimuli) was the preferred coordination mode.

In an SMS study of skilled dancers and athletes bending their knees in synchrony with a metronome, Miura, Kudo, Ohtsuki, and Kanehisa (2011) found that synchronizing the downward movement with the beat was easier and more stable than synchronizing the upward movement. Carson, Oytam, and Riek (2009) investigated whether the preference for downward movement is due to the aid of gravity. With the forearm in either a supine or prone position, participants were instructed to move their hand either up or down in synchrony with an auditory metronome, and the effect of gravity was manipulated by means of a robotic device that made it possible to invert the gravitational forces on the hand. The results showed that gravity stabilized SMS regardless of movement direction, consistent with the hypothesis that reduced force requirements result in lower variability.

Demos, Chaffin, and Marsh (2010) reported a study in which participants sitting in a rocking chair were instructed to rock with the beat of music whose tempo varied from 60 to 80 beats per minute, or to ignore the music. The musical tempo affected the rocking tempo in both conditions, but evidence of entrainment was found only in the intentional condition (see also section 3.2.2).

Visuomotor tracking

Visuomotor tracking is a task in which an oscillatory movement is coupled to an oscillating visual target. Roerdink, Peper, and Beek (2005) asked participants to move a lever by flexing and extending their wrist in phase or in antiphase with a target varying in frequency. The authors found that antiphase tracking became unstable at a lower frequency (1.74 Hz on average) than did in-phase tracking (2.59 Hz) and tended to switch to in-phase tracking at faster tempi. However, when visual feedback was provided by a second visual stimulus moving in antiphase with the lever, antiphase tracking was stable for a longer time and switched to in-phase tracking much less frequently. Participants’ gaze tended to become fixed as target frequency increased, and fixation on one of the target endpoints reduced the spatial variability of the movement, creating an “anchor point” for coordination. In a follow-up study, Roerdink, Ophoff, Peper, and Beek (2008) manipulated gaze direction (left or right) and wrist posture (relatively flexed or extended) and found that these variables had independent effects on spatial endpoint variability, but only wrist posture affected the temporal endpoint variability (analogous to SD asy) of the movement. Ceux, Montagne, and Buekers (2010) manipulated visual feedback while participants moved a lever back and forth in synchrony with a horizontally oscillating visual target in one of three modes: in phase, antiphase, and 90 deg out of phase. Visual feedback was delayed by various amounts, thus creating a conflict between visual and proprioceptive feedback. The effects depended on the coordination mode, with better performance when the target and feedback stimuli moved in the same direction.

Schmidt, Richardson, Arsenault, and Galantucci (2007) investigated the role of eye movements (visual tracking) in visuomotor tracking. Participants either had to swing a rigid pendulum at their most comfortable frequency or to synchronize their pendulum swings with an oscillating target, either in phase or in antiphase. Visual tracking of the target, as compared to stationary fixation, led to unintentional intermittent entrainment in the first condition and lowered the variability of intentional entrainment in the second condition. Lopresti-Goodman, Silva, Richardson, and Schmidt (2008) subsequently showed that the strength of unintentional entrainment depends on how similar the preferred period of the pendulum movement is to the period of the oscillatory target. In visuomotor tasks requiring rhythmic arm movement instead of pendulum swinging, increasing the amplitude of an oscillating target increases the tendency to unintentionally entrain arm movements to it (Varlet, Coey, Schmidt, & Richardson, 2012). Furthermore, Romero, Coey, Schmidt, and Richardson (2012) demonstrated that spontaneous entrainment of horizontal or vertical rhythmic arm movements to an orthogonally oscillating visual stimulus—evident in deviations from the instructed movement trajectory—occurred only when participants had to track the stimulus visually.

To investigate visuomotor tracking of an irregularly oscillating target, Stepp (2009) asked participants to track a cursor that moved in a chaotically varying elliptical trajectory. Participants moved a stylus on a tablet, and delays were introduced in the visual feedback of the stylus position on the screen. With delays of 200–400 ms, participants’ stylus movements anticipated the cursor, yet achieved good synchrony. This was considered an instance of “anticipatory synchronization” or “strong anticipation” (Stepp & Turvey, 2010; see section 1.2.3).

Eye movements

When two horizontally separated fixation targets are presented alternately at increasing frequencies, saccades switch at some point from being reactive (with latencies of 150–220 ms) to being predictive, showing much reduced latencies, or even anticipation. Shelhamer and Joiner (2003) increased and decreased the alternation frequency in steps between 0.2 and 1 Hz and found that the switch occurred around 0.5 Hz (IOIs of 1 s), as evidenced by increased variability near that frequency, indicating a mixture of reaction (tracking) and prediction. The switching point also depended on the direction of frequency change.Footnote 21 The power spectrum of the asynchronies for reactive saccades was rather flat, indicating random variation, whereas that for predictive saccades decreased toward higher frequencies according to a power law, indicating fractal noise and positive long-term dependencies. The slope of the spectral plot seemed closer to that for tapping than for joystick oscillation (Torre & Delignières, 2008a), and indeed, predictive saccades are not smoothly oscillatory, as the gaze comes to dwell on the target after a very quick eye movement. Shelhamer (2005) determined that significant autocorrelations of predictive saccade asynchronies extend backward through a time window of about 2 s, regardless of the saccade frequency. It is not clear whether saccade timing was event-based or emergent because the intersaccade intervals were not analyzed in these studies.

Joiner, Lee, Lasker, and Shelhamer (2007) demonstrated that predictive saccades can also be entrained by an auditory metronome with IOIs of 500–1,000 ms while two fixation targets remain continuously visible. The distribution of asynchronies was indistinguishable from that obtained with visual pacing by alternating targets. Zorn, Joiner, Lasker, and Shelhamer (2007) showed that presentation of alternating visual targets while participants fixated a stationary central target accelerated the development of predictive saccades after the central target had been removed.Footnote 22 Wong and Shelhamer (2011) investigated how predictive saccades correct spatial errors, both continuously and in response to a perturbation (a shift in target location). Although spatial processes are beyond the scope of this review, the close analogy to phase correction in the temporal domain should be noted. Richardson and Balasubramaniam (2010) found that the way that saccades were entrained during a synchronization phase (by alternating targets, continuous pursuit, or discontinuous pursuit) affected the variability of their timing during a continuation phase, in which the gaze alternated between fixed spatial targets at the same tempo. Wing–Kristofferson decomposition of the timing variance into clock and motor components showed that this persistence was located in the clock component.
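
The decomposition used by Richardson and Balasubramaniam follows from the covariance structure of the Wing–Kristofferson model; a sketch of the standard moment estimator is given below (negative variance estimates, which can occur in noisy data, are clipped to zero here):

```python
import numpy as np

# Sketch of the Wing-Kristofferson decomposition of continuation-interval
# variance into clock and motor components. Under the model, the lag-1
# autocovariance of the intervals equals -var_motor, so:
#   var_motor = -acov(1),  var_clock = var(interval) - 2 * var_motor.
def wk_decompose(intervals: np.ndarray) -> tuple[float, float]:
    x = intervals - intervals.mean()
    acov1 = np.mean(x[:-1] * x[1:])
    var_motor = max(-acov1, 0.0)        # clip negative estimates to zero
    var_clock = x.var() - 2.0 * var_motor
    return var_clock, var_motor

# Usage: var_clock, var_motor = wk_decompose(np.diff(tap_times))
```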

Circle drawing

Self-paced tapping and circle drawing are paragons of event-based and emergent timing, respectively (see Huys, Studenka, Rheaume, Zelaznik, & Jirsa, 2008; Zelaznik, Spencer, & Ivry, 2008; see also section 2.1.1). Studenka and Zelaznik (2011a) were the first to closely examine SMS of circle drawing with an auditory metronome (IOI = 500 ms). Participants moved an index finger along the perimeter of a circle template (7-cm diameter) and had to pass through the top of the circle (marked by a dot) in synchrony with the metronome. Even though the trials were short, substantial phase drift occurred, with participants typically drawing too fast. This suggested poor or even absent phase correction. Studenka and Zelaznik (2011b) introduced a phase shift into the metronome, which elicited only a very small PCR (see section 1.2). Even four cycles later, phase correction was still far from complete, whereas in a tapping condition the PCR was large and phase correction was complete within a few taps. However, phase correction in circle drawing improved if participants received tactile feedback whenever they passed through the top of the circle, whereas phase correction in tapping deteriorated when participants tapped in the air. The authors concluded that discrete sensory information about the target point aids SMS.

Repp and Steinman (2010) also investigated SMS of tapping and circle drawing with a metronome but used somewhat different methods (longer trials, slower tempi, a smaller circle template, and a different target point) and musicians as participants.Footnote 23 Although the SD asy of circle drawing was about twice as large as that of tapping, the participants were able to maintain approximate synchrony with the metronome in circle drawing, phase wrapping being rare. SD asy was smaller at the designated target point than at other cardinal points of the circle. In circle drawing, the autocorrelations of the asynchronies decreased from a large positive value at lag 1 to zero within lags of 5–8 cycles, whereas tapping showed a (much smaller) positive autocorrelation only at lag 1. In response to a phase shift in the metronome, a substantial mean PCR occurred in circle drawing, though it was smaller than in tapping. Repp and Steinman also investigated simultaneous tapping and circle drawing, carried out with different hands first in synchrony with a metronome and then in a self-paced manner. The two tasks had relatively little effect on each other, and they remained synchronous with each other during continuation. Moreover, performing the two tasks simultaneously did not affect each task’s mean PCR following a perturbation in the metronome. Phase correction became evident in the circle drawing trajectory 150–200 ms after the perturbation (see section 1.2 for an analogous result for tapping) and increased nonlinearly throughout the movement cycle. Repp and Steinman argued that event-based and emergent timing not only can occur simultaneously in synchronous tapping and circle drawing but might even occur simultaneously in a single task, such as tapping at a fast tempo or circle drawing with discrete feedback. In a commentary, Delignières and Torre (2011) disagreed strongly, arguing that the two control modes are by definition mutually exclusive: Participants might switch between the two modes within a trial but never could engage them simultaneously.Footnote 24
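
The contrast between rapid phase correction in tapping and sluggish correction in circle drawing can be pictured with the standard first-order correction scheme (a sketch only; the alpha values are hypothetical, and the actual circle-drawing response unfolded nonlinearly within the cycle, as noted above):

```python
# Sketch: asynchronies after a metronome phase shift S under first-order
# phase correction, asyn(n+1) = (1 - alpha) * asyn(n), starting at -S.
# The alpha values are hypothetical; larger alpha means faster recovery.
def asyn_after_shift(S: float, alpha: float, n: int) -> list[float]:
    asyn = [-S]
    for _ in range(n - 1):
        asyn.append((1.0 - alpha) * asyn[-1])
    return asyn

print([round(a) for a in asyn_after_shift(50.0, 0.8, 6)])  # tapping-like: fast
print([round(a) for a in asyn_after_shift(50.0, 0.2, 6)])  # drawing-like: slow
```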

In another study (Repp, 2011a), musicians drew circles and ∞ shapes in synchrony with a metronome at four tempi (IOIs of 400–1,300 ms), choosing their own shape diameters and target points. Synchronization was maintained in the large majority of trials. In comparison to tapping (Repp et al., 2012), SD asy was much larger and increased more steeply with IOI duration, and autocorrelations of asynchronies were substantially larger and more persistent. Nevertheless, phase correction in circle drawing took only 2–4 cycles to be completed, which represents much better performance than was observed by Studenka and Zelaznik (2011b).

Lorås et al. (2012) had participants rotate a wooden disc with a handle so as to keep the 12 o’clock position in synchrony with auditory or visual pacing stimuli at IOIs ranging from 500 to 950 ms. These authors likewise did not note any synchronization difficulties, and even at the fastest tempo variability was not elevated. However, SD asy did not vary much with IOI, which indicates an increase in relative variability at faster tempi. In general, sensorimotor coupling with a metronome is weaker for continuous drawing or rotating movements than for tapping.

Walking

The interstride intervals of self-paced walking, like the ITIs of self-paced tapping, have been found to exhibit fractal noise, which disappears when walking is paced by a metronome (Hausdorff et al., 1996). However, Delignières and Torre (2009) showed that fractal noise is still present in the time series of asynchronies. The spectral power plots were quite similar to those found for joystick oscillation in synchrony with a metronome (see section 2.1.1). The results were consistent with a nonlinear dynamical model of gait (West & Scafetta, 2003).

The potential benefits of auditory pacing in the rehabilitation of patients with walking difficulties have long been recognized (see Thaut, 2005). For example, Hausdorff et al. (2007) showed that auditory pacing reduces gait variability in patients with Parkinson’s disease. These patients’ interstride intervals are generally more variable and contain less fractal noise than do those of normal controls (Hausdorff, 2009). Hove, Suzuki, Uchitomi, Orimo, and Miyake (2012) asked Parkinson patients to walk without a pacing signal, with a metronome, and with an interactive “Walk-Mate” (Miyake, 2009), similar to an adaptively timed metronome (see section 1.2). The Walk-Mate, but not the isochronous metronome, raised the patients’ fractal noise to almost normal levels. Notably, participants in this study were not explicitly instructed to synchronize their steps with the pacing sounds, and often did not synchronize with the metronome, although they all synchronized with the Walk-Mate because that device also synchronized with them, having been programmed to carry out phase correction.

When stroke patients walked on a treadmill while being paced by an auditory metronome at three tempi, they adjusted to each tempo by increasing their stride frequency and reducing the step time of the paretic limb relative to an unpaced condition, though they were not always synchronizing with the metronome (Roerdink, Lamoth, Kwakkel, van Wieringen, & Beek, 2007). Acoustic pacing thus reduced the spatial and temporal asymmetry between the paretic and nonparetic limbs in walking, whereas varying the treadmill belt speed alone did not. Moreover, patients were more successful in changing their stride frequency with acoustic pacing, whereas they changed mainly their stride length when treadmill speed was varied. Roerdink et al. (2009) subsequently showed that pacing of every step was more effective than pacing of every other step.

H. Y. Chen, Wing, and Pratt (2006) compared stepping in place with heel tapping in normal participants paced by an auditory metronome that contained occasional phase shifts. The PCR was significantly smaller in stepping than in heel tapping. The authors attributed this to the additional balance requirements in stepping, although it could also have reflected the more oscillatory nature of stepping versus the more discrete heel tapping. It should be noted that the PCR to the perturbation occurred on the next step—that is, on the step of the foot that was lifted as the perturbation occurred. No evidence emerged that phase correction was foot-specific. Unlike tapping (see section 1.1.2), phase correction was more rapid after negative than after positive phase shifts.

Using a treadmill, Roerdink et al. (2009) compared stroke patients’ and normal controls’ phase corrections in response to large (±33 %) negative and positive phase shifts in a metronome. Patients often lost synchrony following such a perturbation. When they did make an adequate response, however, they were similar to controls in that they usually responded by making a faster step following a phase advance and a slower step following a phase delay. Pelton, Johannsen, Chen, and Wing (2010) similarly examined stroke patients walking on a treadmill at their most comfortable speed while being paced by a metronome containing smaller (±20 %) phase shifts. In those trials on which adequate phase correction was observed, it took the patients about seven steps to get back into phase. The PCR was significantly smaller with the paretic than with the nonparetic leg, even though these patients showed minimal leg asymmetries when walking on a treadmill.

Arias and Cudeiro (2008) found a simple visual metronome (a flashing light) to be ineffective in improving the walking of Parkinson’s patients, whereas an auditory metronome proved effective when it matched the preferred walking tempo. In another relevant study by Nieuwboer et al. (2007), Parkinson’s patients were exposed to auditory, visual (flashing), and tactile pacing stimuli and then chose a modality for training. Nobody chose the visual stimuli. The auditory (chosen by two thirds of the patients) and tactile training stimuli proved beneficial. However, a much more effective visual pacing stimulus has recently been devised by Bank, Roerdink, and Peper (2011). It consists of alternating light patches that are projected onto a treadmill in front of the walkers (“stepping stones”). Healthy elderly adults walked in synchrony with either an auditory metronome or visual stepping stones, each containing phase shifts, sometimes together with a switch from the metronome to stepping stones, or vice versa. Participants adjusted to perturbations more rapidly (1) with stepping stones than with the metronome, (2) when the metronome switched to stepping stones rather than vice versa, and (3) with phase delays than with phase advances. In this study, the visuospatial stimuli actually proved to be more effective pacing stimuli than an auditory rhythm (see also section 1.4.2), perhaps because they afforded a goal-directed action.

Several studies by Getchell and colleagues mainly concerned whether auditory pacing improves the intrapersonal temporal coordination of walking and clapping in children, though the studies also contain observations about SMS. The variability of SMS decreased as age increased (Getchell, 2007), and children’s adaptation to a tempo change in the metronome also improved with age (Clizbe & Getchell, 2010). Children with developmental coordination disorder (DCD) exhibited greater variability than did a control group, and they often failed to synchronize the two motor activities as the pacing tempo increased (Whitall et al., 2006). Children with dyslexia also performed more poorly than did normal children in this task (Getchell, Mackenzie, & Marmon, 2010).

Walking can also be paced by music. In a study by Styns, van Noorden, Moelants, and Leman (2007), participants with various degrees of music training tried to synchronize their steps to music or to a metronome while walking on an athletic track. The participants most often walked in synchrony with (what the researchers considered to be) the beat of the music, though sometimes they chose half or twice the speed. Some participants who failed to synchronize nevertheless tended to walk faster when the tempo of the music or the metronome was fast. Failures to synchronize were least frequent when the beat frequency was near 2 Hz. This frequency also marked the point beyond which walking speed did not increase with walking tempo: Instead, participants made smaller steps as they stepped faster. The authors hypothesized a “resonance curve” for locomotion that peaks near 2 Hz (MacDougall & Moore, 2005; van Noorden & Moelants, 1999). More recently, van Noorden and Franěk (2012) found little spontaneous entrainment of long-distance walking to music, even when the musical beat was close to participants’ preferred stride frequency. However, faster music nevertheless accelerated walking. Sejdić, Jeffery, Kroonenberg, and Chau (2012) reported that varied music increased the nonstationarity of overground walking, most likely due to intermittent synchronization with the musical beat.

Dancing

Humans

While studies in the literature (e.g., Provasi & Bobin-Bègue, 2003) suggest that young children are not able to synchronize movement with a metronome or musical beat until they are at least 4 years old (see also section 1.1.1), much younger children already show a tendency to move rhythmically (though asynchronously) when they are exposed to music. Zentner and Eerola (2010) found that infants 5–24 months of age exhibited more spontaneous movement when listening to music or to a simple rhythm derived from the music than when hearing recorded speech. The infants also tended to move faster when the tempo of the auditory rhythm was faster, and they smiled more when they moved more and when their movement tempo was closer to the stimulus tempo. Eerola, Luck, and Toiviainen (2006) asked 2- to 4-year-old children to move along with familiar music presented both at the original and at modified tempi and recorded their head movements. Eerola et al. distinguished three groups: hoppers, circlers, and swayers. Autocorrelation analyses revealed evidence of periodic movement but little adaptation to tempo changes in the music. There was no clear evidence of SMS.

To determine how adults prefer to move in synchrony with music, Toiviainen, Luck, and Thompson (2010) instructed participants to move freely to dance music played at different tempi. Several movement dimensions could be identified and were related to different metrical levels of the music: Arm movements tended to be synchronized with faster levels, and body sway and rotation with slower ones. Thus, dancers were able to embody several metrical levels simultaneously. Burger, Thompson, Saarikallio, Luck, and Toiviainen (2010) analyzed the audio signals and found that music with a clear beat (low “fluctuation entropy”) increased dancers’ local movement, whereas music with a strong rhythm (high “low-frequency variation”) made dancers move slower and on the spot. Burger, Thompson, Luck, Saarikallio, and Toiviainen (2011, 2012) further investigated how these acoustic variables affect SMS with different metrical levels. Using a more diverse selection of popular musical materials to dance with, Luck, Saarikallio, Burger, Thompson, and Toiviainen (2010) identified five principal movement components (local movement, global movement, hand flux, head speed, and hand distance) and showed that they varied as a function of musical style and of participants’ personality characteristics. For example, extraversion was positively related to all movement dimensions, while neuroticism was negatively related to four dimensions and positively related only to local movement. Saarikallio, Luck, Burger, Thompson, and Toiviainen (2010) showed that the trait “emotional expressivity” and positive mood were related to the amount and range of dancers’ head and hand movements.

Leman (2007) published an important monograph in which he laid out a detailed theory of embodied music cognition. As stated by Leman and Naveda (2010), “the human body plays an important role as a mediator that couples subjective experiences with the physical environment” (p. 71). Inspired by the theories of Becking (1928/2011), Leman and Naveda proposed that “spatiotemporal reference frames” or “basic gestures” underlie action–perception coupling in dance. These gestures are movement periodicities that correspond to different metrical levels in the music. Naveda and Leman (2009) described in detail how such periodicities can be recovered from motion-capture data by means of so-called periodicity transforms. Naveda and Leman (2010) developed these methods further into a system of topological gesture analysis, illustrated graphically with analyses of the movements of expert and student dancers. The emphasis of this work so far has been on the development of theoretical concepts and analytic methods, but it clearly has great potential for application in more extensive empirical studies of dance.

Van Dyck et al. (2013) recently found that when the bass drum in dance music was made louder, dancers increased their motor activity and entrained better to the musical beat. The loud bass drum in much disco music thus seems to have a functional role, possibly mediated by stimulation of the vestibular system (Todd, Rosengren, & Colebatch, 2008).

Honisch, Roach, and Wing (2009) described an interesting paradigm in which professional dancers synchronized cyclic dance movements with familiar or unfamiliar dance movements portrayed by a stick figure on a screen. Changes in tempo or amplitude were introduced into the display. This research stands at the threshold of studies of interpersonal synchronization (see section 3.1).

Anecdotes of individuals unable to dance to music are abundant, but there are few well-documented cases. Phillips-Silver et al. (2011) identified one young man who proved unable to bounce in synchrony with the beat of dance music. His impairment extended to tapping in synchrony with the beat and to perceptually judging the synchrony of another bouncing individual with the musical beat. Interestingly, however, he was not significantly impaired in bouncing or tapping in synchrony with a metronome or with another individual. Therefore, the authors attributed his impairment to a deficit in perceptual beat extraction or “beat deafness.” Iversen and Patel (2008) developed a test for diagnosing beat deafness, the BAT (Beat Alignment Test). It requires SMS with a metronome and with the beat of musical excerpts, as well as judging whether or not a series of tones superimposed on music coincides with its beat. This test may facilitate the discovery of additional beat-deaf individuals in the future.

Nonhuman animals

At the time that R05 appeared, SMS with external rhythms, including music, seemed to be a uniquely human ability, at least among vertebrates.Footnote 25 This impression was dispelled by the advent of Snowball, the dancing cockatoo, on YouTube. In the posted videos, this bird appeared to synchronize head bobs and, to a lesser extent, leg movements with the beat of a favorite pop song. Patel, Iversen, Bregman, and Schulz (2009b) analyzed video recordings of Snowball’s movements and found that the head bobs, while often not in synchrony with the musical beat, were in synchrony significantly more often than would be expected by chance. Snowball also clearly showed some ability to adjust to changes in tempo of the music. Subsequent analyses revealed that synchronization occurred most frequently near Snowball’s preferred movement frequency, even though that frequency was faster than the original tempo of the familiar song that he liked to dance to (Patel, Iversen, Bregman, & Schulz, 2009a). When dancing to novel songs, Snowball showed little evidence of synchronization but displayed an interesting variety of rhythmic gestures (Jao, Iversen, Patel, Bregman, & Schulz, 2010). Schachner, Brady, Pepperberg, and Hauser (2009) surveyed a large number of YouTube videos of dancing animals and found evidence of significant entrainment to a musical beat in parrots of 14 different species and in one elephant.

Because parrots are vocal mimics, these findings lent impressive support to Patel’s (2008) vocal learning and rhythmic synchronization (VLRS) hypothesis, according to which only animals that are vocal learners might be capable of SMS. In further support of this hypothesis, Hasegawa, Okanoya, Hasegawa, and Seki (2011) successfully trained eight budgerigars (parakeets) to make six successive pecks in synchrony with audiovisual metronomes at a wide range of tempi. However, the birds did not show an NMA and tended to lag behind the stimulus onsets, although at the slower tempi their mean lag was significantly shorter than their reaction time to stimuli occurring at random intervals. This may reflect some degree of anticipation.

Schachner (2010) reported that parrots’ SMS ability has so far only been observed in domestic birds, perhaps because they require human or animal models to learn the skill. Patel, Iversen, Bregman, and Schulz (2009c), in the course of refining the VLRS hypothesis, also mentioned that parrots are able to imitate nonvocal (such as dance) movements, and that, unlike some other vocal learners, they live in large social groups and retain a life-long ability to acquire new vocal patterns. Patel, Iversen, and Schulz (2010) further reported that social factors, such as the presence of a human giving encouragement or dancing along, play a significant role in Snowball’s dancing. Evidence for SMS ability in vocal learners other than parrots is still extremely limited. The necessary and sufficient conditions for nonhuman animals’ ability to synchronize with music remain to be studied more thoroughly, and Patel et al. (2009c) have provided an excellent roadmap for this enterprise.

In contrast to parrots, common pets such as dogs or cats (not vocal learners) are unlikely ever to synchronize movement with a metronome or a musical beat (Schachner, 2010). Our closest nonhuman relatives also seem to have great difficulty synchronizing with a metronome. Zarco, Merchant, Prado, and Mendez (2009) laboriously trained three macaques in a synchronization–continuation button-pushing task paced by auditory or visual metronomes. Although the monkeys were able to match the stimulus tempo during the synchronization phase, they never learned to anticipate the stimuli, but kept reacting to them. However, reaction times were shorter with regular than with random interstimulus intervals (cf. Hasegawa et al., 2011), which may indicate some degree of anticipation. Evidence of primates’ ability to synchronize with external rhythms after extensive training or enculturation may yet emerge but is not predicted by the VLRS hypothesis.Footnote 26

A first challenge to the VLRS hypothesis has appeared recently, however. Cook, Rouse, Wilson, and Reichmuth (2013) successfully trained a female sea lion to synchronize head bobs with an auditory metronome. Reportedly, sea lions are not vocal learners and produce only a few stereotypic vocal sounds. After relatively little training, the animal was able to closely match her bobbing frequency not only to metronomes with different tempi, but also to the beats of two different songs. The relative phase of the head bobs was strongly dependent on tempo, suggesting a preferred bobbing frequency. Nevertheless, this seems to be a first example of a non-vocal-learner being able to synchronize.

Moving continuously with an external rhythm: what have we learned since 2005?

Studies of movement trajectories in the context of SMS are a relatively new development, but they have already yielded several important results. Statistical and behavioral indices of different timing modes for discrete and continuous movements have been identified. Trajectory analyses have revealed how phase correction is implemented in movement kinematics. Sensorimotor coupling is generally weaker with continuous than with discrete movements. Eye movements are similar to other movements, in that they can be entrained to an external rhythm. New and effective auditory and visual pacing stimuli have been devised for gait rehabilitation. Infants and toddlers have been shown to move readily when exposed to auditory rhythms, though not yet synchronously. Adult dancing movements have begun to be studied in detail, and a theory of embodied music cognition has been outlined. A first case of human inability to synchronize with a musical beat has been identified. Conversely, for the first time some nonhuman animals (primarily parrots) have been shown to possess some rudimentary SMS ability. Many exciting opportunities have been revealed for further research on these and related topics.

Interpersonal entrainment

In Parts 1 and 2 of this article, we reviewed laboratory studies in which participants synchronized movements with a machine-controlled stimulus sequence. In real life, however, synchronization with an external rhythm, as in dancing or music performance, usually takes place in a social context where several persons are moving simultaneously. Consequently, mutual entrainment among participants may occur and may not only facilitate entrainment to the external rhythm, but also make the task more enjoyable. Moreover, the external rhythm itself may originate from humans, such as musicians or dancers. Methodological advances have led to a rapid increase of research in interpersonal entrainment, which is reviewed in this section. For theoretical discussions of the concept of entrainment in musical, social, and evolutionary contexts, see Clayton, Sager, and Will (2005), Keller (2008), Merker, Madison, and Eckerdal (2009), Phillips-Silver, Aktipis, and Bryant (2010), Gill (2012), and Phillips-Silver and Keller (2012).

Tapping

Konvalinka, Vuust, Roepstorff, and Frith (2010) were the first researchers in many years to publish a study of individuals tapping in synchrony with each other without visual contact.Footnote 27 Paired participants sat in separate rooms and started synchronizing with the same metronome. After a certain number of taps the metronome stopped, and participants continued tapping in three conditions: with auditory feedback from their own taps only (no coupling), with auditory feedback from one participant’s taps (unidirectional coupling), or with auditory feedback from each other’s taps (bidirectional coupling). In accord with the well-documented automaticity of phase correction (see R05 and section 1.2), the results showed that bidirectionally coupled participants mutually adjusted their ITIs, which was reflected in a negative lag-0 and a positive lag-1 cross-correlation of ITIs, without any evidence of a leader–follower relationship. In the unidirectional-coupling condition, the follower (who heard the leader’s taps) tended to track the leader’s ITIs at a lag of 1. The authors concluded that two coupled tappers form an interactive unit of two “hyper-followers.” Related research has been reported by Himberg (2006, 2008, 2011) and Nowicki (2009; Nowicki, Prinz, Grosjean, Repp, & Keller, 2013), with similar conclusions. Nowicki also found that visual contact had no effect.
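
The cross-correlational signature of mutual adaptation can be reproduced with a toy phase-correction model. The sketch below is generic (it is not Konvalinka et al.'s analysis pipeline), and the period, correction gain, and noise level are arbitrary.

```python
import numpy as np

def lagged_r(a, b, lag):
    """Pearson correlation between a[n] and b[n + lag]."""
    if lag > 0:
        a, b = a[:-lag], b[lag:]
    elif lag < 0:
        a, b = a[-lag:], b[:lag]
    return np.corrcoef(a, b)[0, 1]

rng = np.random.default_rng(1)
T, alpha, n = 0.5, 0.4, 1000          # 500-ms period, correction gain, taps
tA, tB = 0.0, 0.02
itiA, itiB = np.empty(n), np.empty(n)
for k in range(n):
    d = tA - tB                       # current asynchrony between the taps
    itiA[k] = T - alpha * d + rng.normal(0, 0.01)  # A corrects toward B
    itiB[k] = T + alpha * d + rng.normal(0, 0.01)  # B corrects toward A
    tA += itiA[k]
    tB += itiB[k]

for lag in (0, 1):
    print(f"lag {lag}: r = {lagged_r(itiA, itiB, lag):+.2f}")
# Mutual correction yields a negative lag-0 and a positive lag-1
# cross-correlation of ITIs, with no fixed leader.
```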

Kleinspehn (2008; Kleinspehn-Ammerlahn et al., 2011) showed that individual synchronization skill, as assessed by tapping with a metronome, predicted dyadic synchronization accuracy in various mixed-age groups (5–78 years). In addition, young children performed better when paired with an older partner. Participants in dyads with higher synchronization accuracy rated the situation and their partner more positively. Pecenka and Keller (2011b) partially predicted the SD asy of interpersonal SMS in a joint tapping task from a “prediction index” of each individual, derived from a separate task requiring tapping in synchrony with sequences that gradually changed in tempo. (See also Keller, Pecenka, Fairhurst, & Repp, 2012.)

The presence of another person seems to facilitate young children’s ability or intention to synchronize. Using a significant Rayleigh test (Fisher, 1993) as their criterion, Kirschner and Tomasello (2009) showed that children 2.5–4.5 years of age were more likely to spontaneously synchronize with a drumbeat produced by a real person than with one produced by a computer-controlled stick (each in view) or with a recording. The authors emphasize the importance of shared intentionality (Tomasello & Carpenter, 2007) and joint attention (Sebanz, Bekkering, & Knoblich, 2006) in children’s SMS. Katahira (2010) found that adults playing on a drum with a stick were more accurate in synchronizing with a virtual partner (a point-light figure on a screen) when their movements were similar to those of the model. Hove and Risen (2009) demonstrated that interpersonal entrainment can have positive social consequences: Participants liked the experimenter more when they had tapped in synchrony with him. Synchrony in this joint tapping task was manipulated by presenting the same or different visual pacing stimuli to the two individuals. Valdesolo and DeSteno (2011) used a similar manipulation of joint tapping and found that successful synchronization experience increased both the compassion for and the tendency to exhibit helpfulness to the “victim” in a subsequent social game situation.
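
For reference, the Rayleigh test used as a criterion in such studies reduces to a few lines of code. The sketch below applies a standard series approximation to the p value and uses synthetic tap phases rather than real data.

```python
import numpy as np

def rayleigh_test(phases):
    """Rayleigh test for non-uniformity of circular data (in radians).
    Returns the mean resultant length R and an approximate p value;
    a small p indicates that phases cluster around a mean direction."""
    phases = np.asarray(phases, float)
    n = phases.size
    R = np.abs(np.exp(1j * phases).mean())     # mean resultant length
    z = n * R ** 2
    p = np.exp(-z) * (1 + (2 * z - z ** 2) / (4 * n))  # series approximation
    return float(R), float(min(max(p, 0.0), 1.0))

rng = np.random.default_rng(0)
print(rayleigh_test(rng.normal(0, 0.5, 50)))           # clustered: small p
print(rayleigh_test(rng.uniform(-np.pi, np.pi, 50)))   # uniform: large p
```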

Continuous movements

Intentional entrainment

Kelso et al. (2009) presented a new paradigm for systematically exploring the parameters of interpersonal and human–machine rhythmic coordination (see also section 1.2.1). Participants were required to carry out periodic finger movements in phase with an animated finger visible on a screen, which in turn reacted to the participant’s movements according to a coupled-oscillators model (Haken, Kelso, & Bunz, 1985), thus simulating a virtual partner. However, the model was parameterized to be most stable in antiphase with the participant, which created a “conflict of intentions.” Consequently, periods of unstable behavior such as phase wrapping or abrupt phase switches were observed in participants as the movement frequency was increased. Of course, other parameterizations are possible to create more cooperative scenarios.
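
The reduced (relative-phase) form of the HKB model is compact enough to illustrate numerically. The sketch below is not the full virtual-partner implementation, and the parameter values are arbitrary.

```python
import numpy as np

def hkb_relative_phase(phi0, a=1.0, b=0.5, domega=0.0, dt=0.01, steps=5000):
    """Integrate the reduced HKB equation for the relative phase phi:
        dphi/dt = domega - a*sin(phi) - 2*b*sin(2*phi)
    (Haken, Kelso, & Bunz, 1985). For b/a > 0.25 both in-phase (phi = 0)
    and antiphase (phi = pi) are stable; as b/a drops, which models
    faster movement, antiphase loses stability."""
    phi = phi0
    for _ in range(steps):
        phi += dt * (domega - a * np.sin(phi) - 2 * b * np.sin(2 * phi))
    return float((phi + np.pi) % (2 * np.pi) - np.pi)  # wrap to [-pi, pi)

print(hkb_relative_phase(0.3))          # settles near 0 (in-phase)
print(hkb_relative_phase(2.8))          # settles near +/-pi (antiphase)
print(hkb_relative_phase(2.8, b=0.2))   # antiphase unstable: back to 0
```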

The interpersonal coordination of leg oscillation or pendulum swinging has been investigated extensively from a dynamic-systems perspective (Schmidt, Bienvenu, Fitzpatrick, & Amazeen, 1998; Schmidt, Carello, & Turvey, 1990; Schmidt, Christianson, Carello, & Baron, 1994; Schmidt & Turvey, 1994; for theory and reviews, see De Rugy, Salesse, Oullier, & Temprado, 2006; Riley, Richardson, Ramenzoni, & Shockley, 2011; Schmidt, Fitzpatrick, Caron, & Mergeche, 2011; Schmidt & Richardson, 2008). Basically, this research showed that interpersonal coordination follows the same dynamic principles as intrapersonal (e.g., bimanual) coordination, although purely informational (i.e., visual) coupling is generally weaker (Black, Riley, & McCord, 2007; De Rugy et al., 2006; Richardson, Lopresti-Goodman, Mancini, Kay, & Schmidt, 2008). In-phase and antiphase coordination were found to be the only stable modes, with in-phase being more stable. The relative phase depended on the natural frequencies of the two systems, which were typically manipulated by varying pendulum length or weight: The participant with the slower preferred tempo lagged behind the one with the faster preferred tempo.

Nessler and Gilliland (2010) found that intentional interpersonal entrainment of treadmill walking resulted in smaller and faster steps than did spontaneous or no synchronization. Nessler, Gonzales, Rhoden, Steinbrick, and De Leone (2011) further reported that intentional synchronization in overground walking increased variability and reduced fractal noise in interstride intervals. Notably, synchronization with an auditory metronome had similar effects (see also section 2.4).

Deliberate interpersonal entrainment has also been observed in dancing. De Bruyn, Leman, and Moelants (2008) found that participants moving to the beat of familiar music in groups of four moved more intensely and synchronized better with the music when they could see each other. Mutual entrainment was indicated by increased within-group correlations of the acceleration time series (see also Desmet, Leman, Lesaffre, & De Bruyn, 2010). In a similar study with children (De Bruyn, Leman, Moelants, Demey, & Desmet, 2009), participants also moved more when they could see each other, but their synchronization with the music was not improved.

The ability of individuals to read or produce memorized text in synchrony has been studied by Cummins (2002, 2003, 2009, 2011). He found that this task is performed remarkably well without special practice. Participants seem to adjust their speaking patterns so as to make them maximally predictable, similar to musical ensemble players choosing a “standard” interpretation when reading through a piece of music together. Visual contact between talkers improved synchrony but was not crucial. Synchronization with recorded speech was more accurate when the recording came from a synchronized speech trial than when it represented a solo reading. By manipulating the recorded signal, Cummins (2009) showed that the amplitude envelope, pitch contour, and spectral qualities of speech all influence synchronization accuracy.

As with tapping, interpersonal entrainment of continuous movements can have positive side effects. Macrae, Duffy, Miles, and Lawrence (2008) found that when participants synchronized up–down hand movements with a metronome while the experimenter uttered a list of words and carried out the same movements in phase or in antiphase, or did not move, those in the in-phase condition recalled more words in a surprise recall test afterward. Miles, Nind, Henderson, and Macrae (2010) had participants and the experimenter repeat words that they heard over earphones while they coordinated arm movements in phase or in antiphase. In a surprise recall test, participants showed the expected advantage for self-produced words following antiphase coordination, but not following in-phase coordination, perhaps due to a shift of attention to the partner’s utterances. No main effect of coordination mode on recall was apparent in that study. In another study by Valdesolo, Ouyang, and DeSteno (2010), participants rocking in chairs in synchrony (side by side), as compared to others rocking independently (back to back), performed better in a subsequent perceptual speed judgment task, as well as in a joint task requiring motor skill. The authors concluded that synchronization enhances basic perceptual and motor abilities.

Unintentional or spontaneous entrainmentFootnote 28

In a now classic study of unintentional entrainment, Schmidt and O’Brien (1997) asked paired participants to swing pendulums while sitting side by side (facing ahead) or while facing each other. Even though the instructions discouraged coordination, in the second condition phase relationships near in-phase and antiphase were more frequent than were other phase relationships, indicating intermittent or “relative” coordination. Richardson, Marsh, and Schmidt (2005) replicated these results in a situation in which participants had to interact verbally and/or visually while swinging pendulums as a purported motor distraction task. Visual contact led to relative coordination, whereas verbal interaction did not have any effect. (Shockley, Santana, & Fowler, 2003, however, reported that conversation leads to subtle entrainment of body sway between standing individuals.) Richardson, Marsh, Isenhower, Goodman, and Schmidt (2007) further observed unintentional entrainment in individuals rocking in chairs side by side. When instructed not to synchronize and to keep rocking at their most comfortable frequency, they still showed a preponderance of in-phase relationships, at least when they looked at each other. When they had to rely on peripheral vision, the tendency to entrain unintentionally was very weak, whereas intentional entrainment in another condition was quite successful.
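
Analyses of this kind typically bin the continuous relative phase between the two movements and ask whether some phase regions are overrepresented. Here is a minimal sketch with simulated phase dynamics; only in-phase attraction is modeled (an HKB-type coupling term would add the antiphase mode), and all parameters are invented.

```python
import numpy as np

def phase_hist(rel_phase, nbins=9):
    """Distribution of absolute relative phase in 20-deg bins from 0 to
    180 deg, as in Schmidt and O'Brien (1997)."""
    h, _ = np.histogram(np.abs(rel_phase), bins=nbins, range=(0, np.pi))
    return h / h.sum()

def simulate_rel_phase(k, domega=0.5, noise=0.4, dt=0.01, n=30000, seed=3):
    """Stochastic relative-phase dynamics dphi = (domega - k*sin(phi))*dt
    plus noise; k = 0 means no interpersonal coupling."""
    rng = np.random.default_rng(seed)
    phi, out = rng.uniform(-np.pi, np.pi), np.empty(n)
    for i in range(n):
        phi += (dt * (domega - k * np.sin(phi))
                + np.sqrt(dt) * noise * rng.normal())
        out[i] = phi
    return np.angle(np.exp(1j * out))

print(phase_hist(simulate_rel_phase(k=2.0)).round(2))  # mass near 0 deg
print(phase_hist(simulate_rel_phase(k=0.0)).round(2))  # roughly flat
```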

Using a similar paradigm, Coey, Varlet, Schmidt, and Richardson (2011) investigated unintentional interpersonal entrainment when each participant swung two pendulums in phase or in antiphase. Regardless of the participants’ intrapersonal coordination mode, intermittent in-phase coordination occurred when the participants’ coordination modes were congruent with each other, and to a lesser extent when they were incongruent. Intrapersonal coordination was unaffected by interpersonal entrainment. For evidence of interpersonal coordination of free, only occasionally rhythmic arm movements under explicit instructions not to coordinate, see Issartel, Marin, and Cadopi (2007).

Spontaneous entrainment may occur when the instructions neither encourage nor prohibit it. In the rocking-chair paradigm, spontaneous coordination is usually found when there is visual contact. Demos, Chaffin, Begosh, Daniels, and Marsh (2012) found that hearing each other’s rocking sounds increased coordination, even in the absence of visual contact. In that study, listening to music while rocking actually reduced the coordination between participants when they could see each other; music appeared to compete with the other participants’ rocking, as there was evidence of synchronization with the music as well. Participants who synchronized more strongly with the music reported feeling more connected with their partners, even though their mutual coordination was not necessarily greater. (For a further discussion of social factors in interpersonal coordination, see Marsh, Richardson, & Schmidt, 2009; Schmidt et al., 2011.)

Spontaneous entrainment via visual contact was also observed by Oullier, de Guzman, Jantzen, Lagarde, and Kelso (2008), with participants who carried out periodic finger movements at their preferred frequency while observing each other’s movements, even though in this case entrainment required most of the participants to deviate from their preferred frequency. When participants were subsequently instructed to close their eyes, they tended to stay close to their adapted frequency rather than returning to their preferred frequency. The authors attributed this persistence to social factors, though simple continuation of the adapted tempo seems perhaps the most obvious explanation. Besides the effect on movement tempo, visual information also facilitated spontaneous entrainment of interpersonal arm movements in the uninstructed direction, when participants deliberately moved in phase with each other but in orthogonal directions (one horizontally, the other vertically; Richardson, Campbell, & Smith, 2009).

A dynamic-systems model and several methods for analyzing the interpersonal synchronization of continuous movements have recently been set forth in great detail by Mörtl et al. (2012). These authors also presented empirical data from spontaneous synchronization of goal-directed arm movements, though the focus of their article was theoretical and methodological.

Several studies have examined unintentional or spontaneous interpersonal entrainment when pairs of participants walked on a treadmill or on the ground. Thus, van Ulzen, Lamoth, Daffertshofer, Semin, and Beek (2008) had participants walk side by side on a treadmill, with instructions to synchronize in-phase or antiphase, or not to synchronize. Although evidence for unintentional entrainment was found in the last condition, the observed phase relationships did not conform to the predictions of a simple coupled-oscillators model. Van Ulzen, Lamoth, Daffertshofer, Semin, and Beek (2010) paced paired participants with metronomes having different phase relationships but found no significant effect on the variability of interpersonal relative phase, suggesting that the visual coupling between treadmill walkers was weak. However, a tendency to veer toward in-phase walking was observed, and when the pacing signals were removed, there was spontaneous drift toward either in-phase or antiphase walking.

Several studies have investigated the effect of sensory information on spontaneous interpersonal entrainment of walking. When pairs of women were asked to walk together down a hallway (Zivotofsky & Hausdorff, 2007), tactile information (holding hands) led to synchronization of steps in about 50 % of the cases, whereas manipulations of visual and auditory information had no significant effect. In a similar, more recent study (Zivotofsky, Gruendlinger, & Hausdorff, 2012), half of the pairs tested never synchronized, whereas the other half did so in most conditions. Tactile and auditory information seemed to encourage entrainment, but peripheral vision did not. By contrast, in an analogous study of side-by-side treadmill walking, in which the tactile information was conveyed via a soft spring connecting the bodies, Nessler and Gilliland (2009) found evidence of spontaneous entrainment in all sensory coupling conditions. Individuals with similar preferred stride frequencies (related to leg length) were more likely to entrain to each other. Nessler, Kephart, Cowell, and De Leone (2011) confirmed this by varying treadmill speed and inclination to either increase or decrease the difference in walking speed and stride length between the two walkers. Overall, it seems that spontaneous entrainment of paired walking depends on the type of sensory information available, with tactile information being more effective, as well as on the condition of walking (over ground or on a treadmill).

Harrison and Richardson (2009) paired participants and requested one to walk or jog closely behind the other while being (1) visually coupled, (2) blindfolded and mechanically coupled via a strapped-on foam cushion, or (3) both visually and mechanically coupled. Spontaneous phase locking was twice as frequent in walking as in jogging and increased across the three coupling conditions. Condition 1 favored in-phase coordination, whereas both in-phase and antiphase coordination were observed in Condition 2. Antiphase coordination was most favored in Condition 3, perhaps because the foam cushion restricted vision to the shoulders of the leading participant, which moved in antiphase with his legs. The authors argued that coupled walkers tend to form a single coordinative structure, similar to a quadruped.

Spontaneous entrainment can also occur in body sway. When pairs of participants were asked to sway rhythmically side to side with closed eyes while maintaining light fingertip contact, thus minimizing mechanical coupling, frequent spontaneous in-phase coordination was observed (Sofianidis, Hatzitaki, Grouios, Johannsen, & Wing, 2012). When the swaying was paced by a metronome, however, only experienced (Greek) dancers were able to improve their coordination through tactile contact, as compared to a no-contact condition.

Nessler, Kephart, et al. (2011) noted that, even under optimal conditions, some pairs of individuals are much less likely than others to spontaneously entrain to each other, which suggests that social or personality factors also play a role. Indeed, Lumsden, Miles, Richardson, Smith, and Macrae (2012) found that individuals classified as pro-social on the basis of questionnaire results were more likely than “pro-self” individuals to spontaneously coordinate rhythmic arm movements with a video of another person carrying out the same movements. Instructions that primed pro-social or pro-self attitudes had a similar effect. Conversely, negative social experiences may reduce the likelihood of spontaneous synchronization. Miles, Griffiths, Richardson, and Macrae (2010) investigated female participants’ spontaneous entrainment to the steps of a confederate of the experimenter who had either shown up for the experiment on time or had been 15 min late. In-phase entrainment was significantly more likely in the on-time condition. Miles, Lumsden, Richardson, and Macrae (2011) further reported that participants, after assigning themselves to groups with regard to aesthetic preferences, were more likely to spontaneously synchronize their movements with a partner from a different group. The authors suggest that participants aimed to reduce the group distance in anticipation of benefits that their interpersonal synchronization might have for later social interaction.

Music performance

Clayton, Sager, and Will (2005) published a substantial theoretical and methodological article in which they discussed the importance of the concept of entrainment for ethnomusicology and illustrated methods for investigating it. The article includes examples from African and Australian aboriginal music and is followed by a number of commentaries by psychologists and musicologists. Clayton (2007) made video recordings of Indian musicians—a singer who also played the tanpura, a harmonium player, and two additional tanpura players—performing together and looked for evidence of unintentional mutual entrainment. Although tanpura players do not intend to coordinate their plucking cycles with each other or with other musicians, Clayton found evidence that they entrained to the singer’s movements, albeit not in a simple 1:1 fashion. Another form of unintentional entrainment, namely between separate groups of performers, was observed in the Afro-Brazilian Congado ritual (Lucas, Clayton, & Leante, 2011). The authors analyzed video recordings of four pairs of groups encountering each other on the street during the ritual. Two pairs of groups from the same community showed clear evidence of in-phase entrainment, even though there was supposedly no explicit intention to coordinate their playing. Groups from different communities are said to actively avoid entrainment, but one of the two pairs nevertheless showed intermittent entrainment, albeit at varying phase relationships. Maduell and Wing (2007) discussed in detail the coordination problems that arise in Flamenco ensembles, which typically consist of a guitarist, a singer, and a dancer, and involve hand clapping and foot stomping as well. The authors described a connected network model that attempts to capture the interactions among the musicians, and also offered some preliminary quantitative data on coordination in such ensembles.

Moore and Chen (2010) investigated the coordination between two string quartet players as they executed a rapid passage of uniform note values in synchrony. That the players were able to maintain synchrony at all indicates some perceptual coupling between them, and this was confirmed by time series analyses of their bow strokes. Interestingly, each player nevertheless followed a different repeating pattern of microtiming, reflecting different groupings of notes. In a similar but more ambitious study, Wing, Endo, Bradbury, and Vorberg (2011) applied a linear phase correction model (Vorberg & Schulze, 2002) to recorded bow strokes from all four players of a string quartet playing a rapid unison passage. The model revealed that all four players engaged in phase correction, and thus were coupled to each other, but the first violinist functioned as the leader, and thus corrected less strongly than the other players.
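
The logic of such model-based analyses can be illustrated with a toy version: simulate players who each correct a fraction of their asynchrony relative to the ensemble mean, then recover the correction gains by regression. This mean-field sketch simplifies the pairwise Vorberg–Schulze model, and all parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
T, n = 0.25, 500                         # 250-ms notes, 500 notes
alpha = np.array([0.05, 0.3, 0.3, 0.3])  # player 1 corrects least ("leader")
onsets = np.zeros((n, 4))
t = rng.normal(0, 0.005, 4)              # slightly staggered first onsets
for k in range(n):
    onsets[k] = t
    d = t - t.mean()                     # each player's ensemble asynchrony
    t = t + T - alpha * d + rng.normal(0, 0.005, 4)

# Recover each player's gain from the slope of interval on asynchrony.
for i in range(4):
    iti = np.diff(onsets[:, i])
    d = (onsets[:, i] - onsets.mean(axis=1))[:-1]
    gain = -np.polyfit(d, iti - T, 1)[0]
    print(f"player {i + 1}: estimated correction gain ~ {gain:.2f}")
# The small recovered gain for player 1 marks the leader: the others
# adapt to player 1 more than player 1 adapts to them.
```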

Goebl and Palmer (2009) studied synchronization in piano duet playing, in which one pianist was considered the leader (playing the melody) and the other, the follower (playing the accompaniment). Analysis of timing patterns showed that, regardless of assigned roles, the pianists adjusted to each other when they received complete auditory feedback (see also section 3.1). Their assigned roles were reflected in head and finger movements, however, and head movements were more strongly synchronized when auditory feedback was reduced, indicating that visual information played an increased role in that case. Keller and Appel (2010) later showed that in a piano duet, the pianists’ anticipatory auditory imagery abilities (measured with a method devised by Keller, Dalla Bella, & Koch, 2010) were predictive of the accuracy of their mutual synchronization, in terms of both keystrokes and body sway. For additional discussion and a theoretical framework for coordination in ensemble playing, see Keller (2008). Keller, Knoblich, and Repp (2007) demonstrated that pianists, when playing a duet with a recording, synchronize more accurately with a recording of their own playing than with a recording of another pianist’s playing. The authors attributed this advantage to internal action simulation, which may enable pianists to better anticipate the expressive timing patterns of their own playing. Finally, Uhlig, Schroeder, and Keller (2012) reported that knowledge of a duet partner’s musical part increases coordination of body sway. However, it did not improve the synchronization of keystrokes during playing, presumably because predictions of an unfamiliar partner’s expressive timing are based on one’s own performing style.

Timing delays, such as occur over the Internet, can wreak havoc with ensemble performance. Bartlette, Headlam, Bocko, and Velikic (2006) placed musicians in separate rooms and had them play a duet while hearing their partner over earphones with delays ranging from 0 to 200 ms. Delays greater than 100 ms resulted in significant increases of asynchrony and variability, and the musicians rated their performances as being less musical. In related research by Chafe, Cáceres, and Gurevich (2010), participants clapped an interlocking rhythm pattern while hearing each other with delays ranging from 3 to 78 ms. The authors found the best coordination at delays of 8–25 ms, which musicians naturally experience in ensemble situations, due to sound transmission over short distances. A tendency to accelerate was observed at very short delays (<8 ms). Longer delays resulted in a progressive slowing of tempo and in one participant lagging behind the other, and synchronization tended to break down if the delay exceeded 55 ms. Farner, Solvang, Sæbø, and Svensson (2009) obtained similar results using the same clapping task with delays of 6–68 ms. They also included a real reverberant condition, in which participants were separated by various distances in a large hall, as well as a virtual reverberant condition, in which artificial reverberation was played over headphones.
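
A toy model of mutual phase correction reproduces the progressive slowing qualitatively: if each clapper corrects toward partner claps heard after a transmission delay, the settled period grows with the delay. This sketch is illustrative and not a model fitted in these studies; the period, gain, and delays are arbitrary.

```python
import numpy as np

def settled_period(delay, T=0.5, alpha=0.5, n=200):
    """Two clappers who each correct a fraction alpha of the asynchrony
    to the partner's last clap, heard `delay` seconds late."""
    tA, tB = 0.0, 0.01
    mids = []
    for _ in range(n):
        tA, tB = (tA + T - alpha * (tA - (tB + delay)),
                  tB + T - alpha * (tB - (tA + delay)))
        mids.append((tA + tB) / 2)
    return float(np.mean(np.diff(mids)))

for d in (0.008, 0.025, 0.055):
    print(f"delay {d * 1000:4.1f} ms -> "
          f"period {settled_period(d) * 1000:.1f} ms")
# The settled period works out to T + alpha*delay, so longer delays
# slow the mutual tempo, echoing the drift observed empirically.
```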

Coordination in musical contexts, like joint tapping (see section 3.1), has been found to engender pro-social behavior. Cirelli, Einarson, and Trainor (2012) reported results suggesting that passive bouncing in synchrony with music and with a bouncing experimenter increased 14-month-old infants’ helpfulness in a subsequent test of cooperation. Kirschner and Tomasello (2010) engaged pairs of 4- to 5-year-old children in a music game with the experimenter, while other children carried out comparable joint activities without music. The former were subsequently more likely to engage in helpful behavior and in cooperative problem solving. In a related study with adults, Wiltermuth and Heath (2009) had small groups of participants engage in synchronous or asynchronous walking, singing, and hand movements. In a subsequent test, the participants who had synchronized with each other proved to be more cooperative than the others. These results support the hypothesis that synchrony increases group cohesion (McNeill, 1995). However, there may also be a negative side to it. Wiltermuth (2010) found that individuals who had engaged in a joint activity, such as singing or walking synchronously with an experimenter, were more willing to follow an experimenter’s instructions to lie or to destroy insects (though they were not actually killed). Synchronization experience thus facilitated “destructive obedience.”

Interpersonal entrainment: what have we learned since 2005?

Studies of interpersonal entrainment were not covered in R05, and up to that time concerned mainly the intentional or unintentional coordination of oscillatory movements. In recent years, research on interpersonal SMS has been extended to many other rhythmic activities, including tapping, walking, dancing, speaking, and music performance. Two general principles have emerged from this research: People have a tendency to entrain to each other’s movements on the basis of perceptual information that they receive, and the likelihood of entrainment depends on the strength of this perceptual coupling, as well as on individual dynamic (preferred tempo) and social factors. Moreover, interpersonal entrainment has been shown to affect subsequent social attitudes, usually in a positive way, which supports the idea that synchronous activities increase group cohesion.

Neuroscience of SMS

Synchronizing one’s movement with a sensory rhythm, such as tapping to the beat of music, appears to be a simple task that demands little cognitive effort. However, it engages a complex, distributed brain machinery whose functions range from basic timing processes to sensorimotor coupling. In this part of our review, we first give an overview of recent neuroscience research on time and rhythm perception, which are covert processes related to SMS. We then discuss new research in the SMS domain proper, involving overt motor activity. With a few exceptions that are specifically mentioned, the studies reviewed here concern the human brain.

Neural correlates of covert synchronization

Timing mechanisms

The ability to time sensory input, such as the temporal intervals between successive auditory or visual events, is a basic requirement for synchronizing movement with that input. It has been proposed that two functionally distinct systems may be involved in time perception, implicating separate cortical–subcortical networks. “Automatic timing” of intervals in the subsecond range is believed to be subserved by the motor system including the cerebellum and primary and secondary motor cortices, and is often linked to movement timing. “Cognitively controlled timing” of longer intervals, on the other hand, is considered to engage a cortical–subcortical loop involving the basal ganglia, parietal cortex, and prefrontal areas, and is subject to attentional modulation (Buhusi & Meck, 2005; Lewis & Miall, 2003). The distinction between these two timing systems is still being debated (Merchant, Zarco, & Prado, 2008; Shih, Kuo, Yeh, Tzeng, & Hsieh, 2009; for an early review, see Diedrichsen, Ivry, & Pressing, 2003; for a recent review of neuroanatomical structures—especially the basal ganglia—in interval timing, see Coull, Cheng, & Meck, 2010). In a recent meta-analysis of the timing literature (Wiener, Turkeltaub, & Coslett, 2010), subcortical structures such as the cerebellum and basal ganglia were identified as being more likely to be activated in subsecond timing tasks, whereas cortical structures such as the supplementary motor area (SMA) and prefrontal areas were more involved in the timing of longer intervals. The basal ganglia (putamen) are more likely to be activated in timing tasks requiring perceptual judgment (e.g., discrimination between intervals) than in tasks requiring motor responses (e.g., tapping to a metronome or reproducing an interval). Conjunction analysis has identified bilateral SMA and right inferior frontal gyrus (IFG) as being activated across all timing tasks (Wiener et al., 2010). The involvement of prefrontal cortex, especially the right dorsolateral prefrontal cortex (DLPFC), in timing longer intervals seems to be related to the higher demand that such intervals make on working memory (Koch, Oliveri, & Caltagirone, 2009). In addition, although posterior parietal cortex (PPC) is typically involved in spatial perception, some authors have proposed that PPC is engaged in timing by representing both time and space as a generalized concept of magnitude (Bueti & Walsh, 2009).

The distinction between automatic and cognitively controlled timing systems, or between sub- and suprasecond timing systems, has most often been tested in the perception or production of single intervals. However, some recent studies have investigated the timing of successive intervals and yielded results suggesting that different neural substrates are engaged, depending on whether or not an underlying beat is perceived (Grube, Cooper, Chinnery, & Griffiths, 2010; Teki, Grube, Kumar, & Griffiths, 2011). A duration-based mechanism is thought to be responsible for timing successive intervals in the absence of a beat, and it recruits an olivo-cerebellar network. A beat-based mechanism, on the other hand, recruits a striato-thalamo-cortical system involving basal ganglia, thalamus, premotor cortex (PMC), SMA, and DLPFC. These two circuits are neuroanatomically and functionally interconnected, and have been recently proposed to work as a unified timing system in which the beat-based network serves as the default mechanism for both single and multiple interval timing, and the duration-based network is activated subsequently to carry out error correction (Teki, Grube, & Griffiths, 2012). Finally, a recent study with monkeys has revealed more specific roles of PMC in rhythmic timing: When monkeys tapped to an isochronous sequence of tones (reacting to the tones rather than anticipating them; see section 2.5.2), the time elapsed since the previous tap and the time left till the next tap were found to be coded by different neuronal populations in the medial PMC (Merchant, Zarco, Pérez, Prado, & Bartolo, 2011).

SMS, such as tapping to a metronome, is typically expected to recruit the striato-thalamo-cortical system. However, the cerebellum is also important in such tasks, due to its role in interval timing, error correction (Diedrichsen, Hashambhoy, Rane, & Shadmehr, 2005), and predictive movement control (Bastian, 2006). Not surprisingly, neural findings concerning SMS generally suggest the involvement of most of the aforementioned substrates, each with its role in certain aspects of the task (see section 4.2).

Rhythm and beat perception

Recent findings concerning rhythm and beat perception have shed light on the brain activities during covert synchronization to external rhythms. Covert synchronization creates an internal link between sensory and motor processes that may be similarly involved in overt SMS.

Recent imaging studies have shown that participants’ motor systems are activated when they listen to auditory rhythms without executing any motor task (Bengtsson et al., 2009; Chapin, Zanto, et al., 2010; J. L. Chen, Penhune, & Zatorre, 2008a, 2009; Grahn & Brett, 2007; Grahn & McAuley, 2009), and this typically implicates basal ganglia, cerebellum, SMA, pre-SMA, and PMC (see Fig. 3a). The ability to perceive the underlying structure of a rhythm, such as a regular beat, requires intact basal ganglia (Grahn & Brett, 2009). Specifically, basal ganglia are associated with endogenous generation and prediction of the beat in response to an auditory rhythm (Grahn & Rowe, 2013). When the beat is less distinct (such as in syncopated rhythms), the basal ganglia activations depend on the deployment of attention as well as the time needed for a listener to establish a stable pulse/beat percept (Chapin, Zanto, et al., 2010). The extent to which cortical or subcortical motor activations are coupled with the activation in the auditory cortex (superior temporal gyrus, STG) also depends on the salience of the beat and on musical training (J. L. Chen et al., 2009; Grahn & Rowe, 2009). Specifically, musicians, as compared to nonmusicians, show a higher internal coupling between auditory (STG) and motor areas (PMC and SMA) when listening to or playing a melody (Bangert et al., 2006; see also Jäncke, 2012, for evidence of a causal relationship between auditory and premotor activations during piano performance), or when perceptually processing the beat of an auditory rhythm (Grahn & Rowe, 2009; see also an electroencephalography [EEG] study by James, Michel, Britz, Vuilleumier, & Hauert, 2012). In addition, prefrontal cortex (PFC) is more active when the heard rhythm is metrically more complex (Bengtsson et al., 2009). A modality difference has also been observed in beat perception tasks, with greater basal ganglia activation for auditory rhythms than for visual rhythms presented as discrete, repetitive flashes (Grahn, Henry, & McAuley, 2011; however, Hove, Fairhurst, Kotz, & Keller, 2013, found similar basal ganglia activations when the visual rhythms consisted of periodic movement). Moreover, activation in the putamen increased when visual rhythms were preceded by similar auditory ones, but not with the reverse presentation order. Grahn et al. (2011) argued that the preceding auditory rhythm activated a strong beat representation that was reactivated during the following visual rhythm. Finally, neural findings seem to support the hypothesis that beat perception is an innate ability (Honing, 2012) specific to humans (Honing, Merchant, Haden, Prado, & Bartolo, 2012; see also section 2.5.2).

Fig. 3 Brain activations reported in the literature, visualized in Montreal Neurological Institute space using the MRIcron software (www.mccauslandcenter.sc.edu/mricro/mricron/). (a) Areas associated with covert sensorimotor synchronization and a perceived beat (Grahn & Brett, 2007; Grahn & Rowe, 2013; Teki et al., 2011). (b) Areas associated with externally paced and self-paced tapping (Brown et al., 2006; Jantzen et al., 2007; Kornysheva & Schubotz, 2011). (c) Areas within 16 mm from the brain surface that are associated with antiphase versus in-phase tapping and with the complexity of the pacing signal (J. L. Chen et al., 2009; Oullier et al., 2005; Thaut et al., 2008). (d) Areas associated with error correction in SMS, especially regarding supra- versus subliminal tracking (Bijsterbosch, Lee, Hunter, et al., 2011; Pollok, Gross, et al., 2008; Thaut et al., 2009).

Further evidence that the brain synchronizes to an external rhythm in the absence of motor tasks comes from EEG and magnetoencephalography (MEG) studies. Cortical neuronal oscillations can be tuned (i.e., phase-locked) to periodicities in an external sensory stream through attentional selection (Lakatos, Karmos, Mehta, Ulbert, & Schroeder, 2008), such that the high-excitability phases of the oscillation align with the periodic occurrence of the events to allow for optimal processing. Such oscillatory activities represent endogenous entrainment to the pulse or beat at different metrical levels of an auditory rhythm (Large & Snyder, 2009; see also Nozaradan, Peretz, Missal, & Mouraux, 2011, for a similar finding of cortical entrainment to an auditory pulse, as reflected in steady-state evoked potentials). Oscillations at slow frequencies can be regulated by an isochronous sequence of tones such that, when the sequence is presented at temporal frequencies of 1–5 Hz, the stimulus-locked intertrial coherence (ITC, or “phase-locking factor”) shows a maximum at the frequency band corresponding to the stimulus rate (Will & Berg, 2007). Furthermore, the maximum is greatest with a stimulus rate of 2 Hz, which seems to correspond to an intrinsic preferred beat tempo in humans (Moelants, 2002).
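
As an illustration of the measure itself, the ITC at a given frequency is the magnitude of the across-trials average of unit-length phase vectors. The sketch below computes it from synthetic trials; the sampling rate, trial count, and noise level are invented.

```python
import numpy as np

def intertrial_coherence(trials, fs, freq):
    """Phase-locking factor at one frequency: 1 = perfectly phase-locked
    across trials, ~0 = random phase from trial to trial."""
    trials = np.asarray(trials, float)
    n = trials.shape[1]
    k = int(round(freq * n / fs))            # FFT bin nearest `freq`
    spec = np.fft.rfft(trials, axis=1)[:, k]
    return float(np.abs(np.exp(1j * np.angle(spec)).mean()))

rng = np.random.default_rng(0)
fs = 250
t = np.arange(fs) / fs                       # forty 1-s "trials" at 2 Hz
locked = [np.sin(2 * np.pi * 2 * t) + rng.normal(0, 1, fs)
          for _ in range(40)]
jittered = [np.sin(2 * np.pi * 2 * t + rng.uniform(0, 2 * np.pi))
            + rng.normal(0, 1, fs) for _ in range(40)]
print(intertrial_coherence(locked, fs, 2))    # high: near 1
print(intertrial_coherence(jittered, fs, 2))  # low: ~1/sqrt(40)
```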

When participants listen to a sequence of isochronous tones, two kinds of responses can be measured in their cortical oscillations at much higher frequencies: an induced beta-band (15–30 Hz) activity, which is time-locked but not phase-locked to the tone onsets, and an evoked gamma-band (>30 Hz) activity, which is phase-locked to the tone onsets.Footnote 29 The induced brainwaves can also be observed at occasional tone omissions (Snyder & Large, 2005), even though there is no evoked response in that case. This finding parallels the psychological phenomenon of beat perception, which is linked to temporal expectancy. When a temporal perturbation is introduced in an isochronous sequence, the induced gamma peak precedes a late deviant and follows an early one (Zanto, Large, Fuchs, & Kelso, 2005; Zanto, Snyder, & Large, 2006; see also Chapin, Jantzen, Kelso, Steinberg, & Large, 2010, for results of cortical and subcortical motor activations associated with processing temporal fluctuations of the beat in expressive timing).

Oscillations in the beta band (15–30 Hz) have been found to be modulated by the mental superimposition of a beat (“metrical accent”) on a sequence of physically identical tones (Iversen, Repp, & Patel, 2009). This modulation resembles that elicited by physical accentuation of the tones; however, physical accentuation also influences neural responses in low-frequency bands (reflected in event-related potentials, ERPs) and high-frequency (gamma) bands, whereas mental accentuation seems to specifically influence the beta band. However, Schaefer, Vlek, and Desain (2011) found an early component of the ERP (N1/P2) to be larger in response to both physically and mentally accented tones, as compared to unaccented ones, though the component was stronger for physically accented tones. Fujioka, Trainor, Large, and Ross (2009) found that the beta modulation caused by an isochronous sequence of tones consists of an immediate decrease in power (i.e., desynchronization of neuronal firing) after each tone onset, followed by a rebound that reaches its maximum slightly before the next tone onset. This anticipatory rebound does not occur when the sequence is irregular, and the slope of the rebound curve becomes shallower as the sequence tempo decreases (Fujioka, Trainor, Large, & Ross, 2012). The time course of phase coherence in the beta band between cortical auditory and motor areas (e.g., SMA and pre-SMA) exhibits periodic patterns in accordance with the rate (or tempo) of an isochronous sequence (Fujioka et al., 2012).

Cortical oscillations in the beta and gamma bands are relevant to motor activities. Synchronized oscillations in the beta band have been observed in cortical motor areas and spinal motoneurons during motor tasks (Salenius & Hari, 2003). Gamma oscillations associated with an overt limb movement have also been identified in the primary motor cortex, suggesting that they play a role in motor control (Muthukumaraswamy, 2010). In addition, it has been proposed that the rhythmicity of motor cortical oscillations modulates oscillations in the auditory cortex, which may in turn influence auditory perceptual processes (Schroeder, Lakatos, Kajikawa, Partan, & Puce, 2008; Schroeder, Wilson, Radman, Scharfman, & Lakatos, 2010). Overall, the findings reviewed in this subsection seem to point to an intrinsic link between sensory and motor systems regarding their involvement in rhythm perception.

Neural correlates of overt SMS

SMS tasks in neuroscience research most often involve finger tapping. Finger-tapping tasks generally recruit primary sensorimotor cortex (S1 and M1), SMA, PMC, inferior parietal cortex, basal ganglia, and cerebellum (see Witt, Laird, & Meyerand, 2008, for a meta-analysis of fMRI findings in finger-tapping studies up to 2006). However, different task-specific parameters may modulate the neural mechanisms.

Paced versus unpaced tapping

The similarities and differences in the neural circuits engaged by paced and unpaced (i.e., self-paced) tapping can be investigated using the synchronization–continuation paradigm. In monkeys, a large neuron population in medial PMC shows similar responses during synchronization and continuation tapping (Merchant et al., 2011).Footnote 30 In humans, motor areas such as sensorimotor cortex (SM1), SMA, and anterior cerebellum are commonly activated during both paced and unpaced tapping (Witt et al., 2008). However, earlier studies also suggested that motor or prefrontal areas are recruited to a greater extent in continuation than in synchronized tapping (see R05, p. 983). Boonstra, Daffertshofer, Peper, and Beek (2006) used MEG measurements and compared the dynamics of phase coherence and amplitude in pure listening, paced tapping to an isochronous sequence, and unpaced tapping. They found that, while the evoked response in the slower cortical oscillations (theta and alpha bands) during paced tapping reflected auditory stimulus processing, the induced response (modulation in phase and amplitude) in the beta band was associated with the motor activity of tapping and was phase-locked to the tap onsets. The amplitude of the event-related changes in the beta band also decreased with increasing movement rate.

In another EEG study, Serrien (2008) found somewhat higher tapping variability, as well as greater functional connectivity in the beta band in the mesial–central area (covering areas such as PMC and SMA), in continuation than in synchronization tapping. This pattern was found for both unimanual and bimanual tapping, and it suggested a higher demand of motor timing in areas such as SMA when the external pacing signal was absent. Similarly, Jantzen, Oullier, Marshall, Steinberg, and Kelso (2007) found that significant activity in the prefrontal–parietal–temporal network—consisting of dorsal and ventral prefrontal cortex, middle temporal gyrus, and bilateral parietal lobes—was present only in continuation tapping, not in synchronized tapping. Activity in these areas is typically associated with various working memory manipulations, and the observed activity was interpreted as the increased demand on working memory for the temporal representation of stimuli when the pacing signal was switched off.

Some other cortical areas, however, have been found to be more associated with paced than with unpaced tapping: Disruption by repetitive transcranial magnetic stimulation (rTMS) of the ipsilateral cerebellum and contralateral dorsal PMC (dPMC; Del Olmo, Cheeran, Koch, & Rothwell, 2007), as well as of the contralateral ventral PMC (vPMC; Kornysheva & Schubotz, 2011), has been found to affect the variability of tapping synchronized with an auditory metronome while sparing continuation tapping. Brown, Martinez, and Parsons (2006) reported similar findings in dancers, obtained using positron emission tomography (PET): Bipedal leg movements synchronized to the beat of a given musical rhythm elicited higher activity in the vermis of anterior cerebellar lobule III, as compared to activation during similarly timed movements executed in a self-paced manner. (Note that the self-paced movements here consisted of a different pattern from that adopted in paced movements in order to reduce mental imagery of the previously heard rhythm.) Kornysheva and Schubotz went further to show that, following application of rTMS to the left vPMC, a compensatory increase in activation occurred in the right inferior vPMC and in vermal area V of the anterior cerebellum. While the former activation did not seem to have any behavioral relevance, the latter predicted synchronization stability: The greater the increase in vermal activity, the less tapping variability (as indexed by the coefficient of variation of the ITIs) was affected by the rTMS. Figure 3b gives an overview of subcortical and cortical areas involved in paced and self-paced SMS.

In another study, which used a synchronization–continuation paradigm with a visual metronome presented as animated visual movements (a hinged bar hitting a horizontal line or a finger executing a tapping movement), Ruspantini, Mäki, Korhonen, D’Ausilio, and Ilmoniemi (2011) found that, when a triple-pulse TMS was repetitively delivered to the vPMC prior to every fourth pacing signal during the synchronization phase, the mean negative asynchrony was reduced (i.e., taps moved closer to the visual pacing signals) relative to a condition without rTMS. However, this effect was related to the time course of the TMS application, as it was observed only in the first two taps immediately following the TMS. Since TMS was not administered during continuation tapping, it remains unclear whether vPMC is indeed more relevant for externally (visually) paced than for self-paced tapping.

In sum, the findings so far suggest that while synchronized tapping implicates the cerebellar–premotor network because of the required sensorimotor coordination (Molinari, Leggio, & Thaut, 2007) and audio–motor coupling (J. L. Chen et al., 2009), continuation tapping relies more on the internal representation of the given sequence tempo, thus requiring the working memory loop. However, when employing a simple, static visual metronome of repetitive flashes, Cerasa et al. (2006) did not observe any difference in brain activity between synchronized and continuation tapping in either healthy controls or patients with Parkinson’s disease. It could be that SMS with auditory metronomes and with periodically moving visual stimuli (as used by Ruspantini, Mäki, et al., 2011) relies on a similar sensorimotor coupling mechanism that especially recruits motor areas such as PMC, while SMS with static visual metronomes does not. This interpretation seems consistent with the findings of modality differences in the neural activity underlying rhythm perception (Grahn et al., 2011; see section 4.1.2).

In-phase versus antiphase tappingFootnote 31

When tapping to a metronome in in-phase and antiphase coordination modes, antiphase has been found to lead to higher activity in pre-SMA, cingulate, dPMC, insula, STG, thalamus, and lateral cerebellum, and this difference was observed in both executed and imagined tapping (Oullier, Jantzen, Steinberg, & Kelso, 2005). Jantzen et al. (2007) also found greater activation in pre-SMA, lateral PMC, and parts of the cerebellum (bilateral declive and left inferior semilunar lobule) during antiphase relative to in-phase tapping. In addition, the difference in activation observed in these areas due to coordination mode was observed during both synchronization and continuation (although in the latter condition the pacing signal was not present any more), even after a break of up to 9 s between these two phases. The authors concluded that, during synchronization, different temporal information is represented in these cortical and subcortical motor areas for antiphase and in-phase tapping, and this coordination-dependent temporal memory may accordingly be required during continuation tapping.

The difficulty of antiphase tapping increases as the rate of the pacing signal increases, leading to lower tapping stability than in-phase tapping at the same rates. Coordination mode and rate have been found to interact in neural activity in the following way (Jantzen, Steinberg, & Kelso, 2009): When the rate of the pacing signal was increased from 0.75 to 1.75 Hz, activity increased linearly in a network comprising dPMC, vPMC, SMA, pre-SMA, right anterior insula, the left dentate nucleus, and the left inferior semilunar lobule of the cerebellum during antiphase tapping, but not during in-phase tapping. Furthermore, when the rate increased beyond 1 Hz, antiphase tapping elicited a greater increase, as compared to in-phase tapping, in functional coupling from PMC to SMA, as well as from SMA to M1. That is, while the whole network of auditory and motor cortices responds to parametric changes in pacing/movement rate (Jantzen et al., 2007), the activations in the motor circuitry of SMA, bilateral PMC, and lateral cerebellum specifically reflect the stability of the coordination pattern (Jantzen et al., 2009), as it is modulated by the interaction between movement rate and coordination mode. An increase in the difficulty of antiphase tapping as the rate increased from 0.5 to 1.3 Hz has also been observed in 8- to 10-year-old children tapping to a visual metronome (de Castelnau, Albaret, Chaix, & Zanone, 2008). It was accompanied by increasing frontocentral coherence in the alpha (8–12 Hz) and beta (12–30 Hz) bands in the left hemisphere. Here, the authors argued, the frontal area was recruited due to the higher demands of motor planning at faster tempi. In sum, relative to in-phase tapping, antiphase tapping seems to place greater demands on the motor circuitry (see Fig. 3c).

Effects of the temporal complexity of the pacing signal

Tapping to rhythmic sequences with various temporal structures, in which the underlying beat may not be equally salient, may recruit the neural circuitry differently. When the task is to tap with the beat of an isochronous sequence, an increase in beat saliency (implemented by increasing the contrast in sound amplitude between accented and unaccented tones) elevates dPMC activity, as well as the functional connectivity between auditory cortex (STG) and dPMC (J. L. Chen, Zatorre, & Penhune, 2006). Similarly, when the task is to tap in synchrony with each tone of a nonisochronous auditory rhythmic pattern, audio–motor connectivity is modulated by the metrical complexity of the rhythms (J. L. Chen, Penhune, & Zatorre, 2008b). Though both dPMC and vPMC are sensitive to the metrical structure of the rhythm and can be activated in the absence of movement, the authors argued that vPMC is activated only when the sounds are directly linked to a motor task (e.g., when participants listen in anticipation of subsequent tapping), and not when participants listen passively without any instructed motor task (J. L. Chen et al., 2009). Brown et al. (2006) found higher activation in the putamen when dancers moved their legs to the beat of metrical rather than nonmetrical rhythms, a result similar to that reported for putamen activity in beat perception without overt movement (Grahn & Brett, 2007; Grahn & Rowe, 2009; see section 4.1.2).
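
For concreteness, the beat-saliency manipulation can be thought of as scaling the amplitude of every accented tone relative to the unaccented ones in an isochronous sequence. A minimal sketch (the grouping and decibel values are illustrative assumptions, not the actual parameters of J. L. Chen et al., 2006):

```python
import numpy as np

def accent_amplitudes(n_tones=24, group=3, contrast_db=6.0):
    """Amplitude scaling for an isochronous sequence in which every
    `group`-th tone is accented; contrast_db sets the amplitude contrast
    between accented and unaccented tones, and hence the beat saliency."""
    amps = np.ones(n_tones)
    amps[::group] *= 10 ** (contrast_db / 20)  # accent boost specified in dB
    return amps / amps.max()                   # normalize to avoid clipping
```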

Using EEG, Serrien (2009) measured phase coherence at beta frequencies over areas such as PMC, the sensorimotor area, superior parietal cortex, and SMA while participants tapped to a long sequence that contained abrupt tempo changes (slow–fast–slow or fast–slow–fast). Coherence in the hemisphere contralateral to the tapping hand was higher in the last segment when it followed a faster segment. The author argued that “the processing constraints associated with fast tapping operated as a dynamic background that subsequently influenced regulation of the less demanding slow tapping” (p. 68). However, a direct comparison between coherences in fast and slow tapping was not reported, nor was a comparison between tapping in the first and second segments.
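
Coherence between two recording sites can be estimated in several ways. Serrien (2009) used phase coherence; as a simple stand-in, the sketch below averages the magnitude-squared coherence over the beta band (an illustration under that substitution, with illustrative parameters):

```python
import numpy as np
from scipy.signal import coherence

def beta_band_coherence(x, y, fs, band=(12.0, 30.0)):
    """Mean magnitude-squared coherence between two EEG channels in the
    beta band. x, y: equal-length 1-D arrays; fs: sampling rate in Hz."""
    f, cxy = coherence(x, y, fs=fs, nperseg=int(2 * fs))  # 2-s segments
    mask = (f >= band[0]) & (f <= band[1])
    return cxy[mask].mean()
```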

Vuust, Roepstorff, Wallentin, Mouridsen, and Ostergaard (2006) investigated the effect of metrical complexity by employing a polyrhythmic (3:4) musical excerpt consisting of a synthesizer sequence at a constant meter (“main meter”) and a melody (played by several instruments) that alternated between the same main meter and a faster meter (“counter meter”). Musicians were asked to tap to the beat of the main meter. The authors found that, when the melody emphasized the counter meter rather than the main meter, the BA47 part of the IFG (known for its role in semantic processing of language) was more activated bilaterally. This difference in activity was proposed to be associated with the higher rhythmic tension created by the former condition. The authors did not observe any difference in the motor system between these two conditions. Higher activation in BA47 was also observed when the counter meter was mentally imposed on top of a sequence presented only at the main meter (Vuust, Wallentin, Mouridsen, Ostergaard, & Roepstorff, 2011). In a similar vein, Thaut, Demartin, and Sanes (2008) compared brain activity when musicians tapped to an auditory sequence isorhythmically (tapping rate : stimulus rate = 1:1) and polyrhythmically (tapping rate : stimulus rate = 2:3 or 3:2). In both conditions, a network typically involved in sensorimotor timing was recruited, consisting of contralateral M1/S1, thalamus, putamen, the parietal operculum, and ipsilateral cerebellum, though the activations were greater for polyrhythmic tapping. Relative to isorhythmic tapping, polyrhythmic tapping elicited more activation in contralateral cerebellum, bilateral SMA, and ipsilateral inferior parietal lobule, but less activation in ipsilateral putamen and caudate nucleus. The authors postulated differential roles for the cerebellum–SMA loop and the basal ganglia in SMS: The former is more involved in sensorimotor integration and is modulated by temporal complexity, while the latter are more involved in basic timing and sequencing. However, reminiscent of the role of the basal ganglia in beat perception (Grahn & Rowe, 2009), it may also be argued that the lower putamen activity reflects the less distinct beat in the tapped sequence, which resulted from interference by the beat of the stimulus at a different tempo. See Fig. 3c for an overview of brain areas whose activations are modulated by the temporal complexity of the pacing signal in SMS.

Finally, Boonstra, Daffertshofer, Breakspear, and Beek (2007) recorded brain activity with MEG while participants learned to produce a 3:5 polyrhythm bimanually (each stimulus rate executed by one hand), paced by two concurrent auditory metronomes (one per stimulus rate, each presented to one ear). Participants’ performance was indexed by the actual frequency relationship between the tap series produced by the two hands. The authors found that behavioral improvement over the course of an experimental session was correlated with event-related beta-band activity in the motor cortex contralateral to the hand tapping at the slower rate, considered the more difficult part of producing a polyrhythm bimanually.
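
One simple way to index performance of this kind (the actual measure of Boonstra et al., 2007, may have been computed differently) is to estimate each hand’s tapping frequency from its mean intertap interval and compare the ratio with the 3:5 target:

```python
import numpy as np

def frequency_ratio(taps_slow, taps_fast):
    """Estimate each hand's tapping frequency from its mean intertap
    interval and return the slow:fast frequency ratio (target: 3/5)."""
    f_slow = 1.0 / np.diff(taps_slow).mean()
    f_fast = 1.0 / np.diff(taps_fast).mean()
    return f_slow / f_fast

# Hypothetical tap times (in s) for a perfect 3:5 polyrhythm, 2-s cycle
slow = np.arange(0.0, 20.0, 2 / 3)  # 3 taps per cycle
fast = np.arange(0.0, 20.0, 2 / 5)  # 5 taps per cycle
print(frequency_ratio(slow, fast))  # 0.6, i.e., exactly 3:5
```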

Movement factors

Here we discuss effects of the type of tapping movement, of unimanual versus bimanual tapping, of overt versus covert tapping, and of movements of other limbs. To compare event-based and emergent timing (see section 2.3), Spencer, Verstynen, Brett, and Ivry (2007) asked participants to tap without contacting a surface, either in a discrete manner, by inserting a pause between finger flexion and extension, or in a smooth, continuous manner without any pause during the tapping movement. Discrete tapping led to higher cerebellar activation in a focal region of lobule V/VI of the vermis, even when it was carried out at a slower rate than the continuous movement. The lateral part of lobule V/VI ipsilateral to the tapping hand was activated similarly for discrete and continuous movement. The results suggest that while lobules V and VI are typically engaged in sensorimotor tasks such as finger tapping to a metronome (see Stoodley & Schmahmann, 2009, for a meta-analysis of functional localization in the human cerebellum), a subregion of these cerebellar lobules is specifically involved in event-based timing, and also that the discrete or continuous nature of the movement in SMS may engage different parts of the cerebellum.

Most SMS studies surveyed so far in this section have employed unimanual tapping. Pollok, Südmeyer, Gross, and Schnitzler (2005) studied interhemispheric integration of the neural activity underlying simultaneous bimanual tapping to an isochronous auditory sequence. Cortical coupling, identified by MEG phase coherence at 8–12 Hz, was found in a network comprising bilateral SM1, PMC, posterior parietal and primary auditory cortex, thalamus, and the cerebellum. In particular, interhemispheric coupling occurred at PMC (from the site ipsilateral to the dominant hand to the contralateral site), posterior parietal cortex (PPC), and cerebellum. The results indicate cortical integration of bilateral motor and somatosensory information, as well as subcortical integration of motor timing signals, during simultaneous bimanual task execution. Serrien (2008) investigated the effect of switching from simultaneous bimanual tapping during synchronization to unimanual tapping during continuation and found higher interhemispheric connectivity (covering PMC, SMA, and sensorimotor areas) in the beta band during continuation. The author interpreted this increase as being due to the increased motor demand of effector reorganization, possibly resulting from the suppression of bimanual coupling.

Covert (imagined) and overt tapping movements synchronized to a pacing signal seem to share similar neural substrates, including SMA, PMC, the inferior parietal lobe, STG, IFG, and the basal ganglia (Osman, Albert, Ridderinkhof, Band, & van der Molen, 2006; Oullier et al., 2005; Stavrinou, Moraru, Cimponeriu, Della Penna, & Bezerianos, 2007). These activated areas are not unlike those reported in rhythm perception tasks (see section 4.1.2). In the study of Oullier et al. (2005), the neural difference between antiphase and in-phase tapping was observed in both executed and imagined movements, with greater activations for antiphase tapping in PMC, SMA, the basal ganglia, and lateral cerebellum. Stavrinou et al. used EEG and phase synchronization analysis to reveal connectivity between cortical areas during executed and imagined finger tapping paced by an isochronous auditory metronome. They found a similar pattern of desynchronization and synchronization in the beta band following tone onset in both executed and imagined tapping (see also Fujioka et al., 2012, for a similar result in beta oscillations during a perceptual task). In addition, synchronized activation in the frontoparietal area contralateral to the intended finger was observed for both real and imagined tapping. Indeed, sensorimotor coupling seems to occur whenever we process temporally structured auditory input, and it may underlie both overt and covert actions that are related to the input. On the other hand, it has been found that bipedal isometric rhythmic muscle contractions and bipedal rhythmic leg movements, both synchronized to the same metrical beat, lead to different activations: Real leg movement leads to greater activity in the posterior parietal lobule, an area associated with the spatial guidance of limb movement rather than with motor timing per se (Brown et al., 2006). The neural mechanisms underlying SMS via different limbs or whole-body movement may receive more attention in the future as recording techniques advance.

Asynchrony and error correction

The negative mean asynchrony (NMA) is a typical attribute of SMS paced by an auditory metronome. Doumas, Praamstra, and Wing (2005) found that while disruption of contralateral PMC by rTMS altered neither the NMA nor error correction after a phase shift, disruption of the contralateral motor cortex reduced the magnitude of the NMA (i.e., taps fell closer to the tones). The less negative asynchronies resulting from motor cortical inhibition were interpreted as being due either to a slowdown of the timekeeper process, which lengthened the internally measured period, or to a reduced sensitivity of motor cortex to somatosensory input. Somewhat similar results were obtained in a paradigm without perturbations (Malcolm, Lavine, Kenyon, Massie, & Thaut, 2008): rTMS applied to the left vPMC had no significant effect on the mean asynchrony of right-hand tapping, whereas rTMS applied to the left posterior superior temporal–parietal junction (STP), an area associated with audio–motor entrainment in speech and music (Hickok & Poeppel, 2007), did affect the NMA. However, Malcolm et al. found an increased NMA following rTMS to the left STP, a result interpreted as “the inhibition of neural inputs to central anticipatory processes, leading to a greater anticipatory response” (p. 243). The authors argued for a role of STP in conscious tracking and phase matching between auditory input and motor output, which underlie tap–tone asynchrony. In neither of these studies was inhibition of PMC found to affect asynchrony in SMS.

However, Pollok, Rothkegel, Schnitzler, Paulus, and Lang (2008) did find that rTMS applied to the left (but not the right) PMC increased the magnitude of the NMA, as well as intertap variability, in tapping paced by an auditory metronome, and this effect was observed for both right- and left-hand tapping. Furthermore, the inhibitory effect of TMS on left PMC occurred around 90 ms before the onset of the left-hand tap, and around 50 ms before the onset of the right-hand tap. The authors thus argued that the effect of left PMC on timing in SMS did not occur via its direct connection with the right M1, but rather via other, indirect connections with structures such as left M1, right dPMC, or subcortical loci. Applying a patterned stimulation protocol, theta burst stimulation (TBS), to the left PMC, Bijsterbosch, Lee, Dyson-Sutton, Barker, and Woodruff (2011) also found that continuous stimulation increased the magnitudes of both the NMA and variability in both hands. Overall, the findings reviewed here have been inconsistent regarding the role of PMC in asynchrony. The neural mechanism of the NMA remains to be firmly established, and the left PMC seems a plausible candidate for further investigation. On the other hand, PMC has been consistently shown to be relevant to synchronization stability (Bijsterbosch, Lee, Dyson-Sutton, et al., 2011; Del Olmo et al., 2007; Kornysheva & Schubotz, 2011; Pollok, Rothkegel, et al., 2008). Finally, the period matching between ITI and IOI might be mediated by a different subcortical circuitry, including the cerebellum and basal ganglia (Molinari et al., 2007).Footnote 32

SMS is based on internal prediction of the sensory input as well as on error correction, as reflected in immediate reactions to any temporal shift in the pacing stimuli. Bijsterbosch, Lee, Dyson-Sutton, et al. (2011) found that continuous TBS over the left PMC affected the correction for supraliminal phase shifts in the following way: After a negative phase shift, overcorrection occurred both pre- and post-TBS, but the correction approached baseline faster post-TBS than pre-TBS, which looks like improved error correction following continuous TBS. After a positive shift, overcorrection was found pre-TBS but hardly at all post-TBS, which also looks like an improvement but actually reflects less vigorous phase correction. The effect of TBS on error correction was, however, not immune to practice effects: Participants received TBS on either the first or the second of two consecutive days, and the effect of TBS was observed only on the first day.
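
The recovery dynamics described here (overcorrection followed by a return to baseline) are commonly interpreted within a first-order linear phase correction model (e.g., Vorberg & Wing, 1996). The simulation sketch below (with illustrative parameter values, not those of any study reviewed here) shows how the asynchrony series recovers after a phase shift, and how a correction gain alpha > 1 produces overcorrection:

```python
import numpy as np

def simulate_phase_shift(alpha, n_events=60, shift_at=30, shift=-50.0,
                         noise_sd=10.0, seed=0):
    """First-order linear phase correction: each asynchrony (tap - tone)
    is reduced by a proportion alpha on the next tap. A phase shift of the
    pacing sequence (in ms) is injected at event `shift_at`; with alpha > 1
    the asynchrony overshoots baseline (overcorrection) before settling."""
    rng = np.random.default_rng(seed)
    asyn = np.zeros(n_events)
    for n in range(1, n_events):
        asyn[n] = (1 - alpha) * asyn[n - 1] + rng.normal(0.0, noise_sd)
        if n == shift_at:       # negative shift: the tone arrives early,
            asyn[n] += -shift   # so the tap is suddenly late relative to it
    return asyn
```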

Predictive and reactive mechanisms in SMS have been found to recruit somewhat different neural circuits (Pollok, Gross, Kamp, & Schnitzler, 2008): When participants tapped to a regular, predictable isochronous sequence (in which taps typically preceded the tones), functional connectivity in the alpha and beta frequencies increased within the cerebello–diencephalic–parietal network (comprising bilateral cerebellum, S1/M1, PMC, PPC, and SMA) around 150–200 ms before tap onset, suggesting anticipatory motor control. When participants tapped to an irregular sequence, which required reactive motor responses, connectivity increased within the parietal–cerebellar loop around 150 ms after tap onset, reflecting feedback processing. Both pathways seem behaviorally relevant in most SMS tasks. According to Pollok, Gross, et al., PPC maintains the internal prediction and compares it with the sensory feedback, and the outcome of this comparison is then transferred to the cerebellum for updating the next prediction. In a further relevant study, Krause, Schnitzler, and Pollok (2010) found that activation in the anticipatory neural network was associated with musical expertise, such that the connectivities in both the alpha and beta frequencies between PMC and thalamus, as well as between PPC and thalamus, were stronger in professional drummers than in nonmusicians.

The contrast between predictive, anticipatory motor control and reactive tracking in SMS was further evident in a study by Thaut et al. (2009), in which participants tapped to a sequence with a rather long base IOI (1,250 ms) under isochronous, subliminal, and supraliminal perturbation conditions. Subliminal perturbations (not consciously perceived by the listeners) were implemented as systematically modulated interval changes following a cosine-wave function within a sequence, with an amplitude of either 3 % or 7 % of the IOI. Supraliminal perturbations consisted of similarly modulated intervals with an amplitude of 20 %. During both the isochronous and the subliminal perturbation conditions, the asynchronies were negative, and the series of ITIs and IOIs exhibited a positive lag-1 cross-correlation, indicative of tracking. The bilateral posterior cerebellar lobule (lobule VI) and areas around the intraparietal sulcus increased their activity stepwise with the mean tempo modulation, possibly subserving adaptation to subliminally perceived temporal changes. With supraliminal perturbations, and in another condition in which perturbations of the same average magnitude occurred randomly, the taps became reactive and followed the tones (evidently, the participants did not learn to anticipate the regular pattern of the cosine function), though the variability in the former condition was lower. This was accompanied by activity in the more posterior parts of lobule VI, as well as in the right analogue of Broca’s area and in DLPFC, drawing on prefrontal resources for conscious monitoring. These results demonstrate specific and distinct circuits through which the cerebellum becomes coupled with higher cortical areas for motor timing and conscious temporal monitoring.
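
To make the design concrete, the sketch below generates a cosine-modulated IOI sequence of the kind used by Thaut et al. (2009) and computes the lag-1 cross-correlation between ITIs and IOIs that diagnoses tracking (the number of events and modulation cycles are illustrative assumptions):

```python
import numpy as np

def cosine_modulated_iois(base_ioi=1250.0, amplitude=0.07, n=40, cycles=2):
    """IOIs (ms) modulated by a cosine function; amplitude is a fraction
    of the base IOI (e.g., 0.03 or 0.07 for subliminal, 0.20 for
    supraliminal perturbations)."""
    phase = 2 * np.pi * cycles * np.arange(n) / n
    return base_ioi * (1 + amplitude * np.cos(phase))

def lag1_crosscorr(itis, iois):
    """Correlation of each ITI with the preceding IOI; a positive value
    indicates tracking (taps follow the tempo changes one event late)."""
    return np.corrcoef(itis[1:], iois[:-1])[0, 1]
```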

Bijsterbosch, Lee, Hunter, et al. (2011) also reported different neural substrates for subliminal and supraliminal error correction. Combining fMRI and TBS in a right-hand tapping task, these researchers found that subliminal error correction (a perturbation magnitude of 3 %) and tapping to unperturbed sequences produced the same activation pattern in right PMC, left M1, bilateral primary auditory cortices, and right cerebellum extending to the vermis, as compared to a baseline rest condition. In supraliminal error correction (a perturbation magnitude of 15 %), however, additional activation of the left cerebellum was observed, as well as greater connectivity between the left cerebellum and frontal and sensory cortices. Consistent with this, disruption of the left (but not the right or medial) cerebellum by continuous TBS impaired supraliminal (but not subliminal) error correction, especially when the taps were temporally close to the time of TBS application. In sum, the studies reviewed here converge on distinct roles played by subregions of the cerebellum, which are coupled to different cortical areas in different error correction processes. Figure 3d shows an overview of PPC and the cerebellar subregions associated with tracking in SMS.

Modality effects

The findings on rhythm and beat perception (section 4.1.2) have already suggested different capacities for covert synchronization when rhythms are presented in different modalities, typically pointing to an auditory advantage in the form of stronger coupling between sensory and motor areas of the brain (e.g., Grahn, Henry, & McAuley, 2011). Similarly, auditory rhythms generally hold an advantage over visual rhythms in overt SMS, even when the visual rhythm is composed of spatiotemporal periodicities (see section 1.4.2).

Although finger tapping paced by auditory and by visual cues activates common areas such as primary sensorimotor cortex, SMA, and anterior cerebellum, the basal ganglia seem to be active mainly when taps are paced by auditory signals, while DLPFC is more active during visual pacing (Witt et al., 2008). Witt et al. also found that, as compared to both auditorily paced and self-paced tapping, visually paced tapping recruits additional areas, including DLPFC, insula, right IFG, and left posterior cerebellum. On the other hand, the ipsilateral cerebellum and contralateral dPMC seem to be more crucially implicated in maintaining the stability of tapping paced by auditory cues, as evidenced by the increased tapping variability after rTMS was applied to these areas (Del Olmo et al., 2007). No such disruption was observed in visually paced or self-paced tapping.

Pollok, Krause, Butz, and Schnitzler (2009) found a distinction in premotor cortical activations according to the modality of the pacing signal: The dPMC was more active when taps were paced by auditory metronomes, with greater connectivity between dPMC and STG at the alpha oscillation frequency. The vPMC, on the other hand, was more active during visual pacing (a repetitive static dot), accompanied by increased connectivity between vPMC and the thalamus at the beta frequency. The coupling between dPMC and STG in overt synchronization with auditory cues is reminiscent of several findings in which the temporal structure of an auditory rhythm modulated the connectivity between dPMC and STG during listening tasks (J. L. Chen et al., 2008a; J. L. Chen et al., 2006; Grahn & Rowe, 2009). The role of vPMC in visually paced tapping may be related to its typical engagement in visuomotor representation (Murata et al., 1997) or, as has also been argued, to visuomotor task learning that is dependent on sensory feedback (Grafton, Schmitt, Van Horn, & Diedrichsen, 2008). Similarly, Ruspantini, Mäki, et al. (2011) showed that the mean asynchrony of taps synchronized with an animated visual cue (a tilting bar) became less negative following rTMS over vPMC, whereas rTMS over dPMC did not affect the asynchrony. However, Kornysheva and Schubotz (2011) did find an impairment in the stability of auditorily paced tapping following rTMS over the left vPMC.

It seems that the roles of dPMC and vPMC in modality-specific synchronization are not yet clear. One reason might be that the engagement of different parts of PMC reflects not only the sensory modality in which the rhythms are presented, but also different mechanisms of linking sensory input to motor output, which may differ between auditory and visual rhythms. In particular, vPMC and dPMC have been suggested to underlie direct and indirect sensorimotor coupling, respectively, in the visuomotor domain (Hoshi & Tanji, 2006, 2007; see also J. L. Chen et al., 2009): vPMC is engaged in directly matching the motor act with the sensory, often visuospatial, cue (e.g., the act of grasping associated with a defined object), which may also include matching a viewed act with one’s own action (Rizzolatti & Sinigaglia, 2010). dPMC, on the other hand, represents motor action instructed by an arbitrary visual signal (also termed the conditional rules for motor behaviors), as the visual cue does not directly specify the object of the action. It is debatable whether the vPMC activity could reflect movement planning directed by a visual pacing signal, or whether the dPMC activity might reflect indirect rules, such as metrical structure, derived from the auditory signal (J. L. Chen et al., 2009). Indeed, it is not yet clear whether the distinction between direct and indirect sensorimotor coding in the case of rhythm parallels that in the traditional visuomotor findings. Moreover, in the study of Ruspantini, Mäki, et al. (2011), synchronization with a visual cue consisting of biological movement (an animation of finger tapping) was not affected by vPMC disruption, which seems to contradict the role of vPMC in matching a viewed action to self-generated action. More research is thus needed to specify the similarities and differences between the neural mechanisms underlying synchronization with auditory rhythms and with visual movement rhythms (see also Hove et al., 2013).

Interpersonal synchronization

The brain synchronizes not only with environmental stimuli but also with sensory or semantic cues derived from another agent in a joint action or communication (Hasson, Ghazanfar, Galantucci, Garrod, & Keysers, 2012). Simultaneous EEG recordings from multiple brains allow for the study of such interbrain synchronization during interpersonally coordinated behaviors (Astolfi et al., 2010; Babiloni et al., 2011; Dumas, Nadel, Soussignan, Martinerie, & Garnero, 2010). Lindenberger, Li, Gruber, and Müller (2009) recorded EEG simultaneously in pairs of guitarists playing a melody together. They found that, during the metronome pacing prior to the start of playing, within-brain synchronization (measured by the phase-locking index) was highest at the frontocentral site, with a maximum at 3–7 Hz (theta band), and was related to the onset of the metronome beat. During playing, interbrain synchronization (measured by phase coherence) was likewise highest at the frontocentral site, with a maximum around 3.3 Hz; both within-brain and interbrain synchronization were related to the leading guitarist’s gesture just before the onset of playing, as well as to the onset of the first tone. Notably, during both metronome pacing and instrument playing, interbrain synchronization was higher for those pairs who also showed higher within-brain synchronization.
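
Phase-locking measures of this kind are typically computed from the analytic phases of band-limited signals. A standard formulation is sketched below (an illustration; the exact estimators of Lindenberger et al., 2009, may differ). For interbrain synchronization, x and y would come from electrodes on two different participants:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def phase_locking_value(x, y, fs, band=(3.0, 7.0), order=4):
    """Phase-locking value between two signals in a frequency band (here
    the 3-7 Hz theta range): 1 = a constant phase relation, 0 = none."""
    nyq = fs / 2.0
    b, a = butter(order, [band[0] / nyq, band[1] / nyq], btype='band')
    phase_x = np.angle(hilbert(filtfilt(b, a, x)))
    phase_y = np.angle(hilbert(filtfilt(b, a, y)))
    return np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))
```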

Beyond temporally linked brainwaves, in a joint action such as playing in a musical ensemble, one’s own actions and those of others are thought to share common cortical motor representations (Rizzolatti & Sinigaglia, 2010); however, the resultant corticospinal excitability may differ according to the perceived extent of agency (Novembre, Ticini, Schütz-Bosbach, & Keller, 2012). The degree of mutual adaptation in a cooperative SMS task can also be associated with different brain activations. Adopting the paradigm of an adaptive virtual tapping partner (Repp & Keller, 2008; see section 1.2.1), Fairhurst, Janata, and Keller (2013) found that SMS with an optimally adaptive virtual partner, as evidenced by the lowest variability (SD asy), was accompanied by activations in ventromedial PFC, hippocampus, SMA, S1/M1, posterior cingulate, and precuneus. Activations in these midline structures were argued to reflect possible involvement of the brain’s “default mode network” during tasks of relative ease (here, more easily achieved synchrony with the virtual partner). An overly adaptive partner (more difficult to synchronize with) led to higher tapping variability, which was associated with activation in anterior insula, IFG, superior frontal gyrus (SFG), ventrolateral PFC, and the inferior parietal lobe, areas typically implicated in tasks demanding greater cognitive control.
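
The virtual-partner paradigm can be sketched as a metronome that phase-corrects toward the participant: each upcoming tone is shifted by a fraction alpha of the last tap–tone asynchrony. The sketch below is a minimal illustration; the actual adaptation rule and parameter ranges of Repp and Keller (2008) and Fairhurst et al. (2013) may differ:

```python
import numpy as np

def virtual_partner_onsets(tap_times, base_ioi=500.0, alpha=0.25):
    """Pacing-tone onsets (ms) from a virtual partner that shifts each
    upcoming onset by a fraction alpha of the last tap-tone asynchrony.
    alpha near 0: nonadaptive; intermediate: optimally adaptive;
    alpha > 1: over-adaptive, the condition hardest to synchronize with."""
    onsets = [0.0]
    for tap in tap_times:
        asynchrony = tap - onsets[-1]               # tap minus tone
        onsets.append(onsets[-1] + base_ioi + alpha * asynchrony)
    return np.array(onsets)
```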

Neuroscience of SMS: what have we learned since 2005?

Much progress has been made in studying the neural correlates of the various processes relevant to SMS. The role of (sub)cortical motor and other frontal areas in interval timing has been further documented. Multiple timing mechanisms have been proposed, and a unified system comprising beat-based and interval-based networks seems to support SMS. The involvement of cortical and subcortical motor areas in rhythm perception, as well as of auditory–motor coupling in the brain, has been well established. Different cortical oscillations have been found to entrain to a regular auditory beat or pulse. Simple synchronized finger tapping engages the cerebellar–premotor network, while continuation tapping relies more on prefrontal areas because of its load on working memory. More complex SMS tasks result in greater activations in related motor areas (pre-SMA, PMC, and cerebellum), as well as in stronger coupling to the auditory area. Specific subregions of the cerebellum have been identified as being associated with event-based timing. The integration of interhemispheric signals at the cortical and subcortical levels has been observed during bimanual paced tapping. Covert (imagined) and overt tapping share overlapping neural substrates, which are similar to those reported in rhythm perception tasks. Predictive and reactive tracking have been found to implicate different neural circuits, especially different subregions of the cerebellum. A modality difference has been found when taps are paced by auditory and by (usually static) visual metronomes, with some results supporting the involvement of dPMC in the former and vPMC in the latter. However, more recent investigations have revealed overlapping neural substrates for SMS with moving visual stimuli and with an auditory metronome. Finally, a few new studies have shown how brainwaves synchronize between individuals in joint music-making, as well as how the degree of cooperativity in a joint SMS task engages different neural networks.

However, just as the NMA has not yet been fully explained behaviorally (see section 1.1.2), the exact neural mechanisms underlying it remain to be elucidated, in particular with regard to the role of PMC. In addition, although several new behavioral findings and models concerning error correction in SMS have been presented (see section 1.2), little has been reported lately regarding the neural substrates of phase and period correction.

Conclusion

In this article, we have reviewed a broad range of studies on SMS. In recent years, experimental tasks have been extended beyond the traditional tapping paradigm, and abundant new findings have advanced our understanding of the behavioral and neural mechanisms underlying this intricate and yet ubiquitous ability. However, many questions remain unanswered: For example, there is still no convincing explanation for the commonly found NMA, and neural evidence supporting the different hypotheses about its cause is lacking. Following the discovery of effective visual pacing stimuli, modality differences warrant renewed investigation, especially regarding their neural substrates. Error correction mechanisms in synchronization with various forms of stimuli in each modality await systematic comparison and neural characterization. Research in this field has clinical implications, as more effective stimuli for rehabilitating movement or speech disorders may yet be developed. Interpersonal entrainment is a relatively young field of research, but findings there may provide ideas for designing learning or training programs to aid children’s language or music acquisition. Without doubt, SMS continues to be an exciting research area, and with this review we hope to generate further interest and creative ideas.