Abstract
Children’s sensitivity to regularities within the linguistic stream, such as the likelihood that syllables co-occur, is foundational to speech segmentation and language acquisition. Yet, little is known about the neurocognitive mechanisms underlying speech segmentation in typical development and in neurodevelopmental disorders that impact language acquisition, such as autism spectrum disorder (ASD). Here, we investigate the neural signals of statistical learning in 15 human participants (children ages 8–12) with a clinical diagnosis of ASD and 14 age-matched and gender-matched typically developing peers. We tracked the evoked neural responses to syllable sequences in a naturalistic statistical learning corpus using magnetoencephalography (MEG) in the left primary auditory cortex, posterior superior temporal gyrus (pSTG), and inferior frontal gyrus (IFG), across three repetitions of the passage. In typically developing children, we observed a neural index of learning in all three regions of interest (ROIs), measured by the change in evoked response amplitude as a function of syllable surprisal across passage repetitions. As surprisal increased, the amplitude of the neural response increased; this sensitivity emerged after repeated exposure to the corpus. Children with ASD did not show this pattern of learning in any of the three regions. We discuss two possible hypotheses for this group difference, one related to bottom-up sensory processing deficits and the other to difficulty with top-down incremental processing.
Significance Statement
Language acquisition involves segmenting the continuous speech stream into sounds, syllables, and words. Learning these units relies on both the properties of the input and emerging higher-order cognitive mechanisms that guide learning from the top down. We examined the neurobiology underlying the integration of top-down and bottom-up information in statistical speech segmentation in children with and without autism spectrum disorder (ASD). We offer evidence of neural and behavioral effects of syllable-to-syllable processing in speech segmentation that differ between typically developing children and children with a clinical diagnosis of ASD. Our findings inform developmental and cognitive theories of language acquisition by examining the computational nature of speech segmentation across different populations of learners.
Introduction
Language acquisition involves segmenting continuous speech into sounds, syllables, and words. By detecting statistical regularities in the input, learners can incrementally anticipate upcoming information for subsequent word learning. For instance, after 2 min of exposure to a foreign language, infants begin to identify statistically frequent syllable sequences and treat those as labels for novel objects (Hay et al., 2011). Learning these linguistic units relies on the properties of the input; it is a bottom-up-driven cognitive process. In parallel, experience and higher-order cognitive mechanisms also guide this learning process from the top down (Kuhl, 2004; Werker, 2018). However, little is known about the neurobiology underlying the integration of bottom-up and top-down information in statistical speech segmentation. This is an important knowledge gap that impedes our understanding of acquisition in typical development and neurodevelopmental disorders that impact language acquisition, such as autism spectrum disorder (ASD; Tager-Flusberg et al., 2005). We investigate neural signals underlying statistical learning in children with and without ASD using magnetoencephalography (MEG).
Behavioral work suggests that children with ASD may be just as equipped as their neurotypically developing (NT) peers to use statistical patterns to find words in speech (Obeid et al., 2016). For example, Mayo and Eigsti (2012) varied the likelihood that syllables co-occur [transitional probability (TP)] in a 21-min-long corpus and found similar segmentation outcomes for children with and without ASD. Scott-van Zeeland et al. (2010) also found comparable learning performance between NT and ASD children after exposure to a continuous speech stream. Importantly, the groups differed in their neural responses. With increased exposure to the input, NT children showed reduced activation in a fronto-temporal-parietal network, while children with ASD did not show task-related changes in brain activity.
Both prior studies used artificial language materials that lacked the varying prosodic and stress patterns integral to everyday speech (Johnson and Jusczyk, 2001), thus leaving open the question of how individuals would perform given more natural language input. Indeed, children with ASD may struggle to find words in natural speech for at least two reasons. MEG studies show that children with ASD have a delayed mismatch response to speech and non-speech sounds (Roberts et al., 2011) and demonstrate atypical responses to irregular speech sound sequences (Brennan et al., 2016b; Galilee et al., 2017). These findings may indicate deficits in bottom-up early sensory processing of speech. We label this the sensory-differences hypothesis.
In addition, children with ASD have difficulty extracting global regularities (“weak central coherence”; Frith, 1989) and allocating attention within sound sequences (Whitehouse and Bishop, 2008), which may be a disadvantage for the types of top-down processing necessary for statistical learning. Such differences are supported by reduced patterns of activation in a network of fronto-temporal regions associated with typical language acquisition (Redcay and Courchesne, 2008), which are more pronounced in children who have poor language learning outcomes (Lombardo et al., 2015). We label this the prediction-differences hypothesis. We propose that early sensory deficits and/or atypical predictive processing may lead to difficulties in extracting statistical regularities from fluent speech.
We asked children to listen to naturally spoken passages in Italian with a range of TPs between syllables. We quantify TP using the information-processing metric of surprisal, defined as the negative log of the conditional probability between two syllables (for details, see Materials and Methods; Hale, 2016). We apply this metric for the first time to measure syllable-to-syllable prediction in natural speech with a focus on children with and without ASD. To tease apart the hypotheses, we track evoked neural responses for syllables in left hemisphere regions implicated in key steps of speech processing (Hickok and Poeppel, 2007): early perception in the left primary auditory cortex (LAC), mapping percepts to linguistic units in the posterior superior temporal gyrus (pSTG), and higher-order analysis of linguistic regularities in the inferior frontal gyrus (IFG). Passages were repeated three times to capture a neural index of learning, defined as the change in evoked amplitude as a function of surprisal across repetitions. In NT children, we expect to see the index of learning across all three regions of interest (ROIs). As surprisal increases, the amplitude of the evoked neural response should increase; this sensitivity should emerge after repeated exposure to the passages. Crucially, this effect may differ between the NT and ASD groups. The sensory-differences hypothesis holds that children with ASD will show reduced sensitivity to surprisal in early sensory regions, such as the LAC and pSTG. The prediction-differences hypothesis holds that children with ASD will show reduced sensitivity to surprisal within higher-order regions such as the left IFG.
Materials and Methods
Participants
Fifteen children with ASD (1 female, Mage = 10.00, SD = 1.16) and fourteen age- and gender-matched NT children (Mage = 10.06, SD = 1.46) participated in the study. All children (age range = 8–12 years) were prescreened for eligibility through a phone interview with a parent or caregiver and were monolingual English speakers. The study was approved by all participating institutional review boards, as part of a larger project assessing language and communication in ASD using MEG (Brennan et al., 2016b; Lajiness‐O'Neill et al., 2018; Brennan et al., 2019). Parents and children provided informed consent and assent, respectively, and received monetary compensation for their participation.
Inclusion and exclusion criteria
Participants were recruited through local clinics and communities in southeast Michigan. During prescreening, caregivers completed the social communication questionnaire (SCQ; Rutter et al., 2003). The SCQ is a 40-item caregiver-report screener that assesses communication and social functioning in individuals who may have an ASD. Items referencing the symptomatology domains of ASD are totaled into a single score, and a cutoff classification score of 11 is often used for research purposes (Rutter et al., 2003). To participate in the current study, ASD-likely candidates required an SCQ score ≥ 11 (Corsello et al., 2007), and NT participants required an SCQ score < 11.
The behavior assessment system for children (BASC; Reynolds and Kamphaus, 2002) and the Wechsler abbreviated scale of intelligence-2 (WASI-2; Wechsler and Hsiao-pin, 2011) were administered to rule out adaptive and intellectual deficits consistent with intellectual disability. The BASC measures general behaviors and emotions of children such as hyperactivity, aggression, and conduct problems. The WASI-2 is a brief and reliable measure of intellectual functioning and includes subtests tapping into verbal, nonverbal, and general cognition. Inclusion criteria for all participants included at least low average intelligence [full-scale IQ (FSIQ) ≥ 80; Wechsler and Hsiao-pin, 2011].
A formal diagnosis for all ASD-likely participants was based on the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5; American Psychiatric Association, 2013) diagnostic criteria and the autism diagnostic observation schedule (ADOS), administered by a research-reliable clinical psychologist (Lord et al., 2012). The ADOS is a semi-structured standardized assessment of communication, play, social interaction, and restricted and repetitive behaviors. To confirm the diagnosis of ASD, the ADOS Module 3 was administered. The revised algorithm (see Gotham et al., 2007) was used to compute individual subdomain scores and a combined total score for social interaction, communication, and stereotyped behaviors/circumscribed interests. Participants with ASD had a combined total score above the clinical cutoff suggestive of autism (Gotham et al., 2007).
Exclusionary criteria for ASD and NTs included any known history of head injury with loss of consciousness, other neurologic disorders including active epilepsy/seizures, environmental deprivation, anxiety disorders or other forms of psychopathology, and anything that might interfere with the MEG procedure (e.g., dental braces). Additional exclusion criteria for NTs included any history of developmental delay or a first-degree relative with an ASD diagnosis. Two NT participants were excluded from analyses because of equipment error during MEG data acquisition and one ASD participant was excluded because of an inability to comply with the task demands and tolerate the assessment procedures. The final group of 29 did not significantly differ in age or gender (see Table 1).
Means (standard deviations) of standardized assessments
Experimental design
Participants passively listened to ∼6 min of a naturally produced passage in a foreign language (Italian) modeled after stimuli previously used by Hay et al. (2011; see Fig. 1). The Italian passage consisted of grammatically plausible but semantically nonsensical sentences made up of legal Italian words. To ensure natural production of Italian pronunciations, a female native Italian speaker recorded three different instances of the passage. Each participant listened to all three versions (three repetitions) presented via E-Prime Software 2.0 (Schneider et al., 2002). Of interest were the relative distributions and occurrences of eight key target syllables (fu, ga, me, lo, ca, ne, bi, ci) presented throughout the passage. A trigger signal marking the onset of each passage segment was used to pinpoint the timing of these syllables, which was then aligned with the continuous MEG signal.
Schematic of the experimental stimuli as adapted from Hay et al. (2011). An excerpt of the ∼2-min-long Italian passage showing key target (controlled) syllable pairs (red) and non-controlled syllable pairs (green). The passage was repeated three times for a total duration of ∼6 min.
We tracked the exact timing of each syllable occurrence and the resulting brain responses using the following methodological manipulation. For each occurrence of a target syllable, the forward internal TP between the preceding syllable and the target syllable was calculated [i.e., the frequency of the syllable pair divided by the frequency of the preceding syllable; TP = P(σ2|σ1)]. TPs for the target syllables ranged from 0.028 to 1.00. These TP values were converted to surprisal [surprisal = -log2(TP)], as prior work on phonological and lexical processing has shown that linguistic frequencies affect processing on a logarithmic scale (Hale, 2001, 2016). This yielded a total of 576 surprisal values per participant, one for each presentation of a target syllable across the three repetitions of the passage (for the distribution of surprisal values, see Fig. 2). This metric allows us to measure, in a continuous way, the information conveyed by a linguistic event, such as a particular syllable, given its context. Thus, low syllable-to-syllable TP yields high surprisal and high syllable-to-syllable TP yields low surprisal.
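For illustration, the conversion from syllable counts to TPs and surprisal values can be sketched in a few lines of R; the toy syllable vector and helper functions below are hypothetical stand-ins for the actual corpus preparation, not the scripts used in this study.

```r
# Minimal sketch (R): forward TP and surprisal for a syllable-tokenized
# passage. The toy syllable vector is illustrative, not the Italian corpus.
syllables <- c("la", "fu", "ga", "di", "ca", "ne", "e", "ca", "ro")

# Bigram (sigma1-sigma2) and predecessor (sigma1) frequencies
bigrams  <- paste(head(syllables, -1), tail(syllables, -1), sep = "_")
bi_freq  <- table(bigrams)
uni_freq <- table(head(syllables, -1))

# Forward TP = P(sigma2 | sigma1) = freq(sigma1 sigma2) / freq(sigma1)
tp_of <- function(s1, s2) {
  as.numeric(bi_freq[paste(s1, s2, sep = "_")]) / as.numeric(uni_freq[s1])
}

# Surprisal = -log2(TP)
surprisal <- function(tp) -log2(tp)

surprisal(tp_of("fu", "ga"))  # "fu" is always followed by "ga" here: 0 bits
surprisal(1/3)                # reproduces the 1.585 value reported for TP = 0.33
```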
Histogram of surprisal values across all target syllables.
The surprisal metric taps into the brain’s sensitivity to statistical regularities at multiple levels of representation (Hale, 2001; Levy, 2008). Prior work with surprisal has documented behavioral and neurobiological effects in adults at the syntactic (Monsalve et al., 2012; Frank et al., 2013; Gwilliams and Marantz, 2015; Willems et al., 2016; Brennan et al., 2016a; Lopopolo et al., 2017; Gwilliams et al., 2018) and lexical or phonemic (Gwilliams and Marantz, 2015; Lopopolo et al., 2017; Gwilliams et al., 2018) levels of processing.
The target syllables were drawn from four target words (fuga, melo, cane, bici) that were systematically placed throughout the corpus. The component syllables of fuga and melo (fu, ga, me, lo) appeared nowhere else in the passage, giving these words a high TP = 1.0 (surprisal = 0.0). In contrast, the component syllables of cane and bici (ca, ne, bi, ci) appeared elsewhere in the passage another 24 times each (e.g., taCI, CAro), giving these words a lower TP = 0.33 (surprisal = 1.585). These syllables thus appeared 36 times each within the passage, only 12 of which were within the target words, with the remaining occurrences as initial (e.g., CAdi), medial (e.g., sindaCAto), or final (e.g., spreCA) syllables. The inclusion of these four legal Italian words, composed of the key target syllables, allowed us to control and test for statistical learning effects of relatively moderate and highly predictive syllable sequences within a continuous and varied range of syllable probabilities.
Behavioral measures
After listening to the Italian passages, a statistical learning post-test was given outside the scanner to explicitly measure children’s ability to distinguish words with TPs of 1.0 and 0.33 from novel Italian words that did not occur within the corpus. Children listened to a pair of words presented via E-Prime Software 2.0. One of the two words was a bi-syllabic target word from the passages and the other (non-target) word was one of four bi-syllabic Italian words composed of syllable combinations that were not included in the Italian corpus (e.g., mugo, azza, pipa, zebu). However, the component syllables of these novel words did appear in the Italian passages (e.g., mu). Children were tested using a two-alternative forced-choice task and asked, “Which of the following two words could be a possible word in the language you just heard?” Participants were instructed to press the “1” key if the first word could be a possible word in the foreign language, and similarly, to press the “2” key if the second word could be a possible word in the foreign language. Children completed four practice trials using common English words (e.g., teacher) versus nonsense, phonotactically illegal words (e.g., pmfkin), followed by 16 trials of the Italian target and non-target word pairs.
Standardized measures of language and attention were also obtained as part of the larger project investigating language and communication in children with ASD (see Table 1). For the purposes of this particular study, we simply report descriptive statistics on a subset of these measures to characterize the language and communication skills of the ASD group studied here, and for discussion in relation to previous studies of statistical learning in children with ASD (Scott-van Zeeland et al., 2010; Mayo and Eigsti, 2012). Measures of language included the comprehensive test of phonological processing (CTOPP; Wagner et al., 1999), the clinical evaluation of language fundamentals (CELF-5; Semel et al., 2006), and the test of problem solving (TOPS 3; Bowers et al., 2005), assessing phonological, syntactic/grammatical, and pragmatic competence, respectively. Auditory attention was assessed with the Auditory Attention subtests of the NEPSY developmental neuropsychological assessment (NEPSY-II; Korkman et al., 2007).
Procedure
Participants completed the neuroimaging portion (∼10 min), immediately followed by the behavioral statistical learning test, and lastly, the behavioral battery of language and cognitive assessments (60–90 min). Participants lay supine on a bed with a helmet-shaped dewar containing 148 magnetometer MEG sensors placed around their head (4D Neuroimaging). Children were instructed to keep their eyes open (monitored via video) and listen to the foreign language while remaining as still as possible. During scanning, the stimuli were delivered via computer speakers placed at an aperture in the shielded room; loudness was set at a comfortable level for each participant.
Data acquisition and processing
Three small electrode coils, used to transmit head location information to the neuromagnetometer probe, were affixed to each participant’s forehead with two-sided tape. Additional localization coils were attached to each preauricular point (PA), anterior to the tragus of the ear on the two sides of the head. Standard automatic probe position routines (4D Neuroimaging Hardware) were used to locate the five coils placed on the head with respect to the neuromagnetometer detector coils and to digitize the shape of the head for co-registration to a standard MRI. Neuromagnetic fields were recorded with a whole-head 148-channel magnetometer (WH 2400, 4D Neuroimaging system). During acquisition, the data were bandpass filtered between 0.1 and 100 Hz and digitally sampled at 508.63 Hz. Data were recorded continuously for later analyses. The onset of each repetition of the Italian passage was recorded on a trigger channel, collected simultaneously with the MEG data, as pulse codes whose strength indicated the type of stimulus. The locations of events on the trigger and response channels were used to select epochs from −0.3 to 1 s of MEG data around each target syllable for each 2-min repetition of the passage. Data analysis was performed using the FieldTrip toolbox for EEG/MEG analysis (Oostenveld et al., 2010).
Extracranial sources of interference were attenuated by subtracting signals recorded by five gradiometer and six magnetometer reference channels placed ∼15–20 cm from the head. Epochs were filtered with a discrete Fourier transform filter at 60, 120, and 180 Hz (with 2-s padding) to attenuate line noise, along with a 0.5-Hz high-pass filter. Trials and channels containing artifacts were removed based on visual inspection. No more than 23 of 148 channels and no more than 106 of 576 trials were removed during artifact rejection (mean trials removed: ASD = 50, NT = 58). The two groups did not significantly differ in the total number of channels (t(27) = 0.11, p = 0.74) or trials (t(27) = 2.07, p = 0.16) removed.
ROI analysis
Source time courses were reconstructed onto a 7- to 11-year-old pediatric template brain (Fonov et al., 2011) at four ROIs using Montreal Neurological Institute (MNI) coordinates. Three ROIs were selected a priori based on previously reported findings from statistical learning paradigms in the speech domain with adults (Karuza et al., 2013): the left primary auditory cortex (x = −48, y = 18, z = 2), the posterior region of the left STG (x = −64, y = −12, z = 4), and the left IFG (BA 44; x = −52, y = 26, z = −6). We also included a right superior parietal region (x = 24, y = −46, z = 60) as a control ROI.
Single-trial source-localized time courses were estimated using a linearly constrained minimum variance (LCMV) beamformer (Van Veen et al., 1997). The LCMV beamformer forms a linear combination of the external field measurements to monitor the activity at a single brain location while optimally suppressing noise and contributions from other sources to the MEG data. The beamformer filter was estimated using a sensor covariance matrix based on the average of all epochs per participant. MEG sensor averages were then projected through the filter for each location, yielding source time courses in three dimensions for each ROI. The root-mean-square (RMS) amplitude within three 100-ms time bins (200–300, 250–350, and 300–400 ms following syllable onset; Teinonen et al., 2009), at each location, per participant, per trial, and per repetition of the passage, was entered into the statistical analysis. Time windows of interest were chosen based on two related accounts: first, prior work shows consistent modulation of the evoked response between 200 and 500 ms during statistical segmentation of a syllable stream (Sanders et al., 2002; Cunillera et al., 2006); second, theoretical frameworks of speech perception suggest that temporal sampling of the speech stream for syllables occurs over longer intervals, roughly 150–300 ms, and that this time window carries syllable-boundary and syllabic-rate cues as well as other prosodic and stress cues relevant for the type of perceptual processing assessed here (Näätänen and Picton, 1987; Poeppel, 2003; Hickok and Poeppel, 2007; Giraud and Poeppel, 2012).
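As a rough illustration of this reduction step, the sketch below collapses one epoch's three-dimensional source time course into a single amplitude per 100-ms bin; the choice to take the RMS across the three dipole orientations and then average within each bin, as well as all variable names, are assumptions made for illustration.

```r
# Sketch (R): reduce a 3 x time LCMV source time course (x/y/z dipole moments)
# to one amplitude per 100-ms bin. The reduction shown (RMS across
# orientations, then mean within each bin) is an assumed reading of the text.
fs    <- 508.63                                      # sampling rate (Hz)
times <- seq(-0.3, 1, by = 1 / fs)                   # epoch time axis (s)
src   <- matrix(rnorm(3 * length(times)), nrow = 3)  # placeholder source data

rms_ts <- sqrt(colMeans(src^2))                      # RMS across 3 orientations

bins    <- list(c(0.200, 0.300), c(0.250, 0.350), c(0.300, 0.400))
bin_amp <- sapply(bins, function(b) mean(rms_ts[times >= b[1] & times <= b[2]]))
bin_amp  # one value per time bin, per trial/ROI/repetition, for the model below
```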
Statistical analysis
To test for a neural index of learning, we measured the relative change in evoked response amplitude as a function of surprisal across the repeated passages. A linear mixed-effects model was fit using the lmer function in the lme4 package in R (Bates et al., 2015) with passage repetition, ROI, group, and time window as categorical variables and surprisal as a continuous variable (all as fixed effects). Variation among participants was taken into account by including a by-participant random intercept; p values were computed via the Satterthwaite approximation using the lmerTest package in R. Statistical inference was based on F tests of main effects and higher-order interactions using the anova function in R. We excluded 54 trials from the statistical analyses corresponding to target syllables with only one occurrence (i.e., a trivial case of TP = 1.0, surprisal = 0).
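A minimal sketch of this specification is given below, assuming a long-format data frame with one row per trial, ROI, and time bin; the simulated data frame and column names are illustrative only and do not reproduce the reported results.

```r
# Sketch (R): mixed-effects model as described. Column names and the simulated
# data frame are illustrative assumptions, not the study data.
library(lme4)
library(lmerTest)  # Satterthwaite-approximated denominator df and p values

set.seed(1)
dat <- expand.grid(subject    = paste0("s", 1:29),
                   repetition = factor(1:3),
                   roi        = factor(c("LAC", "pSTG", "IFG", "RSP")),
                   time_win   = factor(c("200-300", "250-350", "300-400")))
dat$group     <- factor(ifelse(dat$subject %in% paste0("s", 1:14), "NT", "ASD"))
dat$surprisal <- runif(nrow(dat), 0, 5)     # continuous predictor
dat$amplitude <- rnorm(nrow(dat))           # beamformed RMS amplitudes go here

m <- lmer(amplitude ~ surprisal * repetition * roi * group * time_win +
            (1 | subject),                  # by-participant random intercept
          data = dat)
anova(m)  # F tests of main effects and higher-order interactions
```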
Additionally, a Bayesian multilevel model was fit using the brms package (Bürkner, 2017) with the same model terms as above. Models were fit using two chains of 1000 warm-up iterations and 2000 sampling iterations. Prior distributions on all terms were the default values from brm(). To report on the key manipulations of interest (e.g., the change in evoked response as a function of surprisal for the third repetition between the NT and ASD groups), we extracted the mean β coefficient and the 95% credible interval (CI) for the slope of amplitude over surprisal as sampled from the posterior distribution of the model. All model terms had an R-hat value ≤1.01.
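The Bayesian counterpart can be sketched with brms in the same way, reusing the illustrative data frame from the sketch above; note that in brms the iter argument counts warm-up plus post-warm-up draws, so 1000 warm-up and 2000 sampling iterations per chain correspond to iter = 3000. Everything here is a sketch under those assumptions, not the fitted model reported in the paper.

```r
# Sketch (R): Bayesian multilevel model with brms, default priors, two chains,
# 1000 warm-up and 2000 post-warm-up iterations per chain (iter = 3000).
library(brms)

bm <- brm(amplitude ~ surprisal * repetition * roi * group * time_win +
            (1 | subject),
          data = dat, chains = 2, warmup = 1000, iter = 3000, cores = 2)

summary(bm)               # check that all R-hat values are <= 1.01
fixef(bm)                 # posterior means and 95% credible intervals
draws <- as_draws_df(bm)  # posterior draws, e.g., for repetition-specific slopes
```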
For behavioral responses on the statistical learning task, the proportion of correct responses was calculated out of 16 trials from 14 NT and a subset of 12 ASD participants who completed the task (three ASD children did not complete the postscan behavioral test because of computer error and/or inability to comply with the task demands).
Code accessibility
The brms model output described in the paper is freely available online at Open Science Framework, https://osf.io/zbvhc/.
Results
Statistical learning behavioral results
Performance on the Italian behavioral test is shown in Figure 3. A two-way ANOVA [group (NT, ASD) × TP (high, low)] revealed a significant main effect of group (F(1,48) = 24.3, p < 0.001, ηp2 = 0.34). NT children outperformed children with ASD in correctly identifying both the high TP (t(24) = 2.78, p = 0.002, Cohen’s d = 0.97) and low TP (t(23) = 4.33, p = 0.001, d = 1.28) words from novel Italian words, as revealed by independent-samples t tests. There was no group by TP interaction effect (F(1,48) = 0.95, p = 0.33, ηp2 = 0.02). In both groups, there were no differences in accuracy for identifying high TP versus low TP words relative to novel Italian words (no main effect of condition; F(1,48) = 0.02, p = 0.89, ηp2 = 0.00). Therefore, accuracy across all trials was averaged into a single proportion of correct responses for each group and tested against chance (i.e., 0.5). One-sample t tests showed that NT children had above-chance accuracy in identifying the target words [M (SD) = 0.68 (0.17); t(13) = 3.93, p = 0.002, d = 0.13], whereas children with ASD performed below chance in accurately identifying the target words [M (SD) = 0.40 (0.14), t(11) = −2.71, p = 0.02, d = −1.23].
Proportion of correct responses to high and low TP target words in comparison to novel Italian words, calculated out of 16 trials from 14 NT and a subset of 12 ASD children who completed the behavioral learning test. Error bars represent standard error.
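For completeness, the chance-level comparisons described above reduce to one-sample and independent-samples t tests on proportion-correct scores; the brief sketch below uses placeholder accuracy vectors rather than the observed data.

```r
# Sketch (R): post-test accuracy analysis. Accuracy vectors are placeholders
# for each participant's proportion correct out of 16 trials.
set.seed(2)
acc_nt  <- runif(14, 0.4, 0.9)  # 14 NT participants (placeholder values)
acc_asd <- runif(12, 0.2, 0.7)  # 12 ASD participants (placeholder values)

t.test(acc_nt,  mu = 0.5)       # one-sample test against chance (0.5)
t.test(acc_asd, mu = 0.5)
t.test(acc_nt, acc_asd)         # independent-samples group comparison
```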
MEG results
Figure 4 shows the linear effect of evoked response amplitude as a function of syllable surprisal for each group, ROI, and passage repetition. These plots are averaged across time windows for ease of visualization (the statistical results, summarized below, showed no higher-order interactions with time). ANOVA results are reported in Table 2.
Results of an ANOVA comparing mean amplitude across group (ASD and NT), syllable surprisal, passage repetitions, ROIs, and time windows
Linear effect of evoked response amplitude (averaged across time windows) as a function of syllable surprisal for each group and ROI across the first, second, and third passage repetitions (light blue to dark blue lines). Gray shading represents SE.
A neural index of learning would be reflected by an increase in the amplitude of the evoked response as a function of surprisal and passage repetition. We tested whether this interaction effect differed across groups, ROIs, and time windows. We found a key four-way interaction, showing that the surprisal by passage repetition effect varied by group and ROI (p = 0.001, ηp2 = 0.95). This interaction reflects the fact that a positive slope for the effect of surprisal emerged in the third repetition for NT participants but not for ASD participants. The pattern of a positive slope in the third repetition in the NT group is consistent across the LAC, pSTG, and IFG and differs for the right parietal region.
We further break down this interaction effect. In the LAC (Fig. 4A), for the NT group, the effect of surprisal on evoked response amplitude (slope of the blue lines) shows a positive incline in the third passage repetition relative to the first two passage repetitions (β = 4.53, CI95% = [2.85, 6.21]). This pattern of data differs for the ASD group, where we observe a flat trend in the third passage repetition in the LAC (β = −1.66, CI95% = [−2.86, −0.44]), relative to the first two passage repetitions. In the pSTG (Fig. 4B), for the NT group, the blue line is overall flat for the first two repetitions and shows a positive trend in the third passage repetition. Meanwhile, the ASD group’s blue lines reflect a slight negative trend in the first and third repetitions and a positive trend in the second repetition. In the IFG (Fig. 4C), we again observe an overall flat blue line for the NT group in the first passage repetition and a positive trend in the second and third repetitions; no such pattern is observed for the ASD group across the three repetitions. In the right superior parietal region, as expected, we observe no learning response across passage repetitions in either the NT or the ASD group (Fig. 4D).
The ANOVA showed a marginally significant three-way interaction of surprisal by repetition by group (p = 0.046, ηp2 = 0.82). Additionally, we observed several significant two-way interactions: the effect of surprisal varied across ROIs (p < 0.001, ηp2 = 0.96), brain activity across the three repetitions varied by ROI (p = 0.001, ηp2 = 0.95), the effect of surprisal varied by group (p = 0.032, ηp2 = 0.77), and brain activity at the three ROIs varied by group (p = 0.001, ηp2 = 0.94). We also observed several lower-order significant effects, including main effects of surprisal (ηp2 = 0.94), passage repetition (ηp2 = 0.94), and ROI (ηp2 = 1.0; all p < 0.001). The main effects of time window (ηp2 = 0.64) and group (ηp2 = 0.01) were not significant. The five-way interaction between surprisal, time window, repetition, ROI, and group was not significant.
Discussion
The present study used surprisal to investigate the neural mechanisms underlying speech segmentation in typical development and in children with ASD. Speech segmentation, foundational to language acquisition, requires the integration of top-down and bottom-up cognitive processes. To this end, we proposed two possible hypotheses as to why children with ASD might struggle to use distributional cues to find words in speech: a sensory-differences hypothesis that suggests potential deficits in the bottom-up early sensory processing of auditory input, and a prediction-differences hypothesis related to potential deficits in the higher-order analysis of concatenated input. To investigate these two hypotheses, we used MEG to examine the functioning of the left primary auditory cortex, left posterior STG, and left IFG during a passive language listening paradigm. Our key interest was a neural index of learning, measured as an increase in the amplitude of the evoked response as a function of surprisal. We expected this interaction to emerge with repeated exposure to the language paradigm. Critically, we tested whether neural responses differed across groups and ROIs. We observed the neural index of learning in typically developing children, but not in the children with ASD, across all three ROIs. These data speak to two competing hypotheses.
First, prior literature on speech and sound processing has shown that children with ASD present with low-level auditory processing deficits, such as disruptions or delays in early neural responses to both verbal and non-verbal acoustic stimuli (Bomba and Pang, 2004; Jeste and Nelson, 2009; Edgar et al., 2014, 2015). In fact, the children with ASD in this sample previously showed atypical responses to phonotactically illegal, in comparison to legal, sequences (Brennan et al., 2016b). Our LAC and pSTG results are consistent with the sensory-differences hypothesis, which suggests that a possible disruption in initial acoustic processing may have led to difficulties in extracting speech sound patterns from natural fluent speech (Roberts et al., 2010, 2011).
Second, research into the development of auditory pathways in ASD shows atypical development of white matter and cortical function within the auditory and language systems (Berman et al., 2016), such as delayed STG auditory 100-ms responses (Roberts et al., 2010) and atypical hemispheric lateralization of auditory responses (Stroganova et al., 2013). These patterns of responses in auditory processing may be because of documented deficits in orienting attention (Whitehouse and Bishop, 2008). Children with ASD in this study showed a varied pattern of neural responses to syllable sequences as compared with their neurotypical peers, within and across all three ROIs. Specifically, the IFG results are in line with the prediction-differences hypothesis. Prior work has suggested that the language network’s feed-forward mechanisms supporting higher-order computations might be particularly impaired in those with ASD and poor language learning outcomes (Courchesne and Pierce, 2005; Redcay et al., 2008). While speculative, such impairments have the potential to propagate into deficits in the extraction and integration of statistical regularities in ASD, especially in the beginning phases of learning.
Behavioral measures of statistical learning suggest that children with ASD can be as sensitive to statistical regularities as their typically developing peers (Haebig et al., 2017), across paradigms with (Scott-van Zeeland et al., 2010) and without (Mayo and Eigsti, 2012) additional cues to segmentation. In the present study, most of the children with ASD were unable to identify the target syllable pairs heard within the novel fluent speech relative to a foil. Performance for this group of children with ASD was significantly below chance, suggesting that some learning may have occurred within the 6-min exposure. The pattern of data suggests that children with ASD were able to recognize some syllable components that were part of words used in the postscan behavioral test, but not the syllable sequences that formed the target words. One interpretation of these findings is consistent with our second hypothesis relating to the higher-order analysis of linguistic events. Children with ASD may have been sensitive to the frequency of the syllables presented but failed in the appropriate grouping of syllable sequences given the distributional cues. This is an interesting finding that warrants further investigation.
Our NT and ASD children did not differ in their phonological competence, although they differed on measures of attention, syntax, and pragmatics. Children with ASD showed normative performance on phonological awareness tasks that ask children to segment and manipulate word sounds (e.g., elision, CTOPP), but poorer performance on syntax tasks (e.g., formulating sentences, CELF-4) that tap into children’s knowledge of language structure. The observed differences in neural learning patterns within left hemisphere regions and the poor statistical learning performance in ASD may reveal an underlying difficulty in extracting linguistic structure or learning sequences that extends beyond the processing of single speech sounds. However, exploratory bivariate correlations between the language and attention measures and experimental task performance indicated no meaningful trends (r = 0.01–0.37). The sample size significantly limits our ability to examine the links between the current paradigm and children’s language or cognitive skills. In future work, we aim to take a closer look at defining subpopulations of children with ASD and their learning outcomes.
The Italian statistical learning paradigm, adapted from Hay et al. (2011), maintained virtually all of the complexities found in natural speech, with the exception that the transitional probabilities between syllable sequences were precisely manipulated in a subset of words. By specifically examining prediction-based processing demands with the measure of surprisal, we were able to assess the computational nature of statistical learning across a range of unexpectedness values. This allowed us to control and test for statistical learning effects of relatively moderate and highly predictive syllable sequences within a continuous and varied range of syllable probabilities. Prediction has been implicated as an important component of early learning (Romberg and Saffran, 2013), and some suggest prediction plays a major role in the underlying impairments observed in ASD (Sinha et al., 2014). This hypothesis suggests that tracking of statistical regularities in ASD might be comparable to that of neurotypical peers when the environment is relatively stable, and perhaps with longer exposure time (e.g., 21 min in Mayo and Eigsti, 2012). However, when tasks involve varying distributions of events (e.g., a range of probabilities), integration of new events with prior experiences may be more difficult for children with ASD, resulting in learning differences between the two groups.
The use of a naturalistic language paradigm, combined with MEG imaging, is one of the key innovations of this study. Previous studies of speech segmentation that vary the type and number of speech cues available to learners have found differences in neural activity across manipulations, despite participants’ inability to behaviorally detect differences between conditions. This has been documented in samples of typically developing children (McNealy et al., 2010; Scott-van Zeeland et al., 2010) and adults (McNealy et al., 2006) using fMRI. Scott-van Zeeland et al. (2010) found that both children with and without ASD were at chance in their behavioral learning performance. Importantly, they differed in their neural responses. First, the authors found that patterns of brain activity in the fronto-temporo-parietal network changed with the increase in the number of cues to word boundaries, but only in the group of typically developing children. Second, the authors observed a lack of frontal lobe engagement during speech processing tasks in children with ASD. Lastly, children with more severe communicative deficits showed fewer changes in brain activity with increased exposure to speech. Our results parallel these findings and provide corroborating support for the hypothesis that the integration of top-down and bottom-up cognitive processes is involved in successful speech segmentation, which may be impaired in children with ASD. In the present study, we found no evidence of a timing effect in relation to early speech processing in the auditory cortex and later analysis in higher-level auditory and speech processing regions. This is an interesting null result that warrants further investigation with a more granular experimental design.
The use of the beamforming method for localization introduces some limitations, such as possible differences in the quality of fit between the ASD and NT groups. Thus, we cannot rule out an anatomically based explanation of our results. However, we have two reasons to think such an explanation is unlikely. First, potential anatomic differences between ASD and NT may be smaller than the spatial specificity of the beamformer. Second, the anatomical differences in the left hemisphere between ASD and NT groups pointed out by Berman et al. (2016) emerge at later ages than the 8- to 12-year-old range studied in our sample. To test this reasoning in future studies, we could measure the statistical fit of the beamforming method across the two groups or acquire individual anatomical MRI scans for each participant to estimate source localizations with more precision.
In sum, the present study offers novel evidence on the neural mechanisms underlying statistical learning using a naturalistic language paradigm, in typical development and in children with ASD. Results show neural and behavioral effects of speech segmentation specific to syllable-level surprisal, extending previous work by examining statistical learning from two perspectives: input-driven auditory processing and higher-order predictive processing. These findings offer insight into the cognitive mechanisms foundational for language acquisition and help inform our understanding of development across different populations of learners.
Acknowledgments
Acknowledgements: We thank Stefanie Younce for assistance with data collection, as well as the University of Michigan Department of Psychology and the Center for Human Growth and Development.
Footnotes
The authors declare no competing financial interests.
This work was supported in part by the University of Michigan M-Cubed initiative (J.R.B., I.K., R.L.-O.). This publication was also supported by the Eunice Kennedy Shriver National Institute of Child Health and Human Development grant R01HD092498 (I.K.).
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.