Expectation-based syntactic comprehension
Introduction
There are several important properties that must be accounted for by any realistic theory of human sentence comprehension. These include:
1. robustness to imperfectly formed input;
2. accurate ambiguity resolution;
3. inference on the basis of incomplete input; and
4. differential, localized processing difficulty.
This paper attempts to show how these four properties can be tightly interconnected in a probabilistic, expectation-based theory of syntactic comprehension. In particular, this paper focuses on deriving a theory of Property 4 – namely, that not all sentences are equally easy to comprehend, and different parts of sentences differ in their difficulty – from Properties 1 through 3.
To a considerable extent, the dominant paradigm for investigating differential processing difficulty has been what I will call resource-requirement or resource-limitation theories. These propose that:
- some syntactic structures require more of a given resource than do others;
- that resource is in short supply in the human parser; and
- this gives rise to greater processing difficulty for more resource-intensive structures.
Typically this limited resource is some form of memory. The resource-limitation position has also come to inform a persistent view of ambiguity resolution: the resource-limited parser can only pursue one alternative at a time (i.e., the parser is serial), and in the face of local ambiguity, the processor chooses the alternative that minimizes the resources consumed. This viewpoint has inspired a variety of ambiguity resolution theories, including Late Closure (Frazier & Fodor, 1978) and Minimal Attachment (Frazier, 1979). Perhaps the most salient modern incarnations of memory-centered resource-requirement theories are, for ambiguity resolution, the Active Filler Hypothesis (AFH; Clifton & Frazier, 1989); and, for locally unambiguous sentences, the Dependency Locality Theory (DLT; Gibson, 1998, 2000).
At the same time, an alternative line of research has focused on the role of expectations in syntactic processing. This idea has historically been associated most closely with constraint-satisfaction processing models such as those of MacDonald (1993), MacDonald et al. (1994), Tanenhaus et al. (1995), and McRae, Spivey-Knowlton, and Tanenhaus (1998), and can be traced back to early work by Marslen-Wilson (1975). This line of work typically takes a strong integrationist and parallelist perspective: the comprehender draws on a variety of information sources (structural, lexical, pragmatic, discourse) to evaluate in parallel a number of possible alternatives for the input seen thus far. For the most part, the primary concern of constraint-based work has been ambiguity resolution, the argument being that possible structural analyses are ranked according to their plausibility on a number of dimensions, rather than according to the amount of resources they consume. Empirically observed processing difficulty after local ambiguity resolution is informally ascribed to either a reranking of the favored analysis, or competition between closely ranked analyses. The constraint-based position can be thought of as a resource-allocation approach to syntactic processing: the parser allocates different amounts of resources to different interpretations of the partial input, and difficulty arises when those resources turn out to be inefficiently allocated.
As argued by Jurafsky (2003), probability theory fits naturally as an underlying infrastructure for constraint-based approaches to express the rational (in the sense of Anderson, 1990) combination of multiple information sources. The use of probability theory for psycholinguistic modeling has in fact become more prevalent over the past decade, beginning with Jurafsky (1996) and continuing in Narayanan and Jurafsky (1998, 2002) and Crocker and Brants (2000).
This paper proposes a resource-allocation theory of processing difficulty grounded in parallel probabilistic ambiguity resolution: the possible structural analyses consistent with a partial input are preferentially ranked in parallel, and the difficulty of a new word corresponds to the amount of reallocation necessary to reflect the word’s effect on the preference ranking. Section 2 gives the derivation of this theory and shows that it turns out to be equivalent to the surprisal theory originally proposed by Hale (2001). As a result we have a single theory (simply called the surprisal theory in this paper) unifying work on incremental probabilistic disambiguation with expectations about upcoming events in a sentence. In this theory, surprisal serves as a causal bottleneck between the linguistic representations constructed during sentence comprehension and the processing difficulty incurred at a given word within a sentence. This paper argues that the surprisal theory, when conjoined with probabilistic models chosen according to appropriate principles (see Section 3), makes a wide range of precise predictions consistent with empirical observations, while remaining relatively neutral as to the exact representations of possible structural analyses. Section 4 contrasts the surprisal theory with alternative resource-allocation and resource-limitation theories of processing difficulty, illustrating the general conditions under which their predictions maximally diverge. The remainder of the paper examines a number of established experimental results pertaining to these divergent predictions, and shows that they lend considerable support to the surprisal theory.
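As a concrete illustration of the quantity at stake, the surprisal of a word is simply the negative log probability assigned to it given its context, and can be computed from any probabilistic language model. The sketch below uses a toy bigram table whose probabilities are invented purely for illustration; the theory itself is neutral as to how the conditional probabilities are obtained.

```python
import math

# Toy conditional probability table: P(next_word | previous_word).
# All probabilities here are invented for illustration; a real model
# would be estimated from a corpus or a probabilistic grammar.
BIGRAM_PROBS = {
    ("a", "stamp"): 0.20,
    ("a", "letter"): 0.05,
    ("the", "car"): 0.01,
}

def surprisal(prev_word, word, probs):
    """Surprisal of `word` in context, in bits: -log2 P(word | context)."""
    p = probs[(prev_word, word)]
    return -math.log2(p)

# A more probable continuation yields lower surprisal, and hence,
# under the theory, less predicted processing difficulty.
print(surprisal("a", "stamp", BIGRAM_PROBS))   # ≈ 2.32 bits
print(surprisal("a", "letter", BIGRAM_PROBS))  # ≈ 4.32 bits
```

Note that the difficulty prediction depends only on the conditional probability of the word, not on any particular representation of the analyses that give rise to that probability.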
Section snippets
Deriving a resource-allocation theory of processing difficulty
This section presents a new derivation of a theory of resource-allocation processing difficulty, based on a highly general conception of sentence comprehension, and accounting for principles that are necessary for any realistic model of human sentence processing.
A language contains a (normally infinite) set of complete structures such that a fully disambiguated utterance corresponds to exactly one structure. Each structure contains the complete string of the utterance, plus presumably at least
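The equivalence that this section establishes can be stated compactly. Writing $T$ for the structural analyses consistent with the input and $w_1 \ldots w_i$ for the words seen so far, the cost of reallocating probability across analyses when word $w_i$ arrives, measured as the Kullback–Leibler divergence from the old distribution over analyses to the new one, reduces to the surprisal of that word (stated here roughly, in simplified notation):

```latex
\mathrm{difficulty}(w_i) \;\propto\;
D_{\mathrm{KL}}\bigl(P(T \mid w_{1 \ldots i}) \,\big\|\, P(T \mid w_{1 \ldots i-1})\bigr)
\;=\; -\log P(w_i \mid w_{1 \ldots i-1}).
```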
The structure of probabilistic grammatical models
The goal of this paper is to present an argument for the presence of probabilistically formulated expectation-based effects in syntactic comprehension, and more specifically to advocate a particular relationship – surprisal – between incremental probabilistic disambiguation and processing difficulty. I do not take it as a goal of this paper to advance a particular probabilistic model over trees or strings as the correct one used by adult native speakers of any language. The formulation of such
Predictability
The surprisal theory bears the greatest conceptual similarity to the well-known observation that words are easier to comprehend in contexts where they are highly predictable (e.g., (2-a) below) than in unconstraining contexts (e.g., (2-b)):

(2) a. He mailed the letter without a stamp.
    b. There was nothing wrong with the car.
This effect of predictability has been observed in both eye-tracking reading studies, as reduced reading time and increased skipping probability (e.g., Ehrlich & Rayner, 1981), and in
Verb-final contexts, surprisal, and locality
There are contexts in nearly every language where a head follows one or more of its dependents. Once a comprehender recognizes that a partial input has entered such a context, each additional dependent provides more information about the upcoming head. Intuitively, this accumulating information has two effects: on the one hand it places a greater memory load on the comprehender, on the other hand it can help sharpen comprehenders’ expectations about the upcoming
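The expectation-sharpening side of this trade-off can be made concrete with a toy model. The counts below are invented for illustration only: they describe how often a clause-final verb occurs with and without a preceding dative argument. Conditioning on the observed dative sharpens the distribution over upcoming verbs, lowering the surprisal of a dative-taking verb even though more material now intervenes before it.

```python
import math

# Invented toy counts of verb-final clauses (hypothetical data):
# (dative_argument_seen, final_verb) -> count
COUNTS = {
    (False, "gave"): 10,
    (False, "slept"): 90,
    (True,  "gave"): 45,
    (True,  "slept"): 5,
}

def surprisal_of_verb(verb, dative_seen):
    """-log2 P(verb | dative_seen) under the toy counts."""
    total = sum(c for (d, _), c in COUNTS.items() if d == dative_seen)
    p = COUNTS[(dative_seen, verb)] / total
    return -math.log2(p)

# Having seen the dative dependent, the comprehender's expectation for
# "gave" is sharper, so its surprisal at the verb is lower.
print(surprisal_of_verb("gave", dative_seen=False))  # ≈ 3.32 bits
print(surprisal_of_verb("gave", dative_seen=True))   # ≈ 0.15 bits
```

This is the mechanism by which surprisal can predict facilitation from additional preverbal material, in contrast with locality-based memory accounts.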
When ambiguity facilitates comprehension
The fully parallel surprisal theory entails an unusual relationship between structural ambiguity and processing difficulty. In most processing theories, local structural ambiguity leads to difficulty under a variety of circumstances. In serial theories, local ambiguity is a precondition for garden-path effects; in competition-based parallel accounts, equibias while an ambiguity is unresolved is the primary source of syntactic comprehension difficulty. In the surprisal theory, on the other hand,
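The intuition behind ambiguity-driven facilitation can be sketched numerically. Because surprisal is computed from the marginal probability of the next word, summed over all live analyses, a word can inherit probability from every analysis that remains viable; ruling one analysis out can therefore raise the next word's surprisal. All numbers below, including the analysis labels, are invented for illustration.

```python
import math

# Posterior probabilities of two live structural analyses of an
# ambiguous partial input, and the probability each assigns to the
# next word. Labels and values are hypothetical.
ANALYSES = {"main-clause": 0.6, "relative-clause": 0.4}
P_NEXT_GIVEN = {"main-clause": 0.5, "relative-clause": 0.2}

def next_word_surprisal(live):
    # Marginalize over whichever analyses remain live (renormalized).
    mass = sum(ANALYSES[a] for a in live)
    p = sum(ANALYSES[a] * P_NEXT_GIVEN[a] for a in live) / mass
    return -math.log2(p)

# With both analyses alive, the next word inherits probability from each;
# if disambiguation had eliminated the main-clause parse, surprisal rises.
print(next_word_surprisal(["main-clause", "relative-clause"]))  # ≈ 1.40 bits
print(next_word_surprisal(["relative-clause"]))                 # ≈ 2.32 bits
```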
The subject preference
Variable word order in natural languages can give rise to local ambiguities involving which grammatical function (GF) is assigned to a particular noun phrase. Such local ambiguity is possible in a wide variety of languages: although languages with free word order often use case to mark GFs on noun phrases, syncretism of case form across multiple GFs is also widespread, being documented in Australian, Finno-Ugric, Indo-European, and Turkic languages (e.g., Carstairs, 1984, Comrie, 1978, Comrie,
Empirical difficulties for the theory
In Sections 5 (Verb-final contexts, surprisal, and locality), 6 (When ambiguity facilitates comprehension), and 7 (The subject preference) we have seen cases where surprisal makes predictions consistent with online processing data that may be difficult to reconcile with other theories. This section touches on empirical data that may be difficult for surprisal, and suggests what support for other types of processing theories can be drawn from these data.
Conclusion
Recent experimental results in syntactic ambiguity resolution indicate that comprehenders incrementally integrate a variety of evidential knowledge in the process of discriminating the preferred interpretation of a sentence; probability theory serves as a coherent architecture for this constraint-based, resource-allocation paradigm of ambiguity resolution. We can extend the parallel, probabilistic disambiguation perspective of incremental sentence processing into a theory of syntactic
Acknowledgments
This work has benefited from presentation and discussion at a variety of venues, including the 2005 annual meeting of the Linguistic Society of America and the 18th annual CUNY Sentence Processing Conference. I am grateful for feedback on the manuscript from John Hale, Florian Jaeger, Dan Jurafsky, Andrew Kehler, Frank Keller, Christopher Manning, Don Mitchell, Martin Pickering, Ivan Sag, Tom Wasow, and three anonymous reviewers. I accept full responsibility for all errors and omissions. Part of
References (105)
- Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition (1999).
- Grammar overrides frequency: Evidence from the online processing of flexible word order. Cognition (2002).
- Toward a connectionist model of recursion in human linguistic performance. Cognitive Science (1999).
- Contextual effects on word perception and eye movements during reading. Journal of Verbal Learning and Verbal Behavior (1981).
- Finding structure in time. Cognitive Science (1990).
- A rose by any other name: Long-term memory structure and sentence processing. Journal of Memory and Language (1999).
- Recovery from misanalyses of garden-path sentences. Journal of Memory and Language (1991).
- The sausage machine: A new two-stage parsing model. Cognition (1978).
- Linguistic complexity: Locality of syntactic dependencies. Cognition (1998).
- Effects of noun phrase type on sentence complexity. Journal of Memory and Language (2004).
- Absence of real evidence against competition during syntactic ambiguity resolution. Journal of Memory and Language.
- A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science.
- The role of discourse context in the processing of a flexible word-order language. Cognition.
- Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language.
- Structural case in Finnish. Lingua.
- The interaction of lexical and syntactic ambiguity. Journal of Memory and Language.
- Low-level predictive inference in reading: The influence of transitional probabilities on eye movements. Vision Research.
- Modeling the influence of thematic fit (and other constraints) in on-line sentence comprehension. Journal of Memory and Language.
- Effects of merely local syntactic coherence on sentence processing. Journal of Memory and Language.
- Dynamical models of sentence processing. Cognitive Science.
- Adjunct attachment is not a form of lexical ambiguity resolution. Journal of Memory and Language.
- Evidence against competition during syntactic ambiguity resolution. Journal of Memory and Language.
- The adaptive character of human thought.
- An integrated theory of the mind. Psychological Review.
- Outlines of a constraint on syncretism. Folia Linguistica.
- Comprehending sentences with long distance dependencies.
- Definite direct objects and referent identification. Pragmatics-Microfiche.
- On delimiting cases.
- Wide-coverage probabilistic sentence processing. Journal of Psycholinguistic Research.
- Parsing in different languages.
- The complexity of the vocabulary of Bambara. Linguistics and Philosophy.
- Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning.
- SWIFT: A dynamical model of saccade generation during reading. Psychological Review.
- From the cover: Phonological typicality influences on-line sentence comprehension. Proceedings of the National Academy of Sciences.
- Effects of contextual predictability and transitional probability on eye movements during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition.
- Reading relative clauses in English. Language and Cognitive Processes.
- Distinguishing serial and parallel parsing. Journal of Psycholinguistic Research.
- The dependency locality theory: A distance-based theory of linguistic complexity.