Cognition

Volume 106, Issue 3, March 2008, Pages 1126-1177

Expectation-based syntactic comprehension

https://doi.org/10.1016/j.cognition.2007.05.006

Abstract

This paper investigates the role of resource allocation as a source of processing difficulty in human sentence comprehension. The paper proposes a simple information-theoretic characterization of processing difficulty as the work incurred by resource reallocation during parallel, incremental, probabilistic disambiguation in sentence comprehension, and demonstrates its equivalence to the theory of Hale [Hale, J. (2001). A probabilistic Earley parser as a psycholinguistic model. In Proceedings of NAACL (Vol. 2, pp. 159–166)], in which the difficulty of a word is proportional to its surprisal (its negative log-probability) in the context within which it appears. This proposal subsumes and clarifies findings that high-constraint contexts can facilitate lexical processing, and connects these findings to well-known models of parallel constraint-based comprehension. In addition, the theory leads to a number of specific predictions about the role of expectation in syntactic comprehension, including the reversal of locality-based difficulty patterns in syntactically constrained contexts, and conditions under which increased ambiguity facilitates processing. The paper examines a range of established results bearing on these predictions, and shows that they are largely consistent with the surprisal theory.

Introduction

There are several important properties that must be accounted for by any realistic theory of human sentence comprehension. These include:

  1. robustness to imperfectly formed input;
  2. accurate ambiguity resolution;
  3. inference on the basis of incomplete input; and
  4. differential, localized processing difficulty.

This paper attempts to show how these four properties can be tightly interconnected in a probabilistic, expectation-based theory of syntactic comprehension. In particular, this paper focuses on deriving a theory of Property 4 – namely, that not all sentences are equally easy to comprehend, and different parts of sentences differ in their difficulty – from Properties 1 through 3.

To a considerable extent, the dominant paradigm for investigating differential processing difficulty has been what I will call resource-requirement or resource-limitation theories. These propose that:

  • some syntactic structures require more of a given resource than do others; and

  • that resource is in short supply in the human parser; and

  • this gives rise to greater processing difficulty for more resource-intensive structures.

Typically this limited resource is some form of memory. The resource-limitation position has also come to inform a persistent view of ambiguity resolution: the resource-limited parser can only pursue one alternative at a time (i.e., the parser is serial), and in the face of local ambiguity, the processor chooses the alternative that minimizes the resources consumed. This viewpoint has inspired a variety of ambiguity resolution theories, including Late Closure (Frazier & Fodor, 1978) and Minimal Attachment (Frazier, 1979). Perhaps the most salient modern incarnations of memory-centered resource-requirement theories are, for ambiguity resolution, the Active Filler Hypothesis (AFH; Clifton & Frazier, 1989); and, for locally unambiguous sentences, the Dependency Locality Theory (DLT; Gibson, 1998, 2000).

At the same time, an alternative line of research has focused on the role of expectations in syntactic processing. This idea has historically been associated most closely with constraint-satisfaction processing models such as those of MacDonald (1993), MacDonald et al. (1994), Tanenhaus et al. (1995), and McRae, Spivey-Knowlton, and Tanenhaus (1998), and can be traced back to early work by Marslen-Wilson (1975). This line of work typically takes a strong integrationist and parallelist perspective: the comprehender draws on a variety of information sources (structural, lexical, pragmatic, discourse) to evaluate in parallel a number of possible alternatives for the input seen thus far. For the most part, the primary concern of constraint-based work has been ambiguity resolution, the argument being that possible structural analyses are ranked according to their plausibility on a number of dimensions, rather than according to the amount of resources they consume. Empirically observed processing difficulty after local ambiguity resolution is informally ascribed either to a reranking of the favored analysis or to competition between closely ranked analyses. The constraint-based position can be thought of as a resource-allocation approach to syntactic processing: the parser allocates different amounts of resources to different interpretations of the partial input, and difficulty arises when those resources turn out to be inefficiently allocated.

As argued by Jurafsky (2003), probability theory fits naturally as an underlying infrastructure for constraint-based approaches to express the rational (in the sense of Anderson, 1990) combination of multiple information sources. The use of probability theory for psycholinguistic modeling has in fact become more prevalent over the past decade, beginning with Jurafsky (1996) and continuing in Narayanan and Jurafsky (1998, 2002) and Crocker and Brants (2000).

This paper proposes a resource-allocation theory of processing difficulty grounded in parallel probabilistic ambiguity resolution: the possible structural analyses consistent with a partial input are preferentially ranked in parallel, and the difficulty of a new word corresponds to the amount of reallocation necessary to reflect the word's effect on the preference ranking. Section 2 gives the derivation of this theory and shows that it turns out to be equivalent to the surprisal theory originally proposed by Hale (2001). As a result we have a single theory (simply called the surprisal theory in this paper) unifying the idea of the work done by incremental probabilistic disambiguation with expectations about upcoming events in a sentence. In this theory, surprisal serves as a causal bottleneck between the linguistic representations constructed during sentence comprehension and the processing difficulty incurred at a given word within a sentence. This paper argues that the surprisal theory, when conjoined with probabilistic models chosen according to appropriate principles (see Section 3), makes a wide range of precise predictions consistent with empirical observations, while remaining relatively neutral as to the exact representations of possible structural analyses. Section 4 contrasts the surprisal theory with alternative resource-allocation and resource-limitation theories of processing difficulty, illustrating the general conditions under which their predictions maximally diverge. The remainder of the paper examines a number of established experimental results pertaining to these divergent predictions, and shows that they lend considerable support to the surprisal theory.
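In equation form (a standard rendering of the measure just described; the notation here is assumed rather than quoted from the paper), the difficulty of the i-th word is taken to be proportional to its surprisal in context:

    \mathrm{difficulty}(w_i) \;\propto\; -\log P(w_i \mid w_1 \ldots w_{i-1}, \textsc{context})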

Section snippets

Deriving a resource-allocation theory of processing difficulty

This section presents a new derivation of a theory of resource-allocation processing difficulty, based on a highly general conception of sentence comprehension, and accounting for principles that are necessary for any realistic model of human sentence processing.

A language contains a (normally infinite) set of complete structures such that a fully disambiguated utterance corresponds to exactly one structure. Each structure contains the complete string of the utterance, plus presumably at least
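A sketch of the resource-reallocation argument in standard notation, under the simplifying assumption that each complete structure determines its word string (here T ranges over structures consistent with the input so far, and the reallocation cost is measured as the relative entropy of the new ranking with respect to the old):

    D_{\mathrm{KL}}\!\left( P(T \mid w_{1\ldots i}) \,\middle\|\, P(T \mid w_{1\ldots i-1}) \right)
      \;=\; \sum_{T} P(T \mid w_{1\ldots i}) \log \frac{P(T \mid w_{1\ldots i})}{P(T \mid w_{1\ldots i-1})}
      \;=\; \sum_{T} P(T \mid w_{1\ldots i}) \log \frac{P(w_{1\ldots i-1})}{P(w_{1\ldots i})}
      \;=\; -\log P(w_i \mid w_{1\ldots i-1})

The middle step uses the fact that, for any structure T consistent with the input, P(T | w_{1...i}) = P(T)/P(w_{1...i}), so the ratio inside the logarithm does not depend on T; the cost of reallocating probability mass across analyses therefore reduces to the surprisal of the new word.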

The structure of probabilistic grammatical models

The goal of this paper is to present an argument for the presence of probabilistically formulated expectation-based effects in syntactic comprehension, and more specifically to advocate a particular relationship – surprisal – between incremental probabilistic disambiguation and processing difficulty. I do not take it as a goal of this paper to advance a particular probabilistic model over trees or strings as the correct one used by adult native speakers of any language. The formulation of such
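To make the parallel-ranking picture concrete without committing to a particular grammar, the following is a minimal sketch in Python; the candidate analyses, their joint probabilities, and the per-analysis next-word probabilities are all invented for illustration and are not a model advanced by the paper.

    import math

    # Invented joint probabilities P(analysis, words-so-far) for the structural
    # analyses still consistent with the prefix seen so far.
    beliefs = {"analysis T1": 0.030, "analysis T2": 0.002}

    def surprisal(next_word_prob_given):
        """Surprisal (in bits) of the next word: the mass the live analyses
        jointly assign to it, renormalized by the total prefix probability."""
        prefix_mass = sum(beliefs.values())
        next_mass = sum(beliefs[t] * p for t, p in next_word_prob_given.items())
        return -math.log2(next_mass / prefix_mass)

    # Hypothetical P(next word | analysis): T2 expects the word strongly, T1 weakly.
    print(surprisal({"analysis T1": 0.05, "analysis T2": 0.60}))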

Predictability

The surprisal theory bears the greatest conceptual similarity to the well-known observation that words are easier to comprehend in contexts where they are highly predictable (e.g., (2-a) below) than in unconstraining contexts such as (2-b):

(2) a. He mailed the letter without a stamp.
    b. There was nothing wrong with the car.

This effect of predictability has been observed in both eye-tracking reading studies, as reduced reading time and increased skipping probability (e.g., Ehrlich & Rayner, 1981), and in
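The direction of the effect predicted by surprisal can be illustrated with hypothetical conditional probabilities for the critical words in (2); the numbers below are invented and stand in for empirical cloze or corpus estimates.

    import math

    # Hypothetical probabilities of the critical word given each context type.
    p_constraining = 0.40    # e.g., the final word of (2-a) in its constraining context
    p_unconstraining = 0.02  # e.g., the final word of (2-b) in its unconstraining context

    bits = lambda p: -math.log2(p)
    print(f"constraining context:   {bits(p_constraining):.2f} bits")
    print(f"unconstraining context: {bits(p_unconstraining):.2f} bits")
    # Lower surprisal for the predictable word -> shorter reading times, more skipping.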

Verb-final contexts, surprisal, and locality

There are contexts in nearly every language where a head follows one or more of its dependents. When a comprehender recognizes that a partial input has entered such a context, they are in a position to obtain increasing amounts of information about the upcoming head. Intuitively, this accumulating information has two effects: on the one hand, it places a greater memory load on the comprehender; on the other hand, it can help sharpen comprehenders' expectations about the upcoming
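A minimal numerical sketch of the expectation-sharpening side of this trade-off; all conditional probabilities below are invented, and only the direction of the difference matters.

    import math

    # Hypothetical probabilities of the clause-final verb in a verb-final context.
    p_verb_after_subject_only = 0.02        # P(verb | NP-subj ...)
    p_verb_after_subject_and_dative = 0.10  # P(verb | NP-subj NP-dat ...)

    bits = lambda p: -math.log2(p)
    print(f"after subject only:        {bits(p_verb_after_subject_only):.2f} bits")
    print(f"after subject + dative NP: {bits(p_verb_after_subject_and_dative):.2f} bits")
    # More preverbal dependents -> a sharper expectation for the verb -> lower surprisal,
    # the opposite of what a purely locality-based (memory-cost) account predicts.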

When ambiguity facilitates comprehension

The fully parallel surprisal theory entails an unusual relationship between structural ambiguity and processing difficulty. In most processing theories, local structural ambiguity leads to difficulty under a variety of circumstances. In serial theories, local ambiguity is a precondition for garden-path effects; in competition-based parallel accounts, near-equal support for competing analyses while an ambiguity remains unresolved is the primary source of syntactic comprehension difficulty. In the surprisal theory, on the other hand,
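Under full parallelism, the probability of an upcoming word is a sum over all analyses still in play, so an unresolved ambiguity can contribute extra probability mass rather than extra cost. The schematic sketch below, in the style of relative-clause attachment ambiguities, uses invented posteriors and conditional probabilities purely for illustration.

    import math

    # Invented posterior over two attachment sites for an upcoming relative clause.
    posterior = {"attach to NP1": 0.4, "attach to NP2": 0.6}
    p_word_given_host = 0.25  # invented P(critical word | compatible attachment)

    # Ambiguous condition: the critical word is compatible with either attachment.
    p_ambiguous = sum(posterior[s] * p_word_given_host for s in posterior)
    # Unambiguous condition: the critical word is compatible with only one attachment.
    p_unambiguous = posterior["attach to NP2"] * p_word_given_host

    bits = lambda p: -math.log2(p)
    print(f"ambiguous prefix:   {bits(p_ambiguous):.2f} bits")    # lower surprisal
    print(f"unambiguous prefix: {bits(p_unambiguous):.2f} bits")  # higher surprisal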

The subject preference

Variable word order in natural languages can give rise to local ambiguities involving which grammatical function (GF) is assigned to a particular noun phrase. Such local ambiguity is possible in a wide variety of languages: although languages with free word order often use case to mark GFs on noun phrases, syncretism of case form across multiple GFs is also widespread, being documented in Australian, Finno-Ugric, Indo-European, and Turkic languages (e.g., Carstairs, 1984; Comrie, 1978; Comrie,

Empirical difficulties for the theory

In Sections 5 (Verb-final contexts, surprisal, and locality), 6 (When ambiguity facilitates comprehension), and 7 (The subject preference) we have seen cases where surprisal makes predictions consistent with online processing data that may be difficult to reconcile with other theories. This section touches on empirical data that may be difficult for surprisal, and suggests what support for other types of processing theories can be drawn from these data.

Conclusion

Recent experimental results in syntactic ambiguity resolution indicate that comprehenders incrementally integrate a variety of evidential knowledge in the process of discriminating the preferred interpretation of a sentence; probability theory serves as a coherent architecture for this constraint-based, resource-allocation paradigm of ambiguity resolution. We can extend the parallel, probabilistic disambiguation perspective of incremental sentence processing into a theory of syntactic

Acknowledgments

This work has benefited from presentation and discussion at a variety of venues, including the 2005 annual meeting of the Linguistic Society of America and the 18th annual CUNY Sentence Processing Conference. I am grateful for feedback on the manuscript from John Hale, Florian Jaeger, Dan Jurafsky, Andrew Kehler, Frank Keller, Christopher Manning, Don Mitchell, Martin Pickering, Ivan Sag, Tom Wasow, and three anonymous reviewers. I accept full responsibility for all errors and omissions. Part of

References (105)

  • Green, M. J., et al. (2006). Absence of real evidence against competition during syntactic ambiguity resolution. Journal of Memory and Language.
  • Jurafsky, D. (1996). A probabilistic model of lexical and syntactic access and disambiguation. Cognitive Science.
  • Kaiser, E., et al. (2004). The role of discourse context in the processing of a flexible word-order language. Cognition.
  • King, J., et al. (1991). Individual differences in syntactic processing: The role of working memory. Journal of Memory and Language.
  • Kiparsky, P. (2001). Structural case in Finnish. Lingua.
  • MacDonald, M. C. (1993). The interaction of lexical and syntactic ambiguity. Journal of Memory and Language.
  • McDonald, S. A., et al. (2003). Low-level predictive inference in reading: The influence of transitional probabilities on eye movements. Vision Research.
  • McRae, K., et al. (1998). Modeling the influence of thematic fit (and other constraints) in on-line sentence comprehension. Journal of Memory and Language.
  • Tabor, W., et al. (2004). Effects of merely local syntactic coherence on sentence processing. Journal of Memory and Language.
  • Tabor, W., et al. (1999). Dynamical models of sentence processing. Cognitive Science.
  • Traxler, M. J., et al. (1998). Adjunct attachment is not a form of lexical ambiguity resolution. Journal of Memory and Language.
  • van Gompel, R. P. G., et al. (2005). Evidence against competition during syntactic ambiguity resolution. Journal of Memory and Language.
  • Anderson, J. R. (1990). The adaptive character of human thought.
  • Anderson, J., et al. (2004). An integrated theory of the mind. Psychological Review.
  • Bod, R. (1992). A computational model of language performance: Data oriented parsing. In Proceedings of...
  • Booth, T. L. (1969). Probabilistic representation of formal languages. In IEEE conference record of the 1969 tenth...
  • Brants, S., Dipper, S., Hansen, S., Lezius, W., & Smith, G. (2002). The TIGER treebank. In Proceedings of the workshop...
  • Carstairs, A. (1984). Outlines of a constraint on syncretism. Folia Linguistica.
  • Charniak, E. (1997). Statistical parsing with a context-free grammar and word statistics. In Proceedings of AAAI (pp....
  • Charniak, E. (2001). Immediate-head parsing for language models. In Proceedings of...
  • Clifton, C., et al. Comprehending sentences with long distance dependencies.
  • Collins, M., Hajic, J., Ramshaw, L., & Tillmann, C. (1999). A statistical parser for Czech. In Proceedings of...
  • Collins, M. (1999). Head-driven statistical models for natural language parsing. PhD thesis, University of...
  • Comrie, B. (1978). Definite direct objects and referent identification. Pragmatics-Microfiche.
  • Comrie, B. On delimiting cases.
  • Crocker, M., et al. (2000). Wide-coverage probabilistic sentence processing. Journal of Psycholinguistic Research.
  • Cuetos, F., et al. Parsing in different languages.
  • Culy, C. (1985). The complexity of the vocabulary of Bambara. Linguistics and Philosophy.
  • Dubey, A., & Keller, F. (2003). Parsing German with sister-head dependencies. In Proceedings of...
  • Elman, J. L. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning.
  • Engbert, R., et al. (2005). SWIFT: A dynamical model of saccade generation during reading. Psychological Review.
  • Farmer, T. A., et al. (2006). Phonological typicality influences on-line sentence comprehension. Proceedings of the National Academy of Sciences.
  • Ferretti, T. R., & McRae, K. (1999). Modeling the role of plausibility and verb-bias in the direct object/sentence...
  • Frazier, L. (1979). On comprehending sentences: Syntactic parsing strategies. PhD thesis, University of...
  • Frisson, S., et al. (2005). Effects of contextual predictability and transitional probability on eye movements during reading. Journal of Experimental Psychology: Learning, Memory, and Cognition.
  • Gazdar, G., Klein, E., Pullum, G., & Sag, I. (1985). Generalized phrase structure grammar....
  • Gibson, E., et al. (2005). Reading relative clauses in English. Language and Cognitive Processes.
  • Gibson, E., et al. (2000). Distinguishing serial and parallel parsing. Journal of Psycholinguistic Research.
  • Gibson, E. (1991). A computational theory of human linguistic processing: Memory limitations and processing breakdown....
  • Gibson, E. The dependency locality theory: A distance-based theory of linguistic complexity.