Of bits and wows: A Bayesian theory of surprise with applications to attention

Pierre Baldi; Laurent Itti

doi:10.1016/j.neunet.2009.12.007

Neural Netw. 2010 Jun;23(5):649-66. doi: 10.1016/j.neunet.2009.12.007. Epub 2009 Dec 28.

Authors

Pierre Baldi¹, Laurent Itti

Affiliation

¹ Department of Computer Science, UCI, Irvine, CA 92697-3435, USA. pfbaldi@ics.uci.edu

Abstract

The amount of information contained in a piece of data can be measured by the effect this data has on its observer. Fundamentally, this effect is to transform the observer's prior beliefs into posterior beliefs, according to Bayes' theorem. Thus the amount of information can be measured in a natural way by the distance (relative entropy) between the prior and posterior distributions of the observer over the available space of hypotheses. This facet of information, termed "surprise", is important in dynamic situations where beliefs change, in particular during learning and adaptation. Surprise can often be computed analytically, for instance in the case of distributions from the exponential family, or it can be numerically approximated. During sequential Bayesian learning, surprise decreases as the inverse of the number of training examples. Theoretical properties of surprise are discussed, in particular how it differs from and complements Shannon's definition of information. A computer vision neural network architecture capable of computing surprise over images and video stimuli is then presented. Hypothesizing that surprising data ought to attract natural or artificial attention systems, the output of this architecture is used in a psychophysical experiment to analyze human eye movements in the presence of natural video stimuli. Surprise is found to yield robust performance at predicting human gaze (ROC-like ordinal dominance score of approximately 0.7, compared to approximately 0.8 for human inter-observer repeatability, approximately 0.6 for a simpler intensity contrast-based predictor, and 0.5 for chance). The resulting theory of surprise is applicable across different spatio-temporal scales, modalities, and levels of abstraction.
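
As a rough illustration of the definition above, the following sketch computes surprise as the relative entropy between prior and posterior for a Gaussian belief over an unknown mean, a simple exponential-family case where the computation is analytic. Everything specific here is an assumption made for illustration and not taken from the paper: the Gaussian model with known noise variance, the direction of the relative entropy (prior relative to posterior), the use of nats, the alternating toy data, and the function names `kl_gaussian`, `update`, and `surprise_sequence`. It is not the vision architecture described in the abstract.

```python
import math


def kl_gaussian(m1, v1, m2, v2):
    """Relative entropy KL(N(m1, v1) || N(m2, v2)) in nats; v1, v2 are variances."""
    return 0.5 * (math.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)


def update(m, v, x, noise_var):
    """Conjugate Bayesian update of a belief N(m, v) over an unknown mean
    after one observation x with known noise variance."""
    v_post = 1.0 / (1.0 / v + 1.0 / noise_var)
    m_post = v_post * (m / v + x / noise_var)
    return m_post, v_post


def surprise_sequence(data, m0=0.0, v0=1.0, noise_var=1.0):
    """Surprise of each observation: KL(prior || posterior), an assumed direction.
    The posterior after each observation becomes the prior for the next."""
    m, v = m0, v0
    surprises = []
    for x in data:
        m_post, v_post = update(m, v, x, noise_var)
        surprises.append(kl_gaussian(m, v, m_post, v_post))
        m, v = m_post, v_post
    return surprises


# Toy data (hypothetical): observations alternating between +1 and -1.
data = [1.0 if i % 2 == 0 else -1.0 for i in range(200)]
surprises = surprise_sequence(data)

# n * surprise_n stays roughly constant for large n, i.e. roughly 1/n decay.
for n in (10, 100, 200):
    print(n, surprises[n - 1], n * surprises[n - 1])
```

In this toy setting the product of the observation count and the surprise settles near a constant, which is consistent with the inverse-in-n decay stated in the abstract. A datum far from the current belief yields a larger surprise than one close to it, since the posterior mean shifts further.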

Publication types

Research Support, N.I.H., Extramural
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Attention* / physiology
Bayes Theorem*
Computer Simulation
Databases, Factual
Eye Movements
Humans
Information Theory
Learning / physiology
Mental Processes* / physiology
Models, Psychological
Neural Networks, Computer*
Normal Distribution
Poisson Distribution
Psychophysics
Recognition, Psychology / physiology
Saccades
Time Factors
Visual Perception / physiology
