Reading population codes: a neural implementation of ideal observers

Abstract

Many sensory and motor variables are encoded in the nervous system by the activities of large populations of neurons with bell-shaped tuning curves. Extracting information from these population codes is difficult because of the noise inherent in neuronal responses. In most cases of interest, maximum likelihood (ML) is the best read-out method and would be used by an ideal observer. Using simulations and analysis, we show that a close approximation to ML can be implemented in a biologically plausible model of cortical circuitry. Our results apply to a wide range of nonlinear activation functions, suggesting that cortical areas may, in general, function as ideal observers of activity in preceding areas.
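As a concrete illustration of the read-out problem, the following sketch simulates a population of neurons with bell-shaped tuning curves and independent Poisson noise, then compares a population-vector estimate with a brute-force maximum-likelihood estimate. The population size, tuning parameters and grid-search procedure are illustrative assumptions, not the network studied in this article.

    import numpy as np

    rng = np.random.default_rng(0)

    # Assumed population: N neurons with bell-shaped (von Mises) tuning over angle.
    N = 64
    pref = np.linspace(0.0, 2 * np.pi, N, endpoint=False)   # preferred angles
    peak_rate, width = 40.0, 1.5                             # illustrative parameters

    def tuning(theta):
        """Mean firing rates f(theta) for the whole population."""
        return peak_rate * np.exp(width * (np.cos(theta - pref) - 1.0))

    def population_vector(counts):
        """Population-vector read-out: preferred angles weighted by activity."""
        return np.angle(np.sum(counts * np.exp(1j * pref)))

    def maximum_likelihood(counts, grid=np.linspace(0, 2 * np.pi, 720, endpoint=False)):
        """Brute-force ML over a grid, using the independent-Poisson log likelihood."""
        f = tuning(grid[:, None])                 # candidate mean rates, shape (grid, N)
        loglik = np.sum(counts * np.log(f) - f, axis=1)
        return grid[np.argmax(loglik)]

    theta_true = np.pi / 3
    errs_pv, errs_ml = [], []
    for _ in range(500):
        counts = rng.poisson(tuning(theta_true))   # one noisy population response
        errs_pv.append(np.angle(np.exp(1j * (population_vector(counts) - theta_true))))
        errs_ml.append(np.angle(np.exp(1j * (maximum_likelihood(counts) - theta_true))))

    print("error spread, population vector:", np.std(errs_pv))
    print("error spread, maximum likelihood:", np.std(errs_ml))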


Figure 1: Methods for reading out population codes.
Figure 2: Activity in the network immediately after initialization and after several iterations.
Figure 3: Network performance compared to maximum likelihood and population vector.
Figure 4: Temporal evolution of the network estimate and its sensitivity to contrast.



Acknowledgements

We thank Laurent Itti for discussions on divisive normalization.

Author information

Correspondence to Alexandre Pouget.

Supplementary information

In the main text we considered an example in which the goal was to estimate two quantities, the orientation and spatial frequency, given the response of a population of noisy neurons. With a minor change in notation this can be generalized to the problem of estimating M quantities; the only real difference is that instead of labeling the neurons with two indices, i and j, we need M indices, i_1, ..., i_M, where the mth index runs from 1 to N_m. In this case the response of the neurons is denoted a_{i_1, ..., i_M}, and there are M "angles" to estimate, θ_1, ..., θ_M. We will let a denote the complete set of responses, θ the set of angles, and P(a|θ) the conditional probability of observing a given the presentation angles, θ.

The network presented in the main text can be viewed as an algorithm for computing the presentation angles: given a set of responses, a, the network finds a set of angles, θ̂_k, k = 1, ..., M, that are estimates of the true angles, θ_k. How good is that algorithm? The answer depends on what we mean by "good". Here we use the determinant of the covariance matrix, denoted ⟨δθ δθ⟩ (defined below), to measure the quality of the estimate. Our motivation for using this quantity is that it determines, to a large extent, the mutual information between the noisy neuronal responses, a, and the presentation angles, θ. The smaller the determinant of the covariance matrix, the larger the mutual information[1]. (Strictly speaking, this result applies only when the neurons are uncorrelated. We believe it also applies to correlated neurons, as long as the covariance matrix is small. In any case, it is a good starting point.) The components of the covariance matrix are given by

⟨δθ_k δθ_l⟩ ≡ ⟨(θ̂_k − ⟨θ̂_k⟩)(θ̂_l − ⟨θ̂_l⟩)⟩,

where the angle brackets indicate an average with respect to the probability distribution P(a|θ), δθ_k ≡ θ̂_k − θ_k, and ⟨θ̂_k⟩ is the mean value of the estimate.
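In practice, the components of this covariance matrix can be estimated by applying an estimator to many responses drawn from P(a|θ) and averaging the outer products of the deviations. A minimal sketch of that procedure follows; the estimator and the response model are placeholders to be supplied by the user, not anything defined in this article.

    import numpy as np

    def estimate_covariance(estimator, sample_responses, theta_true, n_trials=2000):
        """Monte Carlo estimate of the covariance matrix of an estimator."""
        # `estimator` maps a response vector a to an M-dimensional estimate;
        # `sample_responses` draws one response a from P(a | theta_true).
        estimates = np.array([estimator(sample_responses(theta_true))
                              for _ in range(n_trials)])          # shape (n_trials, M)
        deviations = estimates - estimates.mean(axis=0)
        return deviations.T @ deviations / n_trials               # M x M covariance matrix

    # Toy usage: one "angle", 50 "neurons" with unit Gaussian noise around f(theta) = theta.
    noise_rng = np.random.default_rng(5)
    cov = estimate_covariance(
        estimator=lambda a: np.array([a.mean()]),
        sample_responses=lambda th: th[0] + noise_rng.standard_normal(50),
        theta_true=np.array([0.7]))
    print(cov)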

We now have two tasks. First, we must compute the determinant of the covariance matrix, and second, we must compare that determinant to the optimal one. Fortunately, one does not have to consider all possible estimators to find the optimal one; for unbiased estimators, the lower bound on the determinant is given by the inverse of the Fisher information[2],

det⟨δθ δθ⟩ ≥ 1/det I(θ),     (1)

where I is the Fisher information,

I_kl(θ) = −⟨∂² ln P(a|θ)/∂θ_k ∂θ_l⟩.     (2)

Equation (1) is the multidimensional analog of the Cramér-Rao bound[2].
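For independent Poisson noise, which is treated explicitly at the end of this appendix, the Fisher information can be computed directly from the derivatives of the tuning curves, and the bound in Eq. (1) can then be evaluated numerically. The sketch below does this for an assumed two-parameter (orientation and log spatial frequency) Gaussian tuning function; the population layout, parameter values and central-difference step are likewise assumptions.

    import numpy as np

    # Assumed 2-D population: orientation (period pi) x log spatial frequency.
    n_ori, n_sf = 20, 20
    ori_pref = np.linspace(0, np.pi, n_ori, endpoint=False)
    sf_pref = np.linspace(-1.0, 1.0, n_sf)
    OO, SS = np.meshgrid(ori_pref, sf_pref, indexing="ij")

    def f(theta):
        """Mean rates f_ij(theta) for theta = (orientation, log spatial frequency)."""
        ori, sf = theta
        d_ori = np.angle(np.exp(2j * (ori - OO))) / 2.0      # wrapped orientation difference
        return 5.0 + 30.0 * np.exp(-(d_ori / 0.4) ** 2 - ((sf - SS) / 0.5) ** 2)

    def fisher_poisson(theta, eps=1e-4):
        """I_kl = sum_i (df_i/dtheta_k)(df_i/dtheta_l)/f_i, derivatives by central differences."""
        derivs = []
        for k in range(len(theta)):
            dp, dm = np.array(theta, float), np.array(theta, float)
            dp[k] += eps
            dm[k] -= eps
            derivs.append((f(dp) - f(dm)).ravel() / (2 * eps))
        rates = f(theta).ravel()
        return np.array([[np.sum(dk * dl / rates) for dl in derivs] for dk in derivs])

    I = fisher_poisson((0.8, 0.2))
    print("Fisher information:\n", I)
    print("Cramer-Rao bound on det of the covariance matrix:", 1.0 / np.linalg.det(I))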

We now compute the covariance matrix associated with the network estimate, then compare that to the Cramér-Rao bound, Eq. (1). We consider networks that asymptote to a smooth M-dimensional attractor with the property that each point on the attractor is neutrally stable (that is, we exclude such behavior as limit cycles). These networks are estimators in the sense that, given the initial condition of the network, a, generated from the conditional probability distribution, P(a|θ), the network evolves in time and relaxes onto its M-dimensional attractor. The position on the attractor provides an estimate of the θ_k.

To see how this works in practice, we consider a set of network evolution equations of the form

o(t+1) = H(o(t)),     (3)

where o and H have components o_{i_1, ..., i_M} and H_{i_1, ..., i_M}, H is a nonlinear function of o, and we interpret o as a set of firing rates. This equation is initialized via the relation

o(0) = a.     (4)

The existence of the attractor implies that there is some smooth function, g(ɸ), satisfying

g(ɸ) = H(g(ɸ)),     (5)

where the ɸ are a set of M generalized angles. In the limit t→∞, o approaches the attractor; that is, o(t) → g(ɸ) for some set of angles ɸ. The point on the attractor, g(ɸ), is the network estimate of the set of presentation angles, θ. Note that the network we use in the main text can be cast into the form given in Eq. (3) by expressing u_ij in terms of o_ij via the first equation in the main text, and inserting the resulting expression into the second.
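To make Eqs. (3)-(5) concrete, the sketch below iterates a simplified one-dimensional instance of o(t+1) = H(o(t)): bell-shaped lateral pooling followed by squaring and divisive normalization, initialized with a noisy population response. The connectivity, constants and one-dimensional reduction are illustrative assumptions, not the parameters of the network in the main text.

    import numpy as np

    rng = np.random.default_rng(1)

    # A simplified 1-D instance of o(t+1) = H(o(t)): bell-shaped lateral pooling
    # followed by squaring and divisive normalization (illustrative parameters).
    n = 100
    pref = np.linspace(0, 2 * np.pi, n, endpoint=False)
    diff = np.angle(np.exp(1j * (pref[:, None] - pref[None, :])))
    W = np.exp(-(diff / 0.35) ** 2)

    def H(o, S=0.1, mu=0.002):
        u = W @ o
        return u ** 2 / (S + mu * np.sum(u ** 2))

    def hill_position(o):
        """Read out the position of the smooth hill (population vector of the final activity)."""
        return np.angle(np.sum(o * np.exp(1j * pref))) % (2 * np.pi)

    theta_true = 2.0
    f = 3.0 + 35.0 * np.exp(2.0 * (np.cos(pref - theta_true) - 1.0))   # assumed tuning curves
    o = rng.poisson(f).astype(float)            # o(0) = a: one noisy population response

    for _ in range(30):                          # relax onto the attractor
        o = H(o)

    print("true angle:", theta_true, " network estimate:", hill_position(o))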

Because the initial condition is generated probabilistically, the estimate will be different on each trial. As just discussed, the quality of the estimate is related to its covariance matrix. To compute that covariance matrix, we take a perturbative approach: we iterate a linearized version of the full nonlinear network, Eq. (3), which allows us to analytically determine the approximate final position on the attractor given the initial condition. A difficulty arises because, unlike point (0-dimensional) attractors, it is not obvious which point on the M-dimensional attractor to linearize around. It turns out that we may determine the appropriate point by going past linear order in our perturbation expansion. Let us for now perturb around an arbitrary point, say o = g(ɸ), and later determine ɸ. Letting

o(t) = g(ɸ) + δo(t),     (6)

through second order in δo(t), Eq. (3) becomes

δo(t+1) = J(ɸ)·δo(t) + (1/2) H″(ɸ) : δo(t) δo(t),     (7)

where J is the Jacobian evaluated on the attractor,

J_ij(ɸ) = ∂H_i(o)/∂o_j, evaluated at o = g(ɸ),

the ":" notation is shorthand for a sum on two indices,

[H″(ɸ) : δo δo]_i = Σ_{jk} [∂²H_i(o)/∂o_j ∂o_k] δo_j δo_k, with the derivatives evaluated at o = g(ɸ),

and the indices have multiple components, i ≡ i_1, ..., i_M. In Eq. (7), and in the remainder of the Appendix, we use standard dot-product notation; for example, the ith component of J·δo is Σ_j J_ij δo_j.

The expansion, Eq. (7), is valid as long as the higher-order nonlinearities are small compared to the quadratic term. In that case, Eq. (7) may be rewritten as

δo(t) = J^t·δo(0) + (1/2) Σ_{t′=0}^{t−1} J^{t−1−t′}·[H″ : δo(t′) δo(t′)].     (8)

Here the superscript t means multiply J by itself t times; it does not mean transpose. To avoid secularity, that is, the last term in Eq. (8) growing without bound as t goes to infinity, we require that

lim_{t→∞} J^t(ɸ)·δo(0) = 0.     (9)

Equation (9) is the condition that tells us what angle, ɸ, to linearize around. It also tells us that δo(∞) = 0, which in turn implies, using Eq. (6), that ɸ is the network estimate of the presentation angles, θ.

We can recast Eq. (9) into a form suitable for calculation by noting, via Eq. (5), that J(ɸ) has M eigenvectors with eigenvalue 1; those eigenvectors are v_k ≡ ∂g(ɸ)/∂ɸ_k, k = 1, ..., M. Since we are assuming that H admits an attractor, all the other eigenvalues of J must be less than 1 in magnitude. Thus, in the limit that t→∞, J^t takes on a very simple form,

lim_{t→∞} J^t(ɸ) = Σ_k v_k(ɸ) ṽ_k(ɸ),

where the ṽ_k are the adjoint eigenvectors of the Jacobian. Using the orthogonality condition ṽ_k·v_l = δ_kl (δ_kl is the Kronecker delta), Eq. (9) breaks into M equations, one for each k,

ṽ_k(ɸ)·[a − g(ɸ)] = 0,     (10)

where we used Eqs. (4) and (6) to express δo(0) in terms of a(θ) and g(ɸ). The value of ɸ that satisfies Eq. (10) corresponds to the point on the attractor that we linearized around, and also to the network estimate of θ. Letting ɸ = θ + δθ, which, term by term, means ɸ_k = θ_k + δθ_k, and expanding Eq. (10) to first order in δθ, we arrive at the set of equations

ṽ_k(θ)·[a − g(θ)] + Σ_l δθ_l ∂ṽ_k(θ)/∂θ_l·[a − g(θ)] − δθ_k = 0.     (11)
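Eq. (10) involves the eigenvalue-1 eigenvectors v_k of the Jacobian, their adjoints ṽ_k, and the limiting projector Σ_k v_k ṽ_k; this structure can be checked numerically. In the sketch below the Jacobian is built at random with the required spectrum and stands in for J(ɸ); everything here is purely illustrative.

    import numpy as np

    rng = np.random.default_rng(2)

    # Build a Jacobian with an M-dimensional eigenvalue-1 subspace, standing in for J(phi).
    n, M = 12, 2
    V = rng.standard_normal((n, M))                   # columns: right eigenvectors v_k
    Vt = rng.standard_normal((n, M))
    Vt = Vt @ np.linalg.inv(V.T @ Vt)                 # normalize so that v~_k . v_l = delta_kl

    P = V @ Vt.T                                      # candidate limit of J^t
    Q = rng.standard_normal((n, n))
    D = (np.eye(n) - P) @ Q @ (np.eye(n) - P)         # part acting off the neutral subspace
    D *= 0.7 / np.max(np.abs(np.linalg.eigvals(D)))   # give it spectral radius 0.7 (< 1)
    J = P + D

    print("J v_k = v_k:", np.allclose(J @ V, V))
    print("v~_k J = v~_k:", np.allclose(Vt.T @ J, Vt.T))
    print("v~_k . v_l = delta_kl:", np.allclose(Vt.T @ V, np.eye(M)))
    print("J^t -> sum_k v_k v~_k:", np.allclose(np.linalg.matrix_power(J, 200), P))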

To proceed we define the noise, N(θ), through the relationship

a = f(θ) + N(θ),     (12)

where

f(θ) ≡ ⟨a⟩

is the mean value of the neuronal response given the set of presentation angles, θ. Inserting Eq. (12) into (11) yields

ṽ_k(θ)·[f(θ) − g(θ)] + ṽ_k(θ)·N(θ) + Σ_l δθ_l ∂ṽ_k(θ)/∂θ_l·[f(θ) − g(θ) + N(θ)] − δθ_k = 0.     (13)

If the term ṽ_k(θ)·[f(θ) − g(θ)] does not vanish for all θ, then, for some θ, δθ will be nonzero in the limit that the noise goes to zero, and the network will produce biased estimates. Conversely, if it does vanish, then the network will be unbiased. We thus make the assumption that ṽ_k(θ)·[f(θ) − g(θ)] = 0 for all θ. If this condition is satisfied (and it is for the divisive normalization used in the main text), then Eq. (13) implies that, for small N, δθ is O(N). Thus, the term Σ_l δθ_l ∂ṽ_k(θ)/∂θ_l·N(θ) that appears in Eq. (13) is O(N²) and can be ignored. In addition, differentiating the condition ṽ_k·[f − g] = 0 with respect to θ_l shows that ∂ṽ_k(θ)/∂θ_l·[f(θ) − g(θ)] = −ṽ_k(θ)·∂[f(θ) − g(θ)]/∂θ_l; using this, together with ṽ_k·∂g/∂θ_l = δ_kl, we find that δθ is given by

δθ_k = Σ_l [ṽ_k(θ)·∂f(θ)/∂θ_l]⁻¹ ṽ_l(θ)·N(θ).     (14)

In this expression and in what follows, we are using a shorthand notation for the inverse of a matrix, [A_kl]⁻¹ ≡ [A⁻¹]_kl. Thus, [ṽ_k(θ)·∂f(θ)/∂θ_l]⁻¹ is the klth component of the inverse of the matrix whose klth component is ṽ_k(θ)·∂f(θ)/∂θ_l.

Using Eq. (14), it is now straightforward to compute the covariance matrix that determines the error in the estimate of the angles, and we find that

⟨δθ_k δθ_l⟩ = Σ_{mn} [ṽ_k·∂f/∂θ_m]⁻¹ [ṽ_m(θ)·R(θ)·ṽ_n(θ)] [ṽ_l·∂f/∂θ_n]⁻¹,

where R(θ) is the noise covariance matrix,

R(θ) ≡ ⟨N(θ) N(θ)⟩.

Because we now have two covariance matrices, R and ⟨δθ δθ⟩, we will consistently refer to R as the noise covariance matrix and ⟨δθ δθ⟩ simply as the covariance matrix.
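The covariance matrix above follows from Eq. (14) by averaging over the noise, and the algebra can be verified by direct sampling. In the sketch below the derivative matrix ∂f/∂θ, the adjoint vectors ṽ_k and the noise covariance R are arbitrary random choices, used only as illustrative assumptions rather than quantities from the model.

    import numpy as np

    rng = np.random.default_rng(3)

    n, M = 40, 2                                     # neurons, angles
    df = rng.standard_normal((n, M))                 # column l stands for df/dtheta_l
    vt = rng.standard_normal((n, M))                 # column k stands for v~_k
    A = vt.T @ df                                    # A_kl = v~_k . df/dtheta_l
    G = rng.standard_normal((n, n))
    R = G @ G.T / n                                  # noise covariance matrix

    # Sample the noise, apply Eq. (14), and compare the empirical covariance of
    # delta-theta with A^-1 (v~ . R . v~) A^-T.
    noise = rng.multivariate_normal(np.zeros(n), R, size=200000)
    dtheta = noise @ vt @ np.linalg.inv(A).T         # one row of delta-theta per sample
    predicted = np.linalg.inv(A) @ (vt.T @ R @ vt) @ np.linalg.inv(A).T
    print("empirical covariance:\n", np.cov(dtheta.T))
    print("predicted covariance:\n", predicted)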

As discussed above, decreasing the determinant of the covariance matrix increases the mutual information between the neuronal responses and the presentation angles. A straightforward but somewhat tedious calculation shows that the determinant is minimized when

ṽ_k(θ) ∝ R⁻¹(θ)·∂f(θ)/∂θ_k,     (15)

at which value of ṽ_k the covariance matrix simplifies to

⟨δθ_k δθ_l⟩ = [∂f(θ)/∂θ_k·R⁻¹(θ)·∂f(θ)/∂θ_l]⁻¹.

Thus, whenever Eq. (15) is satisfied, the nonlinear recurrent network given in Eq. (3) leads to a covariance matrix such that

det⟨δθ δθ⟩ = 1/det[∂f(θ)/∂θ_k·R⁻¹(θ)·∂f(θ)/∂θ_l].     (16)
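For a single angle (M = 1), the minimization behind Eq. (15) reduces to a Cauchy-Schwarz argument; a brief sketch in the notation of this appendix (with a prime denoting ∂/∂θ, and R positive definite) is:

    ⟨δθ²⟩ = (ṽ·R·ṽ)/(ṽ·f′)²,

    (ṽ·f′)² = (R^{1/2}ṽ · R^{−1/2}f′)² ≤ (ṽ·R·ṽ)(f′·R⁻¹·f′),

    so that ⟨δθ²⟩ ≥ 1/(f′·R⁻¹·f′), with equality exactly when ṽ ∝ R⁻¹·f′, which is Eq. (15) for M = 1.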

Equation (16) gives us the minimum determinant of the covariance matrix associated with the estimate produced by the network. To determine whether that minimum reaches the Cramér-Rao bound, we need to know the distribution of the noise; that is, we need to know the explicit form of P(a|θ). Let us consider two types of noise: Gaussian with an arbitrary correlation matrix, for which

P(a|θ) = exp{−(1/2) [a − f(θ)]·R⁻¹(θ)·[a − f(θ)]} / [(2π)^{N/2} [det R(θ)]^{1/2}],

where N ≡ Σ_m N_m is the total number of neurons, and Poisson with uncorrelated noise, for which

P(a|θ) = Π_i exp[−f_i(θ)] f_i(θ)^{a_i} / a_i!.     (17)

In the second expression, a and f now refer to the number of spikes in an interval rather than the firing rate. For the Poisson distribution the mean value of a is f, and the noise covariance matrix is given by

⟨(a_i − f_i)(a_j − f_j)⟩_Poisson = f_i δ_ij ≡ R_ij,

where the subscript "Poisson" indicates an average over the probability distribution given in Eq. (17). Note that we are using the symbol R for the noise covariance matrix of both the Gaussian and Poisson distributions; which distribution we mean should be clear from the context.

It is straightforward to show that the Fisher information, Eq. (2), for the two cases is given by

I_kl = ∂f/∂θ_k·R⁻¹·∂f/∂θ_l + (1/2) tr[R⁻¹·∂R/∂θ_k·R⁻¹·∂R/∂θ_l]     (Gaussian),     (18)

I_kl = Σ_i (1/f_i) ∂f_i/∂θ_k ∂f_i/∂θ_l = ∂f/∂θ_k·R⁻¹·∂f/∂θ_l     (Poisson),     (19)

where tr refers to the trace of a matrix. The trace term in Eq. (18) is a non-negative definite matrix with respect to the indices k and l, as is the first term on the right hand side of Eq. (18). Thus,

det I ≥ det[∂f/∂θ_k·R⁻¹·∂f/∂θ_l]     (Gaussian),     (20)

det I = det[∂f/∂θ_k·R⁻¹·∂f/∂θ_l]     (Poisson).     (21)

Equality is achieved in Eq. (20) only if the trace term vanishes, which happens only when R is independent of θ. Comparing Eq. (16) with Eqs. (20) and (21), we arrive at our final result (a numerical check of the Poisson case is sketched after the list below):

  1. For Gaussian noise with constant noise covariance matrix, R, the minimum determinant of the covariance matrix achieved by the network is equal to the Cramér-Rao bound.

  2. For Gaussian noise with a noise covariance matrix that depends on the presentation angle, the minimum determinant of the covariance matrix achieved by the network exceeds the Cramér-Rao bound. The difference may be calculated by comparing Eqs. (16) and (18).

  3. For uncorrelated Poisson noise, the minimum determinant of the covariance matrix achieved by the network is equal to the Cramér-Rao bound.
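As a cross-check of the Poisson case, the sketch below compares Eq. (19) with a Monte Carlo evaluation of the Fisher information, using the standard outer-product form I_kl = ⟨(∂ln P/∂θ_k)(∂ln P/∂θ_l)⟩ equivalent to Eq. (2). The tuning curves, an assumed one-dimensional Gaussian population with a second parameter controlling the gain, are illustrative only.

    import numpy as np

    rng = np.random.default_rng(4)

    n = 50
    pref = np.linspace(-2.0, 2.0, n)

    def f(theta):
        """Assumed mean spike counts for theta = (angle, log gain)."""
        angle, log_gain = theta
        return np.exp(log_gain) * (2.0 + 20.0 * np.exp(-(angle - pref) ** 2 / 0.5))

    def df(theta, eps=1e-5):
        """Central-difference derivatives df_i/dtheta_k, shape (n, 2)."""
        return np.stack([(f(theta + eps * np.eye(2)[k]) - f(theta - eps * np.eye(2)[k]))
                         / (2 * eps) for k in range(2)], axis=1)

    theta = np.array([0.3, 0.1])
    rates, D = f(theta), df(theta)

    # Eq. (19): I_kl = sum_i (df_i/dtheta_k)(df_i/dtheta_l) / f_i
    I_analytic = D.T @ (D / rates[:, None])

    # Eq. (2), Monte Carlo: the score is d ln P/dtheta_k = sum_i (a_i/f_i - 1) df_i/dtheta_k
    a = rng.poisson(rates, size=(100000, n))
    scores = (a / rates - 1.0) @ D
    I_mc = scores.T @ scores / len(scores)

    print("Eq. (19):\n", I_analytic)
    print("Monte Carlo Eq. (2):\n", I_mc)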

REFERENCES

  1. Brunel, N. & Nadal, J.-P. Mutual information, Fisher information and population coding. Neural Comput. 10, 1731–1757 (1998).

  2. Cover, T. M. & Thomas, J. A. Elements of Information Theory (Wiley, New York, 1991).



Cite this article

Deneve, S., Latham, P. & Pouget, A. Reading population codes: a neural implementation of ideal observers. Nat. Neurosci. 2, 740–745 (1999). https://doi.org/10.1038/11205
