A General Framework for Inferring Bayesian Ideal Observer Models from Psychophysical Data

Abstract A central question in neuroscience is how sensory inputs are transformed into percepts. At this point, it is clear that this process is strongly influenced by prior knowledge of the sensory environment. Bayesian ideal observer models provide a useful link between data and theory that can help researchers evaluate how prior knowledge is represented and integrated with incoming sensory information. However, the statistical prior employed by a Bayesian observer cannot be measured directly, and must instead be inferred from behavioral measurements. Here, we review the general problem of inferring priors from psychophysical data, and the simple solution that follows from assuming a prior that is a Gaussian probability distribution. As our understanding of sensory processing advances, however, there is an increasing need for methods to flexibly recover the shape of Bayesian priors that are not well approximated by elementary functions. To address this issue, we describe a novel approach that applies to arbitrary prior shapes, which we parameterize using mixtures of Gaussian distributions. After incorporating a simple approximation, this method produces an analytical solution for psychophysical quantities that can be numerically optimized to recover the shapes of Bayesian priors. This approach offers advantages in flexibility, while still providing an analytical framework for many scenarios. We provide a MATLAB toolbox implementing key computations described herein.


Introduction
Sensory systems must encode information about environmental stimuli in a way that supports successful behaviors. However, sensory measurements are often noisy and ambiguous, making this a demanding task.
For example, in the visual system, each retinal image is consistent with an infinite number of possible threedimensional scenes. In the auditory system, the vibration of the inner ear intermixes both the identity and elevation of sound sources. Prior knowledge about the environment can help resolve these ambiguities (Knill and Richards, 1996;Simoncelli and Olshausen, 2001). Thus, advances in understanding sensation and perception often rely on understanding how prior knowledge is represented in the nervous system and how this prior knowledge influences our percepts.
The influence of prior knowledge on perception is often characterized using psychophysical experiments that measure the bias and variability of perceptual reports (Hürlimann et al., 2002;Weiss et al., 2002;Adams et al., 2004;Girshick et al., 2011;Vacher et al., 2018). For example, measured biases can be compared with biases predicted by ideal observer models, which can also inform our understanding of how sensory information is represented within neuronal populations (Ganguli and Simoncelli, 2010;Stocker, 2015, 2017;Morais and Pillow, 2018). Bayesian ideal observer models specifically posit that observers optimally combine noisy sensory measurements with a probability distribution representing the relative frequency with which events occur in the world (called the prior distribution, or simply the prior). Bayesian models are popular across many domains, including sensation and perception, because they can successfully explain a wide range of phenomena (Weiss et al., 2002;Adams et al., 2004;Burge et al., 2010;Girshick et al., 2011;Kim and Burge, 2018). However, these models are often poorly constrained. Without constraints on the shape of the prior, Bayesian models can effectively explain any biases. Thus, a set of important questions arise: What is the shape of the prior the observer is using? Does this shape accurately reflect probabilities in the world? Does it change systematically with experience?
Bayesian priors are often assumed to take the form of a Gaussian distribution for computational efficiency (Mamassian and Landy, 1998;Weiss et al., 2002;Beierholm et al., 2009;Sotiropoulos et al., 2011;Saunders and Chen, 2015;Rokers et al., 2018). This assumption, however, limits the ability to ask questions about the shape of the prior because a Gaussian only has two parameters. In addition, analyses of natural scene statistics suggest that the probability distributions of environmental stimuli are generally non-Gaussian (Dong and Atick, 1995;Girshick et al., 2011;Sprague et al., 2015). In order to more flexibly model prior distributions, a previous study introduced an analytic approach based on piecewise approximations that leverages assumptions about the local shape of the prior relative to the magnitude of measurement noise (Stocker and Simoncelli, 2006). An alternative approach to increasing flexibility without introducing assumptions about prior shape is to use numeric methods that do not place constraints on the global parametric form or local properties of the prior (Girshick et al., 2011;Acerbi et al., 2014;Sprague et al., 2015). Numeric methods, while able to fit an arbitrary prior, are often slower and require hand-tuning of the numerical support and precision. Thus, while researchers have a varied toolkit for modeling the shapes of Bayesian priors, there is still a need to diversify our tools for using these models in perceptual research.
Our goal is to provide an overview of how Bayesian ideal observer models can be used in perceptual research, and to describe a computational approach that uses mixture of Gaussian models to flexibly and efficiently model the influence of priors on perception. First, we review the general mathematical principles that link a Bayesian ideal observer to psychophysical data. Then, we present the analytic solutions for psychophysical quantities assuming a simple Gaussian prior and Gaussian measurement noise. Next, we introduce a mixture of Gaussians model of priors that provides increased flexibility. Mixture of Gaussian priors have been employed in other contexts, such as computer vision and signal processing (Olshausen and Millman, 1999;Snoussi and Mohammad-Djafari, 2001), but are not commonly used in ideal observer models (but see related applications for modeling perceptual inferences by Acerbi et al., 2014;Orhan and Jacobs, 2014). Lastly, we introduce a new analytical approximation that increases the computational efficiency of the mixture of Gaussians model. This approximation offers improvements in efficiency for adaptive experimental methods (e.g., adaptive stimulus staircasing) as compared with fully numerical approaches. An accompanying MATLAB (MathWorks) toolkit provides implementations that can be used to simulate and fit psychophysical data.

Bayesian ideal observer models
In a Bayesian ideal observer model, the observer makes a noisy measurement m of a stimulus x and uses that measurement to generate an estimate of the stimulus in the worldx or to select an appropriate behavioral response r. We can represent this mapping of measurement onto response with the function r = T(·) where T is some estimation function. For example, in the context of a psychophysical experiment, T(·) may represent a point estimate of the presented stimulus (in which case r ¼ TðmÞ ¼x) or a binary judgment in a two-alternative forced-choice (2AFC) experiment (e.g., r ¼ Tðm 1 ; m 2 Þ ¼ "yes" when queried whether x 2 . x 1 ).
A Bayesian ideal observer selects the optimal response to a set of stimuli on the basis of the three components: The prior p( x) represents the observer's knowledge of the probability of encountering the stimulus based on previous experience. The likelihood pðmjxÞ, the probability of a measurement given the stimulus, captures the noisiness in the observer's measurement of the stimulus. The noisiness depends on both external factors (such as signal strength and presentation time) and internal factors (such as neuronal noise and attentional state).
To obtain the observer's belief about the current stimulus x given a measurement m we first use Bayes' rule to obtain the posterior distribution, pðxjmÞ, as follows: The posterior represents the probability distribution of a stimulus, given the current measurement, and can thus be used for drawing inferences. Here, p( m ) is the model evidence (or marginal likelihood) that serves to normalize the posterior. This calculation is represented graphically in Figure 1. Since p( m ) is a scalar value and does not affect the shape of the posterior, we can note that pðxjmÞ / p ðmjxÞpðxÞ.
This simple illustration, however, shows a likelihood based on only one example measurement. If we instead consider the full range of possible measurements, as shown in Figure 2, we can see how the resulting shape of the posterior varies. Figure 2A shows the prior as a function of x. By definition, the prior is independent of the measurement m, so it varies horizontally, but is constant along the vertical dimension. This two-dimensional (2D) format, similar to that used in (Girshick et al., 2011), helps illustrate the point that the posterior (Fig. 2C) arises from pointwise multiplying the prior ( Fig. 2A) and likelihood (Fig. 2B). Figure 2B illustrates the likelihood by plotting the probability of the observer making each measurement, conditioned on each possible stimulus value. This 2D distribution is generated by assuming that the measurement associated with each stimulus value is corrupted by additive Gaussian noise, but is unbiased. A vertical slice through B represents what we refer to as the measurement distribution pðmjx n Þ, which is the probability over measurement values m given a particular stimulus x n . A horizontal slice through B, on the other hand, represents the likelihood function p(m n , x), which is the probability of a given measurement m as a function of different stimulus values x. Thus, the 2D object pðmjxÞ may represent either the likelihood when it is conditioned on a specific measurement and considered as a function of the stimulus, or a measurement distribution when it is conditioned on a specific stimulus and considered as a function of measurements.
While the likelihood is the pertinent quantity for applying Bayes' rule, the measurement distribution is the relevant quantity when considering samples of the noisy sensory observation process. Note the measurement distribution is a true probability density function based on a noise model (here, we use additive Gaussian noise). The likelihood, on the other hand, is not generally a probability distribution because it does not necessarily integrate to one.
By multiplying each row of the prior and likelihood plots and normalizing, we obtain the set of possible posterior distributions pðxjmÞ for each possible measurement (Fig.  2C). Note that since the prior is non-Gaussian and steeper around the left flank of the peak, the posteriors are more concentrated around these values.
Finally, a loss function is needed to complete the model. The loss function L(x,r) refers to the penalty of making a response r when the true stimulus was x. An optimal decision rule is one where the observer will minimize the loss on average over the course of a set of responses. To calculate the expected loss of a particular response, we can find the expected loss under the posterior: A decision rule is Bayes optimal under a particular loss function if it minimizes the expected loss for all measurements. That is, T*(·) is Bayes optimal if for all estimation functions T(m) and all measurement values m: E½Lðx; T p ðmÞÞ E½Lðx; TðmÞÞ: (3) Note that here we show T(·) as a function of a single m, but it may also take multiple measurements into account, as in a two-alternative forced-choice paradigm (2AFC). In the following sections, we will discuss this loss in more concrete terms in the context of point estimation and 2AFC tasks.
While the derivations outlined in this paper do not assume any particular stimulus, they do assume that the measurements are unbiased, and that the measurement noise is additive and Gaussian distributed. In this case, the likelihood always takes the form of a Gaussian. Under these assumptions, the mean of the likelihood varies with and is equal to the measurement. We also assume that the width of the likelihood (i.e., the amount of noise) does not inherently vary with the measurement. However, the Weber-Fechner law across many stimuli suggests that this assumption does not hold if stimulus values are represented in many common sense units (e.g., candelas per square meter for luminance, visual degrees per second for speed), because sensory thresholds in these units increase systematically as stimulus values increase (Hecht, 1924;McKee et al., 1986;Pardo-Vazquez et al., 2019). Thus, a transformation of the stimulus values from physical space to "sensory space" may often be necessary to satisfy this assumption (Stocker and Simoncelli, 2006;Kwon et al., 2015). Indeed, the Weber-Fechner law suggests that the width of the likelihood or measurement distribution is approximately constant in logarithmic units across many stimulus domains (although deviations have been noted). For example, if one were to model visual speed perception, the measurement distribution and prior could be represented in terms of pðlogðmÞjlogðxÞÞ and p(x), respectively. Throughout this document, we will represent the likelihoods as Gaussians even if a transformation is necessary, to keep the estimation of the prior computationally tractable. For reference, Table 1 provides a summary of notation used for each of the ideal observer parameters.

Modeling psychophysical data from an observer with a Gaussian prior
We begin with a simple case in which the prior takes the form of a Gaussian distribution. If this condition is met, the posterior has an analytic solution and is also Gaussian. This property follows from the general rule defining the product of any two Gaussians. Specifically, if we denote a Gaussian distribution generally as N ða; b 2 Þ with mean a and standard deviation b, we can write the prior as N ð; g 2 Þ (see Table 1). We define the likelihood as a Gaussian function with its mean equal to the measurement value m and a SD of s : N ðm; s 2 Þ. We can then write the posterior as: Figure 2. The canonical Bayesian computation as in Figure 1 but expanded to a set of likelihood functions. The prior (A) is multiplied by the likelihood defined by a given measurement (B, shown for m 1 and m 2 ) to obtain the posterior (C). Note that the shape of the posteriors change for different likelihoods since the prior is non-Gaussian, but the posteriors are overall drawn to the largest probability region of the prior. In each panel, the heat map values represent probability with higher intensity mapping to higher probability. Identity lines are indicated with dashed black lines. Toolkit script: Fig2_2DBayesianDemo.m. where the normalizing constant r , which relates the posterior to the product of prior and likelihood, is given by:

C B A
and the posterior variance and mean are given by: Selecting a sensory estimate from the posterior To start linking this framework to psychophysical data, we first consider an experiment in which we want to fit a Bayesian ideal observer model to a set of data in which participants reported point estimates of the presented stimuli (e.g., through method of adjustment such that x9 ¼ r is a possible estimate response when x is the true value). To convert the posterior into an optimal estimate, we can assert a loss function for our Bayesian ideal observer. In the general form, this loss function will determine the Bayes estimate that minimizes the expected error defined in Two commonly used loss functions are the zero-one loss (where the loss is 0 when ðx À x9Þ ¼ 0, and 1 for all other values), and squared error loss ( Lðx; x9Þ ¼ ðx À x9Þ 2 ). Using zero-one loss, we obtain a Bayes optimal estimatex that is the mode of the posterior, the maximum a posteriori (MAP) estimate:x For an ideal observer that uses a squared error loss function, the Bayesian least squares (BLS) estimate is the mean of the posterior:x BLS ¼ E½xjm: When the posterior is Gaussian, the MAP and BLS estimates are equivalent and equal to m post (Eq. 7), which can be simplified to:x BLS ¼x MAP ¼ am 1: Here, we have simplified the equation for m post such that a is a shrinkage factor that determines how biased the posterior is toward the prior mean: and offsets the posterior when the prior is not zerocentered: With these simplifications we can rewrite the posterior as: pðxjmÞ ¼ N ða m 1; as 2 Þ: Whenx BLS ¼x MAP , we simply adoptx to denote the estimate. The solution forx here can also be considered as a weighted average of the prior and likelihood means, where the weights are inversely proportional to the variance of the prior and likelihoods (Landy et al., 1995). To make that link explicit, we can represent Equation 11 aŝ x ¼ am1ð1 À aÞ, since is equal to (1 -a). Note that when the posterior is not Gaussian, the MAP and BLS estimates are not necessarily equivalent.

Distribution of sensory estimates
While the ideal observer model outlined in this paper is defined from the perspective of the observer, we will briefly shift our perspective to that of an experimenter to demonstrate how the model can be used in practice. In a task in which the observer is making repeated point estimates of the stimulus (e.g., judging its visual brightness, auditory volume, or speed), the mean of the measurement distribution on each trial will be equal to the true value of the stimulus, x, and we can define TðmÞ ¼x ¼ am1 as the function by which the ideal observer converts noisy measurements into a response on each trial. While this is a deterministic function, the value will vary from one trial to another because of variability in the measurement m. The responses thus form an estimate distribution pðxjxÞ, the probability distribution of estimates, given a particular stimulus (Fig. 3).
If we want to infer the underlying ideal observer parameters from a set of real behavioral data, we can fit a set of empirically measured observer estimates to this estimate distribution. To do so, we define an analytic form of this estimate distribution pðxjxÞ with a substitution of variables in which we substitute T À1 ðxÞ for m in the measurement distribution pðmjxÞ ¼ N ðx; s 2 Þ. First, we solve for T À1 ðxÞ and the first derivative of this function with respect tox: and then perform the substitution of variables: which we can denote simply as: Research Article: Methods/New Tools While we could also derive the estimate distribution more simply using the identity for the affine transformation of Gaussian random variables, we use a substitution of variables here to draw a parallel to the mixture of Gaussians case in the next section. Note that the form of the estimate distribution is similar to the posterior distribution associated with a single measurement (Eq. 14) with two key differences: the mean of the estimate distribution is dependent on the stimulus x instead of any specific noisy measurement, and the variance is equal to the variance of the likelihood scaled by a 2 instead of a.
This distribution of observer estimates, given the stimulus, provides the likelihood function for fitting the Bayesian ideal observer model to data by performing maximum likelihood estimation (MLE; not to be confused with the likelihood of a Bayesian observer). Specifically, it is a likelihood when considered as a function of the model parameters u ¼ f; g ; s g. Given a set of paired stimuli and observer reports fðx t ;x t Þg N t ¼ 1 from a set of conditionally independent trials t = 1,...,N, the model likelihood is given by: In practice, we optimize u by minimizing the negative log-likelihood, which is obtained by taking the negative log of this expression: Two-alternative forced-choice task Experimenters often avoid having research participants report point estimates of stimuli because the origin of the noise in the measurement is ambiguous. For example, responses that incorporate a motor component may be contaminated by motor noise in addition to sensory noise. To avoid this issue, participants can make a categorical judgment about stimuli in perceptual space that can be related back to physical qualities of the stimulus. One such paradigm is a two-alternative forced-choice (2AFC) task in which participants view two stimuli either sequentially or concurrently and must select which of the two best fits the instructions they are given. In a speed judgment task, for example, the instruction might be: "indicate which of the two stimuli appeared to move faster". Often, this task is repeated for a range of stimulus values, such as stimulus speed, to build up a psychometric function. This function, for example, might describe the probability that a test stimulus is perceived as moving faster than a fixed reference stimulus, as a function of the test stimulus speed. Importantly, the two stimuli should differ in reliability to estimate the best fitting parameters for both the likelihood and the prior.
If we consider two stimuli x 1 and x 2 , on each trial, the observer makes two noise-corrupted measurements, which we model with two measurement distributions pðm 1 jx 1 Þ and pðm 2 jx 2 Þ or a single joint distribution pðm 1 ; m 2 jx 1 ; x 2 Þ (see Fig. 4 for examples). The ideal observer selects an optimal response r based on a decision function that takes both measurements as input (Tðm 1 ; m 2 Þ). Here, we assume this function indicates whether or not stimulus x 2 best satisfies the instructions given the measurements (e.g., in our speed judgment example, was x 2 faster than x 1 ). This is defined by the following decision rule: where pðx 2 . x 1 j m 1 ; m 2 Þ is determined for each pair ðm 1 ; m 2 Þ by: Figure 3. The distribution of sensory estimates arises from the variability in the measurement values about the expected value across trials (EV; i.e., the true stimulus value). A, On a given trial, a likelihood is defined around the observed measurement. Here, we plot the expected value of this likelihood for a given true stimulus, as well as other possible likelihoods that occur on a set of trials. The prior is shown for reference. The upward arrow indicates the true stimulus that used to generate the likelihood. B, The resulting posteriors for each trial are shown, along with downward arrows indicating the estimates (fxg) derived from these posteriors. C, Over many trials, these estimates (now indicated as upward arrows) create an estimate distribution, which can be predicted for a Bayesian ideal observer with a given prior and amount of sensory noise. When the prior is Gaussian, there is a closed form expression for this distribution. Toolkit script: Fig3_EstimateDistribution.m.

C B A
Since we model the likelihoods as independent and the posteriors are both Gaussian (at this point in the derivations), we can more succinctly say this occurs whenever the estimatex 2 ¼ a 2 m 2 1 2 is greater thanx 1 ¼ a 1 m 1 1 1 , which we can express using the decision rule: Because this is now a classification task, we adopt the loss function: where 1(·) denotes an indicator function that evaluates to 1 when the input is true. For simplicity, we will represent the first case in Equation 23 as "yes" and the second case as "no". Graphically, this equation is represented in Figure 4 as a white decision boundary in panels A-C for three different combinations of noise levels for m 1 and m 2 . The slope of this line is determined by: m 2 ¼ a 1 a 2 m 1 1 1 À 2 a 2 . If we want to solve for the probability of responding "yes" for a given x 2 and x 1 over repeated trials (i.e., a point on the psychometric curve), we can obtain a numerical solution by integrating the joint distribution above the decision boundary: The results of this integration for Figure 4A-C are shown in Figure 4D, along with the full psychometric curves.
However, Bayesian ideal observer models with Gaussian posteriors also allow for an equivalent analytical alternative to this calculation. Specifically, we can obtain an analytic solution for points on the psychometric curve via an alternative model of the Bayesian observer in which the observer computes the MAP estimate for each stimulus and then compares which of the two is larger. This method has been used previously (Stocker and Simoncelli, 2006) and is equivalent to the optimal computation in Equation 25 when the prior and likelihoods are both Gaussian. Sincex MAP ¼x BLS , this solution works for both estimators. The probability that a given estimate of x 2 (x 2 ) is greater than the estimate x 1 (x 1 ) can be obtained by integrating over the estimate distributions for the two stimuli in what is essentially a signal detection problem (Green and Swets, 1966): Equivalently, pð"yes"jx 1 ; x 2 Þ can be expressed as the integral over positive values ofx 2 Àx 1 in the probability distribution pðx 2 Àx 1 jx 1 ; x 2 Þ. This has an analytic solution since the difference of two Gaussian random variables is itself a Gaussian. For a Gaussian prior, the estimate distributions are indeed Gaussian (see Eq. 18) so this difference pðx 2 Àx 1 jx 1 ; x 2 Þ is defined as: From this equation, pð"yes"jx 1 ; x 2 Þ can be attained simply by integrating over positive values of this difference: pð"yes"jx 1 ; x 2 Þ ¼ ð 1 0 N ða 2 x 2 À a 1 x 1 ; a 2 2 s 2 2 1 a 2 1 s 2 1 Þ: To simplify the calculation of this integral, we can convert the difference distribution to a standard normal f (·) by subtracting the mean and scaling all values by the D C B A Figure 4. Graphical illustration of computing the observer's psychometric curve for a 2AFC task. A, Computing a single point on the psychometric curve when x 1 ¼ x 2 ¼ 3 for measurement noise variances s 2 1 ¼ 0:75; s 2 2 ¼ 0:5 and a prior with v = 0 and g = 1.5. Dashed line (top) shows the measurement distribution pðm 1 jx 1 Þ and solid line (right) shows measurements distribution pðmjx 2 Þ. The 2D grayscale image shows the joint distribution of observer measurements given the stimuli x 1 and x 2 , formed by the product of the two measurement distributions along the top and right. The white diagonal line is the observer's decision boundary, corresponding to measurement values for which the inferred speeds are equal. The probability that the observer reports "yes" (i.e., that x 2 exceeded x 1 ) is the area above the decision boundary (point "A" in panel D). B, Same as panel A but with equal noise variances s 2 m1 ¼ s 2 m2 ¼ 0:64. C, Same as panel A but with noise variances s 2 m1 ¼ 0:5; s 2 m2 ¼ 0:75. D, Full psychometric curves for the noise variances used in panels A-C, showing the probability that the observer reports "yes" as a function of the stimulus x 2 . The points labeled A, B, C represent the sum of the probability above the diagonal in panels A-C.
inverse of the SD. The location on the standard normal curve that corresponds to the lower bound on the integral in Equation 28 is then equal to the original mean divided by the SD. This is useful because it allows us to integrate the standard normal above this (standardized) mean to find pð"yes"jx 1 ; x 2 Þ for a given x 2 . That is, instead of integrating the original normal from zero to infinity, we now integrate the standard normal up to the standardized mean. Lastly, we take advantage of the fact that the standard normal is symmetric about its mean to write the equation as follows: where U(·) is the cumulative standard normal: and the symmetry about the mean of f indicates that Ð 1 ÀK f ðtÞ ¼ Ð K À1 f ðtÞ ¼ UðKÞ for all values of K. We can again take the perspective of the experimenter to demonstrate how to fit the ideal observer model to 2AFC data. This analytic solution is an efficient way to estimate the underlying parameters of the Bayesian ideal observer model given a dataset fx 1;t ; x 2;t ; T t g N t¼1 , where T t is the participant's response to stimulus pair x 1;t ; x 2;t on trial t. As in the point estimate case, we can solve for the best fitting parameters u ¼ fv; g ; s 1 ; s 2 g with MLE in which we minimize the following negative log-likelihood function: T t log ½ pð"yes"jx 1;t ; x 2;t Þ 1 ð1-T t Þlog½1-pð"yes"jx 1;t ; x 2;t Þ:

Summary
Up to this point, we have described how to determine the posterior, the individual sensory estimates, the sensory estimate distribution, and the results of a 2AFC task for a Bayesian ideal observer with a Gaussian prior and likelihood. In the next section, we will generalize this framework by deriving the same quantities for an observer with a prior that can be modelled more flexibly as a mixture of Gaussian components.
Modeling psychophysical data for an observer with a mixture of Gaussians prior While the approach outlined in the previous section is computationally efficient, it assumes that the observer's prior is well fit by a single Gaussian. This is unlikely to be the case assuming that the prior reflects knowledge of natural scene statistics, since many physical quantities have much heavier tails than a Gaussian (Dong and Atick, 1995;Sprague et al., 2015) or are even multimodal (Girshick et al., 2011;Kim and Burge, 2018). Accurately modeling these shapes is important. For example, long-tailed priors would predict that biases are reduced for stimulus values that fall within the the flatter regions of the stimulus probability distribution than in the more peaked regions. In this section, we propose an approach based on a mixture of Gaussians that retains some of the efficiency of the single Gaussian prior while better approximating realistic priors. Table 2 lists a summary of the additional notation adopted for this section.
Consider an observer with a prior defined by a mixture of C Gaussian components: where w i ! 0 is the weight of the ith component, with P w i ¼ 1, and i and g 2 i are the mean and variance of the ith Gaussian component, respectively (Fig. 5A, red lines). If we assume a Gaussian likelihood with variance s 2 , the posterior is also a mixture of Gaussians (Fig. 5B, blue lines): where a i and i are the shrinkage factor and mean of the ith posterior component, respectively: This is the mixture of Gaussians version of the posterior given in Equation 14. Here,w i ðmÞ is a set of adjusted weights that combine the weights w i of the individual components of the prior, the scale factors r i (m) for each of the components of the posterior (analogous to Eq. 5), and a normalization step to ensure the weights all sum to 1. To determinew i ðmÞ, we can first define each r i (m) as: and by substituting for i and a i with Equations 35 and 34, respectively, then simplifying, we obtain: Note that r i (m) is inversely related to the difference between the measurement m and the prior component mean i . Therefore, the posterior shape will change relative to the likelihood, not just shift as in the single Gaussian prior case. That is, as the measurement changes, the relative weight of each component changes. We can combine the scaling effects of w i and r i (m) to define: which is then normalized by the sum of all v i to obtain the set of adjusted weightsw i ðmÞ: In the following sections, we will first demonstrate how to fit the mixture of Gaussians prior to point estimation and 2AFC data using numerical evaluation of the log-likelihood. We then derive an analytical approximation that can reduce the computational load necessary to estimate the observer parameters.
Selecting a sensory estimate from the posterior As before, let us first consider the case where we want to estimate a participant's prior from a set of point estimates from an experimental dataset. We can use the posterior derived in Equation 33 and an appropriate loss function to define an optimal estimatex. For the mixture of Gaussians posterior, the MAP and BLS estimates differ. Here, we will consider onlyx BLS , since this estimate has an analytical solution in the mean of the posterior: Without an analytical solutionx MAP can be determined numerically and used instead in the numerical approaches described below. Note that if the posterior is multimodal, the BLS estimate may fall on a relatively unlikely value (since it is between the two modes of the posterior), and the MAP estimate may be unstable (since it may oscillate between the two modes depending on the measurement noise on a given stimulus presentation).

Distribution of sensory estimates
We can use Equation 40 to define TðmÞ ¼x BLS for the point estimation task. Unlike in the single Gaussian case, however, there is no clear analytic form for T À1 ðx BLS Þ with arbitrary mixture of Gaussians priors sincew i is a function of m. To demonstrate this, consider a simplified form where all i ¼ 0 and it is clear that there is no way to solve for T À1 ðx BLS Þ: Instead, we can numerically estimate T À1 ðx BLS Þ by first calculating TðmÞ ¼x BLS over a grid of points fx BLS ; mg to create a look-up table to find {m} from fxg for a given set of Bayesian ideal observer parameters u ¼ fw i ; i ; g i ; s g. With a goal of estimating an observer's prior from a set of N point estimates (all with the same sensory noise level, s ), we can then evaluate the likelihood of the data given the putative model parameters, u , using Equation 17: Note that we have abbreviatedx BLS tox for simplicity here. This process is then repeated for other parameter sets until we find an optimal solution that maximizes the likelihood of the data (or minimizes the negative log-likelihood). That is, finding u that minimizes the following: The toolkit includes a function for this numerical approach (fitEstimData_numerical.m), which we will also return to in Results. This process can be computationally expensive, however, if we are trying to fit an observer's prior with many Gaussian components (each of which is defined by three parameters w, , g ). While this may be acceptable for lower numbers of components and datasets that have already been collected, this is more problematic if the mixture of Gaussians model is used during the course of an experiment to guide an adaptive staircase.
To make the log-likelihood equation more tractable to solve, we can derive an approximate analytical solution for the point estimate distribution if we approximate Equation 40 using just the expected value of the measurement EðmÞ ¼ x when calculatingw i : This approximation allows us to solve for m in Equation 41: We can then derive an analytic solution to T À1 ðx BLS Þ and its first derivative with respect tox BLS : and in turn use the substitution of variables to derive an (approximate) analytic solution in the form of a Gaussian: This approximates the true estimate distribution with a Gaussian with a mean R C i¼1w i ðxÞða i x 1 i Þ and variance s 2 ðR C i¼1w i ðxÞa i Þ 2 . Maximum likelihood estimation can then be used as described previously to find the model parameters that best explain an empirically measured estimate distribution. In Results, we analyze the regimes in which this is a good approximation.

Two-alternative forced-choice task
As with the point estimate distributions, we will again describe a numerical and approximate analytical approach for handling data from a 2AFC task.
To numerically estimate the ideal observer's prior from a set of experimental 2AFC data using a mixture of Gaussians prior, we can again use the general form of the log-likelihood defined in Equation 31. Here, pð"yes"jx 1;t ; x 2;t Þ is defined with the general solution in Equation 25, and the decision rule Tðm 1 ; m 2 Þ follows the definition in Equation 21. Since the estimate distributions are not guaranteed to be Gaussian, there is no simple analytical solution like there was in the single Gaussian prior model. Thus, these equations must be evaluated numerically by calculating pðx 2 .x 1 jm 1 ; m 2 Þ for each measurement pair on the 2D support to define Tðm 1 ; m 2 Þ, as illustrated previously in Figure 4. Once the boundary defined by this decision rule is found, we can simply integrate the joint distribution pðm 1 ; m 2 jx 1 ; x 2 Þ above this boundary to determine pð"yes"jx 1;i ; x 2;i Þ and evaluate the model likelihood. This process is again outlined graphically in Figure 6, with the white line now denoting an example decision boundary for an observer with a mixture of Gaussians priors.
Compared with the single Gaussian case, the mixture of Gaussians decision boundary can be nonlinear for a few reasons. One reason is the dependence of each adjusted weightw i on m: the weight of the shrinkage factor for each prior component decreases with a greater difference between the component mean and the likelihood mean. As a result, the perceptual bias that the prior exerts is different at different points along the stimulus domain. Nonlinear decision boundaries can also emerge when the prior is bimodal, with measurements biased in different directions depending on which mode is closest. A function for numerically evaluating pð"yes"jx 1;i ; x 2;i Þ is included with the toolkit (calcMoGPFxn_Numeric.m).
As noted for the point estimation case with a mixture of Gaussians prior, this numerical calculation can be computationally expensive. We can, however, leverage the approximate analytical expression for the estimate distribution to define an approximate expression for the categorical data collected in a 2AFC experiment. The reason this is possible is that with this approximation, the two-point estimate distributions are Gaussian. Using the Bayesian least squares estimatê x BLS defined in Equation 40, we can generalize the decision rule Tðm 1 ; m 2 Þ in Equation 23 to an observer with a Mixture of Gaussians prior: Note that we index the modified weights and means differently for the two stimuli (i for x 1 and j for x 2 ) since these parameters of the posterior components are defined by both the prior components and the likelihood parameters, which differ whenever x 1 is different from x 2 . As before, we can derive an analytical (although approximate) solution to the psychometric function for the mixture of Gaussians approach using Equation 29, with the exception of substituting in the approximate estimate distribution pðx BLS jxÞ from Equation 49:

Code accessibility
The code is included as Extended Data 1 and is available at https://github.com/tsmanning/bayesIdealObserverMoG.

Results
In this section, we will demonstrate that there are a number of ways to maintain the flexibility of the mixture of Gaussians approach while reducing the total number of parameters describing the prior, and then show that this approach can be used to fit leptokurtotic and bimodal distributions. Lastly, we show that the approximate 2AFC solution remains close to the numerical solution for a range of model parameters constrained to realistic values. Although we do not go into detail here about how to generate synthetic estimate or 2AFC data using a Bayesian ideal observer framework, we include some example code in the toolkit about how one might benchmark implementations of an observer model with a mixture of Gaussians prior interactiveNumTrialsVSaccuracy.m.

Prior estimation error using mixture of Gaussians model with point estimation data
Theoretically, a mixture of Gaussians could fit an infinite number of prior shapes given enough Gaussian components in the model. But the number of model parameters increases by three for each additional component, potentially requiring large amounts of data to obtain reliable fits. Further, unrestricted models will likely be nonconvex with multiple local optima. These characteristics extend the number of iterations needed to find the global optimum of the log-likelihood objective functions at best and make it unlikely or impossible to find the global optimum at worst. In practice, unrestricted forms of the mixture of Gaussians model will likely need multiple optimization runs with different starting parameters to reliably minimize the log-likelihood functions. There are a few ways to maintain the flexibility of the mixture of Gaussian approach while reining in the number of parameters in the model.
In sensory subdomains where there is evidence that the probability of some stimulus values monotonically decreases with stimulus magnitude, such as the spectral content of retinal images (Field, 1987;Dong and Atick, 1995), we can reduce the number of parameters by a third in our ideal observer model by fixing all component means at zero. This allows us to model long-tailed distributions as can be seen in Figure 7A, and in fact, any distribution that is a member of the exponential power family with a peak at zero and power 1 p 2 can be approximated with enough components (West, 1987).
If there is not sufficient evidence that the true distribution of stimulus power in the environment is either D C B A Figure 6. An extension of Figure 4 to a long-tailed prior defined by a mixture of Gaussians (g 1 ¼ 2; g 2 ¼ 0:6 1 ¼ 2 ¼ 0; w 1 ¼ w 2 ¼ 0:5), similar in appearance to the prior in Figure 7A. Here, the decision boundary representing Tðm 1 ; m 2 Þ is nonlinear because the different components of the prior have different levels of influence on the percept as m varies. A, As in Figure 4, the 2D grayscale image shows the joint distribution of the observer measurements given the stimuli x 1 and x 2 , formed by the product of the two measurement distributions along the top and right. The white line is the observer's decision boundary. Here, x 1 = x 2 = 3 for measurement noise variances s 2 1 = 0.75, s 2 1 = 0.5. B, Same as panel A, but with equal noise variances s 2 m1 = s 2 m2 = 0.64. C, Same as panel A, but with measurement noise variances s 2 1 = 0.75, s 2 1 = 0.5. Toolkit script: Fig6_MoGGauss_graphicalDemo.m. D, Full psychometric curves for the noise variances used in panels A-C, showing the probability that the observer reports "yes" as a function of the stimulus X 2 . The points labeled A, B, C represent the sum of the probability above the diagonal in panels A-C.
symmetric or zero-peaked, one can take an alternative approach of tiling the components (Fig. 7B). Here, one defines a fixed number of components, their means, and their SDs and fits only the weights of the tiled components to the data. In this way, the mixture of Gaussians can approximate a prior with a peak at an arbitrary location, skewness, and kurtosis. This approach has been used previously with large numbers of components to approximate a "nonparametric" reconstruction of a complicated prior (Acerbi et al., 2014).
Here, we demonstrate proof of principle for both approaches by generating a synthetic dataset of 1000 point estimates using a zero-centered, non-Gaussian prior and a bimodal prior, and then recovering estimates of these priors using the mixture of Gaussians ideal observer model and the constraints illustrated in Figure 7.
We first defined a long-tailed prior using a Cauchy distribution pðxÞ ¼ 1=p ð1 1 x 2 Þ. We generated individual point estimates by numerically calculating the posteriors for a range of different measurement values as seen at the top right in Figure 8A and calculatingx BLS for each measurement. We used this matched set of measurements and Bayes estimates as a look up table, and generated the synthetic dataset of 1000 trials by randomly selecting a stimulus value, adding Gaussian noise to obtain a measurement, and then selecting a matched estimate by interpolating between the previously calculated estimate values. From these values, we estimated the Cauchy prior using a restricted form of the mixture of Gaussians ideal observer model in which we defined six Gaussian components with a set of fixed g i on the range ½2 À2 ; 2 3 and all component means i fixed at zero. Thus, the only observer parameters free to vary were the component weights w i , and the measurement noise level s which was constant for all simulated stimuli (that is, we are assuming the stimulus properties that may affect this measurement noise are held constant throughout the experiment). The best fitting parameters u ¼ fw i ; s g were obtained through numerical optimization by numerically estimating T À1 ðxÞ to obtain a set of {m i } from the dataset of fxg and then minimizing the negative log-likelihood, which is the sum over the individual negative log likelihoods (see Eq. 43). The correspondence between the true prior and the inferred one are shown at the in Figure 8A, as well as the correspondence between the true BLS estimates and the ones inferred through MLE. In general, the mixture of Gaussians model closely matches the true prior although each of the basis function components on their own are less kurtotic than their sum.
We then repeated this process using a bimodal prior defined by the normalized sum of two Gaussians p 1 ðxÞ ¼ B A N ð 1 ¼ À2; g 1 ¼ 1Þ and p 2 ðxÞ ¼ N ð 2 ¼ 2; g 2 ¼ 1Þ, and this time using the tiling constraint variation mixture of Gaussians model previewed in Figure 8B. Once again, the prior inferred from the data closely corresponds to the true prior, although this correspondence will change depending on the exact spacing and width of the basis functions (Fig. 8B).
Error in mixture of Gaussians analytical approximation with 2AFC data We will next examine how close the approximate analytical solution is to the numerical solution within a range of observer parameters that matches the biases and sensitivities seen in real human data.

Human bias
To get a sense of what a realistic range of biases is in the literature, we consider empircally measured perceptual biases for linear (i.e., noncircular, nonspherical) stimulus domains like speed and distance. For example, in Stone and Thompson (1992), participants performed a 2AFC speed judgment task in which they selected which of two contrast and speed-varying stimuli appeared to move faster. Depending on the contrast ratio between the two stimuli, biases in speed judgments ranged from ;0.55 to 1.55 times the veridical speed. Similar results were found in later studies that developed Bayesian ideal observer models to explain these biases (Weiss et al., 2002;Stocker and Simoncelli, 2006). An analysis of speed judgments for contrastvarying stimuli in 2D and 3D (Cooper et al., 2016) found a bias of up to ;1.75 times veridical. In a disparity judgment task, Burge and colleagues reported bias of ;1.15 (Burge et al., 2010). Thus, we will ensure that the simulated observer parameter models will at least reach these levels in our error analysis. The relationship between bias and observer parameters is straight-forward for a single Gaussian prior and Gaussian likelihood. It is simply the fraction of the shrinkage factors a 1 =a 2 for the two stimuli, where the observer is unbiased when the fraction equals one. Referring back to Equation 12, this means we need to select the stimulus likelihood widths s 1 ; s 2 and prior width g to ensure that the upper and lower bounds of This means we can produce human-like biases as long as we select observer parameters such that also falls within this range.

Sensitivity
The slope of the psychometric curve at the point of subjective equality (PSE; i.e., the value of x 2 where pð"yes"jx 1 ; x 2 Þ ¼ 0:5) is commonly used as a scalar metric to describe observer sensitivity when performing a 2AFC task. The slope has an analytical solution 1=ðs diff ffiffiffiffiffiffi ffi 2p p Þ when the psychometric curve is a cumulative normal distribution, which occurs when distribution of differences between estimates fx 1 ;x 2 g is a Gaussian N ðm diff ; s diff Þ. This is the case for the single Gaussian prior and the analytical approximate solution for a mixture of Gaussians prior (see Eqs. 29 and 51), but not necessarily for the full, numerically evaluated mixture of Gaussians prior. In psychophysical data, this slope could reasonably range from near-infinite when the task is very easy to zero when the task is impossible to solve and the observer is guessing for all stimulus parameters. Therefore, we will define the range of observer parameters to cover a large range of slopes.
Although there is an infinite range of possible prior configurations to test, we will restrict ourselves here to two useful situations not well fit by a single Gaussian: (1) a prior with only zero-mean components creating a leptokurtotic unimodal distribution and (2) a bimodal prior.

Example 1: leptokurtotic unimodal prior
First, we randomly selected a set of stimulus and observer configurations 5000 times (Fig. 9A, top). Likelihood means were selected from a uniform distribution ranging from [-1, 1] and SDs fs 1 ; s 2 g were selected from a uniform distribution in the range of 0 , s 1. The prior was restricted to two components, both zero-centered, which were both constrained to be broader than the likelihoods. Specifically, g 1 was fixed at 1:1maxðs 1 ; s 2 Þ and g 2 was randomly selected from a uniform distribution on the range ½1:25maxðs 1 ; s 2 Þ; 3:25maxðs 1 ; s 2 Þ). The weights on these components were chosen randomly from a uniform distribution and normalized such that they summed to one. Fixing g 1 to be only slightly larger than the largest s ensured that the priors were non-Gaussian and long-tailed, and that the priors produced psychometric functions with a range of biases covering the targeted range (actual biases ranged from 0.55 to 1.83).
Next, we determined the difference between the psychometric curve resulting from the analytical and numerical approaches described in the previous section (Fig. 9A, bottom). We compare the approximate solution (Eq. 50) to a numerical evaluation (Eq. 25) for the observer with a mixture of Gaussian prior (blue and black lines). We also compare the best fit single Gaussian to the mixture of Gaussians prior (yellow line). In doing so, we can directly assess the improvement of the approximate mixture of Gaussians approach over the single Gaussian approximation. These results are plotted in Figure 9B-E. Figure 9B,C show the correspondence between the single Gaussian ( Fig. 9B) and approximate mixture of Gaussians (Fig. 9C) models and the numerical evaluations. The points all fall near the identity line, indicated reasonable agreement, but the spread is clearly larger for the single Gaussian model. Figure 9D summarizes the errors, showing that the analytical mixture of Gaussians approximation has approximately three times lower RMS error than the single Gaussian fit. The error reduction is most profound about the PSE in the psychometric function, where the analytical and numerical approximations are essentially equivalent.
This means that one can precisely estimate observer biases even for non-Gaussian priors in a computationally-efficient manner. The analytical approximation does show a slight tendency to overestimate the upper flank of the psychometric curve and underestimate the lower flank (visible with the mean signed errors; Fig.  9E), indicating a bias toward steeper psychometric functions. Thus, when using this approximation to fitting psychometric data of observers with heavy-tailed priors, this will produce prior and/or likelihood estimates that are narrower than the true values.

Example 2: bimodal prior
Next, we assess the performance of the mixture of Gaussians analytical approximation for fitting psychometric data from an observer with a bimodal prior. We randomly selected likelihood means and SDs in the same fashion as we did for the zero-mean prior. To define a bimodal, two-component prior on each randomization, we selected the component means f 1 ; 2 g from a uniform distribution where one component was restricted to the range [-1, -0.5] and the other from [0.5, 1]. The component SDs were randomly selected from a uniform distribution in the range ½maxðs 1 ; s 2 Þ; 1:4maxðs 1 ; s 2 Þ to ensure each prior had two distinct peaks. Each component weight was randomly selected and the two were normalized such that they summed to one. As before, we present data from 5000 randomization runs in Figure 10. Overall, the approximate mixture of Gaussians method precisely estimated the location of the PSE (i.e., this method has low RMS error ;0.5) for the bimodal priors. Compared with the leptokurtotic unimodal case, however, the method shows increases in both RMS error and signed error along other regions of the psychometric function. The end result is that while the analytical approximation can accurately estimate an observer's bias, it will again tend to overestimate the slopes of the observer's psychometric functions.

Discussion
The Bayesian ideal observer framework has proven broadly useful for explaining perceptual phenomena in multiple sensory modalities. For example, a prior that peaks at zero speed (a "slow motion" prior) has successfully predicted systematic biases in judgements of the speed (Weiss et al., 2002;Stocker and Simoncelli, 2006) and direction (Weiss et al., 2002;Sotiropoulos et al., 2011;Rokers et al., 2018) of moving objects. A "light from above" prior about the position of the illuminant in a scene has been used to explain biases in the perceived shape of ambiguously shaded figures (Adams et al., 2004). Similarly, priors for viewing angle, convexity, and alignment between principal lines of curvature and surface contours can explain biases in the interpretation of surface curvature from simple line drawings (Mamassian and Landy, 1998). Other examples of the success of Bayesian perceptual models include prediction of biases in the timing of intervals between discrete events (Sohn and Jazayeri, 2021), the perceived structure in complex moving patterns (Yang et al., 2021), judgments in the orientation of contours (Girshick et al., 2011), and the orientation of surface tilt in natural scenes (Kim and Burge, 2018).
Here, we reviewed the straightforward approach for inferring Bayesian ideal observer models from psychophysical data when it is assumed that priors and sensory noise are Gaussian distributed. Following on a step-by-step formulation of this approach, we then extended the model to include prior distributions described with mixtures of Gaussians. In doing so, we build on previous work that has used mixture of Gaussian priors in other perceptual applications. For example, one group used a mixture of Gaussians to to define the relative probabilities of experimental stimuli and then probed suboptimalities in perceptual inference (Acerbi et al., 2014). Another group used a mixture of Gaussians approach to model human observer priors about homogeneity of orientation to understand biases in visual short-term memory tasks (Orhan and Jacobs, 2014).
Importantly, this mixture of Gaussians extension of the Bayesian ideal observer framework complements and expands on existing approaches for modeling the relationship between natural scene statistics and perceptual priors. First, if perceptual priors indeed match the non-Gaussian distributions of natural stimuli, then using a mixture of Gaussians model of priors may improve how well we predict perceptual biases when stimulus measurements fall on different regions of the stimulus domain, as compared with a single Gaussian model. Second, the mixture of Gaussians approach provides a tool for researchers to constrain Bayesian models using empirically measured stimulus statistics. Bayesian models have faced criticism because of their lack of constraint in how the priors or likelihoods are defined (Jones and Love, 2011;Marcus and Davis, 2013;Rahnev and Denison, 2018). One way to constrain the prior is to assume that the visual system has veridically learned the statistics of natural scenes and these learned statistics are reflected in the prior. In this case, one could define the ideal observer prior with a mixture of Gaussians that matches an empirically measured distribution of scene statistics, forgoing the need to fit the prior from perceptual judgment data. Indeed, several groups have made progress in the estimating the distribution of spectral content in terrestrial scenes (Field, 1987;Dong and Atick, 1995), tilt of objects in natural scenes (Burge et al., 2016), binocular disparity (Sprague et al., 2015), and the spectral content of retinal motion during eye and head movements (DuTell et al., 2020). While the match between estimates of natural statistics and perceptual biases has been investigated previously with numerical methods (Girshick et al., 2011;Sprague et al., 2015), a (relatively) low dimensional parameterization of these stimulus distributions opens up new opportunities for efficiency and experimental investigations.

Limitations and alternative approaches
Although numerical estimation of the model parameters will find an exact solution with sufficient precision, this is not always possible in practice. Estimating pð"yes"jx 1 ; x 2 Þ requires summation over many 2D probability mass functions, which must be redefined everytime the ideal observer parameters are changed (e.g.,during numerical optimization). Further, the MLE loss functions for both the numerical and analytical methods defined in this document are likely to be nonconvex and thus potentially prone to falling into a local minimum. This problem can be potentially overcome by initializing the numerical optimization in multiple locations within the loss function hypersurface, although this will add additional computation time to the estimation. While the approximate analytical method dramatically improves the computational efficiency of the ideal observer parameter estimation, it deviates from the true solution for pð"yes"jx 1 ; x 2 Þ the further x 2 gets from the point of subjective equality. As shown in Figure 10, this method is also especially prone to errors away from the PSE when the prior or posterior are bimodal. These problems can be mitigated in a few ways. If the approximate analytical method is to be used to adaptively select stimuli during an experiment, the numerical approach can be used after data collection to reach a more accurate solution. If there is good reason a priori to think that an observer's prior is bimodal (e.g., based on natural stimulus statistics), one can just fall back to the numerical solution.
Throughout this document, we assert that the ideal observer likelihood and measurement distributions are Gaussian along the domain in which the observer encodes the stimuli. Other model parameterizations, however, have been proposed that constrain the likelihood based on physiology and other assumptions, and result in notably asymmetric, non-Gaussian likelihoods (Zhang and Stocker, 2022). We also focus here on stimuli defined along a linear axis (e.g., position, velocity, binocular disparity), and therefore, the methods as presented cannot be directly applied to perceptual judgments about stimuli defined on a circular axis (e.g., orientation, visual motion direction, position of an illuminant). Despite this limitation, previous work has successfully used circular statistics to explain perceptual biases with a Bayesian ideal observer model (Mamassian and Landy, 1998;Burge et al., 2016). As a circular analog of the Gaussian, a mixture of von Mises distributions is a natural extension of the mixture of Gaussians approach.
Finally, we focus here on perceptual priors and not priors involved in decision-making or perception-action contingencies. Decision strategies could presumably affect the loss function as well, if there was an advantage to taking some other summary statistics from the posterior distribution instead of the least squares estimate. The influences of these strategies have been considered elsewhere (Chambers et al., 2019) but are out of the scope of the current work.
In conclusion, many scientific questions about how prior knowledge is incorporated into perceptual judgments and perceptually-guided behaviors remain unanswered. Within the Bayesian framework, for example, do priors vary significantly between observers and do they vary between different tasks? How closely do priors follow from the statistics we can measure empirically from the environment across multiple stimulus domains? How adaptable are priors in response to changing stimulus statistics? A major limiting factor in answering these questions is the accuracy and efficiency with which we can estimate people's priors from experimental data. Broadening the computational toolkit for experimenters and modelers to address this challenge is an important component of the larger effort to advance our understanding of the transformation from sensation to perception.