Approximate Bayesian computation with differential evolution☆
Highlights
- Traditional ABC methods are slow and scale poorly.
- We merge ABC with differential evolution (DE) to provide efficient proposals.
- ABCDE is a fast and scalable method of likelihood-free Bayesian estimation.
Introduction
Bayesian statistics has become an enormously popular tool for the analysis of random variables. While there are a myriad of practical and theoretical reasons to adopt the Bayesian framework, one clear advantage is the quantification of uncertainty in the form of the posterior distribution. In contrast to classic null hypothesis testing, directed analysis of the posterior distribution allows for statistical inference that does not compromise the process theory behind the experiments generating the data. In the last decade, many have turned to the problem of Bayesian estimation when the likelihood function is difficult or impossible to evaluate. One popular method for solving this problem is called approximate Bayesian computation (ABC). While ABC originated in genetics (Beaumont et al., 2002, Marjoram et al., 2003, Pritchard et al., 1999, Tavaré et al., 1997), it has received a considerable amount of attention in ecology, epidemiology, and systems biology (Beaumont, 2010), as well as, more generally, with regard to model choice (Robert, Cornuet, Marin, & Pillai, 2011). Recently, ABC has also become an important tool in the analysis of models of human behavior, whose likelihoods are often rather complex (Turner & Van Zandt, 2012).
The basic ABC algorithm proceeds as follows. Given an observed data set $Y$, some unknown parameters $\theta$ with prior distribution $\pi(\theta)$, and an assumed model such that $Y \sim \mathrm{Model}(\theta)$, ABC algorithms propose candidate parameter values $\theta^{*}$ by randomly sampling and generating a data set $X \sim \mathrm{Model}(\theta^{*})$, where the method used to draw $\theta^{*}$ varies from one algorithm to the next. Because this random data set $X$ is always generated and paired with the random parameter value $\theta^{*}$, we often use the notation $(\theta^{*}, X)$ to refer to a single point in the joint distribution of $\theta$ and $X$. The idea is that if the distance $\rho(X, Y)$ between the two data sets is small enough, specifically, less than a value $\epsilon$, we can assume that the proposed candidate parameter value $\theta^{*}$ has some nonzero probability of being in the posterior distribution $\pi(\theta \mid Y)$. In so doing, ABC algorithms attempt to approximate the posterior distribution by sampling from
$$\pi_{\epsilon}(\theta \mid Y) \propto \pi(\theta) \int_{\mathcal{X}} \pi(X \mid \theta)\, I\big(\rho(X, Y) < \epsilon\big)\, dX, \tag{1}$$
where $\mathcal{X}$ is the support of the simulated data $X$ and $I(\cdot)$ is an indicator function returning a one if the condition is satisfied and a zero otherwise.
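As a concrete illustration of this rejection scheme, the following sketch estimates the mean of a normal distribution. The model, prior, distance function, and all numerical settings here are illustrative assumptions, not choices made in this article:

```python
import random
import statistics

def abc_rejection(y_obs, n_samples=500, epsilon=0.2, seed=0):
    """Basic ABC rejection: draw theta* from the prior, simulate X from the
    model, and keep theta* whenever rho(X, Y) < epsilon.
    Toy setup (assumed): y ~ Normal(theta, 1), theta ~ Uniform(-5, 5),
    rho(X, Y) = |mean(X) - mean(Y)|."""
    rng = random.Random(seed)
    y_bar = statistics.fmean(y_obs)
    accepted = []
    while len(accepted) < n_samples:
        theta = rng.uniform(-5.0, 5.0)                  # theta* drawn from the prior
        x = [rng.gauss(theta, 1.0) for _ in y_obs]      # X ~ Model(theta*)
        if abs(statistics.fmean(x) - y_bar) < epsilon:  # rho(X, Y) < epsilon
            accepted.append(theta)                      # approximate posterior draw
    return accepted

# Data simulated with a true mean of 2.0; the accepted draws
# approximate the posterior over that mean.
data_rng = random.Random(42)
y = [data_rng.gauss(2.0, 1.0) for _ in range(50)]
post = abc_rejection(y)
```

Note how the acceptance rate, and hence the running time, is governed entirely by `epsilon`: shrinking it sharpens the approximation but multiplies the number of wasted simulations.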
The ABC rejection framework (see Turner & Van Zandt, 2012 for a tutorial) has been embedded in Markov chain Monte Carlo (Marjoram et al., 2003), sequential Monte Carlo (Beaumont et al., 2009, Del Moral et al., 2008, Sisson et al., 2007, Toni et al., 2009), expectation propagation (Barthelmé & Chopin, 2011), and Gibbs sampling (Turner & Van Zandt, submitted for publication). While these algorithms have become increasingly complex and efficient, they all still rely on satisfying the condition $\rho(X, Y) < \epsilon$. Hence, the accuracy of the estimated posterior distribution and the computational efficiency hinge on the selection of the tolerance threshold $\epsilon$: large values of $\epsilon$ produce inaccurate posteriors, while small values of $\epsilon$ produce long computation times. Long computation times result from the decreasing probability of proposing parameter values that produce data $X$ satisfying the condition $\rho(X, Y) < \epsilon$.
A number of methods have been proposed to balance the trade-off between long computation times and accurate posterior estimates. One common approach is to select a sequence of tolerances $\epsilon_1 > \epsilon_2 > \cdots > \epsilon_T$ that decreases monotonically from a large initial value to some small final one. This allows the algorithm to converge gradually to a computationally feasible final value of $\epsilon$. Other methods allow for larger values of $\epsilon$ and instead rely on a post-simulation correction to obtain accurate estimates of the posterior (Beaumont et al., 2002, Blum and François, 2010). These algorithms proceed as described above but then use regression techniques to reweight the samples based on their distance from the observed data $Y$. Although these techniques are very useful in practice, the post hoc nature of the correction suggests that the samplers used prior to the correction could be substantially improved. That is, algorithms that apply the reweighting scheme internally would be more efficient than the regression correction schemes.
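A minimal sketch of one way to build such a monotone schedule, assuming the common quantile heuristic (the next tolerance is taken from the distances of the current population) rather than any schedule used in this article:

```python
def next_tolerance(distances, alpha=0.5):
    """Pick the next epsilon as the alpha-quantile of the current
    population's distances; as the population improves, the tolerances
    decrease monotonically toward a small final value."""
    d = sorted(distances)
    return d[min(len(d) - 1, int(alpha * len(d)))]

# Example: as surviving distances shrink from round to round,
# so does the tolerance applied in the next round.
eps_round_1 = next_tolerance([0.9, 0.5, 0.7, 0.3])  # median of round-1 distances
eps_round_2 = next_tolerance([0.4, 0.2, 0.3, 0.1])  # median of round-2 distances
```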
Although the basic ABC framework has been instrumental in furthering the widespread usage of Bayesian analysis, it currently suffers from the so-called “curse of dimensionality” (Beaumont, 2010). That is, when the number of parameters is increased, the acceptance rate declines dramatically because successful sets of parameter values (i.e., sets of parameter values that satisfy the tolerance threshold condition) become very hard to find. To attenuate this problem, a few new algorithms replace the indicator function in Eq. (1) with a kernel function (Wilkinson, submitted for publication). The kernel function provides a continuous measure of the quality of a proposed parameter set rather than a dichotomous accept or reject (a one or a zero, respectively). While this dramatically improves the computational efficiency, the accuracy of the estimated posterior now relies on the choice of the kernel and its width. In addition, when sampling from a high-dimensional parameter space, the selection of tuning parameters such as the transition kernel becomes vitally important.
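The kernel idea can be sketched as follows: every prior draw is kept, but weighted by a continuous function of its distance instead of thresholded by the indicator. The Gaussian kernel, toy model, and settings below are illustrative assumptions:

```python
import math
import random

def kernel_abc(y_obs, n=2000, delta=0.2, seed=1):
    """Kernel-based ABC sketch: weight each prior draw by a Gaussian
    kernel of its distance rather than accepting or rejecting it.
    Toy setup (assumed): y ~ Normal(theta, 1), theta ~ Uniform(-5, 5),
    rho(X, Y) = |mean(X) - mean(Y)|; delta is the kernel width."""
    rng = random.Random(seed)
    y_bar = sum(y_obs) / len(y_obs)
    thetas, weights = [], []
    for _ in range(n):
        theta = rng.uniform(-5.0, 5.0)
        x_bar = sum(rng.gauss(theta, 1.0) for _ in y_obs) / len(y_obs)
        rho = abs(x_bar - y_bar)
        thetas.append(theta)
        # continuous score in (0, 1] replacing the 0/1 indicator
        weights.append(math.exp(-rho ** 2 / (2.0 * delta ** 2)))
    return thetas, weights

# Weighted posterior mean for data simulated with true theta = 2.0.
data_rng = random.Random(7)
y = [data_rng.gauss(2.0, 1.0) for _ in range(50)]
thetas, w = kernel_abc(y)
post_mean = sum(t * wi for t, wi in zip(thetas, w)) / sum(w)
```

No simulation is wasted here, but the estimate now depends on `delta`: too wide and the posterior is oversmoothed, too narrow and most weights underflow to zero.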
In this article we present a new ABC algorithm, which we call ABCDE, that uses differential evolution (DE) as a means of proposal generation (Hu and Tsui, 2005, ter Braak, 2006, Vrugt et al., 2009). DE is an extremely efficient population-based genetic algorithm for minimizing real-valued cost functions (Storn & Price, 1997). Instead of relying solely on random mutations of each member of the population to drive evolution, DE creates a self-organizing population by evolving each member based on the weighted difference between two other members of the population, similar in spirit to particle swarm optimization (Havangi et al., 2010, Kennedy and Eberhart, 1995, Tong et al., 2006). In practice, DE requires far fewer population members than other genetic algorithms, typically just enough to span the possible parameter space, because the entire population evolves towards the minimum cost function values (or in our case the regions of the parameter space with the highest densities). Here we merge ABC and DE, including enhancements to the basic DE algorithm to make it suitable for posterior estimation instead of simply locating the global minimum (Hu & Tsui, 2005).
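The difference-based move that makes DE self-organizing can be written in a few lines; this is a sketch in the spirit of ter Braak (2006), with illustrative constants:

```python
import random

def de_proposal(pop, i, gamma=0.9, b=1e-4, rng=random):
    """DE-style proposal: move member i by the weighted difference of two
    other randomly chosen members m and n of the population, plus a small
    uniform jitter b so the chain can reach any point in the space."""
    others = [j for j in range(len(pop)) if j != i]
    m, n = rng.sample(others, 2)
    return [pop[i][k] + gamma * (pop[m][k] - pop[n][k]) + rng.uniform(-b, b)
            for k in range(len(pop[i]))]

# A 5-member population of 2-d parameter vectors.
pop_rng = random.Random(3)
pop = [[pop_rng.gauss(0.0, 1.0), pop_rng.gauss(0.0, 1.0)] for _ in range(5)]
cand = de_proposal(pop, 0, rng=pop_rng)
```

Because the difference vectors inherit the scale and correlation of the current population, the proposals adapt automatically as the population contracts; ter Braak (2006) suggests $\gamma \approx 2.38/\sqrt{2d}$ for $d$-dimensional problems.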
We first describe the algorithm (presented in Fig. 1) and motivate the use of each of its components. We then use the algorithm to fit three models whose posterior distributions are known to assess the accuracy of the estimates obtained using ABCDE. In the first example, we show that a reduced version of the ABCDE algorithm can easily handle the classic mixture of normals example that is prevalent in the ABC literature (Beaumont et al., 2009, Sisson et al., 2007, Toni et al., 2009, Wilkinson, submitted for publication). We then assess the accuracy of the ABCDE algorithm by comparing it to a basic kernel-based ABC algorithm (Wilkinson, submitted for publication). To do so, we use both algorithms to fit a model whose joint posterior distribution is highly correlated. In our final example, we test the scalability of ABCDE by fitting it to a model with 20 parameters, a problem that is extremely difficult for existing ABC algorithms to solve. We close with a brief discussion of the potential advancements and provide guidelines for the selection of the tuning parameters.
Section snippets
The algorithm
The ABCDE algorithm combines DE with a distributed genetic algorithm approach that relies on three steps: crossover, mutation, and migration (Hu & Tsui, 2005). After a high-level overview of the algorithm, we will discuss each of these steps in turn. Fig. 1 illustrates the structure of the ABCDE algorithm. We begin by selecting a pool of particles and dividing this pool evenly into groups of equal size. Although we will refer to the group of individual members of our sampler as…
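The three steps can be arranged as in the following structural sketch. The greedy distance-based acceptance rule and all constants here are simplifying assumptions for illustration, not the authors' actual update rules from Fig. 1:

```python
import random

def abcde_step(groups, distance_of, gamma=0.9, sigma=0.05,
               migrate_prob=0.1, rng=random):
    """One schematic sweep over a pool of particles split into groups.
    Crossover: DE move built from two other members of the same group.
    Mutation: small Gaussian jitter on each parameter.
    Migration: occasionally swap particles between two groups.
    distance_of(theta) stands in for rho(X, Y) with X simulated from theta."""
    for g in groups:
        for i in range(len(g)):
            if len(g) >= 3 and rng.random() < 0.9:       # crossover step
                m, n = rng.sample([j for j in range(len(g)) if j != i], 2)
                cand = [g[i][k] + gamma * (g[m][k] - g[n][k])
                        for k in range(len(g[i]))]
            else:                                        # mutation step
                cand = [x + rng.gauss(0.0, sigma) for x in g[i]]
            if distance_of(cand) <= distance_of(g[i]):   # greedy (simplified)
                g[i] = cand
    if len(groups) >= 2 and rng.random() < migrate_prob: # migration step
        a, b = rng.sample(range(len(groups)), 2)
        ia, ib = rng.randrange(len(groups[a])), rng.randrange(len(groups[b]))
        groups[a][ia], groups[b][ib] = groups[b][ib], groups[a][ia]
    return groups

# Toy check: with a deterministic distance, one sweep never increases
# the total distance of the pool (migration only swaps, never worsens).
f = lambda th: sum(x * x for x in th)
pool_rng = random.Random(0)
groups = [[[pool_rng.uniform(-1.0, 1.0) for _ in range(2)]
           for _ in range(4)] for _ in range(3)]
before = sum(f(p) for g in groups for p in g)
groups = abcde_step(groups, f, rng=pool_rng)
after = sum(f(p) for g in groups for p in g)
```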
Simulations
We now present three examples meant to test the validity, speed, and scalability of the ABCDE algorithm. First, we use the ABCDE algorithm to fit the classic mixture of normals example. In this first example, we employ a simplified version of the ABCDE algorithm that uses only the crossover step. In the second example, we compare the speed of the ABCDE algorithm to a recently developed KABC algorithm (Wilkinson, submitted for publication). In this example, the joint posterior distribution is…
General discussion
Typically, Monte Carlo methods such as Markov chain Monte Carlo or sequential Monte Carlo rely on transition kernels to move proposals around the region of interest. These transition kernels generally are Gaussian in form and are centered at the previous state of the chain. Thus, to fully specify the kernel, one only needs to select the standard deviation of the Gaussian as a tuning parameter. However, it can be very difficult to specify the optimal transition kernel, especially in the case of…
Conclusions
In this article, we have presented a new algorithm, ABCDE, that attempts to solve the problem of scalability associated with ABC methods. The algorithm relies on DE to drive the sampler to high-density regions of the posterior distribution and automatically optimize the proposal efficiency. This type of proposal generation is very efficient, capturing both multi-modal and high-dimensional posteriors both quickly and accurately. Although we have not provided proofs of convergence for the ABCDE…
References (44)
- et al. (2009). Fast and accurate calculations for first-passage times in Wiener diffusion models. Journal of Mathematical Psychology.
- et al. (2012). A tutorial on approximate Bayesian computation. Journal of Mathematical Psychology.
- Barthelmé, S., & Chopin, N. (2011). ABC-EP: expectation propagation for likelihood-free Bayesian computation. In…
- (2010). Approximate Bayesian computation in evolution and ecology. Annual Review of Ecology, Evolution, and Systematics.
- et al. (2009). Adaptive approximate Bayesian computation. Biometrika.
- et al. (2002). Approximate Bayesian computation in population genetics. Genetics.
- et al. (2010). Non-linear regression models for approximate Bayesian computation. Statistics and Computing.
- et al. (1989). The inverse Gaussian distribution: theory, methodology and applications.
- et al. (2010). Hierarchical Bayes models for response time data. Psychometrika.
- Del Moral, P., Doucet, A., & Jasra, A. (2008). An adaptive sequential Monte Carlo method for approximate Bayesian…
- An introduction to probability theory and its applications, Vol. 1.
- Bayesian data analysis.
- A multi swarm particle filter for mobile robot localization. International Journal of Computer Science.
- Fitting Wald and ex-Wald distributions to response time data: an example using functions for the S-PLUS package. Behavior Research Methods, Instruments, & Computers.
- A Bayesian approach to diffusion models of decision-making and response time.
- Response times: their role in inferring elementary mental organization.
- Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences of the United States.
- Psychological interpretation of the ex-Gaussian and shifted Wald parameters: a diffusion model analysis. Psychonomic Bulletin and Review.
- A hierarchical Ornstein–Uhlenbeck model for continuous repeated measurement data. Psychometrika.
☆ Portions of this work were presented at the 8th Annual Context and Episodic Memory Symposium, Bloomington, Indiana, and the 45th Annual Meeting of the Society for Mathematical Psychology, Columbus, Ohio. The authors would like to thank Trisha Van Zandt, Simon Dennis, Rich Shiffrin, Jay Myung, and Dan Navarro for helpful comments that improved an earlier version of this article.