Short-Term Synaptic Plasticity Makes Neurons Sensitive to the Distribution of Presynaptic Population Firing Rates

Visual Abstract


Introduction
The brain is a highly noisy system. At the cellular level, neurons are unreliable in eliciting spikes, and synapses are unreliable in transmitting spikes to postsynaptic neurons. At the network level, the connectivity and the balance of excitation and inhibition give rise to fluctuations in the background activity (Brunel, 2000; Kumar et al., 2008), which can be as large as the mean stimulus response (Arieli et al., 1996; Kenet et al., 2003). In such a noisy environment, a neuron faces a crucial task: how can it discriminate stimulus-induced firing rate changes from fluctuations of the same magnitude in the background activity?
If synapses were static, that is, if the postsynaptic conductances (PSCs) did not depend on the immediate spike history, this task could not be accomplished unless synapses were specifically tuned to do so. For instance, the identification of specific spiking patterns, filtering out presumed noise sequences, can be accomplished by precise tuning of synaptic weights (Gütig and Sompolinsky, 2006). This solution, however, relies on training synaptic weights with a supervised learning rule, and even then it works only for a specific set of spike timing sequences. Active dendrites (with voltage-dependent ionic conductances) can also act as pattern detectors (Hawkins and Ahmad, 2016), but this mechanism only works for signals confined to locally clustered synapses. Therefore, despite their relevance for understanding signal processing in the brain, the mechanisms by which neural ensembles solve the activity discrimination problem have remained elusive.
Here, we show that short-term plasticity (STP) of synapses provides an effective and general mechanism to solve the aforementioned task. STP refers to the observation that synaptic strength changes on a spike-by-spike basis, depending on the timing of previous spikes (Stevens and Wang, 1995; Zucker and Regehr, 2002); that is, STP arises because neurotransmitter release dynamics are history dependent, and it can manifest as either short-term facilitation (STF) or short-term depression (STD). Thus, STP becomes a crucial part of the neural hardware when information is encoded as firing rate. Indeed, STP has been suggested to play several important roles in neural information processing (Buonomano, 2000; Fuhrmann et al., 2002; Izhikevich et al., 2003; Abbott and Regehr, 2004; Middleton et al., 2011; Rotman et al., 2011; Scott et al., 2012; Rotman and Klyachko, 2013; Jackman and Regehr, 2017; Grangeray-Vilmint et al., 2018; Naud and Sprekeler, 2018).
An immediate consequence of STP is that the effective PSCs depend on the firing rates of individual presynaptic neurons (Fig. 1). This suggests that postsynaptic targets of populations with dynamic synapses could distinguish among different input firing rate distributions even without supervised learning. To demonstrate this feature of STP, we measured the response of postsynaptic neurons to a weak stimulus with an amplitude one order of magnitude smaller than the background activity. By systematically changing the distribution of firing rates over the presynaptic neuron ensemble, we found that weak signals can be differentiated from noisy fluctuations if the signal is appropriately distributed over the input ensemble. The optimal distribution that maximizes discriminability depends on the nature of STP. We found that, for facilitatory synapses, sparse codes give better discrimination between a weak signal and dense background changes of the same intensity. By contrast, for depressing synapses, sparse codes result in highly negative gains relative to dense background changes of the same magnitude. We also investigated feedforward networks with excitation and disynaptic inhibition, with target-dependent STP, and found that this arrangement provides extra robustness for the output gain.
Finally, we demonstrate how STP can endow a postsynaptic neuron with the ability to differentiate sparsely encoded activity from dense activity of the same magnitude, a function that would be especially important in very low signal-to-noise regimes. Thus, our results reveal that the nature of STP may also constrain the nature of firing rate-based population codes.

Model of STP
One parsimonious and yet powerful mathematical description of short-term synaptic dynamics was proposed more than 20 years ago (Tsodyks and Markram, 1997). The Tsodyks-Markram (TM) model first accounted for the activity-dependent synaptic depression observed in pairs of neocortical pyramidal neurons and was soon extended to cover facilitation (an increase in probability) of vesicle release. With a small set of parameters, the TM model captures the opposing effects of the depletion of available synaptic vesicles and of the increase in release probability caused by accumulation of residual calcium in the presynaptic terminal, making it a suitable framework to conjecture the general impact of STP on neural information processing.
Here, we use the TM model (Eq. 1) to describe the short-term synaptic dynamics. The effect of depression is modeled by depletion of the proportion of available resources, represented by the variable x (0 ≤ x ≤ 1), which instantaneously decreases after each spike and returns to 1 with recovery time constant τ_rec. The gain effect of short-term facilitation is modeled by the facilitation factor U (0 ≤ U ≤ 1), which accounts for the accumulation of calcium at the presynaptic terminal after the arrival of an action potential. U transiently increases the release probability u (0 ≤ u ≤ 1), which returns to 0 with time constant τ_f:

\frac{du}{dt} = -\frac{u}{\tau_f} + U(1 - u^-)\,\delta(t - t_{sp})
\frac{dx}{dt} = \frac{1 - x}{\tau_{rec}} - u^+ x^-\,\delta(t - t_{sp}),   (1)

where t_sp is the last spike time and the superscripts − and + denote the value of a variable just before and just after the spike, respectively.
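To make the update rule concrete, the following minimal Python sketch (our own illustration, not taken from the paper's code repository; the function name is hypothetical) iterates the TM model spike by spike: between spikes, u decays toward 0 and x recovers toward 1; at each spike, u jumps by U(1 − u) before a fraction u⁺x⁻ of the resources is released.

```python
import math

def tm_spike_response(spike_times, U, tau_f, tau_rec):
    """Iterate the TM model over a list of spike times (ms).
    Returns the released fraction u_plus * x_minus for each spike."""
    u, x = 0.0, 1.0          # facilitation variable and available resources
    last_t = None
    released = []
    for t in spike_times:
        if last_t is not None:
            dt = t - last_t
            u *= math.exp(-dt / tau_f)                     # u relaxes back to 0
            x = 1.0 - (1.0 - x) * math.exp(-dt / tau_rec)  # x recovers to 1
        u_plus = u + U * (1.0 - u)   # spike-triggered facilitation jump
        prr = u_plus * x             # proportion of resources released
        x -= prr                     # depletion of available resources
        u = u_plus                   # u decays from its post-spike value
        last_t = t
        released.append(prr)
    return released

spikes = [0.0, 10.0, 20.0, 30.0, 40.0, 50.0, 60.0]      # 100-Hz burst
fac = tm_spike_response(spikes, U=0.1, tau_f=200.0, tau_rec=50.0)
dep = tm_spike_response(spikes, U=0.7, tau_f=50.0, tau_rec=200.0)
```

With the reference parameters used later in this work, the facilitatory release grows over the burst while the depressing release collapses after the first spike.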

Proportion of released resources (PRR)
The change in the PSC g_s after a presynaptic spike is proportional to the instantaneous PRR, PRR(t_sp) ∝ u⁺(t_sp) x⁻(t_sp), and to the absolute synaptic strength B_s. The average instantaneous PRR of a presynaptic unit can also be described as a function of a time-dependent Poissonian firing rate r(t) as:

\frac{d\langle u\rangle}{dt} = -\frac{\langle u\rangle}{\tau_f} + U(1 - \langle u^-\rangle)\,r(t)
\langle u^+\rangle = \langle u^-\rangle + U(1 - \langle u^-\rangle)
\frac{d\langle x\rangle}{dt} = \frac{1 - \langle x\rangle}{\tau_{rec}} - \langle u^+\rangle\langle x^-\rangle\,r(t)
PRR_s(t) = \langle u^+\rangle\langle x^-\rangle\,r(t),   (2)

where the brackets denote the average over many realizations. The total PRR contribution of a single synapse, for a time window of duration T_s, can then be obtained by integrating Equation 2 over this period:

Q^s = \int_0^{T_s} PRR_s(t)\,dt.   (3)

Total effective input to a postsynaptic neuron

For a homogeneous presynaptic population with the same STP parameters and individual basal firing rate r_bas, the population basal activity is R_bas = N · r_bas, where N is the population size. We quantify the extra population rate R_ext as a multiple of R_bas. Our analysis is restricted to the case of low signal-to-noise ratio, i.e., R_ext < 0.1 R_bas. We consider a simplified scenario where R_ext is distributed homogeneously over a number N_ext of selected presynaptic units, which increase their firing rate by r_ext = R_ext / N_ext, while the remaining presynaptic units keep their activity unchanged.
The total PRR released to a target neuron by the entire population, during T_s, will then be

Q^p_{ext} = (N - N_{ext}) \cdot Q^s_{bas} + N_{ext} \cdot Q^s_{ext},   (4)

where Q^s_bas and Q^s_ext are the total PRR (Eq. 3) delivered by a stationary unit (firing at r_bas) and a stimulus-encoding unit (firing at r_bas + r_ext), respectively.
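The averaged equations can be integrated numerically. The sketch below (our own illustration with Euler integration; helper names are ours) computes Q^s for a unit whose rate steps from r_bas to r_bas + r_ext during T_s, and then combines stationary and encoding units into the population total of Equation 4. Rates are expressed in spikes/ms.

```python
def total_prr(r_ext, U, tau_f, tau_rec, r_bas=0.0005, T_s=40.0, dt=0.01):
    """Q^s (Eq. 3) by Euler integration of the rate-averaged TM equations (Eq. 2)
    over a window T_s in which the rate steps from r_bas to r_bas + r_ext.
    Times in ms, rates in spikes/ms."""
    # start from the basal steady state of the averaged equations
    u = U * tau_f * r_bas / (1.0 + U * tau_f * r_bas)          # <u->
    x = 1.0 / (1.0 + (u + U * (1.0 - u)) * tau_rec * r_bas)    # <x->
    r, q = r_bas + r_ext, 0.0
    for _ in range(int(T_s / dt)):
        u_plus = u + U * (1.0 - u)
        q += u_plus * x * r * dt                 # accumulate PRR (Eq. 3)
        u += dt * (-u / tau_f + U * (1.0 - u) * r)
        x += dt * ((1.0 - x) / tau_rec - u_plus * x * r)
    return q

# facilitatory reference synapse, r_bas = 0.5 Hz, T_s = 40 ms
q_bas = total_prr(0.0, 0.1, 200.0, 50.0)
q_ext = total_prr(0.1, 0.1, 200.0, 50.0)     # extra rate of 100 Hz
N, N_ext = 160000, 64
q_pop = (N - N_ext) * q_bas + N_ext * q_ext  # population total (Eq. 4)
```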

Gain in the effective input
We are interested in the effects of varying the presynaptic distribution (over N_ext inputs) of this total extra rate (R_ext) on the effective input to postsynaptic targets. To estimate the change in gain because of STP, we used the maximally dense distribution, N_ext = N, as the reference point:

Q^p_d = N \cdot Q^s_d,

where the d subscript denotes the smallest possible increase in individual firing rates, r_d = R_ext / N (maximally distributed R_ext). We refer to this as the dense distribution case; it represents a situation of homogeneous increase in the basal activity of the system, against which a stimulus would need to be distinguished. N_ext = N also implies the smallest increase in individual operating rates (r_ext = r_d); therefore, in the dense distribution case, STP nonlinearities are minimal. In other words, N_ext = N is the point where dynamic synapses operate as close to static as possible.
We then quantify the gain in Q^p_ext for a given N_ext, always relative to the Q^p_d caused by an input of the same intensity but with dense distribution, as

G = \frac{Q^p_{ext} - Q^p_d}{Q^p_d - Q^p_{bas}}.   (5)

We calculate the curves of G as a function of N_ext for different sets of STP parameters and basal rates and search for the point where G is maximized (N_ext = N_opt), which we call the optimal distribution (see example in Fig. 2D).
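As an illustration (our own sketch, reusing the mean-field integration of Equation 2 with the reference parameters used throughout this work), the gain curve G(N_ext) can be traced by sweeping N_ext at fixed R_ext. With N = 160,000 and r_d = 8% of r_bas, this reproduces the qualitative picture of Figure 2D: a positive peak for facilitation and negative gains for depression at sparse distributions.

```python
def total_prr(r_ext, U, tau_f, tau_rec, r_bas=0.0005, T_s=40.0, dt=0.01):
    """Q^s (Eq. 3) by Euler integration of Eq. 2; times in ms, rates in spikes/ms."""
    u = U * tau_f * r_bas / (1.0 + U * tau_f * r_bas)
    x = 1.0 / (1.0 + (u + U * (1.0 - u)) * tau_rec * r_bas)
    r, q = r_bas + r_ext, 0.0
    for _ in range(int(T_s / dt)):
        u_plus = u + U * (1.0 - u)
        q += u_plus * x * r * dt
        u += dt * (-u / tau_f + U * (1.0 - u) * r)
        x += dt * ((1.0 - x) / tau_rec - u_plus * x * r)
    return q

def gain(n_ext, stp, N=160000, r_d=0.00004):
    """Distribution gain G (Eq. 5) for R_ext = N * r_d split over n_ext units."""
    r_ext = N * r_d / n_ext
    q_bas = total_prr(0.0, *stp)
    q_ext = total_prr(r_ext, *stp)
    q_d = total_prr(r_d, *stp)
    return (n_ext * (q_ext - q_bas)) / (N * (q_d - q_bas)) - 1.0

fac = (0.1, 200.0, 50.0)      # U, tau_f, tau_rec (facilitatory reference)
dep = (0.7, 50.0, 200.0)      # depressing reference
g_fac_64 = gain(64, fac)      # near the optimum reported in Fig. 2D
g_dep_64 = gain(64, dep)
g_dense = gain(160000, fac)   # dense reference: G = 0 by construction
```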

Optimal distribution
The optimal distribution of the activity (OD) can be framed as the fraction of the optimal number of encoding units N_opt in a given population of size N, that is, OD = N_opt / N. Because the optimal code (N_opt = R_ext / r_opt) is the distribution that maximizes the gain over the dense distribution of the same input magnitude (N = R_ext / r_d), OD can be written as

OD = \frac{N_{opt}}{N} = \frac{r_d}{r_{opt}}.   (7)

We define R_ext as a fraction of R_bas to keep the same signal-to-noise ratio (R_ext / R_bas) for populations of different sizes N. We find that r_opt is fixed given the STP parameters and r_bas (see Results); therefore, by defining r_d (= R_ext / N) as a fraction of r_bas (= R_bas / N), we reach the interesting consequence that OD is independent of any particular choice of population size (Eq. 7). That is, given the same STP parameters and value of r_bas, populations of different sizes will optimally encode the same stimulus intensity (relative to their basal activity) with the same OD. Because the optimal encoding rate is constrained by r_d < r_opt, the optimal distribution will also be constrained to 0 < OD < 1 (see Fig. 3D), with values close to zero or one characterizing sparse or dense distributions, respectively.
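For concreteness, a toy calculation (our own, using the example values that appear in Figure 2: N = 160,000, r_bas = 0.5 Hz, r_d = 8% of r_bas, and r_opt ≈ 100 Hz for the facilitatory reference synapse) shows how sparse the optimal code is.

```python
N = 160_000
r_bas = 0.5            # Hz
r_d = 0.08 * r_bas     # dense per-unit increase, 8% of r_bas
R_ext = N * r_d        # total extra population rate, 6400 Hz
r_opt = 100.0          # Hz, optimal encoding rate from the facilitatory example

N_opt = R_ext / r_opt  # optimal number of encoding units
OD = r_d / r_opt       # Eq. 7: independent of N
```

The optimal code engages only 64 of 160,000 inputs (OD = 4 × 10⁻⁴), i.e., an extremely sparse distribution, and changing N at fixed r_d/r_bas leaves OD unchanged.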
Optimal rate (r_opt) and maximum gain (G_max) estimation

Equation 6 describes the gain G (Eq. 5) obtained by encoding a stimulus R_ext into N_ext units (with rates increased by r_ext = R_ext / N_ext) as opposed to N units (with rates increased by r_d = R_ext / N):

G = \frac{r_d}{r_{ext}} \cdot \frac{Q^s_{ext} - Q^s_{bas}}{Q^s_d - Q^s_{bas}} - 1.   (6)

The peak of this function (G_max) is achieved by an optimal number of encoding units N_ext = N_opt, with their rate increased by r_opt = R_ext / N_opt. This maximum point can be found by taking the derivative of the gain function with respect to r_ext and setting it equal to zero; given that dQ^s_d/dr_ext = 0, this can be further simplified into

Q^s_{ext} - Q^s_{bas} - Q^{s\prime}_{ext} \cdot r_{ext} = 0,   (9)

where Q^{s′}_ext = dQ^s_ext/dr_ext, and the value of r_opt is the solution of Equation 9. This solution is independent of the stimulus intensity R_ext and population size N (see results in Fig. 2F).
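Equation 9 can be solved numerically. The sketch below (our own illustration; it approximates Q^{s′}_ext by a central finite difference and locates the sign change of the Equation 9 residual by bisection) finds r_opt for the facilitatory reference synapse at r_bas = 0.5 Hz; the residual is negative in the supralinear region and positive once release saturates.

```python
def total_prr(r_ext, U, tau_f, tau_rec, r_bas=0.0005, T_s=40.0, dt=0.01):
    """Q^s (Eq. 3) by Euler integration of Eq. 2; times in ms, rates in spikes/ms."""
    u = U * tau_f * r_bas / (1.0 + U * tau_f * r_bas)
    x = 1.0 / (1.0 + (u + U * (1.0 - u)) * tau_rec * r_bas)
    r, q = r_bas + r_ext, 0.0
    for _ in range(int(T_s / dt)):
        u_plus = u + U * (1.0 - u)
        q += u_plus * x * r * dt
        u += dt * (-u / tau_f + U * (1.0 - u) * r)
        x += dt * ((1.0 - x) / tau_rec - u_plus * x * r)
    return q

fac = (0.1, 200.0, 50.0)
q_bas = total_prr(0.0, *fac)

def eq9_residual(r_hz):
    """Left-hand side of Eq. 9, with Q^s' from a central finite difference."""
    h = 1.0  # finite-difference step in Hz
    q = total_prr(r_hz / 1000.0, *fac)
    dq = (total_prr((r_hz + h) / 1000.0, *fac)
          - total_prr((r_hz - h) / 1000.0, *fac)) / (2 * h / 1000.0)
    return q - q_bas - dq * (r_hz / 1000.0)

lo, hi = 5.0, 400.0            # bracket for the sign change
for _ in range(40):
    mid = 0.5 * (lo + hi)
    if eq9_residual(mid) < 0.0:
        lo = mid
    else:
        hi = mid
r_opt_hz = 0.5 * (lo + hi)
```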
For the optimal rate r_opt, the gain (Eq. 6) can be written as

G_{max} = \frac{r_d}{r_{opt}} \cdot \frac{Q^s_{opt} - Q^s_{bas}}{Q^s_d - Q^s_{bas}} - 1,

where Q^s_opt denotes Q^s_ext evaluated at r_ext = r_opt. Assuming that Q^s_d is linear with slope S_s for small r_d, that is, Q^s_d = Q^s_{bas} + S_s · r_d (see below, Linear approximation of Q^s_d), G_max can be further simplified into

G_{max} = \frac{Q^s_{opt} - Q^s_{bas}}{S_s \cdot r_{opt}} - 1,

which makes G_max independent of the stimulus intensity R_ext and population size N.
Combined optimal rate (r^com_opt) and maximum gain (G^com_max) estimation

When an axon branches to connect to different targets, STP properties might be target dependent. In the case of excitatory fibers driving feedforward excitation-inhibition (FF-EI) motifs, with synapses of type 1 (s1) directly exciting a readout neuron and synapses of type 2 (s2) driving the local inhibitory circuit (Fig. 1C), the combined gain G^com is defined analogously to Equation 5, with the effective input given by the difference between the s1 (excitatory) and s2 (disynaptic inhibitory) pathway contributions. To find the activity distribution that maximizes the combined gain, we take the derivative of G^com with respect to r_ext, set it equal to zero, and, assuming again that Q^s_d is linear with slope S_s for both synapse types, solve the resulting equivalence. Its solution, r_ext = r^com_opt, is independent of the stimulus intensity R_ext and population size N. The corresponding optimal combined gain G^com_max is likewise independent of R_ext and N.

Numerical simulations
As a proof of concept of the potential relevance that the estimated presynaptic gains could have on postsynaptic targets, we performed numerical simulations of a conductance-based integrate-and-fire (I&F) neuron model acting as the readout device for a FF-EI circuit (see below, Sparse code identification by a postsynaptic neuron). The I&F model's membrane voltage V_m is described by

C_m \frac{dV_m}{dt} = g_L(V_{rest} - V_m) + g_e(t)(V_e - V_m) + g_i(t)(V_i - V_m),

where C_m = 250 pF is the membrane capacitance, g_L and V_rest are the leak conductance and resting potential, g_e and g_i are, respectively, the excitatory and inhibitory input conductances, and V_e = 0 mV and V_i = -75 mV are the excitatory and inhibitory synaptic reversal potentials. When a spike occurs, the membrane voltage is reset to V_reset = -60 mV and held at this value for a refractory period of 2 ms. The synapses were modeled by α-functions (Kuhn et al., 2004) with time constants τ_e = 0.5 ms for excitatory and τ_i = 2 ms for inhibitory synapses. The presynaptic population consisted of N = 160,000 units that connected to the I&F neuron in a FF-EI arrangement. The population stationary basal rate was R_bas = 80 kHz, with individual basal rate r_bas = 0.5 Hz. At the stationary basal rate, the synaptic states are described by

u_{bas} = \frac{U(1 + \tau_f r_{bas})}{1 + U \tau_f r_{bas}}
x_{bas} = \frac{1}{1 + u_{bas} \tau_{rec} r_{bas}}
PRR^s_{bas} = u_{bas}\, x_{bas}\, r_{bas},

where PRR^s_bas is the expected PRR rate of each synapse with STP parameters {U, τ_rec, τ_f}.
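These stationary expressions are easy to check numerically; the short sketch below (our own, with a hypothetical function name) evaluates them for the two reference synapse types at r_bas = 0.5 Hz.

```python
def basal_state(U, tau_f, tau_rec, r_bas):
    """Stationary TM synaptic state at a constant Poisson rate r_bas (spikes/ms)."""
    u_bas = U * (1.0 + tau_f * r_bas) / (1.0 + U * tau_f * r_bas)  # release prob. at a spike
    x_bas = 1.0 / (1.0 + u_bas * tau_rec * r_bas)                  # available resources
    prr_bas = u_bas * x_bas * r_bas                                # expected PRR rate
    return u_bas, x_bas, prr_bas

r_bas = 0.0005  # 0.5 Hz expressed in spikes/ms
u_f, x_f, prr_f = basal_state(0.1, 200.0, 50.0, r_bas)   # facilitatory reference
u_d, x_d, prr_d = basal_state(0.7, 50.0, 200.0, r_bas)   # depressing reference
```

At this low basal rate, both synapse types sit close to their resting state (u_bas near U, x_bas near 1), leaving ample room for the nonlinearities exploited during transients.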
We simulate a neuron that, during stationary basal activity, is kept in the fluctuation-driven regime through excitation-inhibition input balance (Kuhn et al., 2004). While excitation is provided directly by s1, disynaptic inhibition is modulated by s2 in a linear fashion (Eq. 17). The inhibitory firing rate that keeps the target neuron membrane potential fluctuating around the mean value of V_m during stationary basal activity can be approximated by a linear function of the excitation (Eq. 18; adapted from Kuhn et al., 2004), where B_e and B_i are the maximum amplitudes of the excitatory and inhibitory synaptic conductances. Equation 18 allows us to find the linear scaling of Equation 17 that fulfills the condition ⟨V_m⟩ = −53 mV. The inhibitory synapses are kept static (no STP). The extra presynaptic activity occurs in blocks of T_s = 40 ms and is defined as sparse (when N_ext = N_opt) or dense (when N_ext = N).

Continuous rate distribution
Although some bursting networks [e.g., cerebellar parallel fibers (PFs)] seem to operate in a quasi-binary fashion (burst or no-burst), it is important to extend the analysis to continuous distributions, under which most parts of the brain seem to operate. We do this by assuming that the distribution of event-related neural firing rates follows a gamma distribution, which gives us parameterized control of the sparseness of the neural code (through the mean of the distribution) and of the distribution shape (through the skewness and kurtosis):

f(r_{ext}; k, \theta) = \frac{r_{ext}^{k-1}\, e^{-r_{ext}/\theta}}{\Gamma(k)\,\theta^k},

where k is the shape parameter and θ is the scale parameter. When k = 1, this is equivalent to an exponential distribution; for increasing values of k, the distribution is right skewed, with the skewness approaching zero (becoming approximately Gaussian) for high values of k. For each shape parameter, we controlled the mean of the distribution by varying the scale parameter, because for a gamma-distributed r_ext the expected value is

E[r_{ext}] = k\,\theta.

For the gamma-specified distribution of extra rates and a given presynaptic set of STP parameters, the expected amount of resources released by the population is

E[Q^p_{ext}] = N \int_0^{\infty} f(r; k, \theta)\, Q^s(r_{bas} + r)\, dr,

which we solved numerically for two synapse types (s1, facilitatory; s2, depressing) and a range of rate distributions. The distribution gain G for E[Q^p_ext] was then calculated in relation to the dense case, where N_ext = N and r_ext = r_d.

Figure 1. Distribution of the spiking activity over presynaptic neurons and STP. A, top, A neuron receives input from a presynaptic population, with only one of the neurons eliciting seven spikes. Bottom, The postsynaptic conductance (PSC) generated by the consecutive spikes for three different types of synapses (static, black; facilitatory, blue; depressing, red). The PSCs are different for each of these three types of synapses. B, top, Similar to panel A, a neuron receives inputs from a presynaptic population, but in this scenario the spikes were distributed among all presynaptic neurons. Bottom, The PSC generated by a sequence of seven consecutive spikes arriving at the same times as in panel A for the three different types of synapses (static, black; facilitatory, blue; depressing, red). The PSCs are identical for the three types of synapses (lines overlap). C, Feedforward excitation followed by feedforward inhibition configuration, and two distributions of an extra spike rate R_ext in addition to the basal firing rate r_bas. Top, The extra rate is distributed into a few presynaptic units N_ext (gray), with each chosen unit increasing its rate by r_ext = R_ext/N_ext. Bottom, The extra rate is distributed homogeneously throughout the population of N units, with each unit increasing its rate by r_d = R_ext/N.
A glossary of key symbols used throughout the work is given in Table 1. All analyses and simulations were performed in MATLAB and Python. The model simulations used Euler's method with a time step of 0.1 ms, implemented in the neural simulator Brian2 (Stimberg et al., 2014). The simulation and analysis code is available on GitHub at https://github.com/luiztauffer/stp-activity-distribution.

Results
Here, we are interested in a mechanism by which a neuronal network or a single postsynaptic neuron receiving multiple inputs may distinguish between different spiking distributions of the same intensity (e.g., the same number of spikes). This problem is schematically illustrated in Figure 1. Consider two scenarios. In the first, seven spikes arrive from a single presynaptic neuron while the other six neurons remain silent (Fig. 1A, sparse distribution). In the second, each of the seven presynaptic neurons spikes once (Fig. 1B, dense distribution). In both scenarios, the postsynaptic neuron receives seven spikes. Here, we test the hypothesis that when synapses exhibit STP (facilitation or depression), the two scenarios can be differentiated without any specific tuning of synaptic weights.
Static synapses evoke exactly the same PSC sequence for both sparse and dense distributions (black lines), making them indistinguishable for a readout neuron. When synapses are dynamic, however, and the spikes arrive through a single synapse, short-term facilitation (blue line) enhances the PSC amplitudes relative to static synapses, whereas short-term depression (red line) results in a weaker response (Fig. 1A, bottom traces). If the incoming spikes are distributed across different synapses, the sequence of PSCs is identical for all types of synaptic dynamics (Fig. 1B, bottom traces).
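This scenario can be checked directly with the spike-based TM model (a self-contained sketch of our own, using the reference facilitatory and depressing parameter sets). Seven spikes through one synapse are compared with one spike through each of seven resting synapses; static synapses would give equal totals in both cases.

```python
import math

def tm_releases(spike_times, U, tau_f, tau_rec):
    """Per-spike released fraction for the TM model along one synapse (times in ms)."""
    u, x, last, out = 0.0, 1.0, None, []
    for t in spike_times:
        if last is not None:
            u *= math.exp(-(t - last) / tau_f)                     # u decays to 0
            x = 1.0 - (1.0 - x) * math.exp(-(t - last) / tau_rec)  # x recovers to 1
        u = u + U * (1.0 - u)      # facilitation jump at the spike
        out.append(u * x)          # released fraction
        x -= u * x                 # depletion
        last = t
    return out

burst = [10.0 * i for i in range(7)]                    # 7 spikes at 100 Hz, one synapse
sparse_fac = sum(tm_releases(burst, 0.1, 200.0, 50.0))  # all spikes on one fac. synapse
dense_fac = 7 * tm_releases([0.0], 0.1, 200.0, 50.0)[0] # one spike on each of 7 synapses
sparse_dep = sum(tm_releases(burst, 0.7, 50.0, 200.0))
dense_dep = 7 * tm_releases([0.0], 0.7, 50.0, 200.0)[0]
```

Facilitation releases more total resources in the sparse arrangement, depression in the dense one, which is exactly the asymmetry that Figure 1 illustrates.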
In vivo neural coding is certainly more complex than the above example. However, this simple example suggests that, in the case of a neuron receiving synaptic inputs via thousands of noisy synapses, STP could be a mechanism to differentiate an evoked signal from background activity fluctuations of the same amplitude, provided the former is encoded as a specific pattern that exploits the STP properties of the synapses. In the following, we describe how well dynamic synapses could endow feedforward circuits with such activity distribution discrimination properties in low signal-to-noise regimes (Fig. 1C).

Optimal activity distribution with dynamic synapses
We implemented dynamic synapses with the rate-based TM model (Eq. 2). In this model, the instantaneous PRR depends on the resource release probability (u⁺) and the proportion of available resources (x⁻), whose dynamics are governed by the choice of STP model parameters {U, τ_f, τ_rec}. For a transient increase in firing rate, a facilitatory synapse produces an average profile of sustained PRR, while a depressing synapse produces an average profile of rapidly decaying PRR (Fig. 2A). Throughout this work, the two reference sets of STP parameters are: U = 0.1, τ_f = 200 ms, and τ_rec = 50 ms (facilitatory) and U = 0.7, τ_f = 50 ms, and τ_rec = 200 ms (depressing). To quantify the effects that these different profiles have on the presynaptic output for varying transient increases in firing rate (r_ext), we calculate the total amount of extra resources (Q^s_ext) a synapse releases over a time period T_s (Eq. 3; Fig. 2B). We found that Q^s_ext varied in a nonlinear fashion as a function of r_ext, with depressing dynamics approaching saturation of Q^s_ext much faster than facilitatory dynamics. The slope of Q^s_ext (Fig. 2B, inset) for depressing synapses is monotonically decreasing, indicating that any increase in the firing rate of those synapses produces a sublinear increase in Q^s_ext, whereas for facilitatory synapses the slope initially grows, indicating that increases in firing rate, up to some point, produce a supralinear increase in Q^s_ext. In the brain, neurons typically receive inputs from a large ensemble of presynaptic neurons. In the ongoing activity state, these neurons spike at a low basal firing rate (r_bas), with total synaptic output Q^p_bas = N · Q^s_bas.
In the event-related activity state, the firing rate of a subset of presynaptic neurons (N_ext) is transiently increased and the total synaptic output (Eq. 4) changes accordingly. We distribute a fixed event-related population rate increase R_ext into varied numbers of chosen synapses N_ext, each of these chosen synapses increasing its firing rate by r_ext, that is, R_ext = N_ext × r_ext, and report the changes in Q^p_ext. We found that, for a population of facilitatory synapses, Q^p_ext varied in a non-monotonic fashion as a function of N_ext, initially increasing up to a peak point and then decreasing (Fig. 2C, left). By contrast, for depressing synapses (Fig. 2C, right), Q^p_ext varied in a monotonically increasing fashion. For both facilitatory and depressing synapses, Q^p_ext converged to the respective Q^p_d when the total extra input rate R_ext was distributed over all the neurons, such that N_ext = N and r_ext = r_d.

These results suggest that, when synapses are facilitatory, the total amount of synaptic resources released during an event-related activity state is maximized when event-related spiking activity is confined to a small number of synapses. Q^p_ext was smaller than Q^p_d when R_ext was distributed into a very small subset of presynaptic neurons, because those chosen neurons spiked at very high rates and the synapses rapidly ran out of vesicle resources. When the event-related input was distributed over all the presynaptic neurons, Q^p_ext also decreased, because in that scenario r_ext = r_d = R_ext/N was too small to fully exploit the benefits of synaptic facilitation. In contrast to the facilitatory synapses, for depressing synapses it was more beneficial to distribute the event-related spiking activity over the whole input ensemble to maximize the total amount of synaptic resources released. In this condition, r_ext = r_d was small enough to avoid any losses in vesicle release caused by depression.

Figure 2. B, Q^s_ext as a function of r_ext (Eq. 3). Q^s_ext for depressing synapses saturates at lower firing rates than for facilitatory synapses. Inset, The derivative of Q^s_ext highlights the nonlinearities in Q^s_ext, with depressing synapses showing monotonically decreasing slopes (decreasing release rate) and facilitatory synapses showing an initial region of increasing slopes (increasing release rate) with respect to r_ext. C, The extra PRR (Q^p_ext − Q^p_bas) as a function of the number of presynaptic neurons (N_ext) whose firing rate increases, for two different values of R_ext. Dashed lines mark the value achieved when N_ext = N, i.e., the dense distribution case (Q^p_d − Q^p_bas). A population of facilitatory synapses (left) maximizes its release with low N_ext, while a population of depressing synapses (right) maximizes its release with N_ext = N. D, The gain (G; Eq. 5) as a function of N_ext for a fixed R_ext. The N_ext that maximizes G, for this particular extra rate, is N_opt = 64 for facilitatory synapses and N_opt = N for depressing synapses. Notice that if the extra rate is allocated to even fewer input units, G can be negative. E, G surface for a facilitatory synapse as a function of r_d and N_ext. The black line marks the maximum values of G, i.e., N_opt for each r_d. The gain curve of panel D, where r_d = 8% of r_bas, is marked with a gray line for reference. F, The relationship between N_opt and r_d is linear. At maximum gain (G_max), the firing rate of the event-related neurons (N_ext) is r_opt (≈100 Hz for this specific example). G, G_max for the two STP regimes shown in panel B. For a low signal-to-basal ratio r_d/r_bas < 1, the gain can be considered independent of the stimulus intensity r_d.

Activity distribution-dependent gain
To further quantify the effect of the distribution of event-related activity over the input ensemble (that is, how neurons increase their rate in the event-related phase), we defined the distribution gain G as the proportional change in Q^p_ext in relation to Q^p_d (Eqs. 5, 6). We found that Q^p_d is approximately a linear function of r_d for a wide range of scenarios (see Materials and Methods); because of this, with the dense distribution of activity (when all presynaptic neurons change their firing rate by a small amount r_d in the event-related activity state), even dynamic synapses behave approximately as static synapses. Therefore, G can be understood either as a gain over a dense distribution or as a gain over static synapses. For facilitatory synapses, just as for Q^p_ext, G follows a non-monotonic curve as a function of N_ext, with a single peak at N_opt (Fig. 2D, blue line). By contrast, depressing synapses resulted in negative gains for every distribution except N_ext = N, where G = 0% (Fig. 2D, red line).
Next, we estimated N_opt and G for a range of extra activity intensities (Fig. 2E, for facilitatory synapses). For these calculations, we parameterized the extra activity R_ext as a fraction of the basal rate R_bas (correspondingly, r_d as a percentage of r_bas; see Materials and Methods). We found that, for facilitatory synapses, N_opt increased linearly with the extra activity intensity (Fig. 2F), resulting in an optimal encoding rate r_opt that is independent of the input intensity. For depressing synapses, the optimal distribution N_opt = N did not change with the extra activity intensity, making the optimal encoding rate always r_opt = r_d.
Because the presynaptic neurons are assumed to be Poisson processes, an advantage of parameterizing R_ext as a fraction of R_bas is that it translates directly to signal-to-noise ratio. For the example shown in Figure 2G, we found that STP could amplify the presynaptic output for weak signals (<10% of the basal activity) by up to 60% if the extra rate was distributed over N_opt synapses as opposed to N synapses. For low signal-to-noise ratios (r_d < r_bas), the gain at the optimal distribution (G_max) was approximately constant and always positive for facilitatory synapses, while depressing synapses kept G_max = 0 at N_opt = N (Fig. 2G). Finally, we show analytically that the independence of r_opt and G_max from the extra activity intensity is a good approximation for a wide range of basal rates and STP types (see Materials and Methods).
These results suggest that, when synapses are facilitatory, the input should be distributed sparsely (a sparse code, that is, only a small set of neurons change their firing rate in the event-related state) to maximize the total amount of synaptic resources released at the downstream neuron. By contrast, when synapses are depressing, the input should be distributed densely (a dense code, that is, all neurons change their firing rate in the event-related state) to maximize the synaptic resources released at the downstream neuron. Thus, for sparse population activity, facilitatory synapses are optimally used, while depressing synapses are underutilized.

Effects of STP parameters on optimal rate and gain
Next, we investigated how N_opt, r_opt, and G_max vary with the STP parameters. To this end, we systematically changed synapses from facilitatory to depressing by jointly varying the parameters in the ranges U = {0.05, ..., 0.9}, τ_rec = {20, ..., 500} ms, and τ_f = {500, ..., 20} ms. We found that r_opt decayed exponentially as the synapses became more depressing (Fig. 3A). This follows from the fact that facilitatory synapses profit from high firing rates, whereas depressing synapses avoid negative gains only at lower rates.
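This trend can be spot-checked with the mean-field machinery (our own sketch, reusing the Q^s integrator): scanning for the rate that maximizes (Q^s_ext − Q^s_bas)/r_ext, the quantity that Equation 9 sets stationary, at the two ends of the facilitatory-to-depressing axis gives a much higher optimal rate for the facilitatory reference than for the depressing one, where release is concave and the lowest tested rate wins.

```python
def total_prr(r_ext, U, tau_f, tau_rec, r_bas=0.0005, T_s=40.0, dt=0.02):
    """Q^s (Eq. 3) by Euler integration of Eq. 2; times in ms, rates in spikes/ms."""
    u = U * tau_f * r_bas / (1.0 + U * tau_f * r_bas)
    x = 1.0 / (1.0 + (u + U * (1.0 - u)) * tau_rec * r_bas)
    r, q = r_bas + r_ext, 0.0
    for _ in range(int(T_s / dt)):
        u_plus = u + U * (1.0 - u)
        q += u_plus * x * r * dt
        u += dt * (-u / tau_f + U * (1.0 - u) * r)
        x += dt * ((1.0 - x) / tau_rec - u_plus * x * r)
    return q

def r_best(stp, rates_hz=range(5, 305, 5)):
    """Rate (Hz) maximizing (Q^s_ext - Q^s_bas) / r_ext over a coarse scan."""
    q_bas = total_prr(0.0, *stp)
    return max(rates_hz,
               key=lambda rh: (total_prr(rh / 1000.0, *stp) - q_bas) / rh)

r_opt_fac = r_best((0.1, 200.0, 50.0))   # facilitatory reference
r_opt_dep = r_best((0.7, 50.0, 200.0))   # depressing reference
```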
The maximum gain G_max also decreased exponentially as synapses were systematically changed from facilitatory to depressing (Fig. 3B). We found that the relationship between gain and optimal rate was linear from mildly to strongly facilitatory synapses (Fig. 3C), with larger basal rates constraining the optimal conditions to lower rates with lower gains.
Interestingly, increasing the basal firing rate r_bas substantially reduced r_opt and G_max. This is surprising because, at such low spiking rates, STP effects are hardly perceivable in traditional paired-pulse ratio analyses. The high value of G_max when the system operates at low r_bas arises because synapses take advantage of the nonlinearities in their individual Q^s_ext (Fig. 2B). Increased basal activity attenuates these nonlinearities, thereby impairing the distribution-dependent gain.

Relationship between facilitatory synapses and sparse coding
We quantified the optimal distribution of an evoked neural signal by OD (see Materials and Methods). High OD (OD → 1) indicates a dense distribution in which many neurons spike to encode the extra activity, whereas low OD (OD → 0) indicates a sparse distribution. We found that OD changed abruptly from sparse to dense as synapses were changed from facilitatory to depressing (Fig. 3D). Facilitatory synapses yielded maximum response for sparse distributions, while depressing synapses yielded maximum response (avoiding negative gains) for dense distributions. The transition point from sparse to dense OD did not depend on the stimulus duration. However, the basal rate strongly modified the transition point, with higher r_bas allowing only strongly facilitatory synapses to take advantage of sparse distributions. This configuration remained independent of the stimulus intensity as long as the circuit operated at low signal-to-noise conditions (r_d/r_bas < 1; Fig. 2G).
In the above, we changed the synapses from facilitatory to depressing by linearly modifying the whole set of parameters together. Next, we systematically varied each of the STP parameters independently and measured the OD for maximum gain. We found that the transition region was primarily governed by the facilitation factor U (U ≈ 0.45), with a weak dependence on τ_rec and τ_f (Fig. 3E). The relative contribution of τ_rec and τ_f became more relevant at higher T_s (Extended Data Fig. 3-1).

Figure 3. Effects of STP attributes on the maximum gain of the neural population. Here, the STP parameters U, τ_rec, and τ_f were varied in the following ranges, from more facilitatory to more depressing: U: 0.05 → 0.9; τ_rec: 20 → 500 ms; and τ_f: 500 → 20 ms. A, The optimal frequency r_opt as a function of STP properties that gradually and monotonically change the synapse from facilitatory to depressing. r_opt is high for facilitatory synapses and monotonically decreases as synapses change toward depressing. As the basal rate is increased, r_opt decreases for all types of synapses. The circles mark the parameter positions of the facilitatory (s1, blue) and depressing (s2, red) synapses used as references in this work. B, G_max as a function of STP properties that gradually and monotonically change the synapse from facilitatory to depressing. Similar to r_opt, G_max also decays rapidly for more depressing types and for higher basal rates. C, Relationship between G_max and r_opt. Notice the approximately linear relationship for facilitatory synapses, with the slope steadily decreasing with increasing r_bas. This summarizes our prediction that higher basal firing rates constrain the amplitudes of activity distribution-dependent gains and optimal encoding rates. D, Optimal distribution of presynaptic activity that maximizes the gain G. The change from sparse (OD ≈ 0) to dense (OD ≈ 1) optimal distribution is abrupt and occurs at approximately the same STP region for different stimulus durations (T_s). However, the transition point where OD changes from sparse to dense code is strongly modulated by r_bas; higher basal rates allow a sparse code only for much more facilitatory synapses. E, Optimal distribution of rate as a function of the three key model parameters (U, τ_rec, and τ_f). The parameter U is the most influential in defining the optimal encoding distribution, with U ≈ 0.45 defining the OD transition point for T_s = 40 ms and r_bas = 0.5 Hz. Marker sizes represent OD values, with large markers for OD = 1 and small markers for OD ≈ 0. Further effects of STP attributes on the maximum gain can be seen in Extended Data Figure 3-1.
These results clearly highlight the importance of the stationary basal rate for how well the synaptic gain modulation operates, as only low r_bas allows for significant gains. Importantly, the switch-like behavior of the optimal distribution indicates that, for a given population code, there is a robust range of STP attributes that can produce positive gains. This transition point is relatively independent of the signal duration but is strongly affected by r_bas. Finally, a low initial release probability (defined in the model by a low U) seems to be the preeminent feature in defining the optimal OD.
Equation 7 suggests that OD is independent of the population size (N). However, there is a lower limit of N below which the sparsity argument does not hold. We have shown that, given the STP parameters, there is an optimal firing rate r_opt at which signal-carrying neurons should operate to maximize the gain (Fig. 2F). For a given rate it is optimal to distribute spikes over N_opt input channels (Fig. 2F). When N = N_opt, the optimal distribution clearly will not be sparse; the argument for sparseness arises when N ≫ N_opt. In that regime, increasing N while keeping all other parameters constant decreases OD, but it also changes the signal-to-noise ratio. This ratio is defined as R_ext/R_bas, where R_bas = N × r_bas, so changing N changes R_bas as well. In order to keep the signal-to-noise ratios comparable for low and high N scenarios, we need to scale R_ext accordingly. Therefore, here we defined R_ext in proportion to R_bas so that it can accommodate changes in N. With this choice, OD is indeed independent of N (see Eq. 7).
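This scaling rule can be made concrete with a short sketch (a toy calculation; the name `fraction` stands in for the fixed R_ext/R_bas ratio, and the values are illustrative):

```python
# Sketch (not the paper's code): keeping the signal-to-noise ratio
# R_ext / R_bas fixed while the population size N changes.
r_bas = 0.5  # basal rate per neuron (Hz)

def scaled_input(N, fraction=0.08):
    """Population basal rate and extra rate when R_ext is a fixed
    fraction of R_bas, so the ratio is independent of N."""
    R_bas = N * r_bas          # population basal rate (Hz)
    R_ext = fraction * R_bas   # extra rate scales with N
    return R_bas, R_ext

for N in (1_000, 160_000):
    R_bas, R_ext = scaled_input(N)
    print(N, R_ext / R_bas)    # ratio stays 0.08 for any N
```

With N = 160,000 and r_bas = 0.5 Hz this reproduces the R_bas = 80 kHz figure used later in the text.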

Effects of different sources of enhancement on G max
The enhancement of the output at facilitatory synapses could, in principle, have many causes (Valera et al., 2012; Thanawala and Regehr, 2013; Jackman and Regehr, 2017). Using the TM model (Eq. 1), we phenomenologically accounted for two important sources: a low initial release probability, which sequentially increases with each incoming spike (Jackman et al., 2016), and fast replenishment of readily available resources (Crowley et al., 2007). The first characteristic is mimicked by a low facilitation factor U, which determines the initial release probability after a long quiescent period and the proportional increase in it after each spike. The second mechanism is captured by a fast recovery time constant τ_rec.

Figure 4. Effects of the resources recovery time constant τ_rec and facilitation factor U on G_max and r_opt for T_s = 40 ms, for facilitatory synapses. A, The r_opt surface, as a function of U and τ_rec, shows that a given optimal encoding rate can be matched by different combinations of synaptic parameters. For example, the iso-frequency curve of 150 Hz (black line) can be achieved with either {U = 0.05, τ_rec = 90 ms} or {U = 0.1, τ_rec = 15 ms}. B, The G_max surface, as a function of U and τ_rec, shows that the same maximum gain can be observed for many different combinations of U and τ_rec. The black line shows the contour for G_max = 109%. The two configurations with the same r_opt marked in panel A have distinct gains (109% and 92%, respectively). C, We fix U and vary τ_rec (circle sizes) to match r_opt (x-axis), then observe the gain. Larger values of U systematically produce smaller gains. Recovery time has a lower boundary of τ_rec = 10 ms. D, G_max as a function of U for three different values of r_opt. Larger values of U require smaller values of τ_rec (circle sizes) to match the same r_opt, but as a consequence the gain decreases as U increases. Effects of the resources recovery time constant τ_rec and facilitation factor U on G_max and r_opt for facilitatory synapses can be seen in Extended Data Figure 4-1.
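A minimal event-driven sketch of these dynamics, using one common variant of the Tsodyks-Markram update rules (u relaxing to U between spikes, resources x relaxing to 1; this is our reconstruction, not necessarily the exact form of the paper's Eq. 1), reproduces facilitation for s1-like parameters and depression for s2-like parameters:

```python
import math

def tm_amplitudes(spike_times, U, tau_rec, tau_f):
    """Per-spike release amplitudes u*x for one common event-driven
    variant of the Tsodyks-Markram model."""
    u, x, t_prev = U, 1.0, None
    amps = []
    for t in spike_times:
        if t_prev is not None:
            dt = t - t_prev
            u = U + (u - U) * math.exp(-dt / tau_f)        # facilitation decays
            x = 1.0 - (1.0 - x) * math.exp(-dt / tau_rec)  # resources recover
        amps.append(u * x)       # release fraction for this spike
        x *= (1.0 - u)           # resources consumed by the release
        u += U * (1.0 - u)       # facilitation increment
        t_prev = t
    return amps

train = [i * 0.010 for i in range(10)]  # 100 Hz regular train (s)
fac = tm_amplitudes(train, U=0.1, tau_rec=0.050, tau_f=0.200)  # s1-like
dep = tm_amplitudes(train, U=0.7, tau_rec=0.200, tau_f=0.050)  # s2-like
```

With the s1-like parameters the per-spike amplitudes grow along the train, while the s2-like parameters deplete rapidly, matching the facilitatory/depressing labels used in the text.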
We systematically varied U and τ_rec and measured G_max and r_opt. We found that several different combinations of U and τ_rec resulted in the same optimal distribution gain and rate. However, when we changed U and τ_rec while keeping r_opt fixed, G_max could no longer be kept constant, and vice versa. For instance, the two parameter sets {U = 0.05, τ_rec = 90 ms} and {U = 0.1, τ_rec = 15 ms} both gave r_opt = 150 Hz (Fig. 4A), but the first parameter set gave G_max = 109% whereas the second gave G_max = 92% (Fig. 4B). Holding U fixed and choosing τ_rec to match different values of r_opt showed that G_max consistently dropped for higher U (Fig. 4C,D).

Figure 5. Combined optimal distribution of activity in a FF-EI circuit with target-dependent STP. A, G as a function of r_d and N_ext for a facilitatory synapse (s1, top) and for a depressing synapse (s2, bottom). This is similar to Figure 2E. B, The combined gain (G_com = G_s1 - G_s2) of the FF-EI circuit as a function of r_d and N_ext, obtained by combining the gains of the FFE and FFI branches. The black line marks N_opt^com for every stimulus intensity r_d and is represented with dashed black lines in panel A. Inset, Schematic of the FF-EI circuit. C, Gain as a function of N_ext for r_d = 8% of r_bas (gray lines in panels A, B). G_com inherits the non-monotonicity from G_s1 (blue, compare with Fig. 2D). The gain for a depressing synapse is negative (G_s2, red) for every N_ext < N. D, N_opt as a function of r_d produces iso-frequency lines (compare with Fig. 2F). r_opt^com is markedly larger than r_opt^s1 (top). G_max^com is independent of r_d (for r_d < r_bas, compare with Fig. 2G). Gain for both synapse types at r_opt^com (dashed lines). The small decrease in synaptic gain for s1 is compensated by putting s2 in a very negative gain region (bottom). E, G_max^com surface for different combinations of STP characteristics of s1 and s2. Notice that G_max^com steadily increases for s1 → Fac or s2 → Dep, and that G_max^com = 0 whenever s1 is more depressing than s2. The circle marks the specific {s1, s2} combination used in the other panels and throughout the work. F, Effects of ongoing basal activity r_bas on optimal conditions for T_s = 40 ms. Increasing basal activity decreases the combined optimal rate (top) and combined maximum gain (center). Results for three different U at the facilitatory synapse: increasing basal activity consistently decreases G_max. The decay of G_max^com happens mostly because of the decay of the positive gain at the facilitatory synapse s1 (blue), while the negative gain at the depressing synapse s2 remains negative and changes only slightly (red). The dashed vertical line marks the basal activity used for most of our analysis, r_bas = 0.5 Hz, where both branches contribute significantly to the combined gain (bottom).
These results indicate that, in terms of the maximum gain G_max, the fine-tuning of intracellular mechanisms that steadily increase a low initial release probability might be more important than fast vesicle replenishment mechanisms. This remains true for larger T_s (Extended Data Fig. 4-1).
In summary, our results show that a set of presynaptic STP parameters generates a gain surface G that, in principle, could be tuned to match the activity characteristics of the presynaptic population. The optimal rate and the maximum gain are independent of the stimulus intensity for a low signal-to-noise ratio, with facilitatory synapses yielding high gains for sparse distributions while depressing synapses avoid negative gains only with dense distributions. For low basal activity (r_bas = 0.5 Hz) and a short integration window (T_s = 40 ms), the parameter U is the principal determinant of the optimal distribution. Furthermore, a lower U yields higher gains than a shorter τ_rec when the optimal encoding rate is kept constant.
Combined optimal distribution of activity in a FF-EI circuit with target-dependent STP
We extend our previous analysis to a scenario in which the presynaptic population makes synaptic contacts not only with a readout neuron, but also with a local inhibitory population that projects to the readout neuron, creating the FF-EI motif. Both the readout neuron and the inhibitory group receive the same spike trains via two different types of synapses, s1 and s2 (Fig. 1C). Because the presynaptic population activity is the same for both synapses (r_bas = 0.5 Hz; T_s = 40 ms), the differences in gain (G) are governed by the STP properties of the two synapses. Figure 5A shows G for a facilitatory (s1: U = 0.1, τ_f = 200 ms, τ_rec = 50 ms) and a depressing (s2: U = 0.7, τ_f = 50 ms, τ_rec = 200 ms) synapse.
In a FF-EI network, these two synapse types may be associated with the two branches, for example s1 with the feedforward excitation (FFE) branch (targeting a principal neuron) and s2 with the feedforward inhibition (FFI) branch (targeting local interneurons that in turn project to principal neurons; Fig. 5B, inset). In this arrangement, the combined gain of the two branches is G_com = G_s1 - G_s2. We found that the combined gain of the FF-EI circuit also varied non-monotonically as a function of N_ext and peaked at N_opt^com, which corresponded to the combined optimal encoding rate r_opt^com (Fig. 5B,C). Note that the combined maximum gain of the FF-EI circuit is larger than the gain obtained via the FFE branch with facilitatory synapses alone (Fig. 2C). This substantial increase is a consequence of the strictly negative profile of G_s2: when the extra input is distributed over N_opt^com units (sparse coding), the depressing branch of the FF-EI drives the local inhibitory group more weakly than in a scenario in which N_ext = N (dense coding). Therefore, with a sparse distribution of the input, the readout neuron experienced stronger excitation from the FFE branch and weaker inhibition from the FFI branch.
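The logic of the combined gain can be illustrated with a toy calculation. Here `prr` is the textbook mean-field steady state of the TM model (u_ss · x_ss · r) and `gain` is an illustrative sparse-versus-dense release measure of our own; neither is the paper's exact Eq. 3 or gain definition, but they reproduce the signs of G_s1 and G_s2:

```python
def prr(r, U, tau_rec, tau_f):
    """Mean-field steady-state release rate per neuron (u_ss * x_ss * r);
    a textbook TM approximation, not the paper's exact Eq. 3."""
    u = U * (1 + tau_f * r) / (1 + U * tau_f * r)
    x = 1.0 / (1.0 + u * tau_rec * r)
    return u * x * r

def gain(N, N_ext, r_bas, R_ext, syn):
    """Relative extra release of a sparse code vs. a uniformly dense one
    (illustrative definition; the paper's G is defined in its Methods)."""
    r_d = R_ext / N                         # dense: every neuron gains r_d
    dense = N * prr(r_bas + r_d, *syn) - N * prr(r_bas, *syn)
    sparse = (N_ext * prr(r_bas + R_ext / N_ext, *syn)
              + (N - N_ext) * prr(r_bas, *syn) - N * prr(r_bas, *syn))
    return sparse / dense - 1.0             # 0 => no benefit over dense

s1 = (0.1, 0.050, 0.200)   # facilitatory (U, tau_rec, tau_f)
s2 = (0.7, 0.200, 0.050)   # depressing
N, r_bas, R_ext = 16_000, 0.5, 0.08 * 16_000 * 0.5

for N_ext in (40, 400, 4_000, 16_000):
    g1 = gain(N, N_ext, r_bas, R_ext, s1)
    g2 = gain(N, N_ext, r_bas, R_ext, s2)
    print(N_ext, g1, g2, g1 - g2)           # g1 - g2 ~ combined FF-EI gain
```

Under this toy definition the facilitatory branch gains from sparseness while the depressing branch loses, so their difference exceeds either branch alone, mirroring the argument in the text.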
Similar to the behavior of facilitatory synapses, in the FF-EI network N_opt^com increased linearly as a function of r_d, maintaining a constant optimal encoding rate r_opt^com (Fig. 5D, top). We also observed that r_opt^com was larger than r_opt^s1 (N_opt^com < N_opt^s1), making the isolated gain of s1 suboptimal. However, this can be compensated by putting s2 into a very negative gain region (Fig. 5D, bottom, red dashed line) with a sparse distribution of the inputs. We show analytically that r_opt^com and G_max^com are independent of the extra rate for a wide range of conditions (see Materials and Methods).
We extended this analysis to a large range of {s1, s2} STP combinations by gradually changing the set of parameters {U, τ_f, τ_rec} (Fig. 5E). We found that G_max^com increased monotonically as we made synapse s1 more facilitatory or synapse s2 more depressing. The anti-diagonal (where s1 = s2) marked the region of zero gain; any point above it (s2 more facilitatory than s1) resulted in G_max^com = 0, whereas any point below it (s1 more facilitatory than s2) resulted in G_max^com > 0. As expected, if s1 is highly facilitatory and s2 highly depressing, the combined effect will be very high gains, provided that the presynaptic activity is optimally distributed.

Effects of basal activity on the FF-EI network
Next, we investigated the effects of the stationary basal activity on the combined optimal conditions of a FF-EI network. We found that the optimal rate and optimal gain both decreased as r_bas was increased (Fig. 5F). Separating the individual contributions of the s1 and s2 branches revealed that this decrease was primarily because of a reduction in the gain of facilitatory synapses (s1), whereas the strong negative gain of depressing synapses (s2) remained approximately unaltered. This suggests that a population of facilitatory synapses will lose most of its activity distribution-dependent gain as the basal firing rate is increased, whereas a population of depressing synapses can preserve this capability even at larger basal rates.
Thus, these results show that a FF-EI network with target-dependent STP can make the discrimination of sparse activity more robust than what could be achieved by the FFE branch alone. This is achieved when the excitatory branch is facilitatory while the inhibitory branch is depressing (i.e., by placing s1 and s2 in the region below the anti-diagonal in Fig. 5E).

Sparse code identification by a postsynaptic neuron model
The ability of STP to amplify the output of a presynaptic population would be functionally relevant only if this amplification is transferred to the postsynaptic side. We tested the postsynaptic effects of the STP-based modulation of the presynaptic activity distribution by simulating an I&F neuron model (Eq. 15) as a readout device for a FF-EI circuit (Fig. 6A). We simulated a presynaptic population with characteristics similar to the cerebellar molecular layer, a massively feedforward system with properties much like the ones we have described so far (Ito, 2006). Specifically, the readout neuron received input from 160,000 presynaptic neurons. The presynaptic background activity was modeled as independent and homogeneous Poisson spike trains with an average firing rate of r_bas = 0.5 Hz (R_bas = 80 kHz). In addition, the population of presynaptic neurons increased its firing rate (R_ext = 1...10% of R_bas) during a brief time window (T_s = 40 ms) to mimic event-related activity. The extra presynaptic activity was either confined to a small set of presynaptic neurons (N_ext = N_opt, sparse) or distributed over a large number of neurons (N_ext = 16,000, dense). The excitatory synapses onto the readout neuron (s1) were facilitatory, with the STP parameters of each synapse drawn from Gaussian distributions (s1: U, mean = 0.1, s.d. = 0.02; τ_rec, mean = 50 ms, s.d. = 10 ms; τ_f, mean = 200 ms, s.d. = 40 ms). The FFI activity was modeled as a Poisson process whose firing rate (λ_i; Eq. 9) depended linearly on the excitatory input of depressing synapses, whose STP parameters were likewise drawn from Gaussian distributions (s2: U, mean = 0.7, s.d. = 0.14; τ_rec, mean = 200 ms, s.d. = 40 ms; τ_f, mean = 50 ms, s.d. = 10 ms). The maximum weight of each excitatory and inhibitory synapse was drawn from a Gaussian distribution (B_e: mean = 25, s.d. = 2.5 nS; B_i: mean = 2.0, s.d. = 0.2 nS).

Figure 6. Transfer of STP gain from the presynaptic population to the postsynaptic neuron. A, Schematic of a readout neuron receiving feedforward excitation and FFI input. The inhibitory group was driven by the PRR of synapses of type s2 (Eq. 17). Sparse and dense activity patterns are schematically shown. The population temporarily (T_s = 40 ms, marked in gray) increases its firing rate by R_ext = 1...10% of R_bas in two different configurations: sparse (top, N_ext = N_opt) or dense (bottom, N_ext = 16,000). For panels B-E, R_ext = 8% of R_bas. B, PSTH of the spiking rate (3,000 realizations) of the readout neuron receiving a sparse (black) or dense (gray) distribution of input activity. C, Mean membrane potential for sparse input (black) and dense input (gray) when synapse s1 was facilitatory and synapse s2 was depressing. Red trace, membrane potential when synapse s1 was facilitatory and synapse s2 was static. Blue trace, membrane potential when synapse s1 was static and synapse s2 was depressing. The red and blue traces show the contributions from synaptic facilitation (FFE branch) and depression (FFI branch) to the neuron response. D, Changes in the total excitatory and inhibitory conductances for the four configurations of synapses (as shown in panel C). The dashed line marks the conductance changes for the static-synapses condition. E, Effect of varying N_ext as a proportion of N_opt^com on the expected spike count (top, black circles) and the mean membrane potential (bottom, black circles) during the event-related activity period. Both profiles match the combined gain curve (gray line, compare with Fig. 5C), with a peak at N_ext = N_opt^com. F, Probability distribution of output spike counts within T_s. G, Separation (1-BC) between spike count distributions as a function of r_d. The sparse distribution produced an increasingly substantial separation when compared with basal activity (dark gray) and the dense distribution (black), whereas the separation between the dense distribution and basal activity was always small (light gray).
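The rate-to-spiking step of such a readout can be caricatured with a heavily stripped-down, self-contained sketch: a conductance-based I&F neuron driven by aggregate Poisson excitation and inhibition. All parameter values, and the omission of STP and per-synapse structure, are our simplifying assumptions, not the paper's Eq. 15:

```python
import math, random

dt = 1e-4                        # time step (s)
C, g_L = 2e-10, 10e-9            # capacitance (F), leak conductance (S)
E_L, E_e, E_i = -70e-3, 0.0, -75e-3
v_th, v_reset = -55e-3, -70e-3
tau_e, tau_i = 5e-3, 10e-3       # synaptic decay time constants (s)

def poisson(lam):
    """Knuth's algorithm for a Poisson-distributed spike count per step."""
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= random.random()
    return k - 1

def output_spikes(rate_e, rate_i, w_e=0.1e-9, w_i=0.4e-9, T=0.5, seed=0):
    """Spike count of the readout driven by aggregate Poisson rates (Hz)."""
    random.seed(seed)
    v, g_e, g_i, n = E_L, 0.0, 0.0, 0
    for _ in range(int(T / dt)):
        # exponentially decaying conductances, incremented per input spike
        g_e = g_e * math.exp(-dt / tau_e) + w_e * poisson(rate_e * dt)
        g_i = g_i * math.exp(-dt / tau_i) + w_i * poisson(rate_i * dt)
        v += (g_L * (E_L - v) + g_e * (E_e - v) + g_i * (E_i - v)) * dt / C
        if v >= v_th:
            v, n = v_reset, n + 1
    return n

base = output_spikes(8_000.0, 2_000.0)   # basal drive only
stim = output_spikes(12_000.0, 2_000.0)  # stronger effective excitation
```

The point of the sketch is only that a larger effective excitatory drive (as produced presynaptically by a sparse code at facilitatory synapses) translates into a higher output spike count.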
The distribution of the input had a noticeable effect on the output of the target neuron, as shown by the peristimulus time histogram (Fig. 6B). While the dense distribution elicited transients at the beginning and end of the stimulus period (because of the slow time constant of inhibition), the sparse code elicited a sustained, elevated firing rate response throughout the stimulus period. The stimulus-induced membrane potential responses for the two types of input patterns (dense and sparse) mirrored the firing rate responses (Fig. 6C). By interchangeably setting s1 and s2 to static, we identified that both branches contributed significantly to keeping the mean membrane potential high in the presence of extra sparse input.
The contribution of each branch becomes clear from the average change in the total excitatory and inhibitory conductances of the readout neuron. When both synapses were dynamic and the stimulus was sparse (Fig. 6D, leftmost), the average excitation was larger (because of synaptic facilitation) and the average inhibition was lower (because of synaptic depression) than the average changes caused by a stimulus of the same intensity but with a dense distribution (Fig. 6D, rightmost). Note how, with dynamic synapses and a dense distribution of the stimulus, the conductance changes matched the expected change for static synapses (dashed line). When we kept the stimulus distribution sparse but interchangeably set s1 and s2 to static, the conductance trace of the static branch reached the same value as for the dense distribution, and the system was left with the gain produced at the dynamic branch. Dense distributions, therefore, do not exploit the STP nonlinearities, and the synapses behave approximately as static, as predicted.
Next, we systematically changed N_ext as a percentage of N_opt (N_ext = 1%, 10%, 25%, 50%, 100%, 200%, 400%, 1,000% of N_opt; black circles in Fig. 6E) and found that both the mean membrane potential and the average spike count during the stimulus period followed profiles that closely matched the predicted G_com curve (Fig. 6E). This result confirms that the modulation of the PRR of the presynaptic population is faithfully translated into postsynaptic variables (the gain was estimated at the presynaptic side, while membrane potential and spike rate were measured at the postsynaptic neuron). Furthermore, this result also highlights the robustness of this mechanism: even with considerable deviations from the optimal encoding distribution (N_ext = 50% or N_ext = 200% of N_opt, the first black points to the left and right of N_ext = N_opt), the evoked responses remained reasonably close to the optimum.
To further assess how individual realizations of the sparse input could be distinguished from a dense input of the same intensity, we sampled the output spike count of the readout neuron over a 40 ms period of ongoing basal activity just before the stimulus, and over the 40 ms stimulus period for both sparse and dense distributions (Fig. 6F). We used the Bhattacharyya coefficient (BC) as a measure of overlap between these sample distributions and 1-BC as a measure of difference (Fig. 6G). The dense input had almost complete overlap with the basal condition. In contrast, the sparse input produced increasingly different response distributions from both the dense input and the basal condition, with almost complete separation at r_d = 10% of r_bas.
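The separation measure can be computed directly from empirical spike-count samples; the toy samples below merely stand in for the basal, dense, and sparse response distributions:

```python
import math
from collections import Counter

def separation(counts_a, counts_b):
    """1 - Bhattacharyya coefficient between two empirical spike-count
    distributions (0 = identical, 1 = fully separated supports)."""
    pa, pb = Counter(counts_a), Counter(counts_b)
    na, nb = len(counts_a), len(counts_b)
    bc = sum(math.sqrt((pa[k] / na) * (pb[k] / nb))
             for k in set(pa) | set(pb))
    return 1.0 - bc

# Toy spike-count samples standing in for the three response conditions
basal  = [2, 3, 2, 4, 3, 2, 3, 3]
dense  = [3, 2, 3, 4, 2, 3, 4, 3]
sparse = [7, 8, 6, 9, 8, 7, 8, 9]

print(separation(basal, dense))   # small: heavy overlap
print(separation(basal, sparse))  # 1.0: non-overlapping counts
```

When the two count distributions share no values, BC is 0 and the separation is exactly 1, which is the "almost complete separation" regime reported for the sparse input.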
Taken together, these results illustrate the potential role of dynamic synapses in the amplification of sparse signals at the presynaptic side (Q_p, G), even when the signal intensity is just a small fraction of the ongoing basal activity and, therefore, likely to be buried in proportionally large noise fluctuations. In addition, for a dense distribution of the input, the system can preserve short periods (~10 ms) of increased (decreased) spike probability right after stimulus onset (offset) because of delayed inhibition, which is a known characteristic of FF-EI motifs and might serve as an indication of global background rate changes.

Continuous extra rate distribution
Thus far, we have considered a binary distribution of the extra rate: a fraction of the presynaptic cells increased their rate by r_ext, and the rest did not increase it at all. Although some neural networks might roughly operate in this binary fashion, it is important to ask how such STP-driven gains would operate under continuous distributions, perhaps a more realistic description of the activity of many neural populations. We therefore estimated the optimal conditions for the case in which the extra presynaptic activity follows a gamma distribution (Eq. 19).
Varying the shape parameter (1 < k < 20) changes the distribution from an exponential to a quasi-Gaussian one. For each fixed shape, we control the mean (and therefore the sparsity; Eq. 20) of the distribution with the scale parameter (10^-2 < θ < 10^3). For each set {k, θ} we calculate the expected gain (Eq. 21) yielded by a population of facilitatory synapses (s1; Fig. 7A, left), of depressing synapses (s2; Fig. 7A, center), and the combined gain (Fig. 7A, right). Nine particular parameter choices are illustrated in Figure 7B, where the central panels follow the choices that maximize the combined gain.
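The division of labor between the two gamma parameters can be sketched with their closed-form moments (the 10 Hz mean below is an arbitrary illustrative choice, not a value from the paper):

```python
import math

def gamma_stats(k, theta):
    """Mean, variance, and skewness of a gamma(k, theta) rate distribution."""
    return k * theta, k * theta ** 2, 2.0 / math.sqrt(k)

# At a fixed mean, raising k (and lowering theta accordingly) keeps the mean
# but shrinks skewness: exponential (k=1) -> quasi-Gaussian (k=20).
for k in (1, 5, 20):
    theta = 10.0 / k                   # keep the mean rate at 10 Hz
    print(k, gamma_stats(k, theta))
```

This is why the text treats k as controlling the shape (exponential versus quasi-Gaussian) and θ, at fixed k, as controlling the mean and hence the sparsity of the population code.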
We found that, similar to the binary distribution case, the gain for facilitatory synapses followed a non-monotonic curve as a function of θ (for a fixed k), with negative values at high θ (overly sparse distribution), a single peak at the optimal θ, and convergence to 0 at low θ (dense distribution). By contrast, depressing synapses showed negative gains, monotonically converging to zero at low θ. The combined gain reached high values when s1 synapses were in very positive and s2 synapses in very negative operating regions (Fig. 7C; see Fig. 7A, gray line).
Interestingly, not only were the gain magnitudes very similar to the ones obtained with binary distributions (compare the colorbars of Figs. 5A,B and 7A), but with continuously distributed rates the points of maximum gain were also obtained at high mean rates (relative to r_d) and were, therefore, representative of sparse distributions of the population activity. For increasing values of k, the skewness of these distributions approached zero (i.e., they became closer to a Gaussian) and the mean r_ext of the optimal θ approached the r_opt obtained with binary distributions. These results further corroborate the effects of activity distribution-dependent gain modulation in presynaptic populations with STP.

Figure 7. Synaptic gain when the firing rates of individual neurons (r_ext) were drawn from continuous (gamma) distributions. A, Gain surfaces for a facilitatory synapse (left), a depressing synapse (middle), and the combined effect in a FF-EI circuit (right). Increasing the shape parameter (k) moved the distribution from exponential to approximately Gaussian. Decreasing the scale parameter (θ) moved the distribution from a high mean and high variance (sparse) to a low mean and low variance one (dense). Black lines mark the θ that resulted in maximum combined gain for each value of k. Similar to the binary distribution, G_max^com is obtained by putting s1 in a positive and s2 in a negative gain region. Note that the gain values are in the same range as in Figure 5A. B, Changing k affects the shape of the distribution: exponential (top row), right-skewed (middle row), and approximately Gaussian (bottom row). Changing θ affects the scale of the distribution (sparsity of the population code): high mean and variance (left column), optimal mean and variance (middle column), and low mean and variance (right column). C, Gain curves for a facilitatory synapse (blue), a depressing synapse (red), and the combined effect in a FF-EI circuit (black). These curves were obtained for a fixed k = 10 (gray lines in panel A) while gradually changing θ. The gains as a function of the gamma activity distribution follow a profile similar to the binary distribution (compare with Fig. 5C). D, Gamma-distributed r_ext (color plot) as a function of the shape parameter; mean r_ext from the gamma distributions obtained with the optimal θ for each value of k (black lines in panel A). As the gamma shape moves from an exponential to a Gaussian one (increasing k), the mean of the optimal distribution approaches the r_opt of the binary distribution.

Continuous basal rate distribution
In the preceding analysis we assumed that r_bas is fixed and the same for all presynaptic units. A more natural scenario, however, is a continuous distribution of basal firing rates. We extended our analysis to account for this continuous r_bas scenario in the same way as for r_ext: we modeled the distribution of basal rates as a gamma distribution with varying shape and scale parameters. Varying the shape parameter (1 < k < 20) changed the distribution from an exponential to a quasi-Gaussian one (Fig. 8A). For each fixed shape, we controlled the mean of the distribution (Eq. 20) with the scale parameter.
We calculated the expected values of Q_bas^s, Q_d^s, and Q_opt^s for each distribution shape (Fig. 8B) and found that these values converged to the values estimated with a fixed r_bas for higher k (quasi-Gaussian) and diverged for lower k (exponential). Using these traces, we calculated the gains (Fig. 8C) and again found that the results converged to the values estimated for fixed r_bas. For higher r_bas, however, the differences between fixed and continuously distributed r_bas were more pronounced.
The divergence we observed can be explained by the probabilities that any given unit has a rate below or above the mean basal rate ⟨r_bas⟩. As discussed above, a higher r_bas hinders the exploitation of STP nonlinearities and, therefore, reduces the possibility of higher gains. For exponential-like distributions, a larger proportion of the population has r_bas < ⟨r_bas⟩, which reduces this hindering effect, even if a smaller part of the population (for which r_bas > ⟨r_bas⟩) is more impaired. As the distribution approaches a Gaussian (increasing shape parameter), the proportions of the population with r_bas below or above ⟨r_bas⟩ become almost equal. In the limit k → ∞, the variance of the gamma distribution approaches zero (for our fixed mean) and the gains converge to the values estimated with fixed r_bas.
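The asymmetry for exponential-like shapes can be checked with a short Monte Carlo sketch: for k = 1 the probability of falling below the mean is 1 - e^-1 ≈ 0.63, while for a quasi-Gaussian shape it is close to 0.5 (sample sizes and seed are arbitrary choices):

```python
import random

def frac_below_mean(k, mean=0.5, n=50_000, seed=0):
    """Monte-Carlo estimate of P(r_bas < mean) for gamma-distributed
    basal rates with the scale chosen so that E[r_bas] = mean."""
    random.seed(seed)
    theta = mean / k
    below = sum(random.gammavariate(k, theta) < mean for _ in range(n))
    return below / n

print(frac_below_mean(1))    # exponential: about 1 - e^-1 ~ 0.63
print(frac_below_mean(20))   # quasi-Gaussian: close to 0.5
```

The larger below-mean fraction at k = 1 is exactly the population majority that, in the text, escapes the hindering effect of high basal rates.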
It is worth noting that, for higher r_bas, the spike rate variances are also higher and the gain estimates obtained with a fixed r_bas become less accurate. This means that our simplified predictions of the hindering impact of higher r_bas on distribution-dependent gains will likely overestimate the actual effects in real neuron populations. In other words, the distribution-dependent gains in facilitatory populations may be more resilient to higher r_bas than predicted by fixed-r_bas models.

From presynaptic gains to postsynaptic rate changes
The readout neuron in our simulations operates in a regime where the presynaptic gains are reliably translated into readout firing rate gains, which is equivalent to saying that the postsynaptic transfer function is independent of the input distribution. However, both the synapses and the dendrites/soma of the readout neuron can operate in a nonlinear regime and further transform the presynaptic gain described above. These nonlinearities are reflected in the transfer function of the neuron, that is, the probability of an output spike given a certain input.

Figure 8. Continuous distribution of r_bas. A, Two configurations of a gamma distribution with the same mean value ⟨r_bas⟩ = 0.5 Hz. When k = 1, the distribution is exponential and any given neuron is more likely to have r_bas < 0.5 Hz than r_bas > 0.5 Hz. A higher shape value, k = 20, brings the distribution closer to a Gaussian and decreases the variance. B, Expected values of Q_bas^s, Q_d^s, and Q_opt^s for varied gamma shape values. The higher k, the smaller the variance, and the results converge to the values estimated with fixed r_bas. C, Similarly, the optimal gains start at higher values (when k is low) and converge to the values estimated with fixed r_bas as k increases. For lower rates the estimates do not diverge much (see the overlapping solid and dashed lines for ⟨r_bas⟩ = 0.5 Hz), but for higher ⟨r_bas⟩ the increased variance of the continuous distribution has nontrivial effects. This indicates that distribution-dependent gains of facilitatory populations might be more resilient to higher basal levels than initially predicted.
To identify the circumstances in which changes in the postsynaptic transfer function may affect the transfer of presynaptic gains into output firing rate, let us consider a neuron with two possible transfer functions (Fig. 9, blue and red curves). Transfer function TF-1 is similar to the one we have considered previously in Figure 6; TF-2 shows a sharp change. Such a sharp change in the transfer function may arise, for example, because of NMDA receptors: when the input is strong, postsynaptic depolarization can remove the Mg2+ block, creating larger EPSPs and increasing the spike probability (Du et al., 2017). Similarly, nonlinear local dendritic integration (Polsky et al., 2004), input correlations (de la Rocha and Parga, 2005), and voltage-dependent ion channels may also create input-dependent changes in the neuron transfer function. When the neuron transfer function can change between TF-1 and TF-2, the output firing rate is determined not only by the effective input (sparse > dense, for a facilitatory s1) but also by the qualitative differences between the two transfer functions.
Sparse input distributions deliver the extra incoming spikes as bursts, which could cause extra accumulation of neurotransmitter (for a facilitatory s1) at specific dendritic sites, triggering supralinear integration (TF-2, red curve). If a dense input distribution does not trigger TF-2 and instead keeps the neuron operating under TF-1, the difference between the presynaptic gains of sparse and dense distributions will be further increased (see the difference between points 1 and 4; Fig. 9).
In cases where both input distributions operate under the same TF, the presynaptic gains will be reliably transferred into output rates (compare points 1 and 2 for TF-1 and points 3 and 4 for TF-2 in Fig. 9). Finally, when a dense distribution of inputs makes the output neuron operate under TF-2 while a sparse distribution brings the neuron to operate under TF-1, the presynaptic gains could potentially be overcome (Fig. 9, compare points 2 and 3).
Linear approximation of Q_d^s

We solve Q^s numerically (Eq. 3) and show that it behaves linearly over a moderate range of rates in different STP regimes (Fig. 10). The approximation by a linear function, Q_d^s = Q_bas^s + S^s · r_d, allows G_max and r_opt to be independent of the stimulus intensity and population size (Eqs. 13, 14).
To what extent is the linear approximation valid? To investigate this, we solve Q_d^s for gradually increasing r_d (Fig. 10A), departing from a range of different basal levels, r_bas = 0.5…10 Hz. We then compare the slope for each r_d to the slope for r_d = 0.1·r_bas and see how much they deviate from it (Fig. 10B,D). If, for a given r_bas, increasing r_d resulted in a significant change in the regressed S^s, then G_max would depend on the stimulus intensity r_d. We also show the R² statistics to confirm the accuracy of the linear approximation (Fig. 10C,E).
As we observe, for low signal-to-basal ratios (r_d/r_bas < 1), there is a wide range of rates for which the approximation is good, with |S_dev^s| < 1% and R² > 99.9%. Especially for low r_bas, the approximation is valid over the whole range of r_d.
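This linearity check can be sketched as a regression of Q against the stimulus rate. Since Eq. 3 is not reproduced here, the code below uses the mean-field steady state of the Tsodyks-Markram model (the product u·x under Poisson input) as an assumed surrogate for Q_d^s; the parameters correspond to the facilitatory regime of Figure 10, and the rate range is illustrative.

```python
import numpy as np

# Mean-field steady state of the Tsodyks-Markram model for Poisson input of
# rate r (a stand-in for Eq. 3, which is not reproduced in this section).
def q_ss(r, U, tau_rec, tau_f):
    u = U * (1.0 + tau_f * r) / (1.0 + U * tau_f * r)  # facilitation variable
    x = 1.0 / (1.0 + u * tau_rec * r)                  # available resources
    return u * x                                       # effective efficacy

r_bas = 0.5                                        # basal rate (Hz), Fig. 10A
params = dict(U=0.1, tau_rec=0.050, tau_f=2.000)   # facilitatory regime

r_d = np.linspace(0.1 * r_bas, 2.0, 200)           # moderate stimulus rates
q = q_ss(r_bas + r_d, **params)

# Least-squares fit Q = Q_bas + S * r_d and its R^2.
S, q0 = np.polyfit(r_d, q, 1)
resid = q - (q0 + S * r_d)
r2 = 1.0 - resid.var() / q.var()
print(S, r2)
```

Over this moderate range the fit is close to linear with a positive slope; widening the range of r_d degrades R², consistent with the approximation being valid mainly for low signal-to-basal ratios.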

Discussion
Our results show how the activity distribution of a presynaptic population can exploit the nonlinearities of short-term synaptic plasticity, and they point to the theoretical potential of synaptic dynamics to endow a postsynaptic target with the ability to discriminate between weak signals and background activity fluctuations of the same amplitude. Such a mechanism has the advantage of being built into the synapses themselves, requiring neither additional recurrent computation nor any form of supervised learning. This feature is likely to be present in different brain regions, e.g., the cerebellum and the hippocampus, and might have critical implications for general information processing in the brain.

Relevance to specific brain circuits
We have shown that STP can enhance the effective input when (1) the stimulus is sparse and temporally bursty and (2) FFE synapses onto the principal cells are facilitatory while FFE synapses onto local fast-spiking inhibitory interneurons are depressing. These two conditions are fulfilled in several brain regions. Sparse coding provides many advantages for neural representations (Babadi and Sompolinsky, 2014) and associative learning (Litwin-Kumar et al., 2017). As discussed in the following, a number of experimental studies provide support for sparse coding in several brain regions, such as the neocortex, cerebellum, and hippocampus.
In the cerebellum, glomeruli in the granular layer actively sparsify the multimodal input from mossy fibers into relatively few simultaneously bursting PFs (Billings et al., 2014) projecting to Purkinje cells (PuC). A single PuC might sample from hundreds of thousands of PFs (Tyrrell and Willshaw, 1992; Ito, 2006). In behaving animals, PFs present two stereotypical activity patterns: a noisy basal state with rates lower than 1 Hz during long periods, interleaved by short-duration (~40 ms), high-frequency (usually >100 Hz) bursts carrying sensory-motor information (Chadderton et al., 2004; Jörntell and Ekerot, 2006; van Beugen et al., 2013). Given the large number of PFs impinging onto a PuC, the fluctuations in basal rate are as large as the event-related high-frequency bursts. As our analysis shows, if PF synapses were static, the PuC would not be able to discriminate between high-frequency bursts and background fluctuations. However, PF synapses show short-term facilitation when targeting PuCs and short-term depression when targeting basket cells (Atluri and Regehr, 1996; Bao et al., 2010; Blackman et al., 2013; Grangeray-Vilmint et al., 2018). Basket cells provide strong, phasic somatic inhibition to PuCs (Jörntell et al., 2010). This circuit motif closely matches the FF-EI circuit investigated in this work (Fig. 6). Based on these similarities, we argue that one of the functional implications of the specific properties of STP is to enable the PuC to discriminate between information encoded in high-frequency bursts and background activity fluctuations.

Figure 10. Linear approximation of Q_d^s. A, For basal rate r_bas = 0.5 Hz and T_s = 40 ms, Q_d^s is approximately linear for all STP regimes: facilitatory (U = 0.1, τ_rec = 50 ms, τ_f = 2000 ms), facilitatory/depressing (U = 0.4, τ_rec = 100 ms, τ_f = 1000 ms), and depressing (U = 0.7, τ_rec = 2000 ms, τ_f = 500 ms). B, The slope deviation for increasing r_d, relative to the slope for r_d = 0.1·r_bas, is always smaller than 0.6% for the three synapse types. C, R² is always close to 100% for the three synapse types. D, Absolute value of the slope deviation, as in panel B, but for r_d departing from several different values of r_bas. The gray line marks |S_dev^s| = 1%. The linear approximation works well throughout a large region (left of the gray line) for the facilitation (left) and facilitation/depression (middle) regimes and becomes somewhat more constrained for depressing synapses (right). E, As in panel C, but for r_d departing from several different values of r_bas.
In the neocortex, the population code in layer 2/3 of the somatosensory (De Kock and Sakmann, 2008) and visual cortex of rats (Greenberg et al., 2008) and mice (Rochefort et al., 2009) is believed to be sparse (Petersen and Crochet, 2013), with short-lived bursts (usually <20 ms) of high firing rates occurring over low-rate spontaneous activity (<0.5 Hz). Additionally, it has recently been found that pyramidal cells in layer 2/3 of the mouse somatosensory cortex show short-term facilitation when targeting cells in layers 2/3 and 5 (Lefort and Petersen, 2017). The receptive field properties in the visual cortex are also consistent with a sparse code (Olshausen and Field, 1996). These characteristics suggest that the mechanism to discriminate between weak signals and background fluctuations may also be present in the neocortex. It is believed that such sparse representation in superficial cortical layers indicates strong stimulus selectivity (Petersen and Crochet, 2013), in which case the transient gain, provided by the target-dependent STP configuration of local pyramidal neurons, would be a suitable property for interlayer communication.
In the hippocampus, the Schaffer collaterals bringing signals from CA3 to CA1 operate under low basal firing rates, with evoked bursts of high-frequency activity during short periods of time (Schultz and Rolls, 1999). The synapses from pyramidal cells in CA3 to pyramidal cells in CA1 are facilitatory and provide this pathway with extra gain control (Klyachko and Stevens, 2006). At the same time, Schaffer collateral synapses onto CA1 stratum radiatum interneurons show a larger release probability than those onto pyramidal neurons (Sun et al., 2005). Therefore, it is likely that this STP-based stimulus/noise discrimination mechanism is also used to improve the transmission of sequential activity from CA3 to CA1.
As we have pointed out above, the STP configurations in the neocortex, hippocampus, and cerebellum are consistent with the configuration that enables neural networks to take advantage of sparse coding. However, it is important to note that facilitatory excitatory inputs to other inhibitory cells also exist in the aforementioned circuits. These facilitatory inputs mostly target interneurons that form synapses on distal dendrites. The presence of facilitatory excitatory drive to these classes of inhibitory neurons is, however, unlikely to counteract the distribution-dependent transient gains, because they produce weaker, slower, and persistent dendritic inhibition. Consistent with this idea, only parvalbumin-expressing neurons (which synapse on the soma), but not somatostatin-expressing neurons (which synapse on distal dendrites), modulate stimulus response gain (Wilson et al., 2012).
The initial release probability is the STP parameter that most distinguishes Schaffer collateral synapses onto CA1 pyramidal cells from those onto CA1 interneurons (Sun et al., 2005). In line with that, our approach predicts that facilitatory mechanisms that steadily increase a low initial release probability during a fast sequence of spikes (low U) will have a greater impact on the optimal OD and gain amplitude than mechanisms for fast replenishment of resources (low τ_rec). However, the speed of recovery has itself been shown to be an activity-dependent feature (Fuhrmann et al., 2004; Crowley et al., 2007; Valera et al., 2012; Doussau et al., 2017), and this could in principle increase the relevance of τ_rec.
The facilitatory or depressing nature of STP depends on the postsynaptic neuron type (Reyes et al., 1998; Rozov et al., 2001; Sun et al., 2005; Pelkey and McBain, 2007; Bao et al., 2010; Blackman et al., 2013; Larsen and Sjöström, 2015; Éltes et al., 2017). Target-dependent STP is a strong indication that such short-lived dynamics are relevant for specific types of information processing in the brain (Middleton et al., 2011; Naud and Sprekeler, 2018). Here, we predict that, when accompanied by the specific arrangements of target-dependent STP found experimentally in different brain regions, disynaptic inhibition could further increase the gain of sparse over dense distributions and make it robust even at higher basal activity, when the gain at facilitatory excitation decreases substantially.
Disynaptic inhibition following excitation is a common motif throughout the brain, and different classes of inhibitory neurons are believed to serve distinct computations within their local circuits (Wilson et al., 2012; Jiang et al., 2015). Despite a wide diversity of inhibitory cell types, a classification of FF-I into two main types, perisomatic and dendritic targeting, appears consistent with findings throughout the central nervous system. A remarkable attribute of this configuration is the consistency of the short-term dynamics of excitatory synapses across local circuits: depressing onto perisomatic and facilitating onto dendritic interneurons (Sun et al., 2005; Bao et al., 2010; Blackman et al., 2013; Éltes et al., 2017).
Disynaptic inhibition has been implicated in controlling the precision of a postsynaptic neuron's response to brief stimulation in the cerebellum (Mittmann et al., 2005; Ito, 2014) and hippocampus (Pouille and Scanziani, 2001). Additionally, the combination of disynaptic inhibition with target-dependent STP has recently been associated with the ability of networks to decode multiplexed neural signals in the cortex (Naud and Sprekeler, 2018). In line with these findings, our results show a bimodal profile of the readout neuron response to sparse or dense input codes. We also demonstrate that, coexisting with the sustained gain during sparse code transmission, in a dense coding scenario the system produces short periods (~10 ms) of increased (decreased) spike probability right after stimulus onset (offset; Fig. 6B, gray line). This results from the inhibitory conductances (GABA) being slower than the excitatory conductances (AMPA). This very short period of firing rate modulation might serve as an indication of a widespread basal rate change in the presynaptic population.
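The onset transient produced by slower disynaptic inhibition can be sketched with two conductance rise times; the time constants below are illustrative, not fitted to the AMPA and GABA kinetics used in the model.

```python
import numpy as np

# Net drive after a step increase in presynaptic rate when excitation (fast,
# AMPA-like) leads disynaptic inhibition (slower, GABA-like).
tau_e, tau_i = 0.002, 0.008          # illustrative rise time constants (s)
t = np.linspace(0.0, 0.05, 501)      # 50 ms after stimulus onset

g_e = 1.0 - np.exp(-t / tau_e)       # excitatory conductance rise
g_i = 1.0 - np.exp(-t / tau_i)       # inhibitory conductance rise (slower)
net = g_e - g_i                      # transient net excitation at onset

peak_t = t[np.argmax(net)]
print(peak_t, net.max(), net[-1])    # early peak, then net drive decays away
```

The net drive peaks within a few milliseconds of onset and then returns to near zero, reproducing the short window of increased spike probability described above (the mirror-image window at offset follows by the same argument with the signs reversed).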

Relationship with previous work
Historically, STP has been prominently explored as a frequency filter that renders an individual neuron a low-pass filter (when synapses are depressing) or a high-pass filter (when synapses are facilitatory; Markram et al., 1998; Dittman et al., 2000; Abbott and Regehr, 2004). It has also been suggested that under some conditions STD can interact with subthreshold oscillations to modulate the gain of neurons (Latorre et al., 2016). With STP, the synaptic strength depends on the recent history of the incoming spikes at a particular synapse. This automatically makes the downstream neurons more sensitive to transient fluctuations in input spike trains, and most previous work has exploited this specific property for neural coding.
For instance, history dependence of STP means that the effect of serial correlations (that can be seen in the autocorrelogram of spike trains) and spike bursts in the presynaptic activity depends on whether the synapses express STF or STD. Synapses with STD reduce redundancy in the input spike train by reducing the PSPs of spikes that appear with a certain serial correlation or periodicity (Goldman et al., 2002). By contrast, when synapses express STF, they enhance the effect of serial correlations or spike bursts and the readout neuron can function as a burst detector (Lisman, 1997). In fact, both STF and STD can be combined to de-multiplex spike bursts from single spikes (Izhikevich et al., 2003;Middleton et al., 2011;Naud and Sprekeler, 2018). Thus, much emphasis has been put on understanding how STP can be used to extract information encoded in the pattern of spikes of a single input neuron.
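The contrast between STF and STD acting on bursts can be sketched with an event-driven Tsodyks-Markram synapse (Mongillo-style update rules; the parameters are illustrative, matching the facilitatory and depressing regimes quoted elsewhere in the text).

```python
import numpy as np

# Event-driven Tsodyks-Markram synapse: returns the total released resource
# over a spike train (a sketch, not the paper's exact equation set).
def tm_release(spike_times, U, tau_rec, tau_f):
    u, x, last, total = U, 1.0, None, 0.0
    for t in spike_times:
        if last is not None:
            dt = t - last
            u = U + (u - U) * np.exp(-dt / tau_f)        # u relaxes back to U
            x = 1.0 - (1.0 - x) * np.exp(-dt / tau_rec)  # x recovers to 1
        u = u + U * (1.0 - u)      # facilitation step at each spike
        total += u * x             # fraction of resources released
        x = x * (1.0 - u)          # depletion
        last = t
    return total

burst = np.arange(5) * 0.010       # 5 spikes at 100 Hz (a burst)
isolated = np.arange(5) * 1.0      # the same 5 spikes, 1 s apart

fac = dict(U=0.1, tau_rec=0.050, tau_f=2.0)   # facilitatory regime
dep = dict(U=0.7, tau_rec=2.0, tau_f=0.5)     # depressing regime

ratio_fac = tm_release(burst, **fac) / tm_release(isolated, **fac)
ratio_dep = tm_release(burst, **dep) / tm_release(isolated, **dep)
print(ratio_fac, ratio_dep)
```

The depressing synapse transmits only a small fraction of a burst relative to rate-matched isolated spikes (redundancy reduction), whereas the facilitatory synapse preserves most of the burst's efficacy, which is what makes the readout neuron a burst detector in relative terms.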
Here, we extend this line of work and show how STP may affect the impact of a neuron ensemble on downstream neurons. Previous work has suggested that STP makes a neuron sensitive to transient rate changes. Given this property, when synapses show STD, input correlations can still modulate the neuron output for a wide range of firing rates (de la Rocha and Parga, 2005). Our work reveals a new consequence of the same effect: we show that STP endows neurons with an input-distribution-dependent gain, through which sparse-bursty codes can have a stronger downstream impact than dense codes of the same intensity. Furthermore, we investigate the relative importance of the different STP parameters and baseline firing rates for these gains. This novel feature could be a highly valuable asset in low signal-to-noise ratio conditions. Moreover, our results also show how synapses can impose further constraints on the neural code.

Experimental verification of model predictions
Experimentally, these results can be tested by measuring the distribution of evoked firing rates of the neurons and the STP properties of the synapses in the same brain area. Recent technological advances in stimulation systems, allowing for submillisecond manipulation of the spiking activity of single and multiple cells, might soon provide the means for fine control of population spike codes in intact tissues (Shemesh et al., 2017). These, together with refined methods for single-cell-resolution imaging of entire populations (Xu et al., 2017; Weisenburger and Vaziri, 2018), may also allow for scrutinizing the extent to which the proposed synaptic mechanisms for distribution-dependent gain are present in neural networks. Our prediction about the role of background activity in determining the gain of sparse or dense codes can be tested by changing the overall background activity using chemogenetic techniques.

Limitations and possible extensions
Here, we made several simplifications and assumptions to reveal that STP of synapses has important consequences for neural coding. Relaxing each of these simplifications and assumptions may affect our conclusions in certain conditions and should be investigated in further studies. In the following we briefly discuss a few crucial simplifications and how they might affect our results.
Our analyses considered the presynaptic activity to be composed of independent Poisson processes; that is, whenever we choose a set of N_ext presynaptic units to increase their firing rates, we choose them randomly. Because STP is a synapse-specific property, input cross-correlation will not affect PRR and Q and, therefore, the presynaptic gain. However, it is well known that input correlation can change the gain of a neuron (Kuhn et al., 2003; de la Rocha and Parga, 2005).
It is conceivable that, in some conditions, input correlations could neutralize the advantage of sparse codes over dense codes. The readout neuron's input fluctuations (and therefore its output firing rate) depend on the input correlation. For the same amount of pairwise correlation, the size of the fluctuations in the readout neuron grows with the number of signal-carrying units (N_ext): a larger N_ext (dense distribution) will elicit larger fluctuations than a smaller N_ext (sparse distribution), because with more units more input spikes can coincide in the same time bin. Thus, input correlations may amplify the downstream impact of dense input distributions more than that of sparse input distributions. The size of this effect depends on the number of inputs (N_ext) and the amount of correlation (both pairwise and higher-order). However, because cortical activity is weakly correlated (Ecker et al., 2010), this effect may not be enough to completely neutralize the advantage of sparse distributions over dense distributions.
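A back-of-envelope calculation illustrates this scaling. The numbers below (unit counts, rates, correlation strength, window length) are hypothetical; only the variance formula for a sum of pairwise-correlated spike counts is standard.

```python
import numpy as np

# SD of the summed spike count of N units with pairwise count correlation c:
# Var(sum) = N * var + N * (N - 1) * c * var, with var = rate * T for Poisson.
def sum_sd(N, rate, T, c):
    var_single = rate * T
    var_sum = N * var_single + N * (N - 1) * c * var_single
    return np.sqrt(var_sum)

c = 0.05                               # weak pairwise correlation (illustrative)
sparse_N, dense_N = 100, 1000          # fewer vs more signal-carrying units
rate_sparse, rate_dense = 50.0, 5.0    # matched total input rate (5000 spk/s)
T = 0.04                               # 40 ms window

sd_sparse = sum_sd(sparse_N, rate_sparse, T, c)
sd_dense = sum_sd(dense_N, rate_dense, T, c)
print(sd_sparse, sd_dense)
```

With c = 0 the two configurations carry identical fluctuations; any positive correlation inflates the dense configuration's fluctuations more, because the N(N - 1) cross term dominates for large N.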
We also did not study the effect of the spatial location of synapses in transferring the advantage of sparse codes over dense codes to the readout neurons. There are at least two ways in which the dendritic locations of the synapses may weaken the advantage of a sparse input distribution over a dense one. First, when synaptic strength decreases as a function of distance from the soma: it is possible that, for a sparse input distribution, the synapses carrying the information are located far from the soma, whereas for a dense input distribution at least some inputs will be closer to the soma. Therefore, even if, on the presynaptic side, a sparse input distribution generates stronger outputs than a dense one, its effect on the postsynaptic neuron may be weakened by the weaker synapses. However, even in this case, because of dendritic nonlinearities and Na+/Ca2+ spikes (Larkum et al., 2009), a distally located sparse distribution may still elicit a stronger response than a proximally located dense input distribution. Second, the effect of synapses on certain dendrites can be cancelled by strategically placed inhibitory synapses (Gidon and Segev, 2012). It is possible that a sparse distribution (because of its fewer synapses) may be cancelled or weakened by strategically placed inhibition. The effect of such inhibition will indeed be weaker on dense input distributions, as many more synapses will carry the input information. Thus, for sparse input distributions, their location on the neuron may be an important factor. A proper treatment of this question requires knowledge of, e.g., neuron morphology, the distribution of inhibition, and dendritic nonlinearities, and should be addressed in a separate study.
We also assumed that all the synaptic weights are sampled from the same Gaussian distribution, as our goal was to consider a naive situation in which weights have not been "trained" for any specific task. Different synaptic weight distributions may affect the value of the gains, especially when synaptic weights and inputs are associated (stimulus-specific tuning). Such distributions may arise through supervised/unsupervised learning. A systematic study of a network with stimulus-specific tuning of synaptic weights raises several pertinent questions and should be investigated in a separate study.
The transient enhancement or depression of synaptic efficacy by presynaptic mechanisms comprises many independent processes (Zucker and Regehr, 2002). The TM model is a tractable and intuitive way to account for the two phenomena of interest, but this parsimony comes at the cost of biophysical simplifications. For example, it assumes the space of available resources is a continuum (0 < x < 1), as opposed to the known discrete nature of transmitter-carrying vesicles. However, we argue that, when modeling a large number of simultaneously active synapses, the variable of interest (the population PRR) can be approximated by a continuous variable. The nonuniform amount of transmitter per vesicle might further justify this assumption.
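This continuum approximation can be checked with a toy simulation: many synapses with a few discrete vesicle sites each, released binomially at every spike and refilled stochastically, compared against the continuous resource variable of the TM model. All parameters below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# One spike + recovery step for discrete vesicle sites: each docked vesicle
# releases with probability u; each empty site refills with prob 1-exp(-dt/tau).
def discrete_step(x_counts, K, u, dt, tau_rec, rng):
    released = rng.binomial(x_counts, u)
    x_counts = x_counts - released
    p_rec = 1.0 - np.exp(-dt / tau_rec)
    x_counts = x_counts + rng.binomial(K - x_counts, p_rec)
    return x_counts

M, K = 10000, 5                     # many synapses, few vesicle sites each
u, dt, tau_rec = 0.4, 0.010, 0.100  # release prob, ISI (100 Hz), recovery
x_counts = np.full(M, K)            # all sites initially full
x_cont = 1.0                        # continuous TM resource variable

for _ in range(20):                 # 20 spikes at 100 Hz
    x_counts = discrete_step(x_counts, K, u, dt, tau_rec, rng)
    x_cont = 1.0 - (1.0 - x_cont * (1.0 - u)) * np.exp(-dt / tau_rec)

pop_mean = x_counts.mean() / K
print(pop_mean, x_cont)             # population mean tracks the continuum
```

Because the binomial release and refill steps are linear in expectation, the population-averaged occupancy follows the same recursion as the continuous variable, and with many active synapses the two agree closely.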
Detailed STP models that try to account for specific intracellular mechanisms (Dittman et al., 2000), and stochasticity of the release process (Sun et al., 2005;Kandaswamy et al., 2010) have been proposed in the literature. We argue that, with more complex models of STP, our results might change quantitatively but the qualitative outcome of our analysis would remain: that presynaptic short-term facilitation (depression) yields a substantial positive (negative) gain to sparse over dense population codes. Nevertheless, it would be interesting to see how the gain and optimal rate predictions may be shaped by more detailed models.
Our analyses do not account for use-dependent recovery time, changes in the readily releasable pool size (Kaeser and Regehr, 2017), or heterogeneity in vesicle properties. The effects of postsynaptic receptor desensitization and of neurotransmitter release inhibition by retrograde messengers (Brown et al., 2003) are likely to decrease the estimated gain by counteracting facilitation. Another interesting extension would be to further investigate the effects of STP heterogeneity for compartment-dependent inputs using multicompartment neuron models (Vetter et al., 2001; Grillo et al., 2018).
If the same patterns of bursts tend to happen repeatedly (e.g., PFs in the cerebellum during a continuously repeated movement), there might be an optimal interburst interval (IBI_opt): if bursts happen faster than IBI_opt, the signal would be compromised (because of the slow vesicle recovery time), and if bursts are separated by intervals longer than IBI_opt, no extra gain accrues. Experimental evidence points to the importance of resonance in θ-band oscillations (5.0-10.0 Hz; interburst intervals of 100-200 ms) for the cortico-cerebellar drive (Gandolfi et al., 2013; Chen et al., 2016) and for the hippocampus (Buzsáki, 2002). In these cases, the slower interactions between different pools of vesicles (Rizzoli and Betz, 2005) are likely to play a role in information transfer. Augmentation, a form of transient synaptic enhancement that can last for seconds, is also likely to play a role in these cases (Kandaswamy et al., 2010; Deng and Klyachko, 2011).
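The existence of an IBI_opt can be sketched with a facilitation-free depletion model: at steady state, the release per burst grows with the interburst interval and saturates once recovery between bursts is nearly complete. All parameters below are hypothetical.

```python
import numpy as np

# Steady-state release per burst as a function of the interburst interval,
# using a simple depletion-recovery model (no facilitation; a sketch).
def per_burst_release(ibi, n_rep=200, U=0.4, tau_rec=0.5,
                      spikes_per_burst=4, isi=0.010):
    x, rel = 1.0, 0.0
    for _ in range(n_rep):                  # repeat bursts until steady state
        rel = 0.0
        for s in range(spikes_per_burst):
            if s > 0:                       # partial recovery within the burst
                x = 1.0 - (1.0 - x) * np.exp(-isi / tau_rec)
            rel += U * x                    # released fraction at this spike
            x *= (1.0 - U)                  # depletion
        x = 1.0 - (1.0 - x) * np.exp(-ibi / tau_rec)  # recovery between bursts
    return rel                              # release in the last burst

short, mid, long_ = (per_burst_release(i) for i in (0.05, 0.2, 2.0))
print(short, mid, long_)
```

Per-burst release rises steeply while the interval is short compared with the recovery time and then flattens: lengthening the interval beyond a few recovery time constants yields essentially no extra gain, which is the signature of IBI_opt described above.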