## Abstract

The ability to discriminate spikes that encode a particular stimulus from spikes produced by background activity is essential for reliable information processing in the brain. We describe how synaptic short-term plasticity (STP) modulates the output of presynaptic populations as a function of the distribution of the spiking activity and find a strong relationship between STP features and the sparseness of the population code, which could solve this problem. Furthermore, we show that feedforward excitation followed by inhibition (FF-EI), combined with target-dependent STP, promotes a substantial increase in the signal gain even for considerable deviations from the optimal conditions, granting robustness to this mechanism. A simulated neuron driven by a spiking FF-EI network is reliably modulated as predicted by a rate analysis and inherits the ability to differentiate sparse signals from dense background activity changes of the same magnitude, even at very low signal-to-noise conditions. We propose that STP-based distribution discrimination is likely a latent function in several regions such as the cerebellum and the hippocampus.

- excitation/inhibition balance
- neural code
- short-term plasticity
- sparse code
- synaptic depression
- synaptic facilitation

## Significance Statement

What is the optimal way to distribute a fixed number of spikes over a set of neurons so that we get a maximal response in the downstream neuron? This question is at the core of neural coding. Here, we show that when synapses show short-term facilitation, a sparse code (in which a few neurons increase their firing rate in a task-dependent manner) is more effective than a dense code (in which many neurons increase their firing rate in a task-dependent manner). By contrast, when synapses show short-term depression, a dense code is more effective than a sparse code. Thus, for the first time, we show that the dynamics of synapses itself plays a role in determining the most effective neural code.

## Introduction

The brain is a highly noisy system. At the cellular level, neurons are unreliable in eliciting spikes and synapses are unreliable in transmitting spikes to postsynaptic neurons. At the network level, the connectivity and the balance of excitation and inhibition give rise to fluctuations in the background activity (Brunel, 2000; Kumar et al., 2008), which can be as large as the mean stimulus response (Arieli et al., 1996; Kenet et al., 2003). In such a noisy environment, a neuron faces a crucial task: how can it discriminate stimulus-induced firing rate changes from fluctuations of the same magnitude in the background activity?

If synapses were static, that is, if the postsynaptic conductances (PSCs) did not depend on the immediate spike history, this task could not be accomplished unless synapses were specifically tuned to do so. For instance, the identification of specific spiking patterns, filtering out presumed noise sequences, can be accomplished by precise tuning of synaptic weights (Gütig and Sompolinsky, 2006). This solution, however, relies on training synaptic weights with a supervised learning rule, and even then it would only work for a specific set of spike timing sequences. Active dendrites (with voltage-dependent ionic conductances) can also work as pattern detectors (Hawkins and Ahmad, 2016), but this mechanism would only work for signals constrained to locally clustered synapses. Therefore, despite being relevant for the understanding of signal processing in the brain, the mechanisms by which neural ensembles solve the activity discrimination problem have remained elusive.

Here, we show that short-term plasticity (STP) of synapses provides an effective and general mechanism to solve the aforementioned task. STP refers to the observation that synaptic strength changes on a spike-by-spike basis, depending on the timing of previous spikes (Stevens and Wang, 1995; Zucker and Regehr, 2002); that is, STP arises because neurotransmitter release dynamics is history dependent, and it can manifest as either short-term facilitation (STF) or short-term depression (STD). Thus, STP becomes a crucial part of the neural hardware when information is encoded as firing rate. Indeed, STP has been suggested to play several important roles in neural information processing (Buonomano, 2000; Fuhrmann et al., 2002; Izhikevich et al., 2003; Abbott and Regehr, 2004; Middleton et al., 2011; Rotman et al., 2011; Scott et al., 2012; Rotman and Klyachko, 2013; Jackman and Regehr, 2017; Grangeray-Vilmint et al., 2018; Naud and Sprekeler, 2018).

An immediate consequence of STP is that the effective PSCs depend on the firing rates of individual presynaptic neurons (Fig. 1). This suggests that postsynaptic targets of populations with dynamic synapses could distinguish among different input firing rate distributions even without supervised learning. To demonstrate this feature of STP, we measured the response of postsynaptic neurons to a weak stimulus with an amplitude one order of magnitude smaller than the background activity. By systematically changing the distribution of firing rates over the presynaptic ensemble, we found that weak signals can be differentiated from noisy fluctuations if the signal is appropriately distributed over the input ensemble. The optimal distribution that maximizes discriminability depends on the nature of STP. We found that, for facilitatory synapses, sparse codes give better discrimination between a weak signal and dense background changes of the same intensity. By contrast, for depressing synapses, sparse codes result in highly negative gains relative to dense background changes of the same magnitude. We also investigated feedforward networks with excitation and disynaptic inhibition, with target-dependent STP, and found that this arrangement confers extra robustness on the output gain.

Finally, we demonstrate how STP can endow a postsynaptic neuron with the ability to differentiate sparsely encoded activity from dense activity of the same magnitude, a function that would be especially important at very low signal-to-noise regimes. Thus, our results reveal that the nature of STP may also constrain the nature of firing rate-based population code.

## Materials and Methods

### Model of STP

One parsimonious and yet powerful mathematical description of short-term synaptic dynamics was proposed over 20 years ago (Tsodyks and Markram, 1997). The Tsodyks–Markram (TM) model first accounted for activity-dependent synaptic depression observed in pairs of neocortical pyramidal neurons and was soon extended to cover facilitation (an increase in the probability of vesicle release; Tsodyks et al., 1998). With a small set of parameters, the TM model captures the opposing effects of the depletion of available synaptic vesicles and of the increase in release probability caused by the accumulation of residual calcium in the presynaptic terminal, making it a suitable framework to study the general impact of STP on neural information processing.

Here, we use the TM model (Eq. 1) to describe the short-term synaptic dynamics. The effect of depression is modeled by depletion of the proportion of available resources, represented by the variable *x* (0 ≤ *x* ≤ 1), which instantaneously decreases after each spike and returns to 1 with recovery time constant *τ*_{rec}. The gain effect of short-term facilitation is modeled by the facilitation factor *U* (0 ≤ *U* ≤ 1), which accounts for the accumulation of calcium at the presynaptic terminal after the arrival of an action potential. *U* transiently increases the release probability *u* (0 ≤ *u* ≤ 1), which returns to 0 with time constant *τ*_{f}:

$$
\frac{du}{dt} = -\frac{u}{\tau_{f}} + U\,(1-u)\,\delta(t-t_{sp}), \qquad
\frac{dx}{dt} = \frac{1-x}{\tau_{rec}} - u^{+}x^{-}\,\delta(t-t_{sp}), \tag{1}
$$

where *t*_{sp} is the last spike time, and *u*^{+} and *x*^{–} denote the values of *u* and *x* immediately after and immediately before the spike, respectively.
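To make the event-driven dynamics of Equation 1 concrete, the following is a minimal Python sketch of the TM update, using exact exponential relaxation between spikes. The parameter values are illustrative only; they are not the reference sets used in this work.

```python
import math

def tm_release(spike_times, U, tau_rec, tau_f):
    """Event-based integration of the TM model (Eq. 1).
    Returns the proportion of released resources (PRR = u+ * x-) per spike."""
    u, x, t_last = 0.0, 1.0, None
    prr = []
    for t in spike_times:
        if t_last is not None:
            dt = t - t_last
            u *= math.exp(-dt / tau_f)                      # u relaxes back to 0
            x = 1.0 - (1.0 - x) * math.exp(-dt / tau_rec)   # x recovers to 1
        u_plus = u + U * (1.0 - u)                          # facilitation step
        prr.append(u_plus * x)                              # released fraction
        x *= (1.0 - u_plus)                                 # depletion
        u = u_plus
        t_last = t
    return prr

# A regular 20-Hz train: with the illustrative facilitatory parameters the
# per-spike PRR grows, with the depressing parameters it shrinks.
train = [i * 0.05 for i in range(5)]
fac = tm_release(train, U=0.1, tau_rec=0.05, tau_f=0.75)
dep = tm_release(train, U=0.7, tau_rec=0.75, tau_f=0.02)
```

The first spike of a quiescent synapse always releases a fraction *U* of the resources, which is why a low *U* leaves facilitatory synapses room to grow.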

### Proportion of released resources (*PRR*)

The change in the PSC *g*^{s} after a presynaptic spike is proportional to the instantaneous *PRR* (= *u*^{+}*x*^{–}) and to the absolute synaptic strength *B*^{s}. The average instantaneous *PRR* of a presynaptic unit can also be described as a function of a time-dependent Poissonian firing rate *r*(*t*) (Tsodyks et al., 1998) as:

$$
\langle PRR(t)\rangle = \langle u^{+}\rangle\langle x\rangle, \qquad
\frac{d\langle u\rangle}{dt} = -\frac{\langle u\rangle}{\tau_{f}} + U\,(1-\langle u\rangle)\,r(t), \qquad
\frac{d\langle x\rangle}{dt} = \frac{1-\langle x\rangle}{\tau_{rec}} - \langle u^{+}\rangle\langle x\rangle\,r(t), \tag{2}
$$

where the brackets denote the average over many realizations and $\langle u^{+}\rangle = \langle u\rangle + U(1-\langle u\rangle)$. The total *PRR* contribution of a single synapse, for a time window of duration *T*_{s}, can then be obtained by integrating Equation 2 over this period:

$$
A^{s} = \int_{0}^{T_{s}} \langle PRR(t)\rangle\, r(t)\, dt. \tag{3}
$$
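The window integral can be checked numerically. Below is a minimal forward-Euler sketch, assuming a standard mean-field form of the TM model driven by a Poisson rate *r*(*t*); the parameter values are illustrative.

```python
def total_prr(rate_fn, U, tau_rec, tau_f, T, dt=1e-4):
    """Forward-Euler integration of mean-field TM dynamics under a Poisson
    drive r(t); returns the expected total PRR over [0, T]."""
    u, x, total = 0.0, 1.0, 0.0
    for i in range(int(T / dt)):
        r = rate_fn(i * dt)
        u_plus = u + U * (1.0 - u)          # release probability after a spike
        total += u_plus * x * r * dt        # expected resources released
        u += (-u / tau_f + U * (1.0 - u) * r) * dt
        x += ((1.0 - x) / tau_rec - u_plus * x * r) * dt
    return total

# For a weak, brief drive the total stays close to U * r * T, because little
# facilitation or depletion accumulates within the window.
weak = total_prr(lambda t: 1.0, U=0.5, tau_rec=0.1, tau_f=0.5, T=0.1)
strong = total_prr(lambda t: 50.0, U=0.5, tau_rec=0.1, tau_f=0.5, T=0.1)
```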

### Total effective input to a postsynaptic neuron

For a homogeneous presynaptic population with the same STP parameters and individual basal firing rate *r*_{bas}, the population basal activity is *R*_{bas} = *N* · *r*_{bas}, where *N* is the population size. We quantify *R*_{ext} as a multiple of *R*_{bas}. Our analysis is restricted to the case of low signal-to-noise ratio, i.e., *R*_{ext} ≪ *R*_{bas}. We consider a simplified scenario where *R*_{ext} is distributed homogeneously over a number *N*_{ext} of selected presynaptic units, which will increase their firing rate by *r*_{ext} = *R*_{ext}/*N*_{ext}, while the remaining presynaptic units keep their activity unchanged.

The total *PRR* released to a target neuron by the entire population, during *T*_{s}, will then be

$$
A^{s}_{pop} = (N - N_{ext})\,A^{s}_{bas} + N_{ext}\,A^{s}_{ext}, \tag{4}
$$

where *A*^{s}_{bas} and *A*^{s}_{ext} are the total *PRR* (Eq. 3) delivered by a stationary unit (firing at *r*_{bas}) and a stimulus-encoding unit (firing at *r*_{bas} + *r*_{ext}), respectively.

### Gain in the effective input

We are interested in the effects of varying the presynaptic distribution (over *N*_{ext} inputs) of this total extra rate (*R*_{ext}) on the effective input to postsynaptic targets. To estimate the change in gain because of STP, we used the maximally dense distribution, when *N*_{ext} = *N*, as the reference point:

$$
r_{\delta} = \frac{R_{ext}}{N}, \qquad \Delta A^{s}_{pop,\delta} = N\,\Delta A^{s}(r_{\delta}), \tag{5}
$$

where the *δ* subscript denotes the smallest possible increase in individual firing rates, *r*_{δ} (maximally distributed *R*_{ext}), and Δ*A*^{s}(*r*) = *A*^{s}(*r*_{bas} + *r*) − *A*^{s}(*r*_{bas}) is the extra *PRR* (Eq. 3) released by a unit that increases its rate by *r*. We refer to this as the dense distribution case; it ideally represents a homogeneous increase in the basal activity of the system, against which a stimulus would need to be distinguished.

*N*_{ext} = *N* also implies the smallest increase in individual operating rates (*r*_{ext} = *r*_{δ}); therefore, in the dense distribution case, STP nonlinearities will be minimal. In other words, *N*_{ext} = *N* is the point where dynamic synapses operate as close to static as possible.

We then quantify the gain in the extra population output, Δ*A*^{s}_{pop} = *N*_{ext} Δ*A*^{s}(*r*_{ext}), for a given *N*_{ext}, always relative to the extra output caused by an input of the same intensity but with dense distribution, as

$$
G = \frac{\Delta A^{s}_{pop}}{\Delta A^{s}_{pop,\delta}} - 1 = \frac{N_{ext}\,\Delta A^{s}(r_{ext})}{N\,\Delta A^{s}(r_{\delta})} - 1, \tag{6}
$$

where Δ*A*^{s}(*r*) = *A*^{s}(*r*_{bas} + *r*) − *A*^{s}(*r*_{bas}) is the extra *PRR* released by a single unit.

We calculate the curves of *G* as a function of *N*_{ext} for different sets of STP parameters and basal rates and search for the point where it is maximized (*N*_{ext} = *N*_{opt}), which we call the optimal distribution (see example in Fig. 2*D*).
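The gain curve of Equation 6 can be sketched numerically. The snippet below assumes mean-field TM synapses at constant rates within the window, starting from rest for simplicity, and uses illustrative parameter values (not this work's reference sets); the dense case is the reference, so its gain is zero by construction.

```python
def total_prr(r, U, tau_rec, tau_f, T=0.5, dt=1e-3):
    # forward-Euler mean-field TM synapse at constant rate r, from rest
    u, x, tot = 0.0, 1.0, 0.0
    for _ in range(int(T / dt)):
        u_plus = u + U * (1.0 - u)
        tot += u_plus * x * r * dt
        u += (-u / tau_f + U * (1.0 - u) * r) * dt
        x += ((1.0 - x) / tau_rec - u_plus * x * r) * dt
    return tot

def gain_curve(N, R_ext, r_bas, stp):
    # Eq. 6: gain (%) of concentrating R_ext on N_ext units vs. the dense case
    base = total_prr(r_bas, **stp)
    extra = lambda r_ext: total_prr(r_bas + r_ext, **stp) - base
    dense = N * extra(R_ext / N)
    return {n: 100.0 * (n * extra(R_ext / n) / dense - 1.0)
            for n in (1, 10, 100, N)}

fac = dict(U=0.1, tau_rec=0.05, tau_f=0.75)   # illustrative facilitatory set
dep = dict(U=0.7, tau_rec=0.75, tau_f=0.02)   # illustrative depressing set
g_fac = gain_curve(N=1000, R_ext=100.0, r_bas=0.5, stp=fac)
g_dep = gain_curve(N=1000, R_ext=100.0, r_bas=0.5, stp=dep)
```

With these values, facilitation produces a positive gain at intermediate sparseness that drops again when the signal is over-concentrated, while depression yields negative gains for every sparse distribution.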

### Optimal distribution

The optimal distribution of the activity (*OD*) can be framed as the fraction of the optimal number of encoding units *N*_{opt} in a given population of size *N*, that is, *OD* = *N*_{opt}/*N*. Because the optimal code (*N*_{ext} = *N*_{opt}) is the distribution that maximizes the gain over the dense distribution with the same input magnitude (*N*_{ext} = *N*), *OD* can be written as

$$
OD = \frac{N_{opt}}{N} = \frac{R_{ext}/r_{opt}}{R_{ext}/r_{\delta}} = \frac{r_{\delta}}{r_{opt}}. \tag{7}
$$

We define *R*_{ext} as a fraction of *R*_{bas} to keep the same signal-to-noise ratio (*R*_{ext}/*R*_{bas}) for populations of different sizes *N*. We find that *r*_{opt} is fixed given the STP parameters and *r*_{bas} (see Results); therefore, by defining *r*_{δ} (= *R*_{ext}/*N*) as a fraction of *r*_{bas} (= *R*_{bas}/*N*), we reach the interesting consequence of *OD* being independent of any particular choice of population size (Eq. 7). That is, given the same STP parameters and value of *r*_{bas}, populations of different sizes will optimally encode the same stimulus intensity (relative to their basal activity) with the same *OD*. Because the optimal encoding rate is constrained by *r*_{opt} ≥ *r*_{δ}, the optimal distribution will also be constrained to 0 < *OD* ≤ 1 (see Fig. 3*D*), with values close to zero or one characterizing sparse or dense distributions, respectively.

### Extended Data Figure 3-1

Effects of STP attributes on the maximum gain of the neural population. **A**, **B**, Similar to Figure 3 but with longer integration time windows *T*_{s}. Both panels reproduce the main findings of Figure 3, showing that the distribution-dependent gain is high for facilitatory synapses and is strongly affected by the basal rate even when longer integration time windows are considered. The relative importance of the recovery time constant and the facilitation time constant in defining the optimal distribution *OD* grows with larger *T*_{s} (large circles for *OD* = 1 and small circles for *OD* = 0), but the facilitation factor *U* remains the most relevant attribute in defining the *OD*. Download Figure 3-1, EPS file.

### Optimal rate (*r*_{opt}) and maximum gain (*G*_{max}) estimation

Equation 6 describes the gain *G* obtained by encoding a stimulus *R*_{ext} into *N*_{ext} units (with rates increased by *r*_{ext}) as opposed to *N* units (with rates increased by *r*_{δ}). The peak of this function (*G*_{max}) is achieved by an optimal number of encoding units *N*_{ext} = *N*_{opt}, with their rate increased by *r*_{opt}. This maximum point can be found by taking the derivative of the gain function with respect to *r*_{ext} and setting it equal to zero,

$$
\frac{dG}{dr_{ext}} = \frac{R_{ext}}{N\,\Delta A^{s}(r_{\delta})}\,\frac{d}{dr_{ext}}\!\left[\frac{\Delta A^{s}(r_{ext})}{r_{ext}}\right] = 0; \tag{8}
$$

given that *N*_{ext} = *R*_{ext}/*r*_{ext}, this can be further simplified into

$$
\left.\frac{d\,\Delta A^{s}}{dr_{ext}}\right|_{r_{opt}} = \frac{\Delta A^{s}(r_{opt})}{r_{opt}}, \tag{9}
$$

where Δ*A*^{s}(*r*_{ext}) = *A*^{s}_{ext} − *A*^{s}_{bas} and the value of *r*_{opt} is the solution of Equation 9. This solution is independent of the stimulus intensity *R*_{ext} and population size *N* (see results in Fig. 2*F*).

For the optimal rate *r*_{opt}, the gain (Eq. 6) can be written as

$$
G_{max} = \frac{r_{\delta}\,\Delta A^{s}(r_{opt})}{r_{opt}\,\Delta A^{s}(r_{\delta})} - 1, \tag{10}
$$

where Δ*A*^{s}(*r*) is the extra *PRR* released by a unit that increases its firing rate by *r*.

Assuming that Δ*A*^{s}(*r*) is linear with slope *S*^{s} for small *r*_{δ}, that is, Δ*A*^{s}(*r*_{δ}) ≈ *S*^{s} *r*_{δ} (see below, Linear approximation of Δ*A*^{s}), then *G*_{max} can be further simplified into

$$
G_{max} = \frac{\Delta A^{s}(r_{opt})}{S^{s}\,r_{opt}} - 1, \tag{11}
$$

which makes *G*_{max} independent of the stimulus intensity *R*_{ext} and population size *N*.

### Combined optimal rate (*r*_{opt}^{com}) and maximum gain (*G*_{max}^{com}) estimation

When an axon branches to connect to different targets, STP properties might be target dependent. In the case of excitatory fibers driving feedforward excitation-inhibition (FF-EI) motifs, with synapses of type 1 (*s*1) directly exciting a readout neuron and synapses of type 2 (*s*2) driving the local inhibitory circuit (Fig. 1*C*), the combined gain is given by

$$
G^{com} = \frac{N_{ext}\left[\Delta A^{s1}(r_{ext}) - \Delta A^{s2}(r_{ext})\right]}{N\left[\Delta A^{s1}(r_{\delta}) - \Delta A^{s2}(r_{\delta})\right]} - 1, \tag{12}
$$

where Δ*A*^{s}(*r*) denotes the extra *PRR* released by a single synapse of type *s*. To find the activity distribution that maximizes the combined gain, we take the derivative of *G*^{com} with respect to *r*_{ext}, set it equal to zero and, assuming again that Δ*A*^{s} is linear with slope *S*^{s} for both synapses, find the equivalence

$$
\left.\frac{d}{dr_{ext}}\!\left[\Delta A^{s1} - \Delta A^{s2}\right]\right|_{r_{opt}^{com}} = \frac{\Delta A^{s1}(r_{opt}^{com}) - \Delta A^{s2}(r_{opt}^{com})}{r_{opt}^{com}}, \tag{13}
$$

for which the solution, *r*_{opt}^{com}, is independent of the stimulus intensity *R*_{ext} and population size *N*. The optimal combined gain is then

$$
G_{max}^{com} = \frac{\Delta A^{s1}(r_{opt}^{com}) - \Delta A^{s2}(r_{opt}^{com})}{\left(S^{s1} - S^{s2}\right) r_{opt}^{com}} - 1, \tag{14}
$$

which is also independent of the stimulus intensity *R*_{ext} and population size *N*.

### Numerical simulations

As a proof of concept of the potential relevance that the estimated presynaptic gains could have on postsynaptic targets, we performed numerical simulations of a conductance-based integrate-and-fire (I&F) neuron model acting as the readout device for a FF-EI circuit (see section Sparse code identification by a postsynaptic neuron). The I&F model's membrane voltage *V*_{m} is described by

$$
C_{m}\frac{dV_{m}}{dt} = g_{l}\left(V_{l} - V_{m}\right) + g_{e}(t)\left(V_{e} - V_{m}\right) + g_{i}(t)\left(V_{i} - V_{m}\right), \tag{15}
$$

where *C*_{m} = 250 pF is the membrane capacitance, *g*_{l} and *V*_{l} are the leak conductance and resting potential, *g*_{e} and *g*_{i} are, respectively, the excitatory and inhibitory input conductances, and *V*_{e} = 0 mV and *V*_{i} = –75 mV are the excitatory and inhibitory synaptic reversal potentials. When a spike occurs, the membrane voltage is reset to *V*_{reset} = –60 mV and held at this value for a refractory period of 2 ms. The synapses were modeled by *α*-functions (Kuhn et al., 2004) with time constants *τ*_{e} = 0.5 ms for excitatory and *τ*_{i} = 2 ms for inhibitory synapses.

The presynaptic population consisted of *N* = 160,000 units that connected to the I&F neuron in a FF-EI arrangement. The population stationary basal rate was *R*_{bas} = 80 kHz, with the individual basal rate *r*_{bas} = 0.5 Hz. At the stationary basal rate, the synaptic states are described by

$$
\langle u\rangle_{bas} = \frac{U\tau_{f}\,r_{bas}}{1 + U\tau_{f}\,r_{bas}}, \qquad
\langle x\rangle_{bas} = \frac{1}{1 + \langle u^{+}\rangle_{bas}\,\tau_{rec}\,r_{bas}}, \qquad
\langle PRR\rangle^{s}_{bas} = \langle u^{+}\rangle_{bas}\langle x\rangle_{bas}\,r_{bas}, \tag{16}
$$

where $\langle PRR\rangle^{s}_{bas}$ is the expected rate of *PRR* by each synapse with STP parameters (*U*, *τ*_{rec}, *τ*_{f}).

We simulate a neuron that, during stationary basal activity, is kept in the fluctuation-driven regime through excitation-inhibition input balance (Kuhn et al., 2004). While excitation is provided directly by *s*1, disynaptic inhibition is modulated by *s*2 in a linear fashion,

$$
R_{i}(t) = \beta\,\langle PRR\rangle^{s2}_{pop}(t), \tag{17}
$$

where *R*_{i} is the inhibitory population rate and *β* is a linear scale factor. The inhibitory firing rate that keeps the target neuron membrane potential fluctuating around the mean value of $\bar{V}_{m}$ during stationary basal activity can be approximated by a linear function of the excitation (adapted from Kuhn et al., 2004):

$$
R_{i} = \frac{g_{l}\left(V_{l} - \bar{V}_{m}\right) + e\,B_{e}\tau_{e}\left(V_{e} - \bar{V}_{m}\right)R_{e}}{e\,B_{i}\tau_{i}\left(\bar{V}_{m} - V_{i}\right)}, \tag{18}
$$

where *B*_{e} and *B*_{i} are the maximum amplitudes of the excitatory and inhibitory synaptic conductances and *R*_{e} is the excitatory population rate. Equation 18 allows us to find the linear scale of Equation 17 that fulfills the balance condition. The inhibitory synapses are kept static (no STP). The extra presynaptic activity happens in blocks of duration *T*_{s} and is defined as sparse (when *N*_{ext} = *N*_{opt}) or dense (when *N*_{ext} = *N*).

### Continuous rate distribution

Although some bursting networks [e.g., cerebellar parallel fibers (PFs)] seem to operate in a quasi-binary fashion (burst or no burst), it is important to extend the analysis to continuous distributions, under which most parts of the brain seem to operate. We do this by assuming that the distribution of event-related neural firing rates follows a γ distribution, which allows us parameterized control of the sparseness of the neural code (through the mean of the distribution) and of the distribution shape (through the skewness and kurtosis):

$$
f(r_{ext};\,k,\theta) = \frac{r_{ext}^{\,k-1}\,e^{-r_{ext}/\theta}}{\theta^{k}\,\Gamma(k)}, \tag{19}
$$

where *k* is the shape parameter and *θ* is the scale parameter. When *k* = 1, this is equivalent to an exponential distribution; the distribution is right-skewed, with the skewness decreasing toward zero for higher values of *k* (becoming approximately Gaussian). For each shape parameter, we controlled the mean of the distribution by varying the scale parameter, because for a γ-distributed *r*_{ext} the expected value is

$$
E[r_{ext}] = k\,\theta. \tag{20}
$$

For the γ-specified distribution of extra rates and a given presynaptic set of STP parameters, the expected amount of resources released by a population is

$$
\langle A^{s}_{pop}\rangle = N \int_{0}^{\infty} f(r;\,k,\theta)\, A^{s}(r_{bas} + r)\, dr, \tag{21}
$$

which we solved numerically for two synapse types (*s*1-facilitatory and *s*2-depressing) and a range of rate distributions. The distribution gain *G* for each rate distribution was then calculated in relation to the dense case, where *N*_{ext} = *N* and *r*_{ext} = *r*_{δ}.
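Equations 19 and 20 can be sanity-checked by sampling: Python's `random.gammavariate` uses the same shape/scale parameterization, so the sample mean should approach *kθ* and the variance *kθ*². The values below are arbitrary illustrations.

```python
import random

random.seed(7)
k, theta = 2.0, 5.0   # illustrative shape and scale parameters (Eq. 19)
samples = [random.gammavariate(k, theta) for _ in range(200_000)]

mean = sum(samples) / len(samples)                           # approaches k * theta
var = sum((s - mean) ** 2 for s in samples) / len(samples)   # approaches k * theta**2
```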

A glossary of key symbols used throughout this work is given in Table 1. All analyses and simulations were performed in MATLAB and Python. The model simulations were performed using Euler's method with a time step of 0.1 ms, implemented in the neural simulator Brian2 (Stimberg et al., 2014). The simulation and analysis code is available on GitHub at https://github.com/luiztauffer/stp-activity-distribution.

## Results

Here, we are interested in a mechanism by which a neuronal network or a single postsynaptic neuron receiving multiple inputs may distinguish between different spiking distributions with the same intensity (e.g., the same number of spikes). This problem is schematically illustrated in Figure 1. Consider two scenarios. In the first scenario, seven spikes arrive from a single presynaptic neuron while the other six neurons remain silent (Fig. 1*A*, sparse distribution). In the second scenario, each of the seven presynaptic neurons spikes once. In both trials, the postsynaptic neuron receives seven spikes (Fig. 1*B*, dense distribution). Here, we test the hypothesis that when synapses exhibit STP (facilitation or depression) the two scenarios can be differentiated without any specific tuning of synaptic weights.

Static synapses evoke exactly the same PSC sequence for both sparse and dense distributions (black lines), making them indistinguishable for a readout neuron. However, when synapses are dynamic, short-term facilitation (blue line) enhances the PSC amplitudes compared with static synapses (compare Fig. 1*A*,*B*, bottom traces), whereas short-term depression (red line) results in a weaker response compared with static synapses (compare Fig. 1*A*,*B*, bottom traces). If the incoming spikes are distributed across different synapses, the sequence of PSCs is identical for all types of synaptic dynamics (compare Fig. 1*A*,*B*, bottom traces).

*In vivo* neural coding is certainly more complex than the above example. However, this simple example suggests that, in the case of a neuron receiving synaptic inputs via thousands of noisy synapses, STP could be a mechanism to differentiate an evoked signal from background activity fluctuations of the same amplitude, provided the former is encoded as a specific pattern that can exploit the STP properties of the synapses. In the following, we describe how well dynamic synapses could endow feedforward circuits with such activity distribution discrimination properties in low signal-to-noise regimes (Fig. 1*C*).

### Optimal activity distribution with dynamic synapses

We implemented dynamic synapses with the rate-based TM model (Tsodyks et al., 1998; Eq. 2). In this model, the instantaneous *PRR* depends on the resource release probability (*u*^{+}) and the proportion of available resources (*x*^{–}), which have their dynamics guided by the choice of STP model parameters (*U*, *τ*_{rec}, *τ*_{f}). For a transient increase in firing rate, a facilitatory synapse produces an average profile of sustained *PRR*, while a depressing synapse produces an average profile of rapidly decaying *PRR* (Fig. 2*A*). Throughout this work, the two reference sets of values for the STP types are *U* = 0.1 with the corresponding *τ*_{rec} and *τ*_{f} (facilitatory) and *U* = 0.7 with the corresponding *τ*_{rec} and *τ*_{f} (depressing).

To quantify the effects that different profiles have on the presynaptic output, for varying transient increases in firing rate (*r*_{ext}), we calculate the total amount of extra resources, Δ*A*^{s}, that a synapse releases over a time period of *T*_{s} (Eq. 3; Fig. 2*B*). We found that Δ*A*^{s} varied in a nonlinear fashion as a function of *r*_{ext}, with depressing dynamics approaching saturation much faster than facilitatory dynamics. The slope of Δ*A*^{s} (Fig. 2*B*, inset) for depressing synapses is monotonically decreasing, indicating that any increase in the firing rate of those synapses will produce a sublinear increase in Δ*A*^{s}, whereas for facilitatory synapses the slope initially grows, indicating that increases in the firing rate of those synapses will, up to some point, produce a supralinear increase in Δ*A*^{s}.

In the brain, neurons typically receive inputs from a large ensemble of presynaptic neurons. In the ongoing activity state, these neurons spike at a low basal firing rate (*r*_{bas}) with a corresponding total synaptic output. In the event-related activity state, the firing rate of a subset of presynaptic neurons (*N*_{ext}) is transiently increased and the total synaptic output (Eq. 4) changes accordingly. We distribute a fixed event-related population rate increase *R*_{ext} into varied numbers of chosen synapses *N*_{ext}, each of these chosen synapses increasing its firing rate by *r*_{ext} = *R*_{ext}/*N*_{ext}, and report the changes in the total synaptic output.

_{ext}We found that, for a population of facilitatory synapses,
varied in a non-monotonic fashion as a function of *N _{ext}*, initially increasing up to a peak point, then decreasing (Fig. 2

*C*, left). By contrast, for depressing synapses (Fig. 2

*C*, right), varied in a monotonically increasing fashion. For both facilitatory and depressing synapses, converged to their respective when the total extra input rate

*R*was distributed over all the neurons such that

_{ext}*N*=

_{ext}*N*and .

These results suggest that, when synapses are facilitatory, the total amount of synaptic resources released during an event-related activity state is maximized when the event-related spiking activity is confined to a small number of synapses. The output fell below its peak value when *R*_{ext} was concentrated into a very small subset of presynaptic neurons, because those chosen neurons spiked at very high rates and the synapses rapidly ran out of vesicle resources. When the event-related input was distributed over all the presynaptic neurons, the output also decreased, because in such a scenario *r*_{ext} was too small to fully exploit the benefits of synaptic facilitation. In contrast to the facilitatory synapses, for depressing synapses it was more beneficial to distribute the event-related spiking activity over the whole input ensemble to maximize the total amount of synaptic resources released. In this condition, *r*_{ext} was small enough to avoid any losses in vesicle release caused by depression.

### Activity distribution-dependent gain

To further quantify the effect of the distribution of event-related activity over the input ensemble (that is, how neurons increase their rate in the event-related phase), we defined the distribution gain *G* as the proportional change in the extra population output relative to the dense case (Eqs. 5, 6). We found that the extra output of a single unit is approximately a linear function of *r*_{δ} for a wide range of scenarios (see Materials and Methods); because of that, under the dense distribution of the activity (when all the presynaptic neurons change their firing rate by a small amount *r*_{δ} in the event-related activity state), even dynamic synapses behave approximately as static synapses. Therefore, *G* can be understood either as a gain over a dense distribution or as a gain over static synapses. For facilitatory synapses, just as for the total output, *G* follows a non-monotonic curve as a function of *N*_{ext}, with a single peak at *N*_{opt} (Fig. 2*D*, blue line). By contrast, depressing synapses resulted in negative gains for every distribution, except for *N*_{ext} = *N*, where *G* = 0% (Fig. 2*D*, red line).

Next, we estimated *N*_{opt} and *G*_{max} for a range of extra activity intensities (Fig. 2*E*, for facilitatory synapses). For these calculations, we parameterized the extra activity *R*_{ext} as a fraction of the basal firing rate *R*_{bas} (correspondingly, *r*_{δ} as a percentage of *r*_{bas}; see Materials and Methods). We found that, for facilitatory synapses, *N*_{opt} increased linearly with the extra activity intensity (Fig. 2*G*), resulting in an optimal encoding rate *r*_{opt} that is independent of the input intensity. For depressing synapses, the optimal distribution *N*_{opt} = *N* did not change with the extra activity intensity, making the optimal encoding rate always *r*_{opt} = *r*_{δ}.

Because the presynaptic neurons are assumed to be Poisson processes, an advantage of parameterizing *R*_{ext} as a fraction of *R*_{bas} is that it directly translates to a signal-to-noise ratio. For the example shown in Figure 2*G*, we found that STP could amplify the presynaptic output for weak signals (which were <10% of the basal activity) by up to 60% if the extra rate was distributed over *N*_{opt} synapses as opposed to *N* synapses. For low signal-to-noise ratios (*R*_{ext} ≪ *R*_{bas}), the gain at the optimal distribution (*G*_{max}) was approximately constant and always positive for facilitatory synapses, while depressing synapses kept *G*_{max} = 0 at *N*_{opt} = *N* (Fig. 2*G*). Finally, we show analytically that the independence of *r*_{opt} and *G*_{max} from the extra activity intensity is a good approximation for a wide range of basal rates and STP types (see Materials and Methods).

These results suggest that, when synapses are facilitatory, the input should be distributed sparsely (a sparse code, that is, only a small set of neurons change their firing rate in the event-related state) to maximize the total amount of synaptic resources released onto the downstream neuron. By contrast, when synapses are depressing, the input should be distributed densely (a dense code, that is, all the neurons change their firing rate in the event-related state) to maximize the synaptic resources released onto the downstream neuron. Thus, for sparse population activity, while facilitatory synapses are optimally used, depressing synapses are underutilized.

### Effects of STP parameters on optimal rate and gain

Next, we investigated how *N _{opt}*,

*r*, and

_{opt}*G*vary with STP parameters. To this end, we systematically changed synapses from facilitatory to depressing by jointly varying the set of parameters: ms and ms. We found that

_{max}*r*decayed exponentially as the synapses became more depressing (Fig. 3

_{opt}*A*). This follows from the fact that facilitatory synapses profit from high firing rates and depressing synapses avoid negative gains at lower rates.

The maximum gain *G*_{max} also decreased exponentially as synapses were systematically changed from facilitatory to depressing (Fig. 3*B*). We found that the relationship between gain and optimal rate was linear from mildly to strongly facilitatory synapses (Fig. 3*C*), with larger basal rates constraining the optimal conditions to lower rates and lower gains.

Interestingly, increasing the basal firing rate *r*_{bas} substantially reduced *r*_{opt} and *G*_{max}. This is surprising because, at such low spiking rates, STP effects are hardly perceivable in traditional paired-pulse ratio analyses. The high value of *G*_{max} when the system operates at low *r*_{bas} arises because the synapses take advantage of the nonlinearities in their individual input-output curves (Fig. 2*B*). Increased basal activity attenuates these nonlinearities, thereby impairing the distribution-dependent gain.

### Relationship between facilitatory synapses and sparse coding

We quantified the optimal distribution of an evoked neural signal by *OD* (see Materials and Methods). A high *OD* (close to 1) indicates a dense distribution in which many neurons spike to encode the extra activity, whereas a low *OD* (close to 0) indicates a sparse distribution. We found that *OD* changed abruptly from sparse to dense as synapses were changed from facilitatory to depressing (Fig. 3*D*). Facilitatory synapses yielded the maximum response for sparse distributions, while depressing synapses yielded the maximum response (avoiding negative gains) for dense distributions. The transition point from sparse to dense *OD* did not depend on the stimulus duration. However, the basal rate strongly modified the transition point, with higher *r*_{bas} allowing only strongly facilitatory synapses to take advantage of sparse distributions. This configuration remained independent of the stimulus intensity as long as the circuit operates at low signal-to-noise conditions (*R*_{ext} ≪ *R*_{bas}; Fig. 2*G*).

In the above, we changed the synapses from facilitatory to depressing by linearly modifying the whole set of parameters together. Next, we systematically varied each of the STP parameters independently and measured the *OD* for maximum gain. We found that the transition region was primarily governed by the facilitation factor *U*, with a weak dependence on *τ*_{rec} and *τ*_{f} (Fig. 3*E*). The relative contribution of *τ*_{rec} and *τ*_{f} became more relevant at higher *T*_{s} (Extended Data Fig. 3-1).

These results clearly highlight the importance of the stationary basal rate in how well the synaptic gain modulation operates, as only low *r*_{bas} allows for significant gains. Importantly, the switch-like behavior of the optimal distribution indicates that, for a given population code, there is a robust range of STP attributes that can produce positive gains. This transition point seems to be relatively independent of the signal duration but is strongly affected by *r*_{bas}. Finally, a low initial release probability (defined in the model by a low *U*) seems to be the preeminent feature in defining the optimal *OD*.

Equation 7 suggests that *OD* is independent of the population size (*N*). However, there is a lower limit of *N* below which the sparsity argument does not hold. We have shown that, given the STP parameters, there is an optimum firing rate *r*_{opt} at which signal-carrying neurons should operate to maximize the gain (Fig. 2*F*). For a given rate, it is optimal to distribute spikes over *N*_{opt} input channels (Fig. 2*F*). However, when *N* ≤ *N*_{opt}, the optimal distribution will clearly not be sparse. The argument for sparseness arises when *N* ≫ *N*_{opt}. When *N* > *N*_{opt}, and we increase *N* while keeping all other parameters constant, *OD* will decrease. However, if we change *N* while keeping all other parameters constant, the signal-to-noise ratio will change. The signal-to-noise ratio is defined as *R*_{ext}/*R*_{bas}, where *R*_{bas} = *N* · *r*_{bas}, so if we change *N*, *R*_{bas} will also change. To keep the signal-to-noise ratios comparable for low and high *N* scenarios, we need to scale *R*_{ext} accordingly. Therefore, here, we defined *R*_{ext} in proportion to *R*_{bas} so that it accommodates changes in *N*. With this choice, *OD* is indeed independent of *N* (see Eq. 7).

### Effects of different sources of enhancement on *G _{max}*

The enhancement of the output at facilitatory synapses could, in principle, have many causes (Valera et al., 2012; Thanawala and Regehr, 2013; Jackman and Regehr, 2017). Using the TM model (Eq. 1), we phenomenologically accounted for two important sources: a low initial release probability which sequentially increases with each incoming spike (Jackman et al., 2016) and fast replenishment of readily available resources (Crowley et al., 2007). The first characteristic is mimicked by a low facilitation factor *U*, which determines the initial release probability after a long quiescent period and the proportional increase in it after each spike. The second mechanism is captured by a fast recovery time constant *τ _{rec}*.
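As a concrete illustration, the two mechanisms just described can be sketched as a discrete, spike-driven iteration of the TM model (a minimal sketch in standard TM notation, not the exact integration scheme used for Eq. 1; the parameter values below are illustrative):

```python
import math

def tm_release(spike_times, U, tau_f, tau_rec):
    """Release fraction (u * x) produced by each spike of a train
    at a Tsodyks-Markram synapse.

    U       : facilitation factor (sets the initial release probability)
    tau_f   : facilitation time constant (s)
    tau_rec : recovery time constant of synaptic resources (s)
    """
    u, x = 0.0, 1.0          # utilization and available resources at rest
    last_t = None
    releases = []
    for t in spike_times:
        if last_t is not None:
            dt = t - last_t
            u *= math.exp(-dt / tau_f)                     # facilitation decays
            x = 1.0 - (1.0 - x) * math.exp(-dt / tau_rec)  # resources recover
        u += U * (1.0 - u)   # spike-triggered increase in release probability
        releases.append(u * x)
        x -= u * x           # resource depletion by the release
        last_t = t
    return releases

burst = [0.0, 0.01, 0.02, 0.03]  # a 100 Hz burst
print(tm_release(burst, U=0.1, tau_f=0.1, tau_rec=0.05))  # grows: facilitation
print(tm_release(burst, U=0.7, tau_f=0.02, tau_rec=0.5))  # shrinks: depression
```

With a low *U* the release fraction grows over a burst (facilitation), whereas a high *U* depletes the resources quickly and successive releases shrink (depression).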

We systematically varied *U* and *τ _{rec}* and measured *G _{max}* and *r _{opt}*. We found that several different combinations of *U* and *τ _{rec}* resulted in the same optimal distribution gain and rate. However, when we changed *U* and *τ _{rec}* while keeping *r _{opt}* fixed, *G _{max}* could no longer be kept constant, and vice versa. For instance, two different parameter sets gave the same *r _{opt}* (Fig. 4*A*) but different maximum gains, with the second parameter set giving *G _{max}* = 92% (Fig. 4*B*). Holding *U* fixed and choosing *τ _{rec}* to match different *r _{opt}* values showed that *G _{max}* consistently dropped for higher *U* (Fig. 4*C*,*D*).

### Extended Data Figure 4-1

Effects of the resource recovery time constant *τ _{rec}* and facilitation factor *U* on *G _{max}* and *r _{opt}* for facilitatory synapses. ***A***, Similar to Figure 4 but with *T _{s}* = 100 ms. ***B***, Similar to Figure 4 but with *T _{s}* = 300 ms. Both panels reproduce the main findings of Figure 4, showing that a given optimal encoding rate can be matched by different combinations of synaptic parameters, but resulting in different gains. Download Figure 4-1, EPS file.

These results indicate that, in terms of maximum gain *G _{max}*, the fine tuning of intracellular mechanisms that work to steadily increase a low initial release probability might be more important than fast vesicle replenishment mechanisms. This remains true for larger *T _{s}* (Extended Data Fig. 4-1).

In summary, our results show that a set of presynaptic STP parameters generates a gain surface *G* that, in principle, could be tuned to match presynaptic population activity characteristics. The optimal rate and the maximum gain are independent of the stimulus intensity for a low signal-to-noise ratio, with facilitatory synapses yielding high gains for sparse distributions while depressing synapses avoid negative gains only with dense distributions. For low basal activity (*r _{bas}* = 0.5 Hz) and short integration window (*T _{s}* = 40 ms) conditions, the parameter *U* is the principal determinant of the optimal distribution. Furthermore, a lower *U* yields higher gains than a lower *τ _{rec}* when the optimal encoding rate is kept constant.

### Feedforward inhibition (FFI) and heterogeneous STP

In the above, we ignored the fact that presynaptic STP can be target dependent (Markram et al., 1998; Reyes et al., 1998; Rozov et al., 2001; Sun et al., 2005; Pelkey and McBain, 2007; Bao et al., 2010; Blackman et al., 2013; Larsen and Sjöström, 2015; Éltes et al., 2017), and the spike trains coming from the same axon can be modulated by different short-term dynamics at different synapses. In the following, we describe the effects of such heterogeneity in a FF-EI motif (Fig. 1*C*), a ubiquitous circuit motif across the brain (Klyachko and Stevens, 2006; Dean et al., 2009; Isaacson and Scanziani, 2011; Wilson et al., 2012; Jiang et al., 2015; Grangeray-Vilmint et al., 2018).

We extend our previous analysis to a scenario in which the presynaptic population makes synaptic contacts not only with a readout neuron, but also with the local inhibitory population which projects to the readout neuron, creating the FF-EI motif. Both the readout neuron and the inhibitory group receive the same spike trains via two different types of synapses, *s*1 and *s*2 (Fig. 1*C*). Because the presynaptic population activity is the same for both synapses, the differences in gain (*G*) are governed by the STP properties of the two synapses. Figure 5*A* shows *G* for a facilitatory (*s*1, *U* = 0.1) and a depressing (*s*2, *U* = 0.7) synapse.

In the case of a FF-EI network, those two synapse types may be associated with the two branches, for example *s*1 to the feedforward excitation (FFE) branch (targeting a principal neuron) and *s*2 to the feedforward inhibition (FFI) branch (targeting local interneurons which eventually project to principal neurons; Fig. 5*B*, inset). In this arrangement, the combined gain is determined by the two branches together. We found that the combined gain of the FF-EI circuit also varied non-monotonically as a function of *N _{ext}* and peaked at a combined optimal *N _{ext}*, which corresponded to the combined optimal encoding rate (Fig. 5*B*,*C*). Note that the combined maximum gain of the FF-EI circuit is larger than the gain obtained via the FFE branch with facilitatory synapses alone (Fig. 2*C*). This substantial increase is a consequence of the strictly negative gain profile of the depressing branch. When the extra input was concentrated in few units (sparse coding), the depressing branch of the FF-EI drove the local inhibitory group with weaker strength than a scenario in which the same input was spread over many units (dense coding). Therefore, with a sparse distribution of the input, the readout neuron experienced stronger excitation from the FFE branch and weaker inhibition from the FFI branch.
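The branch-wise intuition can be checked numerically with a discrete TM-style synapse update (inlined here so the snippet is self-contained): pack a fixed spike budget into one channel (sparse) or spread it over several channels (dense) and compare the summed release. This is only an illustrative sketch with made-up parameters, not the paper's gain computation:

```python
import math

def total_release(spike_trains, U, tau_f, tau_rec):
    """Summed release (u * x per spike) over independent TM synapses."""
    total = 0.0
    for train in spike_trains:
        u, x, last = 0.0, 1.0, None
        for t in train:
            if last is not None:
                dt = t - last
                u *= math.exp(-dt / tau_f)                     # facilitation decays
                x = 1.0 - (1.0 - x) * math.exp(-dt / tau_rec)  # resources recover
            u += U * (1.0 - u)      # spike-triggered facilitation
            total += u * x          # released fraction
            x -= u * x              # depletion
            last = t
    return total

burst = [0.0, 0.01, 0.02, 0.03]   # 4 spikes in one channel (sparse)
spread = [[0.0]] * 4              # 1 spike in each of 4 channels (dense)

fac = dict(U=0.1, tau_f=0.1, tau_rec=0.05)  # facilitation-like parameters
dep = dict(U=0.7, tau_f=0.02, tau_rec=0.5)  # depression-like parameters

print(total_release([burst], **fac), total_release(spread, **fac))
print(total_release([burst], **dep), total_release(spread, **dep))
```

For the facilitation-like parameters the burst yields more total release than the spread input, while for the depression-like parameters the spread input wins; this is why a sparse stimulus simultaneously boosts the FFE branch and weakens the FFI branch.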

Similar to the behavior of facilitatory synapses, in the FF-EI network the combined optimal number of signal-carrying units increased linearly as a function of the extra input rate, maintaining a constant combined optimal encoding rate (Fig. 2*D*, top). We also observed that the combined optimal encoding rate was larger than the optimal rate of *s*1 alone, making the isolated gain of *s*1 suboptimal. However, this can be compensated by putting *s*2 into a very negative gain region (Fig. 5*D*, bottom, red dashed line), with a sparse distribution of the inputs. We show analytically that the combined optimal conditions are independent of the extra rate for a wide range of conditions (see Materials and Methods).

We extended this analysis to a large range of STP combinations by gradually changing the parameter sets of *s*1 and *s*2 from depressing to facilitatory (Fig. 5*E*). We found that the combined gain increased monotonically when we made the synapse *s*1 more facilitatory or when we made the synapse *s*2 more depressing. The anti-diagonal (where *s*1 and *s*2 have identical STP parameters) marked the region of zero gain; any point above it (*s*2 more facilitatory than *s*1) resulted in negative combined gains, whereas any point below it (*s*1 more facilitatory than *s*2) resulted in positive combined gains. As expected, if *s*1 is highly facilitatory and *s*2 highly depressing, the combined effect will be of very high gains, given that the presynaptic activity is optimally distributed.

### Effects of basal activity on the FF-EI network

Next, we investigated the effects of the stationary basal activity at the combined optimal conditions of a FF-EI network. We found that the optimal rate and optimal gain both decreased as *r _{bas}* was increased (Fig. 5*F*). Separation of the individual contributions of the *s*1 and *s*2 branches revealed that this decrease was primarily because of a reduction in the gain of facilitatory synapses (*s*1), whereas the strong negative gain of depressing synapses (*s*2) remained approximately unaltered. This suggests that a population of facilitatory synapses will lose most of its activity distribution-dependent gain as the basal firing rate is increased, whereas a population of depressing synapses can preserve this capability even at larger basal rates.

Thus, these results show that a FF-EI network with target-dependent STP can make the discrimination of sparse activity more robust than what could be achieved by the FFE alone. This requires the excitatory branch to be facilitatory while the synapses driving the inhibitory branch are depressing (i.e., placing *s*1 and *s*2 in the region below the anti-diagonal in Fig. 5*E*).

### Sparse code identification by a postsynaptic neuron model

The ability of STP to amplify the output of a presynaptic population would be functionally relevant only if this amplification is transferred to the postsynaptic side. We tested the postsynaptic effects of the STP-based modulation of the presynaptic activity distribution by simulating an I&F neuron model (Eq. 15) as a readout device for a FF-EI circuit (Fig. 6*A*). We simulated a presynaptic population with characteristics similar to the cerebellar molecular layer, a massively feedforward system with properties much like the ones we have described so far (Ito, 2006).

Specifically, the readout neuron received input from 160,000 presynaptic neurons. The presynaptic background activity was modeled as independent and homogeneous Poisson spike trains with average firing rate *r _{bas}*. In addition, a subset of the presynaptic neurons increased their firing rate, by a fixed fraction of *R _{bas}*, during a brief time window (*T _{s}*) to mimic an event-related activity. The extra presynaptic activity was either confined to a small set of presynaptic neurons (*N _{ext}* = *N _{opt}*, sparse) or distributed over a large number of neurons (*dense*). The excitatory synapses onto the readout neuron (*s*1) were facilitatory, and the STP parameters for each synapse were drawn from a Gaussian distribution. The FFI activity was modeled as a Poisson process whose firing rate (*λ _{i}*; Eq. 9) was linearly dependent on the excitatory input arriving through depressing synapses (*s*2), whose STP parameters for each synapse were also drawn from a Gaussian distribution. The maximum weights of each excitatory and inhibitory synapse were drawn from Gaussian distributions.
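For readers who want to experiment with the readout stage, a minimal conductance-based I&F neuron in the spirit of Eq. 15 can be sketched as below. All parameters and weights are illustrative placeholders, not the values used in the simulations, and the synapses here are static; in a full model the conductance increments would be scaled by the TM release variables of each synapse:

```python
def run_lif(exc_spikes, inh_spikes, T=0.2, dt=1e-4):
    """Minimal conductance-based leaky integrate-and-fire readout.
    exc_spikes / inh_spikes: presynaptic spike times in seconds.
    Returns (output_spike_times, voltage_trace)."""
    E_L, E_e, E_i = -70e-3, 0.0, -80e-3       # leak / exc / inh reversals (V)
    tau_m, tau_e, tau_i = 20e-3, 5e-3, 10e-3  # membrane and synaptic taus (s)
    v_th, v_reset = -50e-3, -60e-3
    w_e, w_i = 0.05, 0.1                      # conductance jumps, in leak units

    exc, inh = sorted(exc_spikes), sorted(inh_spikes)
    ie = ii = 0
    v, g_e, g_i = E_L, 0.0, 0.0
    out, trace = [], []
    for step in range(int(T / dt)):
        t = step * dt
        while ie < len(exc) and exc[ie] <= t:   # deliver excitatory spikes
            g_e += w_e; ie += 1
        while ii < len(inh) and inh[ii] <= t:   # deliver inhibitory spikes
            g_i += w_i; ii += 1
        g_e -= dt * g_e / tau_e                 # exponential synaptic decay
        g_i -= dt * g_i / tau_i
        dv = (-(v - E_L) + g_e * (E_e - v) + g_i * (E_i - v)) / tau_m
        v += dt * dv                            # forward-Euler membrane update
        if v >= v_th:
            out.append(t); v = v_reset          # spike and reset
        trace.append(v)
    return out, trace
```

Driving it with a strong aggregate excitatory train makes it fire, while without input the membrane stays at rest.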

The distribution of the input had a noticeable effect on the output of the target neuron, as shown by the peristimulus time histogram (Fig. 6*B*). While the dense distribution elicited transients at the beginning and end of the stimulus period, because of the slower time constant of inhibition, the sparse code elicited a sustained elevated firing rate response throughout the stimulus period. The stimulus-induced membrane potential responses for the two types of input patterns (dense and sparse) were also similar to the firing rate responses (Fig. 6*C*). By interchangeably setting *s*1 and *s*2 to static, we identified that both branches contributed significantly to keeping the mean membrane potential high in the presence of extra sparse input.

The contribution of each branch becomes clear from the average change in the total excitatory and inhibitory conductances of the readout neuron. When both synapses were dynamic and the stimulus was sparse (Fig. 6*D*, leftmost), the average excitation was larger (because of synaptic facilitation) and the average inhibition was lower (because of synaptic depression) than the average changes caused by a stimulus of the same intensity but with dense distribution (Fig. 6*D*, rightmost). Note how, with dynamic synapses and a dense distribution of the stimulus, the conductance changes matched the expected change for static synapses (dashed line). When we kept the stimulus distribution sparse but interchangeably set *s*1 and *s*2 to static, the conductance trace related to the static branch reached the same value as for the dense distribution and the system was left with the gain produced at the dynamic branch. Dense distributions, therefore, do not exploit the STP nonlinearities and the synapses behave approximately as static, as predicted.

Next, we systematically changed *N _{ext}* as a percentage of *N _{opt}* (*N _{ext}* = 1%, 10%, 25%, 50%, 100%, 200%, 400%, 1000% of *N _{opt}*, black circles in Fig. 6*E*) and found that both the mean membrane potential and the average spike count during the stimulus period followed profiles that closely matched the predicted *G ^{com}* curve (Fig. 6*E*). This result confirms that the modulation of the *PRR* from the presynaptic population is faithfully translated into postsynaptic variables (gain estimated at the presynaptic side, and membrane potential and spike rate measured on the postsynaptic neuron side). Furthermore, this result also highlights the robustness of this mechanism: even with considerable deviations from the optimal encoding distribution (*N _{ext}* = 50% or *N _{ext}* = 200% of *N _{opt}*, marked as the first black points to the left and right of *N _{ext}* = *N _{opt}*), the evoked responses remained reasonably close to the optimal.

_{opt}To further assess how individual realizations of the sparse input could be distinguished from a dense input of the same intensity, we sampled the output spike count of the readout neuron for a period of 40 ms during the ongoing basal activity just before the stimulus and during the 40 ms stimulus period for both sparse and dense distributions (Fig. 6*F*). We used the Bhattacharyya coefficient (BC) as a measure of overlap between these sample distributions and 1–BC as a measure of difference (Fig. 6*G*). The dense input had almost complete overlap with the basal condition. On the other hand, the sparse input produced increasingly different response distributions from both the dense input and basal condition, with almost complete separation at
of *r _{bas}*.
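The overlap measure can be computed directly from spike-count samples; a minimal histogram-based sketch (the toy samples below are illustrative, not the simulated data):

```python
import math
from collections import Counter

def bhattacharyya(sample_a, sample_b):
    """BC = sum_i sqrt(p_i * q_i) over matched histogram bins;
    1 - BC serves as a difference measure (0 for identical histograms)."""
    pa, pb = Counter(sample_a), Counter(sample_b)
    na, nb = len(sample_a), len(sample_b)
    bins = set(pa) | set(pb)
    return sum(math.sqrt((pa[b] / na) * (pb[b] / nb)) for b in bins)

# toy spike counts in a 40 ms window: basal vs. stimulus-driven responses
basal  = [0, 1, 1, 2, 1, 0, 2, 1, 1, 0]
dense  = [1, 0, 1, 2, 1, 1, 2, 0, 1, 1]   # barely distinguishable from basal
sparse = [4, 5, 3, 4, 6, 5, 4, 3, 5, 4]   # well separated from basal

print(1 - bhattacharyya(basal, dense))    # near 0: large overlap
print(1 - bhattacharyya(basal, sparse))   # near 1: almost complete separation
```

Because `Counter` returns zero for missing bins, non-overlapping histograms contribute nothing to the sum and 1 − BC reaches 1 for fully separated distributions.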

Taken together, these results illustrate the potential role of dynamic synapses in the amplification of sparse signals at the presynaptic side (*Q ^{p}*, *G*), even when such signal intensity is just a small fraction of the ongoing basal activity and, therefore, likely to be buried in proportionally large noise fluctuations. In addition, for a dense distribution of the input, the system can preserve short periods (∼10 ms) of increased (decreased) spike probability right after stimulus onset (offset) because of delayed inhibition, which is a known characteristic of FF-EI motifs and might serve as an indication of global background rate changes.

### Continuous extra rate distribution

Thus far, we have considered a binary distribution of the extra rate: a fraction of presynaptic cells increased their rate by *r _{ext}* or not at all. Although some neural networks might roughly operate in this binary fashion, it is important to ask how such STP-driven gains would operate under continuous distributions, a perhaps more comprehensive way of describing the activity distribution of many neural populations. We therefore estimated the optimal conditions for the case in which the extra presynaptic activity follows a γ distribution (Eq. 19).

The variation of the shape parameter (*k*) changes the distribution from an exponential to a quasi-Gaussian. For each fixed shape, we control the mean (and therefore the sparsity; Eq. 20) of the distribution with the scale parameter (*θ*). For each set (*k*, *θ*) we calculate the expected gain (Eq. 21) yielded by a population of facilitatory synapses (*s*1; Fig. 7*A*, left), of depressing synapses (*s*2; Fig. 7*A*, center), and the combined gain (Fig. 7*A*, right). Nine particular parameter choices are shown in Figure 7*B*, where the central panels follow the choices that maximize the combined gain.

We found that, similar to the binary distribution case, the gain for facilitatory synapses followed a non-monotonic curve as a function of *θ* (for a fixed *k*), with negative values at high *θ* (overly sparse distribution), a single peak at the optimal *θ* choice and convergence to 0 at low *θ* (dense distribution). By contrast, depressing synapses showed negative gains, monotonically converging to zero at low *θ*. The combined gain reached high values when *s*1 synapses were in very positive and *s*2 synapses were in very negative operating regions (Fig. 7*C*; see Fig. 7*A*, gray line).
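The reason heavy-tailed (sparse) rate distributions favor facilitating synapses while hurting depressing ones can be illustrated with Jensen's inequality: for a fixed mean extra rate, a convex (supralinear, facilitation-like) transfer of rate into output benefits from a high-variance γ distribution, while a concave (saturating, depression-like) transfer loses. The quadratic and saturating functions below are illustrative stand-ins, not the actual *Q ^{s}* curves:

```python
import random

random.seed(1)

def expected_output(transfer, k, theta, n=200_000):
    """Monte Carlo estimate of E[transfer(r)] for r ~ Gamma(shape=k, scale=theta)."""
    return sum(transfer(random.gammavariate(k, theta)) for _ in range(n)) / n

convex = lambda r: r ** 2            # facilitation-like (supralinear)
concave = lambda r: r / (1.0 + r)    # depression-like (saturating)

mean_rate = 2.0
sparse = (1.0, mean_rate / 1.0)      # k = 1, exponential: few high-rate units
dense = (50.0, mean_rate / 50.0)     # k = 50, quasi-Gaussian: rates near mean

print(expected_output(convex, *sparse), expected_output(convex, *dense))
print(expected_output(concave, *sparse), expected_output(concave, *dense))
```

As the shape parameter grows, the γ distribution loses its skew and both transfers converge to the value at the mean, matching the quasi-Gaussian limit discussed in the text.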

Interestingly, not only were the gain magnitudes very similar to the ones obtained with binary distributions (compare colorbars of Figs. 5*A*,*B* and 7*A*), but also, with continuously distributed rates, the points of maximum gain were obtained at high mean rates (in relation to *r _{δ}*) and were, therefore, representative of sparse distributions of the population activity. For increasing values of *k*, the skewness of these distributions approached zero (i.e., became closer to a Gaussian) and the mean *r _{ext}* of the optimal *θ* approached the *r _{opt}* obtained with binary distributions. These results further corroborate the effects of the activity distribution-dependent gain modulation in presynaptic populations with STP.

### Continuous basal rate distribution

In the preceding analysis we assumed that *r _{bas}* is fixed and the same for all presynaptic units. A more natural scenario, however, would be to consider a continuous distribution of basal firing rates. We extend our analysis to account for this continuous *r _{bas}* scenario in a similar way to what we did for *r _{ext}*: we modeled the distribution of basal rates with a γ distribution, with varying shape and scale parameters. The variation of the shape parameter (*k*) changed the distribution from an exponential to a quasi-Gaussian (Fig. 8*A*). For each fixed shape, we controlled the mean of the distribution (Eq. 20) with the scale parameter.

We calculated the relevant expected values for each distribution shape (Fig. 8*B*) and found that these values converged to the values estimated with a fixed *r _{bas}* for higher *k* (quasi-Gaussian) and diverged for lower *k* (exponential). Using these traces, we calculated the gains (Fig. 8*C*) and again found that the results converged to the values estimated for fixed *r _{bas}*. For higher *r _{bas}*, however, the differences between fixed and continuously distributed *r _{bas}* were more pronounced.

_{bas}The divergence we observed can be explained by the probabilities that any chosen unit will have a rate below or above the mean *r _{bas}* value. As discussed above, higher

*r*will hinder the exploitation of STP nonlinearities and, therefore, reduce the possibility of higher gains. For exponential-like distributions, a higher proportion of the population has , which reduces this hindering effect, even if a smaller part of the population (for which ) gets more impaired. As the distribution gets closer to a Gaussian (increase in shape parameter), the proportions of the population with

_{bas}*r*below or above the become almost equal. In the limit of , the variance of the γ distribution will approximate zero (for our fixed mean) and the gains will converge to the values estimated with fixed

_{bas}*r*.

_{bas}It is worth noting that, for higher
, the spike rate variances are also higher and the estimates of gains with fixed *r _{bas}* become less accurate. This means that our simplified predictions of the hindering impact that higher

*r*have on the distribution-dependent gains will likely be an overestimate of the actual effects in real neuron populations. In other words, the distribution-dependent gains in facilitatory populations can be more resilient to higher

_{bas}*r*than what is predicted by a fixed

_{bas}*r*models.

_{bas}### From presynaptic gains to postsynaptic rate changes

The readout neuron in our simulations operates in a regime where the presynaptic gains are reliably translated into readout firing rate gains, which is equivalent to saying that the postsynaptic transfer function is independent of the input distribution. However, both the synapses and the readout neuron dendrites/soma can operate in a nonlinear regime and further transform the presynaptic gain described above. These nonlinearities are reflected in the transfer function of the neuron, i.e., the probability of an output spike given a certain input.

To identify in which circumstances changes in the postsynaptic transfer function may affect the transfer of presynaptic gains into output firing rate, let us consider a neuron with two possible transfer functions (Fig. 9, blue and red curves). The transfer function TF-1 is similar to the one we have considered previously in Figure 6. TF-2 shows a sharp change. Such a sharp change in the transfer function may arise, for example, because of NMDA receptors: when input is strong, postsynaptic depolarization can remove the Mg^{2+} block, creating a larger EPSP and increasing the spike probability (Du et al., 2017). Similarly, nonlinear local dendritic integration (Polsky et al., 2004), input correlations (de la Rocha and Parga, 2005), and voltage-dependent ion channels may also create input-dependent changes in the neuron transfer function. When the neuron transfer function can change between TF-1 and TF-2, the output firing rate is not only determined by the effective input (sparse > dense, for *s*1-facilitatory) but also by the qualitative differences between the two transfer functions.

Sparse input distributions will allocate extra incoming spikes as bursts, which could potentially cause extra accumulation of neurotransmitters (for *s*1-facilitatory) in specific dendritic sites, triggering supralinear integration (TF-2, red curve). If a dense input distribution does not attain the triggering of TF-2 and instead keeps operating under TF-1, the difference between presynaptic gains of sparse and dense distributions will be further increased (see the difference between points 1 and 4; Fig. 9).

In cases where both input distributions operate under the same TF the presynaptic gains will be reliably transferred into output rates (compare points 1 and 2 for TF-1 and points 3 and 4 for TF-2 in Fig. 9). Finally, when a dense distribution of inputs makes the output neuron operate under TF-2 and a sparse distribution brings the neuron to operate under TF-1, the presynaptic gains could potentially be overcome (Fig. 9, compare points 2 and 3).

### Linear approximation of *Q ^{s}*

We solve *Q ^{s}* numerically (Eq. 3) and show that it behaves linearly for a moderate range of rates in different STP regimes (Fig. 10). The approximation by a linear function allows *G _{max}* and *r _{opt}* to be independent of the stimulus intensity and population size (Eqs. 13, 14).

_{opt}To which extent is the linear approximation valid? To investigate this, we solve
for gradually increasing
(Fig. 10*A*) departing from a range of different basal levels
Hz. We then compare the slopes for each
to the slope for
and see how much they deviate from it (Fig. 10*B*,*D*). If, for a given *r _{bas}*, increasing

*r*would result in significant change in the regressed

_{δ}*S*, then

^{s}*G*would be dependent on the stimulus intensity

_{max}*r*. We also show the

_{δ}*R*

^{2}statistics to confirm the accuracy of the linear approximation (Fig. 10

*C*,

*E*).

As we observe, for low signal-to-basal ratios (
), there is a wide range of rates for which the approximation is good enough, with
dev
and *R*^{2} > 99.9%. Specially for low *r _{bas}*, the approximation is valid for the whole range of

*r*.
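The slope and *R*^{2} checks can be reproduced with an ordinary least-squares fit; the weakly saturating curve below is a toy stand-in for the numerically solved *Q ^{s}*, used only to show the mechanics of the test:

```python
def linear_fit_stats(xs, ys):
    """Ordinary least-squares slope and R^2 for y ≈ a + S*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    S = sxy / sxx
    a = my - S * mx
    ss_res = sum((y - (a + S * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return S, 1.0 - ss_res / ss_tot

# toy stand-in for Q^s(r): a weakly saturating curve (not the paper's model)
rates = [0.5 + 0.19 * i for i in range(51)]   # 0.5 ... 10 Hz
q = [r / (1.0 + 0.002 * r) for r in rates]

S, r2 = linear_fit_stats(rates, q)
print(S, r2)   # R^2 very close to 1: the linear approximation holds here
```

An *R*^{2} close to 1 over the rate range of interest is what licenses treating *Q ^{s}* as linear, and comparing fitted slopes across basal levels quantifies the deviation.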

## Discussion

Our results suggest how the activity distribution of a presynaptic population can exploit the nonlinearities of short-term synaptic plasticity and, with that, demonstrate the theoretical potential of synaptic dynamics to endow a postsynaptic target with the ability to discriminate between weak signals and background activity fluctuations of the same amplitude. Such a mechanism has the advantage of being built into the synapses themselves, requiring neither further recurrent computation nor any sort of supervised learning. This feature is likely to be present in different brain regions, e.g., the cerebellum and the hippocampus, and might have critical implications for general information processing in the brain.

### Relevance to specific brain circuits

We have shown that STP can enhance the effective input when (1) the stimulus is sparse and temporally bursty and (2) FFE synapses onto the principal cells are facilitatory while FFE synapses onto local fast-spiking inhibitory interneurons are depressing. These two conditions are fulfilled in several brain regions. Sparse coding provides many advantages for neural representations (Babadi and Sompolinsky, 2014) and associative learning (Litwin-Kumar et al., 2017). As discussed in the following, a number of experimental studies provide support for sparse coding in several brain regions such as the neocortex, cerebellum, and hippocampus.

In the cerebellum, glomeruli in the granular layer actively sparsify the multimodal input from mossy fibers into relatively few simultaneously bursting PFs (Billings et al., 2014) projecting to Purkinje cells (PuC). A single PuC might sample from hundreds of thousands of PFs (Tyrrell and Willshaw, 1992; Ito, 2006). In behaving animals, PFs present two stereotypical activity patterns: a noisy basal state with rates lower than 1 Hz during long periods, interleaved by short-duration, high-frequency bursts carrying sensory-motor information (Chadderton et al., 2004; Jörntell and Ekerot, 2006; van Beugen et al., 2013). Given the large number of PFs impinging onto a PuC, the fluctuations in basal rate are as big as the event-related high-frequency bursts. As our analysis shows, if PF synapses were static, the PuC would not be able to discriminate between high-frequency bursts and background fluctuations. However, PF synapses show short-term facilitation when targeting PuCs and short-term depression when targeting basket cells (Atluri and Regehr, 1996; Bao et al., 2010; Blackman et al., 2013; Grangeray-Vilmint et al., 2018). Basket cells provide strong, phasic somatic inhibition to PuCs (Jörntell et al., 2010). This circuit motif closely matches the FF-EI circuit investigated in this work (Fig. 6). Based on these similarities, we argue that one of the functional implications of the specific properties of STP is to enable the PuC to discriminate between information encoded in high-frequency bursts and background activity fluctuations.

In the neocortex, the population code in layer 2/3 of the somatosensory (De Kock and Sakmann, 2008) and visual cortex of rats (Greenberg et al., 2008) and mice (Rochefort et al., 2009) is believed to be sparse (Petersen and Crochet, 2013), with short-lived bursts of high firing rates occurring over low-rate spontaneous activity. Additionally, it has been recently found that pyramidal cells in layer 2/3 of the mouse somatosensory cortex show short-term facilitation when targeting cells in layers 2/3 and 5 (Lefort and Petersen, 2017). The receptive field properties in the visual cortex are also consistent with a sparse code (Olshausen and Field, 1996). These characteristics suggest that the mechanism to discriminate between weak signals and background fluctuations may also be present in the neocortex. It is believed that such sparse representation at superficial cortical layers indicates strong stimulus selectivity (Petersen and Crochet, 2013), in which case the transient gain, provided by the target-dependent STP configuration of local pyramidal neurons, would be a suitable property for interlayer communication.

In the hippocampus, the Schaffer collaterals bringing signals from CA3 to CA1 operate under low basal firing rates with evoked bursts of high-frequency activity during short periods of time (Schultz and Rolls, 1999). The synapses from pyramidal cells in CA3 to pyramidal cells in CA1 are facilitatory and provide this pathway with extra gain control (Klyachko and Stevens, 2006). Simultaneously, Schaffer collateral synapses onto CA1 stratum radiatum interneurons show a larger release probability than those onto pyramidal neurons (Sun et al., 2005). Therefore, it is likely that this STP-based stimulus/noise discrimination mechanism is also used to improve the transmission of sequential activity from CA3 to CA1.

As we have pointed out above, the STP configurations in the neocortex, hippocampus, and cerebellum are consistent with the configuration that enables neural networks to take advantage of sparse coding. However, it is important to note that facilitatory excitatory inputs to other inhibitory cells also exist in the aforementioned circuits. These facilitatory inputs mostly target interneurons that form synapses on distal dendrites. The presence of a facilitatory excitatory drive to these classes of inhibitory neurons is, however, unlikely to counteract the distribution-dependent transient gains, because they produce weaker, slower, and persistent dendritic inhibition. Consistent with this idea, only parvalbumin-expressing neurons (which synapse on the soma), but not somatostatin-expressing neurons (which synapse on distal dendrites), modulate stimulus response gain (Wilson et al., 2012).

The initial release probability is the most distinguishable STP parameter between Schaffer collateral synapses onto CA1 pyramidal cells versus CA1 interneurons (Sun et al., 2005). In line with that, our approach predicts that facilitatory mechanisms that steadily increase a low initial release probability during a fast sequence of spikes (low *U*) will have a greater impact on the optimal *OD* and gain amplitude than mechanisms for fast replenishment of resources (low *τ _{rec}*). However, the speed of recovery has been shown to be itself an activity-dependent feature (Fuhrmann et al., 2004; Crowley et al., 2007; Valera et al., 2012; Doussau et al., 2017) and this could in principle increase the relevance of *τ _{rec}*.

The facilitatory or depressing nature of STP depends on the postsynaptic neuron type (Markram et al., 1998; Reyes et al., 1998; Rozov et al., 2001; Sun et al., 2005; Pelkey and McBain, 2007; Bao et al., 2010; Blackman et al., 2013; Larsen and Sjöström, 2015; Éltes et al., 2017). Target-dependent STP is a strong indication that such short-lived dynamics are relevant for specific types of information processing in the brain (Middleton et al., 2011; Naud and Sprekeler, 2018). Here, we predict that, when accompanied by the specific arrangements of target-dependent STP found experimentally in different brain regions, disynaptic inhibition could further increase the gain of sparse over dense distributions and make it robust even at higher basal activity, when the gain at facilitatory excitation decreases substantially.

Disynaptic inhibition following excitation is a common motif throughout the brain, and different classes of inhibitory neurons are believed to serve distinct computations within their local circuits (Wilson et al., 2012; Jiang et al., 2015). Despite a wide diversity of inhibitory cell types, a classification of FFI into two main types, perisomatic and dendritic targeting, seems to be consistent with findings throughout the central nervous system. A remarkable attribute of this configuration is the consistency of the short-term dynamics of excitatory synapses across local circuits: depressing onto perisomatic and facilitating onto dendritic interneurons (Sun et al., 2005; Bao et al., 2010; Blackman et al., 2013; Éltes et al., 2017).

Disynaptic inhibition has been implicated in controlling the precision of a postsynaptic neuron’s response to brief stimulation in the cerebellum (Mittmann et al., 2005; Ito, 2014) and hippocampus (Pouille and Scanziani, 2001). Additionally, the combination of disynaptic inhibition with target-dependent STP has been recently associated with the ability of networks to decode multiplexed neural signals in the cortex (Naud and Sprekeler, 2018). In line with these, our results show a bimodal profile of the readout neuron response to sparse or dense input code. We also demonstrate that, coexisting with the sustained gain during sparse code transmission, in a dense coding scenario, the system produces shorter periods (∼10 ms) of increased (decreased) spike probability right after stimulus onset (offset; Fig. 6*B*, gray line). This results from inhibitory conductances (GABA) which are slower than the excitatory conductances (AMPA). This very short period of firing rate modulation might work as an indication of a widespread basal rate change in the presynaptic population.

### Relationship with previous work

Historically, STP has been prominently explored as a frequency filter that renders an individual neuron a low-pass filter (when synapses are depressing) or a high-pass filter (when synapses are facilitating; Markram et al., 1998; Dittman et al., 2000; Abbott and Regehr, 2004). It has been suggested that under some conditions STD can also interact with subthreshold oscillations to modulate the gain of neurons (Latorre et al., 2016). With STP, the synaptic strength depends on the recent history of incoming spikes at a particular synapse. This automatically makes downstream neurons more sensitive to transient fluctuations in input spike trains. Most previous work has exploited this specific property for neural coding.

For instance, history dependence of STP means that the effect of serial correlations (that can be seen in the autocorrelogram of spike trains) and spike bursts in the presynaptic activity depends on whether the synapses express STF or STD. Synapses with STD reduce redundancy in the input spike train by reducing the PSPs of spikes that appear with a certain serial correlation or periodicity (Goldman et al., 2002). By contrast, when synapses express STF, they enhance the effect of serial correlations or spike bursts and the readout neuron can function as a burst detector (Lisman, 1997). In fact, both STF and STD can be combined to de-multiplex spike bursts from single spikes (Izhikevich et al., 2003; Middleton et al., 2011; Naud and Sprekeler, 2018). Thus, much emphasis has been put on understanding how STP can be used to extract information encoded in the pattern of spikes of a single input neuron.

Here, we extend this line of work and show how STP may affect the impact of a neuron ensemble on downstream neurons. Previous work has suggested that STP makes a neuron sensitive to transient rate changes. Given this property, when synapses show STD, input correlations can still modulate the neuron output over a wide range of firing rates (de la Rocha and Parga, 2005). Our work reveals a new consequence of the same effect: we show that STP endows neurons with an input distribution-dependent gain, through which sparse-bursty codes can have a stronger downstream impact than dense codes of the same intensity. Furthermore, we investigate the relative importance of different STP parameters and baseline firing rates for these gains. This novel feature could be a highly valuable asset in low signal-to-noise ratio conditions. Moreover, our results also show how synapses can impose further constraints on the neural code.
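As a rough numerical illustration of this distribution dependence, the sketch below drives Tsodyks-Markram (TM) synapses with regular spike trains and compares the summed release for a sparse versus a dense allocation of the same number of extra spikes. All parameter values (U, τ_f, τ_rec, rates, population size) are illustrative, not those used in the paper.

```python
import numpy as np

def tm_release_per_spike(rate, n_spikes=200, U=0.1, tau_f=0.5, tau_rec=0.05):
    """Steady-state resource release per spike (u*x in the TM model)
    for a synapse driven by a regular spike train at `rate` Hz."""
    if rate <= 0.0:
        return 0.0
    isi = 1.0 / rate
    u, x, rel = U, 1.0, 0.0
    for _ in range(n_spikes):
        u += U * (1.0 - u)                            # facilitation jump
        rel = u * x                                   # fraction released
        x -= rel                                      # depletion
        u = U + (u - U) * np.exp(-isi / tau_f)        # u decays back to U
        x = 1.0 - (1.0 - x) * np.exp(-isi / tau_rec)  # x recovers toward 1
    return rel

def population_release(n_total, n_sig, r0, extra, **stp):
    """Population release rate when n_sig of n_total units share `extra`
    additional spikes/s on top of a common baseline rate r0."""
    r_sig = r0 + extra / n_sig
    prr = n_sig * r_sig * tm_release_per_spike(r_sig, **stp)
    prr += (n_total - n_sig) * r0 * tm_release_per_spike(r0, **stp)
    return prr

facil = dict(U=0.05, tau_f=0.5, tau_rec=0.02)  # facilitation-dominated
deprs = dict(U=0.5, tau_f=0.01, tau_rec=0.5)   # depression-dominated

sparse_f = population_release(1000, 10, 2.0, 500.0, **facil)
dense_f  = population_release(1000, 1000, 2.0, 500.0, **facil)
sparse_d = population_release(1000, 10, 2.0, 500.0, **deprs)
dense_d  = population_release(1000, 1000, 2.0, 500.0, **deprs)
# With facilitating synapses the sparse allocation drives a larger total
# release; with depressing synapses the dense allocation wins.
```

With these toy parameters the sparse code outperforms the dense one under facilitation and the ordering reverses under depression, matching the qualitative result described above.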

### Experimental verification of model predictions

Experimentally, these results can be tested by measuring the distribution of evoked firing rates of the neurons and the STP properties of the synapses in the same brain area. Recent technological advances in stimulation systems, allowing submillisecond manipulation of spiking activity in single and multiple cells, might soon provide the means for fine control of population spike codes in intact tissues (Shemesh et al., 2017). These, together with refined methods for single-cell resolution imaging of entire populations (Xu et al., 2017; Weisenburger and Vaziri, 2018), may also allow for scrutinizing the extent to which the proposed synaptic mechanisms for distribution-dependent gain are present in neural networks. Our prediction about the role of background activity in determining the gain of sparse or dense codes can be tested by changing the overall background activity using chemogenetic techniques.

### Limitations and possible extensions

Here, we made several simplifications and assumptions to reveal that STP of synapses has important consequences for neural coding. Relaxing each of these simplifications and assumptions may affect our conclusions in certain conditions and should be investigated in further studies. In the following we briefly discuss a few crucial simplifications and how they might affect our results.

Our analyses considered the presynaptic activity to be comprised of independent Poisson processes; that is, whenever we chose a set of *N _{ext}* presynaptic units to increase their firing rates, we chose them randomly. Because STP is a synapse-specific property, input cross-correlation will not affect *PRR* and *Q* and, therefore, the presynaptic gain. However, it is well known that input correlation can change the gain of a neuron (Kuhn et al., 2003; de la Rocha and Parga, 2005).

It is conceivable that in some conditions, input correlations can potentially neutralize the advantage of sparse codes over dense codes. The readout neuron fluctuations (and therefore, its output firing rate) depend on the input correlation. For the same amount of pairwise correlation, the size of fluctuations in the readout neuron is directly proportional to the number of signal-carrying units (*N _{ext}*). A larger *N _{ext}* (dense distribution) will elicit larger fluctuations than a smaller *N _{ext}* (sparse distribution), because for a larger *N _{ext}* more input spikes can occur together in the same time bin. Thus, input correlations may amplify the downstream impact of dense input distributions more than that of sparse input distributions. The size of this effect depends on the number of inputs (*N _{ext}*) and the amount of correlations (both pairwise and higher-order). However, because cortical activity is weakly correlated (Ecker et al., 2010), such an effect of correlation may not be enough to completely neutralize the advantage of sparse distributions over dense distributions.

We also did not study the effect of the spatial location of synapses in transferring the advantage of sparse codes over dense codes to the readout neuron. There are at least two ways in which the dendritic locations of the synapses may weaken the advantage of a sparse input distribution over a dense one. First, when synaptic strength decreases as a function of distance from the soma: it is possible that in the sparse case the signal-carrying synapses are located far from the soma, while for a dense input distribution at least some inputs will be closer to the soma. Therefore, even if on the presynaptic side a sparse input distribution generates stronger output than a dense one, its effect on the postsynaptic neuron may be weakened by the weaker synapses. However, even in this case, because of dendritic nonlinearities and Na^{+}/Ca^{2+} spikes (Larkum et al., 2009), a distally located sparse distribution may still elicit a stronger response than a proximally located dense input distribution. Second, the effect of synapses on certain dendrites can be cancelled by strategically placed inhibitory synapses (Gidon and Segev, 2012). It is possible that a sparse distribution (because of its fewer synapses) may be cancelled or weakened by such strategically placed inhibition; this inhibition will be less effective against dense input distributions, in which many more synapses carry the input information. Thus, for sparse input distributions, their location on the neuron may be an important factor. A proper treatment of this question requires knowledge of, e.g., neuron morphology, the distribution of inhibition, and dendritic nonlinearities, and should be addressed in a separate study.

We also assumed that all the synaptic weights are sampled from the same Gaussian distribution, as our goal was to consider a naive situation in which weights have not been “trained” for any specific task. Having different synaptic weight distributions may affect the value of the gains, especially when synaptic weights and input are associated (stimulus-specific tuning). Such different distributions may arise because of supervised or unsupervised learning. A systematic study of a network with stimulus-specific tuning of synaptic weights raises several pertinent questions and should be investigated in a separate study.

The transient enhancement or depression of synaptic efficacy by presynaptic mechanisms comprises many independent processes (Zucker and Regehr, 2002). The TM model is a tractable and intuitive way to account for the two phenomena of interest, but this parsimony comes at the cost of biophysical simplifications. For example, it assumes that the space of available resources is a continuum (0 < *x* < 1), as opposed to the known discrete nature of transmitter-carrying vesicles. However, we argue that when modeling a large number of simultaneously active synapses, the variable of interest (population *PRR*) can be approximated by a continuous variable. The nonuniform amount of transmitter per vesicle might further justify this assumption.
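The continuum assumption can be checked with a toy calculation: if each vesicle releases independently (a simple binomial release scheme; the vesicle count and release probability below are hypothetical), the release fraction summed over many active synapses concentrates tightly around its mean, which is what the continuous variable *x* tracks.

```python
import numpy as np

rng = np.random.default_rng(1)

def pop_release_fraction(n_syn, n_ves=5, p_rel=0.3):
    """Fraction of the population's vesicle pool released on one spike
    when every vesicle releases independently with probability p_rel."""
    released = rng.binomial(n_ves, p_rel, size=n_syn).sum()
    return released / (n_syn * n_ves)

# Trial-to-trial spread of the population release fraction for few vs.
# many simultaneously active synapses (500 repetitions each).
small = np.array([pop_release_fraction(10) for _ in range(500)])
large = np.array([pop_release_fraction(1000) for _ in range(500)])
# The spread shrinks as ~1/sqrt(n_syn * n_ves): for many active synapses
# the discrete release is well captured by a continuous variable.
```

The mean release fraction is the same in both cases; only its fluctuations differ, supporting the mean-field treatment of the population *PRR*.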

Detailed STP models that try to account for specific intracellular mechanisms (Dittman et al., 2000), and stochasticity of the release process (Sun et al., 2005; Kandaswamy et al., 2010) have been proposed in the literature. We argue that, with more complex models of STP, our results might change quantitatively but the qualitative outcome of our analysis would remain: that presynaptic short-term facilitation (depression) yields a substantial positive (negative) gain to sparse over dense population codes. Nevertheless, it would be interesting to see how the gain and optimal rate predictions may be shaped by more detailed models.

Our analyses do not account for use-dependent recovery time, changes in the readily releasable pool size (Kaeser and Regehr, 2017), or heterogeneity of vesicle properties. The effects of postsynaptic receptor desensitization and of neurotransmitter release inhibition by retrograde messengers (Brown et al., 2003) are likely to decrease the estimated gain by counteracting facilitation. Another interesting extension would be to investigate the effects of STP heterogeneity at compartment-dependent inputs using multicompartment neuron models (Vetter et al., 2001; Grillo et al., 2018).

If the same patterns of bursts tend to happen repeatedly (e.g., PFs in the cerebellum during a continuously repetitive movement), there might be an optimal interburst interval (*IBI ^{opt}*): if bursts arrive faster than *IBI ^{opt}*, the signal would be compromised (because of slow vesicle recovery), and if bursts are separated by intervals longer than *IBI ^{opt}*, no extra gain will occur. Experimental evidence points to the importance of resonance in the θ band oscillations (∼*IBI*) for cortico-cerebellar drive (Gandolfi et al., 2013; Chen et al., 2016) and for the hippocampus (Buzsáki, 2002). In these cases, the slower interaction between different pools of vesicles (Rizzoli and Betz, 2005) is likely to play a role in information transfer. Augmentation, a form of transient synaptic enhancement that can last for seconds, is also likely to play a role (Kandaswamy et al., 2010; Deng and Klyachko, 2011).
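The intuition behind an optimal interburst interval can be sketched with the same TM dynamics: repeated bursts interact through the slow decay of facilitation and the finite recovery of resources. In the toy sweep below (all parameters illustrative), very short interburst intervals deplete resources, an intermediate interval yields extra gain, and long intervals give none.

```python
import numpy as np

def release_per_burst(ibi, n_bursts=40, spikes=5, isi=0.005,
                      U=0.05, tau_f=0.5, tau_rec=0.05):
    """TM-model resources released by the last of a train of bursts
    repeated every `ibi` seconds (burst onset to onset)."""
    u, x = U, 1.0
    total = 0.0
    for _ in range(n_bursts):
        total = 0.0                          # keep only the latest burst
        for _ in range(spikes):
            u += U * (1.0 - u)               # facilitation jump at spike
            rel = u * x
            x -= rel
            total += rel
            u = U + (u - U) * np.exp(-isi / tau_f)
            x = 1.0 - (1.0 - x) * np.exp(-isi / tau_rec)
        gap = ibi - spikes * isi             # silent interval to next burst
        u = U + (u - U) * np.exp(-gap / tau_f)
        x = 1.0 - (1.0 - x) * np.exp(-gap / tau_rec)
    return total

isolated = release_per_burst(100.0)          # bursts too far apart to interact
gain = {ibi: release_per_burst(ibi) / isolated for ibi in (0.03, 0.2, 2.0)}
# Too-short IBIs deplete the pool (gain < 1); an intermediate IBI lets x
# recover while u stays facilitated (gain > 1); long IBIs carry nothing over.
```

Because τ_f exceeds τ_rec in this sketch, there is a window where facilitation persists between bursts while resources have already recovered, producing the non-monotonic gain that defines *IBI ^{opt}*.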

## Acknowledgments

Acknowledgements: We thank Dr. Gilad Silberberg, Dr. Erik Fransén, Dr. Philippe Isope, and Martino Sindaci for helpful suggestions and feedback.

## Footnotes

The authors declare no competing financial interests.

This work was supported in part by the EU Erasmus Mundus Joint Doctorate Program EUROSPIN, The International Graduate Academy (IGA) of the Freiburg Research Services (L.T.), and the Swedish Research Council (Research Project Grant, StratNeuro, India-Sweden collaboration grants; A.K.).

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.