Research Article | Open Source Tools and Methods, Novel Tools and Methods

GhostiPy: An Efficient Signal Processing and Spectral Analysis Toolbox for Large Data

Joshua P. Chu and Caleb T. Kemere
eNeuro 23 September 2021, 8 (6) ENEURO.0202-21.2021; DOI: https://doi.org/10.1523/ENEURO.0202-21.2021
Joshua P. Chu
Department of Electrical and Computer Engineering, Rice University, Houston, Texas 77251-1892
Caleb T. Kemere
Department of Electrical and Computer Engineering, Rice University, Houston, Texas 77251-1892

Abstract

Recent technological advances have enabled neural recordings consisting of hundreds to thousands of channels. As the pace of these developments continues to grow rapidly, it is imperative to have fast, flexible tools supporting the analysis of neural data gathered by such large-scale modalities. Here we introduce GhostiPy (general hub of spectral techniques in Python), a Python open source software toolbox implementing various signal processing and spectral analyses including optimal digital filters and time–frequency transforms. GhostiPy prioritizes performance and efficiency by using parallelized, blocked algorithms. As a result, it is able to outperform commercial software in both time and space complexity for high-channel count data and can handle out-of-core computation in a user-friendly manner. Overall, our software suite reduces frequently encountered bottlenecks in the experimental pipeline, and we believe this toolset will enhance both the portability and scalability of neural data analysis.

  • local field potential
  • oscillations
  • signal processing
  • spectral analysis

Significance Statement

Because of technological innovation, the size of neural recordings has increased dramatically, but downstream analysis code is often not optimized to handle such large scales of data efficiently. Here we have developed GhostiPy, an open source Python package prioritizing performance and efficiency for large data in the context of typical spectral analysis and signal processing algorithms. Users can control hardware resource consumption (e.g., system memory) by setting the level of parallelization and enabling out-of-core processing. Thus, algorithms can be run on a variety of hardware, from laptops to dedicated computer servers. Overall, GhostiPy improves experimental throughput by increasing the portability of analyses.

Introduction

Advancements in neural recording technologies have enabled the collection of large data in both space (high density/channel count) and time (continuous recordings). During subsequent analysis, the scale of the data induces certain challenges that may manifest as the following scenarios: (1) analysis code takes a long time to complete (high time complexity); and (2) code is unable to complete because of insufficient memory on the hardware (high space complexity). Moreover, the scientist may have difficulty finding existing tools that address both 1 and 2 and implement the desired analyses.

Although a potential remedy is to simply upgrade the hardware, it is not an acceptable solution for scientists desiring portability, an important component that improves reproducibility and replicability. In more portable systems, hardware resources may be limited (e.g., using a laptop at the airport). We thus took an alternate approach by efficiently implementing analyses that would trivially scale for different hardware configurations. Our solution is GhostiPy (general hub of spectral techniques in Python), a free and open source Python toolbox that attempts to optimize both time and space complexity in the context of spectral analyses. Methods include linear filtering, signal envelope extraction, and spectrogram estimation, implemented according to best practices. GhostiPy is designed for general purpose usage; while well suited for high-density continuous neural data, it works with any array-like data object.

In this article, we first describe the software design principles of GhostiPy to increase efficiency. We then elaborate on featured methods along with code samples illustrating the user friendliness of the software. Finally, we benchmark our software against a comparable implementation, and we discuss strategies for working under an out-of-core (when data cannot fit into system memory) processing context.

Materials and Methods

An overview of implemented methods can be found in Table 1. Excluding out-of-core support, it is possible to use multiple different packages (OverLordGoldDragon, https://github.com/OverLordGoldDragon/ssqueezepy/; Bokil et al., 2010; Oostenveld et al., 2011; Gramfort et al., 2013; Yegenoglu et al., 2015; Lee et al., 2019; Tadel et al., 2019; Virtanen et al., 2020) to achieve the same functionality. However, the mix-and-match approach can reduce user friendliness since application programming interfaces (APIs) differ across packages and dependency management is more difficult. We believe our unified package provides an attractive solution to this challenge. Table 2 documents the methods currently available in GhostiPy.

Table 1

Features implemented by GhostiPy compared with existing software

Table 2

Available methods in GhostiPy

Software design considerations

As previously noted, successful completion of analyses may be hampered by long computation times or lack of system memory. Specifically, algorithmic time and space complexity is a major determinant for the efficiency and performance of a software method. In general, it is difficult to optimize both simultaneously. For example, time complexity may be reduced by increasing hardware parallelization, at the expense of higher space complexity (memory requirements). While we sought to lower both kinds of complexity compared with existing solutions, we gave space complexity a higher priority. Stated concretely, slow computation time is primarily a nuisance, but failure to complete an analysis because of insufficient memory is catastrophic.

Our design decision to prioritize space complexity was particularly critical because it directly influenced which backend library we chose for the fast Fourier transform (FFT), an operation used in the majority of the GhostiPy methods. While investigating the different options, we saw that numpy currently uses the pocketfft backend (https://gitlab.mpcdf.mpg.de/mtr/pocketfft; Van Der Walt et al., 2011). When accelerated with the Intel MKL library, it can be slightly faster than FFTW (https://software.intel.com/content/www/us/en/develop/tools/math-kernel-library/benchmarks.html). However, we have found FFTW (Frigo and Johnson, 1998, 2005) to be superior for memory management and better suited for FFTs of arbitrary length, including prime and odd numbers. An additional benefit of FFTW was its multithreaded capabilities (Fig. 1). We therefore selected FFTW as our FFT backend.
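To give a concrete sense of what the multithreading comparison in Figure 1 involves, the sketch below contrasts numpy's FFT with FFTW accessed through the pyfftw drop-in interface. The pyfftw calls, thread count, and array size are our illustration, not necessarily how GhostiPy invokes FFTW internally.

```python
import numpy as np
import pyfftw

x = np.random.standard_normal(2 ** 24)

# numpy: pocketfft backend, single-threaded
X_np = np.fft.rfft(x)

# FFTW via pyfftw's numpy-compatible interface, planned once and run on 8 threads
pyfftw.interfaces.cache.enable()
X_fftw = pyfftw.interfaces.numpy_fft.rfft(x, threads=8)

# Both backends compute the same transform (up to floating-point error)
np.testing.assert_allclose(X_np, X_fftw, rtol=1e-9, atol=1e-6)
```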

Figure 1.

GhostiPy uses fftw rather than numpy for its FFT backend. a, b, Note that when fftw is multithreaded, the computation time can be reduced (a) without an increase in memory use (b).

To lower space complexity, we used blocked algorithms, including overlap-save convolution, which is not offered in any of the standard Python numerical computing libraries such as numpy or scipy (Van Der Walt et al., 2011; Virtanen et al., 2020). This approach enabled us to process very large data that could not fit in memory (also known as out-of-core processing). Throughout our code, we also used other strategies such as in-place operations.
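The essence of the blocked approach is that convolution with a long signal can be split into fixed-size FFTs whose outputs are stitched together. Below is a minimal overlap-save sketch in plain numpy (our simplification; GhostiPy's implementation additionally handles multichannel data, delay correction, and on-disk arrays).

```python
import numpy as np

def overlap_save(x, h, block_len=8192):
    """Blocked FFT convolution; returns the first len(x) samples of the full
    linear convolution of x and h (i.e., the causal filtering output)."""
    M = len(h)
    assert block_len >= M, "block must be at least as long as the filter"
    L = block_len - M + 1                      # valid output samples per block
    H = np.fft.rfft(h, block_len)              # filter spectrum, reused for every block
    n_out = len(x) + M - 1
    y = np.empty(n_out)
    xp = np.concatenate([np.zeros(M - 1), x, np.zeros(block_len)])
    for start in range(0, n_out, L):
        X_block = np.fft.rfft(xp[start:start + block_len], block_len)
        yb = np.fft.irfft(X_block * H, block_len)
        n_keep = min(L, n_out - start)
        y[start:start + n_keep] = yb[M - 1:M - 1 + n_keep]   # discard circularly wrapped samples
    return y[:len(x)]

# Matches direct convolution on a small test case
x = np.random.standard_normal(100_000)
h = np.hanning(101) / np.hanning(101).sum()
np.testing.assert_allclose(overlap_save(x, h), np.convolve(x, h)[:len(x)], atol=1e-10)
```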

To lower the time complexity, we used efficient lengths of FFTs wherever possible, and we leveraged modern computing hardware by parallelizing our algorithms. For example, a wavelet transform can be trivially parallelized since the transform for each scale is not dependent on other scales.
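For instance, padding a transform to the next "fast" composite length avoids the poor scaling of awkward (e.g., prime) sizes. A small illustration using scipy's helper follows (GhostiPy itself sizes its FFTW transforms internally; this only shows the idea, and the array length is an arbitrary assumption).

```python
import numpy as np
import scipy.fft

x = np.random.standard_normal(1_000_003)       # a prime-length signal
n_fast = scipy.fft.next_fast_len(len(x))       # smallest "fast" composite length >= len(x)
X = scipy.fft.rfft(x, n_fast, workers=8)       # zero-padded transform, parallelized over 8 workers
```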

Finite impulse response filter design

GhostiPy provides classical signal processing capabilities such as filtering data using efficient overlap-save convolution. Filtering is a ubiquitous operation, but the filter must first be designed. While this step may appear somewhat trivial, filter design choices can make a significant difference to downstream results, affecting even whether theta–gamma phase–amplitude coupling is detected (Canolty et al., 2006; Dvorak and Fenton, 2014).

Existing packages such as scipy and MNE offer a variety of finite impulse response (FIR) filter design methods (Gramfort et al., 2013; Virtanen et al., 2020). However, some methods suffer from the following issues. (1) With the least-squares method, the solution may be a filter whose magnitude response is effectively zero throughout; this situation is more common when designing filters whose passband is low relative to the sampling rate. (2) With the Remez exchange method, the algorithm may simply fail to converge. (3) With the window method, the transition bands cannot be controlled exactly, and optimality cannot be defined, as it can be for the least-squares (L2 optimal) and Remez exchange (L1 optimal) methods.

Therefore, the GhostiPy filter design uses the method defined in the study by Burrus et al. (1992) for the following reasons: (1) it is simple to design, and the computational complexity is similar to that of a window method and can be implemented on embedded hardware if desired; (2) optimality can be defined, as it is optimal in the L2 sense; (3) transition bands can be defined exactly, and the steepness of the passband rolloff can be controlled by the spline power parameter; and (4) the filter impulse response can be defined analytically. Consequently, its computation does not suffer from the failure modes of the least-squares or Remez exchange methods, as those must solve systems of linear equations. In other words, the design process is reliable and stable.

This method designs a low-pass filter according to the following:

$$h(n) = \frac{\sin(\omega_0 n)}{\pi n}\left[\frac{\sin(\Delta n / p)}{\Delta n / p}\right]^{p} \quad (1)$$

$$\omega_0 = \frac{\omega_2 + \omega_1}{2}, \qquad \Delta = \omega_2 - \omega_1, \quad (2)$$

where ω1 and ω2 are radian frequencies defining the transition-band boundaries and p is the spline power parameter.

GhostiPy uses the low-pass filter defined in Equation 1 as a prototype to design more complicated filters. As a result, users can request filters with arbitrary magnitude response. An example is shown in Figure 2.

Figure 2.

FIR filter design. a, A theta-band filter designed for full bandwidth data. The specification of the transition bands allows for easy determination of critical frequencies. The −6 dB points are exactly the midpoints of the transition bands. b, Filters with arbitrary pass and stop bands may also be designed.
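For readers who want to see the prototype of Equation 1 in code, the sketch below evaluates the analytic impulse response directly in numpy. The tap count, transition-band edges, and helper name are illustrative assumptions, and this is not GhostiPy's filter design API.

```python
import numpy as np

def spline_lowpass_fir(numtaps, w1, w2, p=2):
    """Low-pass prototype of Eq. 1: a sinc at the band center omega_0 shaped by a
    spline transition term of power p. w1 and w2 are the transition-band edges in
    radians/sample (Eq. 2). Illustrative sketch only."""
    w0 = 0.5 * (w1 + w2)
    delta = w2 - w1
    n = np.arange(numtaps) - (numtaps - 1) / 2        # symmetric support -> linear phase
    h = (w0 / np.pi) * np.sinc(w0 * n / np.pi)        # sin(w0*n)/(pi*n), with n = 0 handled by sinc
    h *= np.sinc(delta * n / (p * np.pi)) ** p        # [sin(delta*n/p) / (delta*n/p)]^p
    return h

# Example: low-pass with a 20-30 Hz transition band at fs = 1250 Hz
fs = 1250.0
h = spline_lowpass_fir(401, w1=2 * np.pi * 20 / fs, w2=2 * np.pi * 30 / fs, p=2)
```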

Multitaper method

Users often wish to perform a spectral decomposition on a signal of interest. This can be accomplished by using the multitaper method (Thomson, 1982; Percival and Walden, 1993). The technique is well suited to reduce the variance of a spectrum estimate, which is particularly useful when working with noisy neural data. The spectrum estimate is obtained as an average of multiple statistically independent spectrum estimators for a discrete signal x[n], with sampling frequency f_s, as follows:

$$\hat{S}^{\mathrm{mt}}_{W}(k) = \frac{1}{L}\sum_{l=1}^{L}\hat{S}^{\mathrm{mt}}_{l,W}(k) \quad (3)$$

$$\hat{S}^{\mathrm{mt}}_{l,W}(k) = \frac{1}{f_s}\left|\sum_{n=0}^{N-1} v_{l,W}[n]\, x[n]\, e^{-2\pi j k n/N}\right|^{2}. \quad (4)$$

Given the length of data N and a smoothing half-bandwidth W, the tapers v_{l,W}[n] are computed by solving for vectors that satisfy the energy and orthogonality properties, as follows:

$$\sum_{n=0}^{N-1} v_{l,W}[n]\, v_{l,W}[n] = 1 \quad (5)$$

$$\sum_{n=0}^{N-1} v_{l,W}[n]\, v_{m,W}[n] = 0, \quad l \neq m. \quad (6)$$

For the tapers, GhostiPy uses the discrete prolate spheroidal sequences (DPSSs), which satisfy Equations 5 and 6 and maximize the power in the band [−W, W] (Thomson, 1982). An example of computing the multitapered spectrum is shown in Figure 3.

Figure 3.

Multitapered spectra. Data are from a sharp wave ripple event, where energy occurs mainly between 100 and 250 Hz. a–d, Bandwidths are 10 Hz (a), 15 Hz (b), 20 Hz (c), and 25 Hz (d). Note in the code that the data-sampling rate is 1250 Hz, the FFT is parallelized across eight threads, and ripple_data are a 1D numpy array.
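As a rough, scipy-based stand-in for the computation shown in Figure 3 (the function name, the full-bandwidth parameter convention, and the 2NW − 1 taper rule below are our assumptions, not GhostiPy's API), the estimate of Equations 3 and 4 can be sketched as follows.

```python
import numpy as np
from scipy.signal.windows import dpss

def mtm_spectrum(x, fs, bandwidth):
    """Multitaper PSD (Eqs. 3-4) using DPSS tapers; `bandwidth` is the full
    bandwidth 2W in Hz, as in Figure 3."""
    N = len(x)
    NW = N * (bandwidth / 2.0) / fs            # time-half-bandwidth product
    n_tapers = max(int(2 * NW) - 1, 1)         # conventional 2NW - 1 taper count
    tapers = dpss(N, NW, Kmax=n_tapers)        # (n_tapers, N) unit-energy sequences (Eq. 5)
    spectra = np.abs(np.fft.rfft(tapers * x, axis=-1)) ** 2 / fs   # Eq. 4, one per taper
    return np.fft.rfftfreq(N, d=1.0 / fs), spectra.mean(axis=0)    # Eq. 3 average

# e.g., freqs, psd = mtm_spectrum(ripple_data, fs=1250, bandwidth=10)
```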

Continuous wavelet transform

Neuroscientists often use a continuous wavelet transform (CWT) to study transient oscillatory activity. The CWT itself is defined in the time domain by the following:

$$W(a, b) = \int_{-\infty}^{\infty} \frac{1}{a}\, \psi^{*}\!\left(\frac{t - b}{a}\right) x(t)\, dt, \quad (7)$$

where ψ(·) is the mother wavelet function. The transform represents a two-dimensional decomposition in the scale (a) and time (b) planes. In the frequency domain, for a given scale a, the CWT is given by the inverse Fourier transform of the following:

$$W(a) = X(\omega)\, \Psi^{*}(a\omega), \quad (8)$$

where X and Ψ are the Fourier transforms of x and ψ, respectively.

Many mother wavelet functions have been investigated in the literature, but we have focused on the analytic wavelets, as they are found to be superior, particularly for estimating the phase (Olhede and Walden, 2002; Lilly and Gascard, 2006; Lilly and Olhede, 2009, 2012). We have implemented the analytic Morlet, Morse, and bump wavelets, whose respective frequency domain definitions are as follows:

$$\Psi(a\omega) = 2\, e^{-(a\omega - \omega_0)^2/2}\, H(\omega), \quad (9)$$

$$\Psi(a\omega) = 2 \left(\frac{e\gamma}{\beta}\right)^{\beta/\gamma} (a\omega)^{\beta}\, e^{-(a\omega)^{\gamma}}\, H(\omega), \quad (10)$$

$$\Psi(a\omega) = 2\, e^{\,1 - \left(1 - \left(\frac{a\omega - \mu}{\sigma}\right)^{2}\right)^{-1}}\, \mathbb{1}_{(\mu-\sigma)/a,\,(\mu+\sigma)/a}, \quad (11)$$

where $\mathbb{1}_{(\mu-\sigma)/a,\,(\mu+\sigma)/a}$ is the indicator function for the interval (μ − σ)/a ≤ ω ≤ (μ + σ)/a and H(ω) is the Heaviside step function. In our implementation, we use Equation 8 to compute the CWT.

Note that in practice the time series x(t) is sampled, and the CWT is likewise sampled. Then Equation 8 becomes a pointwise complex multiplication of discrete Fourier transforms, where the discretized angular frequencies ω_k are determined by the following:

$$\omega_k = \frac{2\pi k}{N \Delta t}, \quad (12)$$

where N is the number of data samples and Δt is the sampling interval.
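A frequency-domain CWT following Equations 8, 9, and 12 can be written in a few lines of numpy. The sketch below (with an assumed Morlet center frequency ω0 = 6) is the untruncated, full-length variant discussed next, not GhostiPy's blockwise implementation.

```python
import numpy as np

def morlet_freq(a_omega, omega0=6.0):
    """Analytic Morlet wavelet in the frequency domain (Eq. 9); zero for omega <= 0."""
    psi = 2.0 * np.exp(-0.5 * (a_omega - omega0) ** 2)
    return np.where(a_omega > 0, psi, 0.0)

def naive_cwt(x, scales, dt):
    N = len(x)
    omega = 2.0 * np.pi * np.fft.fftfreq(N, d=dt)     # the frequency grid of Eq. 12
    X = np.fft.fft(x)
    W = np.empty((len(scales), N), dtype=complex)
    for i, a in enumerate(scales):
        W[i] = np.fft.ifft(X * np.conj(morlet_freq(a * omega)))   # Eq. 8, one scale at a time
    return W
```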

A naive implementation of the wavelet transform (Eq. 8) calculates untruncated wavelets the same length as the input data. This is often inefficient because it is equivalent to convolving the data with a time-domain wavelet, mainly consisting of leading and trailing zeros. In our approach, we exploit the fact that wavelets are finite in time and frequency, and we use an overlap-save algorithm to compute the CWT purely in the frequency domain. Note that the latter point is particularly critical: because of the Gibbs phenomenon, using any time-domain representation of the wavelet may violate numerical analyticity for wavelet center frequencies near the Nyquist frequency. It is therefore necessary to use only the frequency domain representation of the wavelet. While we offer both traditional/naive and blockwise convolution implementations, the latter will give superior performance for longer-duration data. We believe that this is a valuable option for researchers and that this is the first tool that uses blockwise convolution to implement the CWT.

For electrophysiological data, a typical wavelet analysis will require computing Equation 8 for 50–500 scales. This is an obvious candidate for parallelization since the wavelet transform for each scale can be computed independently of the others. We use a backend powered by Dask to carry out the parallelization (Rocklin, 2015). Users can set the number of parallel computations to execute and thereby leverage the multicore capabilities offered by modern computing hardware.
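A minimal illustration of this per-scale parallelization, reusing the naive sketch above and plain dask.delayed (GhostiPy's internal use of Dask is more elaborate; `X`, `omega`, `scales`, and `morlet_freq` are assumed from the previous sketch):

```python
import dask
import numpy as np

def one_scale(X, a, omega):
    # One row of the CWT: pointwise multiply in frequency, then inverse FFT (Eq. 8)
    return np.fft.ifft(X * np.conj(morlet_freq(a * omega)))

tasks = [dask.delayed(one_scale)(X, a, omega) for a in scales]
rows = dask.compute(*tasks, scheduler="threads", num_workers=4)
W = np.stack(rows)
```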

Synchrosqueezing transform

One disadvantage of the wavelet transform is that its frequency resolution decreases as the temporal resolution increases. Strictly speaking, the CWT results in information contained in the (time, scale) plane, but a single frequency is typically assigned to each scale. Regardless, spectral smearing can be observed at higher frequencies/lower scales. However, Daubechies (1996) and Thakur et al. (2013) showed that the synchrosqueezing transform (SST) can mitigate this issue by transferring information from the CWT (time, scale) plane to the (time, frequency) plane.

The synchrosqueezing transform proceeds as follows. For every scale a, compute the CWT W(a) using Equation 8; compute the partial derivative

$$\partial_b W(a) = j\omega\, X(\omega)\, \Psi^{*}(a\omega); \quad (13)$$

and compute the phase transform

$$\omega_f(a) = \frac{1}{2\pi}\, \Im\!\left(\frac{\partial_b W(a)}{W(a)}\right). \quad (14)$$

The phase transform contains the real frequencies to which each point in the CWT matrix should be assigned. In practice, the real frequency space is discretized, so the CWT points are assigned to frequency bins. Note that multiple CWT points at a given time coordinate, b, may map to the same frequency bin. In this situation, the value of a frequency bin is the simple additive accumulation of the CWT points mapped to it.
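A sketch of that accumulation step in numpy, assuming a CWT matrix W of shape (n_scales, n_times), a matching matrix omega_f of instantaneous frequencies from Equation 14, a sampling interval dt, and a chosen number of output bins (all names here are ours, not GhostiPy's):

```python
import numpy as np

n_bins = 256
freq_edges = np.linspace(0.0, 0.5 / dt, n_bins + 1)     # bins up to the Nyquist frequency
sst = np.zeros((n_bins, W.shape[1]), dtype=complex)

for i in range(W.shape[0]):                              # loop over scales
    idx = np.digitize(omega_f[i], freq_edges) - 1        # frequency bin for each time point
    valid = np.isfinite(omega_f[i]) & (idx >= 0) & (idx < n_bins)
    cols = np.nonzero(valid)[0]
    np.add.at(sst, (idx[cols], cols), W[i, cols])        # additive accumulation into bins
```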

Note the similarity of the SST to the spectral reassignment algorithms in the studies by Gardner and Magnasco (2006) and Fitz and Fulop (2009). However, an important distinction is that the SST only operates along the scale dimension. In addition to preserving the temporal resolution of the CWT, this makes SST data easy to work with since uniform sampling can be maintained.

Overall, the spectrogram methods implemented by GhostiPy give an experimenter a more complete picture of the time-varying spectral content of neural data. Figure 4 illustrates this using the scipy standard spectrogram method along with the GhostiPy methods.

Figure 4.

Time–frequency decompositions. a–d, Users can leverage the scipy spectrogram (a) along with the methods of GhostiPy (b–d) for a richer understanding of their data. The synchrosqueezed transform in d gives the overall sharpest time and frequency resolution. Note in the code that data are a 1D numpy array, fs is the sampling rate, nperseg is the spectrogram window size in samples, noverlap is the number of samples overlapping in adjacent windows, and w is the bandwidth for the multitapered spectrogram.

Data availability

The code/software described in the article is freely available online at https://github.com/kemerelab/ghostipy/. The Jupyter notebooks used to produce the figures can be found at https://github.com/kemerelab/ghostipy/tree/master/examples/2021paper. The code and notebooks are also available as Extended Data 1 and Extended Data 2. All results were obtained on an Intel Core i7-4790 desktop computer running the Ubuntu 16.04 operating system.

Extended Data 1

Ghostipy-0.2.0. Download Extended Data 1, ZIP file.

Extended Data 2

Example data analysis notebooks. Download Extended Data 2, ZIP file.

Results

Example analyses

An example spectrogram of local field potentials recorded in area CA1 of the rat hippocampus is depicted in Figure 5. Clearly apparent are the theta oscillation, theta-nested gamma oscillations, and a sharp wave ripple, which occurs after the animal has stopped moving.

Figure 5.

CWT spectrogram of LFP. Spectrogram of local field potential recordings from area CA1 of the hippocampus of a rat during a 5 min exploration (middle), with movement speed (top) and the raw electrophysiological signal (bottom). A number of features of the hippocampal rhythms can be noted in this example, including the pervasive theta oscillation (∼8 Hz), theta-nested gamma oscillations (∼60 Hz) during movement, and, toward the end, a sharp wave ripple (∼200 Hz). Morse wavelets (γ = 3, β = 10) were used, and frequencies were limited to [1, 250] Hz.

In addition, GhostiPy can be used as an intermediate for a multistep analysis. Figure 6 replicates the speed spectrogram analysis in the study by Kemere et al. (2013) for an animal exploring a novel and a familiar environment (Mattias et al., 2015). Figure 7 implements the clustering of theta cycles (Zhang et al., 2019) with Morse wavelets. We have also included notebooks to replicate these example analyses.

Figure 6.

Speed spectrogram. The multitaper spectrogram (bandwidth, 5 Hz) was computed with GhostiPy for nonoverlapping 0.5 s time bins and then z scored for each frequency. Each time bin in the spectrogram was assigned to 1 of 21 logarithmically spaced speed bins spanning 0.125–64 cm/s. a, b, The mean PSD for each speed bin is shown for an animal exploring a novel environment (a) and a familiar environment (b).

Figure 7.

Theta cycle clustering. The Morse wavelet (γ = 3, β = 20) CWT was computed with GhostiPy using 81 frequencies and was subsequently divided into multiple windows, where one window corresponded to one theta cycle. Each CWT sample in a window was assigned to 1 of 20 phase bins according to the instantaneous theta phase at that particular sample. The result was frequency–phase power profiles, which were then clustered into four clusters. Shown is the mean frequency–phase power profile for each cluster. As in Zhang et al. (2019), these use the hc-11 dataset from CRCNS.org (Grosmark and Buzsáki, 2016; Grosmark et al., 2016).

Performance and complexity

The calculation of the CWT is computationally intensive and consequently a good method to benchmark performance. Of the software packages listed in Table 1, only MATLAB offered an equivalent solution. It was thus chosen as the reference to compare our implementation against. Figure 8 shows that our implementation results in faster computation times and better memory usage.

Figure 8.

CWT implementation performance. a, b, Our implementation of the Morse continuous wavelet transform outperforms MATLAB in both time (a) and space complexity (b). Note that MATLAB was unable to complete execution for the full range of the test parameter (data length) because of out-of-memory exceptions. The test machine was an Intel Core i7-4790 (eight hyperthreads) equipped with 32 gigabytes of RAM. In both cases, the CWT was parallelized over available CPU threads.

It is not entirely clear what accounts for the higher jaggedness in the MATLAB curves from Figure 8. A possible explanation is that the FFT computation is less efficient for an odd-length transform, but the magnitude of the spikes in the curve is surprising given that the FFT backend of MATLAB also uses FFTW. Regardless, we have demonstrated that our implementation is able to achieve lower time and space complexity. When using the functionality offered by GhostiPy, the following three primary scenarios arise with regard to the sizes of data involved in the processing: (1) both the input and output data fit into core memory; (2) the input fits into core memory, but the output does not; and (3) neither the input nor the output fits.

In all of the previous examples, we have restricted ourselves to case 1. However, with the ever-increasing sizes of data, the other cases will inevitably be encountered. Case 2 may arise when attempting to generate spectrograms. As the input is a single channel, memory constraints are rarely an issue. For example, even a 10 h local field potential (LFP) recording sampled at 1 kHz and saved as 64 bit floating point values will require <300 mebibytes (MiB) of memory. However, the size of a wavelet spectrogram computed from these data will be directly proportional to the number of scales/frequencies. For a typical range of 1–350 Hz at 10 voices per octave, this amounts to a space requirement of 85 times that of the input data. Given that this can well exceed the core memory size of a machine, the GhostiPy CWT routine can also accept a preallocated output array that is stored on disk (Fig. 9).
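The arithmetic behind those numbers is easy to reproduce (illustrative only):

```python
import numpy as np

n_samples = 10 * 3600 * 1000                 # 10 h of LFP at 1 kHz
print(n_samples * 8 / 2 ** 20)               # float64 input: ~274.7 MiB, i.e., < 300 MiB
print(int(np.ceil(10 * np.log2(350 / 1))))   # 1-350 Hz at 10 voices/octave: ~85 scales,
                                             # so the output has ~85x as many samples as the input
```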

Figure 9.

CWT out-of-core. Example code when the output array is too large for main memory. The CWT method is first executed as a dry run to compute the necessary array sizes. Here data are a 1D numpy array, and cwt_data are an HDF5 array created to store the results to disk.
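The general pattern the caption describes can be sketched with h5py alone; `compute_one_scale`, the dataset name, and the sizes below are hypothetical placeholders rather than GhostiPy functions, and `data` stands for the in-memory 1D input signal.

```python
import h5py
import numpy as np

n_scales, n_samples = 85, 36_000_000          # e.g., 10 h at 1 kHz, 85 wavelet scales

with h5py.File("cwt_output.h5", "w") as f:
    # Preallocate the (scales x time) result on disk instead of in RAM
    cwt_data = f.create_dataset("cwt", shape=(n_scales, n_samples),
                                dtype=np.complex128, chunks=True)
    for i in range(n_scales):
        # Fill one scale (row) at a time; only that row ever needs to fit in memory
        cwt_data[i, :] = compute_one_scale(data, i)   # hypothetical per-scale helper
```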

Case 3 may arise when a user wishes to filter many channels of full bandwidth data. One use case is a 1 h recording from a 256-channel probe sampled at 30 kHz and stored as a 2 byte signed integer type; this already requires 51 gibibytes. Our strategy is similar to case 2, where an output array is allocated and stored on disk. As for the input, it is read in chunks, and the chunk size can be chosen to lower memory usage, although potentially at a cost to computation time. The code in Figure 10 illustrates an example.

Figure 10.

Filtering out-of-core. Filtering data from a large array stored on disk and likewise storing the output on disk. Similar to the CWT out-of-core features, the method is called once as a dry run to compute array sizes, which the user can then pass in to store the result. The filtering method also allows correcting for the delay of the filter and downsampling without storing any intermediate results. Although the example uses the h5py library, any object that behaves like an array can be used. Here ds is the downsampling factor, K is the filter group delay, N is the number of samples, infile['chdata'] is a (n_channels, n_samples) array, and outdata is an HDF5 array.

Several points can be made about the scheme in Figure 10. Our method allows for downsampling during the convolution, which can reduce the number of stages in a computational scheme. Given full bandwidth data, a traditional strategy to filter to the theta band would look like the following: (1) apply an antialiasing filter; (2) downsample to obtain LFP; (3) store the LFP to disk; (4) apply a theta-band filter; (5) downsample this output; and (6) save the result.

Using the GhostiPy method, it is not necessary to generate the intermediate LFP. To our knowledge, no other software allows out-of-core filtering and downsampling in a single function call. The result is a simultaneous reduction in time and space complexity, by storing only the downsampled result and by filtering only once. Filtering to the theta band is now simplified to the following steps: (1) apply a theta filter to the full bandwidth data; (2) downsample the result; and (3) save the result to disk.
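For in-memory data, the simplified pipeline reduces to a few lines. The sketch below uses scipy's overlap-add convolution as a stand-in for GhostiPy's out-of-core overlap-save routine; the filter coefficients (`theta_band_fir`), the input array (`wideband`), the sampling rate, and the downsampling factor are illustrative assumptions.

```python
import numpy as np
from scipy.signal import oaconvolve

fs = 30_000                 # full-bandwidth sampling rate
ds = 20                     # downsampling factor (30 kHz -> 1.5 kHz)
h = theta_band_fir          # linear-phase FIR coefficients, e.g., from the spline design above
K = (len(h) - 1) // 2       # group delay of a linear-phase FIR

# (1) filter the full-bandwidth signal once, compensating the filter delay,
# (2) keep every ds-th sample, (3) save -- no intermediate LFP is materialized
filtered = oaconvolve(wideband, h, mode="full")[K:K + len(wideband)]
theta_lfp = filtered[::ds]
np.save("theta_lfp.npy", theta_lfp)
```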

Discussion

We have described the key features of GhostiPy and given examples of its ease of use to perform computations efficiently. Users can thus conduct exploratory spectral analyses quickly across a range of parameters while reducing their concerns for running out of memory, especially since out-of-core computation is supported for many of the methods. Thus, we believe GhostiPy is well suited to handle the ever-increasing size of experimental data.

In the future, we plan to improve GhostiPy with various enhancements. For example, currently the methods are designed to offer the user a lot of low-level control over areas such as multithreading, and to work with raw array types. However, users may desire a higher-level API. For this reason, we believe it would be a worthwhile endeavor to incorporate our work into frameworks such as NWB (Teeters et al., 2015); this would also facilitate more widespread adoption. There are also other analyses we could implement, including the adaptive multitaper method (Percival and Walden, 1993) and other time–frequency reassignment techniques similar to the synchrosqueezing transform (Daubechies et al., 2016).

Our primary contribution is improving the ease and speed at which data analysis can be conducted by developing user-friendly software implementing efficient algorithms well suited for large data sizes. This point is specifically demonstrated by our ability to outperform existing solutions in space and time complexity, and to run computations even in out-of-core memory conditions, which enables machines with single-digit to tens of gigabytes of memory to process data on the scale of tens to hundreds of gigabytes and beyond. In these ways, we have increased the accessibility of neural data analysis by enabling it to be run on hardware such as a laptop computer, a scenario that often was not previously possible.

Finally, the software we developed has a much larger potential impact than the scope described in this article. Although many of the examples given in this article were specific to extracellular rodent hippocampal data, the functionality we implemented is intentionally generic and applicable to many fields. As an example, our code can easily be adapted for use in real-time processing, whether running on embedded hardware or on a laptop computer in a clinical EEG setting. Given the functionality already developed and the full scope of our work, we are optimistic that GhostiPy can help accelerate modern scientific progress.

Acknowledgments

Acknowledgements: We thank Shayok Dutta (Rice University), Andres Grosmark and Gyorgi Buzsaki (NYU, New York, NY), Loren Frank and Mattias Karlsson (UCSF), and the CRCNS.org data archive for sharing data used in example analyses.

Footnotes

  • The authors declare no competing financial interests.

  • The development of GhostiPy was supported by the National Science Foundation (Grant NSF CBET1351692) and the National Institute of Neurological Diseases and Strokes (Grant R01-NS-115233).

This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license, which permits unrestricted use, distribution and reproduction in any medium provided that the original work is properly attributed.

References

1. Bokil H, Andrews P, Kulkarni JE, Mehta S, Mitra PP (2010) Chronux: a platform for analyzing neural signals. J Neurosci Methods 192:146–151. doi:10.1016/j.jneumeth.2010.06.020 pmid:20637804
2. Burrus CS, Soewito AW, Gopinath RA (1992) Least squared error FIR filter design with transition bands. IEEE Trans Signal Process 40:1327–1340. doi:10.1109/78.139239
3. Canolty RT, Edwards E, Dalal SS, Soltani M, Nagarajan SS, Kirsch HE, Berger MS, Barbaro NM, Knight RT (2006) High gamma power is phase-locked to theta oscillations in human neocortex. Science 313:1626–1628. doi:10.1126/science.1128115 pmid:16973878
4. Daubechies I (1996) A nonlinear squeezing of the continuous wavelet transform based on auditory nerve models. In: Wavelets in medicine and biology (Aldroubi A, Unser MA, eds), pp 527–546. Boca Raton, FL: CRC.
5. Daubechies I, Wang Y, Wu H-T (2016) ConceFT: concentration of frequency and time via a multitapered synchrosqueezed transform. Phil Trans R Soc A 374:20150193. doi:10.1098/rsta.2015.0193
6. Dvorak D, Fenton AA (2014) Toward a proper estimation of phase–amplitude coupling in neural oscillations. J Neurosci Methods 225:42–56. doi:10.1016/j.jneumeth.2014.01.002 pmid:24447842
7. Fitz KR, Fulop SA (2009) A unified theory of time-frequency reassignment. arXiv:0903.3080.
8. Frigo M, Johnson SG (1998) FFTW: an adaptive software architecture for the FFT. In: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP '98), Seattle, WA, Vol 3, pp 1381–1384. Piscataway, NJ: IEEE.
9. Frigo M, Johnson SG (2005) The design and implementation of FFTW3. Proc IEEE 93:216–231.
10. Gardner TJ, Magnasco MO (2006) Sparse time-frequency representations. Proc Natl Acad Sci U S A 103:6094–6099. doi:10.1073/pnas.0601707103 pmid:16601097
11. Gramfort A, Luessi M, Larson E, Engemann DA, Strohmeier D, Brodbeck C, Goj R, Jas M, Brooks T, Parkkonen L, Hämäläinen M (2013) MEG and EEG data analysis with MNE-Python. Front Neurosci 7:267. pmid:24431986
12. Grosmark AD, Buzsáki G (2016) Diversity in neural firing dynamics supports both rigid and learned hippocampal sequences. Science 351:1440–1443.
13. Grosmark AD, Long J, Buzsáki G (2016) Recordings from hippocampal area CA1, PRE, during and POST novel spatial learning. CRCNS.org. http://dx.doi.org/10.6080/K0862DC5.
14. Kemere C, Carr MF, Karlsson MP, Frank LM (2013) Rapid and continuous modulation of hippocampal network state during exploration of new places. PLoS One 8:e73114. doi:10.1371/journal.pone.0073114 pmid:24023818
15. Lee G, Gommers R, Waselewski F, Wohlfahrt K, O'Leary A (2019) PyWavelets: a Python package for wavelet analysis. J Open Source Softw 4:1237. doi:10.21105/joss.01237
16. Lilly JM, Gascard J-C (2006) Wavelet ridge diagnosis of time-varying elliptical signals with application to an oceanic eddy. Nonlin Processes Geophys 13:467–483. doi:10.5194/npg-13-467-2006
17. Lilly JM, Olhede SC (2009) Higher-order properties of analytic wavelets. IEEE Trans Signal Process 57:146–160. doi:10.1109/TSP.2008.2007607
18. Lilly JM, Olhede SC (2012) Generalized Morse wavelets as a superfamily of analytic wavelets. IEEE Trans Signal Process 60:6036–6041. doi:10.1109/TSP.2012.2210890
19. Mattias K, Margaret C, Frank LM (2015) Simultaneous extracellular recordings from hippocampal areas CA1 and CA3 (or MEC and CA1) from rats performing an alternation task in two W-shaped tracks that are geometrically identical but visually distinct. CRCNS.org. Available at http://dx.doi.org/10.6080/K0NK3BZJ.
20. Olhede SC, Walden AT (2002) Generalized Morse wavelets. IEEE Trans Signal Process 50:2661–2670. doi:10.1109/TSP.2002.804066
21. Oostenveld R, Fries P, Maris E, Schoffelen J-M (2011) FieldTrip: open source software for advanced analysis of MEG, EEG, and invasive electrophysiological data. Comput Intell Neurosci 2011:156869. pmid:21253357
22. Percival DB, Walden AT (1993) Spectral analysis for physical applications. Cambridge, UK: Cambridge UP.
23. Rocklin M (2015) Dask: parallel computation with blocked algorithms and task scheduling. In: Proceedings of the 14th Python in Science Conference (SciPy 2015), Vol 126 (Huff K, Bergstra J, eds). Austin, TX: SciPy Developers. doi:10.25080/Majora-7b98e3ed-013
24. Tadel F, Bock E, Niso G, Mosher JC, Cousineau M, Pantazis D, Leahy RM, Baillet S (2019) MEG/EEG group analysis with Brainstorm. Front Neurosci 13:76. doi:10.3389/fnins.2019.00076 pmid:30804744
25. Teeters JL, Godfrey K, Young R, Dang C, Friedsam C, Wark B, Asari H, Peron S, Li N, Peyrache A, Denisov G, Siegle JH, Olsen SR, Martin C, Chun M, Tripathy S, Blanche TJ, Harris K, Buzsáki G, Koch C, et al. (2015) Neurodata without borders: creating a common data format for neurophysiology. Neuron 88:629–634. doi:10.1016/j.neuron.2015.10.025 pmid:26590340
26. Thakur G, Brevdo E, Fučkar NS, Wu H-T (2013) The synchrosqueezing algorithm for time-varying spectral analysis: robustness properties and new paleoclimate applications. Signal Processing 93:1079–1094. doi:10.1016/j.sigpro.2012.11.029
27. Thomson DJ (1982) Spectrum estimation and harmonic analysis. Proc IEEE 70:1055–1096. doi:10.1109/PROC.1982.12433
28. Van Der Walt S, Colbert SC, Varoquaux G (2011) The NumPy array: a structure for efficient numerical computation. Comput Sci Eng 13:22–30. doi:10.1109/MCSE.2011.37
29. Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, Burovski E, Peterson P, Weckesser W, Bright J, van der Walt SJ, Brett M, Wilson J, Millman KJ, Mayorov N, Nelson ARJ, Jones E, Kern R, Larson E, Carey CJ, et al. (2020) SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 17:261–272. doi:10.1038/s41592-019-0686-2 pmid:32015543
30. Yegenoglu A, Holstein D, Phan LD, Denker M, Davison A, Grun S (2015) Elephant: open-source tool for the analysis of electrophysiological data sets. Tech Rep, Computational and Systems Neuroscience. Available at https://juser.fz-juelich.de/record/255984.
31. Zhang L, Lee J, Rozell C, Singer AC (2019) Sub-second dynamics of theta-gamma coupling in hippocampal CA1. eLife 8:e44320. doi:10.7554/eLife.44320

Synthesis

Reviewing Editor: Michaël Zugaro, CNRS, Collège de France, Inserm

Decisions are customarily a result of the Reviewing Editor and the peer reviewers coming together and discussing their recommendations until a consensus is reached. When revisions are invited, a fact-based synthesis statement explaining their decision and outlining what is needed to prepare a revision will be listed below. The following reviewer(s) agreed to reveal their identity: Adrien Peyrache, Francesco Battaglia.

The manuscript presents GhostiPy, a new free software Python package for the analysis of neural data. This provides a simple, unified interface to easily access numerous algorithms commonly used in spectral analysis. It can run on multiple platforms and hardware, and can compute e.g. spectrograms without prior downsampling. Some algorithms have been optimized compared to standard (numpy) implementations. Overall, this is a welcome addition to the field. Although the Reviewers have not raised any major concerns, they have indicated several ways in which the manuscript could be significantly improved. In particular, the sections should be reordered to reflect the order in which analyses are actually performed (e.g. filtering, then spectrograms), and more practical details should be provided (e.g. description of input parameters).

Both Reviewers have pointed out that the mathematical presentation of the algorithms (equations, explanation) was unnecessary given that these are standard methods. More relevant to the specific toolbox presented here would be to offer more details about gains in resource usage (computation time, memory load) and richer examples featuring more use cases. In particular, it would be interesting to know more about scalability: for example, one of the reasons FFTW was favored is that it can be multi-threaded, but no example was provided to illustrate how this library performs on a multi-core machine, or whether it is possible (or there are plans) to scale up use of Ghostipy on distributed (cluster) systems.

Another useful recommendation is to include a list of the API methods/functions of the toolbox, so that it may be seen at a glance what is already possible with the package.

Minor notes follow.

Equations 2 and 3 define an orthonormal basis; however, in multi-taper analysis (including in Ghostipy) DPSS (or Slepian) sequences are used, which maximize power in a restricted frequency band. That condition should be added to the description.

L10: Please clarify that portability is important for reproducibility and replicability.

Table 1: It is misleading to list "Morse wavelet" as the default wavelet function. For example, Fieldtrip supports Morlet waveforms.

Eq. 1-4: This study describes a software package, not a mathematical method. There is therefore no need for these equations (at least in the main text). This is true also for the other equations in the main text.

In contrast, the following pseudocode showing how to obtain a spectrum contains parameters that are not explained (e.g. "n_fft_threads"), and inputs should be clarified (e.g. "asa").

Eq. 5-10 are redundant (one can assume readers know the details of CWT) unless their presentation relates to the implementation of CWT in the software. These should be removed, or it should be explained if the implementation of CWT was improved in this toolbox.

The goal of Figure 2 is also unclear. Here, spectrograms from different toolboxes are “compared” (i.e. just displayed) which is not the focus of the paper. The comparison should be between the implementation of SST in this software and others, for example MATLAB. How does this software provide an advantage to users to start using it for computing SST?

Figure 2 again: the x-axis should be time, the y-axis frequency, and the color bar "normalized PSD", unless the study presents spectrograms in a non-traditional manner. Panels a-d are not described in the legend.

L132-133: Quoting other articles is unusual in a methodological paper, and judgements such as “Theta-gamma coupling was thus a serendipitous discovery.” (L139) are not expected in this kind of study.

Figure 4: The study should report the CWT parameters, the length of the example data, the execution time, comparison to other software packages.

L194: It is unclear why the study focuses only on the CWT for benchmarking. Furthermore, it is not clear whether parallelization or the GPU computing capabilities of the CWT in MATLAB were used.

Author Response

Dear Reviewers,

Thank you for your time! We have made numerous changes in response to your suggestions that we think improve the manuscript and will help readers. Please find below a point-by-point description of the changes that we've made.

In particular, the sections should be reordered to reflect the order in which analyses are actually performed (e.g. filtering, then spectograms), and more practical details should be provided (e.g. description of input parameters).

We’ve reordered the sections to be filtering, multitaper spectrum estimation, and finally spectrograms. We have attempted to provide more information about the input parameters in the text as needed.

Both Reviewers have pointed out that the mathematical presentation of the algorithms (equations, explanation) was unnecessary given that these are standard methods. More relevant to the specific toolbox presented here would be to offer more details about gains in resource usage (computation time, memory load) and richer examples featuring more use cases.

We have added a new figure (Figure 1 in the revised manuscript) detailing performance of FFTW vs numpy's fft. Figures 6 and 7 demonstrate the use of ghostipy in multi-level analyses previously seen in the literature. Note that the code for these examples is given in the repository - we are happy to include this code in the paper, but we would need guidance on the appropriate way to do this.

In particular, it would be interesting to know more about scalability: for example, one of the reasons FFTW was favored is that it can be multi-threaded, but no example was provided to illustrate how this library performs on a multi-core machine, or whether it is possible (or there are plans) to scale up use of Ghostipy on distributed (cluster) systems.

For scalability, the primary focus of ghostipy was to facilitate portability (i.e. so that analyses could still be run on laptops and other smaller scale hardware). However we recognize that researchers may have access to HPC resources, so we have opened a GitHub issue to implement ghostipy's algorithms on distributed systems in the future (https://github.com/kemerelab/ghostipy/issues/2). If we find that this becomes a widely needed feature, we will work on adding it.

Another useful recommendation is to include a list of the API methods/functions of the toolbox, so that it may be seen at a glance what is already possible with the package.

We have added Table 2 which lists available methods and the corresponding descriptions.

Minor notes:

Equations 2 and 3 define an orthonormal basis, however in multi-taper analysis (including in Ghostipy) DPSS (or Slepian) sequences are used, which maximizes power in a restricted frequency band. That condition should be added to the description.

We have added to the manuscript: “For the tapers, ghostipy uses the discrete prolate spheroidal sequences (DPSS), which satisfy Eqns. 3 and 4 and maximize the power in the band [-W, W].”

L10: Please clarify that portability is important for reproducibility and replicability.

This sentence has been revised to say “Although a potential remedy is to simply upgrade the hardware, it is not an acceptable solution for scientists desiring portability, an important component that improves reproducibility and replicability.”

Table 1: It misleading to list “Morse wavelet” as the default wavelet function. For example, Fieldtrip supports Morlet waveforms.

Table 1 has been revised as “CWT” rather than “Morse CWT.” Entries are likewise updated.

Eq. 1-4: This study describes a software package, not a mathematical method. There is therefore no need for these equations (at least in the main text). This is true also for the other equations in the main text.

We believe the equations provide value to users who want to understand what ghostipy implements (e.g. the kind of filters it designs, how it computes the multitapered spectrum (since different packages do this differently)).

In contrast, the following pseudocode showing how to obtain spectrum contains parameters that are not explained (e.g. “n_fft_threads”), and inputs should be clarified (e.g. “asa”)

An additional sentence to the Fig. 4 caption was added: “Note in the code that the data sampling rate is 1250 Hz, the FFT is parallelized across 8 threads, and ripple_data is a 1d numpy array.”

Eq. 5-10 are redundant (one can assume readers know the details of CWT) unless their presentation relates to the implementation of CWT in the software. These should be removed, or it should be explained if the implementation of CWT was improved in this toolbox.

Text has been added to this section to clarify that we improved the implementation of the CWT by writing a blockwise convolution algorithm in addition to the naive implementation. Using the blockwise algorithm can improve performance and efficiency.

The goal of Figure 2 is also unclear. Here, spectrograms from different toolboxes are “compared” (i.e. just displayed) which is not the focus of the paper. The comparison should be between the implementation of SST in this software and others, for example MATLAB. How does this software provide an advantage to users to start using it for computing SST?

Our objective for Figure 2 (now Figure 4 in the manuscript) was to demonstrate that ghostipy computes various time-frequency decompositions that are not provided by standard Python packages (see Table 1). By combining existing functionality with ghostipy’s, users can have a more complete understanding of their data.

Figure 2 gain: x-axis should be time, y-axis Frequency and color bar “normalized PSD", unless the study presents spectrograms in a non-traditional manner. Panels a-d are not described in the legend.

Figure 2 has been revised accordingly (now Figure 4 in the manuscript)

L132-133: Quoting other articles is unusual in a methodological paper, and judgements such as “Theta-gamma coupling was thus a serendipitous discovery.” (L139) are not expected in this kind of study.

These lines have been removed. The sentence motivating filter design now reads “While this step may appear somewhat trivial, it can make a significant difference, including the very existence of theta-gamma phase amplitude coupling (PAC)"

Figure 4: The study should report the CWT parameters, the length of the example data, the execution time, comparison to other software packages.

The caption has been revised to describe the length of example data, Morse wavelet parameters, and restriction of analyzed frequencies. Since the figure was meant to provide an example of a use case, comparison to other software packages is not described.

L194: It is unclear why the study focuses only CWT for benchmark. Furthermore, it is not clear if parallelization or the GPU computing capabilities of the CWT in MATLAB was used.

We have specifically focused on the CWT to demonstrate the simultaneous reduction in space and time complexity compared to MATLAB. The caption for Figure 5 now includes the statement “In both cases, the CWT was parallelized over available CPU threads.”

We appreciate your reconsideration of the manuscript.

Sincerely,

The authors
