A collaborative framework for 3D alignment and classification of heterogeneous subvolumes in cryo-electron tomography

https://doi.org/10.1016/j.jsb.2012.10.010

Abstract

The limitation of using low electron doses in non-destructive cryo-electron tomography of biological specimens can be partially offset by averaging aligned, structurally homogeneous subsets present in tomograms. This type of sub-volume averaging is especially challenging when multiple species are present. Here, we tackle the problem of conformational separation and alignment with a “collaborative” approach designed to reduce the effect of the “curse of dimensionality” encountered in standard pair-wise comparisons. Our new approach is based on using the nuclear norm as a collaborative similarity measure for the alignment of sub-volumes, and on exploiting the presence of symmetry early in the processing. We provide a rigorous validation of this method by analyzing mixtures of intact simian immunodeficiency viruses SIV mac239 and SIV CP-MAC. Electron microscopic images of these two virus preparations are indistinguishable except for subtle differences in the conformation of the envelope glycoproteins displayed on the surface of each virus particle. Using the nuclear norm-based collaborative alignment method presented here, we demonstrate that the genetic identity of each virus particle in the mixture can be assigned based solely on the structural information derived from single envelope glycoproteins displayed on the virus surface.

Introduction

Cryo-electron tomography is a widely used 3D imaging technique for the analysis and visualization of macromolecular complexes in their native cellular context (Milne and Subramaniam, 2009). Because cryo-electron tomography is carried out at low electron doses to minimize radiation damage during data collection, the signal-to-noise ratio in the images is typically very low, making it challenging to interpret structural features in tomograms. To obtain improved density maps of macromolecular complexes, state-of-the-art averaging methods rely on the accurate alignment and averaging of many subvolumes, each containing the feature of interest (Frank, 2006, Bartesaghi et al., 2008, Wu et al., 2009, Zhu et al., 2006, Bartesaghi and Subramaniam, 2009). Consequently, when subvolumes contain a discrete or continuous spread of conformations, or variable stoichiometry of the components, it is critical both to align and to discriminate among different (heterogeneous) noisy subvolumes prior to averaging. The alignment and classification of cryo-electron tomographic data is further complicated by the so-called missing wedge: a wedge-shaped region of the Fourier domain, typically comprising 30% or more of all frequencies, that is not sampled because of geometrical limits on the range of tilt angles accessible to the specimen holder in the microscope (McIntosh et al., 2005). The missing wedge results in anisotropic distortion and lower spatial resolution in the direction parallel to the incident beam (Foerster et al., 2008). Unless properly treated, these missing-wedge effects may arbitrarily bias the alignment and classification, leading to erroneous results.
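To make the missing-wedge geometry concrete, the following Python/NumPy sketch builds a Fourier-space sampling mask for a single-axis tilt series and estimates the fraction of frequencies that are never measured. The axis convention (beam along z, tilt axis along y), the ±60° tilt range, and the helper names `missing_wedge_mask` and `missing_fraction` are illustrative assumptions, not the acquisition scheme or code used in this work.

```python
import numpy as np


def missing_wedge_mask(n: int, max_tilt_deg: float = 60.0) -> np.ndarray:
    """Boolean mask over a centered n^3 Fourier grid: True where a single-axis
    tilt series covering +/- max_tilt_deg samples the frequency.

    Assumed geometry (illustrative only): array axes ordered (z, y, x),
    beam along z at zero tilt, tilt axis along y.
    """
    k = np.fft.fftshift(np.fft.fftfreq(n))  # centered frequencies, cycles/voxel
    kz, ky, kx = np.meshgrid(k, k, k, indexing="ij")
    # Each tilt image fills a central plane containing the y axis; together the
    # planes cover (kx, kz) directions within +/- max_tilt of the kx axis.
    return np.abs(kz) <= np.abs(kx) * np.tan(np.deg2rad(max_tilt_deg))


def missing_fraction(n: int, max_tilt_deg: float = 60.0) -> float:
    """Fraction of frequencies inside the Nyquist sphere that are NOT sampled."""
    k = np.fft.fftshift(np.fft.fftfreq(n))
    kz, ky, kx = np.meshgrid(k, k, k, indexing="ij")
    inside = kx**2 + ky**2 + kz**2 <= 0.25  # |k| <= 0.5 cycles/voxel
    sampled = missing_wedge_mask(n, max_tilt_deg)
    return float(np.mean(~sampled[inside]))
```

For a ±60° range, the unsampled directions span 60° of the 180° of possible line orientations in each (kx, kz) plane, so the missing fraction inside the Nyquist sphere is 1/3 analytically; `missing_fraction(64, 60.0)` should land close to that value.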

One solution to eliminate the detrimental effect of the missing wedge in subvolume averaging is to restrict the similarity measure between two tomograms to those frequencies where the signal is sampled in both tomograms (introduced by Frangakis et al., 2002, and adopted by many others: Schmid et al., 2006, Foerster et al., 2008, Liu et al., 2008, Xu et al., 2012). This already provides a significant improvement in the alignment between two tomograms. However, it is still not consistent for tomogram classification/separation, because different frequency-overlap regions may carry different information and signal intensities. To construct a similarity measure that is invariant to these differences in signal intensity, one needs to equalize/normalize them properly. Unfortunately, such normalization cannot be achieved with the constrained correlation, since it is applied to subtomograms that contain noise with unknown, frequency-dependent signal-to-noise ratios (SNR; such normalization is also challenging due to the presence of the missing wedge itself). This may result in significantly different correlation values even between pairs of identical structures (as a function of their frequency-overlap-dependent SNR). As a result, conformational separation/classification is prone to bias from the missing-wedge orientations of the tomograms. In this work we circumvent this frequency-overlap inconsistency by imposing symmetry, which effectively helps to complete the missing data needed for conformational separation.
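A minimal sketch of the constrained-correlation idea, assuming subtomograms are stored as NumPy arrays and that their Fourier sampling masks are known: the normalized cross-correlation is evaluated only over Fourier coefficients sampled in both volumes. The helper `wedge_mask` and its two tilt-axis options are hypothetical simplifications for illustration.

```python
import numpy as np


def wedge_mask(n: int, max_tilt_deg: float, tilt_axis: str = "y") -> np.ndarray:
    """Centered Fourier sampling mask for a single-axis tilt series.
    Hypothetical helper; assumes axes ordered (z, y, x) and beam along z."""
    k = np.fft.fftshift(np.fft.fftfreq(n))
    kz, ky, kx = np.meshgrid(k, k, k, indexing="ij")
    # In-plane frequency component perpendicular to the tilt axis
    k_perp = kx if tilt_axis == "y" else ky
    return np.abs(kz) <= np.abs(k_perp) * np.tan(np.deg2rad(max_tilt_deg))


def constrained_correlation(vol_a, vol_b, mask_a, mask_b) -> float:
    """Normalized cross-correlation restricted to frequencies sampled in BOTH
    subtomograms (masks must use the same centered, fftshift-ed layout)."""
    fa = np.fft.fftshift(np.fft.fftn(vol_a))
    fb = np.fft.fftshift(np.fft.fftn(vol_b))
    overlap = mask_a & mask_b
    a, b = fa[overlap], fb[overlap]
    # np.vdot conjugates its first argument: sum(conj(a) * b)
    return float(np.real(np.vdot(a, b)) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Because the normalization is computed over the overlap region, its size and frequency content change with the relative wedge orientations; with noisy data, this is how two copies of the same structure can receive different scores.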

A variety of alignment and classification approaches have been proposed recently. They typically either assume that all subvolumes are somehow pre-aligned to their respective reference frames (Heumann et al., 2011), or adopt very powerful iterative Joint Alignment-Classification (JAC) techniques, also widely used in single-particle electron microscopy (Frank, 2006, Wu et al., 2009, Foerster et al., 2008, Winkler et al., 2009, and many others). The main idea of these approaches can be briefly formulated as an expectation–maximization type of problem: given a data distribution, the alignment is refined with respect to the cluster centroids, and the clusters are then refined using the updated alignment. Standard K-means, for example, follows this paradigm.
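The alternating structure of JAC can be illustrated with a toy K-means-style loop on 1D signals, where circular translation stands in for the full 3D rigid-body search. This is a hedged sketch of the general paradigm only; the function names, the unnormalized cross-correlation score, and the hard assignments are assumptions, not any specific published implementation.

```python
import numpy as np


def best_shift(signal: np.ndarray, ref: np.ndarray) -> tuple[int, float]:
    """Circular shift s maximizing <np.roll(signal, s), ref>, via FFT cross-correlation."""
    cc = np.real(np.fft.ifft(np.conj(np.fft.fft(signal)) * np.fft.fft(ref)))
    s = int(np.argmax(cc))
    return s, float(cc[s])


def joint_align_classify(data: np.ndarray, k: int, n_iter: int = 10, seed: int = 0):
    """Toy K-means-style JAC loop on 1D signals (rows of `data`).

    Illustrative assumptions: circular translations are the only transformation,
    assignments are hard, and the score is unnormalized cross-correlation.
    """
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    labels = np.zeros(len(data), dtype=int)
    aligned = data.astype(float)
    for _ in range(n_iter):
        # Step 1: align every particle to every centroid and keep the best match
        for i, x in enumerate(data):
            fits = [best_shift(x, c) for c in centroids]
            j = int(np.argmax([score for _, score in fits]))
            labels[i] = j
            aligned[i] = np.roll(x, fits[j][0])
        # Step 2: recompute each centroid from its aligned members
        for j in range(k):
            members = aligned[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, aligned, centroids
```

Every decision in this loop is a pair-wise comparison between one particle and one centroid, which is exactly the kind of distance whose reliability degrades in high dimensions.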

However, all of these methods are limited by the special challenges of dealing with low-SNR images that are modulated by the missing wedge of data collection. One key limitation is a phenomenon referred to as the “curse of dimensionality”, which captures the idea that the concept of distance, needed for both alignment and classification in such methods, becomes less meaningful as the number of noisy data dimensions increases. Methods such as principal component analysis can partially alleviate the effects of the curse of dimensionality, but may also arbitrarily discard information that is critical for alignment and classification, since there is in general no basis to presume that the information describing individual structures lies in the low-dimensional subspace that captures most of the energy of the aggregate dataset. This is a significant problem in cryo-electron tomography, since there are as many low-SNR dimensions as there are voxels in the subvolumes, which are of the order of thousands. For a single distribution and $\ell_p$ norms in high dimensions, $\|x\|_p = \left(\sum_{m=1}^{d} |x_m|^p\right)^{1/p}$, where $x \in \mathbb{R}^d$ and $x_m$ are the entries of $x$, the relative difference between the distances to the farthest and the closest points vanishes (Beyer et al., 1999). This concentration of distances is known to be one aspect of the curse of dimensionality, and it occurs for a broad range of data distributions and distance measures (Donoho, 2000, Houle et al., 2010). Therefore, extra care has to be taken with pair-wise distances, which are commonly used for aligning subvolumes to class centroids as well as in other building blocks of JAC techniques.
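The concentration effect described by Beyer et al. (1999) is easy to reproduce numerically. This sketch, assuming points drawn uniformly from the unit hypercube and Euclidean (p = 2) distances by default, reports the relative contrast (D_max - D_min)/D_min between the farthest and nearest points from a random query; the function name and sampling setup are illustrative choices.

```python
import numpy as np


def relative_contrast(d: int, n_points: int = 500, p: float = 2.0, seed: int = 0) -> float:
    """(D_max - D_min) / D_min for the l_p distances from one random query
    to n_points random points, all drawn i.i.d. uniform on [0, 1]^d."""
    rng = np.random.default_rng(seed)
    points = rng.random((n_points, d))
    query = rng.random(d)
    dist = np.sum(np.abs(points - query) ** p, axis=1) ** (1.0 / p)
    return float((dist.max() - dist.min()) / dist.min())


for d in (2, 16, 256, 4096):
    print(d, round(relative_contrast(d), 3))
```

The contrast drops sharply as d grows, so nearest-centroid decisions (for example, assigning a noisy subvolume to its closest class average) become increasingly fragile.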

Despite the general success of the joint alignment and classification methods currently used for aligning and distinguishing distinct sub-populations in tomographic volumes (Bartesaghi and Subramaniam, 2009, Frank et al., 2012, Liu et al., 2008, White et al., 2010, Foerster et al., 2008, Yu and Frangakis, 2011), the accuracy of alignment remains a major bottleneck, especially as the data become more complex and heterogeneous, for example when sparsely populated classes are present. Averaging in the absence of accurate alignment can, and frequently does, lower the resolution of the final 3D maps.

In this manuscript we propose a new collaborative alignment method to address the challenges described above. We propose replacing classical similarity measures based on pair-wise distances between two particles, or between a particle and a class average, with a one-to-many collaborative similarity function measured between a particle and a group of particles. This approach was inspired by the observation that a matrix composed of aligned particles has lower complexity (lower rank) than a matrix composed of unaligned particles. Therefore, one can align particles by minimizing the complexity of their corresponding matrix. The proposed collaborative scheme harnesses the contributions of all particles for the alignment of every individual particle, as opposed to current state-of-the-art approaches that are based on pair-wise comparisons.
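The observation motivating the method can be checked on a toy example: stacking misaligned copies of one signal as matrix rows yields a higher-rank matrix, and hence a larger nuclear norm (sum of singular values), than stacking the aligned copies. This sketch uses noise-free 1D signals and circular shifts as an assumed stand-in for 3D subvolumes and rigid-body transformations.

```python
import numpy as np


def nuclear_norm(m: np.ndarray) -> float:
    """Sum of singular values, the convex surrogate of matrix rank."""
    return float(np.linalg.norm(m, ord="nuc"))


rng = np.random.default_rng(0)
template = rng.standard_normal(64)            # hypothetical 1D "particle"
shifts = rng.integers(0, 64, size=20)         # random circular misalignments
unaligned = np.stack([np.roll(template, s) for s in shifts])
aligned = np.stack([template for _ in shifts])

print(np.linalg.matrix_rank(aligned), np.linalg.matrix_rank(unaligned))
print(nuclear_norm(aligned), nuclear_norm(unaligned))
```

Since circular shifts preserve each row's energy, both matrices have the same Frobenius norm. The nuclear norm is bounded below by the Frobenius norm, with equality exactly for rank-1 matrices, so the aligned stack attains the minimum.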

Our new method is validated on the problem of separating and reconstructing SIV Env complexes in a mixture of viruses expressing either the closed conformation alone (SIV MAC239) or the open conformation alone (SIV CP-MAC). This task is particularly difficult because the mixture includes a sparsely populated class (with an approximate occupancy ratio of 1:10), a challenge that has not been addressed before. In our proposed pipeline, the collaborative similarity measure for alignment is the nuclear norm of the matrix composed of all available heterogeneous sub-tomograms; the nuclear norm is the convex surrogate of the matrix rank. Working with the full matrix retains all data details essential for the alignment, which may be lost when resorting to dimensionality reduction or class averages. We also impose 3-fold symmetry early in the alignment procedure, which allows us to complete missing-wedge information and thus avoid errors in the classification stage of the algorithm.
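To illustrate why imposing symmetry helps to complete missing data, this sketch compares the Fourier coverage of a single missing-wedge mask with that of its 3-fold symmetrized version (the union of the mask rotated by 0°, 120°, and 240°). It assumes, for simplicity, that the 3-fold axis is parallel to the beam (z), so each symmetry operation simply rotates the in-plane tilt axis; in real data the symmetry axis of each spike has its own orientation, and the helper names are hypothetical.

```python
import numpy as np


def wedge_mask(n: int, max_tilt_deg: float, axis_angle_deg: float) -> np.ndarray:
    """Centered Fourier sampling mask for a single-axis tilt series whose tilt
    axis lies in the xy plane at axis_angle_deg from the x axis (beam along z).
    Hypothetical helper; axes ordered (z, y, x)."""
    k = np.fft.fftshift(np.fft.fftfreq(n))
    kz, ky, kx = np.meshgrid(k, k, k, indexing="ij")
    a = np.deg2rad(axis_angle_deg)
    # In-plane frequency component perpendicular to the tilt axis
    k_perp = kx * np.sin(a) - ky * np.cos(a)
    return np.abs(kz) <= np.abs(k_perp) * np.tan(np.deg2rad(max_tilt_deg))


def coverage(mask: np.ndarray) -> float:
    """Fraction of frequencies inside the Nyquist sphere marked as sampled."""
    k = np.fft.fftshift(np.fft.fftfreq(mask.shape[0]))
    kz, ky, kx = np.meshgrid(k, k, k, indexing="ij")
    inside = kx**2 + ky**2 + kz**2 <= 0.25
    return float(mask[inside].mean())


n, tilt = 48, 60.0
single = wedge_mask(n, tilt, 90.0)  # tilt axis along y
# Averaging over the 3-fold copies fills the union of the rotated masks
symmetrized = single | wedge_mask(n, tilt, 210.0) | wedge_mask(n, tilt, 330.0)
print(coverage(single), coverage(symmetrized))
```

Under these assumptions, coverage within the Nyquist sphere rises from about two thirds to well over 80%, and the remaining unsampled region shrinks from a wedge to a cone around the symmetry axis.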

Section snippets

Strategy of using nuclear norm as a collaborative reference frame

The concept of alignment, as often considered in the literature, implies the selection of some canonical reference frame and a similarity measure that determines optimal alignment parameters for each sampled sub-tomogram. As discussed before, the concept of pairwise distances as used in other approaches (Bartesaghi et al., 2008, Foerster et al., 2008, Winkler et al., 2009, Stoelken et al., 2011, Yu and Frangakis, 2011, Heumann et al., 2011) is problematic due to the curse of dimensionality. A

Nuclear-norm based alignment

We now introduce an algorithm for nuclear-norm based alignment (NNA), which will be used to estimate all the transformation parameters in the proposed joint alignment and classification scheme. Initially, all spikes have random orientations and translations, corresponding to p = 6 rigid-body transformation parameters. In Algorithm 1 below, we present a greedy procedure (efficiently implemented on the GPU) that iteratively updates one transformation parameter λ (the x-axis translation, for example),
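The greedy, one-parameter-at-a-time update can be sketched in a toy setting. This hypothetical `greedy_nna` treats each 1D particle's circular shift as its single transformation parameter λ, tries every candidate value, and keeps the one that most lowers the nuclear norm of the full particle matrix. It recomputes a full SVD for every candidate, so it is far simpler and slower than an optimized GPU implementation, and it should not be read as Algorithm 1 itself.

```python
import numpy as np


def nuclear_norm(m: np.ndarray) -> float:
    """Sum of singular values of m."""
    return float(np.linalg.norm(m, ord="nuc"))


def greedy_nna(particles: np.ndarray, n_sweeps: int = 3):
    """Greedy coordinate-wise nuclear-norm alignment of 1D particles (rows).

    Toy assumptions: one parameter per particle (a circular shift standing in
    for, e.g., an x-axis translation) and exhaustive search over all shifts.
    """
    current = particles.astype(float)
    shifts = np.zeros(len(particles), dtype=int)
    n = particles.shape[1]
    history = [nuclear_norm(current)]
    for _ in range(n_sweeps):
        for i in range(len(current)):
            row = current[i].copy()
            best_s, best_val = 0, history[-1]  # keeping the current value is a candidate
            for s in range(1, n):
                current[i] = np.roll(row, s)
                val = nuclear_norm(current)
                if val < best_val:
                    best_s, best_val = s, val
            # Keep the shift that most reduces the nuclear norm of the whole stack
            current[i] = np.roll(row, best_s)
            shifts[i] = (shifts[i] + best_s) % n
            history.append(best_val)
    return current, shifts, history
```

Because the current value of each parameter is always among the candidates, the nuclear norm recorded in `history` never increases; like any coordinate-wise greedy search, however, it may stop in a local minimum.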

Conclusion

We have presented a comprehensive platform for carrying out subvolume alignment and classification in cryo-electron tomography via collaborative alignment of the component subvolumes in a way that maximizes the accuracy of alignment and improves on strategies that combine the alignment and classification steps, in particular for heterogeneous data. We show that it can be successfully used for separating closely related, yet distinct conformations of viral envelope glycoprotein spikes present

Acknowledgments

This work was jointly supported by funds from the National Library of Medicine and the Center for Cancer Research at the National Cancer Institute, NIH, Bethesda, MD. We thank Steven Fellini, Susan Chacko and colleagues for support with our use of the high-performance computational capabilities of the Biowulf Linux cluster at NIH, Bethesda, MD (http://biowulf.nih.gov).

References (34)

  • M. Xu et al. High-throughput subtomogram alignment and classification by Fourier space constrained fast volumetric matching. J. Struct. Biol. (2012)
  • K.S. Beyer, J. Goldstein, R. Ramakrishnan, U. Shaft. When is “nearest neighbor” meaningful? In: Int. Conf. on... (1999)
  • A.M. Bruckstein et al. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Rev. (2009)
  • J.R. Bunch et al. Rank-one modification of the symmetric eigenproblem. Numer. Math. (1978)
  • E.J. Candes et al. Exact matrix completion via convex optimization. Found. Comput. Math. (2009)
  • E.J. Candes et al. An introduction to compressive sampling. IEEE Signal Process. Mag. (2008)
  • E.J. Candes et al. Robust principal component analysis? J. ACM (2011)
  • Cited by (46)

    • Double structure scaled simplex representation for multi-view subspace clustering

      2022, Neural Networks
      Citation Excerpt :

      However, most of the existing MVC methods do not consider this problem. The DSSSR algorithm has certain difficulties in theoretically analysing its convergence (Kuybeda et al., 2013), but experimental results show that this algorithm has a very stable convergence performance. For specific details, please see Section 4.4.
