A new ranking method for principal components analysis and its application to face image analysis

https://doi.org/10.1016/j.imavis.2009.11.005

Abstract

In this work, we investigate a new ranking method for principal component analysis (PCA). Instead of sorting the principal components in decreasing order of the corresponding eigenvalues, we propose using the discriminant weights given by separating hyperplanes to select the most discriminant principal components. The method is not restricted to any particular probability density function of the sample groups, because it can be based on either a parametric or a non-parametric separating hyperplane approach. In addition, the number of meaningful discriminant directions is not limited to the number of groups, providing additional information for understanding group differences in high-dimensional problems. To evaluate the discriminant principal components, separation tasks have been performed using face images from three different databases. Our experimental results show that the principal components selected by the separating hyperplanes allow robust reconstruction and interpretation of the data, as well as higher recognition rates using fewer linear features, in situations where the differences between the sample groups are subtle and consequently most difficult for the standard and state-of-the-art PCA selection methods.

Introduction

Principal component analysis, or simply PCA, is one of the most successful approaches to creating a low-dimensional representation and interpretation of face images. Since the pioneering work of Sirovich and Kirby [12], published more than 20 years ago, many subsequent works have projected face images onto a PCA feature space, not only to reduce the dimensionality of the original samples for further classification and analysis but also to interpret and reconstruct the principal components described by all the training images. However, since PCA explains the covariance structure of all the data, its most expressive components [13], that is, the first principal components with the largest eigenvalues, do not necessarily represent important discriminant directions for separating sample groups.
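
For reference, the standard ordering can be written in a few lines of NumPy. This is a minimal sketch with our own variable names, not the authors' code; for face-sized images one would normally diagonalize the smaller N×N Gram matrix instead, but the ranking criterion is the same:

```python
import numpy as np

def most_expressive_components(X, k):
    """Return the k principal components of X (N samples x n variables)
    with the largest eigenvalues, i.e. the 'most expressive' directions."""
    Xc = X - X.mean(axis=0)                # centre the data
    S = np.cov(Xc, rowvar=False)           # n x n sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(S)   # eigh: S is symmetric
    order = np.argsort(eigvals)[::-1]      # decreasing eigenvalue order
    return eigvecs[:, order[:k]], eigvals[order[:k]]
```

Nothing in this ordering looks at group labels, which is precisely the limitation discussed next.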

A common practice for identifying the linear directions that separate sample groups is to use Fisher's linear discriminant analysis (LDA) [4], [5] rather than PCA. However, when the dimension of the feature space is greater than the number of groups, LDA can find at most (number of groups − 1) meaningful discriminant directions, because the between-class scatter matrix has at most that rank [2], [18]. Thus, for instance, when there are two sample groups to separate, LDA can identify only one meaningful discriminant direction, and additional information important for characterizing the group differences may be lost [2], [18]. Moreover, LDA identifies the optimum discriminant directions only when the probability density function of each sample group can be fairly approximated by a Gaussian distribution with a common covariance matrix; it fails when the class densities are more general [2], [18], [17].
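
This rank limit is easy to verify in practice. The snippet below is purely illustrative (scikit-learn with synthetic data, not part of the paper) and shows that a two-group LDA exposes only a single discriminant direction regardless of the feature dimension:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.random.randn(40, 100)   # 40 samples in a 100-dimensional feature space
y = np.repeat([0, 1], 20)      # two groups

# With two groups only one meaningful discriminant direction exists:
LinearDiscriminantAnalysis(n_components=1).fit(X, y)    # works
# LinearDiscriminantAnalysis(n_components=2).fit(X, y)  # raises ValueError:
#   n_components cannot be larger than min(n_features, n_classes - 1)
```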

Recently, Loog et al. [8] introduced a weighted pairwise variant of the well-known multi-class Fisher criterion to improve the classification accuracy of standard LDA. This approach does not require an iterative optimization method to define the best linear discriminant directions for classification, but since it is based on an eigenvalue spectral solution analogous to standard LDA, it has the same limitation of finding at most (number of groups − 1) meaningful discriminant directions. Zhu and Hastie [18], [17] developed a method for finding discriminant directions without assuming any particular parametric model for the probability density function of each sample group. When the sample group covariance matrices do not have the same orientation [17], an iterative method is needed to optimize their criterion, leading to discriminant directions that are not necessarily represented by the principal components of all the training samples. In [21], Zhu and Martinez proposed a specific method for selecting principal components to improve LDA classification. Their criterion is based on the spectral decomposition of the LDA characteristic equation, selecting the principal components that correlate most with the between-class scatter matrix. Zhu and Martinez's approach is restricted to the LDA spectral solution, but it has been successfully employed to solve small sample size problems in discriminant analysis, consistently outperforming a number of discriminant analysis methods on several different datasets [10], [19], [20].

In this work, we propose a new ranking method for the principal components given by the group differences extracted by separating hyperplanes. Thus, we do not deal with the problem of finding general discriminant directions that are not principal components, nor with the problem of selecting principal components to improve the accuracy of a specific classifier, such as LDA. Rather, we propose using the discriminant weights given by separating hyperplanes to select the most discriminant principal components. Simply stated, our proposal is to rank the principal components by how well they align with the separating hyperplane direction, as measured by the corresponding discriminant weights. Such a set of principal components, ranked in decreasing order of the discriminant weights, is called here the discriminant principal components. We focus on the LDA and SVM (support vector machine) [1], [16] methods, but any other separating hyperplane could be used. To our knowledge, this is the first principal component analysis based on a general separating hyperplane ranking strategy.
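
In outline, the ranking works as follows: project the training samples onto all principal components, fit a separating hyperplane in that PCA space, and reorder the components by the absolute value of the hyperplane weight attached to each of them. The sketch below uses our own naming conventions (the function `fit_hyperplane` is a placeholder for any hyperplane estimator, such as LDA or a linear SVM) and is an illustration of the idea rather than the authors' implementation:

```python
import numpy as np

def rank_components_by_discriminant_weight(P, X, y, fit_hyperplane):
    """P: n x m matrix of principal components (columns); X: N x n training data;
    y: binary group labels; fit_hyperplane: any routine returning the weight
    vector w of a separating hyperplane in the m-dimensional PCA space.
    Returns the column indices of P sorted from most to least discriminant."""
    Z = (X - X.mean(axis=0)) @ P          # project samples onto the components
    w = fit_hyperplane(Z, y)              # one weight per principal component
    return np.argsort(np.abs(w))[::-1]    # decreasing |weight| = discriminant ranking
```

Replacing |w| with the eigenvalues recovers the standard "most expressive" ordering, which makes the two rankings directly comparable.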

In order to evaluate the discriminant principal components, the following two-group separation tasks have been performed using frontal face images from three different datasets: (a) gender experiments (female versus male samples) and (b) facial expression experiments (non-smiling versus smiling, anger versus disgust, happiness versus sadness, and fear versus surprise samples). Since face image analysis involves small training sets and a large number of features, and does not require specialist knowledge to interpret the differences between groups, it is an attractive application for investigating the effectiveness of the discriminant principal components and for comparing the discriminant directions found by the corresponding LDA and SVM approaches. The experimental results show that the principal components selected by the separating hyperplanes allow robust reconstruction and interpretation of the data, as well as higher recognition rates using fewer linear features, in situations where the differences between the sample groups are subtle and consequently most difficult for standard PCA and the state-of-the-art method of Zhu and Martinez [21].

The remainder of this work is organized as follows. In Section 2, we briefly review PCA and its limitations in extracting discriminant information from sample groups. In Section 3, we describe the LDA and SVM approaches used in this work and their main principles for finding optimum separating hyperplanes. Section 4 then presents our idea of selecting the principal components based on the discriminant weights given by separating hyperplanes. Section 5 describes the experiments carried out in this study, as well as the three face databases used to evaluate the effectiveness of the discriminant principal components. In Section 6, we discuss some important points that have emerged from this work. Finally, in Section 7, we conclude the paper, summarizing its main contributions and describing possible future work.

Section snippets

Principal components analysis (PCA)

PCA is a feature extraction procedure concerned with explaining the covariance structure of a set of variables through a small number of linear combinations of these variables.

Let an N×n training set matrix X be composed of N input samples (or face images) with n variables (or pixels). This means that each column of matrix X contains the values of a particular variable observed over all N samples. Let this data matrix X have covariance matrix S, with eigenvector and eigenvalue matrices P and Λ, respectively.
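
In this notation, the sample covariance matrix and its spectral decomposition can be written as follows (a standard formulation, restated here for completeness in our own symbols):

$$
S \;=\; \frac{1}{N-1}\sum_{i=1}^{N}(x_i - \bar{x})(x_i - \bar{x})^{T},
\qquad
S \;=\; P\,\Lambda\,P^{T},
$$

where $x_i$ is the $i$-th sample (row of $X$), $\bar{x}$ is the sample mean, the columns of $P$ are the orthonormal eigenvectors of $S$, and $\Lambda = \mathrm{diag}(\lambda_1,\ldots,\lambda_n)$ holds the corresponding eigenvalues, conventionally sorted so that $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$.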

Separating hyperplanes

In this section, we describe the main principles of the statistical separating hyperplane approaches used in this work and their most relevant differences in extracting discriminant information from data. The reader is referred to [4], [5] and [1], [16] for more details on the LDA and SVM methods, respectively.

The primary purpose of LDA is to separate samples of distinct groups by maximizing their between-class separability while minimizing their within-class variability. Its main objective, therefore, is to find the projection direction that maximizes the ratio of the between-class scatter to the within-class scatter of the projected samples.
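
For the two-group case studied here, a standard way of writing this objective (our notation, given as a reminder rather than as the authors' exact formulation) is the Fisher criterion:

$$
J(w) \;=\; \frac{w^{T} S_b\, w}{w^{T} S_w\, w},
\qquad
S_b = (\bar{x}_1 - \bar{x}_2)(\bar{x}_1 - \bar{x}_2)^{T},
\qquad
S_w = \sum_{g=1}^{2}\;\sum_{x_i \in \text{group } g}(x_i - \bar{x}_g)(x_i - \bar{x}_g)^{T},
$$

whose maximizer is $w \propto S_w^{-1}(\bar{x}_1 - \bar{x}_2)$ whenever $S_w$ is invertible. The SVM, by contrast, chooses its hyperplane by maximizing the margin to the closest training samples, without any Gaussian assumption on the group densities.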

Discriminant principal components

We approach the problem of selecting the discriminant principal components as a problem of estimating a linear classifier, assuming that there are only two classes to separate. Hence, we have used training examples and their corresponding labels to construct both LDA and SVM separating hyperplanes. This comparison allows us to examine the principal components from, respectively, a parametric and a non-parametric discriminant perspective.
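
As an illustration of how both weight vectors can be obtained in practice, here is a short scikit-learn sketch under our own assumptions and naming; the paper relies on its own MLDA implementation and a quadratic-programming SVM, for which LinearSVC is only a stand-in:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import LinearSVC

def discriminant_rankings(X, y, m):
    """Rank the m principal components of X by the discriminant weights of an
    LDA hyperplane and of a linear SVM hyperplane fitted in the PCA space."""
    pca = PCA(n_components=m).fit(X)
    Z = pca.transform(X)                               # N x m projected samples

    w_lda = LinearDiscriminantAnalysis().fit(Z, y).coef_.ravel()
    w_svm = LinearSVC(C=1.0, max_iter=10000).fit(Z, y).coef_.ravel()

    rank_lda = np.argsort(np.abs(w_lda))[::-1]         # parametric ranking
    rank_svm = np.argsort(np.abs(w_svm))[::-1]         # non-parametric ranking
    return rank_lda, rank_svm
```

Either ranking can then replace the eigenvalue ordering when choosing which components to retain for reconstruction or classification.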

To compose the PCA transformation matrix, that is P

Experimental results

We have divided our experimental results into two parts. First, we carried out face and facial expression image analyses to understand and visualize the main differences between the most expressive (or standard) and the discriminant principal components. In the second part, we investigated the effectiveness of the discriminant principal components for recognizing samples.

The following two-group separation tasks have been performed using frontal face images: (a) Gender

Discussion

In the following paragraphs, we discuss some points that have emerged from our study on face image analysis and that deserve further consideration.

Conclusion

This paper proposed a new ranking method for the principal components, here called the discriminant principal components, given by the group differences extracted by separating hyperplanes. The facial expression experiments indicate that the principal components selected by the separating MLDA and SVM hyperplanes allow higher recognition rates using fewer linear features than the traditional (standard PCA) and state-of-the-art (Zhu and Martinez) methods, in situations where the differences between the

Acknowledgements

We thank all the reviewers for their very constructive comments, which helped us greatly improve this work. In particular, we are very grateful for the careful and insightful changes suggested by one of the reviewers, who contributed important mathematical equations that clarify this paper. We also thank Dr. Paulo Sergio Silva Rodrigues for providing the first version of the SVM code used in this work, which is based on the quadratic programming solution created by Alex J. Smola. In addition, the

References (21)

  • C. Davatzikos, Why voxel-based morphometric analysis should be used with great caution when characterizing group differences, NeuroImage (2004).
  • C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Mining and Knowledge Discovery (1998).
  • R.D. Cook et al., Dimension reduction and visualization in discriminant analysis (with discussion), Australian and New Zealand Journal of Statistics (2001).
  • P.A. Devijver et al., Pattern Classification: A Statistical Approach (1982).
  • K. Fukunaga, Introduction to Statistical Pattern Recognition (1990).
  • T. Hastie et al., The Elements of Statistical Learning (2001).
  • R.A. Johnson et al., Applied Multivariate Statistical Analysis (1998).
  • M. Loog et al., Multiclass linear dimension reduction by weighted pairwise Fisher criteria, IEEE Transactions on Pattern Analysis and Machine Intelligence (2001).
  • M.J. Lyons et al., Automatic classification of single facial images, IEEE Transactions on Pattern Analysis and Machine Intelligence (1999).
  • A.M. Martinez et al., Where are linear feature extraction methods applicable?, IEEE Transactions on Pattern Analysis and Machine Intelligence (2005).