Automatic sleep scoring: A search for an optimal combination of measures

https://doi.org/10.1016/j.artmed.2011.06.004

Abstract

Objective

The objective of this study is to find the best set of characteristics of polysomnographic signals for the automatic classification of sleep stages.

Methods

A selection was made from 74 measures, including linear spectral measures, interdependency measures, and nonlinear measures of complexity that were computed for the all-night polysomnographic recordings of 20 healthy subjects. The adopted multidimensional analysis involved quadratic discriminant analysis, forward selection procedure, and selection by the best subset procedure. Two situations were considered: the use of four polysomnographic signals (EEG, EMG, EOG, and ECG) and the use of the EEG alone.

Results

For the given database, the best automatic sleep classifier achieved approximately 81% agreement with the hypnograms of experts. The classifier was based on the following 14 features of polysomnographic signals: the ratio of powers in the beta and delta frequency range (EEG, channel C3), the fractal exponent (EMG), the variance (EOG), the absolute power in the sigma 1 band (EEG, C3), the relative power in the delta 2 band (EEG, O2), theta/gamma (EEG, C3), theta/alpha (EEG, O1), sigma/gamma (EEG, C4), the coherence in the delta 1 band (EEG, O1–O2), the entropy (EMG), the absolute theta 2 (EEG, Fp1), theta/alpha (EEG, Fp1), the sigma 2 coherence (EEG, O1–C3), and the zero-crossing rate (ECG). However, even with only four features, we could perform sleep scoring with 74% accuracy, which is comparable to the inter-rater agreement between two independent specialists.

Conclusions

We have shown that 4–14 carefully selected polysomnographic features were sufficient for successful sleep scoring. The efficiency of the corresponding automatic classifiers was verified and conclusively demonstrated on all-night recordings from healthy adults.

Introduction

In our previous paper, we produced a systematic overview of the capacity of individual measures to differentiate among various sleep stages [1]. A huge number of traditional and novel measures were computed for polysomnographic recordings of 20 healthy subjects. Some of the characteristics are largely unknown in the field of sleep analysis (e.g., fractal exponent, dimension, and detrended fluctuation analysis). In summary, all-night evolutions of 825 characteristics (74 measures for various channels and channel combinations) were analyzed and compared with visual scoring by experts (hypnograms). To identify the measures that had the best decision-making ability, discriminant analysis was applied using quadratic classifiers for the one-dimensional case. As a result, several promising candidates for the study of sleep onset and sleep stage alternation during the night were identified.
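The one-dimensional quadratic classifier mentioned above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Gaussian class-conditional densities with class-specific variances, and the function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_qda_1d(x, y):
    """Estimate prior, mean, and variance of one scalar measure for each class."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y)
    params = {}
    for c in np.unique(y):
        xc = x[y == c]
        params[c] = (len(xc) / len(x), xc.mean(), xc.var(ddof=1))
    return params

def predict_qda_1d(params, x):
    """Assign each value to the class with the largest Gaussian log-posterior."""
    x = np.asarray(x, dtype=float)
    classes = list(params)
    scores = np.stack([
        np.log(p) - 0.5 * np.log(v) - 0.5 * (x - m) ** 2 / v
        for p, m, v in (params[c] for c in classes)
    ])
    return np.array(classes)[scores.argmax(axis=0)]

def classification_error(y_true, y_pred):
    """Fraction of epochs whose predicted stage differs from the reference."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))

# Synthetic example (not data from the study): one measure, two well-separated stages.
rng = np.random.default_rng(0)
x_demo = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])
y_demo = np.array(["W"] * 200 + ["SWS"] * 200)
params_demo = fit_qda_1d(x_demo, y_demo)
error_demo = classification_error(y_demo, predict_qda_1d(params_demo, x_demo))
```

Because each class has its own variance, the decision boundary is quadratic in the measure, which is what distinguishes this classifier from a linear one.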

From a large assembly of measures, we first mention the most successful measure: the ratio of powers in the beta and delta frequency range of the EEG. This ratio, which has a mean classification error of 42.5%, was the best single-performing measure for discrimination among all five sleep stages. The success of the beta/delta ratio reflects the well-known fact that during the deepening of non-REM sleep, the proportion of slower waves (especially delta waves) increases, whereas the powers of faster waves (especially beta and gamma) gradually decrease; however, the aforementioned error level demonstrates that it is impossible to satisfactorily separate the five sleep stages by one single measure. To computerize sleep scoring, the simultaneous use of several carefully selected characteristics is necessary. A search for the optimal set of measures for automatic sleep classification is described in this paper.

For monitoring the development of physiological indicators during sleep, polysomnography is used in clinical and basic sleep research. For sleep staging, the following three reference signals are usually recorded: the electroencephalogram (EEG), electrooculogram (EOG), and electromyogram (EMG). Regarding the electroencephalogram, the amplitudes of the waves measured on the surface of the scalp typically reach up to 100 μV, whereas the frequencies of the waves predominantly range from 0.5 to 100 Hz. For analysis, the frequency range of the EEG is divided into five basic types of waves: delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–12 Hz), beta (12–30 Hz), and gamma (>30 Hz). The wave patterns and spectral characteristics of the reference signals change markedly between wakefulness and different levels of sleep. These differences make it possible to build a system for the visual classification of sleep stages, which is known as the rules of Rechtschaffen and Kales (RK) [2]. This system, which was introduced in 1968, has been used ever since; however, with increasing knowledge of the sleep process, a need to revise the scoring rules has grown within the sleep research community. Accordingly, the sleep staging standards were revised following an initiative of the American Academy of Sleep Medicine [3]. That revision clarified some of the issues related to the scoring of sleep itself, recommended rules for pediatric and geriatric scoring, and addressed scoring issues in relation to respiratory and cardiac events, movements, and arousals during sleep. Even though the introduction of a new sleep scoring system is an important step forward, the associated modifications have not yet been standardized as part of routine clinical practice outside of the United States. For that reason, efforts to build an automatic sleep classifier, including this study, usually rely on available hypnograms scored according to the standard RK rules. Although not investigated here, a concurrent study of the continual evolution of the most successful individual measures might eventually contribute to a more detailed understanding of the sleep process.
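As an illustration of how the frequency bands listed above translate into spectral measures such as the beta/delta power ratio, the following sketch computes band powers for one epoch from a windowed periodogram. It is a minimal example under stated assumptions: the Hann window, the 45 Hz upper edge of the gamma band, the 100 Hz sampling rate, and the synthetic test signals are illustrative choices and are not taken from the study.

```python
import numpy as np

# Band limits in Hz as listed in the text; the 45 Hz gamma upper edge is an assumption.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 12.0),
    "beta": (12.0, 30.0),
    "gamma": (30.0, 45.0),
}

def band_powers(epoch, fs):
    """Return band powers (up to a constant scale factor) for one epoch."""
    epoch = np.asarray(epoch, dtype=float)
    epoch = epoch - epoch.mean()                      # remove the DC offset
    spectrum = np.abs(np.fft.rfft(epoch * np.hanning(len(epoch)))) ** 2
    freqs = np.fft.rfftfreq(len(epoch), d=1.0 / fs)
    return {
        name: float(spectrum[(freqs >= lo) & (freqs < hi)].sum())
        for name, (lo, hi) in BANDS.items()
    }

def beta_delta_ratio(epoch, fs):
    """Ratio of beta-band power to delta-band power for one epoch."""
    powers = band_powers(epoch, fs)
    return powers["beta"] / powers["delta"]

# Synthetic 30 s epochs (hypothetical): one dominated by slow waves, one by fast activity.
fs_demo = 100
t = np.arange(30 * fs_demo) / fs_demo
deep_like = 75 * np.sin(2 * np.pi * 1.5 * t) + 5 * np.sin(2 * np.pi * 20 * t)
wake_like = 5 * np.sin(2 * np.pi * 1.5 * t) + 20 * np.sin(2 * np.pi * 20 * t)
ratio_deep = beta_delta_ratio(deep_like, fs_demo)
ratio_wake = beta_delta_ratio(wake_like, fs_demo)
```

In this toy example the ratio is small for the epoch dominated by slow waves and large for the epoch dominated by fast activity. The scale factor of the periodogram cancels in the ratio, so no normalization is applied.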

The RK scoring of sleep, which is usually accomplished by well-trained personnel, consists of assigning every 20–30 s segment (epoch) of an all-night recording to a stage of vigilance. The main stages are wakefulness (W), REM sleep, and non-REM (NREM) sleep. NREM sleep is further divided into four stages, from the lightest, Stage 1 (S1), to the deepest, Stage 4 (S4). Stages 3 and 4 are jointly referred to as slow-wave sleep (SWS), which leaves five classes: W, S1, S2, SWS, and REM.
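The epoch-based organization described above can be sketched in a few lines. This is an illustrative helper only: the 30 s default epoch length, the label strings, and the function names are assumptions, not part of the study's software.

```python
import numpy as np

def split_into_epochs(signal, fs, epoch_s=30):
    """Cut a 1-D recording into non-overlapping epochs; a trailing partial epoch is dropped."""
    signal = np.asarray(signal)
    n = int(fs * epoch_s)                 # samples per epoch
    n_epochs = len(signal) // n
    return signal[: n_epochs * n].reshape(n_epochs, n)

# Hypothetical label strings; S3 and S4 are merged into slow-wave sleep.
MERGE = {"S3": "SWS", "S4": "SWS"}

def merge_sws(hypnogram):
    """Map an RK hypnogram to labels in which Stages 3 and 4 are both SWS."""
    return [MERGE.get(stage, stage) for stage in hypnogram]

epochs_demo = split_into_epochs(np.arange(650), fs=10, epoch_s=30)
labels_demo = merge_sws(["W", "S1", "S2", "S3", "S4", "REM"])
```

Each row of the resulting array corresponds to one epoch and therefore to one entry of the hypnogram.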

The RK scoring criteria for these particular stages can be briefly summarized as follows:

  • The wake stage is characterized by a low-voltage (10–30 μV), mixed-frequency EEG, substantial alpha activity in the EEG, and a relatively high tonic EMG.

  • During stage S1, there is a low-voltage, mixed-frequency EEG, wherein the highest amplitude is in the 2–7 Hz range. Alpha activity may be present but does not take up more than 50% of an epoch. Vertex sharp waves may occur, and their amplitudes can be as high as 200 μV. In S1 after W, slow eye movements can be present. The EMG level is lower in this stage than in W.

  • S2 is characterized by sleep spindles and K-complexes, in addition to relatively low voltage mixed-frequency background activity and an absence of slow waves. Sleep spindles are bursts of brain waves at frequencies ranging from 12 to 16 Hz. A K-complex is a sharp negative wave followed by a slower positive wave.

  • SWS is detected when more than 20% of an epoch of the EEG record consists of delta waves (2 Hz or slower with amplitudes in excess of 75 μV). Sleep spindles and K-complexes may also be present.

  • REM sleep, which is similar to S1, exhibits low-voltage and mixed frequencies in the EEG. A sawtooth wave pattern is often present. In this stage, the EMG reaches the lowest level and episodic rapid eye movements occur.

Visual scoring that is based on the rules of Rechtschaffen and Kales requires a subjective judgment of signals, which may lead to unreliable results. In a study that involved eight European sleep laboratories, the overall level of agreement in the scoring of the five sleep stages was only 76.8% [4]. Other comparative publications have mentioned inter-rater reliabilities of about 70–80% and intra-rater reliabilities of about 90% [5], [6]. These studies have agreed that the lowest consensus is found for S1 sleep, whereas the reliability for the other stages was acceptable.
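Agreement figures such as those quoted above are computed epoch by epoch from two hypnograms of the same recording. The sketch below shows the raw percentage agreement and, as an addition not used in the text above, Cohen's kappa, which corrects the raw agreement for agreement expected by chance. The two short hypnograms are invented for illustration.

```python
from collections import Counter

def percent_agreement(a, b):
    """Fraction of epochs to which two scorers assigned the same stage."""
    if len(a) != len(b):
        raise ValueError("hypnograms must cover the same epochs")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Chance-corrected agreement between two scorers."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Agreement expected if both scorers labeled epochs independently.
    expected = sum(count_a[s] * count_b[s] for s in count_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Invented hypnograms of ten epochs each.
scorer_1 = ["W", "W", "S1", "S2", "S2", "SWS", "SWS", "REM", "REM", "S2"]
scorer_2 = ["W", "S1", "S1", "S2", "S2", "SWS", "S2", "REM", "REM", "S2"]
agreement_demo = percent_agreement(scorer_1, scorer_2)
kappa_demo = cohens_kappa(scorer_1, scorer_2)
```

In this toy case the scorers agree on 8 of 10 epochs, and both disagreements involve a neighboring stage, which mirrors the kind of confusion reported for S1.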

Following an expansion of computational facilities, numerous attempts have been made to automate sleep classification. For a systematic review of studies that are related to computer-assisted sleep recording and analysis, see [7]. These systems are usually based on extracting certain features from the EEG, EMG, and EOG, followed by assigning each 20–30 s fragment of sleep to one of the five stages. The intended goal of these systems is to achieve results that are comparable to those of visual scoring experts. For example, [8] describes an automatic classification system that is based on one central EEG channel, two EOG channels, and one chin EMG channel. This classification system is based on the decision rules for visual scoring and includes a structured quality control procedure by a human expert.

The methods of computerized sleep scoring usually imply thresholding the spectral power of the frequency bands and conventional linear classification, such as in linear discriminant analysis. For instance, the automated pattern recognition system presented in [9] used five patterns: slow-delta and theta wave predominance in the background EEG activity, the presence of sleep spindles in the EEG, the presence of rapid eye movements in the EOG, and the muscle tone in the EMG. Results on a test set of healthy infants have shown an overall agreement of 87.7% between the automated system and the human expert.

Besides traditional approaches, some more recent studies utilize methods from nonlinear dynamical systems, artificial neural networks, and waveform detection by various pattern recognition algorithms [10], [11], [12], [13].

In our previous study, we went through a large number of measures and individually considered their potential to detect falling asleep and specific sleep stages [1]. The resultant set of 74 characteristics involved relevant simple measures in the time domain, distribution characteristics, linear spectral measures, measures of complexity, and interdependency measures. These were computed and compared with one another using the same data set that contained the EEG, EMG, EOG, and ECG recordings of 20 healthy subjects. Our tests involved classification of the data into five classes (waking and four sleep stages) and 10 classification tasks to distinguish between two specific sleep stages. Regarding the assignment of the stages to five classes, only nine measures out of 74 achieved an error of less than 50%. Fig. 1 shows an example of an overnight hypnogram produced by the visual scorers together with the best single-performing measure for discrimination among all of the stages, which was the ratio of the EEG spectral powers in the beta and delta ranges. At first sight, the trace of the beta/delta ratio resembles the hypnogram; however, even this measure leads to mistakes in about 42% of the cases, and the bottom graph of Fig. 1 clearly demonstrates the unreliability of the corresponding single-measure-based sleep classifier. Therefore, at the end of our previous paper [1], we announced that a search for combinations of several measures with optimal classification precision would follow. That search is the aim of this study.
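The idea of combining several measures can be illustrated with a greedy forward selection wrapped around a multivariate quadratic discriminant classifier. The sketch below is a simplified stand-in and not the procedure of this study: it stops after a fixed number of features rather than using a significance test, it scores candidates with an interleaved five-fold cross-validation, and it adds a small ridge term to each covariance matrix. All of these choices, the function names, and the synthetic data are assumptions.

```python
import numpy as np

def qda_fit(X, y):
    """Fit one Gaussian (mean, covariance, prior) per class."""
    model = []
    for c in np.unique(y):
        Xc = X[y == c]
        cov = np.atleast_2d(np.cov(Xc, rowvar=False))
        cov = cov + 1e-6 * np.eye(cov.shape[0])   # small ridge (assumption) for stability
        model.append((c, np.log(len(Xc) / len(X)), Xc.mean(axis=0),
                      np.linalg.inv(cov), np.linalg.slogdet(cov)[1]))
    return model

def qda_predict(model, X):
    """Pick the class with the largest quadratic discriminant score."""
    scores = []
    for _, log_prior, mean, inv_cov, log_det in model:
        d = X - mean
        maha = np.einsum("ij,jk,ik->i", d, inv_cov, d)
        scores.append(log_prior - 0.5 * log_det - 0.5 * maha)
    labels = np.array([m[0] for m in model])
    return labels[np.argmax(scores, axis=0)]

def cv_error(X, y, cols, n_folds=5):
    """Mean classification error over interleaved folds for the given feature columns."""
    folds = np.arange(len(y)) % n_folds
    errors = []
    for k in range(n_folds):
        train, test = folds != k, folds == k
        model = qda_fit(X[train][:, cols], y[train])
        errors.append(np.mean(qda_predict(model, X[test][:, cols]) != y[test]))
    return float(np.mean(errors))

def forward_selection(X, y, max_features):
    """Greedily add the feature that gives the lowest cross-validated error."""
    selected, history = [], []
    while len(selected) < max_features:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        errs = {j: cv_error(X, y, selected + [j]) for j in candidates}
        best = min(errs, key=errs.get)
        selected.append(best)
        history.append(errs[best])
    return selected, history

# Synthetic data (not from the study): features 2 and 0 carry class information.
rng = np.random.default_rng(0)
y_demo = rng.integers(0, 3, 300)
X_demo = rng.normal(size=(300, 5))
X_demo[:, 2] += 3 * y_demo
X_demo[:, 0] += 3 * (y_demo == 1)
selected_demo, history_demo = forward_selection(X_demo, y_demo, max_features=2)
```

On the synthetic data, the procedure first picks the single most informative feature and then the one that best complements it, and the error drops at the second step. Unlike a best-subset search, the greedy procedure never revisits earlier choices.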


Data

Data that contained all-night polysomnographic records were provided by Georg Dorffner from The Siesta Group Schlafanalyse GmbH, Vienna, Austria (http://www.thesiestagroup.com/index.php). The recordings were obtained from 20 healthy subjects, which included 10 men and 10 women. The subjects’ ages ranged from 23 to 82 years, with a mean of 50 years and a standard deviation of 21.5 years.

For each subject, 10 physiological signals were analyzed: six EEG channels (Fp1–M2, C3–M2, O1–M2, Fp2–M1,

Results of the forward selection procedure

The outcome of the FSP is summarized in Table 1, wherein the selected measures, p-values of the significance of change between two steps, mean and standard deviations of classification errors, and mean errors of specific sleep stages are listed. The mean error of classification decreased from 42.5%, which was achieved by the best single-performing variable (the power ratio beta/delta in the C3 derivation), to 19.3%, which was achieved by a combination of 14 features. The first three positions

Discussion

In a pilot study that compared the spectral and nonlinear measures of EEG signals during sleep, Fell et al. concluded that combinations of spectral and nonlinear measures yielded better overall discriminations of sleep stages in comparison to spectral measures alone [20]. In their study, stepwise discriminant analysis led to the following order of measures: spectral entropy, largest Lyapunov exponent, entropy, correlation dimension, and spectral edge; however, Fell et al. computed only a few

Conclusions

The primary contribution of this study is that it provides directions for the creation of an automatic sleep classifier. The proposed methodology enables us to extract knowledge exclusively from available sleep data. We have already shown that four carefully selected polysomnographic features are capable of sleep scoring with remarkable accuracy. No prior decision rules are built into the classifier. The few resultant measures derived from 825 traditional and novel characteristics (74 measures,

Acknowledgements

We are grateful to Georg Dorffner for providing the data. This research has been supported by the Slovak Grant Agency for Science VEGA (grant no. 2/0019/10).
