Elsevier

Methods

Volume 31, Issue 4, December 2003, Pages 265-273
Normalization of cDNA microarray data

https://doi.org/10.1016/S1046-2023(03)00155-5

Abstract

Normalization means adjusting microarray data for effects which arise from variation in the technology rather than from biological differences between the RNA samples or between the printed probes. This paper describes normalization methods based on the fact that dye balance typically varies with spot intensity and with spatial position on the array. Print-tip loess normalization provides a well-tested, general-purpose normalization method which has given good results on a wide range of arrays. The method may be refined by using quality weights for individual spots. The method is best combined with diagnostic plots of the data which display the spatial and intensity trends. When diagnostic plots show that biases still remain in the data after normalization, further normalization steps such as plate-order normalization or scale normalization between the arrays may be undertaken. Composite normalization may be used when control spots are available which are known not to be differentially expressed. Variations on loess normalization include global loess normalization and two-dimensional normalization. Detailed commands are given to implement the normalization techniques using freely available software.

Introduction

In this paper, we suppose that an experiment has been conducted using a series of two-color cDNA microarrays. Each microarray has been hybridized with RNA from two sources labeled with different fluors. The two color channels will be referred to by convention as red and green. We suppose that the arrays have been scanned to produce images and that these images have been processed further by an image analysis program to produce measured red and green foreground and background intensities for each spot on each array. Before the gene expression profiles of the RNA samples can be analyzed and interpreted, the red and green intensities must be normalized relative to one another so that the red/green ratios are as far as possible an unbiased representation of relative expression.

The purpose of normalization is to adjust for effects which arise from variation in the microarray technology rather than from biological differences between the RNA samples or between the printed probes. Imbalances between the red and green dyes may arise from differences between the labeling efficiencies or scanning properties of the two fluors complicated perhaps by the use of different scanner settings. If the imbalance is more complicated than a simple scaling of one channel relative to the other, as it usually will be, then the dye bias is a function of intensity and normalization will need to be intensity dependent. The dye-bias will also generally vary with spatial position on the slide. Positions on a slide may differ because of differences between the print-tips on the array printer, variation over the course of the print-run, non-uniformity in the hybridization, or from artifacts on the surface of the array which affect one color more than the other. Finally, differences between arrays may arise from differences in print quality, from differences in ambient conditions when the plates were processed or simply from changes in the scanner settings. Therefore, normalization between as well as within arrays will need to be considered.

Write R and G for the background-corrected red and green intensities for each spot. Normalization is usually applied to the log-ratios of expression, which will be written $M = \log_2 R - \log_2 G$. The log-intensity of each spot will be written $A = (\log_2 R + \log_2 G)/2$, a measure of the overall brightness of the spot. (The letter M is a mnemonic for minus while A is a mnemonic for add.) It is convenient to use base-2 logarithms for M and A so that M is in units of 2-fold change and A is in units of 2-fold increase in brightness. On this scale, M = 0 represents equal expression, M = 1 represents a 2-fold change between the RNA samples, M = 2 represents a 4-fold change, and so on.
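As a small worked illustration of these definitions, the following Python sketch computes M and A for a single spot. The function name `ma_values` and the intensity values are hypothetical and chosen only so that the arithmetic is easy to follow.

```python
import math

def ma_values(red, green):
    """Return (M, A) for one spot from background-corrected intensities."""
    log_r = math.log2(red)
    log_g = math.log2(green)
    m = log_r - log_g            # log-ratio ("minus")
    a = (log_r + log_g) / 2.0    # average log-intensity ("add")
    return m, a

# Hypothetical spot whose red intensity is four times its green intensity.
m, a = ma_values(4096.0, 1024.0)
print(m, a)  # M = 2.0 (a 4-fold change), A = 11.0
```

Swapping the two channels changes the sign of M but leaves A unchanged, which is why M = 0 corresponds to equal expression.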

Any negative values for R or G will yield missing values for M and A and the corresponding spots will be excluded from subsequent analysis including normalization. The frequency of negative values depends very much on the image analysis program and the background estimation method used. SPOT [1], for example, using a “morph” background gives very few negative intensities while other programs such as GenePix [2] using a “median” background may often result in 30% or more negative values. The loss of information which results from omitting such spots from the analysis is usually not great because spots with negative values for either R or G are usually too faint to show good evidence of differential expression. In any case, the relative merits of the different background correction methods are beyond the scope of this paper.
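The exclusion rule can be sketched as follows. This is a minimal illustration rather than the behavior of any particular image analysis program: the function name `ma_or_missing` and the intensities are hypothetical, and treating an intensity of exactly zero as missing is an added assumption (its logarithm is also undefined).

```python
import math

def ma_or_missing(red, green):
    """Return (M, A), or (None, None) if either intensity is not positive."""
    if red <= 0 or green <= 0:   # assumption: zero is treated like a negative value
        return None, None
    log_r, log_g = math.log2(red), math.log2(green)
    return log_r - log_g, (log_r + log_g) / 2.0

# Hypothetical background-corrected (R, G) pairs; two have a negative channel.
spots = [(1200.0, 800.0), (-15.0, 300.0), (64.0, 64.0), (500.0, -2.0)]
values = [ma_or_missing(r, g) for r, g in spots]

# Only spots with both M and A defined go on to normalization.
usable = [(m, a) for m, a in values if m is not None]
missing_fraction = 1 - len(usable) / len(spots)
print(len(usable), missing_fraction)
```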

The plan of this paper is as follows. Section 2 describes diagnostic plots to visualize intensity and spatial trends. Section 3 describes the basic normalization method, print-tip loess normalization, designed to adjust for intensity and spatial trends. Section 4 describes composite loess normalization, in which use is made of control spots known not to be differentially expressed. Section 5 considers normalization for other trends, in particular correcting for print-order effects. Section 6 describes scale normalization between arrays. Section 7 describes the use of spot quality weights, and Section 8 gives detailed commands to implement the normalization techniques using freely available software.

Section snippets

Visualization of intensity and spatial trends

The sub-array loess normalization methods described in this paper are based on the fact that dye balance typically varies with spot intensity and with spatial position on the array. It is a useful trouble-shooting step to display these trends visually as part of the normalization process.

The relationship between dye-bias and intensity can be seen best in an MA-plot, which is a scatterplot of the M-values against the A-values for an array [3]. Fig. 1 shows an MA-plot for an array showing three

Print-tip loess normalization

The idea of print-tip loess normalization can be visualized in Fig. 4. Each M-value is normalized by subtracting from it the corresponding value of the tip group loess curve. The normalized log-ratios N are the residuals from the tip group loess regressions, i.e., $N = M - \mathrm{loess}_i(A)$, where $\mathrm{loess}_i(A)$ is the loess curve as a function of A for the ith tip group. Each loess curve is constructed by performing a series of local regressions, one local regression for each point in the scatterplot.
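The residual calculation can be sketched in plain Python as below. This is a simplified illustration, not the implementation used by any software package: the function names, the span of 0.4, the tricube weights and the synthetic data are all assumptions, and any robustness iterations a production loess routine might apply are omitted.

```python
import math

def loess_fit(x, y, span=0.4):
    """Local linear regression with tricube weights, one fit per point in x."""
    n = len(x)
    k = min(n, max(3, math.ceil(span * n)))   # neighbours used in each local fit
    fitted = []
    for x0 in x:
        h = sorted(abs(xi - x0) for xi in x)[k - 1]   # local bandwidth
        weights = []
        for xi in x:
            d = abs(xi - x0)
            u = d / h if h > 0 else (0.0 if d == 0 else 1.0)
            weights.append((1 - u ** 3) ** 3 if u < 1 else 0.0)
        sw = sum(weights)
        mx = sum(w * xi for w, xi in zip(weights, x)) / sw
        my = sum(w * yi for w, yi in zip(weights, y)) / sw
        sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
        sxy = sum(w * (xi - mx) * (yi - my)
                  for w, xi, yi in zip(weights, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0   # fall back to a local mean
        fitted.append(my + slope * (x0 - mx))
    return fitted

def print_tip_loess(m, a, tip, span=0.4):
    """Return N = M - loess_i(A), fitting one curve per print-tip group."""
    groups = {}
    for idx, t in enumerate(tip):
        groups.setdefault(t, []).append(idx)
    normalized = [None] * len(m)
    for idxs in groups.values():
        curve = loess_fit([a[i] for i in idxs], [m[i] for i in idxs], span)
        for i, c in zip(idxs, curve):
            normalized[i] = m[i] - c
    return normalized

# Synthetic data: two tip groups, each with its own intensity-dependent bias.
a = [6 + 0.5 * j for j in range(20)] * 2
tip = [1] * 20 + [2] * 20
m = [0.5 + 0.1 * ai for ai in a[:20]] + [-0.3 - 0.2 * ai for ai in a[20:]]
n_values = print_tip_loess(m, a, tip)
print(max(abs(v) for v in n_values))   # residual bias left after normalization
```

Because the synthetic bias is exactly linear within each tip group, the local linear fits recover it and the residuals are zero up to rounding error. Real data would leave the biological signal and noise in N.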

Composite loess normalization

It is usual to use all or most of the genes on the array in the normalization methods described above. It can be useful to modify this strategy if a suitable set of control spots is available which are known not to be differentially expressed. To be of most use in loess normalization, the control spots should span as wide a range of intensities as possible. A satisfactory set of controls for this purpose is a specially designed microarray sample pool (MSP) titration series in which the entire

Correcting for other trends

There are many other trends which could be estimated and adjusted for in the normalization step, although normally these are of less importance than the intensity and spatial trends already considered. For example, there can be differences between the purity of DNA from different amplification batches or from different clone libraries. This can mean that different spots on the microarray contain different effective quantities of DNA. Different amplification batches and different clone libraries

Between array normalization

Sometimes there are substantial scale differences between microarrays, because of changes in the photomultiplier tube settings of the scanner or for other reasons. In these circumstances, it is useful to scale-normalize between arrays. Scale-normalization is a simple scaling of the M-values from a series of arrays so that each array has the same median absolute deviation.
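A minimal sketch of this scaling is given below. The snippet above does not say which common value the median absolute deviations are matched to, so using the geometric mean of the per-array values is an assumption here. The function names and data are hypothetical.

```python
import math
from statistics import median

def mad(values):
    """Median absolute deviation of the values about their median."""
    center = median(values)
    return median([abs(v - center) for v in values])

def scale_normalize(arrays):
    """Rescale each array of M-values so that all arrays share one MAD."""
    mads = [mad(m) for m in arrays]
    # Assumed target: the geometric mean of the per-array MADs.
    target = math.exp(sum(math.log(s) for s in mads) / len(mads))
    return [[v * target / s for v in m] for m, s in zip(arrays, mads)]

# Two hypothetical arrays; the second has four times the spread of the first.
arrays = [[-1.0, -0.5, 0.0, 0.5, 1.0],
          [-4.0, -2.0, 0.0, 2.0, 4.0]]
scaled = scale_normalize(arrays)
print([mad(m) for m in scaled])
```

The scaling multiplies every M-value on an array by a single constant, so it changes the spread of each array without changing the sign or ordering of its log-ratios.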

Fig. 6 displays side-by-side boxplots of the normalized M-values for a series of six replicate arrays including slide 0924

Weighting for spot quality

Most image analysis programs routinely record a variety of descriptive information about each spot apart from the foreground and background intensities. If this information is used to construct a numeric quality measure for each spot, then lower quality spots can be down-weighted in the normalization process.
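The effect of down-weighting can be illustrated with a deliberately simplified estimate: a single weighted mean of the M-values instead of a loess curve. The function name, the weights and the data below are hypothetical. In a loess-based method the same weights would presumably enter each local regression.

```python
def weighted_offset(m, weights):
    """Weighted mean of the M-values, used here as a crude dye-bias estimate."""
    total = sum(weights)
    return sum(w * v for v, w in zip(m, weights)) / total

# Hypothetical M-values; the last spot is an artifact with an extreme log-ratio.
m = [0.2, 0.1, 0.3, 3.0]
equal_weights = [1.0, 1.0, 1.0, 1.0]
quality_weights = [1.0, 1.0, 1.0, 0.0]   # the low-quality spot gets weight zero

print(weighted_offset(m, equal_weights))    # pulled upward by the bad spot
print(weighted_offset(m, quality_weights))  # determined by the good spots only
```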

Information which is recorded on each spot typically includes morphological details such as area, perimeter, and location plus heterogeneity measures such as standard deviations or

Software

Software to carry out the normalization methods described in this paper is freely available from the Bioconductor project site http://www.bioconductor.org. The Bioconductor packages use the free statistical programming environment R. For normalization of cDNA arrays, the relevant packages are marrayNorm [8], [9] and limma. Here, we give commands from the limma package.

The first step is data input. If all the SPOT output files are in the working directory of the R session and are named after the

Conclusion

Normalization methods for cDNA microarrays will no doubt see further development in the future, but print-tip loess normalization provides a well-tested general purpose normalization method which gives good results on a wide variety of arrays. The method may be refined by using quality weights for individual spots. The method is best combined with diagnostic plots of the data. When diagnostic plots show that biases still remain in the data after normalization, further normalization steps such

Acknowledgements

The authors are grateful to Dr. Lynn Corcoran for permission to use unpublished data from her laboratory at the WEHI and to Henrik Bengtsson for helpful discussions on plate-order normalization.

References (10)

  • M.J. Buckley, Spot User’s Guide, CSIRO Mathematical and Information Sciences, Sydney, Australia, 2000. Available from:...
  • GenePix Pro Microarray and Array Analysis Software, Axon Instruments Inc. Available from:...
  • S. Dudoit et al., Statistical methods for identifying differentially expressed genes in replicated cDNA microarray experiments, Statistica Sinica (2002)
  • W.S. Cleveland et al., Local regression models

  • Y.H. Yang, S. Dudoit, P. Luu, T.P. Speed, Normalization for cDNA microarray data, in: M.L. Bittner, Y. Chen, A.N....