Coexpression Analysis of Human Genes Across Many Microarray Data Sets

  1. Homin K. Lee (1),
  2. Amy K. Hsu (1,2),
  3. Jon Sajdak (1),
  4. Jie Qin (1), and
  5. Paul Pavlidis (1,3,4)

  (1) Columbia Genome Center, Columbia University, New York, New York 10032, USA
  (2) College of Physicians and Surgeons, Columbia University, New York, New York 10032, USA
  (3) Department of Biomedical Informatics, Columbia University, New York, New York 10032, USA

Abstract

We present a large-scale analysis of mRNA coexpression based on 60 large human data sets containing a total of 3924 microarrays. We sought pairs of genes that were reliably coexpressed (based on the correlation of their expression profiles) in multiple data sets, establishing a high-confidence network of 8805 genes connected by 220,649 “coexpression links” that are observed in at least three data sets. Confirmed positive correlations between genes were much more common than confirmed negative correlations. We show that confirmation of coexpression in multiple data sets is correlated with functional relatedness, and show how cluster analysis of the network can reveal functionally coherent groups of genes. Our findings demonstrate how the large body of accumulated microarray data can be exploited to increase the reliability of inferences about gene function.
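
The idea of keeping only links confirmed in several data sets can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `coexpression_links`, the input layout, and the fixed Pearson cutoff `r_min` are assumptions made for illustration, and only positive correlations are counted. The one value taken from the abstract is the requirement of at least three supporting data sets.

```python
from collections import Counter
from itertools import combinations

import numpy as np


def coexpression_links(datasets, r_min=0.8, min_datasets=3):
    """Return gene pairs positively correlated in >= min_datasets data sets.

    datasets: list of (genes, matrix) tuples, where matrix has one row per
    gene and one column per microarray.
    r_min: illustrative Pearson cutoff (an assumption, not the paper's
    selection criterion).
    """
    support = Counter()
    for genes, matrix in datasets:
        # Pairwise Pearson correlation between gene expression profiles.
        corr = np.corrcoef(np.asarray(matrix, dtype=float))
        for i, j in combinations(range(len(genes)), 2):
            if corr[i, j] >= r_min:
                # Sort the pair so (A, B) and (B, A) count as the same link.
                support[tuple(sorted((genes[i], genes[j])))] += 1
    # Keep only links observed in enough independent data sets.
    return {pair: n for pair, n in support.items() if n >= min_datasets}
```

Counting support per pair, rather than pooling all arrays into one correlation, means a link must recur independently in each data set, which is the sense of "confirmation" used in the abstract.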

Footnotes

  • [Supplemental material is available online at www.genome.org and http://microarray.cpmc.columbia.edu/tmm.]

  • Article and publication are at http://www.genome.org/cgi/doi/10.1101/gr.1910904.

  • 4 Corresponding author. E-MAIL pp175{at}columbia.edu; FAX (212) 851-5149.

    • Accepted February 24, 2004.
    • Received August 26, 2003.