  • Review Article

A review of UMAP in population genetics

Abstract

Uniform manifold approximation and projection (UMAP) has been rapidly adopted by the population genetics community to study population structure. It is now commonly used to visualize the ancestral composition of human genetic datasets, to search for unique clusters in the data, and to identify geographic patterns. Here we give an overview of applications of UMAP in population genetics, provide recommendations for best practices, and offer insights on optimal uses for the technique.

Author information

Corresponding author

Correspondence to Simon Gravel.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

About this article

Cite this article

Diaz-Papkovich, A., Anderson-Trocmé, L. & Gravel, S. A review of UMAP in population genetics. J Hum Genet 66, 85–91 (2021). https://doi.org/10.1038/s10038-020-00851-4

  • DOI: https://doi.org/10.1038/s10038-020-00851-4
