Semi-supervised learning for peptide identification from shotgun proteomics datasets

Nat Methods. 2007 Nov;4(11):923-5. doi: 10.1038/nmeth1113. Epub 2007 Oct 21.

Abstract

Shotgun proteomics uses liquid chromatography-tandem mass spectrometry to identify proteins in complex biological samples. We describe an algorithm, called Percolator, for improving the rate of confident peptide identifications from a collection of tandem mass spectra. Percolator uses semi-supervised machine learning to discriminate between correct and decoy spectrum identifications, correctly assigning peptides to 17% more spectra from a tryptic Saccharomyces cerevisiae dataset, and up to 77% more spectra from non-tryptic digests, relative to a fully supervised approach.
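
The abstract does not describe the training procedure, so the following is only a minimal sketch of how a semi-supervised target-decoy loop of this kind could look; it is not Percolator's implementation. Its assumptions: each peptide-spectrum match (PSM) is a list of numeric features, decoy matches serve as negative examples, and target matches that pass a q-value threshold serve as provisional positives. The function names (`q_values`, `train_linear`, `semi_supervised_rescore`), the parameters and their defaults are hypothetical. A plain logistic regression stands in for the classifier to keep the example dependency-free.

```python
import math


def q_values(scores, is_decoy):
    """Estimate a q-value per PSM from decoy counts (higher score = better)."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    targets = decoys = 0
    fdrs = []
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR when accepting everything down to this rank.
        fdrs.append(decoys / max(targets, 1))
    # q-value: the smallest FDR at this rank or at any more permissive rank.
    qvals = [0.0] * len(scores)
    running_min = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvals[order[rank]] = min(running_min, 1.0)
    return qvals


def train_linear(X, y, epochs=200, lr=0.1):
    """Logistic regression by batch gradient descent (assumed stand-in classifier)."""
    n_feat = len(X[0])
    w, b = [0.0] * n_feat, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_feat, 0.0
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + b
            z = max(-30.0, min(30.0, z))  # keep exp() in a safe range
            err = 1.0 / (1.0 + math.exp(-z)) - yi
            for j in range(n_feat):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def semi_supervised_rescore(features, is_decoy, init_feature=0, fdr=0.01, n_iter=5):
    """Iteratively relabel confident targets as positives, retrain, and rescore."""
    # Start from a single raw feature (e.g. a search-engine score) as the ranking.
    scores = [f[init_feature] for f in features]
    for _ in range(n_iter):
        q = q_values(scores, is_decoy)
        pos = [f for f, d, qv in zip(features, is_decoy, q) if not d and qv <= fdr]
        neg = [f for f, d in zip(features, is_decoy) if d]
        if not pos or not neg:
            break  # nothing to learn from; keep the current scores
        w, b = train_linear(pos + neg, [1] * len(pos) + [0] * len(neg))
        scores = [sum(wj * xj for wj, xj in zip(w, f)) + b for f in features]
    return scores
```

Calling `semi_supervised_rescore(features, is_decoy)` returns one new score per PSM, and passing those scores to `q_values` gives the set accepted at a chosen threshold. The sketch trains and scores on the same PSMs and skips feature normalization, both of which a real analysis would need to handle.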

Publication types

  • Research Support, N.I.H., Extramural

MeSH terms

  • Algorithms
  • Artificial Intelligence*
  • Chymotrypsin / analysis
  • Chymotrypsin / chemistry
  • Databases, Protein
  • Pancreatic Elastase / analysis
  • Pancreatic Elastase / chemistry
  • Peptide Fragments / analysis*
  • Proteome / analysis
  • Proteome / chemistry
  • Proteomics / methods*
  • Saccharomyces cerevisiae Proteins / analysis
  • Saccharomyces cerevisiae Proteins / chemistry
  • Software
  • Tandem Mass Spectrometry / methods*
  • Trypsin / analysis
  • Trypsin / chemistry

Substances

  • Peptide Fragments
  • Proteome
  • Saccharomyces cerevisiae Proteins
  • Chymotrypsin
  • Pancreatic Elastase
  • Trypsin