Homology-driven assembly of NOn-redundant protEin sequence sets (NOmESS) for mass spectrometry

Tikira Temu; Matthias Mann; Markus Räschle; Jürgen Cox

doi:10.1093/bioinformatics/btv756

Bioinformatics. 2016 May 1;32(9):1417-9. doi: 10.1093/bioinformatics/btv756. Epub 2016 Jan 6.

Authors

Tikira Temu¹, Matthias Mann², Markus Räschle², Jürgen Cox³

Affiliations

¹ Computational Systems Biochemistry and Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried 82152, Germany.
² Proteomics and Signal Transduction, Max Planck Institute of Biochemistry, Martinsried 82152, Germany.
³ Computational Systems Biochemistry and.

Abstract

To enable mass spectrometry (MS)-based proteomic studies with poorly characterized organisms, we developed a computational workflow for the homology-driven assembly of a non-redundant reference sequence dataset. In the automated pipeline, translated DNA sequences (e.g. ESTs, RNA deep-sequencing data) are aligned to those of a closely related and fully sequenced organism. Representative sequences are derived from each cluster and joined, resulting in a non-redundant reference set representing the maximal available amino acid sequence information for each protein. Here, we apply NOmESS to assemble a reference database for the widely used model organism Xenopus laevis and demonstrate its use in proteomic applications.
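The joining step described in the abstract can be illustrated with a toy sketch. This is not the NOmESS implementation (which is written in C# and relies on BLASTp and cd-hit); it only assumes that each translated fragment has already been aligned, without gaps, to one reference protein. The names Hit, assemble and gap_char are hypothetical, and padding uncovered reference positions with "X" is an assumption made here for illustration only.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical ungapped alignment of one translated fragment."""
    ref_id: str     # reference protein the fragment aligned to
    ref_start: int  # 0-based start position on the reference protein
    fragment: str   # amino acid sequence of the aligned fragment


def assemble(hits, gap_char="X"):
    """Group hits by reference protein and join them along reference coordinates.

    Assumption: alignments are ungapped, so fragment position i maps to
    reference position ref_start + i.
    """
    clusters = defaultdict(list)
    for hit in hits:
        clusters[hit.ref_id].append(hit)

    assembled = {}
    for ref_id, members in clusters.items():
        # Longest fragments first, so they take precedence in overlaps.
        members.sort(key=lambda h: len(h.fragment), reverse=True)
        first = min(h.ref_start for h in members)
        last = max(h.ref_start + len(h.fragment) for h in members)
        residues = [None] * (last - first)
        for hit in members:
            for offset, aa in enumerate(hit.fragment):
                pos = hit.ref_start - first + offset
                if residues[pos] is None:
                    residues[pos] = aa
        # Positions not covered by any fragment are padded with gap_char.
        assembled[ref_id] = "".join(aa or gap_char for aa in residues)
    return assembled


hits = [
    Hit("P1", 0, "MKTAY"),
    Hit("P1", 3, "AYIAK"),
    Hit("P1", 12, "QRLE"),
    Hit("P2", 5, "GGS"),
]
result = assemble(hits)
print(result["P1"])  # MKTAYIAKXXXXQRLE
print(result["P2"])  # GGS
```

Each reference protein yields exactly one assembled entry, which is what keeps the resulting set non-redundant in this sketch. A real pipeline would also have to handle gapped alignments and conflicting residues in overlapping fragments.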

Availability and implementation: NOmESS is written in C#. The source code and the executables can be downloaded from http://www.biochem.mpg.de/cox. Running NOmESS additionally requires BLASTp and cd-hit.
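Before running the pipeline, it may help to confirm that the two external tools are installed. The sketch below assumes they are exposed on the PATH under the executable names blastp and cd-hit; how NOmESS itself locates them is not stated here, and the helper name missing_tools is hypothetical.

```python
import shutil


def missing_tools(tools=("blastp", "cd-hit")):
    """Return the names of the given executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("BLASTp and cd-hit are available.")
```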

Contact: cox@biochem.mpg.de

Supplementary information: Supplementary data are available at Bioinformatics online.

MeSH terms

Amino Acid Sequence
Animals
Base Sequence*
High-Throughput Nucleotide Sequencing
Humans
Mass Spectrometry*
Proteomics