MtPAN(3): site-class specific amino acid replacement matrices for mitochondrial proteins of Pancrustacea and Collembola

Francesco Nardi; Pietro Liò; Antonio Carapelli; Francesco Frati

doi:10.1016/j.ympev.2014.02.001

Mol Phylogenet Evol. 2014 Jun:75:239-44. doi: 10.1016/j.ympev.2014.02.001. Epub 2014 Feb 10.

Authors

Francesco Nardi¹, Pietro Liò², Antonio Carapelli³, Francesco Frati⁴

Affiliations

¹ Department of Life Sciences, University of Siena, Via Aldo Moro 2, 53100 Siena, Italy. Electronic address: nardifra@unisi.it.
² Computer Laboratory, University of Cambridge, William Gates Building, 15 JJ Thomson Avenue, Cambridge CB3 0FD, UK. Electronic address: Pietro.Lio@cl.cam.ac.uk.
³ Department of Life Sciences, University of Siena, Via Aldo Moro 2, 53100 Siena, Italy. Electronic address: antonio.carapelli@unisi.it.
⁴ Department of Life Sciences, University of Siena, Via Aldo Moro 2, 53100 Siena, Italy. Electronic address: francesco.frati@unisi.it.

PMID: 24525199
DOI: 10.1016/j.ympev.2014.02.001

Abstract

Phylogenetic analyses of Pancrustacea have generally relied on empirical models of amino acid substitution estimated from large reference datasets and applied to the entire alignment. More recently, following the observation that different sites, or groups of sites, may evolve under different evolutionary constraints, methods have been developed to handle site-specific or site-class-specific models. Here, a set of three matrices is developed from an alignment of complete pancrustacean mitochondrial genomes, partitioned by an unsupervised clustering procedure applied to per-site physicochemical properties. The performance of the proposed matrix set, named MtPAN(3), was compared with that of relevant single-matrix models (MtZOA, MtART, MtPAN) under maximum likelihood (ML) and Bayesian inference (BI). Although the new model does not resolve some of the topological problems frequently encountered in pancrustacean mitogenomic phylogenetic analyses, MtPAN(3) largely outperforms its competitors according to AIC and Bayes factors, indicating a significantly improved fit to the empirical data. The applicability of the new model, and of multiple-matrix models in general, is discussed, and an R/BioPerl script that implements the procedure is provided.

Keywords: Amino acid matrices; Collembola; MtPAN(3); Pancrustacea; k-Means clustering.
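
The partitioning step summarized in the abstract (k-means clustering of alignment sites by per-site physicochemical properties, yielding three site classes) can be illustrated with a minimal Python sketch. This is not the authors' R/BioPerl script. The abstract does not list the properties used, so the two features here (mean Kyte-Doolittle hydropathy and the number of distinct residues per column) are assumptions, as are the toy alignment, the function names site_features and kmeans, and the deterministic centroid initialization.

```python
from statistics import mean

# Kyte-Doolittle hydropathy scale; chosen here only as an example property
# (assumption: the abstract does not say which properties were used).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def site_features(alignment):
    """Return one (mean hydropathy, distinct-residue count) tuple per column."""
    features = []
    for column in zip(*alignment):
        residues = [r for r in column if r in KD]  # skip gaps and ambiguity codes
        if not residues:
            features.append((0.0, 0.0))
        else:
            features.append((mean(KD[r] for r in residues),
                             float(len(set(residues)))))
    return features


def kmeans(points, k, n_iter=100):
    """Plain Lloyd's algorithm with a deterministic start (hypothetical choice)."""
    ordered = sorted(points)
    # Initial centroids: k points spread evenly through the sorted list.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(mean(dim) for dim in zip(*members))
    return labels


# Toy alignment (made up for illustration; four sequences, seven columns).
alignment = ["MILKDEM", "MVLRDEM", "MILKNEM", "MLLKDQM"]

labels = kmeans(site_features(alignment), k=3)

# Group column indices by site class; each class would receive its own matrix.
partitions = {}
for column_index, label in enumerate(labels):
    partitions.setdefault(label, []).append(column_index)

print(partitions)
```

In a workflow of this kind, each resulting group of columns would be treated as one site class with its own replacement matrix. Estimating those matrices and comparing model fit by AIC or Bayes factors is outside the scope of this sketch.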

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Amino Acid Substitution
Animals
Arthropods / classification*
Arthropods / genetics
Bayes Theorem
Cluster Analysis
Computational Biology
Genome, Mitochondrial*
Likelihood Functions
Mitochondrial Proteins / genetics*
Models, Genetic*
Phylogeny
Sequence Alignment
Sequence Analysis, DNA

Substances

Mitochondrial Proteins