Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix

Persistent Link:
http://hdl.handle.net/10150/610270
Title:
Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix
Author:
Ndhlovu, Andrew; Hazelhurst, Scott; Durand, Pierre M.
Affiliation:
Evolutionary Medicine Laboratory, Faculty of Health Sciences, University of the Witwatersrand; School of Electrical and Information Engineering, University of the Witwatersrand; Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand; Department of Ecology and Evolutionary Biology, University of Arizona; Department of Biodiversity and Conservation Biology, Faculty of Natural Sciences, University of the Western Cape
Issue Date:
2015
Publisher:
BioMed Central
Citation:
Ndhlovu et al. BMC Bioinformatics (2015) 16:255 DOI 10.1186/s12859-015-0688-8
Journal:
BMC Bioinformatics
Rights:
© 2015 Ndhlovu et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/)
Collection Information:
This item is part of the UA Faculty Publications collection. For more information on this item or other items in the UA Campus Repository, contact the University of Arizona Libraries at repository@u.library.arizona.edu.
Abstract:
BACKGROUND: Selective pressures at the DNA level shape genes into profiles consisting of patterns of rapidly evolving sites and sites that resist change. These profiles remain detectable even when protein sequences have diverged extensively. A common task in molecular biology is to infer functional, structural or evolutionary relationships by querying a sequence database with a search algorithm. However, problems arise when sequence similarity is low. This study presents an algorithm that uses the evolutionary rate at codon sites, the dN/dS (ω) parameter, coupled to a substitution matrix as an alignment metric for detecting distantly related proteins. The algorithm, called BLOSUM-FIRE, couples a newer, improved version of the original FIRE (Functional Inference using Rates of Evolution) algorithm with an amino acid substitution matrix in a dynamic scoring function. The enigmatic hepatitis B virus X protein was used as a test case for BLOSUM-FIRE and its associated database, EvoDB.
RESULTS: The evolutionary-rate-based approach was coupled with a conventional BLOSUM substitution matrix. The two are combined in a dynamic scoring function that uses selective pressure to score aligned residues. This function follows a coupled additive approach, scoring each aligned site according to the level of conservation inferred from the ω values. When evaluated against MAFFT alignments used as reference alignments, the new implementation, BLOSUM-FIRE, proved more accurate than its predecessor, FIRE. Comparison of alignment quality with widely used algorithms (MUSCLE, T-COFFEE and CLUSTAL Omega) showed that BLOSUM-FIRE performs as well as these conventional algorithms. Its main strength is that it offers greater potential for aligning divergent sequences and addresses the low specificity inherent in the original FIRE algorithm.
The utility of this algorithm is demonstrated using the hepatitis B virus X (HBx) protein, a protein of unknown function, as a test case.
CONCLUSION: This study describes the utility of an evolutionary-rate-based approach coupled to the BLOSUM62 amino acid substitution matrix for inferring protein domain function. We demonstrate that such an approach is robust and performs as well as an array of conventional algorithms.
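The abstract describes coupling per-site ω (dN/dS) values with BLOSUM62 in an additive, site-dependent scoring function but does not state the formula. The Python sketch below is a hypothetical illustration of that idea only, not the authors' BLOSUM-FIRE implementation. Everything beyond the standard BLOSUM62 values is an assumption: the three-level binning of ω in `conservation_class` (thresholds 0.5 and 1.5), the bonus/penalty scheme and `weight` in `coupled_score`, the `gap` value, and the use of Needleman-Wunsch global alignment with a linear gap penalty. Per-site ω estimates are taken as given inputs; in practice they would come from a codon-substitution model.

```python
from typing import Dict, List, Sequence, Tuple

# Standard BLOSUM62 scores; rows and columns follow AMINO_ACIDS order.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
_BLOSUM62_ROWS = """
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""
BLOSUM62: Dict[Tuple[str, str], int] = {
    (a, b): int(score)
    for a, row in zip(AMINO_ACIDS, _BLOSUM62_ROWS.strip().splitlines())
    for b, score in zip(AMINO_ACIDS, row.split())
}


def conservation_class(omega: float) -> int:
    """HYPOTHETICAL binning of a site's dN/dS (omega) value; thresholds are illustrative."""
    if omega < 0.5:
        return 0  # purifying selection (conserved site)
    if omega <= 1.5:
        return 1  # roughly neutral
    return 2  # diversifying selection (rapidly evolving site)


def coupled_score(a: str, b: str, omega_a: float, omega_b: float,
                  weight: float = 2.0) -> float:
    """HYPOTHETICAL additive score: BLOSUM62 term plus an omega-derived term."""
    base = BLOSUM62[(a, b)]
    class_a, class_b = conservation_class(omega_a), conservation_class(omega_b)
    if class_a == class_b:
        # Matching selective-pressure classes earn a bonus, doubled when both
        # sites look conserved, so the omega term's weight varies per site.
        return base + weight * (2 if class_a == 0 else 1)
    # Mismatched classes are penalised in proportion to how far apart they are.
    return base - weight * abs(class_a - class_b)


def align(seq_a: str, omegas_a: Sequence[float], seq_b: str,
          omegas_b: Sequence[float], gap: float = -4.0) -> Tuple[float, str, str]:
    """Needleman-Wunsch global alignment with a linear gap penalty."""
    n, m = len(seq_a), len(seq_b)

    def sub(i: int, j: int) -> float:
        # Coupled score for residue i of seq_a against residue j of seq_b (1-based).
        return coupled_score(seq_a[i - 1], seq_b[j - 1], omegas_a[i - 1], omegas_b[j - 1])

    F = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = F[i - 1][0] + gap
    for j in range(1, m + 1):
        F[0][j] = F[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            F[i][j] = max(F[i - 1][j - 1] + sub(i, j),
                          F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Trace back from the bottom-right cell, preferring diagonal moves on ties.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + sub(i, j):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


print(align("WCH", [0.1, 0.2, 2.0], "WH", [0.1, 2.0]))  # (21.0, 'WCH', 'W-H')
```

In the usage line, the conserved W sites and the rapidly evolving H sites pair up, and C is placed against a gap. Because the ω term is added to, rather than substituted for, the BLOSUM62 score, residue similarity still drives the alignment when selective-pressure profiles agree; any real thresholds or weights would need to be validated against reference alignments.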
EISSN:
1471-2105
DOI:
10.1186/s12859-015-0688-8
Version:
Final published version
Additional Links:
http://www.biomedcentral.com/1471-2105/16/255

Full metadata record

DC Field | Value | Language
dc.contributor.author | Ndhlovu, Andrew | en
dc.contributor.author | Hazelhurst, Scott | en
dc.contributor.author | Durand, Pierre M. | en
dc.date.accessioned | 2016-05-20T09:02:45Z | -
dc.date.available | 2016-05-20T09:02:45Z | -
dc.date.issued | 2015 | en
dc.identifier.citation | Ndhlovu et al. BMC Bioinformatics (2015) 16:255 DOI 10.1186/s12859-015-0688-8 | en
dc.identifier.doi | 10.1186/s12859-015-0688-8 | en
dc.identifier.uri | http://hdl.handle.net/10150/610270 | -
dc.language.iso | en | en
dc.publisher | BioMed Central | en
dc.relation.url | http://www.biomedcentral.com/1471-2105/16/255 | en
dc.rights | © 2015 Ndhlovu et al. Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/) | en
dc.title | Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix | en
dc.type | Article | en
dc.identifier.eissn | 1471-2105 | en
dc.contributor.department | Evolutionary Medicine Laboratory, Faculty of Health Sciences, University of the Witwatersrand | en
dc.contributor.department | School of Electrical and Information Engineering, University of the Witwatersrand | en
dc.contributor.department | Sydney Brenner Institute of Molecular Bioscience, University of the Witwatersrand | en
dc.contributor.department | Department of Ecology and Evolutionary Biology, University of Arizona | en
dc.contributor.department | Department of Biodiversity and Conservation Biology, Faculty of Natural Sciences, University of the Western Cape | en
dc.identifier.journal | BMC Bioinformatics | en
dc.description.collectioninformation | This item is part of the UA Faculty Publications collection. For more information on this item or other items in the UA Campus Repository, contact the University of Arizona Libraries at repository@u.library.arizona.edu. | en
dc.eprint.version | Final published version | en