Mol Biol Evol. 2012 Sep 12. [Epub ahead of print]
Integrating sequence variation and protein structure to identify sites under selection.
Meyer AG, Wilke CO.
Source
Section of Integrative Biology, Institute for Cellular and Molecular Biology, and Center for Computational Biology and Bioinformatics, The University of Texas at Austin, Austin, TX 78731, USA.
Abstract
We present a novel method to identify sites under selection in protein-coding genes. Our method combines a traditional Goldman-Yang model of coding-sequence evolution with information obtained from the 3D structure of the evolving protein, specifically the relative solvent accessibility (RSA) of individual residues. We develop a random-effects likelihood (REL) sites model in which rate classes are RSA-dependent. The RSA dependence is modeled with linear functions. We demonstrate that our RSA-dependent model provides a significantly better fit to molecular sequence data than a traditional, RSA-independent model. We further show that our model provides a natural, RSA-dependent neutral baseline for the evolutionary rate ratio ω = dN/dS. Sites that deviate from this neutral baseline likely experience selection pressure for function. We apply our method to the influenza proteins hemagglutinin and neuraminidase. For hemagglutinin, our method recovers positively selected sites near the sialic-acid binding site and negatively selected sites that may be important for trimerization. For neuraminidase, our method recovers the oseltamivir resistance site and otherwise suggests that few sites deviate from the neutral baseline. Our method is broadly applicable to any protein sequences for which structural data are available or can be obtained via homology modeling or threading.
PMID: 22977116
[PubMed - as supplied by publisher]
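The abstract states that each rate class depends linearly on RSA and that one such line serves as a neutral baseline for ω. A minimal Python sketch of that idea follows. It is an illustration only, not the authors' implementation: the names `RateClass`, `expected_omega`, and `deviation_from_baseline`, the clamping of ω at zero, and all numeric values are assumptions made here for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateClass:
    """One hypothetical rate class with omega(RSA) = intercept + slope * RSA."""
    intercept: float
    slope: float
    weight: float  # assumed prior probability of a site belonging to this class

    def omega(self, rsa: float) -> float:
        if rsa < 0.0:
            raise ValueError("RSA must be non-negative")
        # Assumption of this sketch: clamp at zero so omega stays a valid rate ratio.
        return max(0.0, self.intercept + self.slope * rsa)


def expected_omega(classes, rsa):
    """Weight-averaged omega over all rate classes at a given RSA."""
    total_weight = sum(c.weight for c in classes)
    return sum(c.weight * c.omega(rsa) for c in classes) / total_weight


def deviation_from_baseline(site_omega, baseline, rsa):
    """Signed difference between a site's omega and the baseline at its RSA.

    Positive values mean the site evolves faster than the baseline predicts,
    negative values mean it evolves more slowly.
    """
    return site_omega - baseline.omega(rsa)


# Illustrative, made-up parameters; not estimates from the paper.
baseline = RateClass(intercept=0.1, slope=0.4, weight=0.8)
elevated = RateClass(intercept=0.5, slope=1.0, weight=0.2)
```

In this sketch, a buried site (RSA near 0) and an exposed site (RSA near 1) with the same ω would receive different deviations, because the baseline itself rises with solvent exposure.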