TY - JOUR
T1 - Protein map
T2 - An alignment-free sequence comparison method based on various properties of amino acids
AU - Yu, Chenglong
AU - Cheng, Shiu Yuen
AU - He, Rong L.
AU - Yau, Stephen S.T.
N1 - Funding Information:
We thank the anonymous referees for providing us with constructive comments and suggestions. We also thank Dr. Max Benson for critically reading and editing the manuscript. The first author would like to thank Prof. Raymond Chan and Mr. Yeung Hau-Man at CUHK for sharing the Matlab code of Hao's composition vector method. This research was supported by NSF, Tsinghua University, and HKUST.
PY - 2011/10/15
Y1 - 2011/10/15
N2 - In this paper, we propose a new protein map which incorporates various properties of amino acids. As a powerful tool for protein classification, this new protein map both considers phylogenetic factors arising from amino acid mutations and provides computational efficiency for large amounts of data. Ten amino acid physico-chemical properties (the chemical composition of the side chain, two polarity measures, hydropathy, isoelectric point, volume, aromaticity, aliphaticity, hydrogenation, and hydroxythiolation) are utilized according to their relative importance. Moreover, this approach does not require any alignment of sequences when calculating genetic distances between pairs of proteins. Therefore, the proposed model handles protein sequences more easily and quickly than multiple alignment methods, and gives protein classification greater evolutionary significance at the amino acid sequence level.
AB - In this paper, we propose a new protein map which incorporates various properties of amino acids. As a powerful tool for protein classification, this new protein map both considers phylogenetic factors arising from amino acid mutations and provides computational efficiency for large amounts of data. Ten amino acid physico-chemical properties (the chemical composition of the side chain, two polarity measures, hydropathy, isoelectric point, volume, aromaticity, aliphaticity, hydrogenation, and hydroxythiolation) are utilized according to their relative importance. Moreover, this approach does not require any alignment of sequences when calculating genetic distances between pairs of proteins. Therefore, the proposed model handles protein sequences more easily and quickly than multiple alignment methods, and gives protein classification greater evolutionary significance at the amino acid sequence level.
KW - Amino acid substitution
KW - Classification
KW - Mitochondrial genes
KW - Multiple alignment
KW - Phylogeny
KW - Protein map
UR - http://www.scopus.com/inward/record.url?scp=84860389050&partnerID=8YFLogxK
U2 - 10.1016/j.gene.2011.07.002
DO - 10.1016/j.gene.2011.07.002
M3 - Article
C2 - 21803133
AN - SCOPUS:84860389050
SN - 0378-1119
VL - 486
SP - 110
EP - 118
JO - Gene
JF - Gene
IS - 1-2
ER -