Protein sequence analysis is an important tool in bioinformatics and molecular biology, with applications such as protein structure and function prediction, phylogenetic classification, and recognition of conservation patterns. Finding similar proteins quickly and efficiently in a large protein repository remains a significant open problem. This paper proposes a new method for measuring the similarity of protein sequences based on the Discrete Wavelet Transform (DWT), called the ACDWT model, together with two amino acid encoding methods, HPC and ADCC, based on hydropathy properties and dissociation constants, respectively. The model uses only the approximation coefficients of the DWT, so the feature vector is short, which substantially reduces the running time. Phylogenetic trees of nine ND5 proteins were built with our model and with other methods; the experimental results show that our model is efficient and performs slightly better than the others.
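The pipeline described in the abstract (encode residues as numbers, keep only the DWT approximation coefficients, compare the resulting short vectors) can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: the Kyte-Doolittle hydropathy index stands in for the HPC encoding, the Haar wavelet stands in for whichever wavelet the paper uses, sequences are zero-padded to a common length, and Euclidean distance is used as the dissimilarity measure. All function names are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy index. Assumption: used here as a stand-in for
# the paper's HPC encoding, whose exact values are not given in the abstract.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map each residue to a number; raises KeyError on non-standard residues."""
    return [HYDROPATHY[aa] for aa in seq.upper()]


def haar_approximation(signal, levels):
    """Return the Haar DWT approximation coefficients after `levels` steps.

    Detail coefficients are discarded at every level, so the output is
    roughly len(signal) / 2**levels values long.
    """
    coeffs = list(signal)
    for _ in range(levels):
        if len(coeffs) < 2:
            break
        if len(coeffs) % 2:
            coeffs.append(coeffs[-1])  # repeat last sample for odd lengths
        coeffs = [
            (coeffs[i] + coeffs[i + 1]) / math.sqrt(2)
            for i in range(0, len(coeffs), 2)
        ]
    return coeffs


def feature_vector(seq, length, levels):
    """Encode, zero-pad to `length` (an assumption), and take approximations."""
    signal = encode(seq)
    signal += [0.0] * (length - len(signal))
    return haar_approximation(signal, levels)


def distance(seq_a, seq_b, levels=2):
    """Euclidean distance between the two short feature vectors."""
    length = max(len(seq_a), len(seq_b))
    fa = feature_vector(seq_a, length, levels)
    fb = feature_vector(seq_b, length, levels)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(fa, fb)))
```

Calling `distance(a, b)` for every pair of sequences would give a distance matrix of the kind that tree-building methods take as input. Because each decomposition level halves the vector length, raising `levels` shortens the feature vector and the comparison cost, at the price of discarding finer detail.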
Su, Jie and Bao, Junpeng
"A Wavelet Transform Based Protein Sequence Similarity Model,"
Applied Mathematics & Information Sciences: Vol. 07, Article 30.
Available at: https://dc.naturalspublishing.com/amis/vol07/iss3/30