Abstract
The primitive data for deducing the Miyazawa-Jernigan contact energies or the blocks substitution matrix (BLOSUM) consist of pair frequency counts. Each amino acid corresponds to a conditional probability distribution. Based on the deviation of such a conditional probability from the random background, a scheme for reducing the amino acid alphabet is proposed. An evident discrepancy is observed between the reduced alphabets obtained from the raw Miyazawa-Jernigan residue pair counts and those obtained from the raw BLOSUM residue pair counts. Taking the homologous sequence database SCOP40 as a test set, we detect homology with the resulting coarse-grained substitution matrices. The results verify that the reduced alphabets preserve well the information contained in the original 20-letter alphabet.
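As a rough illustration of the quantities named in the abstract, the sketch below turns a symmetric table of pair counts into one conditional distribution per letter, measures how far each distribution lies from the background, and merges two letters by pooling their counts. The abstract does not state which deviation measure or merging rule the paper uses, so the Kullback-Leibler divergence and the function names `conditional_distributions`, `kl_from_background`, and `merge_letters` are assumptions made only for this example, and the 3-letter count table is made-up toy data.

```python
import math

def conditional_distributions(counts):
    """Return (cond, background) for a square table of pair counts.

    cond[i][j] is the probability of partner j given letter i;
    background[j] is the overall frequency of j as a partner.
    """
    n = len(counts)
    total = sum(sum(row) for row in counts)
    background = [sum(counts[i][j] for i in range(n)) / total for j in range(n)]
    cond = [[c / sum(row) for c in row] for row in counts]
    return cond, background

def kl_from_background(cond_row, background):
    """Kullback-Leibler divergence of one conditional row from the background.

    Assumption: KL is used here as one possible deviation measure.
    """
    return sum(p * math.log(p / q) for p, q in zip(cond_row, background) if p > 0)

def merge_letters(counts, i, j):
    """Pool letter j into letter i (i != j) and return the smaller count table."""
    n = len(counts)
    keep = [k for k in range(n) if k != j]
    new_index = {k: pos for pos, k in enumerate(keep)}
    new_index[j] = new_index[i]  # j now shares the row/column of i
    merged = [[0.0] * len(keep) for _ in keep]
    for a in range(n):
        for b in range(n):
            merged[new_index[a]][new_index[b]] += counts[a][b]
    return merged

# Toy 3-letter example (hypothetical counts, not data from the paper).
toy_counts = [[10, 2, 1],
              [2, 8, 3],
              [1, 3, 6]]
cond, background = conditional_distributions(toy_counts)
deviations = [kl_from_background(row, background) for row in cond]
reduced = merge_letters(toy_counts, 0, 1)
```

Repeating the merge step on the pair of letters judged most similar would shrink the alphabet one letter at a time; how similarity is judged is left open here because the abstract does not specify it.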
Received 30 December 2001
DOI:https://doi.org/10.1103/PhysRevE.66.021906
©2002 American Physical Society