Abstract
The classification of human gene sequences into exons and introns is a difficult problem in DNA sequence analysis. In this paper, we define a set of features, called the simple Z (SZ) features, which are derived from the Z-curve features for the recognition of human exons and introns. The classification results show that the SZ features, although fewer in number (three in total), preserve the high recognition rate of the original nine Z-curve features. Because there are only one-third as many SZ features as Z-curve features, the feature space has a much lower dimensionality (three instead of nine), which makes recognition more efficient. If the stop codon feature is used together with the three SZ features, a recognition rate of up to 92% for short sequences of length can be obtained.
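The abstract does not define the SZ features or the stop codon feature, so neither is sketched here. For orientation only, the following is a minimal sketch of the nine Z-curve features they start from, assuming the common phase-specific formulation: for each codon position k = 1, 2, 3, the base frequencies a_k, c_k, g_k, t_k are combined into x_k = (a_k + g_k) - (c_k + t_k), y_k = (a_k + c_k) - (g_k + t_k), and z_k = (a_k + t_k) - (g_k + c_k). The function name `z_curve_features` is hypothetical, and this is not presented as the paper's own implementation.

```python
from collections import Counter


def z_curve_features(seq: str) -> list[float]:
    """Nine phase-specific Z-curve features: (x_k, y_k, z_k) for k = 1, 2, 3.

    Assumption: a_k, c_k, g_k, t_k are the relative frequencies of each base
    at codon position k (sequence positions k, k+3, k+6, ...).
    Characters other than A, C, G, T are ignored.
    """
    seq = seq.upper()
    features: list[float] = []
    for k in range(3):
        # Count bases at codon position k + 1 only.
        counts = Counter(b for b in seq[k::3] if b in "ACGT")
        total = sum(counts.values())
        if total == 0:
            features.extend([0.0, 0.0, 0.0])
            continue
        a, c, g, t = (counts[b] / total for b in "ACGT")
        x = (a + g) - (c + t)  # purines vs pyrimidines
        y = (a + c) - (g + t)  # amino vs keto bases
        z = (a + t) - (g + c)  # weak vs strong hydrogen bonding
        features.extend([x, y, z])
    return features


print(z_curve_features("ATGATG"))
```

For "ATGATG", every codon position holds a single base (A, T, G), so the output is [1.0, 1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0]. A classifier would take this nine-dimensional vector as input; the paper's point is that a three-dimensional derived set (SZ) can replace it.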
- Received 7 October 2002
DOI: https://doi.org/10.1103/PhysRevE.67.061916
©2003 American Physical Society