Abstract
Gene duplication followed by adaptive evolution is thought to be a central mechanism for the emergence of novel genes. To illuminate the contribution of duplicated protein-coding sequences to the complexity of the human genome, we study the connectivity of pairwise sequence-related human proteins and construct a network of protein sequences linked by shared similarities. We find that (i) the connectivity distribution for sequence-related proteins decays as a power law, (ii) the top rank consists of a single large cluster of proteins, while bottom ranks consist of multiple isolated clusters, and (iii) the network's structural characteristics show both a high degree of clustering and an intermediate connectivity (“small-world” features). We gain further insight into the structural properties of the network by studying the relationship between the connectivity distribution and the phylogenetic conservation of proteins in bacteria, plants, invertebrates, and vertebrates. We find that (iv) the proportion of sequence-related proteins increases with the extent of evolutionary conservation. Our results support the view that small-world network properties constitute a footprint of an evolutionary mechanism, and they extend the traditional interpretation of protein families.
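The network quantities named in the abstract can be illustrated on a toy example. The sketch below is a minimal, hypothetical illustration and not the authors' pipeline: it assumes the pairwise sequence similarities have already been thresholded into a list of linked protein pairs (the IDs P1 to P6 are invented), and it computes the connectivity distribution P(k), the sizes of connected clusters, the average clustering coefficient, and the mean shortest-path length. Using the last two as small-world diagnostics follows the usual Watts-Strogatz convention and is an assumption here, not a statement of how the paper measured them.

```python
from collections import Counter, defaultdict, deque

def build_network(pairs):
    """Undirected graph: each protein links to every protein it shares similarity with."""
    adj = defaultdict(set)
    for a, b in pairs:
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj

def degree_distribution(adj):
    """P(k): fraction of linked proteins with exactly k sequence-related partners.
    Proteins with no partners never enter adj, so they are not counted."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    n = len(adj)
    return {k: c / n for k, c in sorted(counts.items())}

def cluster_sizes(adj):
    """Sizes of connected clusters (components), largest first."""
    seen, sizes = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        sizes.append(size)
    return sorted(sizes, reverse=True)

def clustering_coefficient(adj):
    """Average local clustering: fraction of each node's neighbour pairs that are linked.
    Nodes with fewer than two neighbours contribute 0."""
    total = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        total += 2 * links / (k * (k - 1))
    return total / len(adj)

def average_path_length(adj):
    """Mean shortest-path length over all connected ordered node pairs (BFS from each node)."""
    dist_sum, n_pairs = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        dist_sum += sum(dist.values())
        n_pairs += len(dist) - 1
    return dist_sum / n_pairs if n_pairs else 0.0

# Hypothetical similarity links between invented protein IDs.
pairs = [("P1", "P2"), ("P1", "P3"), ("P2", "P3"), ("P3", "P4"), ("P5", "P6")]
adj = build_network(pairs)

print(degree_distribution(adj))     # P(k) for k = 1, 2, 3
print(cluster_sizes(adj))           # [4, 2]
print(clustering_coefficient(adj))  # 7/18
print(average_path_length(adj))     # 9/7
```

On a real protein set one would fit the tail of P(k) on log-log axes to test the power-law decay, and compare the clustering coefficient and path length against a random graph of the same size and mean degree; both steps are omitted here for brevity.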
Received 15 June 2004
DOI: https://doi.org/10.1103/PhysRevE.70.051908
©2004 American Physical Society