Aligning biological sequences by exploiting residue conservation and coevolution

Anna Paola Muntoni, Andrea Pagnani, Martin Weigt, and Francesco Zamponi
Phys. Rev. E 102, 062409 – Published 7 December 2020

Abstract

Sequences of nucleotides (for DNA and RNA) or amino acids (for proteins) are central objects in biology. Among the most important computational problems is that of sequence alignment, i.e., arranging sequences from different organisms in such a way as to identify similar regions, to detect evolutionary relationships between sequences, and to predict biomolecular structure and function. This is typically addressed through profile models, which capture position specificities like conservation in sequences but assume an independent evolution of different positions. Over recent years, it has been well established that coevolution of different amino-acid positions is essential for maintaining three-dimensional structure and function. Modeling approaches based on inverse statistical physics can capture the coevolution signal in sequence ensembles, and they are now widely used in predicting protein structure, protein-protein interactions, and mutational landscapes. Here, we present DCAlign, an efficient alignment algorithm based on an approximate message-passing strategy, which is able to overcome the limitations of profile models, to include coevolution among positions in a general way, and therefore to be universally applicable to protein- and RNA-sequence alignment without the need for complementary structural information. The potential of DCAlign is carefully explored using well-controlled simulated data, as well as real protein and RNA sequences.

  • Received 2 June 2020
  • Accepted 12 November 2020

DOI: https://doi.org/10.1103/PhysRevE.102.062409

©2020 American Physical Society

Physics Subject Headings (PhySH)

  1. Research Areas
  1. Physical Systems
  • Interdisciplinary Physics
  • Statistical Physics & Thermodynamics
  • Physics of Living Systems

Authors & Affiliations

Anna Paola Muntoni (1,2,3,*), Andrea Pagnani (1,4,5), Martin Weigt (3), and Francesco Zamponi (2)

  • (1) Department of Applied Science and Technology (DISAT), Politecnico di Torino, Corso Duca degli Abruzzi 24, I-10129 Torino, Italy
  • (2) Laboratoire de Physique de l'Ecole Normale Supérieure, ENS, Université PSL, CNRS, Sorbonne Université, Université de Paris, F-75005 Paris, France
  • (3) Sorbonne Université, CNRS, Institut de Biologie Paris Seine, Biologie Computationnelle et Quantitative LCQB, F-75005 Paris, France
  • (4) Italian Institute for Genomic Medicine, IRCCS Candiolo, SP-142, I-10060 Candiolo (TO), Italy
  • (5) INFN, Sezione di Torino, Via Giuria 1, I-10125 Torino, Italy

  • *Corresponding author: anna.muntoni@polito.it

Issue

Vol. 102, Iss. 6 — December 2020
