• Open Access

Disentanglement of Evolutionary Constraints in Statistical Models of Proteins

Haobo Wang, Shihao Feng, Kotaro Tsuboyama, Sirui Liu, Gabriel J. Rocklin, and Sergey Ovchinnikov
PRX Life 2, 023005 – Published 18 April 2024

Abstract

The exponential growth of protein sequences in the post-genomic era has revolutionized the application of generative sequence models for pivotal tasks such as contact prediction, protein design, alignment, and homology search. Despite remarkable progress in these areas, the interpretability of the modeled pairwise parameters remains limited due to complexities arising from coevolution, phylogeny, and entropy. While post-correction methods for contact prediction have been developed to eliminate entropy-related contributions from predicted contact maps, there is currently no direct approach to correct entropy in other applications reliant on raw parameters. In this paper, we investigate the sources of entropy signal and propose a novel spectral regularizer, LH (an abbreviation of Henri Lebesgue), to mitigate its impact during model fitting. By incorporating this regularizer into the GREMLIN framework (utilizing a Markov random field or Potts model), we enable the accurate inference of sparse contact maps while simultaneously improving interpretability and addressing overfitting concerns critical for sequence evaluation and design. To validate the efficacy of our approach, we design multiple protein sequences based on GREMLIN with both L2 and LH regularizers, and subsequently experimentally measure their folding stability using cDNA display proteolysis. Our findings demonstrate that proteins designed using the LH regularizer exhibit increased diversity and enhanced folding stability.
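The abstract does not name a specific post-correction method. One widely used example is the average product correction (APC), which subtracts a rank-one background from the matrix of coupling norms. The following is a minimal sketch under that assumption, not the authors' implementation: the function names, the coupling tensor shape `(L, A, L, A)`, and the random test data are all hypothetical choices made for illustration.

```python
import numpy as np

def coupling_norms(W):
    """Collapse a Potts coupling tensor into an L x L score matrix.

    W is assumed to have shape (L, A, L, A): positions i, j and
    alphabet states a, b. Each (A x A) block is reduced to its
    Frobenius norm; the diagonal (i == j) is set to zero.
    """
    F = np.sqrt((W ** 2).sum(axis=(1, 3)))
    np.fill_diagonal(F, 0.0)
    return F

def apc(F):
    """Average product correction applied after model fitting.

    Subtracts the rank-one term (row sum * column sum / total sum),
    which is equivalent to row mean * column mean / overall mean.
    """
    row = F.sum(axis=1, keepdims=True)
    col = F.sum(axis=0, keepdims=True)
    corrected = F - row * col / F.sum()
    np.fill_diagonal(corrected, 0.0)
    return corrected

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(6, 21, 6, 21))   # hypothetical couplings
    W = W + W.transpose(2, 3, 0, 1)       # enforce W[i,a,j,b] == W[j,b,i,a]
    contact_map = apc(coupling_norms(W))
    print(contact_map.round(2))
```

Because this correction acts only on the collapsed L x L map, the raw pairwise parameters are left unchanged, which is the limitation the abstract points to for applications that rely on those parameters directly.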

  • Received 21 November 2023
  • Accepted 11 March 2024

DOI:https://doi.org/10.1103/PRXLife.2.023005

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

Physics Subject Headings (PhySH)

  • Statistical Physics & Thermodynamics
  • Networks

Authors & Affiliations

Haobo Wang

  • FAS, Division of Science, Harvard University, Cambridge, Massachusetts 02138, USA

Shihao Feng

  • Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China

Kotaro Tsuboyama*

  • Department of Pharmacology, Northwestern University, Feinberg School of Medicine, Chicago, Illinois 60611, USA

Sirui Liu

  • FAS, Division of Science, Harvard University, Cambridge, Massachusetts 02138, USA

Gabriel J. Rocklin

  • Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, Illinois 60611, USA

Sergey Ovchinnikov§

  • JHDSF Program, Harvard University, Cambridge, Massachusetts 02138, USA

  • *Present address: Institute of Industrial Science, The University of Tokyo, Tokyo 153-8505, Japan; also at Center for Synthetic Biology, Northwestern University, Evanston, IL 60208, USA; and PRESTO, Japan Science and Technology Agency, Chiyoda-ku, Tokyo 102-0076, Japan.
  • Present address: Changping Laboratory, Beijing 102200, China.
  • Also at Center for Synthetic Biology, Northwestern University, Evanston, IL 60208, USA; Chemistry for Life Processes Institute, Northwestern University, Evanston, IL 60208, USA; and Robert H. Lurie Comprehensive Cancer Center, Northwestern University, Chicago, IL 60611, USA.
  • §Present address: Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA; corresponding author: so3@mit.edu

Issue

Vol. 2, Iss. 2 — April - June 2024


Reuse & Permissions

It is not necessary to obtain permission to reuse this article or its components as it is available under the terms of the Creative Commons Attribution 4.0 International license. This license permits unrestricted use, distribution, and reproduction in any medium, provided attribution to the author(s) and the published article's title, journal citation, and DOI are maintained. Please note that some figures may have been included with permission from other third parties. It is your responsibility to obtain the proper permission from the rights holder directly for these figures.
