Disentanglement of Evolutionary Constraints in Statistical Models of Proteins

Open Access

Haobo Wang, Shihao Feng, Kotaro Tsuboyama, Sirui Liu, Gabriel J. Rocklin, and Sergey Ovchinnikov

PRX Life 2, 023005 – Published 18 April 2024

Abstract

The exponential growth of protein sequences in the post-genomic era has revolutionized the application of generative sequence models for pivotal tasks such as contact prediction, protein design, alignment, and homology search. Despite remarkable progress in these areas, the interpretability of the modeled pairwise parameters remains limited due to complexities arising from coevolution, phylogeny, and entropy. While post-correction methods for contact prediction have been developed to eliminate entropy-related contributions from predicted contact maps, there is currently no direct approach to correct entropy in other applications reliant on raw parameters. In this paper, we investigate the sources of entropy signal and propose a novel spectral regularizer, LH (an abbreviation of Henri Lebesgue), to mitigate its impact during model fitting. By incorporating this regularizer into the GREMLIN framework (utilizing a Markov random field or Potts model), we enable the accurate inference of sparse contact maps while simultaneously improving interpretability and addressing overfitting concerns critical for sequence evaluation and design. To validate the efficacy of our approach, we design multiple protein sequences based on GREMLIN with both L2 and LH regularizers, and subsequently measure their folding stability experimentally using cDNA display proteolysis. Our findings demonstrate that proteins designed using the LH regularizer exhibit increased diversity and enhanced folding stability.

Received 21 November 2023
Accepted 11 March 2024

DOI:https://doi.org/10.1103/PRXLife.2.023005

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article's title, journal citation, and DOI.

Physics Subject Headings (PhySH)

Entropy, Molecular evolution, Protein structure prediction

Proteins

Ising model, Potts model, Random field Ising model

Statistical Physics & Thermodynamics, Networks

Authors & Affiliations

Haobo Wang

FAS, Division of Science, Harvard University, Cambridge, Massachusetts 02138, USA

Shihao Feng

Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240, China

Kotaro Tsuboyama^*

Department of Pharmacology, Northwestern University, Feinberg School of Medicine, Chicago, Illinois 60611, USA

Sirui Liu^†

FAS, Division of Science, Harvard University, Cambridge, Massachusetts 02138, USA

Gabriel J. Rocklin^‡

Department of Pharmacology, Northwestern University Feinberg School of Medicine, Chicago, Illinois 60611, USA

Sergey Ovchinnikov^§

JHDSF Program, Harvard University, Cambridge, Massachusetts 02138, USA

^*Present address: Institute of Industrial Science, The University of Tokyo, Tokyo 153-8505, Japan; also at Center for Synthetic Biology, Northwestern University, Evanston, IL 60208, USA; and PRESTO, Japan Science and Technology Agency, Chiyoda-ku, Tokyo 102-0076, Japan.
^†Present address: Changping Laboratory, Beijing 102200, China.
^‡Also at Center for Synthetic Biology, Northwestern University, Evanston, IL 60208, USA; Chemistry for Life Processes Institute, Northwestern University, Evanston, IL 60208, USA; and Robert H. Lurie Comprehensive Cancer Center, Northwestern University, Chicago, IL 60611, USA.
^§Present address: Department of Biology, Massachusetts Institute of Technology, Cambridge, MA, USA; corresponding author: so3@mit.edu

Issue

Vol. 2, Iss. 2 — April - June 2024
