Correlation-compressed direct-coupling analysis

Chen-Yi Gao, Hai-Jun Zhou, and Erik Aurell
Phys. Rev. E 98, 032407 – Published 11 September 2018

Abstract

Learning Ising or Potts models from data has become an important topic in statistical physics and computational biology, with applications to predictions of structural contacts in proteins and other areas of biological data analysis. The corresponding inference problems are challenging since the normalization constant (partition function) of the Ising or Potts distribution cannot be computed efficiently on large instances. Different ways to address this issue have resulted in a substantial amount of methodological literature. In this paper we investigate how these methods could be used on much larger data sets than studied previously. We focus on a central aspect, that in practice these inference problems are almost always severely undersampled, and the operational result is almost always a small set of leading predictions. We therefore explore an approach where the data are prefiltered based on empirical correlations, which can be computed directly even for very large problems. Inference is only used on the much smaller instance in a subsequent step of the analysis. We show that in several relevant model classes such a combined approach gives results of almost the same quality as inference on the whole data set. It can therefore provide a potentially very large computational speedup at the price of only marginal decrease in prediction quality. We also show that the results on whole-genome epistatic couplings that were obtained in a recent computation-intensive study can be retrieved by our approach. The method of this paper hence opens up the possibility to learn parameters describing pairwise dependences among whole genomes in a computationally feasible and expedient manner.
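The two-step idea described in the abstract (prefilter on empirical correlations, then run inference only on the reduced instance) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the authors' procedure: the function names are hypothetical, variables are ranked by their largest absolute pairwise correlation, and naive mean-field inversion of a ridge-regularised covariance matrix stands in for whichever inference method the paper actually uses. Data are assumed to be an array of Ising spins (samples in rows, variables in columns).

```python
import numpy as np

def prefilter_by_correlation(samples, n_keep):
    """Indices (sorted) of the n_keep variables with the largest absolute
    off-diagonal empirical correlation. The ranking rule is an assumption."""
    corr = np.corrcoef(samples, rowvar=False)
    corr = np.nan_to_num(corr)      # constant columns give NaN; treat as uncorrelated
    np.fill_diagonal(corr, 0.0)     # ignore self-correlation
    strength = np.abs(corr).max(axis=0)
    keep = np.argsort(strength)[::-1][:n_keep]
    return np.sort(keep)

def mean_field_couplings(samples, ridge=0.1):
    """Naive mean-field estimate J = -C^{-1} off the diagonal.
    Stand-in inference step; the ridge value is an arbitrary choice."""
    cov = np.cov(samples, rowvar=False)
    cov = cov + ridge * np.eye(cov.shape[0])
    couplings = -np.linalg.inv(cov)
    np.fill_diagonal(couplings, 0.0)
    return couplings

def compressed_dca(samples, n_keep, n_top):
    """Prefilter, infer on the reduced instance, and return the n_top
    strongest couplings as (i, j, J_ij) in the original variable indexing."""
    keep = prefilter_by_correlation(samples, n_keep)
    couplings = mean_field_couplings(samples[:, keep])
    rows, cols = np.triu_indices(len(keep), k=1)
    values = couplings[rows, cols]
    order = np.argsort(np.abs(values))[::-1][:n_top]
    return [(int(keep[rows[t]]), int(keep[cols[t]]), float(values[t]))
            for t in order]
```

In this sketch the correlation matrix is computed once over all variables, while the matrix inversion only ever sees `n_keep` of them, which is where a saving of the kind the abstract describes would come from.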

  • Received 4 November 2017
  • Revised 8 June 2018

DOI: https://doi.org/10.1103/PhysRevE.98.032407

©2018 American Physical Society

Physics Subject Headings (PhySH)

  1. Research Areas
Physics of Living Systems

Authors & Affiliations

Chen-Yi Gao1,2, Hai-Jun Zhou1,2,3,*, and Erik Aurell4,5,†

  • 1Key Laboratory of Theoretical Physics, Institute of Theoretical Physics, Chinese Academy of Sciences, Beijing 100190, China
  • 2School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China
  • 3Synergetic Innovation Center for Quantum Effects and Applications, Hunan Normal University, Changsha, Hunan 410081, China
  • 4Department of Computational Biology, KTH Royal Institute of Technology, 10044 Stockholm, Sweden
  • 5Department of Applied Physics and Department of Computer Science, Aalto University, 00076 Aalto, Finland

  • *zhouhj@itp.ac.cn
  • †eaurell@kth.se

Issue

Vol. 98, Iss. 3 — September 2018
