• Rapid Communication
  • Open Access

Learning to classify from impure samples with high-dimensional data

Patrick T. Komiske, Eric M. Metodiev, Benjamin Nachman, and Matthew D. Schwartz
Phys. Rev. D 98, 011502(R) – Published 16 July 2018

Abstract

A persistent challenge in practical classification tasks is that labeled training sets are not always available. In particle physics, this challenge is surmounted by the use of simulations. These simulations accurately reproduce most features of data, but cannot be trusted to capture all of the complex correlations exploitable by modern machine learning methods. Recent work in weakly supervised learning has shown that simple, low-dimensional classifiers can be trained using only the impure mixtures present in data. Here, we demonstrate that complex, high-dimensional classifiers can also be trained on impure mixtures using weak supervision techniques, with performance comparable to what could be achieved with pure samples. Using weak supervision will therefore allow us to avoid relying exclusively on simulations for high-dimensional classification. This work opens the door to a new regime whereby complex models are trained directly on data, providing direct access to probe the underlying physics.
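The idea of training on impure mixtures can be made concrete with a small toy sketch. The abstract does not say which weak supervision technique is used, so the example below assumes one common strategy: train a classifier to tell apart two mixed samples with different signal fractions, then check how well it separates signal from background. The Gaussian toy model, the signal fractions (0.8 and 0.2), the plain logistic regression, and all function names are hypothetical choices made for illustration; they are not taken from the paper, and the low-dimensional model here stands in for the high-dimensional classifiers the abstract describes.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, signal_fraction):
    """Draw n toy events; return features and the hidden truth labels."""
    is_signal = rng.random(n) < signal_fraction
    # Assumed toy model: signal and background are 2D Gaussians with shifted means.
    means = np.where(is_signal[:, None], 1.0, -1.0)
    x = rng.normal(loc=means, scale=1.5, size=(n, 2))
    return x, is_signal.astype(int)

def fit_logistic(x, y, steps=500, lr=0.1):
    """Logistic regression fitted by batch gradient descent."""
    xb = np.hstack([x, np.ones((len(x), 1))])  # append a bias column
    w = np.zeros(xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(xb @ w)))
        w -= lr * xb.T @ (p - y) / len(y)
    return w

def score(w, x):
    """Linear classifier output (larger means more signal-like)."""
    return np.hstack([x, np.ones((len(x), 1))]) @ w

def auc(scores, labels):
    """Rank-based area under the ROC curve (Mann-Whitney statistic)."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Two impure training mixtures with different (assumed) signal fractions.
x1, t1 = sample(5000, 0.8)
x2, t2 = sample(5000, 0.2)
x_train = np.vstack([x1, x2])

# Weak supervision: the only label is which mixture an event came from.
mixture_label = np.concatenate([np.ones(len(x1)), np.zeros(len(x2))])
w_weak = fit_logistic(x_train, mixture_label)

# Reference: full supervision using the truth labels, which real data lacks.
truth_label = np.concatenate([t1, t2])
w_full = fit_logistic(x_train, truth_label)

# Evaluate both classifiers on signal versus background in a fresh sample.
x_test, t_test = sample(5000, 0.5)
auc_weak = auc(score(w_weak, x_test), t_test)
auc_full = auc(score(w_full, x_test), t_test)
print(f"AUC, trained on mixture labels: {auc_weak:.3f}")
print(f"AUC, trained on truth labels:   {auc_full:.3f}")
```

In this toy setting the truth labels are used only for the reference classifier and for evaluation; the weakly supervised classifier never sees them during training.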

  • Received 7 February 2018
  • Revised 14 April 2018

DOI:https://doi.org/10.1103/PhysRevD.98.011502

Published by the American Physical Society under the terms of the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to the author(s) and the published article’s title, journal citation, and DOI. Funded by SCOAP3.

Physics Subject Headings (PhySH)

  1. Research Areas
  1. Physical Systems
Particles & Fields

Authors & Affiliations

Patrick T. Komiske1,*, Eric M. Metodiev1,†, Benjamin Nachman2,‡, and Matthew D. Schwartz3,§

  • 1Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA
  • 2Physics Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720, USA
  • 3Department of Physics, Harvard University, Cambridge, Massachusetts 02138, USA

  • *pkomiske@mit.edu
  • †metodiev@mit.edu
  • ‡bpnachman@lbl.gov
  • §schwartz@physics.harvard.edu

Issue

Vol. 98, Iss. 1 — 1 July 2018
