Cosmic Bell Test: Measurement Settings from Milky Way Stars

Bell's theorem states that some predictions of quantum mechanics cannot be reproduced by a local-realist theory. That conflict is expressed by Bell's inequality, which is usually derived under the assumption that there are no statistical correlations between the choices of measurement settings and anything else that can causally affect the measurement outcomes. In previous experiments, this"freedom of choice"was addressed by ensuring that selection of measurement settings via conventional"quantum random number generators"was space-like separated from the entangled particle creation. This, however, left open the possibility that an unknown cause affected both the setting choices and measurement outcomes as recently as mere microseconds before each experimental trial. Here we report on a new experimental test of Bell's inequality that, for the first time, uses distant astronomical sources as"cosmic setting generators."In our tests with polarization-entangled photons, measurement settings were chosen using real-time observations of Milky Way stars while simultaneously ensuring locality. Assuming fair sampling for all detected photons, and that each stellar photon's color was set at emission, we observe statistically significant $\gtrsim 7.31 \sigma$ and $\gtrsim 11.93 \sigma$ violations of Bell's inequality with estimated $p$-values of $ \lesssim 1.8 \times 10^{-13}$ and $\lesssim 4.0 \times 10^{-33}$, respectively, thereby pushing back by $\sim$600 years the most recent time by which any local-realist influences could have engineered the observed Bell violation.

In this Letter, we report on a new experimental test of Bell's inequality that, for the first time, uses distant astronomical sources to choose measurement settings. This is the first in a series of "cosmic Bell tests" that will use progressively more distant sources, ultimately pushing the measurement settings' origin to greater and greater cosmological distances [1].
Background.-Scientists have struggled with the alleged incompatibility of quantum entanglement and our everyday intuitions about the physical world since the seminal paper by Einstein, Podolsky and Rosen (EPR) in 1935 [2]. EPR concluded that the description of reality given by the quantum-mechanical wave-function is incomplete because it is incompatible with the concepts of locality (no physical influences can travel faster than the speed of light in vacuum) and realism (objects possess complete sets of properties on their own, prior to measurement). A well-known "tool" to experimentally distinguish between the quantum predictions and localrealist alternatives of the sort envisaged by EPR is provided by the famous inequality derived by John Bell in 1964 [3]. Assuming locality and realism, Bell's inequality limits the degree to which measurement outcomes on pairs of distant systems may be correlated, if the mea-surements of one system are carried out with only limited information about the other. By contrast, measurements on entangled particle pairs in the quantum singlet state, for example, are predicted to violate Bell's inequality. Beginning with [4], essentially all significant experimental Bell tests to date have supported the quantummechanical predictions.
However, the conclusions of any experiment are valid only given certain assumptions, the violation of which leaves open "loopholes," whereby a local-realist description of nature could still be compatible with the experimental results. (For extensive reviews of Bell-test loopholes, see [5][6][7].) For example, the locality loophole concerns whether any information about one side's measurement setting or measurement outcome could have been communicated (at or below the speed of light) to the other side prior to its measurement. This loophole has been closed by space-like separating measurementsetting choices on each side from the other side's measurement outcomes [8,9]. The fair-sampling loophole [10] concerns whether the set of entangled particles detected on both sides was representative of all emitted pairs rather than a biased sub-ensemble, and has recently been closed by ensuring a sufficient total fraction arXiv:1611.06985v2 [quant-ph] 26 Jan 2017  Table I for the azimuth (az) and altitude (alt) of each star observed during each experimental run. (3D graphics taken from Google Earth, 2016.) of detected pairs [11][12][13][14][15][16]. Even more recently, several cutting-edge experiments have demonstrated violations of Bell's inequality while closing both the locality and fair-sampling loopholes simultaneously [17][18][19][20][21].
A third major loophole, known variously as the freedom-of-choice, measurement-independence, or setting-independence loophole [22][23][24][25], concerns the choice of measurement settings. In particular, the derivation of Bell's inequality explicitly assumes that there is no statistical correlation between the choices of measurement settings and anything else that causally affects both measurement outcomes.
Bell himself observed forty years ago that, "It has been assumed that the settings of instruments are in some sense free variables-say at the whim of experimenters-or in any case not determined in the overlap of the backward light cones" [22]. Recent theoretical work has demonstrated that models that relax this assumption, allowing for a modest correlation between the joint measurement settings and any causal influence on the measurement outcomes, can reproduce the quantum correlations [26][27][28][29][30][31][32][33][34][35][36]. (See also [37] on some subtleties of addressing the freedom-of-choice loophole.) Even if nature does not exploit this loophole, testing it experimentally (e.g. [38,39]) has significant practical relevance for device-independent quantum key distribution [40][41][42] as well as random-number generation and randomness expansion [32,43,44]. In particular, a sophisticated adversary could undermine a variety of quantum information schemes by utilizing the freedomof-choice loophole [26,30,36].
To the extent that recent experiments have addressed freedom of choice, they have adopted the additional, strong assumption that the relevant causal influences (or "hidden variables") originate together with the entangled particles and hence cannot influence setting choices in space-like separated regions [18,19,38], or assumed that the setting-generation process is completely independent of its past [17,20,21]. Yet nowhere in the derivation of Bell's inequality does the formalism make any stipulation about where or when such hidden variables could be created or become relevant. In fact, as Bell himself emphasized [22,24] (see also [1,32]), they could be associated with any events within the experiment's past light-cone [45]. Thus, in principle, the possibility exists that the purportedly random setting choices in previous experiments could have been influenced by some unknown cause in their past, exploiting the freedom-ofchoice loophole through events as recent as a few microseconds before the measurements [48].
Here we report on an experimental Bell test with polarization entangled photons that, assuming fair sampling, significantly constrains the space-time region from which any such unknown, causal influences could have affected both the measurement settings and outcomes. While simultaneously closing the locality loophole, measurement settings are determined by "cosmic setting generators," using the color of photons detected during realtime astronomical observations of distant stars. We observed Bell violations with high statistical significance and thus conclude that any hidden causal influences that could have exploited the freedom-of-choice loophole would have to have originated from remote spacetime events at least several hundred years ago, at locations seemingly unrelated to the entangled-pair creation. Compared to previous experiments, this pushes back by ∼16 orders of magnitude the most recent time by which any local-realist influences could have engineered the observed correlations.
The idea to address freedom-of-choice by using distant astronomical sources to choose Bell-test measurement settings was already discussed as far back as the 1976 Erice meeting organized by John Bell and Bernard d'Espagnat [49]. While others have also briefly noted this basic premise [38,50,51], this work is the first to implement it experimentally, building on a detailed feasibility study [1].
Experimental Implementation.- Fig. 1 shows our three experimental sites across Vienna. A central entangled photon source S was located in a laboratory on the 4th floor of the Institute for Quantum Optics and Quantum Information (IQOQI) and the two observers, Alice (A) and Bob (B), were situated on the 9th floor of the Austrian National Bank (OENB) and on the 5th floor of the University of Natural Resources and Life Sciences (BOKU), respectively.
The entangled photon source is based on a Sagnac interferometer [52,53] generating polarization-entangled photon pairs in the maximally entangled singlet state . Using single-mode fibers, each entangled photon was guided to an entangled photon transmitting telescope (Tx-EP) located at the rooftop of IQOQI, which sent the photons via free-space quantum channels to Alice and Bob, respectively. Measurement stations for Alice and Bob each featured an entangled photon receiving telescope (Rx-EP), a polariza-tion analyzer (POL), stellar photon receiving telescope (Rx-SP), a setting reader (SR) and a control and data acquisition unit (CaDA). The entangled photons were collected with the Rx-EP and guided to the polarization analyzer where an electro-optical modulator (EOM) allowed for fast switching between complementary measurement bases. This was followed by a polarizing beam splitter with a single-photon avalanche diode (SPAD) detector in each output port.
The Rx-SPs collected stellar photons, which were guided by multimode fibers to setting readers, where dichroic mirrors with ∼700 nm cutoffs split them into "blue" and "red" arms, each fed to a SPAD. An FPGA board processed the SPAD signals to electronically implement the corresponding EOM measurement setting. Every detector click in a red/blue arm induced a measurement in the following linear polarization bases for Alice: 45 • /135 • (blue) and 0 • /90 • (red), and Bob: 22.5 • /112.5 • (blue) and -22.5 • /67.5 • (red), respectively. Finally, using a GPS-disciplined clock, all SPAD detections from the polarization analyzer and the setting reader were timestamped by the FPGA board and recorded by a computer.
We specifically use photon color to implement measurement settings under the assumption that the wavelength of the photon emitted from the star was determined at the time of emission and unaltered since. Astrophysical motivations for this assumption include the absence of any known mechanism that preferentially reradiates photons at a different wavelength along our line of sight; any such process would violate the conservation of energy and momentum. In addition, the effects of wavelength-dependent attenuation by the interstellar medium are negligible for Milky Way stars within a few thousand light years (ly) [54][55][56]. By contrast, significant attenuation from the Earth's atmosphere (38-45% loss) as well as from the experimental setup (59-61% loss) is unavoidable (see the Supplemental Material [57]). Thus, our approach requires the assumption that this represents a fair sample of the celestial emissions.
Two other major effects must be considered to account for SPAD detections which do not represent stellar photons with correctly identified colors. First, local sources of noise, including sky glow, light pollution, and dark counts determine between 1% and 5% of the settings, as measured by looking at a dark patch of sky next to each star. Only a tenth of these are detector dark counts. In addition, due to imperfect dichroic mirrors, a certain fraction of detected stellar photons will be assigned the wrong setting. Hence, to precisely quantify the full optical path of our settings implementation method, we measured transmission and reflection spectra for the dichroic mirrors, and multiplied this by the spectra of the stars, atmospheric transmission [61], reflection and transmission of optical elements, and the FIG. 2. 1+1 D space-time diagram for run 1, with the origin at the entangled pair creation (black dot) and a spatial projection axis chosen to minimize its distance to Alice and Bob. After a fiber delay (thick black line), entangled photons are sent via free-space channels (thin black lines) to be measured by Alice and Bob at events A and B. Blue and red stars indicate example valid settings from measuring stellar photons emitted far away at space-time events A and B (see Fig. 3). Ensuring locality limits valid settings to the shaded regions. Delays to implement each setting and an added safety buffer shorten the validity time windows actually used to the darker shaded regions.
FIG. 3. 2+1 D space-time diagram for run 1 with past light cones for stellar emission events A and B . Two spatial dimensions are shown (x-y plane) with the third suppressed. The stellar pair's angular separation on the sky is the angle between the red and blue vectors. Our data rule out local-realist models with hidden variables in the gray space-time region. We do not rule out models with hidden variables in the past light cones for events A (blue), B (red), or their overlap (purple). SPAD detector response. Conservative estimates of the fraction of incorrectly determined measurement settings range from f w ≈1.4-2.0%, depending on the stellar spectrum and red/blue output port (see the Supplemental Material [57]).
Space-time arrangement.-Ensuring locality requires that any information leaving Alice's star along with her setting-determining stellar photon and traveling at the speed of light could not have reached Bob before his measurement of the entangled photon is completed, and vice versa. This also achieves a necessary condition for freedom of choice.
The projected 1+1 D space-time diagram in Fig. 2 shows run 1 of our experiment. Entangled photons are generated at point S , which coincides with the origin. After local fiber transmission to the transmitting telescopes, which takes a time of τ fiber ≈180 ns, they are sent via free-space channels to Alice at a projected distance of x A = 557 m and to Bob at x B = 1149 m and measured at events A and B, respectively. Settings are determined far away at the stellar emission events A and B , respectively. To close the locality loophole, entangled photon detections cannot be accepted outside a certain (maximal) time interval τ k valid (k = {A, B}) after the detection of a stellar photon, which, in general, must be chosen such that the corresponding setting is still space-like separated from the measurement on the other side.
The time-dependent locations of the stars on the sky relative to our ground-based experimental sites make τ k valid (t) time-dependent parameters. However, since our Rx-SPs pointed out of windows, resulting in highly restrictive azimuth/altitude limits for the star selection, τ k valid (t) did not change significantly during the three minutes of measurement for each experimental run. Consequently, using spatial site coordinates from Fig. 1 and the star's celestial coordinates (see the Supplemental Material [57]),τ k valid = min t {τ k valid (t)} was calculated as 2.55 µs for Alice and 6.93 µs for Bob for run 1 (see Table I).
The final time-window τ k used was chosen to be as large as possible (see Fig. 2), while also subtracting the time it takes to implement a setting (τ set ≈ 170 ns) and to ensure optimal operation of the EOM (e.g. by minimizing piezoelectric ringing). The latter required subtracting safety buffers of τ A buffer = 0.38 µs for Alice and τ B buffer = 1.76 µs for Bob, which easily accounted for the delay of stellar photons due to the index of refraction of the atmosphere (τ atm ≈ 18 ns) [62], and any small inaccuracies in timing or the distances between the experimental stations (see the Supplemental Material [57]).
In Fig. 2, stellar photons arriving parallel to the arrows in the blue and red shaded space-time regions (corresponding to the time intervalsτ k valid ) provide valid basis settings, ensuring space-like separation for all relevant events. The darker shading corresponds to regions actually used in run 1, with τ A used = 2 µs and τ B used = 5 µs. In run 1, the fractions of time with valid settings for Alice and Bob were 24.9% and 40.6%, respectively, while in run 2 they were 22.0% and 44.6%, respectively. Duty cycles for each observer differ primarily due to different values of τ k used and different count rates for each star (see the Supplemental Material [57]).
We pre-selected candidate stars within the highly restrictive azimuth and altitude limits of the stellar photon receiving telescopes from the Hipparcos catalogue [63,64] with parallax distances greater than 500 ly, dis-tance errors less than 50% and Hipparcos H p magnitude between 5 and 9. Combined with the geometric configuration of the sites, selection of these stars ensured sufficient setting validity times on both sides during each experimental run of 179 seconds. To ensure a sufficiently high signal-to-noise ratio, we chose ∼5-6 magnitude stars (see the Supplemental Material [57]). Note that to avoid detector saturation, parts of the entrance aperture of the Rx-SPs had to be covered. Fig. 3 shows a 2+1 D space-time diagram for the stellar emission events from run 1. Events associated with relevant hidden variables could lie within the past light cones of stellar emission events A or B , the most recent of which originated 604 ± 35 years ago, accounting for parallax distance errors arising primarily from the angular resolution limits of the Hipparcos mission [63,64].  Fig. 2). The last 3 columns show the measured CHSH parameter for runs 1 and 2, as well as the p-value and the number of standard deviations ν by which our local-realist model can be rejected (see the Supplemental Material [57]).
Analysis and results.-We performed two cosmic Bell tests, each lasting 179 seconds. In runs 1 and 2, Alice and Bob's settings were chosen with photons from Hipparcos stars in Table I. To analyze the data, we make the assumptions of fair sampling and fair coincidences [65]. Thus, all data can be post-selected to include coincidence events between Alice's and Bob's measurement stations. We correct for GPS clock drift as in [66] and identify coincidences within a 2.5 ns time window.
We then analyze correlations between measurement outcomes A, B ∈ {+1, −1} for particular setting choices (a i , b j ), i, j ∈ {1, 2} using the Clauser-Horne-Shimony-Holt (CHSH) inequality [67]: where E i j = 2 p(A = B|a i b j ) − 1 and p(A = B|a i b j ) is the probability that Alice and Bob measure the same outcome given joint settings (a i , b j ). While the localrealist bound is 2, the quantum bound is 2 √ 2, and the logical (algebraic) bound is 4. Our run 1 data yields S exp = 2.425, while for run 2, we observe S exp = 2.502. Both runs therefore violate the corresponding local-realist bound. See Fig. 4 and Table I. Our analysis must further consider that some experimental trials will have "corrupt" settings triggered not by genuine stellar photons, but by atmospheric airglow, thermal dark counts, errant dichroic mirror reflections, or other noise in our detectors. Since these events originate very recently in the experiments' past light cone, settings chosen with them are no better (and no worse) than settings chosen with conventional random number generators.
To constrain the fraction of experimental runs we can tolerate in which either or both sides were triggered by a corrupt event, we conservatively assume that any such events can produce maximal CHSH correlations (S = 4) [68], whereas settings triggered by correctly identified stellar photons are assumed to obey local realism (S ≤ 2). An optimally efficient local hidden-variable model would only need to use individual corrupt photons on a single side to achieve S = 4, without needing to "waste" simultaneous corruptions or events in which changing the path of a stellar photon through the setting reader is unnecessary. We therefore use this maximally conservative model as our null hypothesis.
To calculate the statistical significance of our results, we account for background events and errant dichroic mirror reflections as well as differences in the measured total and noise rates for the red/blue dichroic ports on each side, which yield unequal (biased) frequencies for various combinations of detector settings a i b j . Moreover, whereas we assume fair sampling (for both entangled and stellar photon detections) and fair coincidences for entangled photons [65], we adopt the conservative assumption that the local hidden-variable model could retain "memory" of settings and outcomes of previous trials [69][70][71]. As detailed in the Supplemental Material [57], we find that the measured fractions of corrupt coincidences are sufficiently low that the probability that a local hidden-variable model could explain the observed violations of Bell's inequality is p ≤ 1.78×10 −13 for experimental run 1, and p ≤ 3.96 × 10 −33 for run 2. These correspond to experimental violations of the CHSH bound by at least 7.31 and 11.93 standard deviations, respectively.
Conclusions.-For both runs, we assume fair sampling, close the locality loophole and, for the first time, explicitly constrain freedom of choice with astronomically chosen settings, relegating any local-realist models to have acted no more recently than 604±35 and 577±40 years ago, for runs 1 and 2, respectively. Therefore, any hidden-variable mechanism exploiting the freedom-ofchoice loophole would need to have been enacted prior to Gutenberg's invention of the printing press, which it- For run 1, bars shows the correlation E i j for each joint setting combination (a i , b j ). CHSH terms are displayed with negative signs (red) and a positive sign (green), showing that our data violate the local-realist bound (dashed line, S = 2). The |E i j | values are unequal due to limited state visibility and imperfect alignment of the polarization measurement bases.
self pre-dates the publication of Newton's Principia by two and a half centuries. While a Bell test like ours only constrains a local-realist mechanism to have acted no later than the most recent of the astronomical emission events, we note that any process that requires both emission events to have been influenced by the same common cause would be relegated to even earlier times, to when the past light cones from each emission event intersected 2409 ± 598 and 4040 ± 1363 years ago, for runs 1 and 2, respectively [72]. This work thus represents the first experiment to dramatically limit the space-time region in which hidden variables could be relevant, paving the way for future ground-and space-based tests with distant galaxies, quasars, patches of the cosmic microwave background (CMB), or other more exotic sources such as neutrinos and gravitational waves. Such tests could progressively push any viable hidden-variable models further back into deep cosmic history [1], billions of years in the case of quasars, back to the early universe in the case of the CMB, or even, in the case of primordial gravitational waves, further back to any period of inflation preceding the conventional big bang model [73][74][75][76][77].

Supplemental Material Cosmic Bell Test: Measurement Settings from Milky Way Stars Causal Alignment
Compared to a standard Bell test, the time-dependent locations of the stars on the sky relative to our groundbased experimental sites complicate the enforcement of the space-like separation conditions needed to address both the locality and freedom-of-choice loopholes. For example, the photon from star A must be received by Alice's stellar photon receiving telescope (Rx-SP) before that photon's causal wave front reaches either the Rx-SP or the entangled photon receiving telescope (Rx-EP) on Bob's side, and vice versa.
To compute the time-dependent durations τ k valid (t) (for k = {A, B}) that settings chosen by astronomical photons remain valid, we adopt a coordinate system with the center of the Earth as the origin. The validity times on each side due to the geometric configuration of the stars and ground-based sites are then given by where r A is the spatial 3-vector for the location of Alice's Rx-SP, m A is the spatial 3-vector for Alice's Rx-EP (and likewise for r B and m B on Bob's side), s is the location of the entangled photon source, and c is the speed of light in vacuum. The time-dependent unit vectorsn S A (t),n S B (t) point toward the relevant stars, and are computed using astronomical ephemeris calculations. Additionally, n is the index of refraction of air and η A , η B parametrize the group velocity delay through fiber optics / electrical cables connecting the telescope and entangled photon detectors. To compute τ k valid (t), we make the reasonable approximation that the Rx-SP and Rx-EP are at the same spatial location on each side, such that r A = m A and r B = m B , and the computations require the GPS coordinates of only 3 input sites (see Table II). This assumes negligible delays from fiber and electrical cables via the η A , η B terms. Negative validity times τ k valid (t) for either side would indicate an instantaneous configuration that was out of "causal alignment," in which at least one setting would be invalid for the purposes of closing the locality loophole. For runs 1 and 2, τ k valid (t) > 0 for the entire duration of 179 s, with minimum times in Table I.
We subtract the time it takes to implement a setting with the electro-optical modulator, τ set ≈ 170 ns, and subtract additional conservative buffer margins τ k buffer (0.38 µs for Alice and 1.76 µs for Bob) to determine the minimum time windows τ k used in Eq. (3) utilized during the experiment (see Table III): where τ set includes the total delays on either side due to reflections inside the telescope optics, the SPAD detector response, and electronic readout on the astronomical receiver telescope side as well as the time to switch the Pockels cell and electronically use the FPGA board to output a random number. The next section conservatively estimates τ atm ≈ 18 ns for the delay due to the index of refraction of the atmosphere for either observer. While τ atm is not explicitly considered in Eq. (3), it is well within the buffer margins, since τ k buffer τ atm , which also encompass any small inaccuracies in the timing or distances between the experimental sites.
Although τ k valid (t) depends on time, motivating our use ofτ k valid ≡ min t {τ k valid (t)} when computing τ k used , the actual values of τ k valid changed very little during our observing windows. For the stars used in experimental run 1, ∆τ A valid = 2.96 ns and ∆τ B valid = 17.26 ns; for experimental run 2, ∆τ A valid = 18.97 ns and ∆τ B valid = 17.27 ns. The largest of these differences represents less than 1% of the relevantτ k valid . By ensuring that the Pockels cell switched if it had not been triggered by a fresh setting within the last τ A used = 2 µs for Alice and τ B used = 5 µs for Bob, we only record and analyze coincidence detections for entangled photons obtained while the settings on both sides remain valid.

Atmospheric Delay
The air in the atmosphere causes a relative delay between the causal light cone, which expands outward at speed c, and the photon, which travels at c/n, where n is the index of refraction of air. We estimate this effect by computing the light's travel time through the atmosphere on the way to the observer. If the atmosphere has index of refraction n and scale height z 0 , the delay time is where X is the airmass. The minimum elevation of each stellar photon receiving telescope is h = 200 m above sea level, and the minimum altitude angle is δ = 24 • above the horizon with airmass X ≈ 2.5 (see Tables II-III). Neglecting the earth's curvature (which is a conservative approximation), we use z 0 = 8.0 km and the index of refraction at sea level of n − 1 ≈ 2.7 × 10 −4 [62]. The  delay between the arrivals of the causal light cone and the photon itself may be conservatively estimated to be ∆t = 17.6 ns due to the atmosphere.

Source Selection
We used custom Python software to select candidate stars from the Hipparcos catalogue [63,64] with parallax distances greater than 500 ly, distance errors less than 50%, and Hipparcos H p magnitude between 5 and 9 to avoid detector saturation and ensure sufficient detection rates. Telescopes pointed out of open windows at both sites (see Table II). A list of ∼100-200 candidate stars were pre-selected per side for each night due to the highly restrictive azimuth/altitude limits. Candidate stars were visible through the open windows for ∼ 20-50 minutes on each side.
Due to weather, seeing conditions, and the uncertain-ties in aligning the transmitting and receiving telescope optics for the entangled photon source, it was not possible to pre-select specific star pairs for each experimental run at a predetermined time. Instead, when conditions were stable, we selected the best star pairs from our pre-computed candidate lists that were currently visible through both open windows, ranking stars based on brightness, distance, the amount of time each would remain visible, the settings validity time, and the airmass at the time of observation. The 4 bright stars we actually observed for runs 1 and 2 were ∼5-6 mag (see Table III). Combined with the geometric configuration of the sites (see Table II), selection of these stars ensured sufficient setting validity times on both sides during each experimental run of 179 seconds.
FIG. 6. 2+1 D space-time diagrams with past light cones for stellar emission events A and B for experimental run 1 (left) and run 2 (right). The time axis begins 5000 years before event O on Earth today, with 2 spatial dimensions in the x-y plane and the third suppressed. The stellar pair's angular separation on the sky is the angle between the red and blue vectors. Our data rule out local-realist models with hidden variables in the gray space-time regions. We do not rule out models with hidden variables in the past light cones for events A (blue), B (red), or their overlap (purple). Given the earlier emission time B for run 2, that run excludes models with hidden variables in a larger space-time region than run 1 (modulo the parallax distance errors in Table III Table II. Azimuthal directions of stars observed during runs 1 and 2 are shown (see Table III). The red line denotes the projected spatial axis for the 1 + 1 D space-time diagrams in Figure 5. (Background graphic taken from Google Maps, 2016.)

Lookback Times
For stars within our own galaxy, the lookback time t k to a stellar emission event from a star d k light years away is t k = d k years. For example, Hipparcos Star HIP 2876 is located d B = 3624 light years (ly) from Earth, and its photons were therefore emitted t B = 3624 years prior to us observing them (see Table III). The lookback time t E k to when the past light cone of a stellar emission event from star k intersects Earth's worldline is t E k = 2d k years.
The lookback time to the past light cone intersection event t AB (in years) for a pair of Hipparcos stars is [72] where d A , d B are the distances to the stars (in ly) and α is the angular separation (in radians) of the stars, as seen from Earth. See the lower left panel of Fig. 5. Ignoring any covariance between d A , d B , and α, and assuming the error on α (σ α ) is negligible compared to the distance errors (σ d i ), the 1σ lookback time error is approximately given by

Experimental Details
The entangled photon source was based on type-II spontaneous parametric down conversion (SPDC) in a periodically poled KTiOPO 4 (ppKTP) crystal with 25 mm length. Using a laser at 405 nm, the ppKTP crystal was bi-directionally pumped inside a polarization Sagnac interferometer generating degenerate polarization entangled photon pairs at 810 nm. We checked the performance of the SPDC source via local measurements at the beginning of each observation night. Singles and coincidence rates of approximately 1.  (2) is the minimum time that detector settings are valid for side k = {A, B} during each experimental run, before subtracting delays and safety margins (see Eqs. (2)-(3) and main text Fig. 2). Star pairs for runs 1 and 2 have angular separations α of 119 • and 112 • on the sky, with past light cone intersection events occurring 2409 ± 598 and 4040 ± 1363 years ago, respectively. The last 4 columns show the number of double coincidence trials, the measured CHSH parameter S exp , as well as the p-value and number of standard deviations ν by which the null hypothesis may be rejected, based on the Method 3 analysis, below. 275 kHz, respectively, correspond to a local coupling efficiency (i.e., coincidence rate divided by singles rate) of roughly 25%. In run 1 (run 2), the duty cycle of Alice's and Bob's measurements -i.e., the temporal sum of used valid setting intervals divided by the total measurement time per run -were 24.9% (22.0%) and 40.6% (44.6%), respectively, resulting in a duty-cycle for valid coincidence detections between Alice and Bob of 10.1% (9.8%). From the measured 136 332 (88 779) total valid coincidence detections per run, we can thus infer the total two-photon attenuation through the quantum channels to Alice and Bob of 15.3 dB (16.8 dB).

Quality of Setting Reader
The value of the observed CHSH violation is highly sensitive to the fraction of generated settings which were in principle "predictable" by a local hidden-variable model. For this reason, it is important to have a highfidelity spectral model of the setting generation process. In our analysis, we conservatively assume that local noise and incorrectly generated settings are completely predictable and exploitable. An incorrectly generated setting is a red photon that generates a blue setting (or vice versa) by ending up at the wrong SPAD.
In this section we compute the fractions of incorrectly generated settings f r→b and f b→r . For example, f r→b is the conditional probability that a red photon goes the wrong way in the dichroic and ends up detected as a blue photon, generating the wrong setting. These fractions are highly sensitive to the transmission and reflection spectra of the two dichroic mirrors in each setting generator. They are somewhat less dependent on the spectral distribution of photons emitted by the astronomical source, on absorption and scattering in the Earth's atmosphere, the anti-reflection coatings on the optics, and the SPAD quantum efficiencies.
A system of dichroic beamsplitters which generates measurement settings from photon wavelengths can be modeled by two functions ρ red (λ) and ρ blue (λ), the probability of transmission to the red and blue arms as a function of photon wavelength λ. Ideally, photons with wavelength λ longer than some cutoff λ would not arrive at the blue arm: ρ blue (λ) = 0 for λ > λ . Similarly, ρ red (λ) = 0 for λ ≤ λ would ensure that blue photons do not arrive at the red arm. Due to imperfect dichroic beamsplitters, however, it is impossible to achieve ρ blue (λ > λ ) = 0 and ρ red (λ ≤ λ ) = 0.
The total number of blue settings generated by errant red photons can be computed as where N in (λ) is the spectral distribution of the stellar photons remaining after losses due to the atmosphere, anti-reflection coatings, and detector quantum efficiency.
Then the fraction f r→b can be computed by normalizing .
We may then compute the ρ's from measured dichroic mirror transmission and reflection curves and model N in (λ). Finally, it is important to note that our redblue color scheme is parametrized by the arbitrary cutoff wavelength λ . We may choose λ to minimize the overall fraction of wrong settings, λ = arg min N r→b + N b→r N r→r + N r→b + N b→r + N b→b .
For the four stars in the two observing runs, and the model of N in (λ) described in the next section, the wrongway fractions are tabulated in Table IV. One typical analysis is illustrated in Fig. 8.

Characterizing Dichroics
Our setting reader uses a system of one shortpass (s) (Thorlabs M 254H45) and one longpass (l) (Thorlabs M 254C45) beamsplitter with transmission (T ) and reflection (R) probabilities plotted in Fig. 8C. We choose to place the longpass beamsplitter in the reflected arm of the shortpass beamsplitter, instead of the other way around, to minimize the overall wrong-way fraction as written in Eq. (9). With this arrangement, ρ blue (λ) = ρ T,s (λ) ∼ 10 −3 for red wavelengths and ρ red (λ) = ρ R,s (λ)ρ T,l (λ) ∼ 10 −3 for blue wavelengths. The transmission/reflection spectra of both dichroic mirrors and of the blue/red arms are plotted in Fig. 8C.

Modeling the number distribution of photons
In this section, we describe our model of N in (λ), which covers the wavelength range 350 nm-1150 nm. We start with the stellar spectra, which can be modeled as blackbodies with characteristic temperatures taken from the Hipparcos catalogue [63,64]. We then apply corrections for the atmospheric transmission ρ atm (λ), two layers of anti-reflection coatings in each arm ρ lens (λ), a silvered mirror ρ mirror , and finally the detector's quantum efficiency ρ det (λ) as the photon makes its way through the setting reader.

Stellar Spectra
As discussed in the main text, the stars were selected on the basis of their brightness, with temperatures ranging from 3150 K-7600 K. To a very good approximation, the photons emitted by the stars follow a blackbody distribution, which we assume is largely unaltered by the interstellar medium as the light travels towards earth: This blackbody distribution is used as a starting point for N in (λ), to which modifications will be made. The blackbody distributions for the Run 1 stars are shown in Fig. 8A.

Atmospheric Absorbance
We generate an atmospheric transmittance spectrum with the MODTRAN model for mid-latitude atmospheres looking towards zenith [61]. To correct for the observation airmass (up to X = 2.5), we use optical densities from [58] to compute the atmospheric transmission efficiency, which is due mostly to broadband Rayleigh scattering. A more sophisticated model could also compute modified absorption lines at higher airmasses, but the effect on the wrong-way fractions f r→b , f b→r is negligible compared to the spectral change resulting from Rayleigh scattering.

Lenses and Detectors
In the experimental setup, one achromatic lens in each arm collimates the incident beam of stellar photons. The collimated beam reflects off a silver mirror and is focused by a second lens onto the active area of the SPADs. These elements are appropriately coated in the range Our maximally conservative model of the anti-reflection coatings in the two lenses, the silver mirror, and detector quantum efficiency curves as a function of photon wavelength λ. (C) Which-way probabilities as a function of λ due to the dichroic beamsplitters. Note that the addition of the longpass beamsplitter makes ρ red (λ) exceptionally flat, i.e. very good at rejecting blue photons. (D) Color distribution of photons seen at each arm are plotted, i.e. N in ρ red and N in ρ blue . The curves are normalized so that the total area under the sum of both curves is 1. The color scheme's cutoff wavelength λ is depicted by the shading color, and for this star is about λ ∼703.2 nm. Note that some of the photons arriving at each arm are classified as the wrong color (overlap of red and blue arm spectra), no matter which λ is chosen.
from 500 nm-1500 nm for minimum losses. However, not all photons are transmitted through the two lenses and the mirror. Each component has a wavelength-dependent probability of transmission that is close to unity for most of the nominal range, as plotted in Fig. 9. Once the focused light is incident on the SPAD, it will actually detect the photon with some wavelength-dependent quantum efficiency. The cumulative effect of these components on the incident spectrum is shown in Fig. 8B.   TABLE IV. For each star, we compute the fraction of wrongway photons f and the atmospheric efficiency: the fraction of stellar photons directed towards our telescopes which end up generating measurement settings (as opposed to those which do not due to telluric absorption or detector inefficiencies). We adopt the notation f 1→2 and f 2→1 to allow easier indexing of the labels for the "red" and "blue" settings ports as applied to each run. Spectral model assumptions for other optical elements shift the f values upwards by no more than 10%, assuming that any uncertainties due to the atmospheric model or real-time atmospheric variability are negligible during each ∼ 3 minute experimental run. We therefore assume conservative values σ f / f = 0.1 in the following sections.

Data Analysis
In this section we analyze the data from the two experimental runs. We make the assumptions of fair sampling and fair coincidences [65]. Thus, for testing local realism, all data can be postselected to coincidence events between Alice's and Bob's measurement stations. These coincidences were identified using a time window of 2.5 ns.
We can define the number of all coincidences for setting combination a i b j , and the total number of all recorded coincidences, A point estimate gives the joint setting choice probabilities We first test whether the probabilities q i j can be factorized, i.e., that they can be (approximately) written as Otherwise, there could be a common cause and the setting choices would not be independent. We define p(a i ) ≡ (N i1 + N i2 )/N and p(b j ) ≡ (N 1 j + N 2 j )/N. This leads to the following values for the individual setting probabilities for experimental run 1: p(a 1 ) = 0.6193, p(a 2 ) = 0.3807, Pearson's χ 2 -test for independence, q i j = p i j , yields 132. This implies that, under the assumption of independent setting choices (17), there is a purely statistical chance of 0.287 that the observed data q i j (or data even more deviating) are obtained. This probability is much larger than any typically used threshold for statistical significance. Hence, this test does not allow a refutation of the assumption of independent setting choices. For run 2, we estimate p(a 1 ) = 0.7333, p(a 2 ) = 0.2667, p(b 1 ) = 0.4854, and p(b 2 ) = 0.5146, with χ 2 = 1.158 and statistical chance 0.282. We next estimate the conditional probabilities for correlated outcomes in which both parties observe the same result: The Clauser-Horne-Shimony-Holt (CHSH) inequality [67] can be written as While the local-realist bound is 0, the quantum bound is √ 2−1 = 0.414, and the logical (algebraic) bound is 1. With our data, the CHSH values are C = 0.2125 for run 1, and C = 0.2509 for run 2, in each case violating the local-realist bound of zero. See Fig. 10. The widely known CHSH expression in terms of correlation functions, S ≡ |E 11 for run 1 and S = 2.502 for run 2, violating the corresponding local-realist bound of 2.

Predictability of Settings
We need to take into account two sources of imperfections in the experiment that can lead to an excess predictability [7] of the setting choices. The excess predictability quantifies the fraction of runs in whichgiven all possible knowledge about the setting generation process that can be available at the emission event of the particle pairs and thus at the distant measurement events -one could predict a specific setting better than what would simply be inferred from the overall bias of the setting choices. Loosely speaking, quantifies the fraction Run Side r a 1 , r b 1 r a 2 , r b 2 n a 1 , n b 1 n a 2 , n b 2 ∆t n k  V. For runs 1 and 2, r k i and n k i are the total and noise rates in Hz for observer k = {a, b} and setting port i = {1, 2}. We use Poisson process standard deviations σ r k i ≈ r k i /∆t r k , and σ n k i ≈ n k i /∆t n k , to estimate total and noise rate uncertainties (rounded up to the nearest integer). ∆t r k = 179 s is the duration of both runs 1 and 2 used to measure the total rate r k i for both observers. ∆t n k are the durations of control measurements to obtain the noise rates n k i for Alice and Bob in each run. Different surface temperatures and apparent magnitudes of the stars result in different emitted spectra and thus in different count rates for run 1 and 2.
of runs in which the locality and freedom-of-choice assumptions fail. The first source of imperfection is that not all of Alice's and Bob's settings were generated by photons from the two distant stars but were due to other, much closer "noise" sources. The total rates of photons in the respective setting generation ports for runs 1 and 2 are listed in Table V. Note that if one calculated p(a i ) as r a i /(r a 1 +r a 2 ) and analogously for b j , the numbers would be slightly different than the numbers in Eq. (17) inferred from the coincidences. The reason is that the average duration for which a setting was valid depended slightly on the setting itself. The overall setting validity times for the whole runtime of the experiment match the numbers in Eq. (17) very well.
A control measurement, pointing the receiving telescopes marginally away from the stars, yielded the noise rates listed in Table V. In the most conservative case, one would assume that all noise photons were under the control of a local hidden-variable model. Thus, their contribution to the predictability of setting a 1 (a 2 ) would be given by the ratio of noise rate to total rate, n a 1 /r a 1 = 0.017 (n a 2 /r a 2 = 0.034) for run 1. Similarly, the noise contribution to the predictability for b 1 (b 2 ) is given by The second source of imperfection is that a certain fraction of stellar photons leaves the dichroic mirror in the wrong output port. We index the wrong-way fractions f i →i as defined in Table IV with i → i denoting either 1 → 2 or 2 → 1.
With (A) and (B) denoting Alice and Bob, we can write Here s (A) i (s (B) j ) is the detected rate of stellar photons at Alice (Bob) which have a color that, when correctly identified, leads to the setting choice a i (b j ). Each rate in Eq. (20) is a sum of three terms: correctly identified stellar photons, incorrectly identified stellar photons that should have led to the other setting, and the noise rate. The four expressions in Eq. (20) allow us to find the four rates s (A) i and s (B) j as functions of the f parameters. We now want to quantify the setting predictability due to the dichroic mirror errors. We imagine a hiddenvariable model with arbitrary local power with the following restrictions: It cannot use non-detections to its advantage, and it can only alter at most certain fractions of the incoming stellar photons, which are quantified by the dichroic mirror error probabilities. We first focus only on Alice's side. We assume that in a certain fraction of runs the local-realist model 'attacks' by enforcing a specific setting value and choosing hidden variables that optimize the measurement results to maximize the Bell violation. This could for instance happen with a hidden (slower than light) signal from the source to Alice's dichroic mirror. Let us assume that q a i is the fraction of runs in which the model decides to generate setting a i . If the incoming stellar photon would, under correct identification, have led to setting a i , this 'overruling' gets reflected in the dichroic mirror error probability f (A) i →i . In fact, we can equate q a i = f (A) i →i , as the commitment to enforce setting a i to occur, independent of knowledge of the incoming photon's wavelength. Thus, the probability to enforce a i , q a i , is identical to the conditional probability f (A) i →i that a i is enforced although a i would have been generated otherwise. The predictability from this 'overruling' is quantified by f (A) i →i s (A) i /r a i , i.e. the fraction of a i settings which stem from stellar photons that should have led to setting a i .
On the other hand, if the incoming stellar photon would have led to setting a i anyway, there is no visible 'overruling' and the attack remains hidden, while the model still produces outcomes that maximize the Bell violation. The predictability from this is quantified by i →i s (A) i /r a i , i.e. the fraction of a i settings for which no attack was actually necessary to maximize Bell violation.
Everything is analogous for Bob. In total, we can add up the different contributions-noise photons and dichroic mirror wrong-way fractions-and obtain the excess predictabilities with the total detected stellar photon rates s (A) ≡ s (A) 1 +s (A)   2 and s (B) ≡ s (B) 1 + s (B) 2 . Note that s (A) = i (r a i − n a i ) and s (B) = j (r b j − n b j ), such that the total star counts from both ports for Alice or Bob are themselves independent of any f parameters, since all detected stellar photons must either go to the correct or incorrect port.   (26). We use values and errors on the total and noise rates from Table V along with 10% fractional uncertainties on the dichroic mirror wrong-way fractions in Table IV. For both runs, Eqs. (23) and (27) yield ± σ = 21 ± σ 21 .
One final source that can contribute to the excess predictability concerns the physical response of the setting readers: after one of the detectors clicks with a certain setting (for example, upon detecting a red stellar photon), that detector becomes "blind" for a dead time of approximately 500 ns, after which its quantum efficiency recovers to the original value. During this dead / recovery time, the 'blue' detector is more likely to click. Such situations would yield an excess predictability, over and above the likelihood that a hidden-variable mechanism might discern from the biased settings frequencies (unequal q i j ) or the other sources of noise and errors in the settings readers (nonzero a i , b j ).
To address this additional predictability from the dead/recovery time of the setting readers, we introduced an additional, artificial "dead time" for the 'blue' detector, after the corresponding 'red' detector had clicked (and vice versa). We optimized the window τ cut for each detector by analyzing data from our calibration measurements with the various astronomical sources, conducted before each experimental run. Then we post-selected (and deleted) all measurement coincidences from our Bell-test data that had a 'blue' click within τ cut of a 'red' click (and vice versa), consistent with the assumption of "fair sampling" and "fair coincidences." By finding optimal values of τ cut for each detector and each experimental run, we may reduce this additional, "dead-time" predictability to an arbitrarily small amount. The effect is to remove any additional correlations between neighboring setting-detector 'clicks,' beyond what would be inferred from the measured bias and predictability.
In the worst case, the predictable setting events do not happen simultaneously on both sides but fully add up. Hence, the fraction of predictable coincidences within the ensemble of setting combination a i b j is at most If this number is larger than 1, i j is set to 1.
We conservatively assume that all predictable events are maximally exploited by a local hidden-variable model. Then, in fact, the largest of the four fractions, i.e., can be reached for the CHSH expression C.
To make this clear, let us consider the simple hiddenvariable model in which the outcome values are always A 1 = −1, A 2 = +1, B 1 = +1, B 2 = +1, with subscripts indicating the respective setting. The first two probabilities in Eq. (19) are each 0 (only anticorrelations), the last two are each 1 (only correlations), and C = 0. Now, if in a fraction 21 of all coincidence events with setting combination a 2 b 1 there is setting information of one party available at the source or the distant measurement event, then that latter measurement outcome can be 'reprogrammed' to produce an anticorrelation. Hence, we have p(A = B|a 2 b 1 ) = 0 in that 21 subensemble, and p(A = B|a 2 b 1 ) = 1 − 21 in total. This leads to C = 21 . Similar examples can be constructed for the other fractions. The predictabilities i j thus require us to adapt the CHSH inequality of Eq. (19) to (see Ref. [7]) The dichroic mirror errors were characterized, taking into account the spectra of the stars and all optical elements. Using the values for f (A,B) i→i in Table IV and the total and noise rates from Table V yields a predictability of = 0.1779 for run 1, such that our observed value C = 0.2125 still represents a violation of the adapted inequality of Eq. (24). Likewise for run 2, we find = 0.1609, again yielding C = 0.2509 > . See Table VI.

Uncertainty on the Settings Predictability
We temporarily drop the labels for Alice and Bob. Assuming that the rates r i , n i , and the f parameters are independent (which follows from our assumption of fair sampling for all detected photons), error propagation of Eq. (21) yields an uncertainty estimate for i given by where we note that s = r 1 + r 2 − n 1 − n 2 . Eq. (25) holds for Alice or Bob by applying appropriate labels. If we assume Alice and Bob's predictability contributions from Eq. (22) are independent, we find with an estimated uncertainty on from Eq. (23) of where σ max i a i is the uncertainty from Eq. (25) on the term which maximizes a i , and likewise for Bob. For both runs 1 and 2, assuming values and errors on the total and noise rates from Table V, wrong-way fractions  f from Table IV with conservative fractional errors of σ f / f = 0.1, Table VI shows values of i j from Eqs. (21)- (22), σ i j from Eqs. (25)- (26), and σ from Eq. (27).

Statistical significance
There exist several different ways to estimate the statistical significance for experimental runs 1 and 2. The result of any such statistical analysis is a p-value, i.e., a bound for the probability that the null hypothesis -local realism with predictability, biased detector-setting frequencies, fair sampling, fair coincidences, and any other additional assumptions -could have produced the experimentally observed data by a random variation.
Until recently, it was typical in the literature on such Bell tests to estimate a p-value under several assumptions (e.g., [38]): that each trial was independent and identically distributed (i.i.d.), and that the hidden-variable mechanism could not make any use of "memory" of the settings and outcomes of previous trials. Under those assumptions, one typically applied Poisson statistics for single coincidence counts, and assumed that the underlying statistical distribution was Gaussian. Moreover, it was typical to neglect the excess predictability, . Applied to our experimental data, such methods yield what we consider to be overly optimistic estimates, suggesting violation of the CHSH inequality by ν ≥ 39.8 and 42.7 standard deviations for runs 1 and 2, respectively.
However, such an approach assumes that the measured coincidence counts N AB i j are equal to their expected values, but then contradicts this assumption by calculating the probability that the N AB i j could have values differing by several standard deviations from their expected values. Plus, as recent work has emphasized (e.g., [7]), excess predictability must be taken into account when estimating statistical significance for any violations of the CHSH inequality.
More recently, several authors have produced improved methods for calculating p-values for Bell tests. These newer approaches do not assume i.i.d. trials, and also, more conservatively, allow the hidden-variable model to exploit "memory" of previous settings and outcomes. Whereas the "memory loophole" cannot achieve Bell violation, incorporating possible memory effects does require modified calculations of statistical significance [7,59,[69][70][71].
Although these new works represent a clear advance in the literature, unfortunately they are not optimized for use with our particular experiment. For example, the unequal settings probabilities (bias) for our experiment limit the utility of the bounds derived in [59,71], as the resulting p-values are close to 1. Likewise, one may follow the approach of [7,69,70] and use the Hoeffding in-equality [60]. However, it is known in general that such bounds routinely overestimate p -and hence underestimate the genuine statistical significance of a given experiment -by a substantial amount (see, e.g., [59]).
Therefore, in this section we present an ab initio calculation of the p-value tailored more specifically to our experiment. This method yields what we consider to be reasonable upper bounds on the p-values, which are still highly significant even with what we regard as a conservative set of assumptions. Our calculation incorporates predictability of settings and allows the local-realist hidden-variable theory to exploit memory of previous detector settings and measurement outcomes. We present essential steps in the calculation here, and defer fuller discussion to future work.
We consider a quantity W, which is a weighted measure of the number of "wins," that is, the number of measurement outcomes that contribute positively to the CHSH quantity C, defined in Eq. (19). A win consists of A = B for settings pair a 2 b 2 , and A B for any other combination of settings. Thus we define N win where q i j ≡ N i j /N is the fraction of trials in which settings combination i j occurs, and i j , defined in Eq. (22), is the probability that a given trial will be "corrupt." A trial is considered "corrupt" if it (1) involved a noise (rather than stellar) photon, or (2) involved a dichroic mirror error, or (3) was previewed by the hidden-variable theory for the purpose of considering a dichroic mirror error, but was passed over because the stellar photon already had the desired color. The occurrence of a corruption in any trial is taken to be an independent random event, which has probability i j that depends on the settings pair a i b j . We assume that for "uncorrupt" trials, the hidden-variable theory has no information about what the settings pair will be beyond the probabilities q i j . We assume that the hidden-variable theory can exploit each corrupted trial and turn it into a win. We further assume that the occurrence of these corrupt events cannot be influenced by either the experimenter or the hiddenvariable theory; they occur with uniform probability i j in each trial. We consider the probabilities i j to be known (to within some uncertainty σ i j ), but the actual number of corrupt trials to be subject to statistical fluctuations.
The p-value is the probability that a local-realist hidden-variable theory, using its best possible strategy, could obtain a value of W as large as the observed value. To define this precisely, we must be clear about the ensemble that we are using to define probabilities. It is common to attempt to describe the ensemble of all experiments with the same physical setup and the same number of trials. Yet it is difficult to do this in a precise way, because one has to use the statistics of settings choices observed in the experiment to determine the probabilities for the various settings. From a Bayesian point of view, this requires the assumption of a prior probability distribution on settings probabilities, and the answers one finds for p would depend on what priors one assumes.
We avoid such issues by considering the actual number N i j of the occurrences of each settings choice a i b j as given. The relevant ensemble is then the ensemble of all possible orders in which the settings choices could have occurred. The p-value will then be the fraction of orderings for which the hidden-variable theory, using its best strategy, could obtain a value of W greater than or equal to the value obtained in the experiment.
We may motivate the form of W in Eq. (28) as follows. In the absence of noise or errors, the hidden-variable model could specify which outcomes (A, B) will arise for each of the possible settings (i, j). The best plans will win for three of the four possible settings pairs, but will lose for one of the possible settings pairs. Hence a plan may be fully specified by identifying which settings pair will be the loser. (There will actually be two detailed plans for such a specification, related by a reversal of all outcomes, but we may treat such plans as equivalent.) In the presence of noise and errors, for each time the settings pair is a i b j , there is a probability i j that the trial is corrupt. If the trial is corrupt, it automatically registers as a win. If it is not corrupt, then it has a probability P win i j of registering as a win, where we take P win i j to be p(A = B|a i b j ) for (i j) = (22), and p(A B|a i b j ) for the other three cases. Then we may write which may be solved for P win i j : The CHSH inequality may be written i j P win where we have defined The lefthand side of Eq. (31) motivates our ansatz for W in Eq. (28).
The function W, which is a random variable, may be expressed in terms of a set of more elementary random variables. We label the trials by α, so for each trial α there will be a set of random variables: with G α + U α = 1. We also define the functions 1 if the settings pair a i b j in trial α is a win 0 otherwisē 1 if the settings pair a i b j in trial α is a loss 0 otherwise (34) with ω α i j +ω α i j = 1. Unlike the variables in Eq. (33), ω α i j andω α i j are not random; they are under the control of the hidden-variable mechanism. The square of each of the quantities in Eqs. (33) and (34) is equal to itself, since their only possible values are 0 and 1.
Our goal is to evaluate σ 2 W = W 2 − W 2 . We begin by calculating W = α W α . In terms of the quantities in Eqs. (33) and (34), we may write Since the settings are chosen randomly on each trial, we assume that all orderings of the setting choices are equally likely, and are independent of the occurrence of corruptions. This implies that F α i j U α = q i j (1 − i j ) and F α i j G α = q i j i j , independent of α. Then we find and hence To evaluate W 2 we write For the first term, we have where the second line follows upon noting that F α i j F α k = 0 if (i j) (k ), and using (F α i j ) 2 = F α i j , (ω α i j ) 2 = ω α i j . We therefore find where we have defined f i j as the fraction of trials for which the hidden-variable theory chooses (i j) to be the losing settings pair.
For the second term on the righthand side of Eq. (38), we have where δ i j,k = 1 if (i j) = (k ) and 0 otherwise. (We have used the fact that for each of the N i j values of α for which F α i j = 1, there are N i j −1 values of β α for which F β i j = 1.) To further simplify Eq. (41), we first assume that the hidden-variable theory cannot exploit memory of previous settings or outcomes. In that case, we may neglect correlations between F α i j and ω β k , and perform a full ensemble average. (We will relax this assumption below.) Proceeding as above, the term T 1 may then be rewritten where we have made use of the fact that i j 1/(1 − i j ) = i j (1 − i j )/(1 − i j ) + i j i j /(1 − i j ) = 4 +¯ . For the term T 2 , we note that Then T 2 may be rewritten Following some straightforward algebra, Eqs. (40), (42), and (44) yield . (45) The f i j are under the control of the hidden-variable theory, so we make the conservative assumption that the hidden-variable theory may choose the f i j so as to maximize σ W . To impose the constraint that i j f i j = 1, we introduce a Lagrange multiplier λ: Setting ∂L/∂ f i j = 0, we find the optimum values f opt i j (λ). By inserting these into the normalization condition i j f opt i j = 1, we may solve for λ, which in turn yields Inserting f opt i j into Eq. (45) for σ 2 W , we find For run 1, Eq. (47) yields an unphysical f 12 < 0 for our data. Upon employing a second Lagrange multiplier to ensure both that i j f i j = 1 and f 12 ≥ 0, we find Using the values for total and noise rates (r, n) in Table V, dichroic mirror wrong-way fractions ( f ) in Table IV, values of q i j inferred from Eqs. (14) and (15) and the probabilities for corrupt trials i j in Table VI Next we take into account the uncertainty in the predictabilities a i and b j . The quantity of interest is ν = W − W σ and we recall from Eq. (22) that i j = a i + b j . We may now compute how σ a i and σ b j affect the statistical significance of each run. The naive number of standard deviationsν in Eq. (50) implicitly assumed σ a i , σ b j = 0, and therefore ∆ ν = 0. If we allow for an uncertainty in ν equal to n times the 1-σ uncertainty in ν, then we should calculate the p-value using ν n ≡ν − n∆ ν .
If we choose n so that n = ν n , then Assuming a Gaussian distribution for large-sample experiments, we conclude that the conditional probability that the hidden variable mechanism could achieve a value of W as large as the observed value W obs , assuming that the true value of ν ≥ ν n , is given by p cond = 1 2 erfc(ν n / √ 2). Since we chose n = ν n , if we assume Gaussian statistics for the uncertainty in ν, then there is an equal probability that the true value of ν is less than ν n , in which case our analysis does not apply, and we must conservatively assume that W might exceed W obs . Thus, the p-value corresponding to the total probability that W ≥ W obs is bounded by p = 2p cond . Again assuming Gaussian statistics, we can relate p to an equivalent ν, by p = 1 2 erfc(ν/ √ 2). Proceeding in this way, we find the values forν, ∆ ν , ν, and p listed in Table VIII.

Memory of Previous Trials
Next we consider possible memory effects. We define the quantityW ≡ W − (3 +¯ )N. Then W = 0, regardless of what plan the hidden-variable theory uses. On the other hand, the hidden-variable theory can affect the standard deviation ofW. If we denote byW 0 the value of W obtained in the experiment, then the p-value we seek is the probability that the hidden-variable theory could have achievedW ≥W 0 by chance. To discuss an experiment in progress, we definẽ which is the contribution toW after n trials. For sufficiently large N, we may assume that the probabilities are well approximated by a Gaussian probability distribution. Then we expect that as long asW n ≤W 0 , the best strategy for the hidden-variable theory is to maximize σW , so that the number of standard deviations to its goal is as small as possible. When and ifW n passes W 0 , on the other hand, then its best strategy is to minimize σW , so as to minimize the probability thatW might backslide toW ≤W 0 . We define N lose,i j as the number of trials for which the hidden-variable theory selects settings (i j) as the loser. Then we seek to estimate p left (n|N lose,i j ) ≡ p(W n < 0), under the assumption that the hidden-variable loser selection is given by N lose,i j . That is, p left (n|N lose,i j ) is the probability that after n trials, the net change inW has been to the left (i.e., negative). For large n, we expect the probability distribution forW to become a Gaussian with zero mean, so that p left (n|N lose,i j ) should approach 1/2, for any hidden-variable theory loser selection. For smaller n, however, p left (n|N lose,i j ) can reach some maximum value B > 1/2.
Finally, we define p 1 to be the probability thatW n ≥ W 0 for some n in the range 1 ≤ n ≤ N, under the assumption that the hidden-variable theory consistently makes choices that maximize σW . Consider some particular sequence of trials that contributes to p 1 , that is, a sequence for whichW n ≥W 0 for some n. The continuation of this sequence for the rest of the experiment (assuming that the hidden-variable theory continues to make choices that for Alice and R (B) ≡ η (B) − /η (B) + = 0.81 for Bob. The difference in R (A) and R (B) can fully be understood on the basis of the known efficiency differences of the detectors used. One can correct the counts in Eq. (11) for these efficiencies by multiplying all '+' counts of Alice (Bob) by √ R (A) ( √ R (B) ), and dividing all her (his) '−' counts by ). The corrected counts show no sign of a violation of no-signaling. This is also true for the data from run 2. We finally note that, due to the low total detection efficiencies, our experiment had to make the assumptions of fair sampling and fair coincidences. This implies that low or imbalanced detection efficiencies are not exploited by hidden-variable models.