Quantum-classical hypothesis tests in macroscopic matter-wave interferometry

We assess the most macroscopic matter-wave experiments to date as to the extent to which they probe the quantum-classical boundary by demonstrating interference of heavy molecules and cold atomic ensembles. To this end, we consider a rigorous Bayesian test protocol for a parametrized set of hypothetical modifications of quantum theory, including well-studied spontaneous collapse models, that destroy superpositions and reinstate macrorealism. The range of modification parameters ruled out by the measurement events quantifies the macroscopicity of a quantum experiment, while the shape of the posterior distribution resulting from the Bayesian update reveals how conclusive the data are at testing macrorealism. This protocol may serve as a guide for the design of future matter-wave experiments ever closer to truly macroscopic scales.


I. INTRODUCTION
Matter-wave interference is one of the key observations that validate quantum mechanics and challenge macrorealism [1] and our classical perception of everyday life. Attempts to explain the apparent absence of macroscopic superposition states, so-called Schrödinger cats, involve the many-worlds interpretation [2], decoherence theory [3-5], and gravitational [6,7] or spontaneous wavefunction collapse [8,9]. At the same time, longstanding experimental efforts are pushing the frontiers of quantum mechanics to ever larger spatial [10,11], temporal [12], and mass [13,14] scales. Systematic methods to assess and compare the prospects of testing macrorealism with different experimental approaches could guide future developments and the allocation of resources in the field.
From a theory perspective, macroscopic quantum phenomena are often associated with quantum states whose macroscopic character manifests as a high degree of delocalization and entanglement in abstract many-body Hilbert space [15]. They are then gauged with functionals such as the quantum Fisher information [16], which return large values for states intuitively deemed macroscopic. This approach may lead to inconclusive or unintuitive results [17]. As an alternative, one may adhere to an empirical notion of macroscopicity [18], which is based on how much the observation of quantum behavior constrains the hypothesis that quantum mechanics ceases to be valid on the macroscale.
A common approach to quantum hypothesis testing is to ask the binary question of whether pure quantum or pure classical mechanics is more likely to have produced the observed measurement outcome [19-21]. In practice, however, there are always unaccounted sources of noise and decoherence in the experiment, so that both the quantum and the classical model are incomplete and the measurement data will likely fit neither. One can alleviate this problem by instead considering a continuous hypothesis test against a set of minimal macrorealist modifications (MMM) of quantum mechanics [22]. These models augment the Schrödinger equation by a parametrized stochastic process that destroys superpositions above a certain size, time, and mass threshold, while preserving them on the microscopic scale and fulfilling minimal consistency requirements [23]. Macroscopicity then measures to what extent such classicalizing models are ruled out by the experimental demonstration of quantum effects. The assessment of macroscopicity via a Bayesian hypothesis test refines the original formulation in Ref. [23] into an unambiguous definition. Thanks to Bayesian consistency [24,25], this agrees in the asymptotic limit of large amounts of data with consistent frequentist estimators based on expectation values.
The formal definition of MMM and the continuous hypothesis test is summarized in Sec. II. We then demonstrate how to employ this hypothesis test in the most relevant macroscopic matter-wave scenarios: near-field Talbot-Lau interferometry (Sec. III) and atomic Mach-Zehnder interferometry (Sec. IV). After evaluating the macroscopicities of current record holders, we give in Sec. V a natural criterion for assessing the amount of data needed to decisively rule out MMM, before concluding in Sec. VI.

II. EMPIRICAL MACROSCOPICITY
A hypothetical modification of quantum mechanics that restores macrorealism while adhering to the fundamental symmetry principles of the theory should fulfill several consistency requirements [23]. On a coarse-grained timescale all its observable consequences are captured by a Markovian extension of the von Neumann equation,
$$\partial_t \rho = -\frac{i}{\hbar}[H,\rho] + \frac{1}{\tau_e}\mathcal{M}_\sigma \rho. \tag{1}$$
Here, the parameter $\tau_e$ sets the overall strength of the MMM effect, as given by the associated decoherence time for a single electron. The superoperator $\mathcal{M}_\sigma$ depends on a set $\sigma$ of additional parameters describing the details of the MMM. Complete positivity, Galilean covariance, and consistent many-body scaling requirements single out a particular Lindblad form of the MMM generator in second quantization [23],
$$\mathcal{M}_\sigma \rho = \int d^3s\, d^3q\, g_\sigma(s,q) \left[ L(q,s)\rho L^\dagger(q,s) - \frac{1}{2}\left\{ L^\dagger(q,s)L(q,s), \rho \right\} \right], \tag{2}$$
with
$$L(q,s) = \sum_\alpha \frac{m_\alpha}{m_e} \int d^3p\, e^{i p\cdot m_e s/m_\alpha\hbar}\, c_\alpha^\dagger(p)\, c_\alpha(p-q). \tag{3}$$
It applies to arbitrary many-body systems containing several particle species $\alpha$ with masses $m_\alpha$. The electron mass $m_e$ sets the reference scale. The $c_\alpha(p)$ denote (fermionic or bosonic) particle annihilation operators in momentum representation. Applied to a single-particle state, the Lindblad operator $L(q,s)$ effects a phase-space translation on a characteristic scale set by the parameters $\sigma = (\sigma_q, \sigma_s)$, the momentum and position widths of the distribution $g_\sigma(s,q)$, which we assume to be Gaussian for simplicity. We avoid entering the relativistic regime by enforcing the upper bounds $\sigma_q \leq \hbar/(10\,\mathrm{fm})$ and $\sigma_s \leq 20\,\mathrm{pm}$. In practice, this renders the $\sigma_s$-scale irrelevant for all interferometer scenarios, since they exhibit momentum superpositions far below $(m/m_e) \times \hbar/20\,\mathrm{pm}$. The most prevalent instance of MMM by far, the continuous spontaneous localization (CSL) model [8,9], is obtained by setting $\sigma_s = 0$.
MMM restore macrorealism by destroying coherences on a length scale determined by $\hbar/\sigma_q$, at an effective rate that is amplified by the mass and scales like $1/\tau_e$. Matter-wave interferometers or other mechanical superposition experiments then falsify MMM parameters $(\tau_e, \sigma)$ if the predicted coherence loss is incompatible with, say, the observed interference visibility. An experiment can thus be deemed more macroscopic than another if its measurement record falsifies a greater set of parameters. This empirical notion of macroscopicity can be cast into a quantitative measure using the concept of Bayesian hypothesis testing [22]. To this end, we consider the odds ratio
$$o(\tau_e^*|d,\sigma,I) = \frac{P(H_{\tau_e^*}|d,\sigma,I)}{P(\bar{H}_{\tau_e^*}|d,\sigma,I)}, \tag{4}$$
which quantifies whether a hypothesis $H_{\tau_e^*}$ or its rival hypothesis $\bar{H}_{\tau_e^*}$ is more plausible, given the data $d$ accrued in an experiment and additional background information $I$ (which includes experimental parameters). Here, $H_{\tau_e^*}$ states that macrorealism holds and an MMM affects the dynamics (1) with a time parameter $\tau_e \leq \tau_e^*$ (at fixed $\sigma$); $\bar{H}_{\tau_e^*}$ assumes weaker modifications, $\tau_e > \tau_e^*$, or possibly none at all ($\tau_e \to \infty$). With the help of Bayes' theorem, we can express the odds ratio in terms of the likelihoods $P(d|\tau_e,\sigma,I)$ that the MMM model (1) assigns to the data,
$$o(\tau_e^*|d,\sigma,I) = \frac{\int_0^{\tau_e^*} d\tau_e\, P(d|\tau_e,\sigma,I)\, p(\tau_e|\sigma,I)}{\int_{\tau_e^*}^{\infty} d\tau_e\, P(d|\tau_e,\sigma,I)\, p(\tau_e|\sigma,I)}. \tag{5}$$
We are left with specifying a prior probability $p(\tau_e|\sigma,I)$ for the MMM time parameter given the experimental scenario. A natural choice is Jeffreys' prior, defined as the square root of the Fisher information [26] of the likelihood with respect to $\tau_e$,
$$p(\tau_e|\sigma,I) \sim \sqrt{\mathcal{F}(\tau_e|\sigma,I)}. \tag{6}$$
This is the most objective choice when it comes to comparing different experiments. Jeffreys' prior is invariant under reparametrizations and results in the largest information gain between prior and posterior on average, as measured by the Kullback-Leibler divergence (or relative entropy) [17,27,28]. As a conservative criterion for rejecting the MMM hypothesis $H_{\tau_e^*}$ we require the odds ratio (5) to fall below 1:19. This is equivalent to determining the lowest five-percent quantile of the posterior distribution $P(d|\tau_e,\sigma,I)\,p(\tau_e|\sigma,I)/P(d|\sigma,I)$. The respective time parameter $\tau_m$ marking this quantile then defines the empirical macroscopicity,
$$\mu_m = \log_{10}\left(\frac{\tau_m}{1\,\mathrm{s}}\right). \tag{7}$$
It characterizes the MMM time parameters most probably ruled out by the data. By maximizing over all $\sigma$ one obtains a single figure of merit.
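The quantile criterion can be illustrated numerically. The following is a minimal sketch for a hypothetical toy experiment, not one of the setups analyzed here: each of N particles is assumed to be detected in one port with probability p(τ_e) = [1 + v0 exp(−κ/τ_e)]/2, where the function name `toy_macroscopicity`, the contrast `v0`, and the sensitivity scale `kappa` are illustrative assumptions. Jeffreys' prior is evaluated from the binomial Fisher information on a logarithmic grid, and the five-percent quantile of the posterior gives the macroscopicity.

```python
import math

def toy_macroscopicity(n, N, kappa, v0=0.9, quantile=0.05, grid=4000):
    """Return mu_m = log10(tau_m / 1 s) for a toy two-port experiment.

    Assumed (hypothetical) model: n out of N particles are found in port a,
    each with probability p(tau) = (1 + v0 * exp(-kappa / tau)) / 2.
    """
    # Log-spaced grid of tau_e values (in seconds) around the scale kappa.
    taus = [kappa * 10 ** (-2 + 6 * i / (grid - 1)) for i in range(grid)]
    log_post = []
    for tau in taus:
        decay = math.exp(-kappa / tau)
        p = 0.5 * (1 + v0 * decay)
        dp = 0.5 * v0 * decay * kappa / tau**2        # dp/dtau
        fisher = N * dp**2 / (p * (1 - p))            # binomial Fisher information
        log_like = n * math.log(p) + (N - n) * math.log(1 - p)
        # Posterior is proportional to likelihood times Jeffreys' prior sqrt(F).
        log_post.append(log_like + 0.5 * math.log(fisher))
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]
    # Cumulative trapezoidal integral over tau: unnormalized posterior CDF.
    cdf = [0.0]
    for i in range(1, grid):
        cdf.append(cdf[-1] + 0.5 * (weights[i] + weights[i - 1]) * (taus[i] - taus[i - 1]))
    target = quantile * cdf[-1]
    tau_m = next(t for t, c in zip(taus, cdf) if c >= target)
    return math.log10(tau_m)
```

In this toy model, count data close to the full quantum contrast push the five-percent quantile, and hence the returned value, towards larger time parameters than data with visibly reduced contrast. The finite grid range is a numerical convenience and would have to be checked for convergence in a real analysis.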

III. NEAR-FIELD INTERFEROMETRY
Molecule interferometers are well suited to achieving high macroscopicities due to the large masses involved. Recently, the long-baseline universal matter-wave interferometer (LUMI) [14], an extended version of the Kapitza-Dirac-Talbot-Lau interferometer (KDTLI) [13], demonstrated interference of individual molecules of more than $2.5 \times 10^4$ atomic mass units. It makes use of the Talbot-Lau near-field interference effect based on diffraction off a standing light wave.
The experimental apparatus consists of three equidistant gratings of equal grating period $d_g$. The distance $L$ between them is of the order of the Talbot length, which marks the near-field regime [29,30]. An initially incoherent molecule beam is collimated by the first grating and then diffracted by the second one, which results in a periodic interference fringe pattern at the position of the third grating. In the KDTLI and the LUMI setups, the second grating consists of a standing laser wave that modulates the phase of the molecular wave, while the first and third gratings are material masks. The interference pattern is scanned by varying the lateral position $x_S$ of the third grating mask and counting the number of transmitted molecules as a function of $x_S$. One obtains an approximately sinusoidal detection signal $S(x_S)$, Eq. (8) [18,29]. Here, $f_1$, $f_3$ denote the opening fractions of the first and third grating, and $\delta x$ is the grating position offset, which is not measured but can be extracted from a fit to the data. Quantum mechanics predicts an ideal interference contrast (or visibility) $V_0$ that depends on various parameters such as the molecular mass $m$, the time of flight $T$, and the grating laser power $P_L$ [31], which are all part of the background information $I = \{x_S, P_L, \ldots\}$. MMM-induced decoherence will reduce the contrast according to Eq. (9) [32], with $T_T = m d_g^2/h$ the Talbot time. In practice, the velocity at which individual molecules are ejected from the source and traverse the setup is not known, so that the signal (8) must be averaged over a measured time-of-flight distribution.
Previous estimates of the achieved macroscopicity [14,23] compared the measured contrast with (9), attributing a certain confidence to the latter, and deduced the greatest excluded $\tau_e$-value from there. Here we carry out a proper statistical analysis based on the raw molecule count data, which fully accounts for all measurement uncertainties and allows dealing with small noisy data sets. We distinguish two modes of measurement: stationary operation with a constant molecule flux (KDTLI), which requires an additional uniformity assumption for the count statistics, and pulsed operation (LUMI).
The probability for a particle to end up at the detector is directly proportional to the intensity $S$ in (8). Given the experimental setup, we may assume a constant particle flux that homogeneously illuminates many grating slits and an efficient detector that covers all these slits, i.e. every molecule that passes the first grating and is not blocked by the third one will be counted. The probability $P(+|\tau_e,\sigma,I)$ for a molecule that has entered the interferometer to pass through all of its openings then follows from (8), given a fixed lateral position $x_S$ of the third grating.
In the KDTLI experiment [13], the measurement record consists of the count numbers $N^+_{x_S}$ of detected molecules at each $x_S$. The complementary events with probability $P(-|\tau_e,\sigma,I) = 1 - P(+|\tau_e,\sigma,I)$ are missing, as blocked molecules cannot be detected. However, the numbers $N^-_{x_S}$ of blocked molecules can be deduced from the sum of all counts, $N_{\mathrm{tot}} = \sum_{x_S} N^+_{x_S}$. To this end, we use that the third grating uniformly scans the lateral dimension in $M$ equidistant steps extending over a multiple of the period $d_g$. At constant flux, this implies that about the same number of molecules enters the interferometer at each step, which fixes $N^-_{x_S}$. The likelihood of the full measurement record is then the product of the single-event probabilities,
$$P(d|\tau_e,\sigma,I) \propto \prod_{x_S,P_L} P(+|\tau_e,\sigma,I)^{N^+_{x_S}}\, P(-|\tau_e,\sigma,I)^{N^-_{x_S}}, \tag{11}$$
omitting normalization. This expression also accounts for data taken at different values $P_L$ of the grating laser power, as was done in [13]. Further data could simply be appended to the product if other parameters are varied in consecutive grating scans, e.g. grating separations, time-of-flight distributions, or the molecular species.
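The bookkeeping behind the count likelihood (11) can be sketched as follows. This is an illustration under stated assumptions, not the analysis code of the experiment: the pass probability is taken as P(+) = f3 [1 + V cos(2π(x_S − δx)/d_g)], the visibility is reduced as V = V0 exp(−decay_scale/τ_e), and the number of molecules entering per scan step is estimated as N_tot/(M f3). The function name and the parameter `decay_scale` are hypothetical.

```python
import math

def kdtli_log_likelihood(tau_e, counts, positions, d_g, f3, v0, decay_scale, dx=0.0):
    """Log-likelihood of detected counts N+ per grating position (single scan).

    Assumptions (not taken from the source): sinusoidal pass probability
    P(+) = f3 * (1 + V * cos(2 pi (x - dx) / d_g)) with V = v0 * exp(-decay_scale / tau_e),
    and an equal number of entering molecules per step, N_tot / (M * f3).
    """
    vis = v0 * math.exp(-decay_scale / tau_e)
    n_in = sum(counts) / (len(counts) * f3)   # molecules entering per scan step
    total = 0.0
    for n_plus, x in zip(counts, positions):
        p_plus = f3 * (1 + vis * math.cos(2 * math.pi * (x - dx) / d_g))
        n_minus = n_in - n_plus               # inferred number of blocked molecules
        total += n_plus * math.log(p_plus) + n_minus * math.log(1 - p_plus)
    return total
```

Scans taken at other laser powers would contribute further terms of the same form to the sum, mirroring the product over $P_L$ in (11).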
Figure 1 shows the posterior after plugging in the data from Ref. [13], using two runs at grating laser power $P_L = 0.84$ W and $P_L = 1$ W. For this we take the characteristic MMM length scale $\hbar/\sigma_q$ to exceed by far the molecule size $d_m$ and to stay below the typical interference path separation, $d_m \ll \hbar/\sigma_q \ll d_g T/T_T$. This admits the point-particle description above while maximizing the decoherence effect to $V \simeq V_0 \exp[-2Tm^2/(\tau_e m_e^2)]$. One observes that Jeffreys' prior (dotted line) is updated to larger MMM time scales (i.e. weaker modifications) and to a much narrower distribution. As further discussed in Sec. V, this already indicates a conclusive measurement and a good posterior convergence for these runs. The plot also illustrates the double role of Jeffreys' prior in the assessment of macroscopicity: On the one hand, the prior gives a rough forecast of what MMM time scales one can access and what macroscopicity one can expect from a certain experimental setup with a given mass, time, and length scale. On the other hand, the overall performance of the experiment in terms of data quantity and quality decides whether the Bayesian update will ultimately converge to a sharp posterior distribution that no longer resembles the prior. We can then speak of a conclusive observation of genuinely macroscopic quantum behavior corresponding to the macroscopicity value $\mu_m$.
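To get a feeling for the time scales probed in this regime, the limiting contrast reduction quoted above, V/V0 = exp[−2Tm²/(τ_e m_e²)], can be evaluated directly. The sketch below does so for a molecular mass given in atomic mass units; the function names are illustrative, and any flight time passed in is an assumed input rather than a value from Refs. [13,14].

```python
import math

AMU_PER_ELECTRON_MASS = 1822.888486  # atomic mass unit in units of the electron mass

def visibility_ratio(tau_e, mass_amu, flight_time):
    """V / V_0 = exp[-2 T m^2 / (tau_e m_e^2)] in the regime d_m << hbar/sigma_q << d_g T / T_T."""
    mass_ratio = mass_amu * AMU_PER_ELECTRON_MASS   # m / m_e
    return math.exp(-2 * flight_time * mass_ratio**2 / tau_e)

def one_over_e_time(mass_amu, flight_time):
    """Time parameter tau_e (in seconds) at which the contrast drops to V_0 / e."""
    return 2 * flight_time * (mass_amu * AMU_PER_ELECTRON_MASS) ** 2
```

MMM with time parameters well below `one_over_e_time(mass_amu, flight_time)` would wash out the fringes almost completely, which is why Jeffreys' prior already concentrates around this scale before any data are taken.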
The LUMI experiment [14] is an extended version of the KDTLI setup that operates in a pulsed regime and is designed for more massive particles. During each shot, the laser grating is switched on and off; the molecules are detected in a time-resolved manner, resulting in two count numbers per shot, $N^+_{x_S}$ and $N^0_{x_S}$ with the laser on and off, respectively. The second value allows us to infer the number of blocked molecules $N^-_{x_S}$ required for the Bayesian update, since the probability for a molecule to pass the third grating is simply $f_3$ in the absence of a second grating [33]. The lateral position $x_S$ is again varied from pulse to pulse, and the procedure is repeated for varying grating laser powers.
FIG. 1. Jeffreys' prior is shown as a dotted line for two combined experimental runs at 0.84 W and 1 W; the posterior after updating with the data is depicted by a solid line. All distributions are normalized to their maximum value $p_{\max}$, and the lowest five-percent quantile of the posterior is marked by a shaded area.
The measurement data from 23 LUMI runs at laser powers between 0.2 W and 1.8 W, taken from Ref. [14], result in the posterior shown in Fig. 2 (red solid line). Similar to the KDTLI case, we obtain a sharply peaked, approximately Gaussian distribution around greater classicalization time scales, i.e. weaker MMM, which corresponds to the macroscopicity $\mu_m = 14.0$.
The individual LUMI runs at different laser powers comprise roughly an order of magnitude fewer molecule counts than in the KDTLI experiment. Thus, more of these runs have to be combined to reach the same level of posterior convergence. Nevertheless, one might be tempted to postselect among the 23 runs, discarding those with a poor interference visibility in order to achieve maximum macroscopicity. Indeed, by taking only the best eight runs into account, we can boost the macroscopicity to $\mu_m = 14.8$. But we are left with a broad posterior (blue dashed line in Fig. 2) that has not yet converged towards a Gaussian shape and still resembles the prior (dotted line). Such an outcome suggests that the hypothesis test is based on too little data to be fully conclusive. In Sec. V, we will introduce a quantitative criterion for how conclusive a set of data is in terms of empirical macroscopicity.

IV. MACH-ZEHNDER INTERFEROMETRY
We now turn to interferometers of the Mach-Zehnder type, see Fig. 3. A first beam splitter creates a superposition of two wave packets, occupied by a single atom or an entire BEC, which propagate on distinct paths, rejoin on a second beam splitter, and are then detected in the two associated output ports. Experimental demonstrations of such phase-stable superposition states may be deemed macroscopic due to the large arm separations and long coherence times that can be achieved [10-12,34].
FIG. 2. Shown are Jeffreys' prior (dotted line) together with the posterior distributions achieved by updating the prior with results of the LUMI experiment [14]: Using the eight best data sets leads to the blue posterior (dashed line), while using all data (grating laser power ranging from $P_L = 0.2$ to 1.8 W) leads to the red posterior (solid line). Jeffreys' prior coincides for both scenarios. All distributions are normalized to their maximum value $p_{\max}$, and the lowest five-percent quantiles of the posteriors are marked by the shaded areas.
One may distinguish two basic types of operation: Either individual, i.e. distinguishable, atoms are sent through the interferometer, or a BEC of many identical particles passes the setup, whose macroscopic wave function then interferes with itself. In the ideal case, where no particles are lost, coherence and a stable phase $\phi$ are maintained, interactions can be neglected, and the particles are detected with unit efficiency, both scenarios lead to the same binomial distribution, Eq. (12), for the number of atoms $n_a$ and $n_b = N - n_a$ recorded in output ports $a$ and $b$. This indicates that, in terms of empirical macroscopicity, BEC interference with thousands of atoms in a product state is equivalent to single-atom interference with the same number of repetitions. It turns out that even partial entanglement through squeezing [34-36] has no noticeable advantage for BECs regarding macroscopicity, as long as only collective observables are measured [22]. However, in the presence of decoherence effects and experimental disturbances the equivalence of both scenarios no longer holds.

A. Two-mode interference of BECs
FIG. 3. Mach-Zehnder interferometer scheme [10,12]. A momentum inversion, or the implementation of an atom guide, makes the two wave packets impinge on a second beam splitter. Interference is demonstrated by counting the atom number in the two output ports as a function of the relative phase $\phi$ between the modes, which may be tuned, e.g., by varying the times of flight or a potential gradient.
Suppose a BEC is split coherently at the first beam splitter so that all atoms share the same collective phase at the beginning. Even in this case, the interference pattern may fluctuate randomly from shot to shot due to technical noise in the beam splitters, timing uncertainties, or fluctuating background fields. In the case of full dephasing one expects the uniform phase average of the binomial count distribution (12), as given by Eq. (13). It is smeared out over the whole range of $0 \leq n_a \leq N$, with higher probabilities at the margins; the general expression for all stages of dephasing is reported in [37].
Notice the striking difference between (13) and the binomial distribution peaked at $n_a = N/2$, which is predicted by a classical coin-flip model of the interferometer where the beam splitters send individual atoms into one or the other arm at random. Recording atom count numbers $n_a$ far from $N/2$ in individual runs of the BEC interferometer therefore demonstrates a non-classical effect; however, since practically the same count statistics (13) are expected if two separately prepared BECs are sent simultaneously onto a beam splitter [37,38], it is doubtful whether a genuine quantum superposition can be confirmed at all without a stable phase [39].
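The contrast between the fully dephased count statistics and the classical coin-flip model can be made explicit with a short numerical sketch. It assumes, as a convention not fixed by the text above, that the coherent binomial distribution uses the single-atom probability cos²(ϕ/2) for port a; the uniform phase average then has the closed form Γ(n_a + 1/2) Γ(N − n_a + 1/2) / [π n_a! (N − n_a)!], which does not depend on where the phase origin is placed. The function names are illustrative.

```python
import math

def dephased_pmf(n_a, N):
    """Uniform phase average of a binomial with p = cos^2(phi / 2) (assumed convention)."""
    log_p = (math.lgamma(n_a + 0.5) + math.lgamma(N - n_a + 0.5)
             - math.lgamma(n_a + 1) - math.lgamma(N - n_a + 1))
    return math.exp(log_p) / math.pi

def coin_flip_pmf(n_a, N):
    """Classical model: every atom leaves through port a or b with probability 1/2."""
    log_binom = math.lgamma(N + 1) - math.lgamma(n_a + 1) - math.lgamma(N - n_a + 1)
    return math.exp(log_binom - N * math.log(2))
```

For N = 50, for instance, `dephased_pmf` is largest at n_a = 0 and n_a = N, whereas `coin_flip_pmf` is largest at n_a = 25, which is the qualitative difference described above.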
The measure of macroscopicity introduced in this article is particularly suited to clarify this issue, as it is based on a Bayesian hypothesis test of MMM that prevent superposition states. Given that typically hundreds or thousands of atoms arrive at the detectors, one can perform a continuum approximation to obtain a closed expression for the partially dephased atom count distribution [22], Eq. (14), with $\delta_a \equiv \arcsin(2n_a/N - 1)$. Dephasing with rate $\Gamma_P$ is here captured by the function $g_N(t) = \exp[-1/(2N) - \Gamma_P t/2]$, and $\vartheta_3$ is the Jacobi theta function of the third kind,
$$\vartheta_3(u,q) = 1 + 2\sum_{n=1}^{\infty} q^{n^2}\cos(2nu). \tag{15}$$
The distribution (14) is an oscillatory function of the phase difference $\phi$ between the two Mach-Zehnder arms.
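For numerical work the Jacobi theta function of the third kind is conveniently evaluated from its rapidly converging cosine series. The sketch below uses the standard convention ϑ₃(u, q) = 1 + 2 Σ q^{n²} cos(2nu) with nome 0 ≤ q < 1; whether the argument scaling matches the one used in the count distribution (14) is an assumption that should be checked against Ref. [22]. The function name and the truncation tolerance are illustrative.

```python
import math

def jacobi_theta3(u, q, tol=1e-16):
    """theta_3(u, q) = 1 + 2 * sum_{n >= 1} q^(n^2) * cos(2 n u), truncated below tol."""
    if not 0 <= q < 1:
        raise ValueError("nome q must satisfy 0 <= q < 1")
    total, n = 1.0, 1
    while True:
        weight = q ** (n * n)
        if weight < tol:          # remaining terms are negligible
            return total
        total += 2 * weight * math.cos(2 * n * u)
        n += 1
```

With a nome of the form exp[−1/(2N) − Γ_P t/2], the series needs more terms for large N and weak dephasing, since the nome then approaches one.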
Interference visibility is reduced, in part, by the initial phase uncertainty of the $N$-atom product state and by gradual dephasing over the interference time $t$. MMM predict the dephasing rate $\Gamma_P$, Eq. (16), with $\Delta_x = \Delta_p T/(2\sqrt{3}\,m)$ the effective average arm separation, given the momentum splitting $\Delta_p$ and the atom mass $m$. We assume free evolution of the BEC in the $z$-direction, a Gaussian transverse mode profile with waists $w_x$, $w_y$, and negligible mode dispersion; see [37] for a detailed derivation. Given a highly diluted BEC, one may also neglect phase dispersion caused by atom-atom interactions [40]. Its presence would further reduce the interference visibility, and omitting it thus underestimates macroscopicity.
Atoms lost from the condensate can simply be traced out [41], given that we have a product state of single-atom superpositions. Our reasoning thus applies to the remaining particle number $N$ registered in the detectors. In fact, MMM also lead to atom loss, and for $\sigma_q > \hbar/w_{x,y}$ this depletion dominates, while the dephasing rate (16) drops. The reason is simple: undetected atoms have no effect on the interference signal [42]. In principle, one could then rule out macrorealistic modifications such as the CSL model by testing their predicted depletion rates against actually measured atom loss over time [43]. But since no quantum signatures would be verified in such a scheme, no macroscopicity should be assigned to such an observation either. As a genuine quantum experiment, the BEC interferometer is most sensitive to MMM dephasing in the parameter range $w_{x,y} \ll \hbar/\sigma_q \ll \Delta_x$, which is where the dephasing rate reaches its maximum value, which scales like $1/\tau_e$, while MMM-induced depletion can still be neglected. The greatest excluded classicalization time $\tau_m$ yielding the macroscopicity value is thus obtained in this regime, and we will restrict our subsequent evaluation to this case.
We now perform the Bayesian hypothesis test with the data taken from the Stanford atom fountain experiment [10], which claims to test the superposition principle on the half-metre scale. The original claim was debated, because it hinged on two crucial assertions [39,44]: (i) the rubidium condensate splits coherently and accumulates a stable relative phase $\phi$ between the two arms in each shot of the experiment, and (ii) uncontrolled vibrations of the recombining beam splitter cause the phase (and the resulting atom counts) to fluctuate randomly from shot to shot. The authors estimated from the data of several dozen shots values of an average interference contrast and phase that parametrize the expected atom count distribution, but these estimates alone are not a sufficient criterion for a coherently split condensate state [37]. Recent comparative studies have largely ignored this issue [14,45].
To illustrate our formalism and assess macroscopicity on the half-metre scale, we shall assume (i), but not (ii). The reason is that, if we knew (ii) were true, we would be prompted to describe the measurement statistics by the phase-averaged distribution (13), which is insensitive to any further MMM-induced dephasing and thus not amenable to our macroscopicity analysis. Instead, we take the position of an observer who is uninformed about the phase noise and specify that the data be described by the count distribution (14) subject to MMM dephasing, with a fixed unknown phase $\phi$ that we determine a posteriori as the one that maximizes the resulting macroscopicity.
Figure 4 shows the relevant posterior probability for $\tau_e$ (solid line) resulting from 20 data points at $\Delta_p = 90\hbar k$ momentum splitting and $t = 2.08$ s, i.e. at an effective arm separation of $\Delta_x \simeq 29$ cm. The lowest 5% quantile yields $\mu_m = 10.9$. Compared to the solid lines in Figs. 1 and 2, the Bayesian update neither shifts nor narrows the posterior significantly here with respect to Jeffreys' prior (dotted line). This lack of convergence stems from the lack of reproducible data: the posterior is based on a mere 20 data points, whose phase information is scrambled by the uncontrolled noise we had to neglect. A conclusive test of macrorealistic dephasing on the half-metre scale would require additional measurements. They could either reveal the remaining phase information, in which case the posterior would approach a narrow Gaussian distribution around a $\tau_e$-value that matches the observed phase losses, or, if the observed phase is indeed random, push the posterior towards smaller $\tau_e$ than expected a priori.
FIG. 4. Jeffreys' prior based on the likelihood (14) is shown as the dotted curve and updated with the data at $90\hbar k$ transferred photon recoils. Optimizing the macroscopicity over all possible relative phases $\phi$ results in the solid curve at $\phi_{\max} \approx 6$, which updates the prior to slightly larger $\tau_e$. The shaded region marks the lowest five-percent quantile that sets the macroscopicity to $\mu_m = 10.9$. Both distributions are normalized to the peak value $p_{\max}$.

B. Nested Mach-Zehnder BEC interferometry
Such additional measurements of the phase are not required in the subsequent nested Mach-Zehnder experiment reported by the Kasevich group in Ref. [11]. They measured phase shifts induced by tidal forces in a nested dual Mach-Zehnder setup, in which they apply a sequence of laser-driven Bragg transitions to split a BEC of $N \simeq 10^6$ Rb atoms at 50 nK into two arms separated by $\Delta_p = 102\hbar k$. They then form identical Mach-Zehnder interferometers as in Fig. 3 in each arm by additional splitting and recombination stages at $\Delta_p = 20\hbar k$. Finally, instead of recombining the output ports of the two outer arms, the authors image the two atom clouds in each single shot and read out their relative phases using a phase-shear technique [46]: a sinusoidal density modulation is imprinted onto the two split components, which shows up as spatial fringe patterns in their fluorescence images, with an offset determining the phase difference. This measurement configuration suppresses the effect of vibration-induced fluctuations of the beam splitter phase.
Ideally, all atoms occupy the same nested two-mode superposition state, just before recombination of the inner Mach-Zehnder interferometers. The two arms, spatially separated over 20 cm at a relative phase $\phi_0$, are superpositions of two Mach-Zehnder modes with relative phases $\phi_{1,2}$. The creation operators $a_{1,2}$ and $b_{1,2}$ create the respective inner two modes separated by up to 7 cm. Their relative phases $\phi_{1,2}$ fluctuate randomly from shot to shot due to uncontrolled vibrations, but their difference $\Delta\phi$ remains stable. It is extracted in each shot by comparing the phase-sheared images of the two atom clouds. The fluctuating outer phase $\phi_0$ has no relevance in the following, since the two outer arms are never recombined in the experiment.
The probability to extract a phase value of $\varphi_1$ from the image in one arm and of $\varphi_2$ from the other is then simply given by the probability that any $n$ out of $N$ atoms occupy the mode $c_1(\varphi_1)$, whereas the other $N - n$ occupy $c_2(\varphi_2)$, Eq. (19). In fact, we can view the $n$ and $N - n$ atoms in the two outer arms as independent condensates. In the absence of decoherence, we have $\rho = |\psi\rangle\langle\psi|$ and a binomial distribution of the atom portion $n$ that is sharply peaked around $N/2$. We incorporate MMM-induced dephasing by integrating the master equation (1) over the interference time $t$ of the two simultaneous Mach-Zehnder stages. Neglecting atom losses and phase dispersion due to atom-atom interactions, we can treat the two branches separately and in the same manner as before. To simplify further, we make use of $N \gg 1$ by setting $n \approx N/2$ in (19) and performing the continuum approximation. This leaves us with $p(\varphi_1, \varphi_2|\tau_e, \sigma, I) \approx p(\varphi_1|\tau_e, \sigma, I)\,p(\varphi_2|\tau_e, \sigma, I)$ and an approximately Gaussian phase distribution in each branch that is smeared out by the MMM-induced dephasing around the actual phase value. It is the conjugate of the dephased number distribution (14), obtained by re-substitution of $\sin\varphi = 2n_a/N - 1$ and a subsequent $-\pi/2$ pulse (for more details see Ref. [22]).
Here, $\mathcal{N}(x|\mu, \Delta)$ stands for a normalized Gaussian distribution with mean $\mu$ and variance $\Delta$, and the sum accounts for the fact that the MMM-induced decoherence may smear the initially narrow distribution beyond the periodic interval $(-\pi, \pi)$. Now we can calculate the probability distribution of $\Delta\varphi = \varphi_1 - \varphi_2$, the difference of two random variables, via convolution, which returns once again a Gaussian with twice the variance and samples the actual interferometer phase difference $\Delta\phi$ unaffected by beam splitter vibrations, Eq. (22). In the experiment [11], the interferometer phase $\Delta\phi$ was varied by inserting a large test mass in half of the 138 shots recorded at $t = 1.2$ s interference time. However, the actual value of $\Delta\phi$ is irrelevant, and only the spread of the data points $\Delta\varphi$ around the theoretically expected mean matters for our hypothesis test. This spread turns out to be much greater than $1/N$, which implies $N\Gamma_P t/2 \gg 1$ in (22) and justifies a posteriori that we could assume a fixed $N$ and neglect atom number fluctuations in the condensate. We arrive at the posterior shown in Fig. 5, a well converged peak far to the right of the broad prior. Notice the striking improvement over the previous result in Fig. 4, which can be attributed to the fact that the data sample localizes at a stable phase difference.
FIG. 5. Analysis of the Stanford nested BEC interferometer [11]: Jeffreys' prior based on the likelihood (22) is shown as the dotted curve and updated with the data at $20\hbar k$ transferred photon recoils in both Mach-Zehnder interferometers. The posterior is updated to much greater classicalization times due to the superb localization of the relative phase measured in the experiment. The shaded region marks the lowest five-percent quantile that sets the macroscopicity to $\mu_m = 12.4$. Both distributions are normalized to the peak value $p_{\max}$.
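The structure of such a phase-difference likelihood can be sketched with a Gaussian wrapped onto the periodic phase interval. The code below is a generic illustration, not the expression (22) itself: the single-branch variance `var_single` is left as an input because its dependence on N, Γ_P, and t is not reproduced here, and the function names and the number of image terms `n_wraps` are assumptions. It uses the fact that the difference of two independent Gaussian phases with equal variance is again Gaussian with twice that variance.

```python
import math

def wrapped_gaussian(x, mean, var, n_wraps=10):
    """Gaussian density of variance var, summed over 2*pi shifts (periodic in x)."""
    norm = 1.0 / math.sqrt(2 * math.pi * var)
    return sum(norm * math.exp(-(x - mean + 2 * math.pi * k) ** 2 / (2 * var))
               for k in range(-n_wraps, n_wraps + 1))

def phase_difference_log_likelihood(dphis, mean, var_single):
    """Log-likelihood of measured phase differences between two independent branches."""
    var = 2 * var_single          # variances add under convolution
    return sum(math.log(wrapped_gaussian(d, mean, var)) for d in dphis)
```

Data that cluster tightly around the expected phase difference favor a small variance, i.e. weak dephasing and large τ_e, which is the mechanism behind the sharp posterior discussed above. For extremely small variances and outlying data points the density can underflow to zero, so a production analysis would evaluate the logarithm more carefully.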

C. Interferometry with individual atoms
In the case of single-atom interference, quantum statistics of identical particles plays no role, which greatly simplifies the analysis. We shall demonstrate this by means of a recent experiment with Cs atoms realizing a Mach-Zehnder scheme with fixed arm separation ∆x and t = 20 s of coherence time [12]. Each of the four million recorded atoms is brought into a two-arm superposition that accumulates a variable, controlled phase difference ϕ_k. At recombination, the state is split into four branches, out of which only two interfere, whereas the other two contribute to the detection signal incoherently and thus halve the interference visibility. MMM dephasing at the rate (16) would reduce it further, predicting the probabilities p(a|ϕ_k, τ_e, σ, I) = p_k, Eq. (23), and p(b|ϕ_k, τ_e, σ, I) = 1 − p_k to detect the atom in the output ports a and b, respectively. The phase difference is varied in equidistant steps by adding small sub-ms increments δt ≪ t to the interference time, ϕ_k = ω(t + kδt) with ω = 2π × 12.7 kHz. As N_k ∼ 10^4 atoms are recorded in each step, we can approximate the resulting binomial distribution for the number n_a of particle counts in a by a Gaussian distribution, Eq. (24), and take n_a as a continuous variable running from zero to N_k. This approximation for the likelihoods of detecting n_a atoms in a would be inaccurate close to the boundaries n_a = 0, N_k, but these extreme values never occur in practice in a realistic low-contrast scenario.

From Eq. (23) we infer that, whenever ϕ_k is an odd multiple of π/2, the likelihoods p(n_a|ϕ_k, N_k, τ_e, σ, I) do not depend on the MMM parameters τ_e, σ, but instead match the classical coin-flip model of the Mach-Zehnder setup (i.e. a binomial count distribution centered at n_a = N_k/2). Data points recorded at such ϕ_k are thus useless in terms of macroscopicity, as they do not update the posterior of the MMM time parameter τ_e. Notice that the count distribution matches the classical model also in the limit of complete dephasing (unlike the BEC interferometer case, see Eq. (13)).
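The insensitivity of the count likelihood at odd multiples of π/2 can be checked with a short sketch. It assumes, purely for illustration, a port probability of the form p_k = (1 + V cos ϕ_k)/2, where the hypothetical parameter V lumps together the halved visibility and any MMM-induced reduction; this is a stand-in for Eq. (23), not its actual form. The Gaussian approximation uses the binomial mean N_k p_k and variance N_k p_k (1 − p_k).

```python
import math

def port_a_probability(phi, visibility):
    """Hypothetical stand-in for Eq. (23): p_k = (1 + V cos(phi)) / 2."""
    return 0.5 * (1.0 + visibility * math.cos(phi))

def count_log_likelihood(n_a, n_total, phi, visibility):
    """Gaussian approximation to the binomial log-likelihood of n_a counts.

    Mean and variance are those of a binomial distribution with n_total
    trials and success probability p; n_a is treated as continuous.
    """
    p = port_a_probability(phi, visibility)
    mean = n_total * p
    var = n_total * p * (1.0 - p)
    return -0.5 * (n_a - mean) ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)
```

At ϕ_k = π/2 the cosine vanishes, so `count_log_likelihood` returns the same value for any `visibility`, and such a data point cannot discriminate between different values of τ_e. At ϕ_k = 0 the values differ.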
Contrary to some of the previous examples, there is no shortage of data here to do Bayesian inference. Taking all four million recorded data points for t = 20 s into account, we obtain a sharply peaked posterior for τ_e, with a corresponding macroscopicity value µ_m = 11.7 and a tiny FWHM of 3.5 × 10^9 s (i.e. less than one percent of the corresponding τ_m). However, such a small error in the hypothetical τ_e-estimate seems "too good to be true", as it implies perfect single-atom detection for all four million registered counts, which is unrealistic.
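For orientation, the quoted width can be checked by hand: 3.5 × 10^9 s divided by 10^11.7 s ≈ 5.0 × 10^11 s gives about 0.7 percent. The step from a sampled posterior to a macroscopicity value can be sketched as follows. The sketch assumes that the posterior is available on a grid of τ_e values in seconds and that µ_m is the decadic logarithm of the lowest five percent quantile in units of 1 s; the reference unit and the function names are assumptions of this illustration.

```python
import math

def lower_quantile(tau_grid, density, q=0.05):
    """Return tau where the cumulative posterior (trapezoidal rule) reaches q.

    The density need not be normalized; q refers to the fraction of its
    total integral over the grid.
    """
    cdf = [0.0]
    for i in range(1, len(tau_grid)):
        step = 0.5 * (density[i] + density[i - 1]) * (tau_grid[i] - tau_grid[i - 1])
        cdf.append(cdf[-1] + step)
    target = q * cdf[-1]
    for i in range(1, len(cdf)):
        if cdf[i] >= target:
            # Linear interpolation inside the bracketing grid cell.
            frac = (target - cdf[i - 1]) / (cdf[i] - cdf[i - 1])
            return tau_grid[i - 1] + frac * (tau_grid[i] - tau_grid[i - 1])
    return tau_grid[-1]

def macroscopicity(tau_grid, density, q=0.05):
    """mu_m = log10(tau_m / 1 s), with tau_m the lower q-quantile in seconds."""
    return math.log10(lower_quantile(tau_grid, density, q))
```

For a narrow posterior the quantile lies close to the peak, so µ_m is essentially fixed by the peak position; for a broad posterior the quantile, and hence µ_m, can lie far below the maximum.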
For this specific experiment, the most prevalent noise source is dark counts in the CCD cameras; these instruments are supposed to detect atoms via fluorescence imaging, but occasionally they miss an atom or click when there is none. In conventional data analysis, one would subtract an appropriate background level from the average detection signal and top up its error bar. In Bayesian inference, we must introduce a random variable and convolve our model likelihoods with the respective noise distribution. The most obvious choice, a Gaussian random variable, is easily incorporated in our Gaussian approximation (24) of the likelihoods by adding its contribution σ_dark^2 to the overall variance. The corresponding Jeffreys' prior is given in App. A. For the present experiment, a realistic lower bound for the dark count fluctuations would be σ_dark ≈ 10^3. The resulting posterior yields the macroscopicity µ_m = 11.8 at a reasonable FWHM uncertainty of 7.85 × 10^10 s.
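The effect of the dark-count term on a single count likelihood can be made concrete with a small sketch. It assumes independent Gaussian dark-count noise, so that the variances of the binomial shot noise and of the dark counts simply add; the function names are hypothetical. For N_k = 10^4 and p_k close to 1/2, the shot-noise standard deviation is about 50 counts, which is small compared to σ_dark = 10^3.

```python
import math

def count_variance(n_total, p, sigma_dark=0.0):
    """Binomial shot-noise variance plus an independent Gaussian dark-count term."""
    return n_total * p * (1.0 - p) + sigma_dark ** 2

def noisy_count_log_likelihood(n_a, n_total, p, sigma_dark=0.0):
    """Gaussian log-likelihood of n_a counts with the broadened variance."""
    var = count_variance(n_total, p, sigma_dark)
    return -0.5 * (n_a - n_total * p) ** 2 / var - 0.5 * math.log(2.0 * math.pi * var)
```

A count that deviates by 200 from the expected mean is strongly penalized without dark counts, but barely so once σ_dark = 10^3 is included. Each data point then constrains the model less, which is consistent with the broader posterior.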
Figure 6 compares the posterior (red solid line) to the one omitting dark counts (blue line, almost δ-peaked) and to Jeffreys' prior (dotted line). Neither posterior has moved far from the prior, which is consistent with the low interference contrast reported in [12]. We observe that the inclusion of dark counts not only leads to a more realistic spread (i.e. uncertainty) of the distribution, but also to a systematic shift towards greater τ_e and µ_m. This exemplifies how measurement errors and their proper statistical assessment affect the empirical macroscopicity of a quantum experiment and its capability to rule out macrorealistic modifications such as CSL. The decohering effect of any technical or environmental noise source that one knowingly omits would be attributed to MMM, which overestimates their hypothetical strength, i.e. underestimates τ_e and µ_m.

V. CONVERGENCE OF THE POSTERIOR DISTRIBUTION
The Bayesian approach to hypothesis testing utilized in this article not only permits comparing matter-wave experiments in terms of their macroscopicity; it also reveals, through the observed degree of posterior convergence, how conclusive the measurement records are at testing macrorealism. This can be turned into a quantitative statement by employing Bayesian consistency and tools from probability theory. Suppose we interpret the Bayesian updating as a parameter estimation method for a specific macrorealistic model such as CSL, with the aim of finding the "true" underlying time parameter τ_e = τ_0 at fixed σ. For this case of a one-dimensional parameter space and a strictly positive prior, it is known to which distribution the posterior will converge for an asymptotically large data set [24,25,47]. It is the Gaussian N(τ_e|τ_0, 1/F(τ_0|σ, I)), centered around the true parameter value τ_0, whose variance is given by the inverse of the Fisher information appearing in Jeffreys' prior (6). This suggests assessing the degree to which a quantum experiment is a conclusive test of MMM by quantifying how well the respective posterior has converged to that asymptotic distribution. It is then natural to measure the residual deviation by means of the (bounded and symmetric) Hellinger distance [48], Eq. (25).

Of course, a true value of τ_e is not available in practice because the existence of MMM cannot be verified experimentally. The posterior can only serve to falsify collapse models with values of τ_e up to a threshold (e.g. based on the lowest 5% quantile, as in the empirical definition of macroscopicity). After all, an observed reduction of interference visibility might always be caused by uncontrolled technical disturbances or by an unidentified environmental source of decoherence. Given that the posterior expectation value of τ_e may not exist, we take the minimum of (25) over all possible τ_0 as a pragmatic measure for posterior convergence. Small values thus indicate more conclusive experimental tests of MMM at the given macroscopicity level. Table I lists the minimal Hellinger distances for the discussed experiments, together with the FWHM widths of the posteriors and of the corresponding asymptotic Gaussians.

Table I. Columns: Experiment, FWHM, Gaussian FWHM, minimal Hellinger distance (min. HD), µ_m. First entry: KDTLI [13], 1.80.

Figure 7. Macroscopicity versus critical length for the discussed interference experiments and previous case studies [22]. This includes entangled nanobeams (ENB [49]), quantum random walks (QRW [50]), number-squeezed BEC interferometry (NSB [34]), molecule interferometry (KDTLI [13] and LUMI [14]), and Mach-Zehnder interferometry (MZI) with single atoms [12], with BECs [10], and with nested BECs [11]. The values are decadic logarithms (7) of the greatest falsified time parameters of minimal macrorealistic models, plotted against the greatest length scale at which the experiments probe macrorealism effectively [51].

The KDTLI data, the LUMI data including all 23 interferograms, the Stanford nested interferometer, and the Berkeley Mach-Zehnder interferometer result in Hellinger distances much less than unity, indicating a conclusive test of macrorealism. On the other hand, the best eight interferograms of LUMI and the 20 data points from the half-meter BEC interferometer at Stanford yield values not far from unity, which indicates poor posterior convergence and suggests that more data be taken for a conclusive result. This is further indicated by the Gaussian FWHM being of the same order as the mean in both cases, i.e. the asymptotic distribution would extend appreciably to unphysical values τ_e < 0.
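The convergence measure can be sketched on a grid as follows. The sketch assumes the common normalization of the Hellinger distance, H = [1 − ∫ √(p q) dτ]^(1/2), which ranges from 0 to 1; whether this matches the exact convention of Eq. (25) is an assumption. The Fisher information is passed in as a user-supplied function `fisher`, and all names are hypothetical.

```python
import math

def gaussian(x, mean, var):
    """Normalized Gaussian N(x | mean, var); `var` is the variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def hellinger(grid, p, q):
    """Hellinger distance sqrt(1 - integral of sqrt(p*q)), trapezoidal rule.

    Both densities are assumed to be normalized on `grid`.
    """
    overlap = 0.0
    for i in range(1, len(grid)):
        f0 = math.sqrt(p[i - 1] * q[i - 1])
        f1 = math.sqrt(p[i] * q[i])
        overlap += 0.5 * (f0 + f1) * (grid[i] - grid[i - 1])
    return math.sqrt(max(0.0, 1.0 - overlap))

def min_hellinger(grid, posterior, fisher):
    """Minimize the distance to N(tau | tau0, 1/fisher(tau0)) over tau0 on the grid.

    Target Gaussians near the grid edges are truncated, not renormalized.
    """
    best_dist, best_tau0 = float("inf"), None
    for tau0 in grid:
        target = [gaussian(t, tau0, 1.0 / fisher(tau0)) for t in grid]
        dist = hellinger(grid, posterior, target)
        if dist < best_dist:
            best_dist, best_tau0 = dist, tau0
    return best_dist, best_tau0
```

A posterior that already coincides with its asymptotic Gaussian yields a minimal distance close to zero, whereas two densities with negligible overlap yield a distance close to one.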

VI. CONCLUSION
In this article, we analyzed the most recent matter-wave interference experiments with atoms, molecules, and BECs regarding their capability to probe the quantum-classical transition by ruling out minimal macrorealistic modifications (MMM) of quantum theory. Building on a parametrization of this class of models, Bayesian parameter inference with Jeffreys' prior allows one to assess and objectively compare the different experiments. The achieved macroscopicity µ_m is then given by the maximal classicalization time parameter ruled out by the measurement data at a significance level of 5%. Based on this yardstick, the matter-wave interference experiments reported in Refs. [11][12][13][14] yield the highest macroscopicities demonstrated in any test of the quantum superposition principle so far.
A second quantity characterizing the pertinence of individual interferometers is the greatest critical length ħ/σ_q for which the falsified classicalization time is maximized. At this scale, often on the order of the interference path separation, the instrument is most sensitive to collapse effects on delocalized quantum states. Figure 7 shows the macroscopicity against this critical length, comparing the discussed interference experiments with previous case studies. One observes that the scales probed by different superposition tests vary by ten orders of magnitude. We remark that the BEC interference experiment by the Kasevich group reaches ħ/σ_q = 8 cm, about an order of magnitude less than the geometric path separation of up to half a meter; this is because the effective splitting distance ∆x = 29 cm has to be sufficiently undershot to maximize the classicalization effect, as is the case for all interference experiments shown here.
We note that the Bayesian method demonstrated in this article can be readily applied to constrain all the parameters of a specific collapse model such as continuous spontaneous localization (CSL). By combining data from distinct experiments one would obtain a probability distribution on the set of all model parameters that can be successively sharpened by Bayesian updating. However, an uninformative prior that is invariant under reparametrization is not available for a multi-parameter space [27], so that in this case one must rely on the posterior becoming independent of the prior once sufficient data have been processed. A great advantage of this Bayesian parameter estimation, which builds on the raw data of each experiment, is that the measurement statistics are correctly accounted for. This is in contrast to naive exclusion plots based on average values for each experiment in a frequentist fashion, for which it is very hard to carry out a consistent error analysis.
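Successive Bayesian updating with data from distinct experiments can be sketched for a single parameter on a grid. This is only an illustration under simplifying assumptions: each experiment is represented by a hypothetical likelihood function of τ, and a multi-parameter analysis would require a grid over all model parameters.

```python
def trapezoid(grid, values):
    """Trapezoidal-rule integral of sampled values over the grid."""
    return sum(0.5 * (values[i] + values[i - 1]) * (grid[i] - grid[i - 1])
               for i in range(1, len(grid)))

def bayes_update(grid, prior, likelihood):
    """Multiply the prior by likelihood(tau) pointwise and renormalize."""
    unnorm = [pr * likelihood(tau) for tau, pr in zip(grid, prior)]
    norm = trapezoid(grid, unnorm)
    return [u / norm for u in unnorm]
```

Since the posterior is proportional to the prior times the product of all likelihoods, the order in which independent experiments are processed does not affect the final distribution.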
Apart from demonstrating the significance of statistical errors in diverse types of measurement data, our analysis also highlighted the importance of a realistic modelling of the experiment including technical noise and environmental decoherence, and the pitfalls of postselecting a subset of "successful runs". As the size and complexity of experiments exploring the quantum-classical boundary grow, so will the difficulty of keeping errors under control and collecting a sufficient amount of reliable data. The presented Bayesian protocol provides a rigorous method to assess the quality of the data in regard to probing macrorealism and to objectively quantify the degree of macroscopicity. It could be readily applied to space-borne precision interferometry [52][53][54] that is projected to outperform the current record experiments in the future.
have traversed the interferometer in each step, out of which N^−_xS = N_xS − N^+_xS were blocked. The posterior for the MMM time parameter then reads as p(τ_e|d, σ, I) ∝ p(τ_e|σ, I) × ∏_{xS,PL} [S(τ_e, σ, I)]

Figure 1. Analysis of the KDTLI experiment from Ref. [13]: Jeffreys' prior is shown as a dotted line for two combined experimental runs at 0.84 W and 1 W; the posterior after updating with the data is depicted by a solid line. All distributions are normalized to their maximum value p_max, and the lowest five percent quantile of the posterior is marked by a shaded area.

Figure 2. Analysis of the LUMI experiment from Ref. [14]: Shown are Jeffreys' prior (dotted line) together with the posterior distributions obtained by updating the prior with the experimental results. Using the eight best data sets leads to the blue posterior (dashed line), while using all data (grating laser power ranging from P_L = 0.2 to 1.8 W) leads to the red posterior (solid line). Jeffreys' prior coincides for both scenarios. All distributions are normalized to their maximum value p_max, and the lowest five percent quantiles of the posteriors are marked by the shaded areas.

Figure 3. Typical setup of a Mach-Zehnder matter-wave interferometer: A first beam splitter prepares individual or Bose-Einstein condensed atoms in a momentum superposition, usually by means of an internal π/2 transition [10,12]. A momentum inversion, or the implementation of an atom guide, makes them impinge on a second beam splitter. Interference is demonstrated by counting the atom number in the two output ports as a function of the relative phase ϕ between the modes, which may be tuned, e.g., by varying the times of flight or a potential gradient.

Figure 4. Analysis of the Stanford BEC interferometer [44]: Jeffreys' prior based on the likelihood (14) is shown as the dotted curve and updated with the data at 90 ħk transferred photon recoils. Optimizing the macroscopicity over all possible relative phases ϕ results in the solid curve at ϕ_max ≈ 6, which updates the prior to slightly larger τ_e. The shaded region marks the lowest five percent quantile that sets the macroscopicity to µ_m = 10.9. Both distributions are normalized to peak value p_max.

Figure 5. Analysis of the Stanford nested BEC interferometer [11]: Jeffreys' prior based on the likelihood (22) is shown as the dotted curve and updated with the data at 20 ħk transferred photon recoils in both Mach-Zehnder interferometers. The posterior is updated to much greater classicalization times due to the superb localization of the relative phase measured in the experiment. The shaded region marks the lowest five percent quantile that sets the macroscopicity to µ_m = 12.4. Both distributions are normalized to peak value p_max.

Figure 6. Analysis of the Berkeley atom interferometer [12]: Jeffreys' prior based on the likelihood (24) (dotted line) is updated with the atom count data at t = 20 s interference time, resulting in a quasi-δ posterior (blue solid line). A more realistic assessment including fluctuating dark counts with σ_dark = 10^3 standard deviation does not change the prior appreciably, but yields a broader posterior (red solid line) with a higher macroscopicity µ_m = 11.8, as marked by the shaded region. Once again, all distributions are normalized to their respective maxima p_max.