Single-photon nonlocality in quantum networks

The state obtained when a single photon impinges on a balanced beamsplitter is often called single-photon entangled, and its nonlocal properties have been the subject of intense debate in the quantum optics and foundations communities. It is, however, clear that a standard Bell test made only of passive optical elements cannot reveal the nonlocality of this state. We show that the nonlocality of single-photon entangled states can nevertheless be revealed in a quantum network made only of beamsplitters and photodetectors. In our protocol, three single-photon entangled states are distributed in a triangle network, introducing indeterminacy in the photons' paths and creating nonlocal correlations without the need for measurement choices. We discuss a concrete experimental realisation and provide numerical evidence of the tolerance of our protocol to standard noise sources. Our results show that single-photon entanglement may constitute a promising solution to generate genuine network-nonlocal correlations useful for Bell-based quantum information protocols.


I. BACKGROUND
Local hidden variables models cannot account for all the predictions of quantum theory. This was formalized in 1964 by J. S. Bell [1], and is now commonly termed nonlocality [2]. Nonlocality is a quantum property with no classical analogue displayed in the so-called Bell tests, defined by the statistics obtained when performing appropriate local measurements on a well-chosen entangled state. Bell tests have been performed in many different systems, from massive particles [3] to photons [4,5], and using many different degrees of freedom, such as electronic levels, polarization, orbital angular momentum or time bins. In most of these realizations the relevant degrees of freedom used to encode the entanglement are transmitted to each distant observer by a physical carrier, such as, for instance, a photon.
In this work we are interested in the question of whether single-particle quantum states can display nonlocal correlations with no classical analogue. In particular, we consider the question in the context of single-photon entanglement, that is, the state obtained when sending a single photon into a balanced beamsplitter,
$$|\psi^+\rangle_{AB} = \frac{1}{\sqrt{2}}\left(|01\rangle_{AB} + |10\rangle_{AB}\right). \qquad (1)$$
Here $|01\rangle_{AB}$ (resp. $|10\rangle_{AB}$) represents the situation in which the photon is sent to the right party B (resp. the left party A). The resulting state therefore consists of only one photon, and entanglement is encoded in the two optical spatial modes.
* These authors contributed equally to this work.
Is the state (1) nonlocal? This question has been intensively debated in the quantum foundations and quantum optics communities, see e.g. [6-19]. In principle, a positive answer is provided by the following simple argument [8-10]: the two optical modes can be transferred to the population of two energy levels of two distant massive particles. Single-photon entanglement is thereby mapped into two-particle entanglement, and a Bell test can then be implemented. The question is much subtler when considering only optical means. To obtain a nonlocal behavior, the two observers need to use local active measurements involving local oscillators that create extra local photons [6,7,13,16]. Without these active measurements, measuring the information content of the state (1) would allow the observers to deduce whether they received the photon sent by the source, destroying the indeterminacy in the photon path, i.e. the coherences in (1). The statistics then become classically simulable. One is therefore tempted to conclude that the observation of nonlocal effects in the single-photon entangled state by passive optical means, that is, phase shifters, beamsplitters and photodetectors, is impossible.
The main result of this work is to show that this is not the case and one can indeed reveal the nonlocality of state (1) with only passive measurements. To do so, we go beyond standard Bell tests and consider setups defined by causal networks. These are causal structures involving several independent sources, each being distributed to a subset of the parties involved in the scenario, according to a structure defined by a network [20]. It is well understood that these networks offer new possibilities to design quantum experiments with no classical analogue [21-26].
arXiv:2108.01726v3 [quant-ph] 8 Apr 2022
Here, we show that three copies of single-photon entangled states placed in a triangle causal network (cf. Fig. 1) can exhibit non-classical correlations. Our main idea is to exploit the topology of the network to reintroduce indeterminacy in the photon path, necessary to exploit the coherences of these states. Remarkably, the obtained setup is not only passive in terms of the implemented measurements, but also because it does not require any active choice of measurements. That is, in our setup, there are no classical inputs and observers perform a single measurement on their received shares. These characteristics make the proposal, arguably, the simplest experimental demonstration of the nonlocality of the single-photon entangled state, as well as the first experimental proposal for genuine network nonlocality [26].
Beyond the fundamental motivation, our results are also relevant from an applied point of view. Correlations with no classical analogue are the main resource for device-independent applications. For instance, the security of device-independent protocols for quantum random number generation [27,28] and quantum key distribution [29] is based on the observation of Bell inequality violations. For that, the simplest way of producing entangled states is through Spontaneous Parametric Down Conversion (SPDC). Entanglement can be encoded on different degrees of freedom of the resulting two photons. However, the state produced by SPDC is a mixture of the desired entangled state and vacuum [30]. In fact, a heralded preparation of a two-photon maximally entangled state is quite challenging [31]. In turn, single-photon entanglement can be easily prepared in a heralded way: an arbitrarily good approximation to it can be obtained when detecting photons in one of the two modes resulting from the SPDC process and sending the non-measured mode into a balanced beamsplitter (cf. [32]). Moreover, this form of entanglement does not require the control of any other light degrees of freedom, such as, e.g., polarization or orbital angular momentum. Therefore, the design of simple setups to generate correlations with no classical analogue from this state opens new avenues for the implementation of device-independent protocols.

II. THE TRIANGLE NETWORK
The considered Bell-type experiment consists of a triangle causal network where three observers, A, B and C, receive states prepared by three sources, see Fig. 1. These states are measured producing outcomes a, b and c with probability p(abc).
A classical description of the experiment compatible with the causal constraints defined by the network has the form (here $d\alpha$, $d\beta$ and $d\gamma$ are normalized measures)
$$p(abc) = \int d\alpha\, d\beta\, d\gamma\; p_A(a|\beta,\gamma)\, p_B(b|\gamma,\alpha)\, p_C(c|\alpha,\beta). \qquad (2)$$
The causal model therefore consists of classical variables α, β and γ distributed by the sources and local response functions $p_X$, with X = A, B, C, producing the measurement outcomes. In analogy with standard Bell tests, we call a probability distribution p(abc) that can be written as in Eq. (2) causally classical or, more simply, local.
FIG. 1. (Left) Causal model for the Triangle Network: three independent sources {α, β, γ} prepare correlated states that are distributed among the three parties. Each of them produces an output through a local process acting on the received parts of the states. The form of the states and local processes depends on the theory, say classical or quantum, used to reproduce the correlations in the network. (Right) Schematics of the proposed quantum optical experiment. A, B and C share single-photon entangled states $|\psi^+\rangle = (|01\rangle + |10\rangle)/\sqrt{2}$ prepared by the sources. Each party receives two optical modes that are mixed on a beamsplitter, the resulting output modes being measured by photodetectors. In the specific experimental instance depicted here, A does not detect any photon, B has one detector firing, and C has both detectors firing.
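As an illustration of what a local model of the form (2) looks like in practice, the following minimal sketch evaluates p(abc) for deterministic response functions, with the hidden variables taken uniform on a finite grid in [0, 1]. The function names and the toy "token-counting" responses are hypothetical and are not taken from the paper; they only show how the triangle wiring (A sees β and γ, B sees γ and α, C sees α and β) enters the computation.

```python
from itertools import product


def local_distribution(resp_a, resp_b, resp_c, n_grid=8):
    """Return p(abc) for deterministic responses, as in Eq. (2).

    The hidden variables alpha, beta, gamma are uniform on a grid of
    n_grid midpoints in [0, 1] (a discretisation chosen for illustration).
    """
    grid = [(k + 0.5) / n_grid for k in range(n_grid)]
    weight = 1.0 / n_grid**3
    p = {}
    for alpha, beta, gamma in product(grid, repeat=3):
        # Each party only sees the two sources it is connected to.
        outcome = (resp_a(beta, gamma), resp_b(gamma, alpha), resp_c(alpha, beta))
        p[outcome] = p.get(outcome, 0.0) + weight
    return p


# Hypothetical toy strategy: every source hands one "token" to one of its
# two parties, and each party outputs the number of tokens it received.
def count_a(beta, gamma):
    return (beta >= 0.5) + (gamma < 0.5)


def count_b(gamma, alpha):
    return (gamma >= 0.5) + (alpha < 0.5)


def count_c(alpha, beta):
    return (alpha >= 0.5) + (beta < 0.5)


if __name__ == "__main__":
    for outcome, prob in sorted(local_distribution(count_a, count_b, count_c).items()):
        print(outcome, prob)
```

In this toy strategy the three outputs always add up to 3, loosely mirroring photon-number conservation; it is a classical strategy by construction and says nothing about the quantum distribution itself.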
A quantum description of the experiment compatible with the causal network replaces the random variables by quantum states $\rho_\alpha$, $\rho_\beta$ and $\rho_\gamma$ and the local response functions by quantum measurements. Therefore, quantum probabilities compatible with the triangle network have the form
$$p(abc) = \mathrm{Tr}\left[\rho_\alpha\, \rho_\beta\, \rho_\gamma\; M^a_A\, M^b_B\, M^c_C\right], \qquad (3)$$
where $\sum_a M^a_A = \mathbb{1}_A$, and similarly for B and C. We slightly abuse the notation in Eq. (3) by not specifying the tensor products and different Hilbert spaces in which the different operators act, but this is clear from Fig. 1. We say that a quantum experiment, defined by states and measurements producing the outcome distribution p(abc) according to Eq. (3), is nonlocal whenever this distribution cannot be described by a classical model (2). Our goal in what follows is to provide a nonlocal quantum experiment in the triangle network using only single-photon entangled states, beamsplitters and photodetectors.
The basic idea of the experimental proposal is depicted in Fig. 1: three parties A, B, C share, for each pair AB, BC, CA, the single-photon entangled state $|\psi^+\rangle$, see Eq. (1). The initial state is thus
$$|\Psi\rangle = |\psi^+\rangle_{A_2B_1} \otimes |\psi^+\rangle_{B_2C_1} \otimes |\psi^+\rangle_{C_2A_1}. \qquad (4)$$
Each party then receives its two optical inputs on modes $X_1X_2$ (X = A, B, C) and mixes them with a beamsplitter, which induces a unitary transformation $B_{X_1X_2}(t, \phi)$ parametrized by its transmissivity t and phase φ. All parties use the same value for t, and all phases are set to zero for simplicity in the following (cf. [32]).
After passing through the beamsplitters, the photons end up in photodetectors. For each mode $X_i$, a perfectly efficient photodetection is described by the projector onto the vacuum state, $D^{0}_{X_i} = |0\rangle\langle 0|_{X_i}$ (detector off), and the projector onto its orthogonal complement, $D^{1}_{X_i} = \mathbb{1}_{X_i} - |0\rangle\langle 0|_{X_i}$ (detector firing). Indeed, we assume that the detectors do not resolve the number of photons but only their presence. The measurement obtained by mixing two modes with the beamsplitter and the ideal photodetectors can accordingly be expressed as a four-outcome POVM for each party (here $B \equiv B_{X_1X_2}(t,\phi)$, and the two tensor factors refer to the left and right detector, respectively),
$$\Pi^{(0)}_X = B^\dagger (D^0 \otimes D^0) B, \quad \Pi^{(L)}_X = B^\dagger (D^1 \otimes D^0) B, \quad \Pi^{(R)}_X = B^\dagger (D^0 \otimes D^1) B, \quad \Pi^{(2)}_X = B^\dagger (D^1 \otimes D^1) B, \qquad (6)$$
where the measurement labels stand respectively for no photon counts (0), a count in the left detector (L), a count in the right detector (R), or counts in both detectors (2). The crucial point is that when $t \notin \{0, 1\}$, the L and R measurements actually detect superpositions of photons in the incoming modes (see details in [32]).
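The quantum experiment described here results in the output distribution
$$p_t(abc) = \langle\Psi|\, \Pi^{(a)}_A \otimes \Pi^{(b)}_B \otimes \Pi^{(c)}_C \,|\Psi\rangle,$$
where $|\Psi\rangle$ is the initial state (4) and $\Pi^{(x)}_X$ are the POVM elements of party X. This distribution depends on the transmissivity t of the beamsplitters used by the parties, and its exact expression can be found in the Supplementary Material [32].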
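The distribution $p_t(abc)$ can be evaluated numerically with a few lines of code. The sketch below is not the authors' code; it assumes zero beamsplitter phases and uses the POVM truncated to at most one photon per incoming mode, as we read it from the Supplemental Material: outcome 0 is $|00\rangle\langle 00|$, outcomes L and R are the projectors on $|\chi_l\rangle$, $|\chi_r\rangle$ plus a weight $2t(1-t)$ on $|11\rangle\langle 11|$, and outcome 2 carries weight $(2t-1)^2$ on $|11\rangle\langle 11|$ (the standard two-photon beamsplitter probabilities). The mode wiring and the ordering $(n_1, n_2)$ of each party's occupation numbers are assumptions made for illustration.

```python
from itertools import product
from math import sqrt

OUTCOMES = ("0", "L", "R", "2")


def povm_element(x, bra, ket, t):
    """Matrix element <bra| Pi^(x) |ket> on one party's two modes.

    bra and ket are occupation tuples (n1, n2) with n1, n2 in {0, 1}.
    """
    if x == "0":
        return 1.0 if bra == ket == (0, 0) else 0.0
    if x == "2":
        return (2 * t - 1) ** 2 if bra == ket == (1, 1) else 0.0
    if x == "L":
        vec = {(1, 0): sqrt(t), (0, 1): sqrt(1 - t)}    # |chi_l>
    else:
        vec = {(0, 1): sqrt(t), (1, 0): -sqrt(1 - t)}   # |chi_r>
    value = vec.get(bra, 0.0) * vec.get(ket, 0.0)
    if bra == ket == (1, 1):
        value += 2 * t * (1 - t)  # both photons bunch into one detector
    return value


def occupations(s_alpha, s_beta, s_gamma):
    """Occupations (n1, n2) of A, B, C when each source sends its photon
    one way (bit 0) or the other (bit 1). Wiring assumed for illustration:
    gamma feeds (A2, B1), alpha feeds (B2, C1), beta feeds (C2, A1)."""
    n_a = (s_beta, 1 - s_gamma)
    n_b = (s_gamma, 1 - s_alpha)
    n_c = (s_alpha, 1 - s_beta)
    return n_a, n_b, n_c


def p_t(a, b, c, t):
    """p_t(abc) = <Psi| Pi^a (x) Pi^b (x) Pi^c |Psi> for three |psi+> states."""
    configs = [occupations(*s) for s in product((0, 1), repeat=3)]
    total = 0.0
    for (ka, kb, kc), (ba, bb, bc) in product(configs, repeat=2):
        total += (povm_element(a, ba, ka, t)
                  * povm_element(b, bb, kb, t)
                  * povm_element(c, bc, kc, t))
    return total / 8.0  # each of the 8 configurations has amplitude 1/sqrt(8)


if __name__ == "__main__":
    t = 0.85
    for abc in product(OUTCOMES, repeat=3):
        prob = p_t(*abc, t)
        if prob > 1e-12:
            print("".join(abc), round(prob, 6))
```

Under these assumptions the outcomes forbidden by photon-number conservation, such as (000) or (22χ), come out with zero probability, and the distribution is invariant under cyclic permutations of the parties.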

III. WITNESSING SINGLE-PHOTON NONLOCALITY
The first main result of this work is the following: the distribution $p_t$ obtained from the experiment described in Fig. 1 (cf. previous section) is nonlocal (at least) for values of the beamsplitter transmissivity in the intervals t ∈ (0, 0.215) and t ∈ (0.785, 1).
We give in the following a sketch of the proof, which is analytical and detailed in [32].
First, we simplified the structure that classical strategies must follow in the triangle network (2). Specifically, all the local response functions $p_A$, $p_B$, $p_C$ in (2) can be assumed to be deterministic, and all the indeterminacy is therefore delegated to the classical sources {α, β, γ}, which can all be assumed to be, w.l.o.g., real numbers uniformly distributed in the interval [0, 1]. Therefore, any local model is specified by deterministic triangle-local response functions that map all the points of the cube $[0, 1]^3$ to the observed outputs,
$$(\alpha, \beta, \gamma) \in [0,1]^3 \;\mapsto\; (a, b, c). \qquad (7)$$

Secondly, we were able to identify strict constraints that need to be satisfied by all possible classical causal models simulating the considered experimental output $p_t(abc)$ in the triangle network. In particular, we exploited the cyclic symmetry and null components of the distribution. For example, all outputs of the form (here χ represents any of L or R) {(000), (00χ), (2χχ), (22χ)}, or any of their permutations, have zero probability, due to the fact that there are initially 3 photons in the network, of which at most 2 can end up in the same photodetector. That is, in each run of the experiment the total number of clicks in the detectors must be 2 or 3.

By taking all the relevant properties of $p_t$ into account, one can identify constraints that need to be satisfied by any classical strategy, specified by the response functions (7), aiming at reproducing $p_t$. In fact, while the exact form of the response functions remains in general unknown, some of their marginals can be expressed in terms of the output $p_t$. These relevant marginals are nothing other than linear constraints on the response functions, parametrized by t. Together with standard normalization and positivity constraints, these define a Linear Program. The feasibility of this Linear Program is, by construction, necessary for the existence of such local response functions. Therefore, when it is infeasible, no local model exists to simulate our experimental proposal. Results show that the Linear Program is infeasible for t ∈ (0.785, 1) and t ∈ (0, 0.215), proving the claims of this section. We refer to the Supplementary Material for the technical details and the complete proof.
The techniques we used are similar to those introduced in [26] and generalized in [33]. However, their findings cannot be applied directly to our scenario. The reason is that the works [26,33] are based on a token-counting approach, in which some physical "tokens" are: i) generated from the sources, ii) distributed to the parties in a coherent superposition of different ways, and iii) counted at the output. In our experiment the physical tokens are the photons, which however can be miscounted at the output, as more than one could enter the same photodetector. For these reasons, in the proof [32] we had to extend these techniques so that they could be applied to our setup. As part of the proof, we showed that our distribution is nonlocal if and only if the distribution proposed in [26], which we dub $p'_t$, is nonlocal as well. While finishing this manuscript, we became aware of preliminary unpublished results [34], which prove nonlocality of $p'_t$ for discrete points in the range t ∈ (0.5, 0.785) as well. Nonlocality of $p'_t$ in this interval has been conjectured already [35]. Given the above-mentioned equivalence between the nonlocality of $p_t$ and $p'_t$ proven in this work, this would imply that the proposed ideal experiment is nonlocal for all transmissivities except t ∈ {0.0, 0.215, 0.5, 0.785, 1.0}, which are known to have local models (cf. [26,32]).

IV. NOISE TOLERANCE AND MACHINE LEARNING ANALYSIS
After proving the nonlocality of the outputs of the ideal noiseless experiment, we analyzed the robustness of our results against typical noise sources, by modelling imperfections which occur in experimental realizations of the optical network presented in Fig. 1. The resulting output distribution $p^{(Q,T,\nu)}_t(abc)$ therefore depends on additional noise parameters quantifying: the impurity of the generated single-photon entangled state (Q), the transmissivity of the optical channels of the network (T), and the efficiency of the final photodetectors (ν). It follows that
$$p^{(Q=0,\,T=1,\,\nu=1)}_t(abc) = p_t(abc),$$
that is, with no impurity, and perfect transmission and detection, we recover the idealized experiment. The details of the modelling employed are deferred to the Supplementary Material [32].

Inevitably, some of the key properties and symmetries of $p_t(abc)$ disappear as soon as noise is introduced in the network. This makes the analytic approach unworkable in this case. Consequently, in order to estimate the tolerance to the noise sources introduced above, we resorted to a technique recently introduced in [35]: there, a feedforward neural network is shaped with the same topology as the causal network under study, and it is then asked to reproduce the target distribution $p^{(Q,T,\nu)}_t$. Each output of the neural network is thus literally an instance of a classical model (which can therefore be described by Eq. (2) in our case) trying to reproduce $p^{(Q,T,\nu)}_t$. For a fixed target distribution, the neural network is trained by minimizing the Euclidean distance from the neural network's local model to the target. When the target distribution is inside the local set, a sufficiently large neural network should be capable of learning it. Instead, a large distance between the machine's best guess and the target is taken as an indication of nonlocality. What it means to be "large" enough can be somewhat arbitrary, since some nonlocal behaviors are extremely close to the local set (as is the case here), and additionally the neural network's model is not guaranteed to converge to the optimal solution, as it can get stuck in local minima during training. In order to gain deeper insight into the boundary between locality and nonlocality, we examine transitions of the learning algorithm's behavior when adding noise to the target distribution, retraining the machine independently for each target distribution. The very noisy case is guaranteed to be local, and the machine learning results in that regime give a reference to which we can compare the nonlocal regime. By definition, this technique does not certify nonlocality in an absolute way, but it has been shown to be reliable and efficient from the point of view of computational resources [35].

FIG. 2. The purple line shows the regime where we have proven nonlocality, while the blue line shows the regime where we conjecture nonlocality, based on these numerics and the relation to the distribution in Ref. [26], which was studied numerically in Ref. [35].
The results of the analysis are summarized in Figs. 2 and 3, where we consider only t ≥ 0.5 because of the symmetry of the experiment under mirroring the beamsplitters, t → 1 − t. For the noiseless distribution (perfect visibility r = 1 in Fig. 2), the neural network's best guess is distant from the experimental output, corroborating the analytical proof of nonlocality for t ∈ (0.785, 1). At the same time the neural network hints at the locality of the output distribution for t = 0.5 and t = 1, which clearly have local strategies. A local model exists as well for t ∼ 0.785 (cf. [26,32]), where the neural network struggles to get closer; however, note that the distance of 0.003 achieved there is already very close to the local set. Moreover, the same machine indicates (seemingly even stronger) nonlocality in the range t ∈ (0.5, 0.785), in line with the conjecture of [35] and the results of [34].
The noise robustness is, however, small. In Fig. 2 an artificial noise is considered by adding a Werner-state visibility to the source state (1) of the ideal experiment (Q = 0, T = 1, ν = 1). The neural network seems to indicate that the points that are "most nonlocal" are t ∼ 0.85 in the proven region (purple interval in Fig. 2), and t ∼ 0.65 in the conjectured region (blue interval). For these two points we tested the tolerance to the physical noises introduced above, see Fig. 3 for different values of the transmissivity T and detector efficiency ν. Results show that nonlocality is more robust for t = 0.65, where it is lost when T ≲ 95% or ν ≲ 95%.
All data was obtained by representing each of the three response functions ($p_A(a|\gamma\beta)$, $p_B(b|\gamma\alpha)$, $p_C(c|\alpha\beta)$) by a multilayer perceptron of depth 4 and width 20 with rectified linear activation functions. For each target distribution we retrained the neural network independently 30 times and kept the smallest distance among those.
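To make the structure of such a model concrete, the following sketch builds three small perceptrons wired like the triangle network, estimates the distribution p(abc) they generate by sampling the hidden variables, and evaluates the Euclidean distance to a target. It is only a forward pass with random, untrained weights: the training loop of [35] is omitted, "depth 4" is interpreted here as four hidden layers (an assumption), and all names as well as the uniform target are placeholders for illustration.

```python
import math
import random
from itertools import product


def init_mlp(rng, sizes):
    """Random weights and zero biases for a fully connected network."""
    return [
        ([[rng.gauss(0.0, 1.0 / math.sqrt(n_in)) for _ in range(n_in)]
          for _ in range(n_out)],
         [0.0] * n_out)
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def forward(params, x):
    """ReLU hidden layers followed by a softmax over the 4 outcomes."""
    for layer, (weights, biases) in enumerate(params):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if layer < len(params) - 1:
            x = [max(0.0, v) for v in x]
    top = max(x)
    exps = [math.exp(v - top) for v in x]
    norm = sum(exps)
    return [e / norm for e in exps]


def model_distribution(nets, n_samples, rng):
    """Monte Carlo estimate of p(abc) for a triangle-local model."""
    net_a, net_b, net_c = nets
    p = {abc: 0.0 for abc in product(range(4), repeat=3)}
    for _ in range(n_samples):
        alpha, beta, gamma = rng.random(), rng.random(), rng.random()
        pa = forward(net_a, [gamma, beta])    # p_A(a | gamma, beta)
        pb = forward(net_b, [gamma, alpha])   # p_B(b | gamma, alpha)
        pc = forward(net_c, [alpha, beta])    # p_C(c | alpha, beta)
        for a, b, c in p:
            p[(a, b, c)] += pa[a] * pb[b] * pc[c] / n_samples
    return p


def euclidean_distance(p, target):
    return math.sqrt(sum((p[k] - target[k]) ** 2 for k in target))


rng = random.Random(0)
SIZES = [2, 20, 20, 20, 20, 4]  # 2 inputs, four hidden layers of width 20, 4 outcomes
nets = [init_mlp(rng, SIZES) for _ in range(3)]
p_model = model_distribution(nets, 200, rng)
target = {abc: 1.0 / 64 for abc in p_model}  # placeholder target, not p_t
distance = euclidean_distance(p_model, target)
```

Because every party only receives the two hidden variables of its neighbouring sources, any distribution produced this way is triangle-local by construction; in the actual analysis the weights would be optimized to minimize the distance to the noisy quantum target.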

V. DISCUSSION
We have proven that single-photon entangled states can be used to generate an outcome distribution with no classical analogue in a triangle network. The considered setup only requires passive optical elements, namely beamsplitters, phase shifters and photodetectors, and involves a single measurement per observer. Our results not only challenge the current understanding of the nonlocal properties of single-photon entanglement, but also open new perspectives for the use of this form of entanglement for quantum information applications, as they provide the first proposal of an experimental demonstration of genuine network nonlocality.
We have shown, through a machine learning analysis, that the nonlocality of this proposal has a (small) tolerance to natural noise sources that can arise in its implementation. Such an approach is however not exact, and it remains an open question to prove nonlocality in the noisy regime by other means, e.g. certifying it by inflation techniques [36], which would be crucial for an experimental implementation.
Finally, in the Supplementary Material [32] we show that our main result on the nonlocality of the ideal experimental proposal in the triangle network can be extended to any ring network with N ≥ 3 parties, although increasing the number of parties does not improve the detectability of nonlocality in the proposed experiment with our current techniques.

Supplemental Material for "Quantum networks reveal single-photon nonlocality"

I. NOISELESS OUTPUT DISTRIBUTION
Here we derive the form of the noiseless output distribution $p^{(Q=0,\,T=1,\,\nu=1)}_t(abc) \equiv p_t(abc)$ produced when all the elements of the optical scheme described in Sec. II are perfect.
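The initial state shared among the parties is
$$|\Psi\rangle = |\psi^+\rangle_{A_2B_1} \otimes |\psi^+\rangle_{B_2C_1} \otimes |\psi^+\rangle_{C_2A_1}. \qquad (S1)$$
The action of a beamsplitter with transmissivity t and phase φ is described in terms of the input and output optical modes with creation operators $a^\dagger_i$ as
$$B^\dagger a^\dagger_1 B = \sqrt{t}\, a^\dagger_1 + e^{-i\phi}\sqrt{1-t}\, a^\dagger_2, \qquad B^\dagger a^\dagger_2 B = \sqrt{t}\, a^\dagger_2 - e^{i\phi}\sqrt{1-t}\, a^\dagger_1. \qquad (S2)$$
Consequently, the corresponding unitary induced by the transformation can be derived in the Fock basis by expressing
$$|n\, m\rangle = \frac{(a^\dagger_1)^n (a^\dagger_2)^m}{\sqrt{n!\, m!}}\, |00\rangle.$$
Accordingly, the POVM (6) can be written as
$$\Pi^{(0)} = |00\rangle\langle 00|, \qquad (S7)$$
$$\Pi^{(L)} = |\chi_l\rangle\langle\chi_l| + 2t(1-t)\, |11\rangle\langle 11|, \qquad (S8)$$
$$\Pi^{(R)} = |\chi_r\rangle\langle\chi_r| + 2t(1-t)\, |11\rangle\langle 11|, \qquad (S9)$$
$$\Pi^{(2)} = (2t-1)^2\, |11\rangle\langle 11|, \qquad (S10)$$
where $|\chi_r\rangle = \sqrt{t}\,|01\rangle - e^{i\phi}\sqrt{1-t}\,|10\rangle$, $|\chi_l\rangle = \sqrt{t}\,|10\rangle + e^{-i\phi}\sqrt{1-t}\,|01\rangle$, and where we truncated the Hilbert space considering that the input state consists only of combinations of vacuum and a single-photon excitation. Therefore each party has four possible outputs a, b, c ∈ {0, L, R, 2}, standing for no detector counts, a count in the left detector, a count in the right detector, or counts in both detectors, respectively, described by the POVM above.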
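The role of the two-photon component $|11\rangle$ in the truncated POVM can be checked with a short calculation. The sketch below, which assumes zero phase and a real beamsplitter transformation (a convention chosen for illustration, not necessarily the authors'), expands the product of the two transformed creation operators acting on the vacuum and returns the probabilities that both photons exit on the left, one exits on each side, or both exit on the right. With detectors that do not resolve photon number, the first and last cases register as L and R clicks and the middle one as outcome 2.

```python
from math import sqrt


def two_photon_output_probabilities(t):
    """Output statistics for one photon in each beamsplitter input (phi = 0).

    Uses a1+ -> c a1+ - s a2+ and a2+ -> s a1+ + c a2+ with c = sqrt(t),
    s = sqrt(1 - t). The product expands to
    c*s (a1+)^2 + (c^2 - s^2) a1+ a2+ - c*s (a2+)^2,
    and (a+)^2 |0> = sqrt(2) |2>.
    """
    c, s = sqrt(t), sqrt(1.0 - t)
    amp_20 = sqrt(2.0) * c * s
    amp_11 = c * c - s * s
    amp_02 = -sqrt(2.0) * c * s
    return {"20": amp_20 ** 2, "11": amp_11 ** 2, "02": amp_02 ** 2}


def click_weights(t):
    """Weights of |11><11| in the click outcomes L, R and 2."""
    probs = two_photon_output_probabilities(t)
    return {"L": probs["20"], "R": probs["02"], "2": probs["11"]}


if __name__ == "__main__":
    for t in (0.5, 0.785, 0.85, 1.0):
        print(t, click_weights(t))
```

At t = 0.5 the weight of outcome 2 vanishes, which is the familiar two-photon interference effect at a balanced beamsplitter, while at t = 1 the two photons always leave through different outputs.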
The resulting network output satisfies multiple constraints due to the cyclic symmetry of the experiment (all the parties use the same value for the beamsplitter transmissivity t, cf. (S2)), as well as photon number conservation. For example, all outputs of the form (here χ represents any of L or R)
$$p_t(000) = 0, \quad p_t(00\chi) = 0 \quad \text{(too few photons would be detected)} \qquad (S12)$$
$$p_t(2\chi\chi) = 0, \quad p_t(22\chi) = 0 \quad \text{(too many photons would be detected)} \qquad (S13)$$
are null, due to the fact that there are initially 3 photons in the network, of which at most 2 can end up in the same photodetector.
The non-zero probabilities are, modulo the cyclic symmetry, in the form p(0χχ), p(02χ), p(0χ2), p(χχχ), and are summarised, in order, in the following.
In what follows, we take Φ = 0, as the range of values of t for which the distribution is proven to be nonlocal decreases when Φ ≠ 0 (that is, the following analysis can be performed for an arbitrary value of Φ, and the interval of values of t for which $p_t$ is nonlocal is maximised when Φ = 0). Also, note that $\Phi = \phi_A + \phi_B + \phi_C$ can be tuned locally by any of the parties.

II. NONLOCALITY OF THE NOISELESS DISTRIBUTION
To prove the nonlocality of the ideal noiseless distribution $p_t$ presented above, we take an approach inspired by the one presented in [26]. There, a quantum distribution is proposed, which is based on the same input state in the triangle network (we report it in our notation),
$$|\Psi\rangle = |\psi^+\rangle_{A_2B_1} \otimes |\psi^+\rangle_{B_2C_1} \otimes |\psi^+\rangle_{C_2A_1}, \qquad (S18)$$
and the following POVM on the two modes $X_2X_1$ of each party X = A, B, C (again, we use a notation that makes the comparison easier with the experiment proposed in the present manuscript),
$$\Pi'^{(0)}_t = |00\rangle\langle 00|, \quad \Pi'^{(L)}_t = |\chi_l\rangle\langle\chi_l|, \quad \Pi'^{(R)}_t = |\chi_r\rangle\langle\chi_r|, \quad \Pi'^{(2)}_t = |11\rangle\langle 11|,$$
where $|\chi_r\rangle = \sqrt{t}\,|01\rangle - \sqrt{1-t}\,|10\rangle$ and $|\chi_l\rangle = \sqrt{t}\,|10\rangle + \sqrt{1-t}\,|01\rangle$ (here we put all the phases $\phi_X$ to zero, as mentioned above). The output distribution of our experiment is not equivalent to that of [26], as our POVM consists, as described in Sec. I, of
$$\Pi^{(0)}_t = |00\rangle\langle 00|, \quad \Pi^{(L)}_t = |\chi_l\rangle\langle\chi_l| + 2t(1-t)\,|11\rangle\langle 11|, \quad \Pi^{(R)}_t = |\chi_r\rangle\langle\chi_r| + 2t(1-t)\,|11\rangle\langle 11|, \quad \Pi^{(2)}_t = (2t-1)^2\,|11\rangle\langle 11|.$$
Notice that both POVMs $\Pi$ and $\Pi'$ are a coarse graining of the measurement
$$\Pi''^{(0)}_t = |00\rangle\langle 00|, \quad \Pi''^{(L)}_t = |\chi_l\rangle\langle\chi_l|, \quad \Pi''^{(R)}_t = |\chi_r\rangle\langle\chi_r|, \quad \Pi''^{(L2)}_t = \Pi''^{(R2)}_t = 2t(1-t)\,|11\rangle\langle 11|, \quad \Pi''^{(2)}_t = (2t-1)^2\,|11\rangle\langle 11|.$$
The POVM $\Pi''$ is the one that would be obtained from the scheme described in the main text if the photodetectors were able to resolve photon numbers, and has thus six possible outputs (cf. (S7)-(S10)). Accordingly, it is possible to define distributions $p_t$, $p'_t$, $p''_t$, obtained from the state (S18) by applying (respectively) $\Pi_t$, $\Pi'_t$, $\Pi''_t$ at each party's modes $X_2X_1$, i.e.
$$p_t(abc) = \langle\Psi|\, \Pi^{(a)}_t \otimes \Pi^{(b)}_t \otimes \Pi^{(c)}_t \,|\Psi\rangle,$$
and similarly for $p'_t$ and $p''_t$.

Surprisingly, we prove that $p_t$, $p'_t$ and $p''_t$ have the same range of nonlocality for the parameter t. That is, for a fixed t, if one among $p_t$, $p'_t$, $p''_t$ is classically reproducible in the triangle network, then all of them are. At the same time, the infeasibility of one among $p_t$, $p'_t$, $p''_t$ implies the infeasibility of all of them. From the physical point of view, this means that the possibility of performing perfect number-resolving photodetection does not enhance the "nonlocality" of the output distribution of our ideal experiment, although it may improve its resistance to noise.

To prove the nonlocal equivalence (in the triangle network) of the three distributions $p_t$, $p'_t$, $p''_t$ we proceed as follows:
$$\text{feasibility}(p'_t) \;\Rightarrow\; \text{feasibility}(p''_t) \;\Rightarrow\; \text{feasibility}(p_t) \;\Rightarrow\; \text{feasibility}(p'_t), \qquad (S25)$$
where by "feasibility" we mean the feasibility of classically simulating the distribution with a local model, as from Eq. (2). The first two implications follow immediately, without assumptions on the input state $|\Psi\rangle$, from simple properties of the POVMs involved. Indeed:
• The POVM $\Pi''$ can be obtained as a fine-graining of $\Pi'$ via a probabilistic splitting of $\Pi'^{(2)}$ into three outcomes $\Pi''^{(L2)}$, $\Pi''^{(R2)}$, $\Pi''^{(2)}$, which is just a classical local post-processing of the original projector $|11\rangle\langle 11|$.
• The POVM $\Pi$ is a local coarse-graining of $\Pi''$ and thus $p_t$ is classically simulatable whenever $p''_t$ is.
The last implication requires more effort and we prove it in the following subsections. To do so, we identify constraints on local strategies simulating $p_t$ and show that these are the same as those needed to simulate $p'_t$, as from [26] (cf. following derivations and Paragraph II C 0 e).

A. Constraints on local models simulating pt
We start by assuming that there exists a classical model that simulates the output distribution $p_t$ of the experiment proposed in the main text, and we find the constraints that it has to respect. That is, we assume that $p_t$ (which is summarised in Sec. I) can indeed be written as
$$p_t(abc) = \int d\alpha\, d\beta\, d\gamma\; p_A(a|\beta,\gamma)\, p_B(b|\gamma,\alpha)\, p_C(c|\alpha,\beta).$$
Notice that the classical shared variables {α, β, γ} can be assumed to be real numbers in the [0, 1] interval, and all the randomness of the local statistical responses $p_X$ can be absorbed in the distribution of {α, β, γ}, meaning that without loss of generality the local response functions can be taken deterministic, i.e.,
$$p_A(a|\beta,\gamma) = \delta_{a,\,A(\beta,\gamma)},$$
where A(β, γ) is some deterministic response function, and similarly for B(γ, α) and C(α, β). Let X, Y and Z denote the sets of possible α, β and γ, respectively. Let us define, for each output i and each party P connected to the corresponding source,
$$X^B_i = \{\alpha \in X \,|\, \exists\, \gamma : B(\gamma,\alpha) = i\}, \qquad X^C_i = \{\alpha \in X \,|\, \exists\, \beta : C(\alpha,\beta) = i\}, \qquad (S28)$$
and analogously $Y^A_i$, $Y^C_i$, $Z^A_i$, $Z^B_i$. In short, the set $X^P_i$ is the set of α's for which party P can potentially obtain output i, and similarly for the $Y^P_i$ and $Z^P_i$ sets for β's and γ's, respectively. We coarse-grain the possible outcomes by grouping outcomes L and R as χ, which means that the possible outcomes are now a, b, c ∈ {0, χ, 2}. Then, according to Sec. I, the set of outcomes with nonzero probability in our setup is (up to permutations)
$$abc \in \{\chi\chi\chi,\ 0\chi\chi,\ 0\chi 2\}. \qquad (S29)$$
Observe that
• two 0's never appear at the same time, nor two 2's,
• 2 only appears together with exactly one χ and one 0.
These properties are simply due to the fact that the number of photons is conserved, and that at most two photons can end up in the same photodetector. Already from these observations we obtain some structure on the previously defined sets in three steps. We demonstrate the steps for the {X P i } i,P sets, but they can be done with the {Y P i } i,P and {Z P i } i,P similarly.
FIG. S1. The relation of the sets {X P i }i,P to each other.
1. $X^B_0 \cap X^C_0 = \emptyset$ and $X^B_2 \cap X^C_2 = \emptyset$
This is a direct consequence of the previous observation (Eq. (S29)). There cannot be 4 photons among two parties, or 0 photons in total for two parties.

2. $X^B_0 \cup X^C_0 = X$
Suppose, by contradiction, that there exists $\alpha^* \notin X^B_0 \cup X^C_0$. Then by definition
$$\forall \beta, \gamma: \quad B(\gamma, \alpha^*) \neq 0, \quad C(\alpha^*, \beta) \neq 0.$$
Observe that when α = α*, Alice must not answer a = 2, due to (S29). However, since she does not know the value of α, Alice must then never answer a = 2. A similar conclusion can be drawn for the other parties, due to the cyclic symmetry. This, however, leads to a contradiction, since parties can in general output 2, e.g. $p_t(a = 2) \neq 0$.

3. $X^P_0 \cap X^P_2 = \emptyset$
Suppose, by contradiction, that there exists $\alpha^* \in X^B_0 \cap X^B_2$, that is, there exist $\gamma_0$ and $\gamma_2$ such that $B(\gamma_0, \alpha^*) = 0$ and $B(\gamma_2, \alpha^*) = 2$. Charlie does not know γ, so if α = α*, he knows he must answer χ for any β, since that is the only symbol consistent with both 0 and 2. Thus we have that ∀β : C(α*, β) = χ.
Say Alice receives $\gamma = \gamma_2$. Alice does not know whether α = α* or not. Thus, her response must be one that is consistent with the scenario that α = α*. Because Charlie's response is χ, this implies that for any β she must answer a = 0, i.e.
$$\forall \beta: \quad A(\beta, \gamma_2) = 0.$$
This means, by definition, that $Y^A_0 = Y$. This implies, after doing steps 1 and 2 for the sets $\{Y^P_i\}_{i,P}$, that $Y^C_0 = \emptyset$. However, since $p_t(c = 0) \neq 0$, we arrive at a contradiction.

4. All sets $X^P_0$, $Y^P_0$, $Z^P_0$ have probability 1/2
The previous constraints II A 1-II A 3 on the sets $X^P_i$ can be summarized as in Fig. S1. We now give a partial quantitative assessment of the size of these sets. Note that by the definition of the sets $Y^A_0$ and $Z^A_0$ we have
$$\frac{1}{4} = p_t(a = 0) \le p(\beta \in Y^A_0,\ \gamma \in Z^A_0) = p(Y^A_0)\, p(Z^A_0), \qquad (S30)$$
where in the last step we used the statistical independence of the hidden variables. At the same time, by using the inequality $ab \le ((a + b)/2)^2$ and $p(Y^A_0) = 1 - p(Y^C_0)$ (steps 1 and 2), we have
$$p(Y^A_0)\, p(Z^A_0) \le \left(\frac{1 - p(Y^C_0) + p(Z^A_0)}{2}\right)^2.$$
Combining the two we see that $p(Y^C_0) \le p(Z^A_0)$. Repeating the same argument (cyclically) for the other parties we get
$$p(Y^C_0) \le p(Z^A_0) \le p(X^B_0) \le p(Y^C_0),$$
which implies that they are all equal. Using also (S30) it is clear that all sets with i = 0 are equally probable with probability 1/2, i.e.,
$$p(X^P_0) = p(Y^P_0) = p(Z^P_0) = \frac{1}{2}. \qquad (S31)$$
Equation (S31) combined with (S30) tells us that Alice, when receiving from $Y^A_0$ on one side and from $Z^A_0$ on the other, will deterministically output 0. The same holds for the other parties (Bob when receiving from $X^B_0$ and $Z^B_0$, and Charlie when receiving from $X^C_0$ and $Y^C_0$). This consideration, combined with the previous ones and the definition of the sets (S28), yields a constrained picture of all possible classical models that simulate the coarse graining of $p_t$ in the triangle network. This is illustrated in Fig. S2:
• Alice outputs 0 when receiving from $Y^A_0$ and $Z^A_0$.
• Alice outputs χ when receiving from $Y^A_0$ and $Z^B_0$, or when receiving from $Y^C_0$ and $Z^A_0$ (in both cases Alice cannot output a = 2 because of property II A 3).
• Alice outputs either χ or 2 when receiving from $Y^C_0$ and $Z^B_0$ (further structure can be given using the sets $Z^B_2$ and $X^B_2$).
Bob and Charlie follow similar strategies when cycling the indices.

B. Breaking up the coarse-graining
Now, the main idea is the following: if there exists a local model for $p_t(a, b, c)$ as from Fig. S2, then there should exist a distribution $q_t(i, j, k, s)$ representing the parties' collective response function (i, j, k = L, R) when the hidden variables α, β, γ come from one of the two sets $S_0$, $S_1$ defined below (s = 0, 1). We cannot directly derive $q_t(i, j, k, s)$ from $p_t(a, b, c)$; however, we can derive its marginals (see below). These marginals will be incompatible for some values of the transmissivity t. For these situations, thus, we can deduce that there does not exist a local model for $p_t(a, b, c)$. Additionally, the marginal constraints on $q_t(i, j, k, s)$ are the same as in [26] for the distribution $p'_t$, meaning that the classical feasibility of $p_t$ implies the classical feasibility of $p'_t$, as stated in (S25).
To start, consider the two sets Note that $S_0 \cap S_1 = \emptyset$ and the events $\chi\chi\chi$ can happen if and only if $(\alpha, \beta, \gamma) \in S_0 \cup S_1$. We define where the indices $i, j, k$ are each either $L$ or $R$, and the index $s$ is either 0 or 1. This is a probability distribution, since if $(\alpha, \beta, \gamma) \in S_0 \cup S_1$, then it must be either in $S_0$ or in $S_1$, and all parties must output either $L$ or $R$ (hence normalization and positivity are satisfied). Using the definition of conditional probability and the fact that the sets $S_0$ and $S_1$ have probability 1/8 (cf. II A 4 and Fig. S2), we see that Marginalizing over $s$ gives us the value of which is given by the parameters of the model, e.g. the transmissivity.

Next we would like to express other marginals, e.g. $q_t(i, s) \equiv \sum_{j,k} q_t(i, j, k, s)$, as a function of the target probability distribution. To do this, first note that if $b = 0$ then $\alpha \in X^B_0$, $\gamma \in Z^B_0$ and either $\beta \in Y^A_0$ or $\beta \in Y^C_0$. Note that next to a 0 output we can only have the other two parties answering $\{\chi, 2\}$ or $\{\chi, \chi\}$. For $q_t(i, s)$ we are, however, interested in the probabilities of $a = i$, therefore we break up the $\chi$ in Alice's response. In terms of probabilities this means where we used colors to simplify the reading, separating the sets in a local strategy on which Bob bases his choice (in blue) from those to which he has no access (in red). From now on we use a shorthand for expressions like this, indicating e.g.
Next, consider the sum where we force Alice to output $i$, but Bob and Charlie can output either 0 or $\chi$. In other words, we focus on the $\chi 0 \chi$, $\chi 0 2$, $\chi \chi 0$, $\chi 2 0$ outputs, breaking the coarse-graining $\chi \to L, R$ only in Alice's case. Define the quantity $D^A_i$ as A few manipulations show that where we first used (S35) (and a similar expression for $c = 0$), and then the fact that Alice does not have access to $\alpha$, so the probabilities stay the same under the swap of $X^B_0$ for $X^C_0$ and of $X^C_0$ for $X^B_0$. Finally we identified $S_0$ and $S_1$ in the relevant expressions. Hence, we can express the difference between $q_t(i, s = 0)$ and $q_t(i, s = 1)$ in terms of known quantities. We also know that the sum is Combining the two we get that

Testing $q_t(i, j, k, s)$ using linear programming

We sum up here the marginal properties (boxed equations in the previous section II B) of the distribution $q_t(i, j, k, s)$ found above. These properties are linear constraints on the vector $q_t(i, j, k, s)$, parametrized by the transmissivity $t$. A linear program can be implemented to verify whether a distribution $q_t(i, j, k, s)$ compatible with these marginals exists.
a. Constraint 0 (normalization). First of all, $q_t(i, j, k, s) \geq 0 \;\; \forall\, i, j, k, s$ and $\sum_{i,j,k,s} q_t(i, j, k, s) = 1$ (S42), that is, it truly represents a probability vector.
b. Constraint 1. Then and cyclic combinations (meaning only the number of $L$s and $R$s matters).
c. Constraint 2. This constraint is actually a consequence of Constraint 1, but we write it for completeness.
and cyclic combinations. This last constraint can be made explicit (cf. Sec. I).
e. Relation to Ref. [26] and equivalence between the two distributions. The constraints defining the linear program above can be translated into the constraints of a linear program found in Ref. [26], where the distribution of Eq. (S23) is considered (in [26] $t$ is identified as $u^2$). Specifically, both distributions are local if a solution $q_t(i, j, k, t)$ to the same linear program exists and can be generated via a local model (cf. [26]). This proves that the local feasibility of the two distributions is equivalent. At the same time, the existence of $q_t(i, j, k, t)$ is a necessary condition for the local feasibility of $p_t$. This means that when the linear program fails to find a solution, the nonlocality of $p_t$ is certified, while if a solution is found, this does not directly imply the locality of $p_t$.
The linear program resulting from the constraints above is infeasible for $t \in (0, 0.215)$ and $t \in (0.785, 1)$.
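To make the structure of such a feasibility test concrete, the sketch below shows how the 16 unknowns $q_t(i, j, k, s)$ can be flattened into a vector and how normalization and single-party marginals become linear rows acting on it. The function names and the index ordering are illustrative assumptions. The right-hand sides given by the boxed equations are not reproduced here, so this is only the bookkeeping part of the linear program, not the full test.

```python
from itertools import product

LABELS = ("L", "R")  # fine-grained single-click outcomes

def index(i, j, k, s):
    """Position of q_t(i, j, k, s) in a flat vector of length 16 (hypothetical ordering)."""
    return ((LABELS.index(i) * 2 + LABELS.index(j)) * 2 + LABELS.index(k)) * 2 + s

def normalization_row():
    """Row r with r . q = sum over all (i, j, k, s) of q_t(i, j, k, s)."""
    return [1.0] * 16

def alice_marginal_row(i, s):
    """Row r with r . q = q_t(i, s) = sum over j, k of q_t(i, j, k, s)."""
    row = [0.0] * 16
    for j, k in product(LABELS, repeat=2):
        row[index(i, j, k, s)] = 1.0
    return row

def dot(row, q):
    """Plain inner product between a constraint row and a candidate vector q."""
    return sum(r * x for r, x in zip(row, q))

# A uniform vector is used only to exercise the rows; it is not a solution of the program.
uniform = [1.0 / 16] * 16
```

In a full implementation, rows of this kind (one per marginal, with the right-hand sides evaluated at a given $t$) would be stacked as equality constraints, together with $q \geq 0$, and handed to any linear-programming solver with a zero objective, so that only feasibility is tested.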

III. NOISY OPTICAL REALISATION
As introduced in the main text, having proved the nonlocality of the idealized experiment, in this section we give the modelling details of the imperfections that can arise in the different elements of the optical network presented in Fig. 1 when it is realized experimentally. We focus on: a. the impurity of the generated single-photon entangled state ($Q$), b. the transmissivity of the optical channels of the network ($T$), and c. the efficiency of the final photodetectors ($\nu$).
Our results (see main text) indicate that the noise tolerance with respect to these parameters is of the order of a few percentage points, which makes the proposal demanding from the experimental point of view, but feasible in a table-top optical experiment with high-efficiency detectors.
a. Source imperfections. Firstly, we consider a realistic process for creating the single-photon entangled state $|\psi^+\rangle = (|01\rangle + |10\rangle)/\sqrt{2}$. This state is generated by sending a single photon onto a 50:50 beamsplitter. Typical sources herald single photons from two-photon states created in an SPDC process, by detecting one of the two photons [38,39].
An externally controlled laser pulses at high frequency on a $\chi^{(2)}$ nonlinear crystal. For each pulse, the crystal outputs a two-mode squeezed vacuum state $|\Psi\rangle \propto \sum_n q^n |nn\rangle$. Photodetection is then performed on one of the two modes. Conditioning on a detection allows one to isolate a very good approximation of the one-photon Fock state on the unmeasured mode [40]. The trade-off between the probability of heralding and the quality (fidelity to the target) of the heralded state is strongly conditioned by the photodetector efficiency and its ability to resolve photon number, as well as by the characteristics of the crystal and the laser power, which tune the value of $q$ [40]. Here we choose typical currently achievable values for the SPDC source, which we assume to have $q = 0.01$ and a pulse frequency of 10 MHz [41]. The heralding is modelled with currently available number-resolving photodetectors, which we assume to have 8-photon resolution achieved with an array of $M = 8$ single-photon detector pixels, each with efficiency $\eta = 70\%$, well within the range of present technologies [42,43]. Conditioning on the firing of a single pixel in the detector, the resulting state in the unmeasured mode can be approximated by where $Q \propto q$ is the ratio between the chance of a single pixel firing due to two photons hitting the detector and the chance of a single pixel firing due to a single photon, i.e.
Note that the probability of heralding is $q\eta$ and thus, for the three sources (of the experiment proposed in the main text) to be heralded at the same time, the corresponding total experimental repetition rate is approximately $q^3 \eta^3 \times 10\,\mathrm{MHz} \sim 1\,\mathrm{Hz}$ [44]. Considering the imperfect state (S49), propagated through a 50:50 beamsplitter, the resulting true source shared by each pair of parties in the triangle network is With the above-mentioned values of $q$, $\eta$, and $M$, one obtains $Q = 0.006875$. Notice that the same single-photon preparation could be done with simple, non-number-resolving (NNR) photodetection. In such a case the value of $Q$ (which, we recall, is the ratio between the chance of the detector clicking due to two photons and the chance of a click due to a single photon) would be where $\eta$ is the efficiency of the detectors. We see that in such a case $Q$ is bounded to be larger than $q$; for example, with the same values as above ($q = 0.01$, $\eta = 70\%$), one obtains $Q^{(\mathrm{NNR})} = 0.013$, essentially double what can be obtained with number-resolving detectors. This is not a huge limitation per se, as we can rescale $q$ to make $Q^{(\mathrm{NNR})}$ smaller. However, halving $q$ makes the total repetition rate of the experiment ($\propto q^3 \eta^3$) decrease by a factor of 8, i.e. by about one order of magnitude.
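The quoted figures can be cross-checked with a short calculation. The sketch below rests on an assumed pixel model that is not spelled out in the text: each photon lands on one of the $M$ pixels uniformly and independently, each pixel clicks with probability $\eta$ per photon, there are no dark counts, and a two-photon emission is $q$ times as likely as a one-photon emission. Under these assumptions the quoted values of $Q$, $Q^{(\mathrm{NNR})}$ and the repetition rate are recovered, but the closed forms here are a reconstruction and may differ from the authors' expressions. All function names are illustrative.

```python
def q_ratio_number_resolving(q, eta, M):
    """Two-photon vs one-photon probability of exactly one pixel firing (assumed pixel model)."""
    # Both photons on the same pixel (prob 1/M): the pixel fires unless both are missed.
    same_pixel = (1 / M) * (1 - (1 - eta) ** 2)
    # Photons on different pixels: exactly one of the two must be detected.
    different_pixels = (1 - 1 / M) * 2 * eta * (1 - eta)
    # A single photon fires exactly one pixel with probability eta.
    return q * (same_pixel + different_pixels) / eta

def q_ratio_nnr(q, eta):
    """Same ratio for a single non-number-resolving detector (click / no click)."""
    return q * (1 - (1 - eta) ** 2) / eta

def triple_heralding_rate(q, eta, pulse_rate_hz):
    """Rate at which three independent sources herald in the same pulse."""
    return (q * eta) ** 3 * pulse_rate_hz

Q_nr = q_ratio_number_resolving(q=0.01, eta=0.7, M=8)
Q_nnr = q_ratio_nnr(q=0.01, eta=0.7)
rate = triple_heralding_rate(q=0.01, eta=0.7, pulse_rate_hz=10e6)
rate_half_q = triple_heralding_rate(q=0.005, eta=0.7, pulse_rate_hz=10e6)
```

With these inputs the model gives `Q_nr` close to 0.006875, `Q_nnr` close to 0.013, and a triple-heralding rate of a few hertz, which drops by a factor of 8 when $q$ is halved.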
Finally, let us note that basing our proposal on the single-photon state $|\psi^+\rangle \propto |01\rangle + |10\rangle$ is crucial in our scenario. A unitarily equivalent state is the two-photon state $\propto |HV\rangle + |VH\rangle$, which encodes the information in the polarization degree of freedom. However, the creation of such a state from an SPDC source typically requires heralding on the six-photon term $|33\rangle$ of $\sum_n q^n |nn\rangle$ (and 4 photodetectors per source) [31]. This means that even in an ideal scenario in which all detectors have unit efficiency, the probability of heralding the correct state would be $\sim q^3$ per source and $q^9$ for the whole experiment with 3 sources, compared to $q^3$ for our single-photon proposal. For a 1% error in the source we chose $q = 0.01$, which translates into a difference of 12 orders of magnitude in the heralding rate.
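The 12 orders of magnitude follow directly from the two scalings quoted above. The snippet below is only an arithmetic check under the idealization of unit-efficiency detectors. The variable names are illustrative.

```python
from math import log10

q = 0.01  # value chosen in the text for a 1% source error

# Three sources, each heralded on its single-pair term: probability ~ q per source.
single_photon_scaling = q ** 3
# Three sources, each heralded on the |33> term: probability ~ q^3 per source.
polarization_scaling = (q ** 3) ** 3

orders_of_magnitude = log10(single_photon_scaling / polarization_scaling)
```

The ratio is $q^{-6}$, so `orders_of_magnitude` evaluates to 12 up to floating-point rounding.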
b. Losses in the channels. Secondly, loss might happen during the transmission along the channels that form the sides of the triangle network of Fig. 1, before the local POVM performed by the parties. We denote by $T$ the transmissivity of these optical channels. The resulting correction due to photon loss can be computed as where Kraus operators of the form act on each of the six modes $X_i$, and the sum is truncated to $n = 1, 2$ (given the support of the input state (S51)). In fact, as we work in the regime of low losses, we only keep the first-order terms in $1 - T$ in Eq. (S53).
c. Detectors. Finally, the photodetectors used at the vertices of the triangle (Fig. 1) do not resolve photon number, and are assumed to have a finite, high efficiency $\nu$, thus modelled, at first order in $1 - \nu$, as Notice that efficiencies close to 100% have been reached by modern photodetection systems [45-49].
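As an illustration of these two noise models, the sketch below builds the textbook Kraus operators of a pure-loss channel on a single mode truncated to at most two photons, and the click probability of a non-number-resolving detector of efficiency $\nu$. These are the standard forms and are an assumption here: the expressions used in the text may differ by convention and are further expanded to first order in $1 - T$ and $1 - \nu$. The function names are illustrative.

```python
from math import comb, sqrt

DIM = 3  # Fock levels n = 0, 1, 2 (truncation assumed from the support of the input state)

def loss_kraus(T, dim=DIM):
    """Kraus operators K_k (k photons lost) of a pure-loss channel, as nested lists."""
    ops = []
    for k in range(dim):
        K = [[0.0] * dim for _ in range(dim)]
        for n in range(k, dim):
            # K_k maps |n> to |n-k> with amplitude sqrt(C(n,k) T^(n-k) (1-T)^k).
            K[n - k][n] = sqrt(comb(n, k) * T ** (n - k) * (1 - T) ** k)
        ops.append(K)
    return ops

def completeness(ops, dim=DIM):
    """Return sum_k K_k^T K_k; the operators are real, so transpose equals dagger."""
    S = [[0.0] * dim for _ in range(dim)]
    for K in ops:
        for i in range(dim):
            for j in range(dim):
                S[i][j] += sum(K[m][i] * K[m][j] for m in range(dim))
    return S

def click_probability(nu, n):
    """Probability that a non-number-resolving detector of efficiency nu clicks on n photons."""
    return 1 - (1 - nu) ** n

identity_check = completeness(loss_kraus(0.95))
```

`identity_check` should be the identity matrix, which confirms that the truncated Kraus set is trace preserving on the three-level space. For two photons the click probability is $1 - (1-\nu)^2$, so its deviation from 1 only appears at second order in $1 - \nu$.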

IV. GENERALIZATION TO CHAINS OF N PARTIES
In this section, we sketch a generalization of the experiment presented in the main text (which is proposed in the triangle scenario) to a chain of $N$ parties in a circular network. For this case, we generalise the procedure carried out in Sec. II, which proves the existence of a range of transmissivities for which the network output is nonlocal.
The generalized experiment is described as follows: $N$ parties $A_i$ share a copy of the single-photon state $|\psi^+\rangle = (|01\rangle + |10\rangle)/\sqrt{2}$ for each pair of neighbouring parties $A_i A_{i+1}$ with $i = 1, \ldots, N$ (the network is circular and thus we identify $N + 1 \equiv 1$). Each party consequently receives two input modes, each containing at most one photon, and performs the same measurement described in the main text (6) and detailed in Sec. I, consisting of a local mixing of the modes with a beamsplitter of transmissivity $t$, followed by photodetection on both modes. All the parties choose the same value of $t$, and the photodetectors do not resolve the number of photons, thus being described by projective measurements onto the vacuum, $|0\rangle\langle 0|$, and onto its orthogonal complement, $\mathbb{1} - |0\rangle\langle 0|$. Consequently, the resulting output distribution is given by
$$p_t(a_1, \ldots, a_N) = \mathrm{Tr}\Big[\Big(\bigotimes_{i=1}^{N} \Pi_t^{a_i}\Big)\Big(\bigotimes_{i=1}^{N} \psi^+_{A_i^{(R)} A_{i+1}^{(L)}}\Big)\Big],$$
where the state $\psi^+ \equiv |\psi^+\rangle\langle\psi^+|$ is shared between the "right mode" of the $i$th party ($A_i^{(R)}$) and the "left mode" of the following one ($A_{i+1}^{(L)}$), and each party performs the POVM operationally described above, corresponding to $\Pi_t$ (6) (detailed in Eqs. (S7)-(S10)), on its two modes.
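The ideal ring experiment described above can be simulated directly by tracking Fock-state amplitudes. The sketch below is a minimal illustration under assumptions that are not taken from the text: a real beamsplitter convention ($a_L^\dagger \to \sqrt{t}\, b_L^\dagger + \sqrt{1-t}\, b_R^\dagger$, $a_R^\dagger \to -\sqrt{1-t}\, b_L^\dagger + \sqrt{t}\, b_R^\dagger$), outcome labels `"0"`, `"L"`, `"R"`, `"2"`, and hypothetical function names. The phase convention of Sec. I may differ, which would change the interference terms, so the numbers it produces should not be read as the authors' $p_t$.

```python
from itertools import product
from math import sqrt

def local_amplitudes(t):
    """Beamsplitter amplitudes: input occupation (n_L, n_R) -> {output occupation: amplitude}."""
    r, s = sqrt(t), sqrt(1 - t)
    two = sqrt(2 * t * (1 - t))
    return {
        (0, 0): {(0, 0): 1.0},
        (1, 0): {(1, 0): r, (0, 1): s},
        (0, 1): {(1, 0): -s, (0, 1): r},
        (1, 1): {(2, 0): -two, (1, 1): 2 * t - 1, (0, 2): two},
    }

# Non-number-resolving detectors: two photons in one output mode give a single click.
OUTCOME = {(0, 0): "0", (1, 0): "L", (2, 0): "L", (0, 1): "R", (0, 2): "R", (1, 1): "2"}

def ring_distribution(n_parties, t):
    """Outcome distribution for n_parties on a ring, one |psi+> per neighbouring pair."""
    amps = local_amplitudes(t)
    fock_amp = {}
    # config[k] = 1: photon of source k reaches the left input of party k+1;
    # config[k] = 0: it reaches the right input of party k.
    for config in product((0, 1), repeat=n_parties):
        inputs = [(config[k - 1], 1 - config[k]) for k in range(n_parties)]
        for outs in product(*(amps[i].items() for i in inputs)):
            key = tuple(o for o, _ in outs)
            amplitude = 2 ** (-n_parties / 2)
            for _, a in outs:
                amplitude *= a
            # Amplitudes of different configurations add coherently on the same Fock output.
            fock_amp[key] = fock_amp.get(key, 0.0) + amplitude
    dist = {}
    for key, amplitude in fock_amp.items():
        label = tuple(OUTCOME[o] for o in key)
        dist[label] = dist.get(label, 0.0) + amplitude ** 2
    return dist

triangle = ring_distribution(3, 0.3)
```

Only the all-clockwise and all-counterclockwise configurations lead to the same photon numbers at every party, so they are the only ones that interfere. In this model two neighbouring parties never both output `"0"`, and each party outputs `"0"` with probability 1/4.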
We now put constraints on any possible local strategy aiming at reproducing the statistics of $p_t$ in the circular network. That is, we assume that $p_t$ can be written as
$$p_t(a_1 \ldots a_N) = \int d\alpha_{12}\, d\alpha_{23} \cdots d\alpha_{N1}\; p_{A_1}(a_1 | \alpha_{N1} \alpha_{12})\, p_{A_2}(a_2 | \alpha_{12} \alpha_{23}) \cdots p_{A_N}(a_N | \alpha_{(N-1)N} \alpha_{N1}),$$
where $a_i$ is the output of party $A_i$, which is based on a local response to the hidden variables $\{\alpha_{(i-1)i}, \alpha_{i(i+1)}\}$ shared with its left and right neighbours. In the coarse-grained scenario, parties can output $0, \chi, 2$ as before ($\chi$ is the coarse graining of $\{L, R\}$, cf. Sec. II), representing the outcomes with 0, 1, or 2 photodetectors firing, respectively, at each party's station. Following Sec. II we define the equivalent of the sets (S28), accompanying the formal definitions with an intuitive notation and an explanation of the underlying local model; the sets are represented by arrows that intuitively suggest the direction of "classical photons" in a corresponding local hidden variable model. The following definitions are pictured in Figure S3. We have formally, for the set of sources $\alpha_{(k-1)k}$ between $A_{k-1}$ and $A_k$, This is the set allowing $A_k$ to output 0 for some of the hidden variables that come from the other side. That is, classical photons are not sent to $A_k$ from the left.
$$(\rightarrow k) := \{\alpha_{(k-1)k} \mid \exists\, \alpha_{k(k+1)} : A_k(\alpha_{(k-1)k}, \alpha_{k(k+1)}) = 2\} \qquad \text{(S60)}$$
This is the set allowing $A_k$ to output 2 for some of the hidden variables that come from the other side. That is, some classical photons are sent to $A_k$ from the left.

A. Constraints on the sets
We now derive, in this generalized $N$-party scenario, the constraints on any local model reproducing $p_t$, corresponding to those obtained for the triangle network (II A 1 to II A 4).
As depicted in Fig. S3 we have, firstly, because otherwise two neighbouring parties A k−1 , A k , would be allowed to output 0 at the same time, which is in contrast with the output of p t (the photon shared between two parties ends up in one of their detectors). Secondly meaning that, together, the two sets form the total set of sources α (k−1)k between A k−1 and A k . This is proven as a consequence of the fact that at least one between A k−1 and A k must be allowed to output 0 (otherwise there would This satisfies q t (i k , 0) − q t (i k , 1) = p t (a k = χ i , a k+1 = 0) − p t (a k−1 = 0, a k = χ i ) 2 N −3 . (S74) The proof of this equation is formalized as follows q t (i k , 0) − q t (i k , 1) The first equality above simply follows from the definition of q t (i k , s) for s = 0, 1. The second equality is obtained by noticing that all sets ( j ) and ( j ) have probability 1/2, and that the output a k does not depend on the source shared between A k+1 and A k+2 , nor it depends on the source shared between A k−2 and A k−1 . The third equality is obtained by tracing out the probability of N − 3 of the sets which were included in the previous lines. Finally the last inequality is implied by property (S68). The above constraints on q t coincide with the ones derived in the Appendix C of [26]. There, it is proven that it is always possible to choose the value of the transmissivity t such that no solution can be found for q t (i 1 , i 2 , . . . , i N , s) satisfying the linear constraints (S73) and (S74). Therefore for those values t the output p t of the experiment is proven to be nonlocal.