New physics with the lepton flavor violating decay $\tau\to 3\mu$

Lepton flavour violating (LFV) processes are a smoking gun signal of new physics (NP). If the semileptonic $B$ decay anomalies are indeed due to some NP, the underlying operators can potentially lead to LFV decays involving the second- and third-generation leptons, like $\tau\to 3\mu$. In this paper, we explore how far the nature of NP can be unravelled at next-generation $B$-factories like Belle-II, provided the decay $\tau\to 3\mu$ has been observed. We use four observables with which the NP operators may be differentiated at a high confidence level. The possible presence of multiple NP operators is also analysed with the Optimal Observable technique. While the analysis can be improved further if the final-state muon polarisations are measured, we present this work as a motivational tool for experimentalists, as well as a template for the analysis of similar processes.


Introduction
Lepton flavour is an accidental symmetry of the Standard Model (SM), and there are many extensions of the SM, like seesaw models, the supersymmetric SM, flavour-changing Z or scalars, leptoquarks, or left-right symmetric models, that can naturally break this symmetry. Even within the ambit of the SM, neutrino mixing provides a source of lepton flavour violation (LFV), but the rates are too small to be observed in the foreseeable future [1]. Thus, observation of any LFV decay is a smoking gun signal of New Physics (NP). In general, four types of LFV processes have been looked for: (i) leptonic decays (τ → 3e, τ → 3µ, µ → 3e, τ → 1e + 2µ, τ → 1µ + 2e), (ii) radiative decays (τ → eγ, τ → µγ, µ → eγ), (iii) semileptonic decays ($\ell_1 \to \ell_2 M$, where $M$ is some meson), and (iv) conversion in nuclei (like µ → e conversion). They are not all independent; e.g., a flavour-changing electromagnetic penguin can also give rise to leptonic LFV decays. Most of these decays, however, have very stringent limits [1], with the branching ratios (BRs) typically bounded at the level of $10^{-8}$ or even smaller.
The interest in such LFV decays has recently been rekindled by the observation that some of the semileptonic B decay modes show anomalous deviations from the SM expectations, which may possibly be explained by lepton flavour non-universality (LFNU) as well as LFV. As an example, let us refer the reader to a recent attempt in Refs. [2,3], where the authors have shown that both the $R_K$, $R_{K^*}$ and the $R(D)$, $R(D^*)$ anomalies can be explained satisfactorily with only two new operators, if the weak and mass bases of the charged leptons {µ, τ} are related by a field rotation [4]. The apparent excess in the LFV decay channel of the Higgs boson, h → µτ, as was once reported by the CMS Collaboration [5], coupled with such B decay anomalies, could lead to some well-motivated and fairly constrained models of LFV [6]. LFV operators with only leptonic fields can also be induced by renormalisation group (RG) running of semileptonic operators [7]. Thus, there is enough motivation to look seriously into such LFV channels; they might be observed at the Large Hadron Collider (LHC) [8], or at dedicated super-B factories like Belle-II. Implications of such LFV decays may also be found in Ref. [9].
In this paper, we will focus solely on the leptonic channel τ → 3µ. The channel has already been studied in detail in the literature; there are both model-independent [10-12] and partially model-dependent [13,14] studies. The Belle Collaboration has set an upper bound on the BR [15],
$$\mathrm{BR}(\tau \to 3\mu) < 2.1 \times 10^{-8} \tag{1}$$
at 90% confidence level (CL). One reason for looking at this particular channel is the possibility of a leptonic rotation in the {µ, τ} sector, as mentioned above, which invariably produces such LFV channels out of lepton flavour conserving operators in the weak basis. Another reason, of course, is the relative ease with which the final-state muons can be detected at both hadronic and e⁺e⁻ colliders.
Here we would like to push the studies on LFV in τ → 3µ further by asking and answering a few questions. Observation of even a single τ → 3µ event is a definite signal of NP. Assuming that one observes, possibly at a super-B factory like Belle-II, a few events of the LFV decay in question, will one be able to unearth the nature of the operators that can lead to such a decay? It has been shown [10,13] that there can be six independent LFV operators in the chiral basis that lead to τ → 3µ. If the final-state muon polarisations are not measured, all the operators are a priori equally probable, and the number of events alone will obviously not tell us anything about the presence or absence of any of these operators; it can only yield an estimate of the respective Wilson coefficients (WCs). So, are there observables which will help us differentiate between these operators? We will show that this is indeed possible, without using higher-order differential distributions as in Ref. [10], which may have very few events in each bin. At this point, we also note that some more operators can be generated through Fierz reordering, but they are not independent of the first six, and therefore we will not consider them any further.
The second question that we would like to ask is whether the existence of more than one such operator can be disentangled from the data. Here the answer will be only partially positive, unless, again, the muon polarisations are measured. If one has a sizeable number of events, and can measure the muon polarisations too, one may in principle have further observables, but we would like to be conservative. In any case, as we will show, one does not expect more than about 70 events with an integrated luminosity $L_{\rm int} = 50~{\rm ab}^{-1}$ at Belle-II.
We will use the method of Optimal Observables (OO), which has already been used in different areas of particle physics [16,17], and in particular in flavour physics [18,19]. This method quantifies the significance level ("how many sigma", in standard parlance) at which one point in the allowed parameter space can be separated from another. This is the only way to approach the question of model differentiation before the arrival of the data. Once one has the data, other methods, like the unbinned multivariate maximum likelihood, may be employed.
A related question is how many events are needed for a successful differentiation among models, where a model is specified by its operator structure and WCs. As expected, if the number of events is too small, it will be harder to differentiate among the various models; in other words, the significance level will be lower. We will quantify this statement subsequently.
The paper is organised as follows. In Section 2, we list all the possible NP operators that can give rise to the τ → 3µ decay, and the observables that we deal with are discussed in Section 3. In Section 4, we show the differentiation among models with only one NP operator. Section 5 deals with models with two such NP operators, and we discuss how well the presence of the second operator can be inferred from various observables. Section 6 summarises and concludes the paper.

The New Physics Operators
For this section, we will follow the notation and conventions of Ref. [13]. The most general LFV Lagrangian can be written as and we will denote the operator accompanying $g^X_{IJ}$ ($X = S, V, T$ and $I, J = L, R$) as $O^X_{IJ}$. $\Lambda$ is the cutoff scale, which we set at 5 TeV for our analysis. We separate the operators into three major classes: S (operators of the form $O^S_{IJ}$), V (the $O^V_{IJ}$ operators) and T (the tensor operators $O^T_{IJ}$). Thus, the effective Lagrangian is of the form In the above-mentioned basis, not all ten operators are independent; Fierz transformations relate the two tensor operators to the rest, and the pairs ($O^S_{LL}$, $O^V_{LL}$) and ($O^S_{RR}$, $O^V_{RR}$) are also related [10]. Thus, four scalar and two vector operators are enough to span the operator basis. However, we keep all of them for the time being, as the mediator that has been integrated out may give rise to operators that are linear combinations of the six independent ones, like the tensor operators that can be generated from some hypothetical spin-2 mediators.
Writing the BR in terms of the new WCs [13], one may easily show that the decay τ → 3µ has maximal sensitivity to V or T operators. For example, the present bound $\mathrm{BR}(\tau \to 3\mu) < 2.1\times 10^{-8}$ translates to $|g^S_{RL}| \approx 1$, while $|g^T_{RL}|, |g^V_{RL}| \sim \mathcal{O}(0.1)$. Thus, for a given number of events, the reach for the V or T WCs is better than that for the S ones.
The left-chiral fields being SU(2) doublets, one can also get a neutrino-antineutrino pair out of the operators $O^S_{RR}$ and $O^V_{RR}$, which technically gives an extra contribution to the SM τ decay channel $\tau \to \mu \nu_\mu \nu_\tau$. However, the couplings will turn out to be so constrained as not to affect this channel to any significant extent. Similar LFV operators for µ → 3e may affect the extraction of the Fermi coupling $G_F$ from the muon lifetime in a measurable way.
One may try to look for τ → µγ by contracting a pair of muons into a loop and attaching to it a photon with momentum $q$. For the scalar operators, the contribution vanishes in the $q^2 = 0$ limit, and for the vector operators, it amounts to a renormalisation of the charge and not of the transition magnetic moment.
To begin with, we will consider the presence of one operator at a time. This generates six independent models spanning the S and V classes. Next, we will consider the presence of two operators at a time, which include well-motivated combinations like and similarly for the V class. Our goal will be to determine whether or not these two-coupling scenarios can be differentiated from those involving only one coupling at a time.

Observables
In this section, we define the observables used in our analysis to differentiate the effects of different NP operators. As an example, we consider the S class of operators, $O^S_{IJ}$, taken from Eq. (3), and the decay $\tau^- \to \mu^- \mu^+ \mu^-$. The double differential decay rate for the antimuon is given, after integrating out the phase space of the two muons, by where we take the muons to be massless, and use the notation Here, $T_\tau = 1/\Gamma$ is the lifetime of the τ lepton, $x = 2E_{\mu^+}/m_\tau$ is the reduced energy of the antimuon, and θ is the angle between the polarisation of the τ and the momentum of the antimuon, following the convention of Ref. [11]. For further discussion, let us define Thus The number of events gives information only on the combination $g_1 - \frac{1}{4} g_2$.
Another observable is the observed integrated forward-backward asymmetry $A_{FB}$, defined as with where $\sigma_{\rm Prod}$ is the τ production cross-section, $L_{\rm int}$ is the integrated luminosity, and is the combined detection efficiency in the τ → 3µ channel.
We also define the x-dependent asymmetry, normalised to the total decay width, as where $N = N_F + N_B$ gives the total number of signal events.
Instead of the antimuon, one can play an identical game with one of the same-sign muons (i.e., one with the same sign of charge as the decaying τ), say the more energetic of the two. Let us define where $y = 2E_\mu/m_\tau$ is the reduced energy of the more energetic same-sign muon, and α is the angle between its direction and the polarisation of the τ lepton.
In an analogous way to Eq. (11), one can define As will be shown later, the integrated asymmetries of the antimuon and of the like-sign muon are useful to discriminate between the subtypes of operators within a particular class, say S or V.
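To make the kinematic variables concrete, here is a minimal Python sketch, written under our own assumptions rather than taken from the analysis: the three muon momenta are already given in the τ rest frame, the muons are massless, and the τ polarisation is a known unit vector. It returns $(x, \cos\theta)$ for the antimuon and $(y, \cos\alpha)$ for the more energetic same-sign muon. The function names are hypothetical, and the τ mass (1.77686 GeV) is the PDG value used only for illustration.

```python
import math

M_TAU = 1.77686  # tau mass in GeV (PDG value); muons are treated as massless


def _norm(p):
    """Length of a 3-vector."""
    return math.sqrt(sum(c * c for c in p))


def reduced_energy_and_cosine(p, pol, m_tau=M_TAU):
    """Return (2E/m_tau, cos of the angle to pol) for a massless muon.

    p is the muon 3-momentum in the tau rest frame; pol is a unit vector
    along the tau polarisation.
    """
    energy = _norm(p)  # massless muon: E = |p|
    cos_angle = sum(a * b for a, b in zip(p, pol)) / energy
    return 2.0 * energy / m_tau, cos_angle


def decay_variables(p_antimuon, p_muon_a, p_muon_b, pol, m_tau=M_TAU):
    """Return (x, cos_theta, y, cos_alpha) for tau- -> mu- mu+ mu-."""
    x, cos_theta = reduced_energy_and_cosine(p_antimuon, pol, m_tau)
    # Keep only the more energetic of the two same-sign muons.
    leading = max(p_muon_a, p_muon_b, key=_norm)
    y, cos_alpha = reduced_energy_and_cosine(leading, pol, m_tau)
    return x, cos_theta, y, cos_alpha
```

For example, an antimuon carrying the maximal energy $m_\tau/2$ along the polarisation, recoiling against two collinear muons of energy $m_\tau/4$ each, gives $x = 1$, $\cos\theta = 1$, $y = 1/2$ and $\cos\alpha = -1$.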
Similarly, for the V class of models, one obtains, in an analogous way to Eq. (5), The corresponding BR is where, From Eqs. (5), (8), (15), and (16), one finds that V-type operators generate more events than S-type operators if their WCs are of similar magnitude. The number of events as well as the angular distribution depend on the model subtype. We refer the reader to Fig. 1, which shows this explicitly.
Figure 1: The horizontal line shows the present limit. The results for $g^X_{LR}$ are identical to those for $g^X_{RL}$, and the results for $g^X_{LL}$ are identical to those for $g^X_{RR}$.
For the same-sign muon, one gets with In an analogous way to Eqs. (11) and (14), one can define $A_{FB}(x)$ and $A_{FB}(y)$.
While we will not discuss the T-type operators separately, the double differential decay distribution is given by, where, Thus

Analysis
In this section, we discuss the current and future sensitivities of the observables $B_\tau$ and the two integrated forward-backward asymmetries (of the antimuon and of the like-sign muon) to the S, V, and T operators. The next subsection deals with the simplified case where only one operator is present at a time. While this is instructive and sheds a lot of light on the differentiating power of the observables, a more realistic scenario might involve more than one operator. Thus, in the next Section, we discuss the cases where two operators are simultaneously present, and try to see whether those two-operator models can be separated from the single-operator ones.

One Operator Models
Let us assume, to start with, that only one of the ten possible operators shown in Eq. (2) is present, notwithstanding the fact that not all of them are mutually independent, some being related to the others through Fierz rearrangement. In Fig. 1, we show how the branching ratio $B_\tau$ depends on the WCs $g^{S,V,T}_{RL}$ and $g^{S,V}_{RR}$. Identical plots are obtained if one replaces RL with LR, and RR with LL. This R ↔ L symmetry holds for all subsequent observables and their distributions, which halves the number of independent cases worth discussing.
Given the combination IJ, if theory tells us the approximate magnitude of the WC $g^X_{IJ}$, then even with the number of events as the sole observable, one can almost immediately differentiate the X = S case from X = V or T. With higher statistics, even a differentiation between V and T may be possible. The present limit on $B_\tau$ translates to $|g^T_{RL}| \le 0.13$, $|g^V_{RL}| \le 0.20$, $|g^S_{RL}| \le 0.80$, $|g^V_{RR}| \le 0.28$, and $|g^S_{RR}| \le 0.57$. However, as we do not have any a priori knowledge of the magnitudes of the WCs, we have to look for other observables and use the number of events as a normalisation. In other words, we will assume that the total number of events has been given to the community by the experimentalists, and see how much extra information we can extract.
In Fig. 2, we show how the differential rate $dB_\tau/dx$ varies with the muon energy variable x for a fixed value of the WC, set at 0.1. With the normalisation included, the area under the curve over a given x-bin gives the number of events in that bin. Note that, due to a possible paucity of events, one may have only a few bins, 2 or 3, before the data become too sparse to have any statistical significance. Thus, the continuous distribution shown in Fig. 2 is an idealised scenario. Even so, we find that the number of events will be markedly different for different classes of operators if the WCs are of the same order, which is very much along the expected lines. On the other hand, the asymmetries $A_{FB}(x)$ or $A_{FB}(y)$ must show an identical pattern for all operators, S, V, or T, with a fixed chirality structure, because the overall normalisation cancels in the ratio.
The next task is to differentiate among the various chiral subclasses of a particular class of model. For illustration, we take the S class of models, and consider the presence of one S-type operator at a time. As the sensitivities are higher for the V and T classes, whatever results one has for S will only be more pronounced for the other classes. At the same time, if the underlying In the single-coupling scheme, we consider four different models, depending upon which operator contributes, and denote them as If only one operator contributes, $A_{FB}(x)$ becomes a function of x only and does not depend on the magnitude of the WCs: The integrated asymmetry $A_{FB}(i)$ for the i-th model can be obtained by integrating over $x \in [0, 1]$, and the values are There is a zero crossing only for models 2 and 3, at $x = 1/2$.
Similarly, for the forward-backward asymmetry $A_{FB}(y)$, we find While all of them show a zero crossing, for the last two models the crossing occurs almost at the end of the kinematic range, at $y = 7/8$. The integrated asymmetries are (for $y \in [0, 1]$) The $A_{FB}(x)$ distributions for the different models are shown in Figs. 3a and 3b; similarly, the $A_{FB}(y)$ distributions are shown in Figs. 3c and 3d. In these figures, every theoretical line has broadened out into a thick band, the bands often overlapping with one another. This happens because the number of events is limited. For every x (y), the error margin in $A_{FB}(x)$ is approximately given by where $\delta N_F(x)$ and $\delta N_B(x)$ are the statistical errors in the number of events in the forward and backward directions, respectively, and $\delta N = \sqrt{N}$.
The expression for the error in $A_{FB}(y)$ is analogous. We have not considered the correlation between $N_F(x)$ and $N_B(x)$; depending on the sign of the correlation, the expression can be an overestimate or an underestimate, but as we do not have any a priori knowledge of the distribution, it is better to assume zero correlation. The bands in Fig. 3 indicate the 1σ error margins. Clearly, the resolving power is much weaker with 20 events than with 50 events.
Because of the probable paucity of events, the asymmetries may be measured only with a limited number of bins. But even with two bins, low-x (0 < x < 0.5) and high-x (0.5 < x < 1), one should be able to differentiate between competing models.
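The two-bin measurement and its statistical error can be sketched as follows. This is a minimal Python illustration under stated assumptions: "forward" ("backward") means cos θ > 0 (< 0); each bin asymmetry is normalised to the total number of events N, as for the x-dependent asymmetry defined earlier, so the bin asymmetries add up to the integrated one; and the error uses first-order propagation with Poisson errors $\sqrt{N_F}$, $\sqrt{N_B}$ and $\delta N = \sqrt{N}$ treated as uncorrelated, mirroring the zero-correlation choice above. The exact error formula behind Fig. 3 may differ, and the function name is hypothetical.

```python
import math


def binned_asymmetries(events, edges=(0.0, 0.5, 1.0)):
    """Forward-backward asymmetry per x-bin, normalised to the total event count.

    events: iterable of (x, cos_theta) pairs, one per reconstructed decay.
    Returns a list of (asymmetry, statistical_error), one entry per bin.
    """
    events = list(events)
    n_total = len(events)
    n_bins = len(edges) - 1
    results = []
    for k in range(n_bins):
        lo, hi = edges[k], edges[k + 1]
        last = k == n_bins - 1  # the last bin also includes its upper edge
        cosines = [c for x, c in events if lo <= x < hi or (last and x == hi)]
        n_f = sum(1 for c in cosines if c > 0)  # forward: cos(theta) > 0
        n_b = sum(1 for c in cosines if c < 0)  # backward: cos(theta) < 0
        asym = (n_f - n_b) / n_total
        # dA^2 = (dN_F^2 + dN_B^2) / N^2 + ((N_F - N_B) / N^2)^2 * dN^2,
        # with dN_F^2 = N_F, dN_B^2 = N_B, dN^2 = N and no correlations.
        var = (n_f + n_b) / n_total**2 + ((n_f - n_b) / n_total**2) ** 2 * n_total
        results.append((asym, math.sqrt(var)))
    return results
```

With 20 toy events split as 6 forward and 4 backward at low x, and 3 forward and 7 backward at high x, this gives $0.10 \pm 0.16$ and $-0.20 \pm 0.16$, which illustrates how wide the bands are at such statistics.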
The existing bound on τ → 3µ comes from the analysis of 782 fb$^{-1}$ of data from the Belle Collaboration [15], and 468 fb$^{-1}$ of data from the BaBar Collaboration [20]. With a production cross-section of 0.919 nb for τ⁺τ⁻ pairs, one gets about 720 million such pairs at Belle and 430 million at BaBar. For 50 ab$^{-1}$ of integrated luminosity at Belle-II, one expects $N_P = 4.6 \times 10^{10}$ τ⁺τ⁻ pairs. With a detection efficiency of 7.6% [15], and using the present bound given in Eq. (1), the maximum number of such events is about 73. For our discussion, we will use two scenarios: one with N = 20 and the other with N = 50. Note that the errors are only statistical in nature. There may be other uncertainties, like those in fixing the direction of the τ polarisation, which will widen the bands, but that effect is expected to be small given the τ detection capabilities of Belle-II.
If the antimuon (opposite-sign) asymmetry $A_{FB}$ turns out to be positive (negative), the viable models are 1 or 3 (2 or 4). Similarly, the like-sign-muon asymmetry is positive (negative) for models 2 and 4 (1 and 3). Thus, measuring only the signs of these asymmetries leaves us with a twofold ambiguity. However, x(y)-dependent asymmetry measurements can resolve it; S- and V-type models behave identically in this respect. It is enough to measure the asymmetry in two bins: a low-x(y) bin for $0 \le x(y) < 1/2$ and a high-x(y) bin for $1/2 \le x(y) \le 1$. If these measurements were precise enough, they would suffice not only to pinpoint the model but also to explore whether more than one operator is contributing¹. Unfortunately, with a limited number of events, the measurements cannot be that precise.
Our results for $dB_\tau/dx$ and $dB_\tau/dy$ in the single-coupling schemes are shown in Fig. 4. The lines broaden into bands if we take the errors and uncertainties into account. Such broadening will, in all probability, make the lines indistinguishable from one another. However, all these models can be separated from each other through the asymmetry measurements, particularly in the high-x(y) bin.
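The yield arithmetic in this paragraph can be reproduced with a few lines of Python. The sketch assumes, as the quoted numbers imply, that the expected signal is $N = N_P\,\epsilon\,B_\tau$ with $N_P = \sigma_{\tau\tau} L_{\rm int}$, i.e., with no extra factor of 2 for the two τ's in each pair; the function and constant names are ours.

```python
SIGMA_TAUTAU_NB = 0.919   # e+e- -> tau+tau- cross-section in nb (value quoted in the text)
NB_INV_PER_FB_INV = 1e6   # 1 fb^-1 = 10^6 nb^-1
EFFICIENCY = 0.076        # combined tau -> 3 mu detection efficiency [15]


def tau_pairs(lumi_fb_inv, sigma_nb=SIGMA_TAUTAU_NB):
    """Number of produced tau+tau- pairs for an integrated luminosity in fb^-1."""
    return sigma_nb * lumi_fb_inv * NB_INV_PER_FB_INV


def expected_events(lumi_fb_inv, branching_ratio, efficiency=EFFICIENCY):
    """Signal events N = N_P * efficiency * BR, matching the normalisation in the text."""
    return tau_pairs(lumi_fb_inv) * efficiency * branching_ratio


def branching_ratio_for(n_events, lumi_fb_inv, efficiency=EFFICIENCY):
    """Invert the relation above: the BR needed to produce n_events."""
    return n_events / (tau_pairs(lumi_fb_inv) * efficiency)


belle_ii_lumi = 50_000.0  # 50 ab^-1 expressed in fb^-1
print(f"Belle pairs:    {tau_pairs(782):.3g}")
print(f"BaBar pairs:    {tau_pairs(468):.3g}")
print(f"Belle-II pairs: {tau_pairs(belle_ii_lumi):.3g}")
print(f"Max events at BR = 2.1e-8: {expected_events(belle_ii_lumi, 2.1e-8):.1f}")
print(f"BR for 50 events: {branching_ratio_for(50, belle_ii_lumi):.3g}")
```

The same relation gives the N = 50 benchmark used later, $B_\tau \approx 1.43 \times 10^{-8}$, and a $\sqrt{N}$ band of roughly 43 to 57 events around it.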
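The two-bin logic for the antimuon asymmetry can be written down explicitly. The sketch below relies on only two statements made in the text: the integrated antimuon asymmetry is positive for models 1 and 3 and negative for models 2 and 4, and only models 2 and 3 cross zero, exactly at x = 1/2. With the bins split at x = 1/2 and both bin asymmetries normalised to the same total N, so that they add up to the integrated asymmetry, the pair of signs fixes the model. Statistical errors are ignored, so this is an idealisation; the function name is hypothetical.

```python
def classify_single_scalar_model(a_low, a_high):
    """Identify the single-operator S-type model (1-4) from binned antimuon asymmetries.

    a_low:  asymmetry in the bin 0 <= x < 1/2
    a_high: asymmetry in the bin 1/2 <= x <= 1
    Both are normalised to the total number of events, so their sum is the
    integrated asymmetry. Statistical errors are ignored in this idealised sketch.
    """
    integrated_positive = (a_low + a_high) > 0   # models 1, 3 versus 2, 4
    same_sign = (a_low > 0) == (a_high > 0)      # no zero crossing: models 1, 4
    if integrated_positive:
        return 1 if same_sign else 3
    return 4 if same_sign else 2
```

In practice each bin carries an error of the size estimated earlier, so a sign is only meaningful when the corresponding bin asymmetry is well away from zero.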

Two Operator Models
Having established that, given enough events (∼50), it is straightforward to differentiate between several one-operator models, we ask the next question: what if the data are not compatible with any of them? Note that for the single-operator scheme to hold, all the observables, and not only a few of them, have to be in the right ballpark specified by that model. However, Occam's razor alone is not reason enough to rule out a double-operator scheme. We will, as before, confine ourselves to the S class of models, and consider the cases where two WCs are nonzero at a time. As we have shown, this will be the case if the underlying theory forces the muon current to be pure S or P². Thus, the question we ask is: if the new physics is described by two τ → 3µ operators, at what confidence level can we differentiate that from the cases where only one of them is present? Note that the number of events will act as the tightest constraint on the parameter space. We will try to differentiate these models, hereafter called O2 for two effective operators, from that with only one operator, which we call the 'seed' model. For example, we consider the following O2 models: where the first operator is treated as the seed.
Models B and D are different because they have different seeds. The seed models are chosen so as to have a positive $A_{FB}$ for the opposite-sign muon; the negative-$A_{FB}$ models have a corresponding relationship, which can be obtained by flipping L and R, L ↔ R³. Let us mention here that the confidence-interval contours depend on the type of seed operators being considered.
For this part of the analysis, we use the Optimal Observable (OO) technique. For a detailed discussion of this technique, we refer the reader to Refs. [16,17]. In the context of B decays, this method has been applied in Refs. [18,19]. The essential point of the OO technique is that it gives the optimal set of observables (which are in general functions of experimental observables) with which two points in the parameter space of different models can be differentiated with maximum efficiency. In other words, it gives the maximum possible separation, in terms of confidence level, between two points in the parameter space as a function of experimental observables. In practice, systematic errors will reduce the attainable separation.
As has been shown in Refs. [18,19], this method is all the more useful when one does not have any experimental data; in the presence of data, one can do a maximum likelihood analysis. This also means that not all the systematic uncertainties are taken into account. Thus, OO acts more as a motivational tool to the experimentalists than as an instrument for detailed quantitative theoretical studies.
Even with only the S class of operators, the parameter space of WCs is four-dimensional. A complete analysis is not only cumbersome but also of very little help in a real-life scenario, where the number of events will definitely be below 100, and therefore a fine scan of the parameter space, with a two-dimensional binning in x(y) and cos θ (cos α), will have so few events per bin as to make the analysis meaningless. The only constraint on the WCs comes from the non-observability of the decay.
In the OO technique, one writes any observable O, depending on a variable φ, as
$$O(\phi) = \sum_i C_i\, f_i(\phi), \tag{32}$$
which can be generalised to a set of variables denoted by φ. Here, all the $f_i$ are independent functions, and the $C_i$ are constants. The major goal of this technique is to extract the $C_i$. In our case, the $C_i$ will be functions of the WCs defined earlier. Our analysis can be done by defining a quantity analogous to χ², such as
$$\chi^2 = \sum_{i,j} \left(C_i - C_i^0\right) \left(C_j - C_j^0\right) V^{-1}_{ij}. \tag{33}$$
The $C_i^0$ are called the seed values, which can be considered as model inputs. The covariance matrix $V_{ij}$ is defined as
$$V_{ij} = \frac{\sigma_T}{N}\left(M^{-1}\right)_{ij}, \qquad M_{ij} = \int \frac{f_i(\phi)\, f_j(\phi)}{O(\phi)}\, d\phi,$$
with O(φ) evaluated at the seed values. In the above expression of $V_{ij}$, $\sigma_T = \int O(\phi)\, d\phi$ and $N = \sigma_{\rm Prod} L_{\rm int} B_\tau$, as defined earlier. For a specific model, χ² gives the confidence-level separation between the seed value $C_i^0$ (seed model) and the model under consideration, parametrised by $C_i$.
Looking at Eq. (33), it is clear that the shape of the fixed-χ² hypersurface depends on $V^{-1}_{ij}$, and its centroid (where $\chi^2 = \chi^2_{\min} = 0$) changes with the seed values. These fixed-χ² surfaces essentially determine the separation between models. Thus, the separation between any two models 1 and 2, with the seed at 1, will in general not be equal to the separation when the seed is at 2. This is the reason for treating Models B and D separately in Eqs.
We show our results for the S class of models; the V class of models will show identical results. The observables that we use are $A_{FB}(x)$, $A_{FB}(y)$ (both defined in the previous section), as well as $dB_\tau/dx$ and $dB_\tau/dy$, the expressions for which can be obtained from Eqs. (5) and (13): From Eq. (11), one can write For 50 events, $B_\tau = 1.43 \times 10^{-8}$. Similarly, Before we show our results for all six models, let us mention a few important points here.
1. The determination of χ² involves an integration over the variable φ of Eq. (32). If the observable for the seed model vanishes at any value of φ within the region of integration, the integration diverges, since the seed observable appears in the denominator of the integrand. Thus, one has to cut off such badly behaved regions. For example, if the observable for the seed model vanishes at the end points, say a and b, one has to perform the integral between $a + \epsilon$ and $b - \epsilon$, where ε is taken to be small enough not to affect the observable (like, say, the number of events). More concrete examples are given below.
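A minimal numerical sketch of the χ² construction is given below, assuming the standard OO formulae: $O(\phi) = \sum_i C_i f_i(\phi)$, $V^{-1}_{ij} = N M_{ij}/\sigma_T$, $M_{ij} = \int f_i f_j / O\, d\phi$ and $\sigma_T = \int O\, d\phi$, with O evaluated at the seed values. The basis functions used here are hypothetical placeholders, since the actual $f_1$ and $f_2$ of this analysis are not reproduced, and the integrals use a simple midpoint rule on [0, 1].

```python
def oo_chi2(coeffs, seed, basis, n_events, a=0.0, b=1.0, n_steps=2000):
    """Optimal-observable chi^2 separating the model `coeffs` from the seed model `seed`.

    O(phi) = sum_i C_i f_i(phi). The covariance is built from the seed observable:
    V^{-1}_ij = N * M_ij / sigma_T, M_ij = int f_i f_j / O_seed dphi, sigma_T = int O_seed dphi.
    The seed observable must stay positive on [a, b].
    """
    h = (b - a) / n_steps
    grid = [a + (k + 0.5) * h for k in range(n_steps)]  # midpoint rule

    def seed_observable(phi):
        return sum(c * f(phi) for c, f in zip(seed, basis))

    sigma_t = sum(seed_observable(p) for p in grid) * h
    m = [[sum(fi(p) * fj(p) / seed_observable(p) for p in grid) * h for fj in basis]
         for fi in basis]
    d = [c - c0 for c, c0 in zip(coeffs, seed)]
    n = len(d)
    return n_events / sigma_t * sum(d[i] * m[i][j] * d[j] for i in range(n) for j in range(n))


# Hypothetical two-function basis and coefficients, chosen only for illustration.
basis = [lambda x: 1.0, lambda x: x]
model_1 = [1.0, 1.0]
model_2 = [1.2, 0.4]
print(oo_chi2(model_2, model_1, basis, n_events=50))  # seed at model 1
print(oo_chi2(model_1, model_2, basis, n_events=50))  # seed at model 2: generally different
```

The two printed values differ (about 0.79 and 0.96 for these toy inputs) because $M_{ij}$ is built from the seed observable, which is why the separation depends on which model is taken as the seed. As a sanity check, with a single constant basis function the formula reduces to $\chi^2 = N\delta^2$ for a fractional change δ in the rate, i.e., 1σ when the yield shifts by $\sqrt{N}$, the usual counting result.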
2. One may ask why we do not use a two-variable analysis with the double differential decay rate as the observable. This would certainly have been useful, and more powerful as an analytical tool, if we could have a large number of events, so that even the two-dimensional bins contain enough events. With a small number of events, such an analysis would not give much useful information.
All our observables depend on only two functions, $f_1$ and $f_2$, with the argument being x for the opposite-sign muon and y for the more energetic like-sign muon. Depending on the observable, the combinations $C_1$ and $C_2$ are as follows.

Observable: $A_{FB}(x)$
We show the coefficients $C_1$ and $C_2$ in Table 1, and Fig. 5 displays our results for Models A-F. As the plots are not self-explanatory, let us specify clearly what they mean. The diagonal band with negative slope in each of the plots represents the allowed region in the parameter space of the various O2 models. Only the two relevant WCs are taken to be nonzero, the others being fixed at zero. Once the experimentalists obtain a certain number of events, this will specify a line in the two-dimensional parameter space on which the allowed models, each specified by some WCs, must lie. The exact position of the line will depend on the model one chooses, but the analysis must take into account the constraint imposed by this line⁴. The uncertainties in the data will broaden the line into a band, whose width will ultimately depend on the number of events as well as on the detector parameters. As a very rough guess, we take $\sqrt{N}$ to be the width of the line for N events. The plots are drawn for N = 50; thus, the band includes all the points for which the number of events lies approximately between 43 and 57. The separation contours are drawn on these bands only. We expect the bands to be narrower in actual experiments.
Let us consider Fig. 5a. This takes $|g^S_{RL}|^2 = 0.435$ as the seed value. The plot tells us that this one-coupling model can be differentiated from the one with $|g^S_{RL}|^2 = 0.2$ and $|g^S_{LL}|^2 = 0.1$ at more than 3σ if we have approximately 50 events and use $A_{FB}(x)$ as our observable. Similarly, the model with $|g^S_{RL}|^2 = 0.35$ and $|g^S_{LL}|^2 = 0.05$ can be separated from the above-mentioned seed model only at less than 2σ. The actual numbers should be even worse, as systematic uncertainties will also creep in. Similar conclusions hold for Models B, C, D, E and F, for which the results are shown in Figs. 5b to 5f, respectively. As we mentioned before, the contours for Models B and D are not the same, although they involve the same set of operators. This is because the seed is different, and the seed ultimately controls the correlation matrix.
For Models D-F, the seed model has a zero crossing for $A_{FB}(x)$ at $x = 1/2$. Unlike in the case of the differential decay distribution and observables proportional to it, the integrated observable in this case may become negative in different parts of the parameter space. This makes the covariance matrix $V_{ij}$ not positive definite. We note here that, for our purpose, i.e., to construct χ², the integrated observables serve only as the normalisation of $V^{-1}_{ij}$ $(= M_{ij})$. For this reason, we have taken the modulus of the integrand for each value of x (y). This keeps the nature of the error ellipsoids intact while always keeping the covariance matrix positive definite. On the flip side, it makes the integral diverge at the zero-crossing point. Thus, to evaluate the correlation matrix $V^{-1}_{ij}$ and to avoid this divergence, one has to remove the tiny patch $0.495 < x < 0.505$ from the integration. This has only a negligible effect on the number of events, but keeps the necessary integrations convergent.
Table 2: The WCs, as obtained from Fig. 5, for which Models A-F are separable from the respective seed models at 3σ level. The integrated asymmetries are also shown.
In Fig. 6, we show, as an illustration, the behaviour of $A_{FB}(x)$ for Models A-C vis-à-vis the seed Model 1, and for Models D-F vis-à-vis the seed Model 2, for WC values at which the differentiability is at the 3σ level. The corresponding WCs, extracted from Fig. 5, are displayed in Table 2. We note that Models A-C can be differentiated from the seed Model 1 (with only $|g^S_{RL}|^2$) for all values of x. On the other hand, Models D and F can be differentiated from the seed Model 2 with only $|g^S_{RR}|^2$ (Model 3) only for medium values of x, and the zero crossing of $A_{FB}(x)$ plays a crucial role.
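As a rough consistency check of these numbers, here is a Python sketch under explicit assumptions of ours: a single-operator BR scales as $|g|^2$; the single-operator limits $|g^S_{RL}| \le 0.80$ and $|g^S_{RR}| \le 0.57$ (the latter equal to the $|g^S_{LL}|$ limit by the R ↔ L symmetry) fix the proportionality constants; the two operators do not interfere in the total rate; and the yield uses σ = 0.919 nb, $L_{\rm int} = 50~{\rm ab}^{-1}$ and ε = 7.6% with no extra factor. Under these assumptions the seed value and both two-operator points quoted above fall inside the $50 \pm \sqrt{50}$ band. The names are hypothetical.

```python
import math

BR_LIMIT = 2.1e-8                  # present upper bound on BR(tau -> 3 mu), Eq. (1)
G2_AT_LIMIT = {"RL": 0.80 ** 2,    # |g^S_RL|^2 saturating the bound (single-operator analysis)
               "LL": 0.57 ** 2}    # |g^S_LL|^2, equal to the |g^S_RR| limit by R <-> L symmetry
EVENTS_PER_UNIT_BR = 0.919 * 50_000 * 1e6 * 0.076  # sigma [nb] * L [nb^-1] * efficiency


def expected_events(g2):
    """Signal yield for squared couplings g2 = {"RL": ..., "LL": ...}.

    Assumes the BR is linear in each |g|^2 with no interference between the two
    operators (an assumption of this sketch, not a result quoted in the text).
    """
    br = sum(BR_LIMIT * value / G2_AT_LIMIT[key] for key, value in g2.items())
    return EVENTS_PER_UNIT_BR * br


def in_event_band(g2, n_observed=50):
    """True if the predicted yield lies within n_observed +/- sqrt(n_observed)."""
    return abs(expected_events(g2) - n_observed) <= math.sqrt(n_observed)


seed_g2 = G2_AT_LIMIT["RL"] * 50 / (EVENTS_PER_UNIT_BR * BR_LIMIT)  # |g^S_RL|^2 for 50 events
print(f"seed |g^S_RL|^2 for 50 events: {seed_g2:.3f}")
for point in ({"RL": 0.2, "LL": 0.1}, {"RL": 0.35, "LL": 0.05}):
    print(point, f"{expected_events(point):.1f} events, in band: {in_event_band(point)}")
```

If the operators did interfere, the allowed band would tilt, and the check above would have to be redone with the full BR expression.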
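The treatment of the zero crossing can be illustrated with a toy seed observable. This sketch assumes a hypothetical seed $O(x) = 1 - 2x$, built from $f_1 = 1$ and $f_2 = x$ and chosen only because it vanishes at x = 1/2 like the seed asymmetries discussed here. It computes $M_{ij} = \int f_i f_j / O\, dx$ on [0, 1] with the patch 0.495 < x < 0.505 removed. Without the modulus, the contributions from x < 1/2 and x > 1/2 cancel and M loses positive definiteness; with it, $M_{00}$ is finite and equals ln 100 ≈ 4.61 for this toy case.

```python
import math


def m_matrix(seed, basis, cut=(0.495, 0.505), use_modulus=True, n_steps=20000):
    """M_ij = int_0^1 f_i f_j / O_seed dx on a midpoint grid, skipping cut[0] < x < cut[1].

    With use_modulus=True the seed observable enters as |O_seed|.
    """
    h = 1.0 / n_steps
    size = len(basis)
    m = [[0.0] * size for _ in range(size)]
    for k in range(n_steps):
        x = (k + 0.5) * h
        if cut[0] < x < cut[1]:
            continue  # excise the tiny patch around the zero crossing
        o = sum(c * f(x) for c, f in zip(seed, basis))
        if use_modulus:
            o = abs(o)
        for i in range(size):
            for j in range(size):
                m[i][j] += basis[i](x) * basis[j](x) / o * h
    return m


basis = [lambda x: 1.0, lambda x: x]   # hypothetical f_1, f_2
toy_seed = [1.0, -2.0]                 # O(x) = 1 - 2x, vanishing at x = 1/2

m_abs = m_matrix(toy_seed, basis)
m_raw = m_matrix(toy_seed, basis, use_modulus=False)
det_abs = m_abs[0][0] * m_abs[1][1] - m_abs[0][1] ** 2
print(f"with modulus:    M_00 = {m_abs[0][0]:.4f}, det = {det_abs:.4f}")
print(f"without modulus: M_00 = {m_raw[0][0]:.2e}")
```

Shrinking the excised patch makes $M_{00}$ grow logarithmically, which is the divergence the cut is meant to control.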

Observable: $A_{FB}(y)$
In an analogous way, one can use the more energetic of the like-sign muons, and the corresponding asymmetry $A_{FB}(y)$. The coefficients $C_1$ and $C_2$, from Eqs. (13) and (14), are shown in Table 3.
Table 4: The WCs, as obtained from Fig. 7, for which Models A-F are separable from the respective seed models at 3σ level. The integrated asymmetries are also shown.
In Fig. 8, we show the distribution of $A_{FB}(y)$ for Models A-F, comparing A-C with Model 1 as the seed and D-F with Model 3 as the seed. The corresponding WCs are given in Table 4. While a 3σ separation between the models is possible, one notes that the differentiation works best in the middle-y region rather than at the endpoints.

Observable: $dB_\tau/dx$ and $dB_\tau/dy$
Study of the differential BRs is instructive. First, let us refer the reader to Tables 5 and 6 for the coefficients $C_1$ and $C_2$ in all the models considered. For both $dB_\tau/dx$ and $dB_\tau/dy$, these show immediately that Models A and B must yield identical distributions; the same is true for the pair D and F. This is because the BR does not change under R ↔ L. Models C and E are very poorly differentiable from their respective seeds (at less than 1σ), so we do not discuss them any further, nor do we show the corresponding separation plots. Even though the pattern seems similar for the two observables, there is an important difference. With $dB_\tau/dx$ as the observable, we can separate Models A(B) or D(F) from the corresponding seed models at 3σ or more, depending on the respective WCs. This can be seen from Fig. 9, as well as from Table 7. With $dB_\tau/dy$ as the observable, there is no available parameter space with 50 events where any model can be separated at more than 2σ from the seed models; this is why we do not show the corresponding plots for $dB_\tau/dy$. Thus, as far as the measurement of the number of events in different energy bins goes, it is preferable to detect the unlike-sign muon rather than one of the like-sign muons.
Table 6: $C_1$ and $C_2$ for different models. The observable is $dB_\tau/dy$.

Conclusion
In this paper, we have focused on the LFV decay τ → 3µ. This is of crucial importance in the light of the semileptonic B-decay anomalies, which hint at some new physics involving second- and third-generation leptons, possibly a mixing among the charged leptons. The present limit on this mode translates to ∼70 events at most at Belle-II with 50 ab$^{-1}$ of integrated luminosity. While even a single event would unequivocally indicate new physics, we try to answer a more ambitious question: is it possible to say anything about the underlying operators from the observables? Needless to say, the answer will be vital for model builders.
Table 7: The WCs, as obtained from Fig. 9, for which Models A and D can be differentiated from the respective seed models at 3σ level.
The differentiation would be much more straightforward if the final-state muon polarisations could be measured. As far as present technologies go, this is not easily attainable. However, as we show, one can form other observables, which are relatively clean and at the same time can yield significant information. One of these observables is the asymmetry of either the unlike-sign muon or the more energetic like-sign muon, measured with respect to the initial τ-polarisation direction. If one can measure these asymmetries in different energy bins, even with the associated error margins, one can differentiate between the different types of operators in a particular class (scalar, vector, or tensor).
Another important observable, as expected, is the number of events in different energy bins, for either the unlike-sign or the like-sign muons. Just like the asymmetries, it can potentially differentiate among the different chiral structures of the operators, although to a lesser extent. Given the total number of events, one can also estimate the magnitude of the relevant WCs. We expect more events for V- or T-type operators, so their WCs, $g^V_{IJ}$ or $g^T_{IJ}$, can be probed better.
It may so happen that there is more than one NP operator. A typical case is when the muon current is purely vector or axial-vector in nature. If we have enough events (∼50), we should be able to say whether there is only one underlying operator or two. The asymmetries are the better observables, but the distribution of the number of events can also help as a complementary probe.
One must, however, remember that such an analysis involves the risk of underestimating the errors by neglecting the systematic uncertainties. Thus, this is to be seen more as a motivation to the experimentalists. Once the data is available, other powerful analysis methods, like the maximum likelihood, can be applied.