Massive black hole binaries in LISA: multimessenger prospects and electromagnetic counterparts

In the next decade, the Laser Interferometer Space Antenna (LISA) will detect the coalescence of massive black hole binaries (MBHBs) in the range $[10^4, 10^8] \, \rm M_{\odot}$, up to $z\sim10$. Their gravitational wave (GW) signal is expected to be accompanied by an electromagnetic counterpart (EMcp), generated by the gas accreting on the binary or on the remnant BH. In this work, we present the number and characteristics (such as redshift and mass distribution, apparent magnitudes or fluxes) of EMcps detectable jointly by LISA and some representative EM telescopes. We combine state-of-the-art astrophysical models for the galaxies formation and evolution to build the MBHBs catalogues, with Bayesian tools to estimate the binary sky position uncertainty from the GW signal. Exploiting additional information from the astrophysical models, such as the amount of accreted gas and the BH spins, we evaluate the expected EM emission in the soft X-ray, optical and radio bands. Overall, we predict between 7 and 21 EMcps in 4 yrs of joint observations by LISA and the considered EM facilities, depending on the astrophysical model. We also explore the impact of the hydrogen and dust obscuration of the optical and X-ray emissions, as well as of the collimation of the radio emission: these effects reduce the number to EMcps to 2 or 3, depending on the astrophysical model, again in 4 yrs of observations. Most of the EMcps are characterised by faint EM emission, challenging the observational capabilities of future telescopes. Finally, we also find that systems with multi-modal sky position posterior distributions represent only a minority of cases and do not affect significantly the number of EMcps.


I. INTRODUCTION
The Laser Interferometer Space Antenna (LISA) [1] is planned for launch in 2034 and will detect gravitational waves (GWs) in between [10 −4 , 10 −1 ] Hz. Among other sources, LISA will detect the coalescence of massive black hole binaries (MBHBs) in the entire Universe up to redshift z ∼ 20 before the epoch of re-ionization [2][3][4][5][6][7][8] and in the range of masses ∼ [10 4 , 10 8 ] M in the nearby Universe. Detecting GWs from these sources will allow to reconstruct the merger history of MBHBs, disentangling the astrophysical processes and mechanisms driving their formation and evolution [9][10][11][12][13][14][15][16][17], perform tests of general relativity [18,19] and constrain cosmological scenarios [20][21][22][23][24][25] (see [26,27] for recent reviews on cosmological and fundamental physics implications of LISA). * mangiagli@apc.in2p3.fr Compact binaries emitting GWs can be considered "standard sirens", because they provide access to the source luminosity distance d L . The latter is indeed encoded in the waveform and can be extracted directly, without resorting to a cosmic distance ladder, as necessary for type Ia supernovae (SNIa) [28]. However, GWs alone do not provide the redshift of the source. In the presence of an electromagnetic (EM) counterpart, the redshift can be obtained identifying and observing the host galaxy with EM facilities; this information can then be used to construct the d L −z diagram and constrain cosmological parameters [20,29] (see also [30][31][32] for groundbased detectors). If no EM counterpart is present, statistical methods can be employed to infer the cosmological parameters, if enough GW sources are available [33][34][35][36][37][38][39].
In the case of MBHBs, the presence of an EM counterpart accompanying the GW signal has long been discussed in the literature and the situation is still unclear, mostly due to the lack of observational evidence. In the presence of a sufficient amount of gas in the close envi-rons of the binary, an EM counterpart can be triggered by the accretion of the gas onto the binary during the inspiral, merger or ringdown [40][41][42][43][44]. The binary motion is expected to excavate a cavity in the circumbinary disk, while gas streams from the inner edge of the disk should form minidisks surrounding each BH, contributing to spectral features and variable EM emission at various wavelengths previous to merger [45][46][47][48][49]. Moreover, the orbital motion of the binary is expected to imprint a modulation on the EM counterpart from minidisks in phase with the GW signal, allowing for the possible identification of the host galaxy in the field of view provided by LISA [50][51][52]. Additional features can appear at or after merger, for instance an increase in jet power [53], high accretion rate episodes similar to Active Galactic Nuclei (AGN) emission [42], spectral or transient features caused by gravitational recoil [54,55].
In this work, we present different scenarios for the EM counterpart of MBHB mergers, exploring the potential of multimessenger observations with LISA and future EM facilities. This is the first of a series of papers, sharing the common objective of upgrading the analyses of Klein et al. [15] and Tamanini et al. [20] (hereafter T16), to provide up to date forecasts on the ability of LISA to constrain MBHBs parameters (especially the sky position and the luminosity distance) with the final aim of probing the expansion of the Universe. For this reason, we present here detection strategies always including the redshift determination. Only in Sec. VII B, for comparison, we provide the predicted numbers of MBHBs mergers with associated EM emissions without imposing the redshift determination. Among the other papers, one will focus on the construction of the MBHBs standard sirens catalogues and on the inference of the cosmological parameters, while in the others we will discuss extensively the parameter estimation of the GW signal for this type of sources.

II. GENERAL STRATEGY
In order to provide updated forecasts for multimessenger detection of MBHBs with LISA, we improve and complement the EM counterpart types proposed in T16, as well as the MBHBs parameter estimation with LISA.
Concerning EM counterparts, as put forward in T16, several options can be envisaged. First of all, if either the AGN or the host galaxy are sufficiently bright, and the LISA sky localization error small enough, the system can be identified and its redshift directly measured. Another possibility is the formation of a radio jet or flare during/after the merger, to be detected in the sky localization area provided by LISA. This would allow us to pinpoint the GW source sky position, with subsequent identification of the host galaxy. The source redshift could then be estimated either spectroscopically or photometrically with an optical telescope. Similarly, the Xray emission associated to the MBHBs could also be used to identify the GW source sky position, and in turn to determine the host galaxy.
In the context of the counterpart types described above, we consider in this work specific EM observatories. For the direct optical identification of an AGN at the time of the MBHB merger, we consider the Vera C. Rubin Observatory [56,57]. We assume that the identification via the radio emission is performed by the future radio telescope Square Kilometre Array (SKA) [58]. In addition, we also explore the possibility to detect the X-ray EM counterpart with the Advanced Telescope for High ENergy Astrophysics (Athena) [59][60][61]. Once the galaxy is identified from the radio or X-ray emission in the sky localization error region provided by LISA, the redshift measurement can be obtained with the Extremely Large Telescope (ELT) [62] as an example of a telescope with a 30 m mirror or, if possible, directly with the Rubin Observatory.
In summary, we will analyse 3 observational scenarios: with several variations, detailed in Section IV and V to bracket the uncertainties. As a starting point, we need a population of merging MBHBs. Following T16, we adopt the result of semianalytical models (SAM) [14,[63][64][65] to track the evolution of BH masses, spins and surrounding gas across the cosmic time. Specifically we consider three models to explore different seed and time-delay prescriptions: 1. Pop3: a light-seed model with delays included where BHs form from very massive metal-poor stars at high redshift [66,67]; 2. Q3d: a heavy-seed model with delays included where MBHs originate from the collapse of protogalactic disks [68,69]; 3. Q3nd: a similar heavy-seed model [68,69] but without delays between the galaxy and the BH merger, leading to more events but skewed toward higher redshift. In this sense, this scenario can be considered as optimistic in the number of predicted MBHB mergers.
These models predict the merger rate, the intrinsic binary properties (masses, spin magnitudes and orientations, luminosity distance) and the properties of the host galaxy (amount of mass in gas and stars, mass in the disk, etc.). For each model, we use a catalogue containing 90 years of data. We assume 4 years of LISA observations, corresponding to an overall mission duration of 5 years with 80% duty cycle of data taking. We further complete the catalogues by assigning randomly to each event the sky position (uniform over the sky sphere), orbit inclination (random in [0, π]), polarization (random in [0, 2π]), coalescence phase (random in [0, 2π]), and merger time (random over [0, 1] years [70]). Given the simulated MBHB population, in order to reproduce the actual observational process, the next step should be to perform parameter estimation of the GW signal for each of the MBHBs in the catalogues, to infer the sky localization error. If the latter is small enough, one would then turn to evaluating the detectability of the EM counterpart.
In this work we use the Bayesian Markov Chain Monte-Carlo (MCMC) approach of [71] for the LISA parameter estimation, improving on the Fisher forecast of T16. However, this implies that the parameter estimation of the GW signal is the most computationally expensive step. Moreover, not every system in the catalogues is expected to produce a detectable EM counterpart, as the emission might be too faint, or the merger might happen in a dry environment. Therefore, we choose to assess the detectability of the EM counterpart in the first place. We select the systems whose fluxes (or magnitudes) are greater (smaller) than the corresponding threshold values for each of the listed EM facilities, and we run the parameter estimation only on this subset of events. Applying an initial cut in the EM detectability allows us to reduce the number of sources for which we have to perform the parameter estimation, limiting the computational effort.
For the subset of systems with detectable EM counterpart, we simulate the full inspiral-merger-ringdown GW signal in LISA using the waveform model PhenomHM for circularized binaries with aligned spins [72], further select those with signal-to-noise ratio (SNR) above 10, and we estimate the binary parameters of these systems with the MCMC.
In order to detect an EM counterpart (especially if they are transients close to merger [41]), telescopes must be pointed to the expected sky position of the GW source inferred by LISA (similarly to the alerts provided by LIGO/Virgo). For this reason, we apply a cut in the sky localization uncertainty. Together with the above mentioned SNR level and the detectability of the EM counterpart, we further require that the binary systems satisfy ∆Ω < 10 deg 2 to guarantee detection with the Rubin Observatory and SKA, or ∆Ω < 0.4 deg 2 to guarantee detection with Athena (see also Section IV C for another possible strategy).
Following this procedure, we define as multimessenger candidate (hereafter MMcand in the figures) any MBHB system within the catalogue that satisfies the following two conditions: Multimessenger candidate: 1. The system has a detectable EM counterpart; 2. The system has GW SNR > 10.
According to this definition, a multimessenger candidate is a system that can be detected by LISA and by any EM facility, but for which we impose no restrictive requirement on the sky localization. In other words, multimessenger candidates are mergers detectable by LISA for which the EM emission would be observable if the sky position were known with a certain accuracy.
We then define as GW event with EM counterpart (hereafter EMcp) any system that satisfies the following two conditions: GW event with EM counterpart: We stress again that, according to the definition of the three observational strategies at the beginning of this section, we require the redshift determination both for multimessenger candidates and EMcps.
An important caveat of our analysis is that the cuts in the number of GW events performed to implement the observability of the EM counterpart concern the magnitude level and the sky localization, but not the event sky position. This is particularly relevant for Earth-based observatories (i.e. the Rubin Observatory, SKA and ELT), which cover only a fraction of the sky. The only way to implement a sky fraction cut would have been to apply an overall reduction factor on the number of GW events with EM counterpart, corresponding to the sky fraction covered by each facility, and elaborate some technique to account for the detection of the same event by multiple telescopes. Instead of adopting this crude method, we have decided to neglect the observable sky fraction as a whole when we implement the detection with Earthbased telescopes. This rather optimistic choice can lead to an overestimation of the number of predicted GW events with EM counterpart, with respect to those that might effectively be observable with actual data. We consider this a minor issue, compared with the uncertainty inherent in astrophysical MBHB evolution scenarios.
The paper is organised as follows. In Section III we describe the MBHBs catalogues and the physics of the SAM that affects the formation and evolution of the MBHBs. In Section IV we present how we model the EM counterpart to MBHB mergers for different wavelengths and telescopes. In Section V we describe a scenario where the EM flux is reduced by the surrounding gas. In Section VI we describe the tools we adopted to simulate the GW signal and to perform the parameter estimation. Our main results are reported in Section VII. In Section VIII we analyse multi-modal systems, i.e. systems whose sky localization inferred by LISA is dislocated in several portion of the sky. In Section IX we conclude with some final remarks and comments. In Appendix A we compare our results with previous works in the literature, identifying the reasons of the discrepancies, when applicable. In Appendix B we discuss briefly the role of the SNR and other binary parameters in determining the number of EMcps.
Finally in Appendix C we present some figures useful for discussion.

III. CATALOGUE OF MBHBS
The MBHB populations adopted in this work are based on a semi-analytical galaxy formation and evolution model [14,[63][64][65] (the same model is employed also in Belgacem et al. [25] and in T16). We refer the interested readers to the original papers and here we summarise only the general features of the model.
The model evolves dark matter merger trees from a Press-Schechter formalism along with the galactic baryonic structures, accounting for the complex interplay between the multiple components (intergalactic medium, interstellar medium, disk and nuclear cluster properties and the massive BH). Among the physical aspects that affect the mass distribution and the merger rate of MB-HBs, we focus on two that have been shown to have a strong effect: the seed prescription, which defines the starting point for the MBH growth, and the time delay between the galaxy merger and the MBHB coalescence.
The light-seed prescription assumes that the first BHs form at high redshift (z > 15 − 20) in the range ∼ [10 2 , 10 3 ] M from the collapse of heavy Pop3 stars in the most metal-poor dark matter halos. We will refer to this seed model as 'Pop3'. In the heavy-seed prescription, most of the mass in a protogalactic disk collapses into a supermassive star or a quasistar leaving behind a MBH in the range ∼ [10 4 , 10 5 ] M at z ∼ 8 − 15. The heavy seeds are considered to be rarer than the light seeds due to their particular birth environmental conditions [73,74].
The time delays represent the time between the merger of the galaxies and the coalescence of the MBHBs. During a galaxy merger, the two MBHs migrate toward the center owing to dynamical friction [75] operating on the individual MBHs. Dynamical friction generated by the interaction between a massive perturber and the stellar and gaseous distribution decelerates the perturber. If dynamical friction is sufficiently effective, the MBH orbits will decay until they find each other at the center of the galaxy merger remnant and form a bound binary. At this point the MBHs have typical separations of sub-parsec to a few parsecs, depending on the binary mass, still far from the orbital scale at which GWs can efficiently subtract energy and orbital angular momentum to the binary leading to the final coalescence ( 10 −3 pc). Additional processes are therefore needed to further shrink the orbit: energy exchanges in three-body interactions between the MBHB and nearby stars (a process referred to as stellar hardening), gas torques in circumbinary discs, and scattering between the MBHB and a third incoming MBH are included in the semi-analytical models of [65], which are used for this paper in order to account for how the MBHB crosses from parsec to milliparsec scales. There are large uncertainties in all these steps and assessing the efficiency of these mechanisms is beyond the scope of this work. Here we summarise briefly how time delays have been derived in [65] and we refer to T16 and [65] for more details. In a gas-rich environment with M res > M tot , where M res is the mass of the reservoir gas available for accretion [76] and M tot = m 1 + m 2 is the binary mass (in our work we set m 1 > m 2 with m 1 the mass of the primary BH), the time delays are set by the viscous time of the nuclear gas (see Eq. 29 in [65]), that is supposed to bring the MBHB to coalescence in 10 7−8 yr. In the case of a gas-poor environment with M res < M tot , stars would be responsible for bringing the two MBHs together to the scale where GW emission is efficient (see Eq. 30-31 in [65]). Furthermore in [65], the authors took also into account the role of three-body interactions that could lead to coalescence binaries that have stalled (see Eq. 32 in [65]).
The SAM tracks also the evolution of BHs spins. The spin magnitudes can increase (decrease on average) under coherent (chaotic) accretion. Moreover, after the merger, the spin magnitude and orientation of the remnant BH is computed according to the formalism described in [77]. Since we model the GW signal with the PhenomHM waveform [72] (see discussion in Sec. VI), we neglect the binary precession in the GW analysis.

IV. EM COUNTERPART
In this section we discuss the models of the EM counterparts. We upgrade and improve T16 in what concerns the optical emission of the AGN, which can be detected with the Rubin Observatory, the radio jets, which can be detected with SKA, and the near-IR galaxy emission, which can be detected with ELT. To these counterpart types, we further add a model of the X-ray emission, to be detected by Athena.
The AGN bolometric luminosity L bol is computed using the conventions in [78]: where rad is the radiative efficiency,Ṁ acc is the mass accretion rate and L Edd = 4πGM tot m p c/σ T is the Eddington luminosity, m p the proton mass, σ T the Thompson cross section. The SAM predicts the amount of gas surrounding the binary at merger. If the binary does not accrete at Eddington,Ṁ acc = M res /t ν , where t ν is the viscous timescale. The latter is computed as where Re ≈ 10 3 is the critical Reynolds number and t dyn is the dynamical timescale with σ the velocity dispersion In the above equation, M b is the total bulge mass from stars and gas, M dyn is the sum of the nuclear star cluster (NSC) mass and of the reservoir of gas feeding the MBH, r b is the scale radius (assumed to be the same for gas and stars) and r h is the NSC half light radii scale. The scale radius r b is related to the half-light radius R eff as r b = R eff /1.8153 and R eff is computed following Eq. 32 in [79]. The NSC half light radii scale r h is computed instead as [80] r h = 3pc max M dyn 10 6 M , 1 .
If the binary accretes at Eddington, on the other hand, in Eq. 2 we limit the mass accretion rate toṀ acc = L Edd / rad c 2 . The radiative efficiency rad describes the fraction of rest frame energy that can be extracted by the matter accreting onto the MBH and depends on the accretion efficiency η (the maximum amount of energy that can be extracted) and on the accretion geometry. In other words, the radiative efficiency takes into account that not all the available energy is radiated but it can be used to produce jets or winds and its value might depend on structural changes in the accretion disc. The accretion efficiency η depends on the spin of the MBH, a bh , and ranges from 0.057 for Schwarzschild MBHs to ∼ 0.4 for maximally rotating MBHs. If the merger is in a gas-rich environment with M res > M tot , we assume where E ISCO is the specific energy around a rotating MBH with spin a ∈ [−1, 1] [81]. If the binary is in a dry environment with M res < M tot , the accretion might happen in prograde and retrograde orbits with the same probability so the accretion efficiency becomes In our approach we choose a = χ 1 where χ 1 is the spin component of the primary MBH along the angular momentum. The radiative efficiency is then computed as beingṁ cr = 0.01 andṁ = ηM res c 2 /(L Edd t ν ) [78].
From the bolometric luminosity we can compute the absolute magnitude through [82] M band = M band, − 2.5 log 10 1 BC where M band, is the absolute magnitude of the Sun in a certain band, L the Sun luminosity and BC the bolometric correction. From the absolute magnitude we can infer the apparent magnitude m band : where the k-correction k band (z) takes into account how the galaxy's radiation is redshifted during the propagation from the source to the observer. In the optical band, we assume that the AGN spectrum is flat in νf ν (see Fig.1 in [83]), i.e f ν ∼ ν −1 . This approximation leads to k band = 0 so we neglect the last term in Eq. 11.

A. The Vera C. Rubin Observatory
The Rubin Observatory is an optical telescope with a 8.4 m mirror for observations in the u, g, r, i, z, y bands and 9.6 deg 2 field of view (FOV).
We envisage two possible counterpart detection strategies with the Rubin Observatory. The Rubin Observatory Legacy Survey of Space and Time (LSST) is expected to reach a final depth of m ∼ 27.5 after 10 years of operations in r band [56]. We therefore assume that we can identify the counterpart in the archival data, searching for a possible modulation due to the proper motion of the binary before merger. Moreover, LSST will finish its current scientific objectives in ∼2032, so we do not exclude the possibility to carry a target of opportunity (ToO) observation for LISA candidates with the Rubin Observatory with an observation time of few hours. For simplicity we assume the same apparent magnitude threshold for the ToO as for the entire survey. We also use the subscript 'Rubin' when the equations refer both to the LSST or ToO case.
For the AGN spectral energy distribution (SED) in optical bands we assume a bolometric correction BC = 10 ( Fig. 2 in [83] shows that the bolometric correction is almost constant around 10 in optical bands) and compute the absolute magnitude following Eq. 10 as where we have inserted M band, = 4.64, the Sun AB magnitude in r band [84]. The apparent magnitude m becomes then Once the source is identified, we can get an accurate redshift determination via photometric measurements, with error on the redshift ∆z = 0.031(1 + z) [85]. Concerning the sky localization threshold, we adopt the value of ∆Ω = 10 deg 2 , close to the Rubin Observatory FOV.

B. The Square Kilometre Array
When LISA will be taking data, the SKA will be the largest radio telescope on Earth, with more than a square kilometer of collecting area. We therefore include the possibility of ToO with SKA in the proposed scenarios for EM counterpart detection.
During the MBHB merger, the interaction between the surrounding plasma and the magnetic fields is expected to produce radio emission [53,[86][87][88]. In particular, the motion of the binary is expected to twist the magnetic field lines, leading to flare emissions, while the Blandford-Znajeck effect [89] could be responsible for strong radio jets, depending on the amount of accreted material.
Following T16, we model the flare emission as where Edd = L bol /L Edd is the Eddington ratio, q = m 1 /m 2 is the binary mass ratio and radio = 0.1 is the portion of bolometric EM radiation emitted in radio band. In our model, Edd is either computed from the reservoir amount of gas surrounding the binary at merger, with a floor value of Edd = 0.02, or it is set equal to 1 if the binary accretes at Eddington. Furthermore, the jet luminosity is modelled as [14,90]  where m 9 = m 1 /(10 9 M ),ṁ jet = M acc /(22 m 9 M yr −1 ) is the accretion rate in Eddington limit unit, a 1 is the spin magnitude of the primary black hole, f = 1 and g = 2.3 are dimensionless parameters regulating the angular velocity and azimuthal magnetic field of the system. The jet luminosity is the only emission that could reach ∼ 10L Edd . L flare , as well as the other emissions considered in this work (in optical and X-ray) are Eddington limited by construction.
Radio jets are expected to be beamed with an opening angle θ = 1/Γ, Γ being the Lorentz factor [91,92]. Two competing effects come into play: • If the line of sight is outside the cone angle of the jet, the radio emission cannot be detected; • Collimated emission increases the flux received from the observer, allowing for the detection of fainter and farther systems.
For typical AGNs, Γ 5 − 15 leading to a corresponding opening angle of θ 6 • . Yuan et al. 92, however, assumed instead a fiducial value of Γ = 2 based on the simulations of [86], corresponding to θ 30 • . If the emission is beamed, the luminosity increases as [93] L beamed = L isotropic δ n , where β is the beam speed in the AGN frame in units of the speed of light, ι is the inclination angle between the binary angular momentum and the line of sight and n = 2 is an index describing the geometry and spectral index of the jet. For simplicity, we assume that the jet is aligned with the binary angular momentum. We consider three scenarios for the radio counterparts: • 'Isotropic flare': the flare emission is isotropic and the jet is collimated with Γ = 2. In this case we compute the total luminosity in the radio band as • 'Γ2' : both the jet and the flare are beamed with Γ = 2. The total luminosity is the sum of the flare and jet collimated emissions, or it is zero if we are outside the cone: • 'Γ10' : the same as 'Γ = 2' but setting Γ = 10. The total luminosity is computed as: For each of the three scenarios above, we use L radio to calculate the flux To detect the multimessenger candidate with SKA, we impose that where F SKA lim = ν SKA F SKA ν, lim is the minimum flux detectable with SKA, ν SKA = 1.7 GHz is the typical frequency at which we expect the bulk emission and F SKA ν, lim = 1µJy is the flux limit in the same band. The final configuration of SKA is expected to reach this sensitivity in a FOV of ∼ 10 deg 2 and ∼ 10 minutes of integration time. We note that this is an optimistic approach that assumes that all the Poynting flux is converted into jet radio emission, which is not guaranteed [94].
Note that the radio jet luminosity can be also computed from the fundamental plane relation [95]. Ongoing work is evaluating how the fundamental plane relation compares with Eq. 16 [96]. Here we have verified that both methods to calculate L jet lead to a similar number of radio EM counterparts, since the flare luminosity on its own is generally detectable with an SKA flux limit of 1 µJy .
For SKA's sky localization threshold we adopt the same limit as in T16, i.e. ∆Ω = 10 deg 2 .

C. Athena
Being the 2nd Large class mission of the European Space Agency, the X-ray telescope Athena will observe the most energetic events from the high-redshift Universe thanks to its Wide Field Imager (WFI) with a FOV of 0.4 deg 2 [97].
Together with the radio and optical/IR emission, MBHs produce copious amount of X-ray radiation. In this work we consider a late-stage X-ray emission produced by the gas accretion into the newly formed remnant MBH, possibly leading to a re-brightening of the source [42]. The increase in X-ray flux would allow the identification of the binary, among the possible X-ray transient candidates in the LISA sky localization area. In addition, internal shocks between the inner disk, closer to the remnant BH, and the outer disk portion are expected to produce X-ray emissions, even if they might be too faint to be detectable [54].
We focus on X-ray emission in the soft band, between 0.5−2 keV, because Athena is most sensitive in this band. The bolometric correction in the X-ray band is given by [83] where L bol is defined in Eq. (1), c 1 = 5.712, k 1 = −0.026, c 2 = 17.67 and k 2 = 0.278. For the MBH accretion, we consider two possibilities: • The MBH accretion remains at the same level as before the merger, and we estimate it with the left term in Eq. 2; • The sudden disappearance of the torques from the binary leads to the infall of the gas in the disc, and accretion reaches the Eddington luminosity.
The X-ray flux is where L X is the X-ray luminosity calculated with Eq. 25. We define two possible detection strategies with Athena. As the exposure time increases, Athena will be able to detect fainter sources. We set a reasonable maximum exposure time of 300 ks [98] and two different flux limits, chosen together with appropriate LISA sky localization cuts: • If Athena observes the same FOV of ∆Ω = 0.4 deg 2 for the entire exposure time, we assume a limiting flux of F X, lim = 4 × 10 −17 erg s −1 cm −2 ; • Athena could also explore a larger sky localization, limited however to brighter flux. For this second strategy, we choose a flux limit of F X, lim = 2 × 10 −16 erg s −1 cm −2 and a sky localization threshold of ∆Ω = 2 deg 2 . This corresponds to ∼ 5 tiles in the sky, each covered twice.
We claim detection of the X-ray counterpart whenever In the second detection strategy, we do not take into account the slew time necessary to repoint Athena, because the tiles are next to each other (see Section VIII for more details on LISA sky localization). Finally, for simplicity, in this work we do not consider the possibility of pre-merger modulated X-ray emission [51].

D. The Extremely Large Telescope
Since one of our aims is to use LISA EMcps to test the expansion of the Universe, we also enforce the redshift measurement of the GW source. Radio and X-ray observations will identify the sky localization of the merger event within the LISA sky area, while the redshift information relies on emission lines at optical/IR wavelengths. The simplest option is to detect the emission of the galaxy hosting the MBHB merger.
By the time LISA will be operative, the Extremely Large Telescope (ELT) will be available and other similarly large telescopes, such as the Giant Magellan Telescope and the Thirty Meter Telescope, are also under discussion. We focus the discussion here on ELT as a case study. Among the ELT instruments, the spectrograph MICADO [99] will allow redshift measurements in the IYJHK band with spectroscopy between the OH lines down to an apparent magnitude of 27.2 and with imaging with advanced filters down to an apparent magnitude of 31.3. These values correspond to the 5σ sensitivity for isolated point sources in 5 hours of observation.
We compute the galaxy luminosity in K band as where M stars is the total mass of the stars in the disk and bulge, and Υ is a fiducial mass-to-light ratio. The latter requires information, such as the star formation history and the metallicity, which is not provided by the MBHB catalogues. We therefore adopt a simple correction. For systems at z ≥ 3 we assume Υ = 0.03, because galaxies are dominated by young stellar populations with low metallicity, so that radiation in the H, J and K bands in the observer's rest frame actually comes from the blue part of the source's restframe spectral energy distribution. For systems with z < 3, we assume Υ = 0.1 because the stars are older and more metallic, and the observer-frame wavelength comes from the optical part of the restframe spectrum. Following Eqs. 10-11 the apparent magnitude is For systems with m gal,ELT 27.2 we expect that a spectroscopic redshift measurement will be doable, with precision ∆z = 10 −3 . This relies on the reasonable assumption that galaxies are star-forming and therefore emission lines are present, since the typical galaxies in the catalogue have masses of 10 7 − 10 11 M at z > 1.
For galaxies with apparent luminosity 27.2 < m gal,ELT < 31.3, the redshift measurement and its uncertainty are less straightforward, and depend on the actual galaxy redshift. For high-redshift sources at z 6.5, MICADO will be able to detect the Lyman-α (1215.67 A) break in the I-band, enabling the redshift measurement with error ∆z = 0.2 [100]. For galaxies between 5 < z 6.5, the redshift measurement is more challenging, also because they might resemble ultra-faint galaxies at z < 0.5. First of all, we assume that this degeneracy can be broken by using the information on the luminosity distance inferred from the GW measurement, in the context of a given cosmology. Even though the aim is to use the redshift information to infer the cosmology, we believe that assuming the cosmology with the only aim of discriminating between sources at z < 0.5 and z > 5 will not substantially bias the cosmological analysis. Furthermore, the Roman Space Telescope [101] will observe also in the R band, corresponding to the Lymanα break for sources at z 5.2. There will be the possibility to perform ToO follow-up within 2 weeks from notification, but with a limiting magnitude of 28.5 in 1hr (but longer observing time might be possible) [102]. Consequently, assuming observations with the Roman Telescope combined with the luminosity distance information from GW sources, we claim that the redshift identification will also be possible for sources with z > 5 with uncertainty ∆z = 0.2, thanks to the detection of the Lyman-α break.
Finally, for systems with 27.2 < m gal,ELT < 31.3 and 0.5 < z < 5, the photometric redshift can be obtained thanks to the Balmer break, with the penalty, however, of observing only in near-IR bands (from I to K with MI-CADO): missing the optical and UV parts of the spectrum will increase the photometric redshift errors. Unfortunately, we did not find reliable estimates of the redshift measurement error in the literature for this kind of sources with only near-IR observations. Therefore, we adopt an agnostic estimate and set ∆z = 0.5 for these systems [103].
Tab. I summarizes the errors adopted for redshifts measured based on the host galaxy spectra using ELT and the Roman Space Telescope. Note that we do not take into account the possibility of observations with the James Webb Telescope [104], because it may not overlap with LISA.

V. AGN OBSCURATION
The gas and dust surrounding the MBHs are expected to absorb a fraction of the EM emission, reducing the observed flux and the number of detected systems [105,106]. Similarly, interstellar dust and intergalactic gas can absorb starlight in the galaxy.
In this section we present how we model obscuration at X-ray and optical wavelengths, affecting the detection of the EM emission with Athena and with the Rubin Observatory respectively. We also model absorption from the galaxy gas, affecting the optical emission detectable with ELT (used to determine the EMcp redshift). Below we detail the two in turns, starting with the AGN obscuration.
Evaluating the fraction of obscured AGNs with respect to the total number of AGNs is generally challenging, and there is no consensus on how it evolves as function of luminosity or redshift [107][108][109]. Nevertheless, we adopt here a recent modelling of obscuration in X-rays, taken from [110]. We further include absorption in the optical, assuming that dust follows gas.
For each event, we start by computing the hydrogen column density around the MBH from equations 3-6 in [110]. For completeness, we detail here the entire procedure. We introduce the ψ(L X,h , z) parameter that corresponds to the fraction of absorbed Compton-thin AGNs with respect to the total Compton-thin AGNs: where ψ max = 0.84, ψ min = 0.2, β = 0.24. The source redshift z and the X-ray luminosity in hard band L X,h are the only input parameters, and we infer them from the information contained in the MBHBs catalogues. To compute L X,h we use Eq. 25, with coefficients c 1 = 4.073, k 1 = −0.026, c 2 = 12.60 and k 2 = 0.278 [83]. The quantity ψ 43.75 (z) represents the fraction of Comptonthin AGNs with log 10 [L X,h /(erg/s)] = 43.75 at z, and it takes the form The distribution of the hydrogen column density N H can be expressed as 3+ , where f AGN = 1 is the fraction of Compton-thick AGNs to the absorbed Compton-thin AGNs, and = 1.7. Given the hard X-ray luminosity and redshift of each binary in the catalogues, we derive, using the equations above, the corresponding hydrogen column density distribution f . We then sample this distribution, in order to extract a value for the hydrogen column density N H surrounding the binary.
We then need to evaluate the absorption. We consider the soft X-ray band and compute the luminosity after absorption as with τ X = σ X N H , where σ X is the X-ray cross section, for which we take, following [111], For the energy, we choose E = 1 keV, in the middle of the soft X-ray band. The X-ray flux after obscuration can be computed from Eq. 26, substituting the original L X with L X,abs . We are applying a statistical correction based on observational samples of AGNs. An alternative approach could be to model obscuration based on the intrinsic properties of the source. For instance Ricci et al. 112 found in a low-redshift AGN sample a relation between the obscuration fraction and the Eddington ratio that can be used to compute the corresponding hydrogen column density, under the assumption that radiation pressure is the dominant factor modulating obscuration, caused by material very close to the black hole. In high redshift sources, however, the interstellar medium can also contribute to obscuration [113,114]. We prefer therefore to base our correction on empirical results based on observational samples covering a broad range of redshifts.
Similarly as for X-ray, the AGN emission in the optical band after absorption is where L bol is the bolometric luminosity from Eq. 1, and τ opt = σ opt N dust is the optical depth, with σ opt the optical cross section and N dust the dust column density. In order to evaluate the optical depth, we start from the galaxy mass-metallicity relation Z gas [115] From the hydrogen column density, calculated as described above, and Z gas , one obtains the dust column density [116]: and the optical cross section, given by where σ 0 = 3 × 10 −22 cm 2 and the fitting function F as with coefficients λ i , a i , b i , p i , q i reported in Tab. II. We choose to compute the cross section at a reference wavelength of λ = 0.62 µm, at the center of r band. The magnitude after absorption can be inferred from Eq. 10, substituting L bol with L bol,abs . At last, we turn to the absorption of the optical galactic emission from interstellar dust and intergalactic gas, to be accounted for in the detection of the host galaxy by ELT. Based on Fig. 1 in [117], we adopt a constant hydrogen column density log 10 (N H /cm 2 ) = 22 and compute the absorbed luminosity in K band following Eqs. 37-39, with λ = 2.2 µm, at the center of K band. The absorption of the host galaxies emission does not affect significantly the EMcps rates, and it is always included in the following results.

VI. GRAVITATIONAL WAVE SIGNAL AND PARAMETER ESTIMATION
The waveform of a MBHB with aligned spins and circular orbit depends on 11 parameters: the primary and secondary source-frame masses, m 1 and m 2 , the two spins magnitudes along the orbital angular momentum, χ 1 and χ 2 , the sky latitude β and the longitude λ, the luminosity distance d L , the inclination ι of the binary angular momentum with respect to the line of sight, the phase at coalescence (or at a reference frequency) φ , the time at coalescence t c , and the polarization angle ψ. We can also define the mass-ratio q = m 1 /m 2 1 and the chirp mass M = (m 1 m 2 ) 3/5 /(m 1 + m 2 ) 1/5 . We model the GW signal with the inspiral-merger-ringdown waveform Phe-nomHM [72] that ignores the binary spins precession but includes higher order harmonics.
If the source is located at cosmological distance, the signal is affected by the expansion of the Universe. As a consequence, in the detector-frame we measure redshifted quantities, i.e. M z = M(1 + z), where z is the source redshift. However, in this work we will refer to rest-frame quantities, unless otherwise stated. We assume a fiducial ΛCDM cosmology with h = 0.6774, Ω m = 0.3075 and Ω Λ = 0.6925.
For each system we compute the signal-to-noise ratio as whereh(f ) corresponds to the Fourier transform of the time-domain signal, and S n (f ) is the noise power spectral density, for which we take the estimate "SciRDv1" described in [118]. We set f min = 10 −5 Hz and f max = 0.5 Hz. We assume 5 years of overall mission duration, with 80% duty cycle. We add to the LISA noise PSD the one of the confusion background from unresolved galactic binaries, according to the fits presented in [119]. The amplitude of the background is taken for three years of mission duration, as a representative average value over the total duration of 5 years. For each event we obtain the posterior distribution p(θ|d) for a set of parameters θ following Bayes theorem, where L(d|θ) is the likelihood of the realisation d with the parameters θ, π(θ) corresponds to our prior on the binary parameters, and p(d) = dθL(d|θ)π(θ) is the evidence. In this work, we assume the so-called zero-noise approximation, i.e. d = h(θ) where h denotes the waveform. We refer the interested reader to Marsat et al. [71] for more detail.
The GW signal parameter estimation is performed using the response and likelihood code of [71], together with the parallel-tempered ensemble sampler ptemcee [120] for the Bayesian parameter estimation. We initialize our chains around the simulated signal with a covariance computed from the Fisher matrix, and use enriched proposals that allow to jump to the known potentially degenerate modes in the sky position, inclination and polarization [71]. For the prior distributions, we assume the sky position angles, inclination and polarization uniformly distributed over the sphere. For all the other parameters we adopt uniform flat priors.
We run the MCMC chain for each system for 2000 iterations with 64 walkers [121] and 10 temperatures [122]. This provides a first set of parameter posteriors. Among them, we are particularly interested in the sky area, to ensure the EM detection. We therefore rerun the chains, for 10 5 iterations, for all the systems whose the first run produced a posterior with an initial sky localization error of 5 < ∆Ω/ deg 2 < 40 at 90% confidence level. This way, we ensure the convergence of the MCMC chains for the interesting systems. Moreover, for all the systems with error in the sky localization ∆Ω < 10 deg 2 at 90% confidence level, i.e. those which we use in the rest of the paper, we further check the convergence of the parameter estimation for the parameters β, λ and d L , studying the evolution of the chains as a function of the iterations. At the end of the process, we obtain three catalogues of EMcps for each of the MBHB astrophysical formation models, for which we are confident that the parameter estimation has converged.

VII. RESULTS
For each of the three MBHB formation astrophysical models described in Sections II and III, we use catalogues simulating 90 years of data. They contain a total number of 15546, 692 and 10700 MBHBs for Pop3, Q3d and Q3nd respectively. The following results are presented for 4 years of LISA observations, i.e. 5 years of mission duration with 80% duty cycle. All catalogues start at z = 20. As described in Section II, we first select, among all the events in the catalogues, those with a detectable EM emission, on which we then run the GW signal parameter estimation, to extract the EMcps. As we shall demonstrate, the number of EMcps strongly depends on how the EM emission is modeled, on the specific instrument adopted to detect it, and on the detection strategy, other than on the astrophysical population model. Several choices for how to combine these variables are possible, leading to different configurations for the EMcps observations. To simplify the presentation of the results, we focus on two specific models, one maximising the number of EMcps, and one minimising it. They are defined as follow: For these two models, we present the rates of both multimessenger candidates and EMcps. We also discuss possible variations to the two selected models separately. Among all the variables, the jet opening angle is the factor that most affects the EMcps rates, while, e.g., taking the accretion from the catalogues or assuming it at Eddington does not change the rates significantly. This is because the SKA+ELT combination dominates the rates in the case of an isotropic flare emission, as we will show below.

A. General distributions
In this section we discuss the distributions in redshift and chirp mass of the average number of MBHBs events, multimessenger candidates, and EMcps, in the two models labelled respectively "maximising" and "minimising" (c.f. the beginning of Section VII). Furthermore, we also report the average number of multimessenger candidates and EMcps for the other observational scenarios, in Tables V and IV. The average numbers are intended in 4 yrs of observations with LISA, and are obtained by multiplying the total numbers provided by the 90 yrs of catalogues by 4/90.
In Fig. 1 we present the average number of merging binaries as a function of redshift. Models Pop3 and Q3nd predict a large fraction of mergers at z 10, while in the Q3d model all the systems merge at z 12. Removing the systems with SNR < 10 in LISA leads to the loss of ∼ 80% of high-redshift sources in the Pop3 catalogue, caused by their low mass (see Fig. 2). The systems of the Q3nd catalogue are on average more massive, therefore, the SNR cut does not alter their number. The systems of the Q3d catalogue are also all detected by LISA with SNR > 10: this is expected, since they have a mass distribution similar to Q3nd, and they merge at smaller redshifts. The average number of intrinsic and GW-detected events for each of the three astrophysical models is reported in Tab. III.
Among the systems with SNR > 10, we further select the multimessenger candidates, i.e. those with a detectable EM counterpart. In Fig. 1, we show their distributions in the maximising and minimising models.
The additional requirement of EM detectability selects systems at even smaller redshift: for all the three astrophysical scenarios, multimessenger candidates have z < 10. Within the maximising model, we predict in total 24.0 (35) {37.6} multimessenger candidates for Q3d (Pop3) {Q3nd} in 4 years. As expected, if we include obscuration and collimated radio emission, the multimessenger candidates number decreases to 3.6 (6.6) {4.2}, and only systems at z 8 can be detected. This reduction of about 81 − 89% in the number of multimessenger candidates with respect to the maximising model, is similar in all astrophysical models.
At last, we impose a cut in the sky localization of the systems, to select only the EMcps. We obtain 14.8 (6.4) {20.7} EMcps for Q3d (Pop3) {Q3nd} in 4 yrs in the maximising model, and nothing but 3.3 (1.6) {3.5} if we include AGN obscuration and collimated flare and jet emission (minimising model). The Pop3 scenario predicts the largest number of multimessenger candidates, however, only ∼ 20% among them are promoted to EMcps: LISA will not localize these sources accurately enough, due to their intrinsic low chirp mass and high redshift. On the other hand, the Q3d and Q3nd models predict fewer multimessenger candidates, but ∼ 60% among them are EMcps. Within the minimising model, even though the total number of both multimessenger candidates and EMcps decreases, the fraction of multimessenger candidates promoted to EMcps is higher for all astrophysical models: 24%, 92%, and 83% for Pop3, Q3d and Q3nd respectively, as opposed to 18%, 62%, and 55% in the maximising model. We interpret this fact as follows: including obscuration and collimated radio emission effectively removes the tails of the distributions, selecting the bulk of the "best" events: those with redshift low  enough to have good LISA parameter estimation, but high enough to be sufficiently numerous. Further imposing the sky localization cut has therefore a minor effect on this subset of events.
In Fig. 2 we report the same quantities as a function of the rest-frame chirp mass M. While the distributions in the massive models (Q3d and Q3nd) peak at M ∼ 10 5 − 10 6 M , for the Pop3 model the peak of the distribution is at M 10 3 M , due to the different BH formation processes. The SNR > 10 cut therefore operates similarly to what already observed for the redshift distribution, i.e. it excludes low-mass events in the Pop3 scenario while leaving the Q3d and Q3nd practically unaffected.
In the maximising model, all the systems with M 10 5 M have detectable EM emission, while at lower masses a significant fraction of Pop3 and Q3nd can be detected by LISA with SNR > 10 but do not have observable EM emission due to the low BH mass or high redshift of the systems. Adding the further requirement on the sky localization results in an overall rescaling of the multimessenger candidate distributions for the massive astrophysical models, while it selects only the heaviest binaries in the Pop3 model. As already observed for the distributions as a function of redshift, the reduction in the number of events when one includes obscuration and the collimated jet is higher than the one obtained when one imposes the cut in the sky localization.
In Fig. 3 we show the EMcps distributions for the Rubin Observatory, SKA+ELT and Athena+ELT strategies separately in the maximising scenario. SKA+ELT is the only combination to provide EMcps at z 4 while the Rubin Observatory and Athena+ELT can observe only SKA+ELT provides more EMcps in the maximising scenario thanks to the isotropic flare emission. We stress that we can not simply sum the y-axis for each instrument combination because the same system might be observed by different telescopes at the same time. the closest events with the latter reaching slightly higher redshifts than the former. Moreover we note that we have no detections with the Rubin Observatory above z > 4 so we can safely use the r band without worrying about absorption. Moving to the chirp mass, the SKA+ELT scenario is able to probe the lightest systems in our catalogues, especially for Pop3, while Athena+ELT and the Rubin Observatory detect the EM emission from systems with 10 4 < M/M < 10 6 . The average number of multimessenger candidates and EMcps for each observational scenario is reported in Tab. IV and Tab. V. Overall, the observational strategy providing the most multimessenger candidates and EMcps in 4 yrs is SKA+ELT, if we assume that the radio flare emission is isotropic. Accounting for a beamed emission with Γ = 2 provides numbers which are closer to those obtained when observing with the Rubin Observatory, or with Athena+ELT. If we further decrease the opening angle (Γ = 10), observations with SKA+ELT become irrelevant. While the beamed emission allows us to detect systems that are farther away from us, imposing that the observer has to be on-axis excludes the vast majority of the systems.
Observing with the Rubin Observatory provides ∼ 3 EMcps in 4 yrs in the Q3d model without accounting for obscuration, while in all the other astrophysical cases the rates are below 1. Concerning the combination Athena+ELT, as expected the Eddington accretion leads to more multimessenger candidates and more EMcps, since the EM emission is brighter. Moreover, in general the observational strategy where one observes a single region of ∆Ω = 0.4 deg 2 allows for the detection of slightly more EMcps than the strategy in which one observes a region of ∆Ω = 2 deg 2 at a higher flux threshold, because there are more systems at fainter fluxes compared to systems with poorer localization.
In general, the two models with massive progenitors predict more EMcps than the Pop3 one, due to the aforementioned difficulties in localising light events with LISA. Indeed, one can appreciate that the Pop3 astrophysical formation model leads to more multimessenger candidates than the Q3nd one, when observing with the Rubin Observatory and SKA+ELT. However, most of them do not satisfy the sky localization requirement and consequently are not accounted for as EMcps.
We highlight that, in order to get the total average number of multimessenger candidates and/or EMcps, one should combine the different EM facilities, while taking care not to count the same event twice (since the same system can be detected with different instruments -see following paragraph). The total average number of EMcps is reported in Tab. VI.
In Tab. VII we report the number of EMcps that can be observed simultaneously by: (i) the Rubin Observatory and SKA ('Rubin+SKA'); (ii) the Rubin Observatory and Athena ('Rubin+Athena'); (iii) SKA and Athena ('SKA+Athena'); (iv) the three instruments ('All'). In the maximising scenario, the Q3d model predicts ∼ 2 − 4 EMcps in 4 yr (depending on the instruments considered), and about ∼ 2 EMcps should be observable by all the instruments simultaneously. As expected, the combination SKA+Athena provides the largest numbers, since both SKA and Athena can observe sources at higher redshift than the Rubin Observatory (c.f. Fig. 3). Moving to the minimising case, we find that < 0.3 EMcps in 4 yrs can be observed by multiple instruments simultaneously, regardless of the astrophysical model.

B. MMcands and EMcps without redshift measurement
In this section we relax the requirement of the redshift determination, i.e. we present the predicted number of multimessenger candidates and EMcps, but without imposing that their redshift should be measured independently. Indeed, interesting information on how the radio or X-ray emissions are produced can also be inferred exclusively by the detection of the EM emission. The redshift can then be determined from the GW-measured luminosity distance by assuming the standard model cosmology (it will not be possible, though, to use these EMcps as standard sirens). Relaxing the redshift determination requirement does not change significantly the number of EMcps; it only affects the number of MMcands, which are, however, less interesting because their sky localization is unknown. In the following, the requirements for the identification of the MMcands and EMcps in terms of SNR, flux and sky localization remain the same as in the rest of the paper.
First, the number of MMcands and EMcps detectable with the Rubin Observatory with and without imposing redshift determination remains the same because the threshold magnitude that we adopted for spectroscopy, m AGN,Rubin, lim = 27.5, is close to the photometric limit of the survey.
Second, let us focus on the MMcands and EMcps detectable with Athena only. If redshift is not needed, this amounts to dropping the requirement of detectability with ELT, which was imposed exclusively for the redshift determination. However, as can be appreciated from Fig. 3, Athena can only detect sources up to z 4 while ELT is sensitive up to z 8 (we justify this value confronting the results for the multimessenger candidates cases in Fig. 1 with the 'SKA alone' configuration in Fig. 19 -see discussion at the end of this subsection ) . Therefore all the sources detectable with Athena can also be observed with ELT, so that the number of MMcands and EMcps with and without redshift determination is identical.
Moving to SKA, removing the requirement on the redshift determination increases the number of MMcands for Pop3 and Q3nd, respectively, by a factor ∼ 2.5 and ∼ 3.6 in the isotropic flare scenario. In the models with beamed emissions, the ratio between the number of MMcands without and with redshift rises to ∼ 4.3 and 4.8 for Pop3 and Q3nd respectively in the case Γ2 and to ∼ 4.5 and ∼ 23 in the Γ10 scenario. For Q3d the increase is less significant in all configurations. The limiting factor is that ELT reaches lower redshifts than SKA, as can be appreciated comparing Fig. 1 and Fig. 19. The number of EMcps does not change significantly instead, because the requirement on the sky localization selects lower redshift systems which can be detected by ELT.
In Tab. VIII we present the numbers of MMcands and EMcps observable with SKA alone, to be compared with the values reported in Tab. IV and Tab. V which include detection with ELT. The distributions of MMcands for SKA in terms of redshift and chirp mass are reported in Fig. 19.

C. Magnitudes and fluxes distributions for EMcps
In this section we present the average number of EMcps as a function of magnitude (relevant for detection with the Rubin Observatory and ELT) and flux (relevant for detection with SKA and Athena).
The magnitude distributions for the EMcps detectable with the Rubin Observatory are reported in Fig. 4. In the absence of AGN obscuration, most of the systems have m AGN,Rubin > 25 and accumulate toward higher magnitudes. From the distribution it is also evident that the Q3d model provides the highest average num-  ber of EMcps compared to Pop3 and Q3nd. If we account for AGN obscuration, the number of EMcps diminishes appreciably and the typical magnitude increases to m AGN,Rubin 26, while Q3d still remains the most promising scenario.
The distributions of the radio fluxes of the EMcps de-tectable with SKA+ELT are reported in Fig. 5 for the 'isotropic flare' and 'Γ2' scenarios. In the isotropic flare case, the distributions are characterised by a peak around 10 2 µJy, for all astrophysical models. In the Γ2 case, the distributions appear instead to be flatter.
By inspecting how the radio luminosities are distributed in the catalogues (c.f. Fig. 17 in Appendix C) we found that the flare emission occurs typically at lower luminosity than the jet one: it is therefore subdominant with respect to the jet emission. However, the jet is pointing in the direction of the observer only in a small fraction of cases. Therefore, the peak in the distribution observed in the 'isotropic flare scenario' corresponds to the flare emission. In the 'Γ2' scenario, on the other hand, the jet luminosity dominates, and the peak characteristic of the isotropic flare scenario is absent.
In Fig. 6 we report the number of EMcps observable with Athena as a function of their X-ray flux. We account both for accretion evaluated from the amount of gas surrounding the MBHB (estimated by the SAM), or at Eddington. Furthermore, we show the two sky localization thresholds, within the corresponding flux limits. If we consider the scenario where the accretion rate is derived from the catalogue, the Q3d model provides the highest EMcps number, while Pop3 and Q3nd give similar results. From Fig. 18 in Appendix C, it can be appreciated that, in the Pop3 scenario, the Eddington ratio is about the same as in the Q3d scenario for systems with M > 10 4 −10 5 M , however, the number of systems with high mass is intrinsically lower with respect to Q3d, and therefore the overall EMcps rate is lower. Furthermore, in Q3nd there are overall more events, but the Eddington ratio is reduced (in other words, without delays there is not time to accumulate enough gas to be accreted on the binary) which leads to fewer EMcps. In the entire catalogue, we also find that the fraction of systems accreting at Eddington is ∼ 80%, ∼ 50% and ∼ 1% for Pop3, Q3d and Q3nd respectively. However, if we consider only the subset of EMcps, these fractions increase to ∼ 100%, ∼ 88% and ∼ 53% because of the requirement on the detectability of the EMcp.
The lower panels of Fig. 6 confirm that the trade-off between sky localization and limiting flux penalises the scenario with the ∆Ω = 2 deg 2 threshold as the AGN are generally faint. Assuming accretion at Eddington, the number of EMcps increase, as expected. In particular, the Q3nd scenario provides slightly more EMcps than Q3d, as the luminosity depends only on the mass of the binary and not on the amount of gas available for the accretion.
The number of EMcps as a function of the magnitude of their host galaxies, observable with ELT, are shown in Fig. 7. In the maximising case, most of the systems have m gal,ELT > 17.5 − 20 but the inclusion of obscuration and jet pushes this value up to m gal,ELT > 22.5. As we move to larger apparent magnitudes, the number of fainter sources increase for all astrophysical models in a similar way. Most of LISA sources are hosted in faint galaxies, pushing the boundaries toward populations that are challenging to observe. Even if there is always a bright fraction of EMcps the bulk of the population is at the limit of the magnitudes currently observed.
Flux-limited samples are always dominated by faint sources, but this statement is specifically motivated by the actual physical properties of the sources: MBHs of mass 10 3 − 10 7 M hosted in galaxies with mass 10 7 −10 10 M at high redshift. For sources with SNR> 10 the median MBH mass is 10 3.57 , 10 5.22 and 10 5.75 solar masses (Pop3, Q3nd, Q3d), the median galaxy masses are 10 8.24 , 10 7.20 and 10 8.63 M (Pop3, Q3nd, Q3d) and the median redshift 5.18, 8.26 and 3.90. In terms of absolute magnitudes the median AGN absolute magnitude is -13.01, -16.67 (Pop3, Q3d) and the median galaxy absolute magnitude is -15.95, -18.57 (Pop3, Q3d), which according to normal definition are faint sources, in line with the galaxies being dwarfs, based on their masses. Since most of the mergers are at high redshift the corresponding apparent magnitudes also are very faint, but we want to stress that the intrinsic faintness is really a property of LISA sources: small MBHs in high-z dwarf galaxies. Concerning Q3nd, since the Eddington ratios are incredibly small ( c.f. Fig. 18) the median absolute magnitudes of the AGN is actually a positive value. The median absolute magnitudes of the galaxies is also very very faint, -12.25, since some of the mergers occur in galaxies with a baryonic mass smaller than 10 6 M .

D. Magnitudes and fluxes distributions for the entire catalogue
In this section we present the magnitude and flux distributions for the entire catalogues, i.e. without the requirement on the SNR or on the sky localization.
In Fig. 8 we show the magnitude distributions of the sources in our catalogues that can be potentially observed with the Rubin Observatory. Similarly to the EMcps case, the Q3d model predicts more events than Pop3 and Q3nd at all magnitudes. Moreover, the Rubin Observatory does not contribute significantly to the number of multimessenger candidates even at m AGN,Rubin > 27.5 due to the intrinsically low fluxes expected from these systems.
Moving to SKA, in Fig. 9 we present the radio fluxes for the isotropic flare case. For all the astrophysical models, the peak at lower fluxes arises from the flare emission. For the Q3d model, most of the sources are above the detection threshold and the distribution from the entire catalogue is similar to the EMcps one (c.f. Fig. 5).
In Fig. 10 we show the flux distributions in soft X-ray assuming the accretion at Eddington or from the values computed in the catalogues and in the case with and without AGN obscuration. Starting from the latter case, the X-ray fluxes for the three astrophysical models are similar above the threshold but they show different behaviour at fluxes < 10 −18 erg s −1 cm −2 due to the differ-  ent Eddington ratio values and BH masses. If we assume Eddington accretion, the flux depends on the binary total mass: since Pop3 has the lightest binaries, it also produces the faintest sources, while Q3nd and Q3d produce brighter emission. The inclusion of AGN obscuration leads to a reduction in the global number of systems with stronger tails extending at fainter fluxes.
In Fig. 11 we report the galaxy magnitude distributions. All three astrophysical models are similar up to m gal,ELT ∼ 30. Above this value, the Q3d model starts decreasing due to the lack of sources while Pop3 and Q3nd proceed with the same trend. Both Pop3 and Q3nd model distributions reach the peak at magnitudes that are too small to be detectable by any planned instruments so we limit the x-axis to m gal,ELT = 35.

VIII. MULTI-MODAL SYSTEMS
LISA ability to accurately localize the gravitational waves source in the sky will strongly depend on the system's parameters, leading to a distribution of sky position uncertainties that spans several orders of magnitude [123]. Moreover, there has been also evidence [71]  of systems whose sky position posterior distributions are multi-modal in the sky, i.e. they peak not only at the true binary position but also in other regions, symmetrically distributed in the sky. The emergence of these 'multimodal events' is due to the intrinsic degeneracy in the LISA pattern functions. The degeneracy can be broken only if enough signal is accumulated at low frequencies, where the orbital motion of the detector provides additional information, or at high frequency, thanks to the frequency dependence of the detector response function.
Multi-modal sky position posteriors pose a serious challenge to the search of EM counterparts, since telescopes have to search in a larger region of the sky. In addition, under some conditions that we will discuss, the probability of the "spurious" modes is similar to the probability of the real mode (the actual binary position), further challenging the detection of a counterpart. Fortunately, as we will see, multi-modal EMcps are relatively rare (c.f. Tab. IX).
In [71,124] it has been shown that, defining (β T , λ T ) the true binary latitude and longitude, the spurious modes appear at β = −β T and λ = λ T + kπ/2 with k = 0, 1, 2, 3, for a maximum of 8 modes in the sky (one true and seven spurious). For events with only one spurious mode, this secondary point is generally the reflected mode with k = 0, i.e. (−β T , λ T ). In a minority of cases (two out of the entire catalogue) it is, instead, the antipodal one, i.e. β = −β T and λ = λ T + π.
Among the degenerate modes, the reflected one deserves a separate discussion [125]. Without going into the details (we refer the interested readers to [71]), the reflected mode is exactly degenerate with respect to highfrequency effects in the response, and only LISA's motion is expected to break the degeneracy. Thus, the reflected mode appears in the sky position posteriors of signals that are short enough for the LISA motion to be unimportant. Furthermore, the other modes also appear in the sky localization posteriors of systems that are massive enough for their waveform not to reach high frequencies. The degeneracy leading to the other modes, in fact, are usually broken during the merger by the frequencydependence of LISA's response function. As a consequence, the other modes also become more common in the parameter estimations performed pre-merger.
To define multi-modal events, we introduce the concept of probability for each mode, defined as the ratio between the number of samples in a mode over the total number of samples in the MCMC analysis. We then define as a 1mode system, a binary whose sky localization posterior has a probability larger than 5% only in a single skyregion. A 2modes system is such that the probability in the reflected mode is at least 5%, and a 8modes system is such that the sum of the probability of the other six modes (the total number of modes minus the true binary position and the reflected spot) is at least 5%.
An example of these three cases is reported in Fig. 12. Unimodal events are typically well localized and the Fisher analysis provides a similar result to the Bayesian inference. For 2modes systems, two spots symmetrical with respect to the equatorial plane of LISA's orbit appear and, for 8modes systems, the sky position posterior distribution presents eight different peaks located symmetrically. By construction, the Fisher approach, which is a local Gaussian approximation to the posterior, is not able to recover posteriors with multiple peaks.
Some multi-modal events are potential EMcps candidates, and we must include them in our analysis. Two key factors need to be taken into account: the sky localization area of each mode, and the corresponding mode probability. First of all, we want to eliminate events with too wide sky localization region, because telescopes can not explore large areas in the sky. This cut can be performed unambiguously for unimodal systems, but for multi-modal events there are different approaches: the cut can be applied only to the sky localization of the primary mode, or one can choose to combine the sky area of all the modes, assuming the telescope is going to re-point to other locations. This choice influences the number of EMcps. For example, if we assume a threshold of ∆Ω = 10 deg 2 and want to cut all events with larger sky localization region, a bimodal system where the primary and secondary modes have ∆Ω = 8 deg 2 each, is an EMcp in the former approach (with a 50% probability of missing it if the telescope does not point to the right location), but not in the latter, because the total sky area -∆Ω = 16 deg 2 -would be above threshold. Second, one can also include a requirement on the probability of the modes: for example, one could consider as viable EMcps only the events for which the probability in the primary mode is higher than 50%; or one could argue that modes which probability is less than a given threshold percentage can be discarded, and the EMcp treated like an unimodal one, as far as EM telescopes are concerned.
In this work, we have decided to focus only on the sky localization of the primary mode (i.e. the sky-region where the binary actually stands) as the criterion to select viable EMcps, and to apply no requirement on the probability of the other modes. In other words, the number of EMcps is given by the systems with detectable EM counterpart, and a sky localization region below threshold in the primary mode. This simplification is possible because, as we will show below, events with multi-modal sky posteriors are a minority of all cases, and furthermore, for most bimodal posteriors there is a clear hierarchy in the probability of the primary and the secondary mode. Therefore, the final number of EMcps does not depend excessively on the selection criterion. Note that, both eliminating from the catalogues all the events without detectable EM counterpart, and considering the sky localization region emerging only from the post-merger parameter estimation analysis, help in reducing the number of multi-modal events: in fact, multi-modal posterior distributions in the sky localization are more frequent at high redshift and for parameter estimation analyses performed pre-merger, when the signals are shorter and have lower SNR [124]. In Tab. IX we report the fraction of 1mode, 2modes and 8modes EMcps in the maximising case (the minimising scenario provides similar results, just rescaled). For all astrophysical models, the largest fraction of EMcps is unimodal, while bimodal EMcps contribute 5%, 26% and 17% of the total rates for Pop3, Q3d and Q3nd respectively. By contrast, 8modes events constitute less than 0.5 EMcps in 4 yrs of LISA observation. Fig. 13 shows the distribution of 1mode, 2modes and 8modes EMcps, in the maximising case, in the z − M plane. It appears clearly that the vast majority of the EMcps population is constituted of unimodal events. Furthermore, the redshifted chirp mass M z , being the quantity that enters the waveform, influences the appearance of multi-modal posteriors. 2modes events tend to have masses such that M z 10 6 M , and relatively high  redshift, z 2. Events with high chirp mass and redshift, in fact, have GW signals long enough in the LISA band for the high frequency instrument response to play a significant role in breaking the pattern function degeneracy. However, most of the signal is accumulated in the last days, so the effect of LISA motion is insufficient to eliminate the reflected sky position, which remains degenerate. At lower chirp mass and redshift, on the other hand, the combination of the high-frequency response with the motion of the detector fully eliminates the degeneracy and the events are unimodal. However, even if it is possible to identify a general trend, the two sub-populations of 1mode and 2modes events do overlap in the z − M plane, because redshift and M z are not the only quantities affecting the parameter estimation, and there is a large dispersion according to the orientation angles. Regardless of the astrophysical model, the 8modes systems are high chirp mass MBHBs, for which LISA will be able to observe only the merger and ringdown, gathering little information from the constellation orbital motion; furthermore, their GW signal will not reach high enough frequencies for the frequency-dependence of the detector response to help. Although 2modes systems seem to constitute a significant portion of the total EMcps, especially for the massive astrophysical models Q3d and Q3nd, this is partly caused by the fact that, to identify an event as bimodal, we impose a relatively low threshold to the probability of the secondary mode, i.e. 5%. In this regard, it is instructive to look at the probability weight of each mode. In Fig. 14 we present the number of bimodal EMcps as a function of the probability in the primary and secondary modes, for all astrophysical models. It is clear that the primary mode is always more probable than the secondary one, which mitigates the risk, for a substantial fraction of the EMcps, of missing the counterpart if telescopes are pointed only at the primary mode.
In Fig. 15 we show the number of 8modes EMcps in each octant of the sky, as a function of the probability of the sky position mode. While the primary mode remains always more probable, the seven spurious modes are rather equiprobable, with probability that can be as large as 10%. This is likely to constitute a serious issue for the search of the EM counterparts of 8modes events. Fortunately, our results show that 8modes events are not going to contribute significantly to the total number of EMcps after merger (c.f. Table IX). Following these results, in this work we have decided to base the selection of multimessenger candidates that can become EMcps only on the requirement that the sky localization region of the primary mode after merger is small enough, as previously described. We decided not to apply, to the so obtained number of EMcps, a correction factor representing the fraction of events that might be missed on average because telescopes are mistakenly pointed at the wrong spot in the sky.

IX. DISCUSSION AND CONCLUSIONS
MBHBs are key sources for multimessenger astrophysics with LISA, as they are expected to merge at the center of galaxies, where the accreting gas might produce an EM counterpart to the GW signal. In this paper we presented the number and characteristics of both multimessenger candidates (i.e. MBHB mergers with a detectable EM emission) and of EMcps (i.e. multimessenger candidates which can be localised well enough via their GW emission). We analysed different astrophysical scenarios for the MBHBs formation, and modelled the EM emission by combining some selected, current and future, EM facilities.
We took as input the results of the SAM code developed in [14,[63][64][65] to infer the mass and redshift distributions of the merging MBHBs, as well as the properties of their host galaxies. For each MBHB event, we first computed the signal-to-noise ratio in LISA, to assess the number and distribution of the events detectable purely from the GW side. We then exploited the MBH and host galaxy properties to compute the expected EM emission at several wavelengths, from radio to soft X-ray. We considered three observational scenarios: EMcp identification and redshift determination both with the Vera Rubin Observatory; EMcp identification with the SKA and redshift determination with the ELT; EMcp identification with Athena and redshift determination again with the ELT. These observational scenarios cover the entire EM spectrum; naturally, the same system can be detectable by one or more observatories at the same time.
For the subset of events classified as multimessenger candidates, we estimated the errors on the binary parameters, especially the sky localization, inferred by the GW signal with LISA, performing Bayesian parameter estimation as in Marsat et al. [71]. We then selected as EMcps only the systems for which the sky localization is smaller than given thresholds, appropriately chosen following the capacities of the EM telescopes. EMcps are therefore those MBHB mergers that can both be exploited for subsequent astrophysical studies, and be used as standard sirens for cosmology, since one can infer their luminosity distance from the GW signal, and their redshift from the EM counterpart.
We focused especially on two scenarios of viable EMcps: one maximising their number, in which we assumed that the radio flare emission is isotropic and that there is no AGN obscuration; and one minimising their number, accounting for beamed radio emission and AGN obscuration. In the maximising scenario we predict 14.9 (6.8) {20.9} EMcps for the Q3d (Pop3) {Q3nd} astrophysical models respectively, in 4 yrs of LISA observations. In the minimising scenario, these rates decrease to 3.4 (1.7) {3.4}, respectively. The collimation of the radio jet and the AGN obscuration by hydrogen and dust are the two features most affecting our results as far as the detection of the EM emission is concerned.
Removing the requirement of the redshift determination does not change significantly the number of MMcands and EMcps observed with the Rubin Observatory and Athena: the cuts in magnitude imposed by the detection with these facilities already select sources at relatively low redshift. Concerning the SKA, on the other hand, we find that the number of MMcands increases by a factor ∼ 2.5 − 5 when removing the redshift identification requirement, because SKA can reach higher redshifts than ELT. Note that the number of EMcps always remains the same: these systems have, in fact, low redshift due to the requirement on the sky localisation.
We find that EMcps can be detected up to z ∼ 6 − 8 and they have typically M ∼ 10 4 − 10 6 M , because these systems have a sufficient amount of gas available for accretion and they are localized with enough accuracy by LISA. Considering each observational scenario separately, we find that the Rubin Observatory is the instrument providing the smallest number of EMcps, with on average less than one event in four years of joint observation with LISA. On the other hand, SKA+ELT provide the largest number of EMcps, if the radio flare emission is isotropic; however, the EMcps rate is drastically reduced if the radio flare and jet emissions are collimated in a jet cone with opening angle θ ∼ 30 • , and goes to zero for θ ∼ 6 • . Therefore, under the plausible assumption of a collimated radio emission, only the presence of Athena in conjunction with LISA would re-enable the possibility of having at least a few EMcps in 4 yrs, assuming, though, that the X-ray emission is not affected by dust obscuration. Introducing AGN obscuration leads in fact to a significant drop in the EMcp rates. Interestingly, restricting the sky area to the size of the Athena FOV, thereby reaching deeper fluxes, increases the number of EMcps, rather than scanning a larger area of the sky with higher flux limit. We therefore identify the first observational strategy as the one capable of maximising the opportunity of joint LISA-Athena observations.
The number of multimessenger candidates changes in the same way as the one of EMcps, when changing the observational strategy. We found that the Pop3 astrophysical model leads to more multimessenger candidates than the massive models Q3d and Q3nd, but most of them do not satisfy the sky localization requirements, because they are too light and at high redshift. Consequently, these events cannot be classified as EMcps, so that the final number of EMcps in the Pop3 astrophysical model is overall smaller than in the massive models. It is also important to remark that the vast majority of the MBHB mergers in the SAM catalogues are characterised by an EM emission much fainter than the threshold magnitudes and fluxes of the EM observatories considered.
Since these facilities are representative of the planned future telescopes, this shows, to the best of our present knowledge, how challenging multimessenger MBHB observations with LISA will be.
We also found that a fraction of multimessenger candidates present multi-modal sky posterior distributions, characterised by two or eight spots in the sky, symmetrically distributed over the sphere. We promoted these events to EMcps when their GW-inferred sky localization is smaller than the selected thresholds in the primary mode only, neglecting the other modes. We found that bimodal events can contribute up to ∼ 25% of all EMcps; however, the posterior probability of the primary mode is always larger than the one of the secondary one. 8modes events, on the other hand, have nearly equiprobable sky-localisation modes, but they contribute less than 0.5 EMcps in 4 yrs of LISA observation, so they play a minor role and do not affect our results.
The present analysis has also some caveats. Concerning the catalogues, our results are based on the same MBH physics described in T16. More recently, new predictions for the MBHB merger rates have been published [126]. In this work, the authors refined the modeling of the time delays, accounting for the baryonic components at galactic merging scale and implementing better prescription for dynamical friction at smaller scales (< 100 pc). They also included the effect of supernova feedback that may reduce the amount of available gas in low-mass galaxies. We compared the two catalogues and found no significant differences in the number of both intrinsic and detected i.e. (SNR > 10) MBHB mergers for the Q3d model (the 'heavy seed' model in the recent work). We note especially that, within the Q3d model, the redshift distributions provided by the new analysis are skewed toward lower redshift, so we expect to recover a similar, or even larger, number of EMcps. On the contrary, the new results predict more (less) MBHB mergers for the Q3nd (Pop3) astrophysical models. Therefore, the results obtained in this work might be overestimated for Q3nd and underestimated for Pop3.
Concerning the modeling of the EM emission and of its detection, several considerations are in order. The effective number of EMcps might be reduced by difficulties linked to the follow-up strategy that we have not addressed, like, for example, how to deal with the situation of two, or more, coincident MBHB mergers in LISA or the presence of other sources with time-varying EM emission in the sky localization region provided by LISA. Concerning the last point, Lops et al. [127] explored the possibility to identify the host galaxy of MBHBs mergers starting from a simulated portion of the Universe. Their work can be considered complementary to ours since they focused more extensively on the exploration of galactic fields around MBHBs mergers. Similarly we did not take into account the possibility of a time delay of weeks to years between the GW signal and the peak of the EM emission. Such long delays would seriously impact the possibility to identify unambiguously the host galaxy since searching for a transient in a large patch of the sky with deep ToOs is challenging. However, a possible solution might be to look for modulated emission in archival data once a transient is detected. Also, we used the post-merger localization of the GW signal by LISA, which is drastically better than the pre-merger one, but at the same time we fully neglected the possibility that some EMcps might be detected already before merger. In order to evaluate the bolometric luminosity in the Eddington emission case, we used the mass of the binary, while for post merger emission it would have been more appropriate to use the mass of the remnant black hole, since the GW emission is expected to subtract energy and angular momentum from the system. However, we found that this leads to no significant difference in our estimates, as the binary and remnant BH masses are very similar [77].
We also chose to be as optimistic as possible and did not apply any correction factor to account for the actual sky coverage of telescopes such as SKA or the Rubin Observatory, which will observe at best half the sky. In the worst case scenario, this would lead to the loss of half of the EMcps, due to Earth orientation.
Regarding Athena, our results suggests that observing a single tile (for example, the one with the highest posterior probability resulting from the LISA parameter estimation), might be a successful strategy to detect the X-ray counterpart. However, this result needs confirmation via a realistic end to end simulation of the observing strategy, which is behind the scope of the current work. For example, each tile and the corresponding observing time might be weighted with the posterior probability from the GW measurement.
Regarding the LISA data analysis, we did not account for data gaps in the detector output. This might be justified, as at least the gaps expected in connection with the standard LISA maintenance can be avoided by triggering a protected period around the predicted MBHB merger time, if the binary is detected sufficiently early before the merger. However, data gaps might still be problematic for massive systems entering in LISA band already in the merger or ringdown phase, since there will be no time to postpone a scheduled maintenance. Moreover in this work, we perform the parameter estimation only for the MBHBs with detectable EM emission. In the future, we plan to perform the parameter estimation of all the binaries present in the SAM catalogues, independently of whether a detectable EM emission is associated to them or not [124], to have a more comprehensive view.
In this work, we restrict ourselves to MBHBs with circularized orbits and spins aligned to the orbital angular momentum, thus neglecting the possibility of misaligned spins inducing precession, as well as the possible presence of eccentricity. We leave the investigation of the localization of sources in the presence of these effects for future work.
Finally, the actual distribution of observed MBHBs might arise from a combination of the three astrophysical models we adopted in this study (see for example [128,129]). For simplicity, here we did not considered mixed astrophysical formation models.  Fig. 1 for the SNR and q. In the lower panels, the black dashed line is at q = 18. The requirements on the sky localization and detectability of the EMcps select the systems with largest SNR and mass ratio close to unity. trapolation of the current results and might not be representative of the estimates that an accurate waveform would produce in this range. For this reason, in Tab. X we report the number of EMcps in the maximising and minimising models if we consider only the events with q < 18. For Q3d the numbers remain unchanged due to the fact that the vast majority of the systems has q < 18, however the approximation in the waveform might affect the parameter estimation of ∼ 20% (∼ 10%) of the cases in Pop3 (Q3nd) astrophysical model. Finally the Phe-nomHM is validated for spin magnitudes up to 0.98 for equal mass binaries. Even if in our simulations we have MBHBs with χ ∼ 0.99, they are a minority and we do not expect significant differences from the parameter estimation.

Appendix C: Useful figures for discussion
In this appendix, we limit to report some figures that are interesting for the discussion in Sec. VII. L radio [erg/s]