Interpolating Detailed Simulations of Kilonovae: Adaptive Learning and Parameter Inference Applications

Detailed radiative transfer simulations of kilonovae are difficult to apply directly to observations; they only sparsely cover simulation parameters, such as the mass, velocity, morphology, and composition of the ejecta. On the other hand, semianalytic models for kilonovae can be evaluated continuously over model parameters, but neglect important physical details which are not incorporated in the simulations, thus introducing systematic bias. Starting with a grid of 2D anisotropic simulations of kilonova light curves covering a wide range of ejecta properties, we apply adaptive-learning techniques to iteratively choose new simulations and produce high-fidelity surrogate models for those simulations. These surrogate models allow for continuous evaluation across model parameters while retaining the microphysical details about the ejecta. Using a new code for multimessenger inference, we demonstrate how to use our interpolated models to infer kilonova parameters. Comparing to inferences using simplified analytic models, we recover different ejecta properties. We discuss the implications of this analysis which is qualitatively consistent with similar previous work using detailed ejecta opacity calculations and which illustrates systematic challenges for kilonova modeling. An associated data and code release provides our interpolated light-curve models, interpolation implementation which can be applied to reproduce our work or extend to new models, and our multimessenger parameter inference engine.

In principle, kilonova observations encode the amount and properties of the ejected material in their complex multi-wavelength light curves (and spectra) [5,23,24]. For example, several studies of GW170817 attempted to infer the amount of material ejected [23][24][25][26][27][28][29][30][31][32]. In practice, these observations have historically been interpreted with semianalytic models, as they can be evaluated quickly and continuously over the parameters which characterize potential merger ejecta. However, it is well known that these semianalytic models contain oversimplified physics of already simplified anisotropic radiative transfer calculations [33][34][35] that neglect detailed anisotropy, radiative transfer, opacity, sophisticated nuclear reaction networks, and composition differences.
To circumvent these biases, some groups have attempted to construct surrogate kilonova light-curve models, calibrated to detailed radiative transfer simulations [23,26,36]. For example, Coughlin et al. [23] used Gaussian process (GP) regression of principal components to construct a multiwavelength surrogate calibrated to a fixed three-dimensional grid of simulations [37], describing flux F k from a single component of ejected material. This study generated a "two-component" ejecta model by adding the fluxes of two independent calculations (F = F 1 + F 2 ), ignoring any photon reprocessing effects. More recently, Heinzel et al [26] applied this method to construct an anisotropic surrogate depending on two components M 1 , M 2 and viewing angle, calibrating to their own anisotropic radiative transfer calculations. They also included reprocessing effects, showing that their previous simplified approach which treats the radiation from each of the two components of the outflow independently introduces biases in inference for the components' parameters. These strong reprocessing or morphology-dependent effects are expected in kilonova light curves [38][39][40][41]. Finally, a recent study by Breschi et al. [25] favored an anisotropic multicomponent model.
In this work, extending [36], we apply an adaptivelearning technique to generate surrogate light curves from simulations of anisotropic kilonovae. Starting with a subset of 36 simulations reported in [35], we use these adaptive learning methods to identify new simulations to perform, refining our model with 448 simulations so far. We apply our surrogate light curves to reassess the parameters of GW170817. We distribute the updated simulation archive, our currentbest surrogate models, and our training algorithms at https://github.com/markoris/surrogate_kne. This paper is organized as follows. In Section II we describe the kilonova simulation family we explore in this study and the active learning methods we employ to target new simulations to perform. We also briefly comment on our model's physical completeness. In Section III we describe the specific procedures we employed to interpolate between our simulations to construct surrogate light curves. In Section IV we describe how we compare observations to our surrogate light curves to deduce the (distribution of) best fitting two-component kilonova model parameters for a given event. We specifically compare our model to GW170817. In Section V we describe how our surrogate models and active learning fit into the broader challenges of interpreting kilonova observations. We conclude in Section VI.

A. Kilonova simulations
The kilonova simulations described in this work adopt a similar setup as and expand on the work of [35]. The simulations discussed throughout were generated using the SuperNu [42] time-dependent radiative transfer code, using tabulated binned opacities generated with the Los Alamos suite of atomic physics codes [43,44]. We use results from the WinNet code [45] to determine radioactive heating and composition effects. We employ the thermalization model of [46], but use a grey Monte Carlo transport scheme for gamma ray energy deposition [33].
The ejecta model is based on a symmetrically-shaped ideal fluid expanding in vacuum described by the Euler equations of ideal hydrodynamics. The assumption of a radiation-dominated polytropic equation of state allows for an analytic representation of the ejected mass M and average velocityv as a function of initial central density ρ 0 , initial time t 0 , and the velocity of the expansion front v max (Equations 11 and 12 in [33]). When combined with Monte Carlo-based radiative transfer and a specified elemental composition for the ejecta, the code produces time-and orientation-dependent spectra. Convolving these spectra with standard observational filters produces light curves such as the ones in Figures 1 and  4.
Real neutron star mergers have (at least) two mechanisms for ejecting material, denoted as dynamical and wind ejecta [47]. Due to the difference in formation mechanisms of dynamical and wind ejecta [5], a multicomponent approach is necessary for accurate model-ing. Each of the two types of ejecta, dynamical and wind, is modeled by a separate component with a specified morphology, elemental composition, ejecta mass and ejecta velocity. The components are modeled together as one radiative outflow [33]. The thermal decay energy is treated by mass-weighting between the components where they overlap. The end product represents a time-dependent spectral energy distribution contained in 54 angular bins, equally spaced in cos θ from 1 to −1.
For the purposes of this study, the spectra are convolved with broadband filters to produce a series of broadband light curves. Specifically, we use the LSST grizy filters for optical and near-infrared bands, 2MASS JHK filters for longer wavelength near-infrared, and the midinfrared S filter for the Spitzer 4.5 µm band. For each band and emission direction, we estimate the AB magnitude for that filter, defined for a source at 10pc in terms of the CGS energy flux F ν per unit frequency via m X,AB = −2.5 log 10 F ν − 48.6. All observations used in this work are provided or are translated into this ABmagnitude system [48][49][50]. Because our simulations tend toward reflection-symmetric behavior across the z = 0 plane, we only consider the independent information contained in the upper half (z > 0) of these angular bins. To reduce the acquisition cost of each simulation, we evolved each kilonova simulation in our initial grid out to 7.66 days. To minimize data-handling and training cost, unless otherwise noted, we manipulate a subset of our simulation output based on a log-uniform grid. For the initial simulations, this log-uniform grid consists of 191 time points ranging from 1/8 to 7.66 days. For the remaining simulations, this grid is extended in log-time to cover their available duration, up to a maximum of 64 days. Because of several systematics associated with modeling emission at early times (e.g., in the ionization states of the medium and in the contribution from and interaction with any strong jet), we do not report on behavior prior to 3 hours post-merger. In this work, we use the orientation-averaged luminosity for simulation placement, but reconstruct the luminosity continuously in angle and time.
The original simulation hypercubes discussed in [33,35] consider multiple wind ejecta morphologies and compositions. To simplify the dimensionality of the problem, this work only considers simulations from the initial grid with a peanut-shaped morphology [38] and lower Y e = 0.27 composition describing the wind ejecta. Table  I summarizes the parameters for the 36 simulations in our four-dimensional hypercube and highlights variation in only ejected mass M and average velocityv for each of the two components: the mass and velocity of the dynamical and wind ejecta, denoted henceforth as M d , v d , M w , v w . Every simulation in our hypercube adopts the same morphologies for the dynamical and wind ejecta, respectively. This initial simulation hypercube thus consists of only 2 of the 3 velocities and 3 of the 5 masses explored in the companion study [35].
As expected and discussed elsewhere [35], these sim-  [35], parameters of the initial kilonova simulations used to initialize our adaptive learning process in this work. All simulations used in this work adopt a two-component model where the morphology and composition of each component is fixed.
ulations exhibit significant viewing-angle dependence on the relative speed of the components. The obscuration of the wind by the dynamical ejecta becomes less significant closer to the symmetry axis and the peanut morphology itself also produces orientation dependence.
The two-component model shows "blanketing" of slow blue components by fast red components [51]. Also expected and observed are qualitative trends versus the component masses and velocities: more wind ejecta mass increases the g-band luminosity along the symmetry axis.

Illustrating systematics of kilonova simulations
Before extensively discussing our ability to reproduce this specific family of simulations, we first comment on their systematic limitations. Our simulation archive explores only a limited range of initial conditions for the ejecta, with specific assumptions about the composition, morphology, and velocity profiles; with specific assumptions about nucleosynthetic heating; and with specific assumptions about (the absence of) additional power and components, such as a jet or a central source to provide additional power or light [52][53][54][55]. Several previous studies have indicated that these and other aspects of kilonova simulations can noticably impact the outcome [8,33,34,39,[56][57][58]. Where possible, we very briefly comment on how current and previous SuperNu simulations' results change when making similar changes in assumptions.
Prior work with SuperNu has explored the impact of composition [34]. However, recently, Kawaguchi et al 2020 [57] (henceforth K20) demonstrated that Zr makes a substantial contribution to the final light curve. Figure 2 shows how our simulations depend on a similar change in composition, noting substantial change in the late-time optical light curves when we remove Zr.
As demonstrated by many previous studies using Su-perNu, the morphology and velocity structure also has a notable impact on the post-day light curve behavior [33,39,58]. Several other groups have demonstrated similar strong morphology and orientation dependence in their work [25,26,57,59]. For example, in their Figure  8, K20 demonstrate how the light curve changes when a  [57]. specific polar component of the ejecta is removed.
Uncertain nuclear physics inputs also propagate into notable uncertainties about the expected light curve; see, e.g., [7,8]. Even for the same morphology and amount of ejecta, nuclear physics uncertainties can modify the effective heating rate, particularly for material with low Y e which has the greatest prospect for producing r-process elements.
Given limited exploration of possible kilonova initial conditions and physics, we can only at present quantify the uncertainties of the type listed above. In future work, we will employ our parameterized models to assess the impact of these uncertainties on inferences about kilonova parameters. Future work could require kilonova models which include EOS parameters to enable joint inference which also simultaneously constrains the equation of state.
As discussed elsewhere [35], at late times some light curves show a modest deficit of blue light (g-band) relative to observations of GW170817 (unless the dynamical ejecta mass is large). Notably, our g-band light curves fall off significantly more rapidly after their peak in all viewing directions and for most parameters considered here. Previous work with other morphologies also recov-ers similar falloff in these bands, see e.g. [29], though additional components could conceivably contribute. Similar g-band behavior has been seen in other detailed kilonova simulations; see, e.g, Figure 12 in [60]. As noted above, this behavior depends on the assumed composition, notably Zr.

B. Interpolation Methodology
In this work, we principally interpolate using Gaussian process (GP) regression. In GP regression, given training data pairs (x a , y a ), the estimated functionŷ(x) and its variance s(x) 2 are approximated bŷ where the matrix K aa = k(x a , x a ) and where the function k(x, x ) is called the kernel of the Gaussian process.
In this work, unless otherwise noted, we used a squaredexponential kernel and a white noise (diagonal) kernel where Q is a diagonal matrix of possible length scales and σ 0 , σ n are hyperparameters that characterize the amount of noise allowed in the problem. The other interpolation method considered in this work was random forest (RF) regression [61]. Unlike the GP, the RF output had no error quantification and was used primarily as a consistency check on the Gaussian process prediction. Unless otherwise noted, we performed all GP and RF regression with scikit-learn [62]. Because of the substantial dynamic range of our many outputs, we interpolate the log 10 luminosity (for placement) or AB magnitudes (for all other results). Unless otherwise noted, we quantify the performance of our interpolation with the RMS difference between our prediction and the true value [This expression overweights the importance of large errors when the source is not detectable at late times; see We employ GP interpolation in two standard use cases. In the first case, used for our exported production results, we interpolate the AB magnitude m α (t * |Λ) at some fixed reference time t * and band α versus our four simulation hyperparameters (and, in the end, also across the extrinsic parameters of angle and wavelength) contained in Λ. In this case, the prediction y(x a ) has a single scalar value at each point; the x a refer to model hyperparameters; and the interpolation provides us with a scalar function of four or more variables. GP regression [Eq. (1)] provides an error estimate for m α at this specific time t * , which in general will depend on time.
In the second case, used for simulation placement, we interpolate the log bolometric luminosity light curve log 10 L bol (t|Λ) versus all time. [In terms of each simulation's spectrum, the bolometric luminosity is L bol = 4πR 2 ∞ 0 F ν dν where R = 10 pc.] In this case, the prediction y(x a ) is vector-valued at each point; the x a refer to model hyperparameters; and the interpolation provides us with a vector-valued function of four or more variables. For simplicity and given our use case, we reduce our error estimate to a single overall value for the entire light curve, reflecting the overall uncertainty in y(x a ).

C. Active Learning Scheme
Gaussian processes have long been used for active learning because they provide an error estimate: followup simulations can be targeted in regions with the largest expected error (and thus improvement) [63]. We follow this approach in our active learning scheme; see [64][65][66][67][68][69][70][71] for a broader discussion of active learning methods and their tradeoffs. To reduce the data volume needed for targeting followup simulations, we used vector-valued interpolation as described above, applied to orientationaveraged outputs of our simulations. This approach has the substantial advantage of providing a single error estimate per light curve (both in training and off-sample), which we can immediately use as an objective function in a minimization algorithm.
We pursued an active learning simulation placement approach in order to maximally explore the parameter space and reduce the amount of redundant information obtained from each new simulation. The subset of 36 light curves discussed in Section II A was used as the initial training set. Thousands of parameter combinations were subsequently drawn from uniform distributions with maxima and minima matching those of the varied parameters in Table I. Each of these parameter combinations was evaluated by an interpolator to produce an initial light-curve prediction as well as an error on the entire light-curve output. The prediction with the largest error across all the tested parameter combinations was selected as the next placed simulation.

D. Prediction Improvement and Interpolation Results
We verified our active learning strategy for simulation placement by randomly sampling combinations of parameters and creating two light curve predictions based on those parameters. The first prediction was trained solely on our initial grid of simulations from Section II A, while the second prediction was trained on the same initial grid, but with an added simulation output characterized by the aforementioned random combination of parameters. Figure 3 shows these before-and after-inclusion predictions which show that, as expected, the GP interpolation capability is improved. This pair of figures anecdotally illustrates the degree to which new training data improves our surrogate light curve models.
With over 400 placed simulations since the start of the active learning process, the training library is built up enough to allow for physically meaningful interpolation of off-sample events. The performance of our adaptive learning is best illustrated with our productionquality interpolation scheme, illustrated in Figure 4 and described in the next section.
Despite producing many follow-up simulations, we achieve success with a very sparse coverage of our parameter space. To illustrate the sparsity of our parameter space coverage, and how slowly our added simulations increase coverage, we evaluated the median "intersimulation" distance, using a simple Euclidean (L 2 ) norm over log 10 L bol (t k ) for several reference times t k . As expected given the high apparent dimension of our output, this median distance changes very slowly with n, owing to the large effective dimension of the output light curves.
The median distance is also larger than the residual error in our fit, as reported below. The success of our interpolation relies not on an overwhelmingly large training sample, but on the smoothness and predictability of our physics-based light curves.

A. Stitched fixed-time interpolation
To efficiently interpolate across the whole model space, we follow a strategy illustrated in Figure 1 of [72]: we pick several fiducial reference times t q (and angles); use GP interpolation to produce an estimate m α (t q |Λ) versus Λ; interpolate in time to construct a continuous light curve at the model hyperparameters Λ at each reference angle; and then interpolate in angle to construct a light curve for an arbitrary orientation. For an error estimate, we stitch together the error estimates in each band to produce a continuous function of time. Figure 4 shows the output of our interpolation (smooth lines), compared to a validation simulation at the same parameters (dashed lines). Our predictions generally agree, though less so for the shortest wavelengths at the latest times. Subsequent figures also illustrate the typical GP error estimate, which is usually O(0.1) in log 10 L for most bands and times considered.

B. Trends identified with interpolated light curves
In Figure 5 we show the results of our fit evaluated at a fixed viewing angle (θ = 0), varying one parameter at a time continuously, relative to a fiducial configuration with M d = M w = 0.01M , v w /c = v d /c = 0.05. The fixed value for the ejected mass of M = 0.01M was chosen as the middle ground of the initial grid's sampled mass space, which does not introduce any biases toward lighter or heavier masses. Since no similar central value was initially available for the velocity parameters, the lower value was selected in the case of both components. The slower velocity resulted in the ejecta not dissipating as quickly and allowed for more variation in the light curves as the non-static parameter was varied. For this viewing angle, changes in the amount and velocity of the dynamical ejecta have relatively modest effect, in large part because that ejecta is concentrated in the equatorial plane. By contrast, changes in the mostly polar wind ejecta has a much more substantial impact on the polar light curve (θ = 0). Specifically, increasing the amount of wind ejecta brightens and broadens the light curve, as expected from classic analytic arguments pertaining to how much material the light must diffuse through [5,[73][74][75]. Similarly, increasing the velocity of wind ejecta causes the peak to occur at earlier times (diffusion is easier) and be brighter.

C. Interpolation in viewing angle
All of the interpolated light curves discussed thus far have been trained at some fixed viewing angle. In Figure 6, we explore the interpolation of several families of models, each of which was trained using simulation data at a different viewing angle. The symmetry of the ejecta across the orbital plane allows for the assumption that any angular variation between 0 and π/2 can simply be mirrored across the symmetry axis. Figure 6 indicates that the first day post-merger does not introduce much angular variation and, as such, is quite well predicted even when interpolating across only For comparison, the heavy dashed lines show the initial training simulation results for the two parameter endpoints. The g-band light curve has the largest dynamic range and is the most sensitive to interpolation errors; notably, the interpolation does not always conform tightly to the underlying simulation data at late times. 11 angles. After 1 day, the luminosity across different angles begins to change considerably as the peanut-shaped wind ejecta becomes more dominant. Particularly at late times, there is a strong angular variability which manifests near the orbital plane, most strongly apparent in the blue (g) and near infrared (K) bands. In the blue bands, the angular variation reflects lanthanide curtaining; in the red bands, the angular variation reflects red emission from the late-peaking red dynamical ejecta. [At the latest times and faintest luminosities along the equatorial plane, numerical uncertainty in our Monte Carlo simulations are apparent in the light-curve results.] In all panels, the solid band denotes an estimated error bar from our GP fit in time, extended in angle.

D. Predictive Accuracy versus time, angle grid sizes
To better understand the systematic limitations and computational inefficiencies introduced by our stitchedtime interpolation grid, we investigated the accuracy of our fits when only using a subset of the time or angular grid.
First, we consider a simple analysis of loss of predictive accuracy as the number of GP interpolators used to make a surrogate light curve is decreased. We denote t ∈ T as the subset of times represented by the GP interpolators used to make a prediction, T as the total available number of time points, and thus interpolators, which can be used to make a light curve, andt as all the other times in T which are not represented by t such that t ∩t = 0 and t ∪t = T .
Thus, when using any number of interpolators at times t ∈ T which is less than the total number of possible time points T , we first generate predictions y(t) with the chosen subset of interpolators. These predictions y(t), along with the times t at which the predictions were made, are then used as inputs for SciPy's UnivariateSpline method from which the remainder of the light curve z(t) = f (t, y(t)) is constructed, where in this last case the function can be evaluated ∀t ∈ T . Figure 8 shows how the average residual between onsample light curve predictions and the respective simulation data changes as a function of the number of time points used as the base for constructing the timeinterpolated light curve. For the current scheme, we can remove up to roughly 75% of the initial set of time points without substantially diminishing our overall accuracy. Future work will explore smarter selection of representative time points in an effort to further reduce the number of interpolators which can be removed without significant loss of accuracy. Average residual as a function of number of considered time points: A plot of the average residuals between on-sample time-interpolated light curves and the respective simulation data as a function of how many time points are used to generate the light curves. In each case, we drew the respective number of samples from a log-uniform distribution between the start and end time of our light curves.

IV. PARAMETER INFERENCE OF RADIOACTIVELY-POWERED KILONOVAE
In this section, we describe and demonstrate the algorithm we use to infer kilonova parameters given observations, using the interpolated light-curve model above. Unless otherwise noted, for simplicity all calculations in this section assume the kilonova event time and distance are known parameters. We likewise assume observational errors are understood and well characterized by independent Gaussian magnitude errors in each observation, and that our model families include the underlying properties of the source (i.e., we neglect systematic modeling errors due to the parameters held constant in our simulation grid: morphology, initial composition, et cetera).

A. Framework and validation
As in many previous applications of Bayesian inference to infer parameters of kilonovae [23,24,[26][27][28], we seek to compare the observed magnitudes x i at evaluation points i (denoting a combination of band and time) to a continuous model that makes predictions m(i|θ) [henceforth denoted by m i (θ) for brevity] which depend on some model parameters θ. Bayes theorem expresses the posterior probability p(θ) in terms of a prior probability p prior (θ) for the model parameters θ and a likelihood L(θ) of all observations, given the model parameters, as Unless otherwise noted, for simplicity we assume the source sky location, distance, and merger time are known. We adopt a uniform prior on the ejecta velocity v/c ∈ [0.05, 0.3] and a log uniform prior on the ejecta masses m/M ∈ [10 −3 , 0.1]. We assume the observations have Gaussian-distributed magnitude errors with presumed known observational (statistical) uncertainties σ i , convolved with some additional unknown systematic uncertainty σ, so that where the sum is taken over every data point in every band used in the analysis. In tests, we treat σ as an uncertain model parameter, de facto allowing for additional systematic observational uncertainty (or for some systematic theoretical uncertainty). For our GP surrogate models, we set σ to the estimated GP model error.
Unlike prior work, we eschew Markov-chain Monte Carlo, instead constructing the posterior distribution by direct Monte Carlo integration as in [76,77]. To efficiently capture correlations, we employ a custom adaptive Monte Carlo integrator; see Champion et al. [78] for implementation details. In Appendix A, we describe several tests we performed to validate this inference technique using synthetic kilonova data drawn from a previously published semianalytic kilonova model. Our tests include recovering the parameters of a hundred synthetic kilonova sources. In future work, we will demonstrate how our parameter inference method can be incorporated efficiently and simultaneously with gravitational wave (GW) parameter inference with the rapid iterative fitting (RIFT) parameter estimation pipeline [77].
B. Inference with surrogate kilonova model Figure 9 demonstrates parameter inference using our surrogate light curves, for a synthetic source generated using our own model. As expected, we can recover a known source, including constraining the viewing angle θ. Figure 10 performs a similar test, but now using a specific simulation, without interpolation. As expected given our adopted systematic error, we recover the simulation parameters. Finally, Figure 11 repeats the test above, using

C. Example: GW170817
SuperNu-based kilonova models have already been successfully used to interpret GW170817, though as noted previously these models have a rapid falloff in the latetime optical magnitudes that is not present in the observations; see [29]. Because of the close proximity of GW170817, only distance modulus (but not redshift) corrections are needed to translate our predictions to apparent magnitudes which can be directly compared to electromagnetic observations. Observational results are taken from [24]'s compilation of photometry reported in [28,29,32,[79][80][81][82][83][84][85][86]. Figure 12 shows the results of directly comparing our extended simulation archive directly to observations of GW170817, selecting for simulations (parameters and angles) with the highest overall likelihood. The solid black curves in these figures show the 50 highest-likelihood configurations, where the likelihood requires simultaneously reproducing all observed bands. Except for reddest three bands (JHK), many simulations compare extremely favorably to the observations. The parameters of these simulations, however, do not represent the optimal parameters of this model family: because our placement algorithm minimizes interpolation error, the selected points preferentially occur at the edges of our domain. Finally, for the reddest band (K), our fits exhibit notable systematic uncertainty relative to the underlying simulation grid.
We have performed parametric inference on GW170817 using our surrogate light curve model to the underlying SuperNu results. Motivated by the direct comparisons above, we perform two analyses. In the first, we use all observing data at all times. In the second, we omit the reddest (K) band. Figure 13 shows the results of these comparisons. Because of the systematic fitting uncertainties at late times, we highlight the analysis omitting K-band observations as our preferred result. Though previously-reported inferences about ejecta masses cover a considerable dynamic range (see, e.g., Fig. 1 in [87]), our inferred masses are qualitatively consistent with selected previous estimates including previous inferences with similar SuperNu models [29] and recent surrogate models adapted to simplified multidimensional radiative transfer [22].
Notably, however, we infer a large amount of "dynamical" (red, lanthanide-rich) ejecta mass (i.e., M ej O(1/30)M ), more dynamical ejecta than wind, and the velocities for the dynamical and wind component are inverted relative to customary expectations (i.e., v d < v w ). Our dynamical and wind component masses differ from typical semi-analytical treatments (see e.g. [88]).
We also weakly constrain the misalignment angle between the outflow and the line of sight to be consistent with independent late-time radio observations, which (weakly supplemented with gravitational wave constraints) constrain the opening angle between the jet and the line of sight to be roughly 20 degrees [21,29,80,[89][90][91]; see also [22] for previous, weaker constraints on binary alignment from kilonova observations. Reanalyzing the light curves using this prior information, imposed as a Gaussian prior on θ with mean 20deg with σ θ = 5deg, we find modestly improved overall constraints on the ejecta; see also also, e.g., Figure 2 of [22] for previous, weaker constraints derived using joint radio and kilonova observations.

V. DISCUSSION
We have demonstrated that our surrogate models can be operationally compared to real kilonova observations, allowing us to deduce what the range of parameters for the original simulation family best fits the observations. In this section, we emphasize several systematic limitations of our approach, to more clearly distinguish the ways in which the answers so obtained could differ from a description of physical reality. When possible, we com-ment on ways in which these systematic limitations could be mitigated with future work.
First and foremost, our surrogate models introduce some modest bias, being an imperfect representation of the simulations they mimic. We have demonstrated that these errors are relatively small (see Figures 4 and 6). In this work, we principally employed two standard interpolation methods (GP and RF interpolation) to construct synthetic light curves. Recent substantial advances in machine learning have led to many new algorithms and architectures for adaptive learning and interpolation. Our prior work suggests that neural networks can also usefully interpolate kilonova light curves [36], which we will describe at greater length in future work. Other groups have also successfully produced surrogate light curves with modest error. Previously, surrogate light curves have been produced by interpolating the coefficients c g (Λ) of a basis-function expansion log L α (t|Λ) = g c g (Λ)φ g (t) [23,26], with the appropriate basis functions identified by principal component analysis of the raw simulation output. Because of the prohibitive cost of GP on large data sets, this analysis had to decimate input data to enable interpolation. Our reference-time method offers several notable advantages. The most important advantage is that our method is embarrassingly parallel-interpolations at every reference time can be performed independently, without need to select suitable basis functions in advance-and completely decoupled between time samples. Our interpolation is also inherently local in time, so artifacts inherited from late-time simulation data of low-photon-count light curves cannot contaminate our estimates of earlytime behavior. Finally, our method can in principle be applied to all available data, without decimation, particularly when we employ other interpolation techniques.
Second, we adopt simulations with an imperfect model of the relevant opacities and nuclear physics. For example, we have adopted a conventional nuclear mass and decay model to predict nuclear heating and element abundances [33,92]. We also find that although our detailed multifrequency opacities yield a more realistic representation of the physics in the system, the assumption of thermalization breaks down much sooner than anticipated. Uncertainties in nuclear physics can play a substantial role in kilonova light curves [7,8]. Given sufficient simulations, surrogate light curves can be constructed for a wide range of nuclear inputs. We defer a systematic treatment of nuclear physics uncertainties and non-LTE opacities to later work.
Third, we employ simulations with phenomenological initial conditions, that are not initialized with the appropriate orientation-dependent distribution of mass, velocity, and composition versus time. More suitable initial conditions could be provided by detailed disk simulations [93,94].
Fourth, we do not (and cannot) initialize our simulations with initial data that is set by physics of the merger. Unlike previous work [23,27], we have not adopted a re- lationship between our two-component ejecta parameters and the progenitor masses m 1 , m 2 , motivated by substantial uncertainty in the nuclear equation of state and remnant lifetime [19,[95][96][97]. Even the best-available fits have considerable systematic uncertainty [98]. Similarly, we have not adopted assumptions about the lifetime of any hypermassive remnant and the duration of neutrino illumination [99], nor have we incorporated radiation from any associated jet [52]. Instead, given substantial systematic uncertainty in merger simulations (relative to the small amount of ejecta), we treat the ejecta purely phenomenologically, implicitly allowing for many potential nuisance parameters to characterize the outflow.
Fourth, our models were trained on a subset of simulations with fixed ejecta morphologies and mass fractions for both components. The predictive capability of our interpolations is restricted to two-component models represented by the parameters in Table I. Our parameter estimation inherits these limitations and should thus also be considered for bias which stems from the selected subset of models in our interpolation training library.
Finally, we note that GW170817 was likely an exceptional case which contributed sufficient quantities of observational data for extremely informative parameter inference. Our methodology is still applicable in cases with sparser light curve data; however, the level of detail in the inference results will vary based on the aforementioned sparsity.

VI. CONCLUSIONS
We have adaptively constructed detailed anisotropic models for kilonovae that cover a four-dimensional space describing two components' masses and velocities. From these models, we have constructed surrogate multiband light curves which can be evaluated continuously over this space. We have demonstrated how our model can be used for kilonova source parameter inference, including the kilonova associated with GW170817. All of our input data products, fitted light curves and the code we used to produce them are available at https://github.com/markoris/surrogate_kne. The underlying full simulations are available at https://ccsweb.lanl.gov/astro/transient/ transients_astro.html.
Though we limited our study to a specific set of assumptions, this analysis is an important stepping stone towards a better understanding of kilonova systematics.
Recently, several studies have demonstrated that several physical assumptions can notably impact the deduced light curve. However, these impacts could have effects that are partially degenerate with modest shifts in ejecta properties. To understand the practical impact of these uncertainties, in future work we will employ our parameterized models with these sources of error.
In this work we emphasized inference on only phenomenological kilonova parameters. Several studies have demonstrated the value in using multimessenger information to more tightly constrain parameters like source inclination (see, e.g., [13,21,22,89]), even without adopting strong assumptions about the relationship between ejecta and progenitor masses. With such assumptions even stronger constraints have been widely explored. In future work we will show how the electromagnetic inference strategy applied here can be tightly and efficiently integrated with the RIFT parameter inference engine, enabling concordance inference about multimessenger sources. To validate our parameter inference codes, we implemented a standard semianalytic kilonova model previously presented in [24]. This model consists of singlecomponent, two-component, and three-component models, combined by flux addition and not allowing for anisotropy. In the following equations, M is the r-process ejecta mass (in M ) and v is the ejecta velocity. Note that for now we assume the ejecta consists entirely of r-process material, so M is the full ejecta mass. The radioactive heating rate at time t is given by [92]: Only a fraction of L in powers the kilonova, given by the thermalization efficiency th . This is approximated analytically in [46]: The parameters a, b, and d are constants that depend on the ejecta mass and velocity; an interpolation of Table 1 in [46] is used in the model. The bolometric luminosity is calculated via [75] L bol (t) where t d is the diffusion timescale, t d = 2κM/βvc, κ is the opacity, and β = 13.7 is a dimensionless constant related to the ejecta's geometry. Light curves are calculated by assuming the kilonova behaves as a blackbody photosphere that expands at a velocity v. The blackbody temperature is generally defined by its bolometric luminosity; however, once it cools to a critical temperature T c , the photosphere recedes into the ejecta and the temperature remains fixed. The photosphere temperature is where σ SB is the Stefan-Boltzmann constant. When T phot > T c , the photosphere radius is simply R phot = vt. When T phot = T c (i.e. the photosphere has receded into the ejecta), the photosphere radius is The flux density at frequency ν is given in [5]: F ν (t) = 2πhν 3 c 2 1 exp (hν/kT photo (t)) − 1 where D is the source distance. We use a fixed fiducial distance of D = 10 pc to calculate F ν (t), then calculate AB magnitude with a distance modulus if necessary.
To compute multi-component light curves we assume each component has a photosphere that evolves independently of the others. The total flux density is the sum of the flux densities of the individual components. The version of the model implemented in the code uses three components with fixed opacities (a blue component with κ = 0.5 cm 2 g −1 , a purple component with κ = 3, cm 2 g −1 and a red component with κ = 10 cm 2 g −1 ).

Validation test: random synthetic kilonova and probability-probability plot
We also demonstrated our inference technique using 100 randomly generated light curves, drawn uniformly from the same priors we use for inference and incorporating noise consistent with our Gaussian noise model. Using these 100 synthetic events and inferences, we can construct a probability-probability (P-P) plot [100], which corroborates that the one-dimensional marginal distributions are consistent. Figure 15 shows the results of our analysis. More extensive tests of the underlying integration algorithm, including PP plots using more complex and higher dimensional models, are reported elsewhere [78].