A linear-scaling algorithm for rapid computation of inelastic transitions in the presence of multiple electron scattering

Strong multiple scattering of the probe in scanning transmission electron microscopy (STEM) means image simulations are usually required for quantitative interpretation and analysis of elemental maps produced by electron energy-loss spectroscopy (EELS). These simulations require a full quantum-mechanical treatment of multiple scattering of the electron beam, both before and after a core-level inelastic transition. Current algorithms scale quadratically and can take up to a week to calculate on desktop machines even for simple crystal unit cells and do not scale well to the nano-scale heterogeneous systems that are often of interest to materials science researchers. We introduce an algorithm with linear scaling that typically results in an order of magnitude reduction in compute time for these calculations without introducing additional error and discuss approximations that further improve computational scaling for larger scale objects with modest penalties in calculation error. We demonstrate these speed-ups by calculating the atomic resolution STEM-EELS map, using the Fe L-edge transition, of a nanoparticle 80 Å in diameter in 16 hours, a calculation that would have taken at least 80 days using a conventional multislice approach.

(Dated: May 7, 2022)

I. INTRODUCTION
The presence of even small numbers of dopant atoms in a material can have a disproportionate effect on that material's properties, and thus characterization techniques that perform nanoscale chemical and elemental mapping are important investigative tools in materials science.
Furthermore, the presence of extended, non-periodic defects, interfaces, and nanoscale precipitate phases either naturally occurring or engineered for functionality requires quantitative atomic resolution analysis of fields of view that are substantially larger than crystal unit cells [1][2][3][4].
In scanning transmission electron microscopy electron energy-loss spectroscopy (STEM-EELS) a focused probe, which can be smaller than the width of an atom in state-of-the-art instruments, is raster scanned across a material, and the energy-loss spectrum is recorded for each scan position of the probe. Integrating regions of the spectra that correspond to energy losses characteristic of ionization of a particular atomic element allows for mapping the positions and concentrations of elements within the specimen [5]. Additionally, analysis of the fine structure of the energy-loss spectrum can reveal modified valence states of elements due to bonding [2,6,7].
A complicating factor for STEM-EELS interpretation is that strong multiple scattering of the electron probe before and after exciting an ionization event means that for quantitative work, such as measuring chemical concentrations, STEM-EELS results often need to be interpreted by performing forward simulations of an assumed structure [8].
The simulations typically combine either multislice or Bloch wave simulations with inelastic scattering cross-sections for different elements of interest [8][9][10][11]. Currently, STEM-EELS image simulations of simple crystalline structures can require up to a week of calculation time, even using graphical processing unit (GPU) accelerated simulation codes [12,13]. Algorithms with much faster run times that scale better with system size are required to simulate large objects, or those with more heterogeneous chemical properties than simple crystals, such as samples containing dopants, extended defects, interfaces, or entire nanoparticles. Recent improvements in the readout speed of electron spectrometers add additional impetus to efforts to speed up STEM-EELS simulations, as faster spectrometers allow for more routine imaging of much larger systems in STEM-EELS at atomic resolution.
A recent innovation in the field of electron microscope image simulation is the development of the plane wave reciprocal-space interpolated scattering matrix (PRISM) algorithm [14]. Instead of the traditional approach of calculating the propagation of the beam for each scan position in a STEM raster independently, the PRISM method uses the multislice algorithm to calculate the scattering matrix for a particular electron microscope experiment and stores it in memory. The scattering matrix, which propagates components of the electron illumination through the imaging object, can then be rapidly applied to the illumination wave function for each scan position to compute the output wave [15].
In this paper, we extend the PRISM algorithm to the problem of STEM-EELS image simulation, and demonstrate a substantial speed-up in calculation time. This paper is structured as follows: in Section II the underlying physics of STEM-EELS simulation is reviewed and we discuss both the scattering matrix operator and the multislice algorithm for numerical simulation of STEM images. In Section III we detail algorithmic implementation of both the conventional multislice approach and our novel PRISM approach to STEM-EELS calculation and discuss the predicted scaling of the run-time of both algorithms with different parameters such as simulation grid size and specimen thickness. Section IV shows results simulated using the new method, and compares accuracy and speed-up relative to the conventional multislice approach. Two further optimizations are discussed in this section: evaluating inelastic transitions on a cropped grid and using an inverse multislice operation to economize on the total number of multislice operations. These optimizations are shown to typically induce minimal errors while offering substantial reductions in compute time. This section is concluded with a calculation of the STEM-EELS image for an illustrative large heterogeneous object, in this case a FePt nanoparticle roughly 80 Å in diameter [16].

II. THEORY
In this section we briefly review the underlying physical theories behind the simulation of a relativistic electron propagating through a specimen of condensed matter. We introduce the transition potential to simulate the inelastic scattering of the probe electron due to the ionization of an electron bound to an atom within the specimen. Existing methods calculate the propagation of the electron probe to the plane of ionization and subsequent propagation of each inelastically scattered wave separately for each scan position and each inelastic transition of interest. We outline how we can calculate and store the scattering matrix operator that performs all of these steps and then rapidly apply it to all probes in the STEM raster.
We begin with the Schrödinger equation in reciprocal space for a fast electron interacting with the electrostatic potential of a specimen of condensed matter [17],

\[ \frac{d\psi_{\mathbf{g}}(z)}{dz} = -i\pi\lambda g^2\,\psi_{\mathbf{g}}(z) + i\sigma\sum_{\mathbf{h}} V_{\mathbf{g}-\mathbf{h}}\,\psi_{\mathbf{h}}(z). \tag{1} \]

The ψ_g(z) are Fourier coefficients of the fast electron wave function as a function of depth z in the specimen and Fourier space coordinates g (magnitude g) and h in the plane perpendicular to the direction of propagation. The Fourier coefficients of the electrostatic potential are denoted by V_g. The interaction constant σ = 2πm_eeλ/h², where m_e and λ are the (relativistically corrected) mass and wavelength of the electron, e is the electron charge and h is Planck's constant. Equation (1) is a set of coupled linear equations for which the solution can be written as the matrix-vector product

\[ \psi_{\mathbf{g}}(z) = \sum_{\mathbf{h}} \left[e^{i\left(-\pi\lambda g^2\delta_{\mathbf{gh}} + \sigma V_{\mathbf{g}-\mathbf{h}}\right)z}\right]\psi_{\mathbf{h}}(0) \equiv \left[\mathcal{S}(z)\,\boldsymbol{\psi}(0)\right]_{\mathbf{g}}, \tag{2} \]

where δ_gh is the Kronecker delta and the bold ψ is a vector containing the Fourier coefficients of the illumination ψ_h. For STEM the illumination is a coherent focused probe with functional form

\[ \psi(\mathbf{g}) = A(\mathbf{g})\, e^{-i\chi(\mathbf{g})}\, e^{-2\pi i\,\mathbf{g}\cdot\mathbf{R}}. \tag{3} \]

Here A(g) is the aperture function, a top-hat function with radius equal to the convergence semi-angle of the probe, χ(g) is the aberration function which takes into account probe aberrations such as defocus, spherical aberration and astigmatism, and R is the probe position [18].
Efficient numerical calculation of Eq. (2) typically proceeds through diagonalisation of the scattering matrix S (the Bloch wave method [19,20]), or through a split-step evaluation of the action of S on ψ, with the specimen first split into n_z slices in the beam direction and the operator involving the propagation matrix elements −πλg²δ_gh z/n_z and that involving the specimen interaction matrix elements σV_{g−h} z/n_z applied in alternating sequence:

\[ \boldsymbol{\psi}(z) \approx \left[e^{-i\pi\lambda g^2\delta_{\mathbf{gh}}\,z/n_z}\, e^{i\sigma V_{\mathbf{g}-\mathbf{h}}\,z/n_z}\right]^{n_z}\boldsymbol{\psi}(0). \tag{4} \]

This is called the "multislice method" in the electron microscopy literature [17] and has become the most popular method of evaluation of Eq. (2). This is because the e^{iσV_{g−h} z/n_z} operator (referred to as the transmission function) is diagonal in real space and a fast Fourier transform (an FFT, which we shall represent with the symbol F_{r→g} and its inverse operation as F⁻¹_{g→r}) [21] can be used to efficiently transform between real and reciprocal space,

\[ \boldsymbol{\psi}(z) \approx \left[e^{-i\pi\lambda g^2\delta_{\mathbf{gh}}\,z/n_z}\, \mathcal{F}_{r\to h}\, e^{i\sigma V(\mathbf{r})\,z/n_z}\, \mathcal{F}^{-1}_{g\to r}\right]^{n_z}\boldsymbol{\psi}(0). \tag{5} \]

The two-dimensional FFT has a favourable 2N² log N scaling, where N is the number of Fourier coefficients g included in the simulation. For brevity we will define the following operator as shorthand for a single multislice iteration, with Δz = z/n_z:

\[ \mathcal{M}(\Delta z) \equiv e^{-i\pi\lambda\Delta z\, g^2\delta_{\mathbf{gh}}}\, \mathcal{F}_{r\to h}\, e^{i\sigma V(\mathbf{r})\Delta z}\, \mathcal{F}^{-1}_{g\to r}. \tag{6} \]
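To make the multislice operator M(Δz) concrete, the following sketch implements a single iteration with numpy. It is an illustrative re-implementation, not the MATLAB code supplied with this paper; the function name, argument conventions and grid parameters are our own.

```python
import numpy as np

def multislice_step(psi, sigmaV, dz, wavelength, dx):
    """Apply one multislice iteration M(dz) to the wave function psi.

    psi:        (N, N) complex wave function sampled in real space
    sigmaV:     (N, N) projected potential of the slice times the interaction
                constant sigma, so that exp(1j * sigmaV * dz) is the
                transmission function
    dz:         slice thickness
    wavelength: (relativistically corrected) electron wavelength
    dx:         real-space pixel size
    """
    N = psi.shape[0]
    # Transmission function exp(i sigma V dz) is diagonal in real space
    psi = np.exp(1j * sigmaV * dz) * psi
    # Fresnel propagator exp(-i pi lambda dz g^2) is diagonal in reciprocal space
    g = np.fft.fftfreq(N, d=dx)
    g2 = g[:, None] ** 2 + g[None, :] ** 2
    propagator = np.exp(-1j * np.pi * wavelength * dz * g2)
    return np.fft.ifft2(propagator * np.fft.fft2(psi))
```

Because both exponential factors are pure phases, a single iteration conserves the total intensity of the wave function, which provides a quick sanity check of an implementation.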
By comparison the Bloch wave approach, which solves Eq. (2) through diagonalisation of a matrix operator containing the coefficients within the exponent of Eq. (2), has the less favourable N³ scaling [22]. A more in-depth explanation of the multislice algorithm with details on its implementation may be found in Ref. [18].
A recent innovation in STEM image simulation is the PRISM (plane-wave reciprocal-space interpolated scattering matrix) algorithm, in which only the rows of the matrix operator S(z), from Eq. (2), corresponding to the Fourier coefficients present in the illumination vector ψ_g are calculated using the multislice algorithm Eq. (5) and stored in memory. As will be explained in the next section, this approach can be used to accelerate the calculation of STEM-EELS images because the wavefunctions for the inelastically scattered electron can be rapidly propagated to the exit surface with the stored operator, instead of having to perform the multislice operation separately for each probe position and inelastic scattering event as in the conventional multislice approach.
For an inelastic transition at depth z, the scattered wave function ψ_n(r, z), where n is shorthand for an excited quantum state, is given by the product of the elastic wavefunction ψ_0(r, z) and an inelastic transition potential for ionization H_n0(r, z),

\[ \psi_n(\mathbf{r}, z) = H_{n0}(\mathbf{r}, z)\,\psi_0(\mathbf{r}, z). \tag{7} \]

The inelastic transition potential is given by Ref. [23],

\[ H_{n0}(\mathbf{r}) = \frac{e^2}{4\pi\epsilon_0}\int u_n^*(\boldsymbol{\kappa}, \mathbf{r}')\,\frac{1}{|\mathbf{r}-\mathbf{r}'|}\,u_0(\mathbf{r}')\, d\mathbf{r}', \tag{8} \]

where u_0(r′) is the wave function of an electron in a bound state of the specimen and u_n(κ, r′) is the wavefunction of that electron excited to a continuum (free) state with wave vector κ. The wave function in the continuum state is typically expanded in a spherical harmonic basis,

\[ u_n(\boldsymbol{\kappa}, \mathbf{r}') = \sum_{\ell', m'} i^{\ell'} e^{i\delta_{\ell'}}\, R_{\kappa\ell'}(r')\, Y_{\ell' m'}(\hat{\mathbf{r}}'), \tag{9} \]

where the Y_ℓm are spherical harmonic functions, ℓ, m, ℓ′ and m′ are the angular momentum and azimuthal angular momentum quantum numbers of the bound and continuum states, and δ_ℓ′ is a phase factor determined by fitting to an asymptotic form of the free state [23,24]. To a good approximation, only a small number of the different possible (ℓ′, m′) need be included in a converged calculation [13]. The kernel in Eq. (8) is the Coulomb potential, which mediates interactions between the fast electron and the sample, and ε_0 is the permittivity of free space. Calculations of Eq. (8) used in this paper are based on a solution using an angular momentum basis for u_0 and u_n which is derived in Ref. [23], the numerical implementation of which is discussed in Ref. [24].
We may simulate a STEM-EELS image by then propagating the wave function for the inelastically scattered electron, ψ_n from Eq. (7), to the exit surface using Eq. (2). We represent this process mathematically as

\[ \boldsymbol{\psi}_n = \mathcal{S}_n\,\boldsymbol{\psi}, \tag{10} \]

\[ \mathcal{S}_n \equiv \mathcal{S}_2\, \mathcal{F}_{r\to g}\, H_{n0}\, \mathcal{F}^{-1}_{h\to r}\, \mathcal{S}_1. \tag{11} \]

Here H_n0 is a matrix containing the values of H_n0(r) on its diagonal and S_n is the matrix operator we have defined in Eq. (11) for imaging of a single inelastic transition. Equation (10) must be solved for each inelastic transition of interest. Figure 1 is a diagrammatic representation of Eq. (11): Fig. 1(a) shows how S_1 maps Fourier coefficients of the probe wave function to wave functions at the depth of the inelastic transition (b), where we multiply these wave functions by the ionization transition potentials in Fig. 1(c) and then propagate them using S_2, Fig. 1(d), to points within the EELS aperture in Fig. 1(e).

III. ALGORITHMS
In this section we outline the details of the implementation of the simulation algorithms and write down estimates for the runtimes of each algorithm. We start with the algorithm for calculating STEM-EELS images using the multislice method:

    for x = 1 to n_x do                                    # loop over probe positions
      for y = 1 to n_y do
        ψ(g, 0) = A(g) e^{-2πi (x,y)·g}                    # initialise illumination wave function
        for i_z = 1 to n_z - 1 do                          # loop over slices of specimen
          for n = 1 to n_states,i_z do                     # loop over inelastic transitions within slice i_z
            ψ_n(r, i_z) = H_n0(r) ψ(r, i_z)                # calculate inelastically scattered electron
            for i'_z = i_z to n_z do                       # propagate ψ_n to exit surface of specimen
              ψ_n(g, i'_z) = M(Δz) ψ_n(h, i'_z - 1)        # multislice to advance ψ_n one slice
            end for
            I(x, y) = I(x, y) + ∫ |ψ_n(g, n_z)|² D(g) dg   # integrate over detector function D(g)
          end for
          ψ(g, i_z + 1) = M(Δz) ψ(h, i_z)                  # multislice to advance ψ_0 one slice
        end for
      end for
    end for

Assuming a constant number of inelastic transitions n_states at each slice, which is not true in general but is a useful assumption for estimating the computational complexity of the above algorithm, this algorithm requires

\[ N_{\rm MS} = n_{\rm states}\, n_z(n_z - 1)/2 + n_z \tag{12} \]

multislice operations for each probe position, where the arithmetic sum identity Σ_{i=0}^{n−1} i = n(n − 1)/2 has been invoked.
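The nested-loop structure above can be expressed as a small runnable toy in numpy. The probe model, potentials and detector below are placeholders chosen for brevity; they are not the quantities used in the paper's simulations, and for simplicity a single transition potential per slice is assumed.

```python
import numpy as np

def stem_eels_multislice(probe_positions, sigmaV, Hn0, wavelength, dz, dx, detector):
    """Toy conventional-multislice STEM-EELS loop (algorithm structure only).

    sigmaV:   (n_z, N, N) projected potential (times sigma) of each slice
    Hn0:      (n_z, N, N) one transition potential per slice (a simplification)
    detector: (N, N) boolean EELS detector mask in reciprocal space
    Returns the STEM-EELS image, one value per probe position.
    """
    n_z, N, _ = sigmaV.shape
    g = np.fft.fftfreq(N, d=dx)
    prop = np.exp(-1j * np.pi * wavelength * dz * (g[:, None] ** 2 + g[None, :] ** 2))

    def M(psi, iz):  # one multislice iteration through slice iz
        return np.fft.ifft2(prop * np.fft.fft2(np.exp(1j * sigmaV[iz] * dz) * psi))

    gy, gx = np.meshgrid(g, g, indexing="ij")
    image = np.zeros(len(probe_positions))
    for ip, (x, y) in enumerate(probe_positions):
        # Idealised aberration-free, aperture-limited probe centred on (x, y)
        A = ((gx ** 2 + gy ** 2) < (0.5 * g.max()) ** 2).astype(complex)
        psi = np.fft.ifft2(A * np.exp(-2j * np.pi * (gx * x + gy * y)))
        for iz in range(n_z):
            psi_n = Hn0[iz] * psi            # inelastically scattered wave
            for jz in range(iz, n_z):        # propagate to the exit surface
                psi_n = M(psi_n, jz)
            amp = np.fft.fft2(psi_n)
            image[ip] += np.sum(np.abs(amp[detector]) ** 2)
            psi = M(psi, iz)                 # advance the elastic wave one slice
    return image
```

The inner propagation loop, repeated for every slice and every probe position, is exactly the source of the quadratic thickness scaling counted in Eq. (12).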
In our proposed PRISM STEM-EELS algorithm we calculate two scattering matrices at each slice, the first to propagate the probe wave function to the slice of the inelastic transition and the second to propagate the wave function for the inelastically scattered electron to the exit surface.
    initialize scattering matrices
    for i_z = 1 to n_z do                                  # loop over slices of specimen
      update S_1 and S_2 for slice i_z
      for n = 1 to n_states,i_z do                         # loop through transitions within slice i_z
        S_n,i_z = S_2 F H_n0 F^{-1} S_1                    # calculate Eq. (11)
        I(x, y) = I(x, y) + |S_n,i_z ψ(g, x, y)|²          # apply S_n,i_z to each illumination vector
                                                           # in the raster scan and add the resulting
                                                           # amplitude to the STEM image
      end for
    end for

An important point to note is that, as defined here, the rows of S_1 correspond to points in reciprocal space within the probe-forming aperture and the columns correspond to real space points in the specimen plane, so S_1 is generally a non-square matrix due to the different sampling of these two planes. Each column can be calculated by using multislice to propagate the plane wave components that fall within the probe-forming aperture through the specimen, as shown in Fig. 1 [14]. Conversely, the rows of S_2 correspond to real space points in the specimen plane and the columns correspond to points within the EELS detector. The most efficient way to calculate S_2 is to propagate plane wave components with transverse momenta within the EELS acceptance aperture back through the specimen. This is formally the transpose of the multislice operation defined in Eq. (6), which is represented by a superscript T in the above algorithm description, and is given by

\[ \mathcal{M}^{T}(\Delta z) \equiv \mathcal{F}_{r\to g}\, e^{i\sigma V(\mathbf{r})\Delta z}\, \mathcal{F}^{-1}_{h\to r}\, e^{-i\pi\lambda\Delta z\, g^2\delta_{\mathbf{gh}}}. \tag{13} \]

The PRISM STEM-EELS algorithm requires

\[ n_z\left[N_1 + N_2(n_z + 1)/2\right] \tag{14} \]

multislice operations, where N_1 is the number of rows in matrix S_1, which correspond to reciprocal space points in the illumination, and N_2 is the number of columns in matrix S_2, which correspond to reciprocal space points in the EELS detector. Since the PRISM STEM-EELS simulation algorithm is typically more economical with the required number of multislice iterations, the matrix multiplication step, evaluating Eq. (10), tends to be the rate-limiting step. This means that although the calculation time is technically quadratic in n_z, in many cases run time scaling is instead predominantly determined by the number of unique transitions.
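The construction of S_1 by forward multislice of aperture plane waves, of the rows of S_2 by transpose multislice of detector components, and their assembly into S_n for a single transition [Eq. (11)] can be sketched as follows. The FFT normalization, beam-indexing convention and function names are our own assumptions for this toy; the transpose form used here is valid because the Fresnel propagator is symmetric under g → −g.

```python
import numpy as np

def multislice(psi, t, prop):
    """One forward multislice iteration: transmit (t) then propagate (prop)."""
    return np.fft.ifft2(prop * np.fft.fft2(t * psi))

def multislice_T(psi, t, prop):
    """Transpose of `multislice`: propagate then transmit. This equals the
    matrix transpose because the propagator is symmetric under g -> -g."""
    return t * np.fft.ifft2(prop * np.fft.fft2(psi))

def prism_eels_smatrix(t_slices, prop, Hn0, iz, aperture_beams, detector_beams):
    """Build S_n = S2 H_n0 S1 for one transition potential Hn0 at slice iz.

    t_slices: list of (N, N) transmission functions exp(i sigma V dz)
    aperture_beams / detector_beams: lists of (i, j) reciprocal-space pixel
    indices. Returns S_n mapping aperture coefficients to detector amplitudes.
    """
    N = prop.shape[0]
    n_z = len(t_slices)
    S1 = np.empty((N * N, len(aperture_beams)), dtype=complex)
    for col, (gi, gj) in enumerate(aperture_beams):
        amp = np.zeros((N, N), dtype=complex)
        amp[gi, gj] = 1.0
        psi = np.fft.ifft2(amp)               # basis plane wave for this beam
        for k in range(iz):                   # propagate to the transition depth
            psi = multislice(psi, t_slices[k], prop)
        S1[:, col] = psi.ravel()
    S2 = np.empty((len(detector_beams), N * N), dtype=complex)
    for row, (gi, gj) in enumerate(detector_beams):
        amp = np.zeros((N, N), dtype=complex)
        amp[gi, gj] = 1.0
        psi = np.fft.fft2(amp)                # exit-surface detector component
        for k in range(n_z - 1, iz - 1, -1):  # transpose multislice, backwards
            psi = multislice_T(psi, t_slices[k], prop)
        S2[row, :] = psi.ravel()
    return S2 @ (Hn0.ravel()[:, None] * S1)   # Eq. (11): S2 H_n0 S1
```

Applying the returned matrix to a vector of aperture coefficients reproduces, exactly, the result of propagating the probe forward slice by slice, multiplying by the transition potential, and propagating to the detector.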
If we further assume that the number of transitions in each slice is roughly constant then the scaling will be roughly linear with the number of slices n z . In the upcoming Sec. IV C, we will introduce an approximation that makes the scaling truly linear with n z .
Comparing the run time of the PRISM STEM-EELS algorithm to the conventional multislice approach requires accounting for the relative speed of FFTs to array multiplication. To explore this question we make the approximation that the computation time of a two-dimensional FFT can be parametrized as T FFT = AN 2 log N and that of an array multiplication (as used in the multislice operation) can be parametrized as T mult = BN 2 and that of an array multiplication and summation step (as would occur in a step of a matrix multiplication where a single row and column are multiplied and added to the final result) can be parametrized T addmult = CN 2 .
Here A, B and C are constants and N is the size of the (square) grid used in computation. Measurements of T_FFT, T_mult and T_addmult for different values of N for MATLAB are plotted in Fig. 2, for which values of A = 1.0 × 10⁻⁹, B = 9.0 × 10⁻⁹ and C = 3.8 × 10⁻⁹ were fitted (fitted functions are plotted with dashed lines). However, for the most accurate estimates of the runtime of calculations at a specific pixel grid size N, in the examples provided in this paper we will simply measure T_FFT, T_mult and T_addmult at that pixel grid size N. We also note that the platform and compiler can impact the relative speeds of different algorithms.
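The measurement and fitting procedure can be sketched as below. The helper names `measure_timings` and `fit_timing_constants` are our own, and on different hardware the fitted constants will of course differ from the values quoted above.

```python
import time
import numpy as np

def measure_timings(N, repeats=10):
    """Measure T_FFT, T_mult and T_addmult for an N x N complex grid."""
    a = np.random.rand(N, N) + 1j * np.random.rand(N, N)
    b = np.random.rand(N, N) + 1j * np.random.rand(N, N)

    def timeit(fn):
        t0 = time.perf_counter()
        for _ in range(repeats):
            fn()
        return (time.perf_counter() - t0) / repeats

    # 2D FFT, elementwise multiply, and multiply-accumulate respectively
    return timeit(lambda: np.fft.fft2(a)), timeit(lambda: a * b), timeit(lambda: np.sum(a * b))

def fit_timing_constants(Ns, t_fft, t_mult, t_addmult):
    """Least-squares fit of A, B, C in the models T_FFT = A N^2 log N,
    T_mult = B N^2 and T_addmult = C N^2."""
    Ns = np.asarray(Ns, dtype=float)
    A = np.linalg.lstsq((Ns ** 2 * np.log(Ns))[:, None], np.asarray(t_fft), rcond=None)[0][0]
    B = np.linalg.lstsq((Ns ** 2)[:, None], np.asarray(t_mult), rcond=None)[0][0]
    C = np.linalg.lstsq((Ns ** 2)[:, None], np.asarray(t_addmult), rcond=None)[0][0]
    return A, B, C
```

In practice one would call `measure_timings` over a range of grid sizes and pass the results to `fit_timing_constants`; the fit is exact when the data follow the assumed models.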
We assume a simulation object measuring L × L in the plane perpendicular to beam propagation and of thickness z, which we divide into n_z = z/Δz slices. This object would require a STEM scan with a Nyquist sampling of (4Lα)² probe positions, where α is the probe-forming aperture in units of inverse length [12]. Assuming a constant number of ionization states at each slice, not true in general as atoms of the element of interest might not be uniformly dispersed throughout the sample, but a useful approximation for the current timing estimates, the computation time for the conventional multislice approach will be given by

\[ T_{\rm MS} = (4L\alpha)^2\left[n_{\rm states}\, n_z(n_z - 1)/2 + n_z\right]\left(2T_{\rm FFT} + 2T_{\rm mult}\right), \tag{15} \]

since each multislice iteration involves two FFTs and two array multiplications. The speed of the PRISM STEM-EELS calculation will depend on the size of the matrices used. For matrix S_1, the number of rows will depend on the sampling of the probe-forming aperture function A(q) in Eq. (3), which covers a reciprocal space area of πα² Å⁻². A simulation cell size of L implies a natural reciprocal space sampling of L pixels per unit of inverse length for the illumination [18], though this can be further reduced to L/f by using a PRISM interpolation factor of f, an optimization described in detail in Ref. [14]. The number of rows in the scattering matrix will then be given by N_1 = πα²L²/f². By similar reasoning, the second scattering matrix S_2, which propagates the inelastically scattered electrons from the plane of ionization to the EELS aperture, will have a number of columns equal to N_2 = πβ²L²/f², where β is the diffraction space size of the EELS aperture. The time required for the multislice iterations necessary for the PRISM algorithm is therefore given by

\[ T_{\mathcal{S}} = n_z\left[N_1 + N_2(n_z + 1)/2\right]\left(2T_{\rm FFT} + 2T_{\rm mult}\right). \tag{16} \]

The number of columns in S_1 and the number of rows in S_2 will be given by the square of half the total number of pixels in the simulation cell, (N/2)².
These matrices need only have a side length of N/2 since the output from a multislice calculation is bandwidth limited to either 1/2 (as in the implementation used for this investigation) or 2/3 of the total array size, and only spatial frequencies within this band limit need be kept in the scattering matrix. For more detail on the need for this bandwidth limiting approach the reader is referred to Sec. 6.8 of Ref. [18]. With reference to the STEM-EELS simulation using scattering matrices, we must sum the computation times of the multislice iterations and the matrix multiplications,

\[ T_{\rm matrix} = n_{\rm states}\, n_z\left[\frac{N_1}{4}\,T_{\rm mult} + \frac{N_1 N_2}{4}\,T_{\rm addmult} + (4L\alpha)^2\,\frac{N_1 N_2}{N^2}\,T_{\rm addmult}\right], \tag{17} \]

where the first term is the time required to calculate the matrix multiplication of S_1 and H_n0 in Eq. (11), the second term the time required to do the second matrix multiplication with S_2 in Eq. (11) and the third term the time required to do the matrix multiplication S_n ψ for each probe ψ in the STEM raster in Eq. (10). In the following section we discuss a MATLAB implementation of the STEM-EELS simulation algorithms discussed above and show that the expressions in Eqs. (15), (16) and (17) are correct estimators of the runtime of these calculations.
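The multislice operation counts for the two algorithms can be compared directly. The helper functions below encode the counts as estimated in this section (with S_2 rebuilt from the exit surface at every slice); the parameter values in the test are illustrative only.

```python
def multislice_ops_conventional(n_probe, n_states, n_z):
    """Multislice operations for the conventional algorithm, summed over all
    probe positions: n_probe * [n_states * n_z(n_z - 1)/2 + n_z]."""
    return n_probe * (n_states * n_z * (n_z - 1) // 2 + n_z)

def multislice_ops_prism(N1, N2, n_z):
    """Multislice operations to build the PRISM scattering matrices, assuming
    S2 is rebuilt from the exit surface at every slice: n_z*N1 + N2*n_z(n_z+1)/2."""
    return n_z * N1 + N2 * n_z * (n_z + 1) // 2
```

The key structural difference is visible immediately: the conventional count is proportional to the number of probe positions, while the PRISM count is independent of it, which is why the PRISM approach wins for realistic raster sizes.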
Memory requirements for both algorithms also merit discussion. At a minimum the multislice algorithm requires only arrays containing the Fresnel free-space propagator, transmission function, probe and ionization transition potential, so four N × N complex-valued arrays. If there is sufficient memory, as is the case for most simulations, the transmission functions for all the slices will also be stored rather than calculated on the fly, an additional n_z N × N complex arrays. The PRISM STEM-EELS algorithm adds the requirement that two scattering matrices be stored in memory, an additional πα²L²/f² + πβ²L²/f² arrays of size N/2 × N/2. By way of example we discuss the requirements for simulation of the nanoparticle performed in the upcoming Sec. IV D.
The object was sampled on a 1836×1836 grid and partitioned depth wise into 45 slices. The minimum of 4 single-precision complex floating-point arrays for the conventional multislice approach would take up 107 MB of memory. Storing the transmission functions takes up a further 1.21 GB. The number of columns in the first scattering matrix S 1 and the number of rows in the second scattering matrix S 2 are both 325. Each of these S-matrix rows and columns has size equal to a 918×918 grid so approximately 4.38 GB is required to store both matrices. This is a substantially greater amount of memory than is the case for the equivalent multislice calculation, though we note that with current technology most high-end graphics cards have 8 GB or greater of memory and this example calculation is for a larger simulation cell than has typically been attempted before.
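The memory figures quoted above follow from simple arithmetic (8 bytes per single-precision complex element, decimal GB), which the following snippet reproduces.

```python
def smatrix_memory_gb(N, n_z, n_beams_S1, n_beams_S2, bytes_per_complex=8):
    """Decimal-GB memory estimates for the example in the text
    (single-precision complex, 8 bytes per element).

    Returns (base multislice arrays, stored transmission functions,
    both scattering matrices)."""
    grid = N * N * bytes_per_complex
    base = 4 * grid                          # propagator, transmission, probe, Hn0
    transmission = n_z * grid                # one transmission function per slice
    half_grid = (N // 2) * (N // 2) * bytes_per_complex
    smatrices = (n_beams_S1 + n_beams_S2) * half_grid
    return base / 1e9, transmission / 1e9, smatrices / 1e9
```

For the nanoparticle example (N = 1836, 45 slices, 325 beams in each scattering matrix on a 918 × 918 half-grid) this gives approximately 0.107 GB, 1.21 GB and 4.38 GB respectively, matching the figures in the text.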

IV. RESULTS AND DISCUSSION
A. Implementation of a scattering matrix based STEM-EELS simulation method

In this section we report results from a MATLAB implementation of the conventional multislice algorithm and the new PRISM algorithm. This implementation is included in the supplementary materials of this paper. It is included here to provide an accessible demonstration of the two algorithms introduced in the previous Sec. II; the code does not calculate the inelastic transition potentials H_n0 given by Eq. (8) but uses a single calculated H_n0 outputted from the µSTEM code [10]. As a test case we calculate STEM-EELS images of a single transition (ℓ = 0, m = 0) → (ℓ′ = 1, m′ = 1) for ionization of the O 1s orbital (the K edge) using both the conventional multislice method (red solid line) and the new PRISM approach (blue solid line) for thicknesses between 10 Å and 100 Å. A 2 × 2 tiling of the SrTiO3 unit cell (measuring 7.81 Å × 7.81 Å) was used as the specimen, with a 160 × 160 pixel grid, parameters which are likely to result in somewhat unconverged calculations but which give faster runtimes for both algorithms and so allow rapid comparison of results. To simplify comparison of the conventional multislice and PRISM-EELS results, thermal vibrations of the atoms were turned off for this calculation. The actual timings for each algorithm are compared in Fig. 3 with the relevant estimates from Eq. (15) for the multislice case using measured T_FFT, T_mult and T_addmult for a 160 × 160 pixel grid (red dashed line) and the sum of Eqs. (16) and (17) for the PRISM case (blue dashed line), showing that these equations give reasonable estimates of computation time for these simulations. Images for the PRISM and conventional multislice calculations are shown for the thicknesses 10, 50 and 80 Å. We compare differences in the images using the following normalized root sum of squares percentage error metric,

\[ \varepsilon = 100\sqrt{\frac{\sum_{x,y}\left[I_{\rm PRISM}(x, y) - I_{\rm MS}(x, y)\right]^2}{\sum_{x,y} I_{\rm MS}(x, y)^2}}. \tag{18} \]

These are tabulated in Fig. 4, both for the total image (total error, ε_T) and in a 6 × 6 pixel window centered on the Ti-O column (site error, ε_S). Total error ε_T is typically less than 0.01% and site error ε_S is between 0.001% and 0.02%. Such a low discrepancy confirms that the PRISM and conventional multislice approaches indeed encapsulate the same scattering physics. The PRISM method is faster for all thicknesses and exhibits the pseudo-linear scaling predicted for this algorithm, whilst the scaling of the conventional multislice method is quadratic with thickness.
B. Algorithm speed ups: calculating inelastic transitions on a cropped grid.
Further speed-ups, with a modest penalty to accuracy, can be achieved by evaluating the matrix multiplication in Eq. (11) only in a fraction of the grid centered about the site location of the transition event H_n0(r), rather than over the whole simulation grid. To demonstrate this optimization we simulate STEM-EELS images of a single Mn dopant occupying an Sr site midway through a 50 Å thick SrTiO3 crystal. An L-shell transition for Mn is shown in Fig. 5 where Eq. (11) is (a) evaluated over the full grid (which has a side length of approximately 8 Å), and in cropped regions with side lengths measuring approximately (b) 4 Å and (c) 2 Å. The size of these windows relative to the transition potential H_n0(r) is indicated in Fig. 5(d). Only the transitions (ℓ = 1, m = 1) → (ℓ′ = 2, m′ = 2) and (ℓ = 1, m = −1) → (ℓ′ = 2, m′ = −2) for energy losses 1 eV above the ionization threshold were included for the purpose of this demonstration. The percentage errors, relative to the image in Fig. 5(a), are written on images Fig. 5(b) and (c). Both the total image error ε_T and the error for the atomic site ε_S, which is evaluated only for a 1 Å window around the Mn atomic position, are indicated. The site error ε_S is the most relevant metric for our purposes since elemental concentration mapping would typically proceed by integrating the STEM-EELS signal in a window centered on the atomic site and then relating the result to a pre-computed look-up table. A site error of 0.2%, as in Fig. 5(b), is likely acceptable, though a site error of 11.4%, as in Fig. 5(c), is likely too high, suggesting the cropping window chosen was too aggressive. Inspection of a linescan in Fig. 5(e) shows that in the image from Fig. 5(c) the long tails of the Mn transition potential H_n0 have been cropped out. Figure 5(f) details the significant speed-up benefits of this approach.
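The cropped-window product can be sketched as follows. The windowing convention (periodic wrap-around, window centred on the site pixel) and the function name are assumptions of this sketch rather than details taken from the paper's implementation.

```python
import numpy as np

def cropped_transition(psi, Hn0_window, site, N_win):
    """Evaluate psi_n = Hn0 * psi only in an N_win x N_win window centred on
    the transition site; outside the window the product is treated as zero.

    psi:        (N, N) complex elastic wave function
    Hn0_window: (N_win, N_win) transition potential cropped to the window
    site:       (row, col) pixel index of the atomic site
    """
    N = psi.shape[0]
    r0, c0 = site[0] - N_win // 2, site[1] - N_win // 2
    rows = (np.arange(N_win) + r0) % N      # periodic boundary conditions
    cols = (np.arange(N_win) + c0) % N
    psi_n = np.zeros_like(psi)
    psi_n[np.ix_(rows, cols)] = Hn0_window * psi[np.ix_(rows, cols)]
    return psi_n
```

Only N_win² of the N² elementwise products are evaluated, which is the source of the quadratic speed-up with window side length noted in the text; the approximation error comes entirely from the discarded tails of H_n0.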
The speed-ups are quadratic, which is consistent with the array multiplication scaling quadratically with the number of grid pixels: the calculation time of Eq. (11) is observed to roughly quarter with each halving of the side length of the window in Fig. 5(a). For this particular transition potential a cropping box with side length of 4 Å gives the best balance between calculation speed-up and loss of accuracy.

FIG. 6. Percentage errors in the PRISM STEM-EELS calculations with the inverse multislice optimization relative to conventional multislice calculations, as well as the percentage speed-up relative to PRISM STEM-EELS calculations without the inverse multislice optimization.

C. Algorithm speed ups: inverse multislice
In the PRISM STEM-EELS algorithm described in Sec. IV A, the scattering matrix S_2, which propagates the inelastically scattered electron wave ψ_n from the depth at which ionization occurred to the exit surface, is calculated from scratch for each thickness i_z. This is a duplication of work, since S_2 was at some point calculated for all thicknesses in the initialization step S_2 = (M^T)^{n_z}. One approach would be to store S_2 for each slice, which is not practical for most calculations given the size of the scattering matrix (a complex-valued array of size πα²L²/f² × N × N). A second approach, which sacrifices some accuracy in exchange for reduced calculation time, is to perform the inverse of the multislice operation (M⁻¹) to retreat the scattering matrix S_2 by a single slice. The inverse multislice operation is defined

\[ \mathcal{M}^{-1}(\Delta z) \equiv \mathcal{F}_{r\to g}\, e^{-i\sigma V(\mathbf{r})\Delta z}\, \mathcal{F}^{-1}_{h\to r}\, e^{i\pi\lambda\Delta z\, g^2\delta_{\mathbf{gh}}}. \tag{19} \]

Relative to the forward multislice operation defined in Eq. (6), the order of the FFT, multiplication, inverse FFT and propagation steps has been reversed and the complex conjugate of the transmission function and propagation operators is used instead. This can introduce some error since a forward multislice iteration can cause electrons to scatter to high angles outside the bandwidth limit of the calculation, and these electrons will not be recovered with an inverse operation. For the simulation of the SrTiO3 in Sec. IV A we report the percentage errors, both total error ε_T and site error ε_S, in Fig. 6. For this case, the errors relative to conventional multislice calculations are found to be around 1%, which is small, whilst the speed-up relative to PRISM calculations without the inverse multislice optimization is around 20% for some of the thicker cases considered.
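In an idealized calculation without bandwidth limiting, the transmission function and Fresnel propagator are pure phase factors, so Eq. (19) inverts Eq. (6) exactly; the sketch below demonstrates this. In a real band-limited simulation the inversion is only approximate, for the reason given above, and the function name and conventions here are our own.

```python
import numpy as np

def multislice_inv(psi, t, prop):
    """Inverse multislice M^{-1}: undo the propagation, then undo the
    transmission, using the complex conjugates of the (unit-modulus)
    propagator `prop` and transmission function `t`."""
    return np.conj(t) * np.fft.ifft2(np.conj(prop) * np.fft.fft2(psi))
```

Applying `multislice_inv` to the output of a forward iteration recovers the input wave function to machine precision in this toy, because no band-limiting aperture has discarded any scattered amplitude.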

D. Calculations for heterogeneous nanometer scale objects
In Sec. IV A we showed that using the PRISM approach for STEM-EELS simulations results in computation times that scale more favourably with specimen thickness, and that calculation time can be further reduced by evaluating the inelastic scattering cross section, Eq. (11), only in a fraction of the calculation grid centered on the transition of interest and by using the inverse multislice optimization. In this section we demonstrate how these improvements, taken together, allow calculations of much larger objects than has previously been feasible, in this case a FePt nanoparticle that is approximately 80 Å in diameter. The atomic coordinates for this nanoparticle were reconstructed from a STEM-HAADF tomography tilt series in Ref. [16] and the object contains 6,569 Fe atoms and 16,627 Pt atoms. The projected potential of the nanoparticle is plotted in Fig. 7(a) with the Pt potential shown in blue and the Fe atoms shown in red. The potential resulting from only the Fe atoms is plotted separately in Fig. 7(b). Both potentials have been convolved with a Gaussian function (σ = 2 pixels) to make viewing easier.
We include the transitions which between them account for just over 90% of the total transition strength of the Fe L edge for energy losses 1 eV above the ionization threshold. For the PRISM-EELS calculation, we use a PRISM interpolation factor of 9, such that each individual probe will effectively be calculated on a 10 Å × 10 Å grid, a probe step of 0.246 Å (Nyquist sampling), and we evaluate the inelastic scattering cross section [Eq. (11)] on a 4 Å × 4 Å grid (which was seen to give the best trade-off between calculation speed and accuracy in Fig. 5). Equations (16) and (17) then estimate the run time of such a calculation to be 2 days. The results of this PRISM STEM-EELS simulation, which in reality took 16 hours, are shown in Fig. 7(c). We also estimate the computation time of a conventional multislice simulation using Eq. (15) for the full nanoparticle with the same probe step size and 2 Å slices along the beam direction. Even if further approximations are made to use only 1/9 of the grid for probe propagation and to evaluate only transitions within 4 Å of the probe (i.e. those that we deem to have a reasonable chance of being excited), the multislice simulation would require about 87 days for a single frozen phonon pass [26].
The simulated image in Fig. 7(c) is qualitatively similar to the Fe potential in Fig. 7(b), but artifacts due to strong scattering of the beam are evident. In particular, the center of the nanoparticle has a lower intensity than would be expected from the density of Fe, due to the strong high-angle scattering of the electron beam from the heavier Pt atoms. Close inspection of many of the Fe columns in this region reveals small regions of lower intensity, giving rise to a "donut" or "volcano" structure which is evident in the zoomed region shown in the bottom left-hand corner of Fig. 7(c). These features result from inelastic scattering of beam electrons by the heavier Pt atoms, and to a smaller extent by the Fe atoms, to high angles outside the acceptance angle of the EELS detector when the beam is scanned atop an atomic column. Even if these scattered electrons do cause the ionization of an Fe atom, they are unlikely to contribute to the final STEM-EELS image. A detailed explanation of this phenomenon can be found in Ref. [25], along with a strategy to obtain an image more amenable to direct interpretation: dividing by a simultaneously recorded STEM incoherent bright-field image with a detector of equal angular extent to the EELS aperture. This image is displayed in Fig. 7(d), which is indeed observed to be a more faithful representation of the projected electrostatic potential due to Fe in Fig. 7(b). The STEM annular dark field image, formed with a detector of inner angle 60 mrad, and the incoherent bright-field image are shown in Figs. 7(e) and (f) for reference.

[Fig. 7 caption: (a) the nanoparticle of Ref. [16] with Pt atoms indicated in blue and Fe atoms in red; (b) the Fe projected potential shown separately; (c) the STEM-EELS simulation of the structure; (d) the image in (c) corrected by dividing by an incoherent bright-field image, as suggested in Ref. [25]; (e) and (f) the annular dark field and incoherent bright-field images; (g) and (h) the relationship between Fe projected potential and STEM-EELS intensity, averaged in 5×5 pixel windows, for (c) and (d) respectively.]
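The bright-field normalisation described above amounts to a pixel-wise division of the two maps; the function name and the small-denominator guard `eps` are our own illustrative choices, not the paper's implementation:

```python
import numpy as np

def bf_corrected_eels(eels_map, ibf_map, eps=1e-8):
    """Divide a STEM-EELS map by a simultaneously recorded incoherent
    bright-field image of matching angular extent (the correction of
    Ref. [25]); eps guards against division by vanishing BF intensity."""
    return eels_map / np.maximum(ibf_map, eps)
```

Since elastic scattering outside the EELS acceptance angle depletes the bright-field and EELS signals in the same way, the ratio largely cancels the channelling and absorption artifacts and leaves a map closer to the projected Fe density.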
STEM-EELS simulations of objects of this size allow statistical analysis of STEM-EELS images, as shown in Fig. 7(g), which plots the integrated potential for just the Fe atoms (a proxy for the projected Fe density) against the STEM-EELS intensity for each 5×5 pixel region in the image. The colour of each point in the scatter plot corresponds to the total projected potential from both Fe and Pt, with reference to the colorbar in the figure.
The relationship between STEM-EELS intensity and projected Fe density is approximately linear. However, there are noticeable systematic deviations, with points falling below the trendline tending to be those with a higher total projected potential: a clear demonstration of how strong elastic and inelastic scattering of the beam complicates direct interpretation of STEM-EELS maps. These systematic errors are mostly remedied by division by the incoherent bright-field image, which improves the Pearson correlation score from 0.935 to 0.976. This correlation value measures the quality of the fitted trendline, showing that there is less systematic deviation from the linear relationship between Fe density and EELS intensity after applying the BF correction step. This demonstrates how our faster STEM-EELS algorithm, by virtue of its better scalability to larger simulation grids, can give valuable insights into the interpretation of heterogeneous nanoscale STEM-EELS maps.
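The statistical comparison above, block-averaging both maps in 5×5 pixel windows and correlating them, can be sketched as follows; the toy data and helper names are our own, standing in for the actual potential and EELS maps:

```python
import numpy as np

def block_average(image, b=5):
    """Average an image over non-overlapping b x b pixel windows
    (image dimensions assumed divisible by b for simplicity)."""
    h, w = image.shape
    return image.reshape(h // b, b, w // b, b).mean(axis=(1, 3))

def pearson(x, y):
    """Pearson correlation coefficient between two flattened maps."""
    return np.corrcoef(x.ravel(), y.ravel())[0, 1]

# Toy data standing in for the Fe projected potential and the EELS map:
# the EELS signal is modelled as mostly linear in Fe density plus noise.
rng = np.random.default_rng(1)
fe_potential = rng.random((100, 100))
eels = 0.9 * fe_potential + 0.1 * rng.random((100, 100))
r = pearson(block_average(fe_potential), block_average(eels))
```

Binning before correlating suppresses pixel-scale noise so that the coefficient reflects column-to-column systematic deviations, which is the quantity of interest when judging the BF correction.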

V. CONCLUSION
We have developed a new algorithm for simulating STEM-EELS results that economises on the number of multislice iterations required. This algorithm should run faster in general, as its calculation time typically scales linearly with specimen thickness, whilst the conventional algorithm scales quadratically with specimen thickness. We have shown that, with some penalty to accuracy, even faster calculation times are possible. Finally, we have also shown that our algorithm can be used to simulate larger nanoscale objects than was previously practical.