Finding unprecedentedly low-thermal-conductivity half-Heusler semiconductors via high-throughput materials modeling

The lattice thermal conductivity ({\kappa}{\omega}) is a key property for many potential applications of compounds. Discovery of materials with very low or high {\kappa}{\omega} remains an experimental challenge due to high costs and time-consuming synthesis procedures. High-throughput computational pre-screening is a valuable approach for significantly reducing the set of candidate compounds. In this article, we introduce efficient methods for reliably estimating the bulk {\kappa}{\omega} for a large number of compounds. The algorithms are based on a combination of machine-learning algorithms, physical insights, and automatic ab-initio calculations. We scanned approximately 79,000 half-Heusler entries in the AFLOWLIB.org database. Among the 450 mechanically stable ordered semiconductors identified, we find that {\kappa}{\omega} spans more than two orders of magnitude- a much larger range than that previously thought. {\kappa}{\omega} is lowest for compounds whose elements in equivalent positions have large atomic radii. We then perform a thorough screening of thermodynamical stability that allows to reduce the list to 77 systems. We can then provide a quantitative estimate of {\kappa}{\omega} for this selected range of systems. Three semiconductors having {\kappa}{\omega}<5 W /(m K) are proposed for further experimental study.


INTRODUCTION
High-throughput (HT) computational materials science is a rapidly expanding area of materials research. It merges a plethora of techniques from a variety of disciplines. These include the kinetics and thermodynamics of materials, solid-state physics, artificial intelligence, computer science, and statistics [1]. The application of HT has recently led to new insights and novel compounds in different fields [2][3][4][5][6][7][8][9]. Despite the importance of thermal transport properties for many crucial technologies, there are to date no high-throughput investigations into lattice thermal conductivity.
Here we seek to address this challenge. We concentrate on the lattice thermal conductivity of half-Heusler compounds, as they have great promise for applications as thermoelectric materials [10][11][12][13]. Half-Heusler compounds are ternary solids. Their crystalline structure consists of two atoms (A and B), located in equivalent positions in a rock-salt structure. A third atom (X) sits in an inequivalent position, filling half of the octahedrally coordinated sites (Fig. 1a).
Experimental studies have reported the thermoelectric figure of merit for a small set of these systems and their alloys [14][15][16][17][18]. Theoretical electronic characterizations have been performed for 36 candidates [19]. It has been speculated that their high thermal conductivity, close to 10 W m −1 K −1 , could limit thermoelectric performance [20,21]. At room temperature, the lattice thermal conductivity κ ω represents the largest contribution to the total conductivity.
Promising thermoelectric figures of merit have been reported both for n-type (1.5 at 700 K [22]) and for p-type (0.8 at 1000 K [17]) half-Heuslers. Such values are comparable to the best thermoelectric materials proposed thus far [23]. Those values, however, were not found in ordered half-Heuslers; but rather in alloyed or nanostructured systems. Furthermore, finding ordered compounds with very low κ ω is advantageous, as their electronic mobilities are expected to be higher than in alloys. In addition, alloying the already low-κ ω ordered compounds would lower κ ω even further.
The pool of candidate compounds analyzed in this article is larger than in previous investigations. All possible half-Heusler compounds from all combinations of nonradioactive elements in the periodic table are considered, as included in the AFLOWLIB.org consortium repository [24] (Fig. 1b). The formation enthalpies of the fully relaxed structures are obtained through density functional theory within the AFLOW high-throughput framework [25].
From a total of 79, 057 entries, those with positive formation enthalpies are removed. When several half-Heuslers are related by permutations of elements, only the lowest-enthalpy configurations are considered. Finally, zero-gap compounds are removed from the list. For the surviving subset of 995 compounds, the second-order force constants are characterized with full phonon dispersion curves. This allows further reduction of the set to a total of 450 mechanically stable semiconductors. Although these requirements do not guarantee global thermodynamical stability, metastable compounds with long lifetimes have been synthesized and used [26]. Hence, their inclusion should not be disregarded a priori.
For the 450 resulting stable half-Heuslers, we compute a large set of structural, electronic and harmonic proper-ties. In principle, one could directly compute κ ω for all the compounds. The computational requirements for this approach would be prohibitive. To solve this issue, our strategy is to obtain κ ω for a smaller subset of systems. We use physical insights and machine learning techniques to predict the remaining values. Cross-validation shows that the approach is reliable for rapidly identifying low-κ ω compounds.
Once the main factors correlated with a low thermal conductivity are identified for the 450-HH library, we use the thermodynamical information in the AFLOWLIB.org database to test the stability of these HHs against more than 110, 000 phases. All competing ternary compounds from the Inorganic Crystal Structure Database (ICSD) [27] and all binaries in that database sharing two elements with each HH are included. The final list of thermodynamically stable compounds contains 77 entries. For these we devise and implement a novel approach to compute the lattice thermal conductivity. Our accuracy is better than 50% of the exact calculation, and has a much lower computational cost. This allows to provide estimates of κ ω that can be compared with experiment for 77 thermodynamically stable compounds.

PREDICTING BULK LATTICE THERMAL CONDUCTIVITIES
The general expression for κ ω at temperature T is [28]: where λ denotes the phonon branch index α and wave vector q, k B is Boltzmann's constant, n 0 the Bose-Einstein distribution, ω λ the frequency of phonons, v (z) λ the phonon group velocity in the transport direction z, and τ λ the relaxation time. The relaxation time is determined by third-order derivatives of the total energy with respect to the atomic displacements of any three atoms i, j, and k in directions a b and c (Φ abc ijk , the anharmonic force constants) in a large supercell [29].
In ordered half-Heuslers the dominant source of scattering is due to three-phonon processes, and we can calculate thermal conductivities with the full ab-initio anharmonic characterization [30,31]. For CoSbZr, one of the most thoroughly studied half-Heuslers, we obtain κ ω = 25.0 W m −1 K −1 . The value is very close to a previous theoretical estimate (∼ 22 W m −1 K −1 ) for monocrystalline CoSbZr [19], and slightly higher than the experimental values [32,33] (between 15 and 20 W m −1 K −1 ). Synthesized samples of Refs. [32,33] may contain microstructures and imperfections not considered in our work. Despite of accuracy, "full ab-initio calculations" of κ ω (Eq. 1) are prohibitive for HT studies, due to the computational requirements of the derivatives giving τ λ .
In this section, we present two different approaches circumventing the limitation. The first method is based on the empirical observation that the force constants show high degree of transferability between compounds sharing crystal structure [34]. This suggests that a single set of anharmonic force constants could be used to get an estimate of the bulk κ ω . We call this thermal conductivity calculated with "transferred" forces κ transf (see Table II).
We desired to preserve the choice between equivalent positions for maximizing transferability. Thus, instead of taking the anharmonic force constants of a particular half-Heusler compound, we choose those of Mg 2 Si. This compound shares the half-Heusler lattice with sites A and B occupied by Mg atoms. For cross-validation, we also fully compute the anharmonic force constants of 32 half-Heusler systems. These are randomly selected with uniform probability inside the convex hull of Fig. 2a, to ensure a wide variety of harmonic/anharmonic features. Comparison between κ ω and κ transf indicates that-although the latter has limited quantitative precision-the qualitative agreement is very good; with a Spearman rank correlation coefficient of 0.93. Hence, the descriptor can be effectively used to separate compounds having high or low κ ω . Note that we chose the Spearman rank correlation [35] instead of the usual Pearson's. The former is invariant under any monotonic transformation of one or both variables, and takes values ±1 for any strict monotonic (not just linear) dependence.
The second proposed approach is based on a completely different direction: we use "random-forest regression" by leveraging the 32 fully calculated κ ω as a training set. We can then employ the fitted model to predict the remaining conductivities. We call these predictions κ forest (see Table  II). Random forests [36] are a general classification and regression algorithms and the are well adapted to dependent input data. They have already been successfully applied to numerous problems [37,38], including compound classification [39]. Here, the 32 compounds represent only around 7% of the mechanically stable half-Heuslers. Our input data comprises a large set of descriptor variables, which is expected to correlate with κ ω (supplementary materials) but is less expensive to obtain. Descriptors include: • A priori chemical information: atomic number and weight, position in the periodic table, atomic radius, Pauling electronegativity [40], and Pettifor's chemical scale χ (Ref. 41).
• General compound information: lattice constant a latt , band gap, formation enthalpy, effective masses of electrons and holes, Born effective charges and dielectric tensor.
• Specific thermal conductivity information: specific heat c v , spherically-averaged speed of sound c s , scaled nanograined-limit thermal conductivitỹ κ grain and phase-space volume available for threephonon scattering processes P 3 .  κ ω  These results are then used as the training set for the random-forest predictions. An estimate of the relative standard deviation of κ forest for each compound in the training set, as obtained using repeated 4-fold cross-validation, is also included. Compounds are always labeled with the element in position X first.

Label Definition κ ω
Lattice contribution to κ from the "full calculation" κ transf Approximated κ ω with anharmonic force constants from Mg 2 Si κ forest κ ω obtained from random-forest regression κ anh κ ω obtained with four exact anharmonic force constants and a linear model for the remaining κ e Electronic contribution to κ κ grain Scaled nanograined-limit κ ω After an exploratory phase, we conclude that a satisfactory fit can be safely achieved using only a priori data. The random-forest method is performed in three steps. First, a large ensemble of decision trees is built by randomly selecting subsets of descriptors and observations. Second, the predictions of all trees are obtained for each data point. Third, the mode (for classification) or the mean (for regression) are taken as the result from the whole ensemble. The algorithm also provides an intrinsic metric to evaluate the importance of each descriptor. This is defined in relation to the effect of randomly permuting the values of that variable on the result [36] (the less resilient upon permutation, the more important).
The prediction of each tree in a random forest can only be a value from the training set, and thus the result of the regression is a weighted average. This average is bounded by the minimum and maximum values within the training data. A small set is unlikely to contain elements having extreme values. Hence, our random-forest regression is expected to have a marked centralizing effect, yielding values tightly grouped around their mean. The frequency densities of both κ transf and this new κ forest are displayed in Fig. 2b. The latter avoids extreme predictions with non-physical magnitudes, a result of the aforementioned centralizing effect.
In this sense, machine-learning algorithms outperform crude extrapolations such as those behind κ transf . Additionally, κ forest has the advantage that its predictions can be refined with controlled accuracy by changing the size of the training set. Even so, the Spearman rank corre- lation coefficient between κ transf and κ forest is still 0.66, corroborating the validity of the analysis based on κ forest . Furthermore, we find that κ forest is strongly correlated with physical descriptors like c v ,κ grain , and P 3 . This confirms our earlier speculation about these methods.
An important concern when training a machinelearning model is whether the training set is diverse or representative enough to justify extrapolating the model to the remaining elements. The values of κ ω needed for direct validation of the predicted κ forest are unavailable. Thus, we resort to a repeated 4-fold cross-validation among the data points in the training set to obtain an estimate of the out-of-sample error. More specifically, we evenly split our training set into 4 subsets. Then, we obtain a random-forest prediction for the HHs in each of the subsets by using only the remaining 75% of compounds as the new training set. We repeat the process 10 times for different divisions of the data, and compute the standard deviation of these predictions. The results are included in Table. I. These estimates support the notion that the model behind κ forest is reasonably insensitive to our choice of training sets. For each cross-validation, we compute the Spearman rank correlation coefficient between the out-of-sample random-forest results for the 32 training compounds and their κ ω . The median value of these Spearman rank correlation coefficients is 0.74, corroborating κ forest as a reliable tool for predicting compounds' ordering.
The ordering predicted by descriptor κ forest is strongly correlated with that of κ ω . This allows to pinpoint the main factors determining high or low thermal conductivities. The bimodal shape of the distribution in Fig. 2b suggests that two groups of half-Heuslers can be identified, with thermal conductivities spread around two different values. A robust version of the "k-means" algorithm [42] is employed to optimally place the medians of the low-and high-thermal-conductivity classes at 4.50 and 23.1 W m −1 K −1 , respectively. By analyzing the importance of variables in the classification, we identify a low Pettifor scale χ X and a large average Pauling electronegativityē AB as the most critical descriptors for low conductivity (supplementary materials).
Given the underlying correlations, many different choices can be used for the classification. A trend can even be suggested on the grounds of atomic radii by following a chain of correlations: if the two elements in equivalent positions are chosen so that their average radius is larger than 150 pm, then the probability of the compound being in the low-κ ω class is 84%. Physically, this follows from the fact that κ ω is highly correlated with the specific heat c v (Fig. 2a). The latter is strongly negatively correlated with the lattice parameter a latt : the larger a latt the lower c v [57].
In addition, a latt correlates well with the sum of the atomic radii of the three elements, quantities known a priori. The atomic radii of the species in positions X concentrate around the average value. This leads to an accurate prediction of a latt by using only the average atomic radius of atoms in positions A and B,r AB . A larger AB causes a large lattice constant, small specific heat, and finally, low thermal conductivity. Alternatively, the lattice parameter can be used as a good discriminant: panel (c) in Fig. 2 is a "violin plot" illustrating the distribution of a latt in the classes of half-Heuslers with low and high thermal conductivities. Also, as it can be seen in Fig. 2a our choice of easily computable descriptors such as c v and P 3 is supported by the result of this classification.
Our calculations are for the true bulk lattice thermal conductivity. They are unrelated to the minimum value proposed by other authors [43,44]. Nevertheless, some of the κ ω obtained directly seem ultra-low. They are even lower than ∼ 0.70 W m −1 K −1 , as reported in literature for AgSbTe 2 and AgBiSe 2 [45], and described as close to the achievable minimum. However, the minimum depends on the compound's structure. Even within the most stringent hypothesis of the shortest possible mean free path equal to interatomic spacing, the lowest found κ ω is much higher than the theoretical minimum. Therefore, none of our predicted values violates the minimum lattice thermal conductivity. Note also that, once the goal of reducing the κ ω under 1 W m −1 K −1 is achieved, its precise value loses relevancy as it is overtaken by the contribution of charge carriers, κ e .

SCREENING FOR THERMODYNAMICAL STABILITY
The ingredients of κ ω for bulk ordered semiconductors depend only on a semilocal characterization of the potential energy surface around the equilibrium configuration. Hence, mechanical stability is sufficient to permit the calculation of the lattice thermal conductivity of a HH. For the analysis performed in the previous section, having the set of 450 mechanically stable HHs reduced and biased by external considerations such as thermodynamical stability would be detrimental to the performance of machinelearning techniques.
On the other hand, in order to propose particular candidates for experimentation we must maximize the probability that they can be obtained in the laboratory. To this end, we obtain the ternary phase diagrams for each of the 450 mechanically stable HHs. This involves taking into account the formation enthalpies of a large number of possible competing phases. These include but are not limited to all relevant binary and ternary compounds in the ICSD [27]. More specifically, all the elemental compounds, 109, 136 binary structures and 4, 363 ternary phases were considered. Many of these phases were already present in AFLOWLIB.org; others were computed specifically for this work. The total number of DFT calculations necessary to obtain the results presented here exceeds 300, 000. Our thermodynamic analysis reveals that 79 of the 450 HHs are thermodynamically stable. Spin-polarized calculations reveal that two of the 79 have semimetallic ground states. Then, only the remaining 77 compounds are further considered. The ternary phase diagrams of the final 77 systems are included in the supplementary materials.
76 of the 77 predicted stable compounds satisfy the octet or expanded octet rules by virtue of having 8 or 18 valence electrons per unit cell, respectively. We compared these numbers with the frequency distribution of valence electron counts in the initial 79, 057-HH library. We conclude that the conditional probabilities of compounds having 8 or 18 valence electrons per unit cell being stable are 1.2% and 3.8%, respectively. While still small, the conditional likelihood of a compound satisfy-ing one of these rules making it through all the filtering steps is much higher than the 0.1% a priori probability. Fig. 3 shows the distribution of the valence during the reduction of prototypes' list. Only the compound LaClSe (valence=16) does not seem to follow the rule. Even among the reduced list, κ forest still spans more than one order of magnitude, its extreme values being 2.33 and 40.3 W m −1 K −1 , reinforcing our previous conclusions.

A DESCRIPTOR WITH QUANTITATIVE POWER
Neither of the two descriptors of κ ω presented so far contains any information about the anharmonic interatomic force constants (IFCs) of each compound. On one hand, the last round of thermodynamical screening puts the number of surviving HHs within the limits of what can be realistically considered for anharmonic calculations. On the other, the qualitative success of κ transf shows that a detailed anharmonic description is not required. To enhance our estimates of the thermal conductivity of stable half-Heuslers, in this section we present a new machine-learning descriptor of κ ω that integrates only the crucial pieces of the anharmonic properties of the solid. This aids in achieving quantitative accuracy with a much lower computational cost than the full calculation.
Crystallographic symmetries and the equality of mixed partials impose linear constraints on the anharmonic IFCs. With the parameters described in the "Methods" section below and those constraints, we are left with 737 independent anharmonic IFCs per compound. However, many elements of this set are correlated among them, and others are too small to have a decisive role in the value of κ ω . To quantify these assertions, we perform principal component analysis [46] on the third-order IFCs for the 32 compounds in Table. I.
We find that the first four components account for ∼ 99% of the variance in the set. From the results we can extract an expression for each of the 737 IFCs as a linear combination of those components. Then we perform a multivariable multiple linear regression of the four components on four large and weakly correlated IFCs. By combining the two results, we arrive at a linear model for the whole set of anharmonic IFCs in terms of four parameters that can be obtained with 16 DFT calculations per compound. We use the term κ ω to describe the third-order IFCs thus reconstructed, and κ anh for the second-order IFCs for each compound. The blue circles in Fig. 4a show a comparison between κ anh and the exact κ ω for the 32 compounds in the training set. With two exceptions (compounds with comparatively very high thermal conductivities) this new descriptor yields excellent quantitative estimates of κ ω . Moreover, 4-fold cross-validation shows that it is insensitive to the particular choice of training set. As a final test, we perform full thermal conductivity calculations for four compounds selected at random from those outside the training set: AgBaSb, AgNaTe, InCdY and TlLaMg. The results are depicted as red crosses in Fig. 4a. This shows that the quality of the prediction is as good as for the 32 training compounds.
The distribution of κ anh over the 77 thermodynamically stable HHs (Fig. 4b) confirms the presence in the sample of compounds with thermal conductivities much lower that the 10 − 20 W m −1 K −1 . This is characteristic of experimentally measured HHs. The values of κ anh for the 77 stable HHs is listed in Table III. Notably, the subset of 10 thermodynamically stable half-Heuslers for which κ ω was directly computed already contains BiBaK, with κ ω = 2.20 W m −1 K −1 . Outside of the training sample, the lowest κ anh values are 1.72, 2.84 and 3.49 for PtLaSb, RhLaTe and SbNaSr, respectively.

CONCLUSIONS
In this article, we have presented three computational methods for estimating the bulk κ ω of a large library of half-Heusler compounds. We surmount the formidable task of full ab-initio characterization. We find that κ ω is spread over more than two orders of magnitude over mechanically stable half-Heuslers. This is a much broader range than that suggested by limited experimental available data. By using a set of descriptors and random-forest regression, we have built and tested an effective classification model. We found that compounds are most likely to have low thermal conductivity if the average atomic radius of the atoms in structural positions A and B is large. This also correlates with large lattice parameters and low specific heats.
Extensive thermodynamical calculations allow to remove from the list compounds with more stable competing phases. We employ our third method, with better quantitative accuracy and higher computational cost, to perform a finer analysis of the distribution of κ ω over the reduced library. We conclude ordered half-Heusler compounds with κ ω 3 W m −1 K −1 value (a factor of three below the best scenarios for ordered compounds, and comparable to alloyed systems) very likely exist. The results corroborate the competitiveness of machine-learning methods in accelerated material design [1].

AFLOWLIB library of half-Heusler systems.
The 79, 057 half-Heusler systems are calculated with the high-throughput framework AFLOW [4,25,47,48] based on ab-initio calculations of the energies by the VASP software [49] with projector augmented waves (PAW) pseudopotentials [50], and Perdew, Burke and Ernzerhof exchange-correlation functionals [51]. The AFLOWLIB energies are calculated at zero temperature and pressure, with spin polarization and without zero-point motion or lattice vibrations. All crystal structures are fully relaxed (cell volume and shape and the basis atom coordinates inside the cell). Numerical convergence to about 1 meV atom −1 is ensured by a high energy cutoff (30% higher than the highest energy cutoff for the pseudopotentials of the components) and by the dense 6,000 kpoints per reciprocal atom Monkhorst-Pack meshes [52].
Interatomic force constants. 3 × 3 × 3 supercells are used in second-order IFC calculations. The Phonopy [53] package is used to generate a minimal set of atomic displacements by harnessing the point and translational  symmetries of the crystal structure, and custom software was developed in order to do the same in anharmonic IFC calculations. For those, 4×4×4 supercells are generated and a cutoff radius of 0.85a latt is imposed on the interactions. 2×2×2 and 3×3×3 Monkhorst-Pack k-point grids are employed, and spin polarization excluded to improve speed. Solution of the Boltzmann transport equation. Our self-consistent iterative approach is described in detail in Ref. 30. Both three-phonon processes and the natural isotopic distribution of each element are taken into account as sources of scattering. A Gaussian smearing scheme with adaptive breadth [31] is chosen for integrations in the Brillouin zone. When using anharmonic IFCs from Mg 2 Si to approximate κ ω for all materials, the solution to the Boltzmann transport equation failed to converge for 5 compounds, which are consequently excluded from the associated analysis.
Regression and classification. The R statistical computing environment [54] is chosen for all statistical analyses. Random-forest models are used as implemented in the "randomForest" package [55]. As a check, all regressions and classifications are repeated using a generalized boosted tree algorithm [56]; in all cases the results are found to be in good agreement with those afforded by random forests.