The landscape of promising non-supersymmetric string models

Leptoquarks extending the Standard Model (SM) are attracting increasing attention in the recent literature. Hence, the identification of 4D SM-like models and the classification of allowed leptoquarks from strings is an important step in the study of string phenomenology. We perform the most extensive search for SM-like models from the non-supersymmetric heterotic string $\mathrm{SO}(16)\times\mathrm{SO}(16)$, resulting in more than 170,000 inequivalent promising string models from 138 Abelian toroidal orbifolds. We explore the 4D massless particle spectra of these models in order to identify all exotics besides the three generations of quarks and leptons. In this way, we learn which leptoquarks can be realized in this string setup. Moreover, we analyze the number of SM Higgs doublets, which is generically larger than one. Then, we identify SM-like models with a minimal particle content. These so-called "almost SM" models appear most frequently in the orbifold geometries $\mathbb Z_2\times\mathbb Z_4$ (2,4) and (1,6). Finally, we apply machine learning to our dataset in order to predict the orbifold geometry where a given particle spectrum can most likely be found.

One can define an orbifold as the quotient of a six-dimensional (6D) torus over a discrete set of its isometries, among which the rotational isometries build the so-called point group. There are 7103 admissible point groups in six dimensions. Of these, 52 can leave $\mathcal N = 1$ supersymmetry unbroken in 4D [30], out of which 17 are Abelian ($\mathbb Z_N$ and $\mathbb Z_N\times\mathbb Z_M$ for various orders $M$ and $N$). These 17 point groups give rise to a total of 138 inequivalent orbifolds. These orbifolds have been the starting point for many studies trying to connect string theory to the supersymmetric extension of the SM. In order to contrast SUSY and non-SUSY phenomenology, in this work we focus on orbifold compactifications of the non-SUSY $\mathrm{SO}(16)\times\mathrm{SO}(16)$ heterotic string based on those 138 orbifold geometries.
The quest to connect heterotic string theory with non-SUSY particle phenomenology is not new. Models with promising properties have been constructed using orbifolds in the bosonic [31] and fermionic formulation [32][33][34][35], Calabi-Yau manifolds [36], and so-called coordinate-dependent compactifications [37,38]. Also, string models with spontaneously broken SUSY have been considered [39][40][41]. However, our present study represents the most extensive search up to now for non-SUSY string models that reproduce features of the Standard Model (SM) of particle physics.
By using the orbifolder program [42], modified for the construction of string models without SUSY, we obtain more than 170,000 inequivalent SM-like models from the 138 orbifold geometries of interest, as summarized in table 1. This enormous landscape of SM-like models invites questions that may hint towards fruitful corners of the landscape where the best phenomenology could emerge. Such questions include:
• What kind of exotic matter fields can we obtain from SM-like string models?
• Are they useful to tackle some of the open questions of particle physics, such as the question of dark matter or the $g_\mu - 2$ puzzle? (See also refs. [47,48] for other string approaches to explain the $g_\mu - 2$ discrepancy within D-brane string compactifications.)
• Is the origin of the SM encoded in the properties of particular orbifold geometries?
In particular, the questions on leptoquarks are motivated by the recent enhancement of the $g_\mu - 2$ anomaly, which has triggered a renewed interest in this area [49][50][51][52][53][54][55][56]. Additionally, leptoquarks have long been regarded as viable candidates for dark matter or as solutions to some other flavor issues [57,58], though not completely free of challenges (see e.g. [59]). We do not attempt to address the phenomenology of stringy leptoquarks. Instead, our work lays the foundation for future endeavors in this direction by describing which leptoquarks can be realized in string constructions. In this work, we provide some tools to address these questions by either systematically inspecting the properties of the identified models or applying machine learning techniques, as has been done recently in the SUSY case [15,16,60–73].
The content of this work is structured as follows. In section 2, we discuss the setting of our search for SM-like string models and provide an overview of our results. In section 3, we analyze the massless spectra of our SM-like models in order to i) identify the most promising cases, dubbed here almost SM, and ii) uncover patterns in string theory that may lead to the best phenomenology. General features of the spectra of our models are deferred to the appendices. In section 4, we illustrate the qualities of our models by discussing some properties of a couple of sample models. Finally, in section 5, we give our conclusions and outlook.

The landscape of non-supersymmetric heterotic orbifolds
We consider the heterotic string without SUSY in $D = 10$ with gauge group $\mathrm{SO}(16)\times\mathrm{SO}(16)$. This theory can be obtained from the supersymmetric heterotic string with gauge group $\mathrm E_8\times\mathrm E_8$ in the bosonic or fermionic formulation [28,29]. The 10D massless spectrum of this non-SUSY string theory is tachyon and anomaly free, and consists of 240 gauge bosons, 256 spinors and 256 cospinors. The dilaton, graviton and Kalb-Ramond field constitute its gravity sector.
In this work, in order to contrast SUSY and non-SUSY compactifications, we focus on the 138 orbifold geometries classified in ref. [30]. That is, we follow the traditional prescription to arrive at 4D models by orbifold compactifications, see e.g. refs. [31,74,75]. In some detail, we define an orbifold geometry as the quotient
$$ O = \mathbb R^6 / S\,, $$
where $S$ is a so-called space group, whose elements are specified as $g = (\vartheta, \lambda)$. The so-called twists $\vartheta$ generate a (rotational) point group $P \subset \mathrm O(6)$, whereas the $\lambda$ correspond to translations.
Space group elements hence act on the spatial coordinates $y \in \mathbb R^6$ of the extra dimensions according to
$$ y \;\mapsto\; g\,y = \vartheta\,y + \lambda\,. $$
In some cases, $\lambda$ is an element of the 6D torus lattice $\Lambda = \{n_\alpha e_\alpha \,|\, n_\alpha \in \mathbb Z,\ \alpha = 1,\ldots,6\}$, where $\{e_\alpha\}$ is the basis of $\Lambda$. Space group elements with $\lambda \notin \Lambda$ are called roto-translations.
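The action of a space group element on the coordinates can be illustrated with a small numerical sketch. It assumes the standard action $y \mapsto \vartheta y + \lambda$ and, purely for illustration, replaces the 6D setting by a 2D toy plane with a $\mathbb Z_4$ rotation; the helper names (`rotation`, `act`, `compose`) are hypothetical and not part of the orbifolder.

```python
import math

def rotation(order):
    """2x2 rotation by 2*pi/order: a toy stand-in for a twist in one plane."""
    a = 2 * math.pi / order
    return [[math.cos(a), -math.sin(a)], [math.sin(a), math.cos(a)]]

def matvec(m, v):
    """Matrix-vector product for small dense matrices."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def matmul(a, b):
    """Matrix-matrix product for small square matrices."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def act(g, y):
    """Apply g = (theta, lam) to y, assuming y -> theta*y + lam."""
    theta, lam = g
    return [t + l for t, l in zip(matvec(theta, y), lam)]

def compose(g, h):
    """Product g*h = (theta_g theta_h, theta_g lam_h + lam_g)."""
    (tg, lg), (th, lh) = g, h
    return matmul(tg, th), [a + b for a, b in zip(matvec(tg, lh), lg)]

# Toy example: Z4 twist combined with the lattice translation e_1 = (1, 0).
g = (rotation(4), [1.0, 0.0])
y = [0.5, 0.5]
image = act(g, y)  # y is mapped to itself up to rounding: a fixed point of g
```

In this toy setup the point $(1/2, 1/2)$ is left invariant by $g$, which mirrors how fixed points (orbifold singularities) arise from space group elements with nontrivial twist.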
In the absence of roto-translations, the orbifold can also be defined as $O = T^6/P$, where $T^6 = \mathbb R^6/\Lambda$. It is evident that $P$ must then be a symmetry of $\Lambda$. Thus, in general, for each point group there are various orbifold geometries, as different $\Lambda$ can have the same point group symmetry. We are interested in toroidal orbifolds with and without roto-translations, where $P$ is Abelian. The details of the 138 space groups associated with $\mathbb Z_N$ and $\mathbb Z_N\times\mathbb Z_M$ point groups that we consider here were systematically obtained in ref. [30] in the context of SUSY compactifications. These are equally useful to arrive at consistent 4D models from the non-SUSY heterotic string $\mathrm{SO}(16)\times\mathrm{SO}(16)$. We shall explore all of them to find phenomenologically promising 4D non-SUSY models, which we call SM-like models.
Each orbifold geometry, characterized by a space group $S$, leads to a myriad of effective field theories in 4D, with a given gauge group $G_\mathrm{4D}$ and massless spectrum of matter fields building representations of $G_\mathrm{4D}$, where we take all fermions to be left-chiral. These result from embedding the chosen orbifold geometry into the gauge degrees of freedom. These embeddings can be defined by a 16D shift vector $V_i$ for each rotational generator of the point group, and up to six 16D Wilson lines $W_\alpha$, $\alpha = 1,\ldots,6$, subject to consistency conditions including especially modular invariance (see refs. [31, sec. 3.2] and [76]). Standard techniques then yield the gauge groups and massless matter spectra on which we focus in this work.
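We define a SM-like model by the following properties of the 4D gauge group and massless matter spectrum:
• The 4D gauge group is $G_\mathrm{4D} = G_\mathrm{SM}\times[\mathrm U(1)]^n\times G_\mathrm{hidden}$, where $G_\mathrm{SM} = \mathrm{SU}(3)_C\times\mathrm{SU}(2)_L\times\mathrm U(1)_Y$ is the SM gauge group, $G_\mathrm{hidden}$ is a non-Abelian gauge group, usually built as a product of $\mathrm{SU}(N)$ group factors, and $n > 0$ is an integer number subject to the condition that $\mathrm{rank}(G_\mathrm{4D}) = 16$. $G_\mathrm{hidden}$ is considered "hidden" because (almost) none of the SM fields is charged under this group.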
• The 4D massless spectrum consists of exactly three generations of chiral fermions for quarks and leptons (including three right-handed neutrinos) and at least one Higgs doublet, a number of exotic fermions that are vector-like with respect to the SM, exotic scalars, and several SM-singlet scalars and fermions. In this way, all exotics can in principle be decoupled without breaking the SM gauge group.
The SM hypercharge is non-anomalous and compatible with SU(5) grand unification. In most cases, one of the additional U(1) s appears anomalous (where the anomaly is canceled by the Green-Schwarz mechanism [77]). Note that an arbitrary number of (vector-like) exotics, Higgs doublets and singlets arising from the compactifications are allowed for a SM-like model.
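The rank condition fixes the number of additional U(1) factors once the hidden group is known. The following minimal sketch assumes that the 4D gauge group has the form $G_\mathrm{SM}\times[\mathrm U(1)]^n\times G_\mathrm{hidden}$ with $G_\mathrm{hidden}$ a product of $\mathrm{SU}(N)$ factors; the function names are hypothetical, and the example hidden group $\mathrm{SU}(3)\times\mathrm{SU}(2)^2$ is the one quoted for the first benchmark model in section 4.

```python
def rank_su(n):
    """Rank of SU(n)."""
    return n - 1

def count_u1_factors(hidden_su_factors, total_rank=16):
    """Number n of extra U(1) factors implied by rank(G_4D) = total_rank.

    hidden_su_factors lists N for each SU(N) factor of the hidden group
    (assumption: the hidden group contains SU(N) factors only).
    """
    sm_rank = rank_su(3) + rank_su(2) + 1  # SU(3)_C x SU(2)_L x U(1)_Y
    hidden_rank = sum(rank_su(n) for n in hidden_su_factors)
    return total_rank - sm_rank - hidden_rank

# Hidden group SU(3) x SU(2)^2 has rank 4, leaving n = 16 - 4 - 4 = 8.
n_extra = count_u1_factors([3, 2, 2])
```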
With the goal of performing an extensive search for SM-like models, we use the orbifolder, which we adapted to perform these non-SUSY compactifications. The orbifolder creates randomly and consistently the essential parameters to construct inequivalent and (perturbatively) tachyon-free SM-like models and computes their massless matter spectra.$^1$ Using this tool and exploring all 138 orbifold geometries, we find SM-like models in 104 out of 138 orbifold geometries. Our results are presented in table 1. We find a total of 170,219 inequivalent promising models, where 169,177 (1,042) belong to the $\mathbb Z_N\times\mathbb Z_M$ ($\mathbb Z_N$) orbifold geometries. The largest number of SM-like models was found in the $\mathbb Z_2\times\mathbb Z_4$ ($\mathbb Z_8$-II) orbifold geometries with 147,996 (423) SM-like models, which reveals a common feature between SUSY and non-SUSY promising orbifold compactifications, see e.g. [14][15][16]. Our results represent, as far as we know, the most extensive search for SM-like models from string theory. Yet our search is not exhaustive. In particular, about 1,000 SM-like models with point groups $\mathbb Z_8$-I and $\mathbb Z_2\times\mathbb Z_2$ were identified before in ref. [31] and do not appear in our current search.

Table 1: Total number of inequivalent SM-like string models obtained in our extensive search using 138 orbifold geometries classified in ref. [30]. We find 170,219 inequivalent SM-like models in 104 orbifold geometries. In the "orbifold" columns we label the considered orbifold geometries by their point group ($\mathbb Z_N$ or $\mathbb Z_N\times\mathbb Z_M$) and the pair $(i, j)$. The latter refers to the $i$-th torus lattice and the $j$-th roto-translation element, following the notation of ref. [30]. The columns labeled by "# models" display the number of SM-like models for the corresponding orbifold geometry.
orbifold action on the string's gauge degrees of freedom, and verifies their consistency under the worldsheet modular invariance conditions [31,76].
Exploring the SM-like models

Vector-like exotics and Higgs doublets
We are now interested in knowing explicitly the types of vector-like exotic (VLE) representations and the number of Higgses that appear in all 170,219 identified SM-like models. The motivation of this study of the particle content is twofold. First, we aim at identifying the most promising SM-like candidates, i.e. those whose features best fit known observations. Secondly, among the VLE matter found in these constructions, inspecting the qualities of the leptoquark sector may be relevant for diverse phenomenological questions (see e.g. [49,52,57,58,78]), including the recent enhancement of the muon g µ − 2 anomaly.
Exotic matter refers to representations of $G_\mathrm{SM} = \mathrm{SU}(3)_C\times\mathrm{SU}(2)_L\times\mathrm U(1)_Y$ appearing in the 4D massless spectrum of an orbifold compactification, beyond the three generations of SM fermions, including three right-handed neutrinos, and one Higgs doublet. Further, to be characterized as vector-like, i) each exotic fermion must be accompanied by another fermion with the exact opposite charges, or ii) it must be a scalar. With a slight abuse of terminology, we shall also count additional fermionic singlets under $G_\mathrm{SM}$ and scalar SM-singlets as VLE matter. The former may play the role of sterile right-handed neutrinos (see e.g. ref. [20] for its SUSY equivalent), and the latter can be regarded as scalar dark matter candidates in the framework of Higgs portals [79,80], or also as flavon fields in the Froggatt-Nielsen mechanism [81,82]. We shall refer to the latter simply as flavons here.
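Criterion i) for fermions can be phrased as a simple pairing test. The sketch below is only an illustration under stated assumptions: representations are encoded by hand as tuples (SU(3) label, SU(2) dimension, hypercharge), the helper names are hypothetical, and self-conjugate representations are skipped rather than analyzed.

```python
from collections import Counter
from fractions import Fraction

# Complex conjugation of the SU(3) label; SU(2) doublets are pseudo-real.
CONJ = {"3": "3bar", "3bar": "3", "1": "1"}

def conjugate(rep):
    """Conjugate of (su3_label, su2_dim, hypercharge)."""
    su3, su2, y = rep
    return (CONJ[su3], su2, -y)

def unpaired(fermions):
    """Representations whose multiplicity exceeds that of their conjugate.

    An empty result means the given left-chiral fermions are vector-like
    in the sense of criterion i). Self-conjugate reps are not tested here.
    """
    counts = Counter(fermions)
    net = {}
    for rep, n in counts.items():
        partner = conjugate(rep)
        if partner == rep:
            continue
        excess = n - counts.get(partner, 0)
        if excess > 0:
            net[rep] = excess
    return net

# Illustrative exotic sector: one pair of quark singlets, one pair of doublets.
exotics = [
    ("3bar", 1, Fraction(1, 3)), ("3", 1, Fraction(-1, 3)),
    ("1", 2, Fraction(-1, 2)), ("1", 2, Fraction(1, 2)),
]
vector_like = unpaired(exotics) == {}
```

Adding a single unpaired state, e.g. one extra $(\mathbf 3, \mathbf 2)_{1/6}$, makes `unpaired` return that representation, signalling a chiral (non-vector-like) remainder.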
We report our findings on the different types of exotics in tables 3-6 of appendix A. We list all types of VLE matter representations with respect to the SM gauge group. We find 26 kinds of VLE fermions and other 26 representations for exotic scalars. Tables 3 and 4 show the percentage of models that exhibit any of the different exotic fermion or scalar representations.
We observe that fermion and scalar singlets are always present in all the models from Z N and Z N × Z M orbifolds. Tables 5 and 6 present the average numbers of exotic fermions and scalars, respectively.
As a sample case, consider the 155 SM-like models that arise from orbifolds with Z 3 point group. The second column of table 3 shows that the only exotic fermions with standard SM quantum numbers are down-type quark singlets and lepton doublets, which appear in most of these models. There are on average about three of these states, as we can see in table 5. Further, about half of the models exhibit many kinds of fractionally charged fermions [83].
Concerning the exotic scalars, in table 4 we observe that $\mathbb Z_3$ SM-like models generically exhibit various types of (scalar) leptoquarks. In the notation of ref. [44], we identify the leptoquarks $\tilde R_2 : (\mathbf 3, \mathbf 2)_{1/6}$ and $\bar S_1 : (\bar{\mathbf 3}, \mathbf 1)_{-2/3}$ in about half of the $\mathbb Z_3$ SM-like models, and $S_1 : (\bar{\mathbf 3}, \mathbf 1)_{1/3}$ in all models of this orbifold geometry. We see in table 6 that they are not very abundant in these models: there are on average $\sim 1.5$ leptoquarks $\tilde R_2$ and $\bar S_1$, while the mean value of the multiplicity of $S_1$ leptoquarks is about 5.6 in these models.
Interestingly, our tables reveal that the leptoquark scalars $S_1$, $\bar S_1$ and $\tilde R_2$ identified in the $\mathbb Z_3$ example are generic in all SM-like orbifold models. No other leptoquarks appear. As we shall shortly see, the existence of these leptoquarks might be related to a string-specific structure of localized strings in extra dimensions related to an SU(5) grand unification, so-called local GUTs.
From tables 4 and 6, we note that there is a large number of scalar fields with SM quantum numbers (1, 2)1 /2 . These fields correspond to Higgs doublets in our SM-like models. Thus, we find different extensions of the SM with various numbers (from 1 to 55) of Higgs doublets, such as those previously studied from a bottom-up perspective [84][85][86]. For the different point groups of our orbifold geometries, we display in table 7 the number of Higgses we find in all 170,219 SM-like models. We see that only the Z 6 -I orbifold geometries yield models with just one Higgs doublet (in 13 out of 64 models). There are 3,192 SM-like models with two Higgs doublets distributed in the Z 8 -II, Z 2 ×Z 4 , Z 2 ×Z 6 -I and Z 3 ×Z 3 orbifold geometries. Higher multiplicities of Higgs doublets seem favored in our constructions, where most models are endowed with 11, 9 or 15 Higgs doublets (20,377, 16,657 and 16,484 SM-like models in each case). Although no extra Higgs fields have been observed, they might have interesting implications especially for dark matter and Higgs phenomenology [87][88][89], and for an explanation of the g µ −2 tension [90][91][92][93][94].
As we show in appendix C, there are high correlations between the numbers of different VLE representations appearing in our matter spectra. In particular, we find almost perfect correlations among the scalar leptoquarks $\bar S_1$, $\tilde R_2$ and the charged scalars $(\mathbf 1, \mathbf 1)_1$. Further, the appearance of leptoquarks $S_1$ and extra Higgs doublets $\phi : (\mathbf 1, \mathbf 2)_{1/2}$ is correlated, too. We note that a scalar 10-plet of SU(5) precisely decomposes as $\mathbf{10} = \tilde R_2 \oplus \bar S_1 \oplus (\mathbf 1, \mathbf 1)_1$, while a scalar 5-plet is built by $\mathbf 5 = S_1^* \oplus \phi$, where $S_1^*$ is the complex conjugated scalar. Thus, the observed correlations suggest a common origin for these fields. It is known that heterotic orbifolds can produce so-called local GUTs [12,[95][96][97]. In these scenarios, the gauge symmetry is enhanced to an SU(5) GUT not in 4D, but locally in extra dimensions at the orbifold singularities, where full GUT multiplets are realized. Hence, the number of leptoquarks seems to be related to these local GUTs.
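The SU(5) decompositions quoted above can be cross-checked with two pieces of arithmetic: the dimensions must add up, and hypercharge, being a traceless SU(5) generator, must sum to zero over each multiplet. The sketch below uses the hypercharges stated in the text ($S_1^*$ carries $-1/3$ as the conjugate of $S_1$); the variable and function names are illustrative.

```python
from fractions import Fraction as F

# Each entry: (number of components under SU(3) x SU(2), hypercharge).
TEN = [
    (3 * 2, F(1, 6)),   # R2-tilde: color triplet, weak doublet
    (3 * 1, F(-2, 3)),  # S1-bar: color (anti)triplet, weak singlet
    (1 * 1, F(1)),      # charged singlet (1, 1)_1
]
FIVE = [
    (3 * 1, F(-1, 3)),  # S1*: conjugate of S1 with hypercharge 1/3
    (1 * 2, F(1, 2)),   # Higgs doublet phi
]

def dimension(parts):
    """Total number of components of the multiplet."""
    return sum(d for d, _ in parts)

def hypercharge_trace(parts):
    """Sum of hypercharges over all components; vanishes for SU(5) multiplets."""
    return sum(d * y for d, y in parts)

checks = {
    "10": (dimension(TEN), hypercharge_trace(TEN)),
    "5": (dimension(FIVE), hypercharge_trace(FIVE)),
}
```

Both multiplets pass: the components add up to 10 and 5, respectively, with vanishing hypercharge trace, consistent with the local GUT interpretation.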
Finally, we compare our findings on VLE with previous results in the context of MSSM-like models that result from heterotic orbifold compactifications [16]. We see a few differences. The most evident difference is that, in general, SUSY models seem to produce more exotic representations than our SM-like models. However, exotic fermions with charges $(\mathbf 3, \mathbf 2)_{5/6}$ show up in our models (cf. table 3 of appendix A), but do not appear in MSSM-like orbifold models. In contrast, we observe several similarities between the massless spectra of MSSM-like and SM-like string models:
• there are roughly between 50 and 200 SM singlets (neutrinos and flavons);
• the most common exotics with SM quantum numbers are $(\bar{\mathbf 3}, \mathbf 1)_{1/3}$ and $(\mathbf 1, \mathbf 2)_{-1/2}$, suggesting a local GUT picture with $\bar{\mathbf 5}$-plets of SU(5) localized at some orbifold singularities, as mentioned before;
• following the classification of ref. [44] and considering SUSY breakdown in MSSM-like models, the only possible leptoquarks in all semi-realistic models are just $S_1$, $\bar S_1$ and $\tilde R_2$; and
• the most common fractionally charged fields have SM charges $(\mathbf 1, \mathbf 1)_{1/2}$ and $(\mathbf 1, \mathbf 2)_0$.

Almost SM models from heterotic orbifolds
We would now like to identify in the landscape of non-SUSY heterotic orbifolds the models that best reproduce the particle content of the SM. With this purpose, we systematically inspect the spectra of our models to select those that contain three SM generations of fermions and the standard Higgs doublet, along with the least amount of exotic matter. Since SM singlets could play an important phenomenological role either as extra sterile neutrinos if they are fermions, or as flavons or dark matter if they are scalars, we shall not count them as exotic states here.
The special SM-like models whose massless matter spectra display the closest resemblance to the SM are called here almost SM. These models can be classified in two categories:
• Models with no exotic fermions. We find 45 almost SM models of this kind, distributed in the orbifold geometries $\mathbb Z_2\times\mathbb Z_4$ (1,6), $\mathbb Z_2\times\mathbb Z_4$ (2,4) and $\mathbb Z_3\times\mathbb Z_3$ (1,4), as summarized in table 8. The spectra of these models include, besides the three SM generations, only $S_1$ leptoquark scalars accompanied by various numbers of extra Higgs doublets, right-handed neutrinos and SM-singlet scalars. As we see in table 9, there are only three of these models with the minimal number of Higgs doublets (six) in this category.
• Models with no exotic scalars. There are 502 almost SM models of this category, distributed in ten different $\mathbb Z_2\times\mathbb Z_4$ orbifold geometries. As displayed in table 8, the scalar sector of their spectra includes six Higgs doublets (one of them would be the standard Higgs) and SM-singlet scalars. In the fermionic sector, beyond several right-handed neutrinos, the only exotic fermions have quantum numbers $(\bar{\mathbf 3}, \mathbf 1)_{1/3}$ and $(\mathbf 1, \mathbf 2)_{-1/2}$ (plus their complex conjugates), mostly originating from full multiplets of SU(5) local GUTs.
Additional details, such as the shift vectors and Wilson lines of these selected 547 almost SM models, can be found on our website [98] in a format compatible with the (non-SUSY) orbifolder.
Some comments are in order. First, there is no SM-like model without exotics. Second, the existence of a few exotics in the models identified as almost SM might be phenomenologically challenging. For example, the S 1 scalar leptoquarks of models without exotic fermions could lead to rapid proton decay if their couplings with first-generation quarks are unsuppressed and they do not develop very large masses. Fortunately, in principle, all leptoquarks in this case and all exotic fermions in the second category of almost SM models can be decoupled, when some flavons attain vacuum expectation values (VEV). The details of this mechanism are beyond the scope of this paper and shall be discussed elsewhere.
As a last comment, let us mention the possibility of SM-like models with only one Higgs doublet, see ref. [31]. We only find seven (six) models of this type, arising from the Z 6 -I (1,1) (Z 6 -I (2,1)) orbifold geometries. Unfortunately, as shown in table 8, these models include several VLE. In the scalar sector, beside 71 flavon or dark matter candidates on average, they include two S 1 leptoquarks and the exotic representations (3, 1)1 /6 , (1, 2) 0 and (1, 1)1 /2 . In the fermionic sector, in addition to about 200 right-handed neutrinos, we see extra vector-like pairs of down-type quark singlets and lepton doublets, as well as fractionally charged exotics in the representations (3, 1)1 /6 , (1, 2) 0 and (1, 1)1 /2 (paired up with their complex conjugates). Although the exotics could in principle be decoupled from the low-energy effective theory, these models seem to be in worse shape than our almost SM.

Predicting the stringy origin of the SM with machine learning
The previous observations based on a systematic, though limited, search reveal some of the general properties of the matter spectra of a subset of all possible SM-like string models. Fortunately, by using machine learning (ML) techniques, as in ref. [16], we can learn more.
Here, based on the information provided by the identified models, we obtain an ML algorithm that predicts the specific orbifold geometry that most likely hosts a SM-like string model with a given particle content of exotics. We address this task using supervised machine learning and evaluate the quality of our algorithms using the accuracy and the f1-macro. The accuracy of our predictive ML algorithm is given by the number of correct predictions divided by the total number of predictions. On the other hand, the f1-macro is computed as the average of the f1-scores of each of the 104 orbifold geometries. Since our dataset is imbalanced, the f1-macro is more suitable for our task, see for example section 3.1 in ref. [16].
In order to compare to the accuracy and the f1-macro of a "good" ML algorithm, we first compute the so-called null value. The null value is based on the trivial algorithm that always predicts the orbifold geometry that appears most frequently in our dataset of 170,219 inequivalent SM-like models, independently of the given particle spectrum. In our dataset, $\mathbb Z_2\times\mathbb Z_4$ (1,5) is the orbifold geometry that yields the largest number of SM-like models, with a total number of 9,388 SM-like models. Hence, we can estimate the accuracy of the trivial algorithm based on our dataset: it gives a correct prediction with a probability of 5.5%, i.e.
$$ \text{null value accuracy}:\quad \frac{9{,}388}{170{,}219} \approx 5.5\%\,. $$
In addition, we compute the f1-scores and the f1-macro for the trivial algorithm. Only the geometry that is always predicted has a nonvanishing f1-score (with precision $\approx 0.055$ and recall 1), so that averaging over the 104 orbifold geometries we obtain
$$ \text{null value f1-macro}:\quad \frac{1}{104}\cdot\frac{2\cdot 0.055\cdot 1}{0.055 + 1} \approx 0.1\%\,. $$
These are our null values against which we will compare our results in the following.
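The null values follow directly from the counts quoted in the text. The sketch below recomputes them for the trivial classifier, assuming the usual definition of the f1-score as the harmonic mean of precision and recall and the convention that a class which is never predicted has f1-score zero; the function name `f1_score` is a local helper, not a library call.

```python
def f1_score(tp, fp, fn):
    """F1 = 2*tp / (2*tp + fp + fn); defined as 0 when there are no true positives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

TOTAL = 170_219     # all SM-like models in the dataset
MAJORITY = 9_388    # models from the most frequent geometry, Z2xZ4 (1,5)
N_CLASSES = 104     # orbifold geometries hosting SM-like models

# The trivial algorithm labels every model with the majority geometry.
null_accuracy = MAJORITY / TOTAL

# Majority class: all of its models are hits, all other models are false positives.
f1_majority = f1_score(tp=MAJORITY, fp=TOTAL - MAJORITY, fn=0)

# Every other class is never predicted (tp = fp = 0), so its f1-score is 0
# and only the majority class contributes to the macro average.
null_f1_macro = f1_majority / N_CLASSES
```

With these inputs the accuracy comes out at about 5.5% and the f1-macro at about 0.1%, which illustrates why the f1-macro is the more demanding measure on an imbalanced dataset.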
Before we discuss our ML algorithm, we first split our dataset into 80% training data and 20% test data. Since the label of the orbifold geometry is categorical data (i.e. data without ordering), we use a one-hot encoding for the labels of the 104 orbifold geometries that host SM-like string models. Furthermore, for each SM-like string model, we represent the exotic particle spectrum by a 52-dimensional vector of integers (for the 26 fermionic and 26 bosonic exotics as listed in tables 3 and 4). Hence, our ML algorithm $f_\mathrm{ML}$ takes a 52-dimensional vector $X \in \mathbb N^{52}$ as input (corresponding to the particle spectrum of exotics) and gives a 104-dimensional vector as output (corresponding to the one-hot encoded prediction of the orbifold geometry that most likely can produce the given particle spectrum),
$$ f_\mathrm{ML}:\ \mathbb N^{52} \to \mathbb R^{104}\,,\qquad X \mapsto f_\mathrm{ML}(X)\,. $$

As ML algorithm, we take a fully connected neural network. The input layer has 52 nodes corresponding to $X$, and the output layer has 104 nodes (corresponding to the 104 orbifold geometries). We add two hidden layers with $s_1$ and $s_2$ nodes, respectively. Then, the number of trainable parameters of the neural network is given by
$$ \# = 53\,s_1 + 105\,s_2 + s_1 s_2 + 104\,, $$
see figure 1b, and we want to balance between the accuracy of our neural network and the number of trainable parameters. As activation functions we choose "selu" except for the output layer. There, we use the "softmax" activation function, such that each value in the output layer lies in the range $[0, 1]$ and the sum of output values is normalized to 1. Then, we can interpret the $i$-th output value as the probability that the $i$-th orbifold geometry can reproduce the given particle spectrum. In addition, we use a learning rate of 0.001 and the loss is computed using "categorical crossentropy".

Using our training set, we scan over network architectures with $s_1 \in \{50, 100, \ldots, 400\}$ and $s_2 \in \{50, 100, \ldots, 400\}$, using a 20% validation split, and train each neural network three times for 200 training epochs.
The averaged maximal accuracies of the validation set are evaluated and plotted in figure 1a.
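The parameter count of the two-hidden-layer network can be checked by adding weights and biases layer by layer. The following sketch does this for the scanned architectures; it is a plain counting exercise with hypothetical helper names and does not depend on any ML library.

```python
def n_trainable(s1, s2, n_in=52, n_out=104):
    """Weights plus biases of a dense network n_in -> s1 -> s2 -> n_out."""
    first = (n_in + 1) * s1    # 52 weights + 1 bias per node of hidden layer 1
    second = (s1 + 1) * s2     # s1 weights + 1 bias per node of hidden layer 2
    output = (s2 + 1) * n_out  # s2 weights + 1 bias per output node
    return first + second + output

def closed_form(s1, s2):
    """Formula quoted in the text: 53 s1 + 105 s2 + s1 s2 + 104."""
    return 53 * s1 + 105 * s2 + s1 * s2 + 104

# Architectures scanned in the text: s1, s2 in {50, 100, ..., 400}.
sizes = range(50, 401, 50)
grid = [(s1, s2) for s1 in sizes for s2 in sizes]
counts = {arch: n_trainable(*arch) for arch in grid}
```

The layer-by-layer count agrees with the closed formula on all 64 scanned architectures, and `counts[(300, 400)]` and `counts[(100, 350)]` reproduce the 178,004 and 77,154 trainable parameters quoted for the two architectures discussed in the text.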
The best accuracy of the validation set is around 75% for a network architecture with $s_1 = 300$ and $s_2 = 400$ (with 178,004 trainable parameters). However, using $s_1 = 100$ and $s_2 = 350$ (with 77,154 trainable parameters) we already obtain an accuracy of 74% (and the f1-macro of the validation set is 73%). Thus, we choose the smaller but almost equally good network architecture. After $\approx 120$ epochs of training, the loss of the validation set starts to increase, see figure 2a, which indicates that the neural network begins to overfit. Thus, we stop training after 120 training epochs. Then, we construct and train 21 neural networks of this architecture and use a majority vote of the 21 individual predictions to obtain a final prediction. By doing so, the accuracy of the test set (consisting of 20% of all data) increases slightly to 76%. We display the confusion matrix of the test set as a heat map in figure 2b.

Now, we can use our trained neural networks to extrapolate to SM-like models that have not been discovered in the string landscape so far. By giving a spectrum of exotics to the trained neural networks, we obtain a prediction for the orbifold geometry that most likely can host this model. For example, we ask the networks which orbifold geometry can most likely reproduce the exact SM spectrum without charged exotics. In detail, we specify a SM spectrum that contains, in addition to the Higgs and the three generations of quarks and leptons, only SM singlets: a (large) number of right-handed neutrinos (which can be utilized for an extended seesaw mechanism, see ref. [20]) and a (large) number of SM scalar singlets. The results are visualized in figure 3. For certain numbers of singlets the orbifold geometries $\mathbb Z_2\times\mathbb Z_2$ (12,1), $\mathbb Z_3\times\mathbb Z_3$ (1,4) or $\mathbb Z_2\times\mathbb Z_4$ (1,6) are predicted to be able to reproduce these spectra.
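The majority vote over the 21 trained networks can be sketched as follows. This is a minimal illustration, not the implementation used for the results: the labels are placeholder strings, each network is assumed to contribute its single most probable geometry, and ties (which the text does not discuss) are broken in favor of the label that was encountered first.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted most often by the individual networks.

    predictions: one predicted label per network (here 21 of them).
    Ties are resolved by first occurrence, an assumption of this sketch.
    """
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outcome of 21 networks for a single particle spectrum.
votes = ["Z2xZ4 (1,6)"] * 11 + ["Z3xZ3 (1,4)"] * 7 + ["Z2xZ2 (12,1)"] * 3
final_prediction = majority_vote(votes)
```

Using an odd number of voters reduces, but with 104 possible labels does not eliminate, the chance of a tie, which is why the tie-breaking rule is stated explicitly here.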

Benchmark SM-like models
In this section, we discuss some details of three benchmark SM-like string models. Two of them are characteristic almost SM orbifold models: one without exotic scalars and one without exotic fermions. The third model is a SM-like model including a small number of right-handed neutrinos and SM-singlet scalars. These models arise from different $\mathbb Z_N\times\mathbb Z_M$ orbifold geometries. Hence, they are defined by these geometries and their gauge embedding in terms of the 16D shift vectors $V_1$, $V_2$ and Wilson lines $W_\alpha$, $\alpha = 1, \ldots, 6$.
Our benchmark models are defined as follows:
• Model 1. Almost SM model based on the orbifold geometry $\mathbb Z_2\times\mathbb Z_4$ (2,4) and its gauge embedding given by the shift vectors [...] The resulting 4D gauge group is given by [...] where $G_\mathrm{hidden} = \mathrm{SU}(3)\times\mathrm{SU}(2)^2$ is the hidden gauge group and one of the U(1) factors is anomalous. The SM gauge quantum numbers of the massless spectrum are presented in [...]

[...] The 4D gauge group reads [...] where $G_\mathrm{hidden} = \mathrm{SU}(2)^3$ is the hidden gauge group and one of the U(1) factors is anomalous. The SM gauge quantum numbers of the massless spectrum are shown in table 2. As exotics, this model includes nine $S_1$ leptoquarks and eight additional Higgs doublets. Clearly, including the standard Higgs doublet, these fields build 5-plets of local SU(5) GUTs in higher dimensions. In addition, we observe a large set of right-handed neutrinos and 30 scalar singlets.

[...] The 4D gauge group is given by [...] where $G_\mathrm{hidden} = \mathrm{SU}(2)^2$ is the hidden gauge group and one of the U(1) factors is anomalous. The SM quantum numbers of the matter spectrum of this model are displayed in table 2.
We observe that this model yields the smallest number of SM singlets among the spectra of the benchmark models. However, there is a large number of (pairs of) vector-like exotic fermions and S 1 leptoquarks. As in many other SM-like models, there are six Higgs doublets.

Conclusions and outlook
In this work we have performed the most extensive search for SM-like models from orbifold compactifications of the non-SUSY heterotic string $\mathrm{SO}(16)\times\mathrm{SO}(16)$. We inspected their massless spectra looking for the SM-like models whose spectra best resemble that of the SM, and for useful patterns that may guide us to find the SM from string theory.
Using a non-SUSY extension of the orbifolder and considering all 138 orbifolds classified in ref. [30], we find 170,219 SM-like models distributed among 104 orbifold geometries, as presented in table 1. Orbifolds with point groups $\mathbb Z_2\times\mathbb Z_4$ and $\mathbb Z_8$-II produce the majority of the models with 147,996 out of 169,177 in $\mathbb Z_N\times\mathbb Z_M$ orbifolds and 423 out of 1,042 in $\mathbb Z_N$ orbifolds. These models include the SM gauge group, three generations of SM fermions, including three right-handed neutrinos, at least one Higgs doublet, a number of SM-singlet scalars or fermions, and a few vector-like exotic fermions and exotic scalars. We classify all (52) possible exotic representations (where some of them behave as leptoquarks), and the number of Higgs doublets that can appear in SM-like string models. Our results, summarized in the tables of appendix A, indicate that only certain types of exotic representations appear in these constructions and they are not arbitrary. In particular, they generically build representations of SU(5) (local) GUTs at the singularities of the orbifold in extra dimensions. Further, most exotic scalars transform either as extra Higgs doublets or as $S_1$, $\bar S_1$ or $\tilde R_2$ scalar leptoquarks, see ref. [44] for notation and refs. [49,52,57,58,78] for their phenomenology.
We explore the massless spectra of our SM-like models in order to identify special SM-like string models called here almost SM models and exhibiting either i) no exotic fermions or ii) no exotic scalars, except for SM singlets that may play the role of right-handed neutrinos in the fermionic sector and flavons or dark matter candidates in the scalar sector. The details of these models are discussed in section 3.2 and summarized in the tables of our appendix B.
In section 3.3, we apply machine learning techniques to our dataset of 170,219 SM-like string models. Following ref. [16], we train a neural network such that it predicts, based on a requested particle spectrum, the orbifold geometry that most likely can host the corresponding SM-like string model. Our analysis shows that the underlying orbifold geometry leaves a distinct imprint on the matter spectrum of the resulting SM-like string model. We are thus able to predict the phenomenologically most promising orbifold geometries to be $\mathbb Z_2\times\mathbb Z_2$ (12,1), $\mathbb Z_3\times\mathbb Z_3$ (1,4), $\mathbb Z_2\times\mathbb Z_4$ (2,4) and $\mathbb Z_3\times\mathbb Z_3$ (2,3), see figures 3 and 4. Note that we make the list of all particle spectra available and invite the community to consider them in their studies. This information can be found on our website [98]. Our data includes i) the files that contain shifts and Wilson lines of all 170,219 SM-like string models, ii) files that contain the almost SM models, and iii) the complete list of exotics for all 170,219 SM-like models.
To illustrate the qualities of our models, we present in section 4 three special models: two almost SM models and one SM-like model with a reduced number of SM singlets. They are a sample of the models that arise from the three most promising orbifold geometries identified using machine learning techniques.
One task beyond this work is the detailed study of the phenomenology of our SM-like string models. For this purpose, one should first construct the interaction terms L int that give rise to the couplings among the different SM fields and the (scalar and fermionic) exotics. From this, one could obtain constraints on the parameters of the couplings that may lead to rapid proton decay, that could explain the g µ − 2 discrepancy via leptoquarks, or that could provide admissible scenarios of multi-Higgs portals to dark matter, among other scenarios. Some of these phenomenological questions shall be studied elsewhere.
Another interesting endeavor in the non-SUSY heterotic string compactified on orbifolds is the study of the emerging eclectic flavor scheme, which is the natural nontrivial combination of traditional and modular flavor symmetries appearing in string orbifolds [99-101]. The eclectic picture has been studied only in the supersymmetric context, i.e. in the case of orbifold compactifications of the E 8 ×E 8 heterotic string. As modular and traditional flavor symmetries originate from the outer automorphisms of the Narain space group, which are independent of the presence of supersymmetry, extending the discussion to the non-SUSY case should be feasible. This would, on the one hand, provide an understanding of modular flavor symmetries without SUSY and, on the other hand, complete the classification of all possible flavor symmetries emerging from orbifold compactifications [14,102].
The final goal of non-supersymmetric string constructions is to arrive at a phenomenologically viable model. This requires addressing the questions of potential instabilities beyond the perturbatively tachyon-free spectra presented here and of the potentially large cosmological constant, as discussed in e.g. refs. [37,38,103-106]. We postpone the study of these challenges to future work.
Table 3: Percentages of SM-like models containing the various types of vector-like exotic fermions. We provide in the header the Abelian orbifold point groups where SM-like models were found. The row #SM lists the number of SM-like models arising from all orbifold geometries sharing the same point group. We count here vector-like pairs of exotic left-chiral fermions, such that each row includes a representation and its complex conjugate, e.g. $(3, 2)_{1/6}$ stands for $(3, 2)_{1/6} \oplus (\bar{3}, 2)_{-1/6}$. All fermions beyond the three SM generations are considered as exotics.
Table 5: Average numbers of vector-like exotic fermions for SM-like models. The first row displays the orbifold, labeled by its point group. The second row presents the total number of SM-like models including all geometries of a point group. Hypercharge is normalized such that $(3, 2)_{1/6}$ is a left-chiral quark doublet. We count vector-like pairs, such that each row includes a representation and its complex conjugate, e.g. $(3, 2)_{1/6}$ stands for $(3, 2)_{1/6} \oplus (\bar{3}, 2)_{-1/6}$.
Table 7: Percentage of SM-like models with a certain number of Higgs doublets. The first row labels the orbifold point groups considered, and the second row provides the total number of SM-like models found with all orbifold geometries associated with the indicated point groups. The first column presents the possible number of Higgs doublets in our SM-like models. Models with one Higgs doublet were found only in the Z 6 -I orbifold.
[12,95-97], where at certain orbifold singularities complete GUT representations are localized even though the four-dimensional gauge group is just G SM . The correlation between the number of extra Higgs doublets and the number of scalar color triplets S 1 seems to originate from complete scalar 5-plets of local SU(5) GUTs, while the correlations between the number of Higgs doublets and the scalar leptoquarks S 1 and R 2 might result from complete scalar 10-plets of SU(5).
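For orientation, the standard branching rules of the SU(5) 5-plet and 10-plet under SU(3)×SU(2)×U(1) are collected below. This is a textbook decomposition rather than a result specific to our models, written in the hypercharge normalization where $(3, 2)_{1/6}$ is the left-chiral quark doublet; for complex scalars, each representation is equivalent to its complex conjugate.

```latex
\begin{align}
  \mathbf{5}  &\;\to\; (\mathbf{3}, \mathbf{1})_{-1/3} \,\oplus\, (\mathbf{1}, \mathbf{2})_{1/2} \,, \\
  \mathbf{10} &\;\to\; (\mathbf{3}, \mathbf{2})_{1/6} \,\oplus\, (\bar{\mathbf{3}}, \mathbf{1})_{-2/3} \,\oplus\, (\mathbf{1}, \mathbf{1})_{1} \,.
\end{align}
```

A complete scalar 5-plet thus pairs one color triplet with one doublet carrying the quantum numbers of a Higgs doublet, which is consistent with the correlation between color triplets and extra Higgs doublets described above.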