Finding Optimal Pathways in Chemical Reaction Networks Using Ising Machines

Finding optimal pathways in chemical reaction networks is essential for elucidating and designing chemical processes, with significant applications such as synthesis planning and metabolic pathway analysis. Such a chemical pathway-finding problem can be formulated as a constrained combinatorial optimization problem, aiming to find an optimal combination of chemical reactions connecting starting materials to target materials in a given network. Due to combinatorial explosion, the computation time required to find an optimal pathway increases exponentially with the network size. Ising machines, including quantum and simulated annealing devices, are promising novel computers dedicated to such hard combinatorial optimization. However, to the best of our knowledge, there has yet to be an attempt to apply Ising machines to chemical pathway-finding problems. In this article, we present the first Ising/quantum computing application for chemical pathway-finding problems. The Ising model, translated from a chemical pathway-finding problem, involves several types of penalty terms for violating constraints. It is not obvious how to set appropriate penalty strengths of different types. To address this challenge, we employ Bayesian optimization for parameter tuning. Furthermore, we introduce a novel technique that enhances tuning performance by grouping penalty terms according to the underlying problem structure. The performance evaluation and analysis of the proposed algorithm were conducted using a D-Wave Advantage system and simulated annealing. The benchmark results reveal challenges in finding exact optimal pathways. Concurrently, the results indicate the feasibility of finding approximate optimal pathways, provided that a certain degree of relative error in cost value is acceptable.


I. INTRODUCTION
Since the inception of modern chemistry in the 18th century, scientists have designed and discovered numerous chemical reactions.These discoveries are recorded in chemical databases such as Reaxys [1] and SciFinder n [2].The vast collections of chemical reactions stored in these databases are big data in chemistry and have experienced rapid growth.For instance, Reaxys contains more than 60 million chemical reactions as of 2023.The number of chemical reactions added to Reaxys each year is more than hundreds of thousands, which has more than doubled between 2000 and 2015 [3].Additionally, emerging technologies for automatic reaction exploration, such as chemical synthesis robots [4,5], artificial intelligence and machine learning [6,7], and quantum chemical calculation [8], will further accelerate the growth of chemical databases.
A chemical reaction database can be represented by a gigantic chemical reaction network.A chemical reaction network (CRN) is a graph-theoretical representation of a collection of chemical reactions, modeled as a directed bipartite graph with two types of nodes: chemical reactions and species [9,10].This representation is used in chem- * mizuno@es.hokudai.ac.jp istry and systems biology to visualize, elucidate, and design the mechanisms of complex chemical processes.
Finding optimal pathways in CRNs is a static analysis of CRNs, which has significant applications such as synthesis planning [10][11][12][13] and metabolic pathway analysis [14,15].Such a chemical pathway-finding problem can be formulated as a combinatorial optimization problem to find an optimal combination of chemical reactions connecting starting materials to target materials in a given CRN.In chemical synthesis, chemists aim to find synthesis pathways from readily available materials to target compound(s).Moreover, they seek synthesis pathways with better properties, such as low monetary costs, short execution times, low risks, and eco-friendliness.In systems biology, identifying metabolic pathways is fundamental to characterizing the function and mechanism of a living system.Furthermore, metabolic engineers aim to design artificial metabolic pathways to maximize the target metabolite production of cells.There are several algorithms for these pathway-finding problems, including recursive network search algorithms [11,12], optimization/enumeration algorithms using a mixed integer programming solver [13,15], and a simulated annealing algorithm designed for synthesis planning [11].
However, finding optimal pathways in CRNs is NPhard in general [14]; the computation time to find an optimal solution increases exponentially as the network size increases due to combinatorial explosion.Consequently, the continuous improvement of software and hardware for pathway finding in CRNs is crucial to address the rapidly increasing computational demands associated with growing data size in chemistry.
In recent years, Ising machines have gained considerable attention as next-generation hardware devices dedicated to combinatorial optimization [16].Ising machines are specially designed to find the ground state(s) of an Ising model characterized by the Hamiltonian Here, σ i ∈ {−1, +1}, σ, N σ , J ij , and h i denote, respectively, a spin variable, the spin configuration (σ 1 , σ 2 , . . ., σ Nσ ), the number of spins, the interaction coefficient between two spins σ i and σ j , and the local field interacting with σ i .Ising machines include quantum annealing devices developed by D-Wave systems [17,18] and classical simulated annealing devices [19][20][21][22][23].Many combinatorial optimization problems are efficiently reducible to the ground state finding problems of Ising models [24,25].If chemical pathway-finding problems are also reducible to Ising model problems, these physicsinspired, novel computers may potentially offer a solution to the challenges posed by the accelerating growth in chemical data and the concurrent slowdown in performance growth for conventional von-Neumann computers due to the end of Moore's law [26].However, to the best of our knowledge, there has yet to be an attempt to apply Ising machines to chemical pathway-finding problems.
Pathway-finding problems in CRNs, as potential application targets of Ising machines and quantum annealing, exhibit the following notable features: (1) The available data size is enormous and increasing rapidly.(2) Various problem settings exist, including multi-objective and mixed integer programming formulation of synthesis planning [13] and exhaustive enumeration of possible metabolic pathways [15].(3) The bipartite graph representation of a CRN can be viewed as a Petri net [27], a graphical, mathematical modeling tool for discrete event systems.Hence, chemical pathway-finding problems can be model cases for developing various general-purpose algorithms using Ising machines, which are anticipated to be transferable to other systems.
In this article, we present an Ising computing framework for pathway finding in CRNs.In Sec.II, we formulate a synthesis-planning problem as a typical chemical pathway-finding problem for algorithm development.In Sec.III, we detail our proposed algorithm, which includes a translation procedure of a chemical pathway-finding problem into an Ising model problem and postprocessing methods for enhancing the feasibility and optimality of solutions.Additionally, to address the challenge of tuning penalty strengths in the translated Ising model, we introduce Bayesian optimization and a novel technique to enhance tuning performance by reducing the search complexity in Sec.III C. In Sec.IV, we evaluate the per-formance of the proposed algorithm using a D-Wave Advantage quantum annealing machine [18] and simulated annealing on a classical computer.Finally, we summarize the present study and remark on possible directions for the future development of chemical pathway-finding algorithms using Ising machines in Sec.V.

II. PROBLEM FORMULATION
A. Chemical reaction networks and pathways Figure 1 shows an example of chemical reaction networks (CRNs) in the bipartite graph representation [9,10], also known as the Petri net representation [27].The network illustrates the Solvay process, an industrial chemical production process of soda ash (Na 2 CO 3 ).The overall process is represented as 2NaCl + CaCO 3 → Na 2 CO 3 + CaCl 2 , which consists of five chemical reactions: In the graph, these five chemical reactions involved in the Solvay process are depicted as squares, and 11 chemical species are represented as ellipses.Each directed edge (arrow) connects a reaction with one of its reactants or products.The edge direction is either from a reactant species to a reaction or from a reaction to a product species.The number of directed edges between a reaction and a species indicates the stoichiometric coefficient, that is, the coefficient of the species in the chemical equation of the reaction.
A CRN may include inflow and outflow dummy reactions representing mass exchange between a chemical system and its environment.The network depicted in Fig. 1 includes four dummy reactions represented as dashed squares: the inflow reactions of NaCl and CaCO 3 (the purchase of the starting materials); the outflow reaction of Na 2 CO 3 (the shipping of the chemical product); and the outflow reaction of CaCl 2 (either the shipping or disposal of the byproduct).Dummy reactions are useful for modeling pathway-finding problems, as detailed below.Henceforth, we will use the term"reaction" to refer to both chemical and dummy reactions.
We define a pathway in a CRN as a combination of chemical and inflow/outflow reactions, taking into account reaction multiplicity.Reaction multiplicity, which indicates the number of occurrences of a reaction, is essential for a quantitative description of complex reaction processes [13,15,28].For instance, in Fig. 1, the numbers in black circles indicate the multiplicity of each reaction of the Solvay process.During a single cycle of the overall reaction, chemical reaction 1 occurs twice, while the other chemical reactions 2-5 occur once each.This specific combination of the multiplicities of the five chemical reactions enables the reuse of ammonia and carbon dioxide, resulting in an economical chemical transformation.In addition, the multiplicity of each inflow or outflow reaction corresponds to the consumption of the starting material or production of the (by)product during a single overall process, respectively.It is important to mention, however, that the definition of pathways here does not consider the order of occurrence of chemical reactions, as in [13,15].
Physically-feasible pathways must satisfy the mass balance equations for all chemical species in a network.The mass balance equation for species s is given by r∈R ν sr m r = 0, where R, m r , and ν sr are the set of all reactions in the network, the multiplicity of reaction r, and the signed stoichiometric coefficient of species s in reaction r, respectively.The sign of ν sr is defined as positive if r is a chemical reaction producing s or an inflow reaction supplying s into the system; conversely, it is defined as negative if r is a chemical reaction consuming s or an outflow reaction exporting s from the system; otherwise, it is zero.The mass balance equation states that the sum of the total production and input equals the sum of the total consumption and output.The mass balance constraints ensure that no species are created from nothing nor annihilated to nothing during the chemical process, in accordance with the law of conservation of mass.

B. Synthesis-planning problem
Let us formulate a synthesis-planning problem to find the minimum monetary cost synthesis pathway from commercially available substrates to target compound(s).
Let R be a set of chemical and inflow/outflow reactions potentially used for synthesizing the target species.Let S be the set of all chemical species participating in chemical reactions of R. Here, the candidate reactions in R and species in S can be listed in advance while the multiplicities of reactions of the desired synthesis pathway are unknown.For example, the chemical reactions in R can be either extracted from a chemical reaction database (see Appendix A) or generated by computeraided retrosynthesis [13].Only commercially available species have inflow reactions in R. All target species must have outflow reactions, while others may also have outflow reactions representing disposal.
The decision variables in the synthesis-planning problem are the multiplicities of each reaction in R. Let x r denote the variable representing the multiplicity of r ∈ R. In this article, we assume each variable x r is a non-negative integer variable with lower bound l r (≥ 0) and upper bound u r .Since feasible synthesis pathways must ensure the positive outflow of each target species, the lower bound of the outflow reaction multiplicity for each target species must be positive; in this way, positiveoutflow constraints specify which species are targets of the chemical synthesis.
The monetary costs can be categorized into two types: variable costs and fixed costs.Variable costs vary with changes in the multiplicities of reactions.For example, the costs of substrate purchase and byproduct disposal are proportional to the substrate inflows and the byproduct outflows, respectively.The simplest model of such variable costs can be represented as c unit r x r , where c unit r is the unit cost of reaction r.On the other hand, fixed costs depend only on whether reactions are carried out.For example, the costs of equipment preparation for reactions are overhead costs for carrying out the reactions and are independent of the amounts of the reactions.These fixed costs can be modeled as c fixed r χ + (x r ), where c fixed r is the fixed cost of reaction r and χ + denotes the positivity indicator function defined as χ + (x) = 0 for x = 0 and χ + (x) = 1 for x > 0.
In summary, the synthesis-planning problem is formulated as follows: Here, x denotes the vector representation of the reaction multiplicity variables {x r }.
Note that a gap still exists between the Ising model (Eq. 1) and the pathway-finding problem (Eq.2).In the next section, we will present a connection between them.

III. PROPOSED ALGORITHM
This section presents an algorithm for finding pathways in chemical reaction networks using Ising machines.Figure 2 illustrates the overview of the proposed algorithm.This algorithm first translates a chemical pathway-finding problem into a quadratic unconstrained binary optimization (QUBO) problem, mathematically equivalent to an Ising model problem.Then, the translated problem is solved by an Ising machine.Here, the algorithm embeds the logical QUBO problem into a physical Ising model of the device if the device hardware has limited spin connectivity.Finally, the algorithm decodes binary solutions returned from the Ising machine and applies postprocessing methods to enhance the feasibility and optimality of the solutions.
In the following subsections, we will elaborate on (1) the translating procedure, (2) the postprocessing methods, and (3) the automatic tuning of parameters in the QUBO/Ising form of a pathway-finding problem.In addition, we review Ising machines and the embedding process in Appendix B for readers unfamiliar with Ising computing.

A. Translating into QUBO
We translate our pathway-finding problem (Eq.2) into a quadratic unconstrained binary optimization (QUBO) problem.This QUBO problem is mathematically equivalent to the energy minimization problem of an Ising model.QUBO is defined as where q i ∈ {0, 1}, q, N q , and Q ij denote, respectively, a binary variable, the binary variables vector (q 1 , q 2 , . . ., q Nq ), the number of binary variables, and a constant associated with q i and q j .The binary variables are interconvertible with the Ising spin variables as q i = (1 − σ i )/2.Note that the diagonal term Q ii q 2 i is essentially linear due to the idempotent law q 2 i = q i for q i ∈ {0, 1}.
First, we translate the objective function into a quadratic function.To achieve this, we devised the following mapping from the positivity indicator function into a quadratic expression.The positivity indicator function of non-negative integer variable x can be expressed as follows: Therefore, we can replace χ + (x r ) in the original cost function with [y r +(1−y r )x r ], adding an auxiliary binary variable y r ∈ {0, 1}.Next, instead of imposing the mass balance constraints directly, we add the penalty term s∈S M s r∈R ν sr x r 2 (5) to the QUBO objective function.Here, M s is a positive constant.This term equals zero if solution x satisfies all the mass balance constraints and a positive value otherwise.Thus, this term penalizes solutions violating the mass balance constraints.The penalty strength is determined by M s .In theory, large-enough penalty strength ensures that the optimal solution of this unconstrained problem equals that of the original problem.However, the penalty strength requires careful tuning in practice, as extremely large penalty strength can impair the annealing performance.Sec.III C describes the penalty strength parameter tuning in detail.
Finally, we encode integer variables into binary variables.We employ four types of integer-to-binary encoding methods: unary, order, log, and one-hot [29].Table I summarizes these encoding methods.In the table, an integer variable x ∈ [l, u] is expressed by a linear expression x(q) of n binary variables q ∈ {0, 1} n .The number n of required binary variables is d (:= u − l) for the unary and order encoding methods, ⌊log 2 d⌋ + 1 for the log encoding method, and d + 1 for the one-hot encoding method.In addition, the order and one-hot encoding methods require a penalty term in the QUBO objective function.The additional penalty in the order encoding method corresponds to constraints that q k = 0 ⇒ q k+1 = 0 for k = 1, . . ., d − 1.Note that the order constraints are not necessarily hard constraints; even if the order constraints are not satisfied, the integer variable expression x(q) takes an integer value in [l, u].On the other hand, the additional penalty in the one-hot encoding method corresponds to a constraint that only one binary variable takes the value one and all the others zero.In contrast to the order encoding method, if the one-hot constraint is violated, x(q) may take an infeasible value of x; thus, the one-hot constraint is a hard constraint.We will compare the performance of these four encoding methods in Sec.IV B.
In summary, the synthesis-planning problem (Eq.2) can be represented in QUBO form as follows: (6) Here, each x r is expressed by binary variables q r by one of the four encoding methods listed in Table I.P xr (q r ) and L r (> 0) are the additional penalty term and its strength parameter for the encoding of x r , respectively.Each y r (∈ {0, 1}) is an auxiliary binary variable for translating the positivity indicator function χ + (x r ).Finally, the symbol q collectively denotes all binary variables {q r | r ∈ R} ∪ {y r | r ∈ R}, and N q is the total number of the binary variables.

B. Postprocessing
Ising machines often return non-optimal or infeasible solutions.To improve the quality of solutions, we employ two postprocessing methods: steepest descent and inflow/outflow adjustment.
Steepest descent is a standard postprocessing method in Ising computing.This method iteratively descends the energy landscape of an Ising model by single-spin flips.At each step, it performs the spin flip that reduces the energy the most among all possible single-spin flips.This postprocessing method is implemented in the D-Wave Ocean Software [30].We employ it to enhance the optimality and feasibility of Ising/QUBO solutions returned by an Ising machine.
Inflow/outflow adjustment, which we developed, is aimed at adjusting the inflow and outflow of each species so that the mass balance equals zero.Specifically, this method (1) sets the inflow and outflow of each species to their minimum values and (2) increases either the inflow or outflow as much as necessary to satisfy the mass balance constraint.The pseudo-code is given in Fig. 3.We apply this postprocessing method at the end of the workflow to enhance the feasibility of solutions.

C. Penalty strength tuning
The QUBO form of the pathway-finding problem (Eq.6) has penalty strength parameters.Moreover, when the QUBO problem is embedded into a physical Ising machine with limited connectivity of spins, the physical Ising model has chain strength parameters (see Appendix B 3).In the following discussion, we consider parameter tuning to improve algorithm performance.
Parameter tuning aims to find parameter values that maximize the quality of solutions returned by the Ising computing algorithm.The parameter tuning problem is formulated as where λ is the parameter vector to be tuned (in this study, penalty and chain strength parameters), D is the domain of λ, and f : D → R is a function measuring the quality of solutions (smaller is better).Let us define the parameter search space D. In the present case, we can estimate an upper bound of penalty strength parameters, which is large enough to ensure the equivalence between the original and QUBO problems.A simple upper bound is the maximum cost among all feasible and infeasible pathways: If all penalty strength parameters (i.e., {M s } and {L r }) equal C, the QUBO objective function value for any feasible solution is less than C1 ; in contrast, that for any infeasible solution is greater than or equals to C. Therefore, The number n of required binary variables is d (:= u − l) for the unary and order encoding methods, K + 1 (K := ⌊log 2 d⌋) for the log encoding method, and d + 1 for the one-hot encoding method.Type Integer variable expression x(q) Additional penalty Px(q) the penalty strength C ensures the equivalence between the original and QUBO problems.This upper bound estimation is also valid for chain strength parameters; if the chain strength equals C, a spin configuration with chain breaks has an energy greater than any feasible solution without chain breaks.Therefore, we define the search space as where N params is the total number of penalty and chain strength parameters to be tuned.Next, we define the evaluation function f .Multiple runs of the algorithm produce a set of synthesis pathways.This set contains feasible and infeasible pathways with various synthesis costs.Our goal here is (1) to get many feasible solutions and (2) to get low costs solutions.Therefore, we use the following function to evaluate a single solution: Here, the first term evaluates the total costs of synthesis, and the second is the penalty for infeasible solutions.Since the penalty strength equals C in the formula, all infeasible solutions have a higher value of E than any feasible solution.Using this measure of single solution quality, we define f (λ) as the expected value of E(x), where p(x|λ) is the probability that the algorithm returns solution x when using parameters λ.In practice, ⟨E⟩ λ is evaluated by a sample mean of E(x) from a finite set of samples of x.We adopt Bayesian optimization (BO) to minimize ⟨E⟩ λ over parameters λ.Specifically, we employ a BO algorithm implemented in the Optuna library [31,32].BO that case, a value greater than C can be employed as the upper bound of the parameter values.However, in every benchmark problem used in the present article, the solution xr = ur is confirmed to be infeasible.is a sequential black-box optimization technique that selects points to be evaluated in an iteration step based on a surrogate model of f constructed with previous evaluation results.The Optuna's algorithm uses the treestructured Parzen estimator (TPE) [33] as a surrogate model.BO-TPE is known to be effective in a wide range of hyperparameter tuning in machine learning and has less computational time complexity than standard BO algorithms using the Gaussian process [34].
However, the high dimensionality of the search space D may impair the efficiency of BO.Although BO is applicable to parameter tuning in a complex and highdimensional search space, its efficiency decreases rapidly as the dimension of the search space increases [34].The dimension of D is N params , which is on the same order of magnitude as the total numbers of species and reactions.Therefore, reducing the dimensionality of D is crucial for efficient parameter tuning.
To address the issue of high dimensionality in D, we devised parameter grouping methods.In these grouping methods, penalty and chain strength parameters are grouped according to the associated constraint type, and all parameters in each group are constrained to take the same value.We first classify parameters into three major categories: (1) penalty strength parameters associated with mass balance constraints, (2) penalty strength parameters related to integer-to-binary encoding, and (3) chain strength parameters for embedding.In this article, we further consider four types of subdivisions of the mass balance constraints group according to chemical species classifications listed in Table II and detailed below.For simplicity, we leave the other groups related to encoding and embedding intact, though these groups can be further subdivided according to reaction classifications.
The details of the four grouping methods for mass balance constraints are as follows.The first method, 'unified,' consolidates all mass balance constraints into one group without any further subdivision.The second method, 'degree,' classifies mass balance constraints by the associated species' node degree.Here, the node degree is defined as the number of reactions involving the species.The third and fourth methods have been designed based on the specific structure of synthesisplanning problems.The third method, named 'depth,' organizes mass balance constraints by the associated species' depth.The depth of a precursor species is defined as the minimum number of synthesis steps from the TABLE II.Grouping methods for penalty strength parameters associated with mass balance constraints.The right column lists species groups corresponding to mass balance constraint groups.See the text for details on the chemical species classifications.
Type Groups Unified all species Degree degree-1 species, degree-2 species, ... Depth targets, depth-1 precursors, depth-2 precursors, ..., byproducts Category targets, substrates, intermediates, substrates/intermediates, byproducts FIG. 3. Inflow/outflow adjustment algorithm.The input arguments are a set of chemical species S, a set of chemical and inflow/outflow reactions R, a signed stoichiometric coefficient function ν : S × R → Z; (s, r) → νsr, a lower bound function l : R → N0; r → lr, an upper bound function u : R → N0; r → ur, and the multiplicity function representing a pathway m : R → N0; r → mr.This algorithm, first, resets the multiplicity of each inflow/outflow reaction to its lower bound.Then, it increases the outflow of overproduced species (∆ > 0) and the inflow of overconsumed species (∆ < 0).Finally, it returns the multiplicity function of the improved pathway m : R → N0; r → mr.Note that this postprocessing does not ensure that a post-processed pathway satisfies all mass balance constraints.For example, if a species is overconsumed, but the associated inflow is unavailable, this method cannot adjust the inflow/outflow to make the mass balance zero.
species to one of the target species, as detailed in Appendix A. The fourth method, termed 'category,' categorizes mass balance constraints by the associated species' category: 'targets,' 'substrates' (species that are exclusively starting materials and not synthesized from oth-ers), 'intermediates' (species that need to be synthesized from others), 'substrates/intermediates' (species that can be either starting materials or synthesized from others), and 'byproducts.'We will compare the performance of parameter tuning using these grouping methods in Sec.IV B.
We note that the Optuna library has already been used in parameter tuning for Ising computing in [35,36].In [35], Optuna was applied to tuning parameters of the Digital Annealer, not penalty strength parameters.In [36], all penalty strength parameters are consolidated into one group, and the single, unified penalty strength parameter was tuned by Optuna.In contrast to these studies, the present study systematically elucidates the effects of several different parameter grouping methods on tuning performance.In the next section, we will demonstrate that appropriate choices of parameter grouping methods significantly contribute to performance improvement.

IV. PERFORMANCE EVALUATION
In this section, we present a performance evaluation of the proposed algorithm using a D-Wave Advantage quantum annealing machine and a simulated annealing software.The primary objective of these experiments is to assess the feasibility and efficacy of the proposed algorithm.

A. Experimental setting
We generated 100 benchmark problems of synthesis planning based on the USPTO dataset of chemical reactions [37].In our benchmark problems, fixed reaction costs and substrate purchase costs are randomly assigned to all chemical reactions and commercially available species, respectively.The benchmark problem generation process is detailed in Appendix C.These benchmark problems have been designed to assess the feasibility of the proposed algorithm using a D-Wave Advantage quantum annealer with a limited number of qubits.Thus, they are relatively small-scale problems: the number of integer variables N x , equivalent to the number of chemical and inflow/outflow reactions |R|, ranges from 3 to 116; the number of chemical species |S|, ranges from 2 to 80.All of these problems can be embedded into the D-Wave Advantage's Pegasus topology QPU [38], with the number of required physical qubits ranging from 18 to 3752 when utilizing the unary or order encoding method.Refer to Appendix D 1 for the details of the embedding results.
We solved the benchmark problems using our proposed algorithm.The Ising machines employed were (1) the D-Wave Advantage system 4.1 quantum annealer [18] and (2) the SimulatedAnnealingSampler of the D-Wave Ocean Software [30].We used the minorminer.findembedding function of the D-Wave Ocean Software to find embeddings of QUBO problems into the D-Wave Advantage system 4.1.Note that the simulated annealing software does not require the embedding process, as it can solve Ising models with any spin connectivity.We employed the SteepestDescentSolver of the D-Wave Ocean Software for the steepest descent postprocessing.We implemented the other components of the algorithm, that is, the translation process based on SymPy [39] and the inflow/outflow adjustment postprocessing, in Python.We integrated all these components into the workflow illustrated in Fig. 2. We utilized the multivariate version of the TPESampler of the Optuna library [32] to tune penalty and chain strength parameters.Unless otherwise noted, we used default settings for the functions from the D-Wave Ocean Software and the Optuna library in the benchmarking.The simulated annealing, embedding finding, translating, postprocessing, and parameter tuning were executed on a local Linux machine with Intel Xeon Gold 6248R processors (3.0 GHz, 24 cores × 2).Our local machine in Hokkaido, Japan, communicated with the D-Wave Advantage system 4.1 in British Columbia, Canada, via the internet, using the Leap quantum cloud service.
For performance comparison, we solved the benchmark problems also with the Gurobi Optimizer version 9.5.1 [40].The Gurobi Optimizer is a commercial state-of-theart solver for mathematical programming, including integer linear programming, and was employed in previous studies on chemical pathway-finding problems [13,15].To solve the synthesis planning problems with the Gurobi Optimizer, we translated them into integer linear programming by replacing the positivity indicator function in a fixed cost term, χ + (x r ), with an auxiliary binary variable y r ∈ {0, 1} satisfying x r ≤ u r y r .The chemical pathway-finding algorithm with the Gurobi Optimizer was executed on our local machine using up to 32 threads.

Best choice of parameters
We first investigated the best combination of an integer-to-binary encoding method (unary, order, log, or one-hot) and a parameter grouping method (unified, degree, depth, or category).This investigation was conducted on five representative problems selected from the 100 benchmark problems to cover a wide range of problem sizes.The problems selected were (a) the 20th, (b) the 40th, (c) the 60th, (d) the 80th, and (e) the 100th in ascending order of the number of integer variables.For each combination of encoding and grouping methods, we tuned penalty and chain strength parameters for the five representative problems.During the tuning processes, the annealing time for quantum annealing (QA) was fixed to 20 µs, and the number of sweeps (corresponding to the annealing time) for simulated annealing (SA) was fixed to 1000.These are the default values for the D-Wave Advantage system and the SA program we used.We set the number of Bayesian optimization iterations to 300, which we consider sufficient to check for convergence trends.We sampled 200 solutions per tuning iteration to compute the tuning score function, Eq. 11, with balancing statistical uncertainty and computational cost.We measured the performance of each combination of encoding and grouping methods by the best score of the tuning objective function over the 300 tuning trials.The performance comparisons for the D-wave Advantage system and SA are shown in Fig. 4 and Fig. 5, respectively.
Concerning the encoding methods, the unary and order encoding methods demonstrate the highest performance.Subsequently, the log encoding method also yields satisfactory results, whereas the one-hot encoding method exhibits the poorest performance.These results on the performance of the encoding methods are consistent with a previous study [41].
Among the four grouping methods, the depth and category grouping methods outperform the others when employing the unary or order encoding methods for both QA and SA.The performance difference between these two methods and the others is prominent, especially for large problems (d) and (e).Furthermore, the degree grouping method exhibits the least performance in many cases, although the degree grouping method offers more degrees of freedom for parameter tuning due to the subdivision of mass balance constraints than the unified grouping method.These findings suggest that the appropriate design of parameter groups based on the specific structure of problems is crucial for performance improvement.
Next, we examined the optimized penalty and chain strength values for the depth and category grouping methods to clarify the reason for the advantage of these grouping methods as well as the trends in penalty and chain strength parameter values of each group with respect to the problem size.This examination was conducted on all 100 benchmark problems and employed the unary and order encoding methods.The settings for the tuning processes were the same as those described above.
Figure 6 depicts the penalty and chain strength parameter values optimized through Bayesian optimization for the combination of the category grouping and order encoding methods.We found that: (1) the penalty strength for the mass balance constraints of the target species is larger than that of the others and has a strong linear correlation with the number of integer variables, N x ; (2) the chain strength parameter also exhibits a linear correlation with N x ; and (3) the other penalty strength param- than feasible solutions' cost values.The steepest descent postprocessing may not improve this solution as it relies on the same QUBO formulation as the Ising machine.
The inflow/outflow adjustment postprocessing also cannot reform the solution x = 0 because the target species are overconsumed in the pathway, but the inflow reactions of the target species are unavailable by definition.Therefore, the penalty strength for the mass balance constraints of the target species must be large enough so that the Ising machine hardly samples the solution x = 0. Furthermore, as the order of magnitude of feasible solutions' cost values increases almost linearly with respect to N x , the penalty strength for the mass balance constraints of the target species must also increase linearly.In contrast, the violation of the mass balance constraints for non-target species can often be reconciled by adjusting the associated inflow and outflow, although the effectiveness of the postprocessing may depend on the problem structure.Thus, the penalty strength associated with non-target species can be moderate.Keeping penalty strengths moderate may enhance the exploration in the solution space during the annealing process, potentially leading to high performance in finding solutions.For the depth and category grouping methods, the penalty strength for the mass balance constraints of the target species and the others can be tuned separately; this is likely the reason for their advantage.

Computation time
We next examined the computation time scaling of the pathway-finding algorithms using the D-Wave Advantage system and SA.Since the algorithms output approximate solutions in a stochastic manner, the computation time should be evaluated by the time taken to find a solution within a certain cost tolerance ρ with a specified tolerant failure probability ϵ.This computation time metric is known as time-to-solution (TTS) [16,18], defined by Here, τ is the annealing time; τ algo (τ ) is the average execution time of a single run of the algorithm under the annealing time τ ; p s (ρ, τ ) is the success probability for the algorithm under the annealing time τ to find a solution whose cost is at most ρC min , where C min denotes the minimum cost of feasible solutions.
The computation time analysis was conducted on all 100 benchmark problems and employed the order encoding and category grouping methods.We set the cost tolerance ρ to (a) 1, (b) 2, or (c) 3 and fixed the failure tolerance ϵ to 0.01.We measured the TTS for different annealing time τ : 1, 2, 5, 10, 20, 50, 100, 200, and 500 µs for QA; 10, 20, 50, 100, 200, 500, 1000, 2000, 5000, and 10000 sweeps for SA.For each problem and τ , we ran the algorithm 1000 times with tuned penalty and chain strengths as illustrated in Fig. 6.In the measurement, we took into account only the runtime of annealing and postprocessing as τ algo and excluded the execution time of embedding, translating, tuning, and network communication from τ algo .Lastly, we recorded the minimum value of TTS with respect to τ as the computation time metric for each ρ.
We also computed the average runtime of the pathwayfinding algorithm using the Gurobi Optimizer for comparison.For each cost tolerance ρ, the 'MIPGap' parameter of the Gurobi Optimizer was set to 1 − 1/ρ so that the optimizer terminates when it is assured that the cost value of the current best solution is less than or equal to ρC min .We recorded the average runtime of 1000 runs for each problem and ρ.
Figure 7 shows the scaling of computation times for the pathway-finding algorithms using the D-Wave Advantage system, SA, and the Gurobi Optimizer.When ρ = 1, the computation times for the D-Wave Advantage system and SA increase drastically as the number of variables, N x , increases, compared to that for the Gurobi Optimizer.As ρ increases, the rates of increase in the computation times with respect to N x for the D-Wave Advantage system and SA become slower.Consequently, these computation times approach that of the Gurobi Optimizer.In addition, the pathway-finding algorithm using the D-Wave Advantage system succeeded in finding optimal solutions (ρ = 1) for 34 out of the 100 benchmark problems, while the SA-based algorithm found optimal solutions for 43 problems.Approximate solutions with ρ = 2 were found for 96 problems when using the D-Wave Advantage system and all problems when using SA.Both methods found approximate solutions with ρ = 3 for all problems.
In the case of ρ = 1, the fast increasing rate of the TTS of SA with respect to N x is likely attributed to the increase in the maximum penalty strength, M max .The reasoning behind this statement is as follows.As discussed in Appendix B 1, the initial temperature in SA should be proportional to the largest absolute energy difference between any two states capable of direct transition, denoted by ∆.Such a high initial temperature enhances exploration in the solution space.The final temperature, in contrast, should be sufficiently low relative to the energy gap between the optimal and the second optimal solutions, denoted by δ.This ensures that the Boltzmann factor, and consequently the sampling probability, for the optimal solution(s) are sufficiently larger than those for the others.A larger difference between the initial and final temperatures, or equivalently between ∆ and δ, leads to longer annealing time.In fact, according to an estimate for the computational complexity of SA (Eq.B3 in Appendix B 1), the necessary annealing time exponentially increases with respect to ∆/δ.In the present Ising model, ∆ is O(M max ) as the penalty terms dominate energy barrier heights.Thus, the computational complexity is expected to depend on M max /δ.This ratio increases as N x increases, as shown in Fig. 8.This occurs because M max is O(N x ) as already shown in Fig. 6, whereas the cost difference δ may not necessarily depend on the problem size.Therefore, the increase of M max is likely to cause the increase of ∆/δ, leading to the rapid increase in the TTS of SA for ρ = 1.
In the cases of ρ = 2 and 3, the relatively slow increasing rate of the TTS of SA with respect to N x can be explained by a theoretical expectation that the computational complexity of SA depends on (ρ − 1)M max /C min instead of M max /δ, contrasting the case of ρ = 1.This expectation is based on the following arguments.The sampling probability of the optimal solution(s) is desired to be sufficiently higher than those of non-allowable solutions whose costs are greater than ρC min but not necessarily than those of solutions within the cost tolerance range.This insight suggests that the final temperature should be sufficiently low compared to the cost difference between the optimal and the least-cost non-allowable solutions, (ρ − 1)C min , instead of δ in the case of ρ = 1.Therefore, based on a similar discussion presented in Appendix B 1, the necessary annealing time is expected to depend on (ρ − 1)M max /C min .In fact, as demonstrated in Fig. 9, problems with larger M max /C min tend to have longer TTS for ρ = 2. Furthermore, M max /C min does not exhibit a clear increasing trend with respect to N x , as shown in Fig. 8.This observation is explained by the fact that the minimum cost C min also tends to increase almost linearly as the problem size increases.Therefore, the reason for the moderate increase in the TTS of SA for ρ = 2 and 3 is likely that M max /C min does not necessarily increase with respect to N x .
For the case of using the D-Wave Advantage system, the increasing trend of the TTS with respect to N x can be partially explained in terms of M max /δ and M max /C min , in a way similar to the case of SA.According to an estimate for the computational complexity of QA (Eq.B8 in Appendix B 2), the necessary annealing time required to find the optimal solution(s), i.e., for the case of ρ = 1, increases exponentially as log(1/δ ′ ) increases, where δ ′ denotes the energy gap between the ground and the first excited states of the physical Ising model.When using the D-Wave Advantage system, the Ising Hamiltonian must be rescaled by the maximum of |J ij | and |h i | due to their finite ranges.This rescaling factor scales as O(M max ).Therefore, the factor 1/δ ′ in the necessary annealing time is O(M max /δ) for ρ = 1.Similar to the above arguments for SA with ρ > 1, the factor 1/δ ′ can be replaced by O(M max /C min ) for ρ > 1.Thus, M max /δ and M max /C min are likely important factors determining the computational time scaling for QA as well as SA.In fact, Fig. 9 demonstrates that problems with longer TTS tend to have larger M max /C min value, for problems with fewer than 50 variables.
However, despite the theoretical prediction that QA exhibits an advantage over SA in terms of the computational complexity scaling with respect to M max /δ or Computation time scaling for the pathway-finding algorithms using the D-Wave Advantage system, simulated annealing, and the Gurobi Optimizer.The computation times of the algorithms using the D-Wave Advantage system and simulated annealing are the time-to-solution defined by Eq. 12, the time taken to find a solution whose cost is at most ρ×(the minimum cost) with a probability of at least 99%.Here, the cost tolerance ρ is (a) 1, (b) 2, and (c) 3, respectively.For the Gurobi Optimizer, the time-to-solution refers to the runtime it takes the solver to find a solution guaranteed to satisfy the cost tolerance requirement.Refer to the main text for additional details on the computing procedure.

7KHQXPEHURIYDULDEOHV 5DWLR
FIG. 8.The ratio of the maximum penalty strength Mmax relative to (1) the cost difference δ between the optimal and the second optimal solutions and (2) the minimum cost Cmin of feasible solutions.The maximum penalty strength Mmax was computed for the optimized penalty and chain strengths for simulated annealing (SA) shown in Fig. 6.The minimum cost Cmin was computed by the Gurobi Optimizer.The cost difference δ was estimated as follows: first, we gathered non-optimal feasible solutions from the samples used to compute the time-to-solution for SA shown in Fig. 7; then, among these non-optimal feasible solutions, we identified the minimum cost, denoted by C ′ ; finally, we estimated δ as δestimate = C ′ − Cmin.This estimate provides an upper bound of δ as C ′ is always greater than or equal to the true second minimum cost.Note that for some problems (especially with small sizes), all feasible solutions sampled by the SA-based algorithm are the optimal solution, thus Mmax/δestimate is not plotted.M max /C min (see also Appendix B 2), the results presented in Fig. 7 do not demonstrate an apparent quantum advantage.This gap between theory and reality can be due to various factors, including device-specific characteristics of the D-Wave Advantage system, which are further discussed in the following paragraph.Device-specific characteristics, such as the finite setting precision of J ij and h i and embedding overheads, must impact the TTS for the D-Wave Advantage system.
First, the precision in setting J ij and h i is limited.Since the energy gap between the ground and the first excited states of the rescaled Ising Hamiltonian is O(δ/M max ), the energy gap can be smaller than the setting errors of J ij and h i for large-size problems, leading to poor performance of sampling the optimal solution(s).Second, minor embedding necessitates more physical qubits compared with the original logical Ising/QUBO formulation.As presented in Appendix D 1, the necessary number of physical qubits to represent a pathway-finding problem on the device increases almost quadratically as the number of logical binary variables increases.This embedding overhead contributes to the increased computation time of QA.Lastly, other hardware characteristics, such as thermal noises and readout errors, may also affect performance.These limitations specific to QA hardware devices may nullify the theoretical quantum advantage in the computational complexity scaling with respect to M max /δ or M max /C min .

Computed synthesis pathways
Finally, we compare synthesis pathways computed by the D-Wave Advantage system, SA, and the Gurobi Optimizer.Figs. 10 and 11 depict synthesis pathways computed for two representative problems shown in panels (d) and (e) of Figs. 4 and 5, that is, the 80th and 100th problems in ascending order of the number of integer variables, respectively.In this computation, we employed the order encoding method and the penalty and chain strength parameters optimized using the category grouping method.The annealing time for the D-Wave Advantage system and the number of sweeps for SA were set to their default values, i.e., 20 µs and 1000, respectively.We ran the proposed Ising-computing algorithm 1000 times for each problem and each Ising machine.The feasible synthesis pathways at the minimum cost among the 1000 solutions for each problem and each Ising machine are shown in panels (a) and (b) of Figs. 10 and 11.The synthesis pathways computed by the Gurobi Optimizer shown in panel (c) of these figures are the exact optimal synthesis pathways for each problem.
The synthesis pathways computed by the D-Wave Advantage system and SA differ from the optimal synthesis pathways computed by the Gurobi Optimizer in the following two aspects.First, it has been observed that synthesis pathways computed by the D-Wave Advantage system and SA make sub-optimal choices when alternative reactions are available, in certain cases.Pathways in panels (b) and (c) of Fig. 10 exemplify such a situation; these two pathways differ only in the choice of reactions indicated by the green arrows in the upper left area.Second, there are instances where synthesis pathways computed by the D-Wave Advantage system and SA include synthesis sub-pathways which produce precursors not used for synthesizing any target.Panel (a) of Fig. 11 clearly exemplifies such a situation; the sub-pathway en-closed by the green frame in the upper left area is disconnected with the other part and is not included in the optimal pathway; this sub-pathway produces a precursor not used for synthesizing any target.In general, such unnecessary sub-pathways are not always separated from main pathways and often overlap with them.
These differences between the optimal pathways and those computed by the D-Wave Advantage system and SA may be attributed to the large penalty strength issue and device-specific noise and errors mentioned above.When the penalty strength is high, constraint satisfaction is more prioritized than cost minimization, often leading to feasible but sub-optimal solutions.If the multiplicity of an unnecessary reaction in a chemical reaction network becomes positive due to device-specific noise or errors, the postprocessing methods may add additional reactions to make the returned solution feasible, which may form an unnecessary sub-pathway.
It may be possible to address the issue of unnecessary sub-pathways through the following postprocessing steps: (1) detecting disposed precursors; (2) finding sub-pathways that produces the excess precursors at maximum cost in the original pathway, using the proposed pathway-finding algorithm; (3) subtracting the sub-pathways-which are likely unnecessary-from the original pathway.Unnecessary sub-pathways may also be suppressed by imposing additional penalties for precursors disposal in the QUBO formulation.Introducing these treatments, however, necessitates careful consideration as precursor disposal does not necessarily indicate wasteful pathways and economical reactions may produce precursors as byproducts in some cases.Methods to remove or suppress unnecessary sub-pathways remain open for further investigation.

Other examinations
Additionally, we confirmed that the BO-based parameter tuning used in the present study contributes to performance improvement, outperforming random search (see Appendix D 2).An examination into the postprocessing methods showed that the postprocessing methods indeed enhance the algorithm's performance (see Appendix D 3).

V. CONCLUSIONS
We have developed an algorithm for finding optimal pathways in chemical reaction networks using Ising machines.A pathway-finding problem in a chemical reaction network is formulated as a combinatorial optimization problem that involves integer variables representing reaction multiplicity and mass balance constraints corresponding to the law of conservation of mass.The proposed algorithm translates a chemical pathway-finding problem into the ground-state finding problem of an Ising model and solves the translated problem using an Ising machine.In addition, the proposed algorithm applies two postprocessing methods to enhance the feasibility and optimality of solutions returned from the Ising machine.We utilized Bayesian optimization to tune parame-ters determining penalty strengths for constraint violations.To enhance tuning performance, we have devised a dimensionality reduction technique for the parameter search space.In this technique, parameters are grouped together according to the associated constraint type, and all parameters in each group are constrained to take the same value.We systematically elucidated the effects of several different parameter grouping methods on tuning performance.We found that it is crucial to design parameter groups tailored according to the specific structure of problems for performance improvement.Most Ising/QUBO formulations do not consider such elaborate parameter grouping; instead, all penalty strengths of the same type are often consolidated into one group, like our 'unified' method (for instance, see [24] and applications reviewed in [25]).Our findings may inform the future development of automatic parameter tuning methods for complex constrained problems in Ising computing.
Our performance evaluation and analysis of the proposed algorithm using the D-Wave Advantage system and simulated annealing indicate that: (1) the computation time required to find optimal pathways rapidly increases as the problem size increases, likely due to the growth of penalty strengths with respect to the problem size and hardware limitations of quantum annealing devices; (2) the scaling of the computation time can be moderated if a relative error in a cost value is allowable and if the maximum penalty strength to the minimum cost hardly increases with respect to the problem size.These findings provide insights into future directions of the algorithm development for chemical pathway-finding problems.
In the development of chemical pathway-finding algorithms utilizing the penalty-based Ising/QUBO formulation, it might be advisable for the algorithms to aim at finding approximate solutions within a relative cost error tolerance.In addition, appropriate applications may be problems with a moderate ratio of the maximum penalty strength to the minimum cost.For further performance improvement, the advancement of parameter tuning methods is needed.For instance, annealing schedule optimization [42,43] likely reduces the required annealing time, and transfer learning techniques [43] may reduce the overhead runtime for parameter tuning by leveraging tuning results for other problem instances.Hardware improvements, especially reducing embedding overheads, are also necessary.Furthermore, a method presented in [44] may mitigate the challenges in the penalty-based formulation by transforming quadratic penalty terms into linear terms using the Hubbard-Stratonovich transformation.
An alternative approach is to apply penalty-free quantum optimization algorithms, such as constrained quantum annealing [45] and quantum alternating operator ansatz (QAOA) [46], which may help bypass the penaltyrelated issues.However, these algorithms introduce their own challenges, such as the need for a quantum processing unit capable of handling complex qubit interactions like XX and YY interactions and the difficulty of designing an effective driver Hamiltonian [47].Despite these technical difficulties, advancements in algorithms along this direction hold promise for resolving the challenges identified in the present study when solving chemical pathway-finding problems using Ising machines.
Furthermore, extending the scope of the chemical pathway-finding problems targeted by the Ising computing framework is a future challenge.For example, a chemical pathway-finding problem may involve continuous variables representing real-valued reaction multiplicity in moles.Real-valued reaction multiplicity is necessary to be considered in synthesis planning when taking into account reaction yields because the optimal multiplicity may differ from a theoretical integer value based on stoichiometry.In such a case, the problem can be generally formulated as mixed integer programming.For another instance, a chemical pathway-finding problem can be formulated as multi-objective optimization involving multiple objectives such as low monetary costs and low harmfulness [10].Moreover, chemists are sometimes interested in enumerating near-optimal pathways or even all feasible pathways [12,15].
Several Ising/quantum-computing algorithms aim at solving mixed integer programming [48,49], multiobjective optimization [50], and exhaustive enumeration [51,52].These algorithms may unlock further potential in chemical reaction network analysis using Ising machines.Note also that all chemical pathway-finding problems inherently involve mass balance constraints due to the law of conservation of mass; hence, our methods and findings related to penalty strength for mass balance constraints are likely broadly applicable across these problems.
We hope this study will serve as a starting point for future interdisciplinary technological advancements in chemical reaction network analysis using next-generation computers such as Ising machines and quantum computers.12.Chemical reactions and species relevant and irrelevant to the synthesis planning of specified targets.Relevant chemical reactions (solid squares) are candidates for synthesis steps because they yield one of the potential precursors for the targets.On the other hand, irrelevant chemical reactions (dashed square) are never used for synthesizing the targets.Relevant chemical species (solid circles) are defined as those participating in relevant chemical reactions.Note that not all relevant chemical reactions are always used in chemical synthesis.For instance, four chemical reactions, namely 1β, 1γ, 2δ, and 2ζ, constitute a feasible synthesis pathway, indicated in dark black in the figure.Determining the optimal combination of the candidate reactions is central to the synthesis-planning problem.

If a chemical reaction produces any potential pre-
cursor for the target species, that is, a reactant of another relevant reaction, the chemical reaction is relevant to the synthesis planning.
We also define relevant chemical species as chemical species participating in the relevant chemical reactions.Such species can be either targets, precursors, or byproducts.In Fig. 12, species 0a and 0b are targets, species 1a-1e and 2a-2i are precursors, and species Πa is a byproduct.Here, the number in the label of a target or precursor node indicates the depth of the species node.The node depth is defined as the minimum number of synthesis steps from the species node to any target node.The depth of a byproduct node is undefined.
The above recursive definition establishes an algorithm for collecting chemical species and reactions relevant to the synthesis planning (Fig. 13).This algorithm significantly reduces the number of chemical reactions to be considered compared to that of the whole dataset.
In addition to the relevant chemical reactions enumerated by Algorithm 2 (Fig. 13), the inflow and outflow dummy reactions associated with the relevant species are also relevant to the synthesis planning.In synthesis planning, only commercially available species have their inflow reactions.All target species must have outflow reactions; others may also have outflow reactions representing disposal.Note that inflow reactions of purchasable relevant byproducts and outflow reactions of relevant precursors never produced by chemical reactions in the database (e.g., species 1c in Fig. 12) can be omitted because the purchase of byproducts and disposal of purchased substrates are not economical choices.

Appendix B: Ising Machines and Minor Embedding
Ising machines are dedicated computing devices for finding the ground state(s) of Ising models [16].Major algorithms underlying Ising machines are simulated annealing [53] and quantum annealing [54].This appendix surveys the simulated and quantum annealing algorithms.Moreover, we also explain the minor embedding required for using Ising machines with limited spin connectivity.

Simulated annealing
Simulated annealing (SA) is a physics-inspired heuristic algorithm for optimization, proposed by Kirkpatrick et al. in 1983 [53].SA simulates a thermal annealing process in a statistical mechanics system by a Markov chain Monte Carlo method.The system is designed such that its lowest energy state(s) correspond to the optimal solution(s) of an optimization problem to be solved.In the annealing process, the temperature of the system gradually decreases.If the cooling process is slow enough, the system is expected to remain in thermal equilibrium.Since the Gibbs distribution assigns a high probability to the lowest energy state(s) at sufficiently low temperatures, the system is likely to reach the lowest energy FIG. 13.An algorithm for collecting relevant chemical species and reactions.The input arguments are a set of chemical species, S , a set of chemical reactions, R, a set of targets, T , and the maximum search depth, dmax.This algorithm iteratively enumerates relevant chemical species and reactions up to depth dmax, then returns the sets S and R of the collected species and reactions.Note that the maximum recursive depth, dmax, does not exceed the number of all chemical reactions, |R|.For example, in Fig. 12, a square node with a label starting with integer d represents a relevant reaction in R d at line 5 of the code, and a circle node with a label starting with integer d signifies a relevant species in S d at lines 3 and 7.In addition, the byproduct species 'Πa' is added to S at line 10.

state(s) after annealing.
The annealing schedule, which dictates the rate of temperature decrease, determines the success probability of sampling the lowest energy state(s).Geman and Geman [55] proved that the sampling-probability distribution in SA converges to the uniform distribution on the lowest energy state(s) as t → ∞ if the temperature decreases in time as Here, t denotes time (Monte Carlo step), t 0 (> 1) is a constant, T (t) is the temperature at time t, N σ is the number of spin variables, and ∆ is the largest absolute energy difference between any two states such that the system can directly transition between them in a single step of the Markov chain.The computational complexity of SA with the inverselogarithmic annealing schedule (Eq.B1) is estimated as follows.Assume that the probability distribution stays close to the Gibbs distribution during the annealing process.Let E 0 and E 1 denote the lowest and the second lowest energies, respectively.The temperature at final time τ needs to be low enough compared with the energy gap δ (:= E 1 − E 0 ); this is because the lowest energy state(s) should have a sufficiently larger Boltzmann factor than the second lowest energy state(s): This suggests that the necessary annealing time in SA scales as where c is a positive constant of O(N 0 σ ).This computational complexity is consistent with that implied by a theoretical result in [56]: the total-variation distance between the sampling probability distribution in SA with the annealing schedule given by Eq.B1 and the Gibbs distribution at absolute zero temperature is upper bounded by O((t/N σ ) −δ/Nσ∆ ).
Note that most SA algorithms do not employ the inverse-logarithmic schedule due to impractical slowness in real-world applications.Practical implementations of SA utilize other fast cooling schedules, such as the geometric schedule T (t) = T (0)r t T (0 < r T < 1), to find approximate optimal solutions.For instance, the SA algorithm implemented in the D-Wave Ocean Software [30] employs the geometric annealing schedule as default, which was also used in our benchmarking in Sec.IV.Thus, we reference the theoretical computational complexity given by Eq.B3 merely as a coarse measure to help interpret the observed computation time scaling of SA with respect to the problem size in Sec.IV.

Quantum annealing
Quantum annealing (QA) is a heuristic quantum optimization algorithm inspired by SA.QA for Ising models was first proposed by Kadowaki and Nishimori in 1998 [54].The QA algorithm utilizes the time evolution of a quantum system with the time-dependent Hamiltonian B4) from time 0 to τ .Here, A(t) and B(t) are monotonic functions such that A(0) = 1, B(0) = 0 and A(τ ) = 0, B(τ ) = 1, and σx i and σz i are the Pauli x and z operators acting on the i-th qubit, respectively.The system Hamiltonian varies with time from the first term (the transverse field Hamiltonian) to the second term (the quantum Ising Hamiltonian).In the computation, the quantum system is initialized to the ground state of the initial Hamiltonian, |+⟩ ⊗Nσ , where and |↑⟩ and |↓⟩ are the eigenstates of the Pauli z operator with eigenvalues 1 and −1, respectively.According to the quantum adiabatic theorem, if the system Hamiltonian varies slowly enough, the quantum state is expected to stay close to the instantaneous ground state.Therefore, after the adiabatic time evolution of the quantum system, it reaches a quantum state close to the ground state of the Ising Hamiltonian at time τ , allowing us to observe the lowest-energy spin configuration(s) with a high probability.
A convergence condition of QA is known for the following transverse-field Ising model: (B5) This Hamiltonian is related to the Hamiltonian in Eq.B4 as Ĥ′ QA (t) = ĤQA (t)/B(t).The function Γ(t) (= A(t)/B(t)) controls the magnitude of quantum fluctuation induced by the transverse field.Morita and Nishimori [57,58] proved that the excitation probability is bounded by an arbitrarily small constant ϵ at each time if Γ(t) decreases in time as where a and b are constants of O(N 0 σ ) and t 0 is a positive constant.
The computational complexity of QA with the powerlaw annealing schedule (Eq.B6) is estimated as follows.At the final time τ , the perturbative transverse field needs to be small enough compared with the energy gap δ between the ground and the first excited states of the non-perturbed Ising model.This is because the contributions of excited states of the non-perturbed Ising model to the perturbed ground state should be sufficiently small: where |k⟩ denotes the (k + 1)-th lowest energy eigenstate of the non-perturbed Ising model with energy E k .This suggests that the necessary annealing time in QA scales as where c ′ is a positive constant.Therefore, QA exhibits an advantage over SA in terms of the computational complexity scaling with respect to δ.Note that D-Wave quantum annealing devices are based on the Hamiltonian defined by Eq.B4 and do not employ the power-law annealing schedule given by Eq.B6.Thus, the theoretical computational complexity given by Eq.B8 is merely a coarse measure to help interpret the observed computation time scaling of QA with respect to the problem size.In addition, due to minor embedding, the necessary number of qubits, N σ , is often greater than the number of spins of an Ising model that one wants to solve.We explain the minor embedding in the next subsection.The mapping from the logical model to the physical model is called minor embedding [59].In the embedding, a collection of multiple physical spins, termed a chain, represents a single logical spin.To ensure that the physical spins in a chain collectively behave as a single logical variable, a constraint that all spins in the chain take the same value is imposed.This constraint can be physically implemented by strong ferromagnetic interactions (J ij ≫ 1) between adjacent spins in the chain.The strength of the ferromagnetic interactions is called chain strength.In addition, the embedding ensures that if logical spin variables σ and σ ′ interact, at least one physical spin in the chain of σ and one spin in the chain of σ ′ interact.
Finding minor embedding is an NP-hard problem.However, several heuristic algorithms exist for finding embeddings [59,60].The algorithm proposed in [59] is available in the minorminer package of the D-Wave Ocean Software [30].
Physical spins in a chain often take different values.In such a case, postprocessing is performed on a classical computer, assigning the most common physical spin value in the chain to the corresponding logical spin variable.The probability of such chain breaks mainly depends on the chain strength.Therefore, careful tuning of the chain strength is necessary to obtain feasible, lowcost solutions.Sec.III C describes chain strength tuning in detail.ical structures and chemical reactions with no reactants/products; we removed these invalid reactions from the network.In addition, we ignored catalysis information.The numbers of chemical reactions and species in the whole network of the USPTO dataset are around 1.1 and 1.5 million, respectively.
Second, we defined a set of commercially available substrates and a set of target candidates.Instead of referring to actual chemical makers' catalogs, we generated a set of commercially available substrates and a set of target candidates based solely on the USPTO network structure as follows: (1) We identified species with an in-degree or out-degree of 20 or more as hub.We assumed that a hub species has an established standard way of purchasing or synthesizing it because it can be regarded as popular in chemistry.In other words, we assumed that the hub species are virtually commercially available and that finding synthesis pathways to them is unnecessary.Thus, we omitted deeper searches from hub species in Algorithm 2 (Fig. 13).This treatment prevents a synthesis planning problem from becoming too large and complex to solve by the D-Wave Advantage machine.(2) We classified nonhub species into source (with zero in-degree), sink (with zero out-degree), and non-terminal (others).( 3) We assumed that all source species are commercially available because source species are always used as starting materials in the dataset.( 4) Non-terminal species may be either commercially available or not; we randomly assigned 25% of non-terminal species as commercially available substrates.(5) We designated all commercially unavailable species to target candidates.These target candidates comprise the sink species and the commercially unavailable non-terminal species.
Third, we randomly determined a unit purchase price of each commercially available species and a fixed execution cost for each chemical reaction.In our experiment, we used uniform distribution ranging from 1 to 10 for the random cost assignment.
Fourth, we randomly selected multiple targets for each synthesis planning: The number of targets was determined at uniformly random between 1 to 10; we then randomly selected a set of multiple species with high Tanimoto similarity [61] based on Morgan fingerprints [62,63] from the set of target candidates.Synthesis plans for multiple similar species can save monetary costs by sharing precursors and reactions [11].
Fifth, we extracted a subnetwork underlying each synthesis planning from the USPTO chemical reaction network.Algorithm 2 (Fig. 13) identified the relevant chemical species and reactions for each synthesis.Here, the maximum search depth parameter, d max , was set to |R|, the upper bound of synthesis steps in a chemical reaction network with |R| reactions.Then, inflow dummy reactions of commercially available precursors were added to represent purchasing starting materials; outflow dummy reactions of species other than the source species (with zero in-degree) were added to represent shipments of target chemicals or disposals of synthesized byproducts.
Finally, we created optimization problems of synthesis planning based on the random costs and the underlying chemical reaction networks.In all problems, the multiplicity of the outflow reaction of each target was fixed to 1 by setting the lower and upper bounds to 1, and the lower and upper bounds of the multiplicity of other reactions were set to 0 and 5, respectively.
Note that problems with disconnected relevant chemical reaction networks were not adopted as the benchmark problems because such problems can easily be decomposed into smaller problems.In addition, problems that are infeasible or unable to be embedded into the D-Wave Advantage machine were also not adopted as the benchmark problems.
Appendix D: Supplemental Performance Evaluation

Embedding performance
We calculated embeddings of the benchmark QUBO problems into the D-Wave Advantage Pegasus topology QPU using the minorminer.findembedding function from the D-Wave Ocean Software [30].For each problem, we performed the calculation ten times and selected the embedding with the shortest maximum and mean chain length from those obtained.
Figure 14 illustrates the scaling of (a) the computation time required to find an embedding, (b) the mean chain length, and (c) the maximum chain length, with respect to the number of integer variables.The computation time of the minorminer.findembedding function tends to increase nearly quadratically as the problem size increases.The mean and maximum chain lengths exhibit roughly linear scaling, leading to quadratic scaling of the number of physical qubits with respect to the number of logical qubits.Since the log encoding method requires fewer binary variables than the other encoding methods, both the computation time and chain length for the log encoding method are less than those for the others.

Tuning performance
To evaluate the effectiveness of Bayesian optimization in penalty strength tuning, we compared its results with those of random search.The results for the D-Wave Advantage system and simulated annealing are depicted in Figs. 15 and 16, respectively.For the smallest size problem, both Bayesian optimization and random search converged rapidly to the same value.However, for larger problems, Bayesian optimization converged to a better value than random search when using the same encoding and grouping methods.The solid lines illustrate the convergence trends for Bayesian optimization with the multivariate tree-structured Parzen estimator (TPE), and the dotted lines illustrate those for random search.We performed parameter tuning three times for each problem, grouping, and encoding.The thick, dark lines represent the mean of the three tuning processes, while the thin, light lines depict each individual process to highlight variations in the tuning processes.The problems selected are the same as those in Fig. 4.

Postprocessing performance
To evaluate the effectiveness of the two postprocessing methods, we analyzed changes in the tuning score function ⟨E⟩, defined by Eq. 11, during postprocessing.The results are shown in Fig. 17 annealing case, SD does not contribute to score improvement, whereas IOA does enhance solution quality for some problems.18-20 in this appendix demonstrate that the optimized penalty strength for the mass balance constraints of the target species tends to be larger than that of the others and exhibits a strong linear correlation with the number of integer variables when using the unary/order encoding and depth/category grouping methods.FIG.19.Scaling of the tuned penalty and chain strength parameter values with respect to the number of integer variables in the case of using the order encoding and depth grouping methods.Panels (a)-(j) depict the penalty strength parameters for mass balance constraints of species in each group, panel (k) shows the penalty strength parameter for order encoding, and panel (l) shows the chain strength for embedding.Linear fits are plotted for datasets with a coefficient of determination (R 2 ) greater than 0.5, excluding those with fewer than ten data points.The R 2 values are 0.91 (panel (a) for D-Wave Advantage), 0.86 (panel (a) for simulated annealing), 0.52 (panel (b) for D-Wave Advantage), and 0.54 (panel (l) for D-Wave Advantage), respectively.

FIG. 1 .
FIG.1.An example of a chemical reaction network and a pathway: the Solvay process.

FIG. 4 .FIG. 5 .FIG. 6 .
FIG.4.Performance comparison of different combinations of an integer-to-binary encoding method (unary, order, log, or one-hot) and a parameter grouping method (unified, degree, depth, or category) in the proposed algorithm using the D-Wave Advantage system 4.1.Each panel displays the results for (a) the 20th, (b) the 40th, (c) the 60th, (d) the 80th, and (e) the 100th from the 100 benchmark problems sorted in ascending order by the number of integer variables, respectively.The performance measure ⟨E⟩ best is the best score of the tuning objective function, Eq. 11, during 300 iterations of Bayesian optimization (smaller is better).The insets in panels (d) and (e) provide magnified views.
FIG.7.Computation time scaling for the pathway-finding algorithms using the D-Wave Advantage system, simulated annealing, and the Gurobi Optimizer.The computation times of the algorithms using the D-Wave Advantage system and simulated annealing are the time-to-solution defined by Eq. 12, the time taken to find a solution whose cost is at most ρ×(the minimum cost) with a probability of at least 99%.Here, the cost tolerance ρ is (a) 1, (b) 2, and (c) 3, respectively.For the Gurobi Optimizer, the time-to-solution refers to the runtime it takes the solver to find a solution guaranteed to satisfy the cost tolerance requirement.Refer to the main text for additional details on the computing procedure.

FIG. 9 .
FIG. 9.The relationship between the time-to-solution and the ratio of the maximum penalty strength to the minimum cost.Panels depict the time-to-solution with cost tolerance ρ = 2 for the algorithms using (a) the D-Wave Advantage system and (b) simulated annealing, respectively.The time-to-solution data correspond to that in panel (b) in Fig.7.Each data point is colored according to the value of the ratio of the maximum penalty strength to the minimum cost.For clarity, outlier values (colors) of the penaltycost ratio were treated using the following standard method based on the inter-quartile range: values outside the interval [Q1 − 1.5IQR, Q3 + 1.5IQR] were truncated, where Q1 and Q3 represent the first and third quartiles, respectively, and IQR(:= Q3 − Q1) is the inter-quartile range; smaller outliers were set to the lower bound of the interval, while larger outliers were set to the upper bound.

FIG. 10 .
FIG. 10.Synthesis pathways computed by (a) the D-Wave Advantage system, (b) simulated annealing, and (c) the Gurobi Optimizer.The problem solved is the 80th problem in ascending order of the number of integer variables (the same problem shown in panel (d) of Figs. 4 and 5).Circles and squares represent chemical species and reactions, respectively.Each reaction's multiplicity is indicated inside the reaction node.Colored parts correspond to computed synthesis pathways, whereas light gray parts are not included in these pathways; in other words, the multiplicity of each light gray reaction is zero.The green arrows in the upper left area of panels (b) and (c) indicate different choices of alternative reactions in these two pathways.The costs of the three synthesis pathways are (a) 372.7,(b) 251.4,and (c) 250.0, respectively.

Commercially
FIG.12.Chemical reactions and species relevant and irrelevant to the synthesis planning of specified targets.Relevant chemical reactions (solid squares) are candidates for synthesis steps because they yield one of the potential precursors for the targets.On the other hand, irrelevant chemical reactions (dashed square) are never used for synthesizing the targets.Relevant chemical species (solid circles) are defined as those participating in relevant chemical reactions.Note that not all relevant chemical reactions are always used in chemical synthesis.For instance, four chemical reactions, namely 1β, 1γ, 2δ, and 2ζ, constitute a feasible synthesis pathway, indicated in dark black in the figure.Determining the optimal combination of the candidate reactions is central to the synthesis-planning problem.

3 .
Minor embeddingDue to physical hardware topology, some Ising machines, including D-Wave quantum annealing devices, can only solve Ising models with limited spin interactions.Therefore, one needs preprocessing to represent the logical Ising model one wants to solve by the physical Ising model implemented in hardware.

FIG. 14 .FIG. 15 .
FIG.14.Scaling of (a) the computation time required to find an embedding using the minorminer.findembedding function, (b) the mean chain length, and (c) the maximum chain length, with respect to the number of integer variables.

FIG. 16 .
FIG.16.Convergence plots of parameter tuning for the simulated annealing case.The notation is the same as that in Fig.15.

4 .Figure 6
Figure 6  in Sec.IV B and Figs.18-20 in  this appendix demonstrate that the optimized penalty strength for the mass balance constraints of the target species tends to be larger than that of the others and exhibits a strong linear correlation with the number of integer variables when using the unary/order encoding and depth/category grouping methods.

FIG. 17 .FIG. 18 .
FIG. 17. Effectiveness of the postprocessing methods.Panel (a) and (b) depict the tuning score ⟨E⟩, defined by Eq. 11, for solutions at each step illustrated in Fig. 2, i.e., QUBO solutions, QUBO solutions (improved), and pathway solutions (improved).Red triangles represent the score of QUBO solutions sampled by an Ising machine, ⟨E⟩A.Green squares denote the score of improved QUBO solutions processed by steepest descent, ⟨E⟩+SD.Blue circles indicate the score of improved pathway solutions processed by both steepest descent and inflow/outflow adjustment, ⟨E⟩+IOA.Panel (c) and (d) illustrate changes in the score relative to the score of raw solutions, ⟨E⟩A.Points at position X (X = A, +SD, +IOA) represent the relative score ⟨E⟩X /⟨E⟩A.Each line corresponds to one problem and is colored according to the criteria shown in the lower legend to highlight the type of improvement process.

TABLE I .
Integer-to-binary encoding methods.Here, an integer variable x ∈ [l, u] is expressed by n binary variables q ∈ {0, 1} n .