Graph Coloring with Physics-Inspired Graph Neural Networks

We show how graph neural networks can be used to solve the canonical graph coloring problem. We frame graph coloring as a multi-class node classification problem and utilize an unsupervised training strategy based on the statistical physics Potts model. Generalizations to other multi-class problems such as community detection, data clustering, and the minimum clique cover problem are straightforward. We provide numerical benchmark results and illustrate our approach with an end-to-end application for a real-world scheduling use case within a comprehensive encode-process-decode framework. Our optimization approach performs on par with or outperforms existing solvers, with the ability to scale to problems with millions of variables.


I. INTRODUCTION
The graph coloring problem (GCP) is arguably one of the most famous problems in the field of graph theory [1,2]. Phrased as an optimization problem, the goal is to find an assignment of labels (traditionally referred to as colors) to the vertices (nodes) of a graph such that no two adjacent vertices are of the same color, while using the smallest number of colors possible. The convention of using colors dates back to the historic inception of this problem: trying to color a map of the counties of England with the smallest number of colors such that no two regions sharing a common border are assigned the same color [1].
Today graph coloring is still an active field of research, with real-world applications across a strikingly wide range of domains, including (for example) the production of sports schedules, the assignment of taxis to customer requests, the creation of timetables at schools and universities, the allocation of computer programming variables to computer registers, air traffic flow management [3], and the game of Sudoku, among others [1].
With online access to first-generation quantum computers steadily expanding, the GCP has recently attracted considerable interest in the broader quantum computing community.
In the current era of noisy intermediate-scale quantum (NISQ) devices, typical approaches involve either hybrid quantum-classical algorithms such as the Quantum Approximate Optimization Algorithm (QAOA) [4] or quantum annealing [5,6]. Given the low-level access to these devices, the GCP typically has to be cast as a quadratic unconstrained binary optimization problem (QUBO) [7] or, equivalently, as an Ising Hamiltonian [8], at the expense of increased resource requirements. Specifically, the QUBO description of the GCP with $q > 2$ colors for a graph with $n$ nodes requires $q \times n$ binary variables, or (logical) qubits in the corresponding quantum-native or quantum-inspired approach. In addition, the constraint that each vertex is assigned exactly one color has to be enforced by hand with additional penalty terms [8]. Because of this added overhead due to the binary representation, it would be preferable to tackle the problem in its native mathematical form. In this work we propose the use of graph neural networks to do so, aided by statistical physics concepts.
In the deep learning community, graph neural networks (GNNs) have emerged as a novel class of neural network architectures designed to consume graph-structured data [9][10][11][12][13][14][15][16], with the ability to learn effective feature representations of nodes, edges, or even entire graphs. Paradigmatic problems studied with GNNs can be categorized as node classification, link prediction, graph classification, or community detection, among others.
Prime examples include the classification of users in social networks [17,18], the prediction of future interactions in recommender systems [19], and the prediction of certain properties of molecular graphs [20,21].
Leaving the details of specific GNN implementations aside (see Refs. [15,22,23] for further details), the underlying theme for GNNs is the implementation of a message passing [24] scheme whereby GNNs iteratively update the node (or edge) embeddings by aggregating information from their local neighbors following the topology of the underlying graph. Because of their inherent scalability and graph-based design, GNNs present a platform that can solve the graph coloring problem at scale. We have previously presented a physics-inspired, GNN-based framework to (approximately) solve quadratic unconstrained binary [7] combinatorial optimization problems with up to millions of variables [25]. In this work we natively extend this framework to multi-color decision variables, and show how to solve the graph coloring problem (GCP) without the extra penalty terms needed in a QUBO-based approach. To this end we frame the GCP as a multi-class node classification problem and use an unsupervised training strategy based on the Potts model [26], a generalization of the Ising model in statistical physics. For illustration purposes, our approach is schematically depicted in Fig. 1. As discussed in more detail below, generalizations of this approach to applications such as data clustering [27] and community detection [28][29][30] are straightforward.

FIG. 1: Schematic illustration of our approach. Following a recursive neighborhood aggregation scheme, the graph neural network is iteratively trained against a loss function based on the Potts model (enforcing different color assignments to adjacent nodes). At training completion, the final values for the soft node assignments at the final graph neural network layer are projected to hard class (color) assignments $\sigma_i = 1, \dots, q$, as illustrated here for $q = 3$ colors. This solution is optimal as the sample graph contains maximum cliques of size three.

arXiv:2202.01606v3 [cs.LG] 23 Nov 2022
The paper is structured as follows.
In Sec. II we provide some context for our work, discussing the relevant literature at the intersection of graph coloring and graph neural networks. In Sec. III we describe the basic concepts for our work, with details on the GCP, the physics of the Potts model and its inherent connection to the GCP, and graph neural networks. In Sec. IV we then detail the theoretical framework underlying our approach, providing a comprehensive physics-inspired, GNN-based approach towards solving the GCP and related multi-color optimization problems. Section V outlines an end-to-end application for a real-world scheduling use case, followed by numerical experiments in Sec. VI. Finally, in Sec. VII we draw conclusions and give an outlook on future directions of research.

II. RELATED WORK
In this section we briefly review relevant existing literature, with the goal of providing a detailed context for our work. To keep the scope manageable, we focus on work using GNN-based solution strategies. For an extensive review of the GCP we refer to Ref. [1].
Supervised Learning. In Ref. [31] the authors devise a binary classifier to solve the decision version of the graph coloring problem, i.e., whether or not a given graph is $q$-colorable. To this end, Lemos et al. propose a model that combines a graph neural network with a multi-layer perceptron. To train this model a standard binary cross-entropy loss function is used, comparing the model's final prediction with the known ground truth for a given GCP instance, as obtained with a complementary CSP solver, albeit only for small problem instances that can be solved exactly. As discussed in Ref. [32], such a supervised approach critically depends on the existence of representative, labelled training data sets with previously optimized hard problem instances, resulting in a somewhat problematic chicken-and-egg scenario. In contrast to our model, the approach outlined by Lemos et al. can underestimate the chromatic number and, going beyond binary graph classification, requires a heuristic clustering algorithm such as k-means in order to provide a constructive coloring solution.
Unsupervised Learning. Conceptually, our work is most similar to those approaches that aim to train neural networks in an unsupervised, end-to-end fashion, without the need for labelled training sets. Specifically, Li et al. have recently used graph neural networks to solve the GCP following an unsupervised training strategy [33]. With a focus on the formal discriminative power of GNNs for the graph coloring problem, and motivated by intuition alone, the authors utilize a loss function which, as we show in this work, follows straightforwardly from the Potts model, and therefore emerges as part of a larger, unifying, physics-inspired framework. We also find that our solver improves upon the results of Li et al. on several benchmark problems.
Against this background, our work makes a physics-inspired contribution to the emerging cross-fertilization between combinatorial optimization and machine learning [34,35]. Specifically, we provide a unified framework that pairs the Potts model [26] (as extensively studied in the context of statistical physics) with deep learning tools in the form of graph neural networks to model and solve a large class of graph-based, multi-color optimization problems such as graph coloring, community detection or data clustering, all within a completely unsupervised, end-to-end framework.

III. PRELIMINARIES
To set up our notation and terminology, we first provide a formal problem definition for the graph coloring problem (GCP). We then highlight its close connection to the Potts model. Finally, we provide a brief review of graph neural networks.
Graph coloring. We consider an undirected graph $G = (V, E)$ with vertex set $V = \{1, 2, \dots, n\}$ and edge set $E = \{(i, j) : i, j \in V\}$. Given such a graph, in the graph coloring problem we seek to assign an integer $c(\nu) \in \{1, 2, \dots, q\}$ to every vertex $\nu \in V$, such that (i) the assignment is free of color clashes, i.e., $c(u) \neq c(v)$ $\forall (u, v) \in E$, and (ii) the number of colors $q$ is minimal. We refer to a clash-free coloring using at most $q$ colors as a proper (feasible) $q$-coloring. If such a $q$-coloring can be found, the graph is said to be $q$-colorable. The chromatic number of a graph $G$, denoted as $\chi = \chi(G)$, with $1 \leq \chi \leq n$, is the minimum number of colors required for a feasible coloring of $G$. Accordingly, our goal is to find the chromatic number $\chi$, together with a coloring in which adjacent vertices are assigned different colors. In general, this problem is computationally hard, with exact algorithms displaying a runtime exponential in the input size $n$. Specifically, it is known to be NP-hard to compute the chromatic number $\chi$, typically leaving heuristics such as greedy coloring or tabu-search based methods as the go-to approximation strategies [1].
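To make the two requirements of a proper coloring concrete, the following minimal Python sketch counts color clashes and checks feasibility. It assumes vertices are labelled 0, ..., n-1 (rather than 1, ..., n as in the text) and that the graph is given as an edge list; the function names and the example graph are illustrative and not taken from the original work.

```python
def count_clashes(edges, coloring):
    """Return the number of edges whose two endpoints share the same color."""
    return sum(1 for u, v in edges if coloring[u] == coloring[v])


def is_proper_coloring(edges, coloring, q):
    """True if at most q distinct colors are used and no edge has a clash."""
    return len(set(coloring)) <= q and count_clashes(edges, coloring) == 0


# Hypothetical example: a 5-cycle, whose chromatic number is three.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
print(is_proper_coloring(cycle_edges, [0, 1, 0, 1, 2], q=3))
print(count_clashes(cycle_edges, [0, 1, 0, 1, 0]))
```

The first call checks a feasible 3-coloring; the second shows that any attempt with only two colors leaves at least one clashing edge on an odd cycle.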
The graph coloring problem is also closely related to yet another NP-hard combinatorial optimization problem, the minimum clique cover problem (MCC) [36].
In MCC the goal is to partition the nodes of a graph into cliques, with as few cliques as possible. Conversely, graph coloring provides color classes, i.e., partitions of the vertex set into independent sets (that is, subsets with no adjacencies), yielding the following equivalence between clique covers and coloring: Because a subset of vertices is a clique in $G$ if and only if it is an independent set in the complement of $G$, a partition of the vertices of $G$ is a clique cover of $G$ if and only if it is a coloring of the complement of $G$. For a given graph $G$, the smallest number of cliques for which a clique cover exists is called the clique cover number.
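The equivalence between clique covers and colorings of the complement graph can be checked directly on a small instance. The sketch below is illustrative only: it assumes 0-based vertex labels and an edge-list representation, and the helper names (`complement`, `color_classes`, `is_clique`) are hypothetical.

```python
from itertools import combinations


def complement(n, edges):
    """Edge list of the complement graph on vertices 0..n-1."""
    present = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2)
            if frozenset((u, v)) not in present]


def color_classes(coloring):
    """Group vertices by color; each group is one color class."""
    classes = {}
    for node, color in enumerate(coloring):
        classes.setdefault(color, []).append(node)
    return list(classes.values())


def is_clique(nodes, edges):
    """True if every pair of the given nodes is connected in the edge list."""
    present = {frozenset(e) for e in edges}
    return all(frozenset(pair) in present for pair in combinations(nodes, 2))


# Hypothetical graph G: a triangle {0, 1, 2} with a pendant vertex 3.
g_edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
gc_edges = complement(4, g_edges)   # complement of G
coloring_of_complement = [0, 0, 0, 1]  # proper 2-coloring of the complement
cover = color_classes(coloring_of_complement)
print(cover, all(is_clique(part, g_edges) for part in cover))
```

Each color class of the complement is an independent set there and hence a clique in G, so the two classes form a clique cover of G with two cliques.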
Potts model. The GCP outlined above is closely related to the standard Potts model, as argued below. In the Potts model every vertex is associated with a spin variable $\sigma_i = 1, \dots, q$ that can take on $q$ different values. The Hamiltonian for the Potts model can be expressed in compact form as

$$ H_{\mathrm{Potts}} = -J \sum_{(i,j) \in E} \delta(\sigma_i, \sigma_j), \tag{1} $$

where $\delta(\sigma_i, \sigma_j)$ refers to the Kronecker delta, which equals one whenever $\sigma_i = \sigma_j$ and zero otherwise, thus capturing the hard-core spin-spin interactions characteristic of the Potts model [26,37]. Accordingly, if two adjacent spins $\sigma_i$ and $\sigma_j$ are in the same state, the energy contribution is $-J$, while it is zero whenever they are in different states. To enforce a feasible coloring of the underlying graph, we consider anti-ferromagnetic interactions and (if not stated otherwise) set $J = -1$ in the following. Generalizations to settings with weighted interactions $J_{ij}$ (where some constraints are more important than others) are straightforward. Important applications thereof include data clustering [27] and community detection as captured by the maximization of the modularity parameter [28][29][30], among others. Specifically, we find that the latter can be described by the generalized Potts Hamiltonian with interaction strength

$$ H = -\sum_{i<j} J_{ij}\, \delta(\sigma_i, \sigma_j), \qquad J_{ij} = A_{ij} - \frac{d_i d_j}{2m}, $$

where $A_{ij}$ refers to the adjacency matrix of the graph, $d_i = \sum_j A_{ij}$ is the degree of node $i$, and $m = \frac{1}{2} \sum_i d_i$.
Finally, for two colors ($q = 2$) the Potts model reduces to the MaxCut problem, with Hamiltonian $H_{\mathrm{MaxCut}} = \sum_{i<j} J_{ij} z_i z_j$ with $J_{ij} = A_{ij}/2$ and binary spin variables $z_i \in \{-1, 1\}$ [25], as can be seen by the transformation $\delta(\sigma_i, \sigma_j) \to (1 + z_i z_j)/2$.
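The two-color reduction can be verified numerically on a small graph. The sketch below assumes the sign convention implied by the text (an energy contribution of $-J$ per pair of equal adjacent spins, with $J = -1$), 0-based vertex and color labels, and the mapping $z = 2\sigma - 1$; all function names are illustrative. Under these assumptions the Potts energy and the MaxCut energy differ only by the constant $|E|/2$.

```python
from itertools import product


def potts_energy(edges, sigma, J=-1.0):
    """H = -J * (number of edges whose endpoints are in the same state)."""
    return -J * sum(1 for i, j in edges if sigma[i] == sigma[j])


def maxcut_energy(edges, z):
    """H_MaxCut = sum over edges of J_ij * z_i * z_j with J_ij = 1/2."""
    return sum(0.5 * z[i] * z[j] for i, j in edges)


# Hypothetical test graph: a square with one diagonal.
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
offset = len(edges) / 2
identity_holds = all(
    potts_energy(edges, sigma)
    == offset + maxcut_energy(edges, [2 * s - 1 for s in sigma])
    for sigma in product((0, 1), repeat=4)
)
print(identity_holds)
```

Because the offset is independent of the spin configuration, minimizing either energy yields the same optimal assignments.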
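The close connection between the GCP and the Potts model becomes apparent in the (dimensionless) partition function [38] of the Potts model, which allows one to compute most thermodynamic variables of a system through derivatives. For the Potts model it is given by

$$ Z = \sum_{\{\sigma\}} e^{-\beta H_{\mathrm{Potts}}} = \sum_{\{\sigma\}} \prod_{(i,j) \in E} \exp[K \delta(\sigma_i, \sigma_j)], \qquad K = \beta J, $$

where $\beta = 1/k_B T$ is the inverse temperature (with $T > 0$) and $k_B$ is the Boltzmann constant. Using the relation $\exp[K\delta(\sigma_i, \sigma_j)] = 1 + [\exp(K) - 1]\delta(\sigma_i, \sigma_j)$, we then find in generality

$$ Z = \sum_{\{\sigma\}} \prod_{(i,j) \in E} \left[ 1 + v\, \delta(\sigma_i, \sigma_j) \right], \qquad v = \exp(K) - 1. $$

In the zero-temperature limit (where $K \to -\infty$ and hence $v \to -1$ for anti-ferromagnetic coupling $J < 0$) we obtain [39]

$$ Z \xrightarrow{\; T \to 0 \;} \sum_{\{\sigma\}} \prod_{(i,j) \in E} \left[ 1 - \delta(\sigma_i, \sigma_j) \right] = P_G(q). $$

In the last step we have introduced the chromatic function (polynomial) $P_G(q)$, a central quantity in the theory of graph coloring, thus directly relating the Potts model to graph coloring. In the limit $T \to 0$ adjacent spins are forced to occupy different states, and the partition function $Z$ simply reduces to the chromatic function $P_G(q)$, which counts the number of possible $q$-colorings of $G$ as a function of the number of colors $q$.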
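For very small graphs, the relation between the low-temperature partition function and the chromatic function can be checked by brute-force enumeration. The sketch below is a plain illustration (exponential in $n$), assuming $J = -1$, 0-based labels, and hypothetical function names; the triangle is used because its chromatic polynomial $q(q-1)(q-2)$ is well known.

```python
import math
from itertools import product


def chromatic_count(n, edges, q):
    """Number of proper q-colorings: sum over states of prod (1 - delta)."""
    total = 0
    for sigma in product(range(q), repeat=n):
        if all(sigma[i] != sigma[j] for i, j in edges):
            total += 1
    return total


def partition_function(n, edges, q, beta, J=-1.0):
    """Z = sum over states of exp(beta * J * number_of_equal_adjacent_pairs)."""
    z = 0.0
    for sigma in product(range(q), repeat=n):
        equal_pairs = sum(1 for i, j in edges if sigma[i] == sigma[j])
        z += math.exp(beta * J * equal_pairs)
    return z


triangle = [(0, 1), (1, 2), (0, 2)]
counts = [chromatic_count(3, triangle, q) for q in range(1, 5)]
z_cold = partition_function(3, triangle, q=3, beta=50.0)
print(counts, z_cold)
```

The smallest $q$ with a non-zero count is the chromatic number (three for the triangle), and at large $\beta$ the partition function approaches the same count because every state with a clash is exponentially suppressed.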
The chromatic number $\chi = \min\{q \in \mathbb{N} : P_G(q) > 0\}$ is then the smallest positive integer that is not a zero of the chromatic polynomial.

Graph Neural Networks. Graph neural networks are an emergent family of neural networks that extend the standard deep learning toolbox to graph data [40]. While convolutional neural networks are well-defined only over rigid, grid-structured data (such as images), and recurrent neural networks are built for sequences of data (such as text), the GNN formalism provides a general framework for defining neural networks on graph-structured data [40]. With permutation invariance (under the arbitrary labeling of nodes) built in by design, GNNs offer a scheme to generate node representations that incorporate the topology of the graph. The common theme of any type of GNN is that it implements some form of neural message passing, whereby messages (in the form of vectors) are exchanged between the nodes of the graph to iteratively update the internal representations of the graph's nodes [24]. More formally, for a given input graph $G = (V, E)$ along with any relevant node features $X \in \mathbb{R}^{d_0 \times n}$, a GNN can be used to generate node embeddings $p_\nu$, $\forall \nu \in V$ [40]. This is done iteratively as follows: Consider hidden embedding vectors $\{h_\nu^k\}$ representing each node $\nu \in V$. In each iteration $k$, every embedding vector $h_\nu^k$ is updated based on information inferred from the corresponding local neighborhood, denoted as $N_\nu = \{u \in V \mid (u, \nu) \in E\}$. At layer (iteration) $k = 0$, the initial representations $h_\nu^0 \in \mathbb{R}^{d_0}$ are usually derived from the node's labels or given input features of dimensionality $d_0$ [41]. This single-layer update can then be formalized as

$$ m_\nu^{k} = \mathrm{AGGREGATE}\left(\{h_u^{k-1}, \forall u \in N_\nu\}\right), \qquad h_\nu^{k} = \mathrm{UPDATE}\left(h_\nu^{k-1}, m_\nu^{k}\right), \tag{7} $$

for the GNN layers (iterations) $k = 1, \dots, K$, with AGGREGATE($\cdot$) and UPDATE($\cdot$) referring to some (typically parametrized) differentiable functions [40]. See Refs. [15,16,22,23] for several popular design choices such as graph convolutional networks (GCNs). In Eq. (7), the vector $m_\nu^k$ represents the $k$-th layer message for node $\nu = 1, \dots, n$ as aggregated from the corresponding local graph neighborhood $N_\nu$. At each iteration $k$, every node aggregates information from its local neighborhood, and as these iterations progress each node embedding encapsulates a larger receptive field within the graph. Specifically, after $k$ iterations every node embedding contains information about its $k$-hop neighborhood, with the final output (after $K$ iterations of message passing) defined as $p_\nu = h_\nu^K$. This output can then be used for prediction tasks, such as node classification. To optimize the predictive power of this approach, the (parametrized) final node embeddings $p_\nu = h_\nu^K(\theta)$ are fed into a problem-specific loss function, with some form of stochastic gradient descent optimizing the weight parameters $\theta$ of the network.
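The growth of the receptive field with the number of message-passing iterations can be illustrated with a deliberately simplified, parameter-free sketch. This is not the GCN or GraphSAGE layer used in this work: it assumes a mean over neighbors as the AGGREGATE step and a plain average of the old embedding and the message as the UPDATE step, with the graph given as a hypothetical adjacency dictionary.

```python
def message_passing(adj, h, num_layers):
    """Run num_layers rounds of mean-aggregation message passing.

    adj: dict mapping each node to a list of its neighbors.
    h:   dict mapping each node to its feature vector (list of floats).
    """
    for _ in range(num_layers):
        new_h = {}
        for node, neighbors in adj.items():
            dim = len(h[node])
            # AGGREGATE: element-wise mean over the neighbors' embeddings.
            message = [sum(h[u][d] for u in neighbors) / max(len(neighbors), 1)
                       for d in range(dim)]
            # UPDATE: parameter-free stand-in for a learned update function.
            new_h[node] = [0.5 * (h[node][d] + message[d]) for d in range(dim)]
        h = new_h
    return h


# Path graph 0 - 1 - 2 - 3 with a single non-zero feature placed on node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
h0 = {0: [1.0], 1: [0.0], 2: [0.0], 3: [0.0]}
for k in (1, 2, 3):
    print(k, message_passing(path, h0, k))
```

Node 3 sits three hops away from node 0, so its embedding stays at zero for one and two layers and only becomes non-zero after the third iteration.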

IV. THEORETICAL FRAMEWORK
In this section we discuss in detail the theoretical framework underlying our work. We show how to solve the GCP using GNNs, with the anti-ferromagnetic Potts model providing a canonical choice for the loss function controlling the unsupervised GNN training process.
We consider an undirected graph $G = (V, E)$ with vertex set $V = \{1, 2, \dots, n\}$ and edge set $E = \{(i, j) : i, j \in V\}$. Given such a graph, our goal is to assign colors to the nodes of the graph in such a way that adjacent nodes are assigned different colors and the number of colors used is minimal. To this end we associate a discrete variable (spin) $\sigma_\nu = 1, \dots, q$ with every vertex $\nu \in V$, thereby assigning one of $q$ possible states (colors) to every node in the graph. To enforce a valid coloring we consider the standard Potts spin model [26] with anti-ferromagnetic interactions as given in Eq. (1); this model gives no energy contribution to neighboring spins with different colors, but penalizes color clashes with a positive energy offset. The ground-state energy is then zero if and only if the graph is $q$-colorable, thus providing a good cost function for encoding the GCP. To make the GCP compatible with our GNN-based approach we first reformulate the Potts model (1) in terms of one-hot-encoded variables $\hat{y}_i$ as

$$ H_{\mathrm{Potts}} = -J \sum_{(i,j) \in E} \hat{y}_i^{T} \cdot \hat{y}_j. $$

Here, the variable $\hat{y}_i$ describes the class assignment for node $i \in V$ within a $q$-dimensional unit vector where all components are zero except for one (set to 1) and signals the color assignment as

$$ \hat{y}_i^{[\alpha]} = 1 \iff \sigma_i = \alpha, $$

with $\hat{y}_i^{[\alpha]}$ denoting the $\alpha$-th component of $\hat{y}_i$, and by definition

$$ \sum_{\alpha=1}^{q} \hat{y}_i^{[\alpha]} = 1. $$

Next, generalizing our approach as detailed in Ref. [25] to multi-class node classification problems, we apply a relaxation strategy to the problem Hamiltonian $H_{\mathrm{Potts}}$ to generate a differentiable loss function $\mathcal{L}(\theta)$ with which we perform unsupervised training on the multi-color node representations of the GNN. To this end we replace the (hard) one-hot-encoded decision vectors $\hat{y}_i$ with corresponding (soft) normalized assignments $p_i(\theta) \in [0, 1]^q$, letting $\hat{y}_i \to p_i(\theta)$. In our approach, these soft assignments $p_i(\theta)$ are generated by our GNN Ansatz as final node embeddings $p_i = h_i^K \in [0, 1]^q$ at layer $K$, after the application of a standard softmax activation function, and used as an input for the Potts-like loss function $\mathcal{L}(\theta)$ given by

$$ \mathcal{L}(\theta) = -J \sum_{(i,j) \in E} p_i^{T}(\theta) \cdot p_j(\theta). $$

To arrive at the predicted soft assignments $p_i$ for all nodes $i = 1, \dots, n$, the GNN follows a standard recursive neighborhood aggregation scheme [24,42], where each node $\nu = 1, 2, \dots, n$ collects information (encoded as feature vectors) of its neighbors to compute its new feature vector $h_\nu^k$ at layer $k = 0, 1, \dots, K$. Similar to Ref. [25], the node embeddings $h_\nu^0$ are initialized randomly. After $k$ iterations of aggregation, a node is represented by its transformed feature vector $h_\nu^k$, which captures the structural information within the node's $k$-hop neighborhood [14].
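The relaxed, Potts-like loss can be sketched without any deep learning library. The snippet below is a minimal illustration, assuming $J = -1$, 0-based node labels, an edge-list graph, and soft assignments obtained from a softmax over hypothetical per-node logits; in an actual training loop these quantities would be tensors produced by the GNN and differentiated automatically.

```python
import math


def softmax(logits):
    """Numerically stable softmax returning a normalized probability vector."""
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def potts_loss(edges, probs):
    """Sum over edges of the overlap p_i . p_j between soft color assignments."""
    return sum(sum(a * b for a, b in zip(probs[i], probs[j]))
               for i, j in edges)


triangle = [(0, 1), (1, 2), (0, 2)]
uniform = [softmax([0.0, 0.0, 0.0]) for _ in range(3)]  # undecided nodes
one_hot = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]  # nodes 1 and 2 clash
print(potts_loss(triangle, uniform), potts_loss(triangle, one_hot))
```

For one-hot inputs the loss equals the number of color clashes, so a loss of zero corresponds to a feasible coloring; for soft inputs each edge contributes the probability that its endpoints would receive the same color if sampled independently.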
For the multi-class node classification task at hand we use convolutional aggregation steps, followed by the application of a nonlinear softmax activation function with the dimensionality set by the number of colors $q$, thereby providing $q$-dimensional soft (probabilistic) node assignments $p_\nu = h_\nu^K \in [0, 1]^q$, with the softmax function automatically ensuring normalization as

$$ \sum_{\alpha=1}^{q} p_\nu^{[\alpha]} = 1. $$

By virtue of this built-in normalization and in stark contrast to any QUBO-based approach [7,8,43], we do not have to add additional terms to the loss function to enforce a one-hot constraint that drives the solution towards one unique color assignment per node. Conversely, once the unsupervised training process has completed, we apply a simple projection heuristic to map the soft assignments $p_\nu$ to hard class variables $\sigma_\nu = 1, \dots, q$ using, for example, $\sigma_\nu = \mathrm{argmax}(p_\nu)$ to find the class (color) with the largest predicted probability, thus providing unique color assignments for every node. As shown in Fig. 1, the final color assignment $\{\sigma_\nu\}$ can then be visualized as a $q$-coloring of the graph. For further illustration, an example 4-coloring solution (as implemented with this approach) for a random 3-regular graph with $n = 100$ vertices is shown in Fig. 2. Far beyond this sample scale, the scalability inherent to GNNs opens up the possibility of studying unprecedented problem sizes with hundreds of millions of nodes when leveraging distributed training in a mini-batch fashion on a cluster of machines, as demonstrated recently in Ref. [44].
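The projection step from soft to hard assignments is short enough to spell out. The sketch below assumes 0-based color labels (so colors run over 0, ..., q-1 rather than 1, ..., q) and breaks ties in favor of the lowest color index, which is one possible convention; the soft assignments shown are made-up values, not model output.

```python
def project_to_colors(probs):
    """Map each soft assignment to the index of its largest component."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


def hard_cost(edges, colors):
    """Number of color clashes of the projected (hard) coloring."""
    return sum(1 for u, v in edges if colors[u] == colors[v])


# Hypothetical soft assignments for a triangle with q = 3 colors.
triangle = [(0, 1), (1, 2), (0, 2)]
soft = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4]]
colors = project_to_colors(soft)
print(colors, hard_cost(triangle, colors))
```

Because the projection is applied only after training, the reported cost should always be evaluated on the hard coloring rather than on the relaxed loss.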
Our approach features several hyperparameters, including the number of layers $K$, the dimensionality of the embedding vectors $h_i^k$, and the learning rate $\beta$, which can be optimized via hyperparameter optimization techniques. In particular, the number of colors $q$ can be seen as a GCP-specific hyperparameter that together with the graph $G$ defines the input pair $(G, q)$ for the decision problem whether or not $G$ allows for a $q$-coloring. To identify the chromatic number $\chi$, one can perform, for example, a naive search by sequentially checking if $G$ is $q$-colorable for $q = 1, 2, \dots$, or use a binary search to reduce the number of calls required to a logarithmic number. Alternatively, one can try to solve the graph coloring problem (i.e., the search for a feasible coloring) in parallel with the minimization of colors used by adding a corresponding regularization term to the loss function (such as $\sim q^2$). However, such a term should not overpower the regular Potts-like term $\mathcal{L}_{\mathrm{Potts}}(\theta)$, so as not to drive the overall solution towards an infeasible coloring.
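The binary search over the number of colors can be sketched as follows. The snippet assumes an exact decision oracle, for which $q$-colorability implies $(q+1)$-colorability; a brute-force enumeration stands in for the solver here and is only viable for tiny graphs. With a heuristic solver such as a trained GNN, a failed attempt at some $q$ does not prove infeasibility, so the result would only be an upper bound. All names are illustrative.

```python
from itertools import product


def has_coloring(n, edges, q):
    """Exact (exponential-time) oracle: does a proper q-coloring exist?"""
    return any(all(c[u] != c[v] for u, v in edges)
               for c in product(range(q), repeat=n))


def chromatic_number_binary(n, edges, oracle=has_coloring):
    """Smallest q in [1, n] accepted by the oracle, found by bisection."""
    lo, hi = 1, max(n, 1)
    while lo < hi:
        mid = (lo + hi) // 2
        if oracle(n, edges, mid):
            hi = mid        # a coloring with mid colors exists
        else:
            lo = mid + 1    # need strictly more than mid colors
    return lo


cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
k4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(chromatic_number_binary(5, cycle5), chromatic_number_binary(4, k4))
```

The upper end of the search interval is $n$, since assigning every vertex its own color is always feasible.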

V. INDUSTRY APPLICATIONS
The graph coloring problem is known to describe many real-world applications, in particular in scheduling and allocation problems [1]. Prominent examples include timetabling problems or frequency assignment problems, relevant to the planning of wireless communication services [1,45]. To illustrate both our GNN-based approach and the real-world applicability of the graph coloring problem, we now discuss an end-to-end application for a canonical scheduling use case. We do so within a comprehensive three-step encode-process-decode approach in which we (i) phrase the use case as a graph coloring problem (encoding), (ii) solve this problem using our GNN-based approach (processing), and (iii) decode the coloring solution to an actual solution for the use case at hand (decoding). For the sake of this illustrative example we consider a small problem instance as illustrated in Fig. 3. More thorough numerical benchmarks are presented in Sec. VI.
We consider a scenario involving the scheduling of tasks with given start and end times, with applications in car-sharing, taxi companies, aircraft assignments, etc. Specifically, we face $n$ resource requests (or bookings) with a start time indicating when the resource will be needed and an end time indicating when the resource becomes available again; see Fig. 3(a) for an example problem with $n = 6$ resource requests. The problem is then to assign resources (e.g., cars) to these requests (e.g., bookings) in the most efficient way, that is, with the smallest number of resources. As illustrated in Fig. 3(a), typically some requests will overlap in time, leading to request clashes that cannot be satisfied by the same resource. As commonly done in resource allocation problems and scheduling theory, this situation can conveniently be described with the help of an undirected interval graph in which a vertex is introduced for every request, with edges connecting vertices whose requests overlap. Figure 3(b) displays an encoding of the problem with a graph made of six vertices and six edges, including a clique of size three. While inexpensive, special-purpose algorithms exist for interval graphs [1], we can then solve the graph coloring problem on this interval graph (in the same way as any other GCP) using our general-purpose GNN-based approach. To this end, we run unsupervised multi-class classification directly on the interval graph with $n = 6$ nodes and a final softmax non-linearity of dimension $q = 3$, as opposed to QUBO-based approaches involving $q \times n = 18$ binary variables [4]. For the sample problem illustrated in Fig. 3(c) we obtain a feasible coloring using just three colors ($\chi = 3$). Finally, as shown in Fig. 3(d), we decode this coloring to the corresponding assignment in which three resources are used to satisfy all six requests.

FIG. 4: Example solutions to the graph coloring problem for the myciel5 (left) and queen7-7 (right) graphs from the COLOR dataset with $q = 6$ and $q = 7$ colors, respectively. The solution for myciel5 corresponds to a feasible coloring with normalized error $\epsilon = 0\%$, whereas the solution for queen7-7 represents an infeasible, but low-energy solution with $\epsilon = 0.84\%$ for which the remaining four color clashes have been highlighted with bold (conflicting) edges. Further details are provided in Tab. I and the main text.
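The encode-process-decode pipeline can be sketched end to end on a toy instance. The bookings below are hypothetical (they are not the data of Fig. 3, although they also yield six vertices, six edges, and a clique of size three), intervals are assumed half-open, and the processing step uses a classical greedy coloring in order of start time as a stand-in for the GNN solver.

```python
def build_interval_graph(bookings):
    """Encode: connect two bookings whenever their [start, end) intervals overlap."""
    edges = []
    for a in range(len(bookings)):
        for b in range(a + 1, len(bookings)):
            (s1, e1), (s2, e2) = bookings[a], bookings[b]
            if s1 < e2 and s2 < e1:
                edges.append((a, b))
    return edges


def color_by_start_time(bookings, edges):
    """Process: greedy coloring in order of start time (stand-in solver)."""
    neighbors = {i: set() for i in range(len(bookings))}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    coloring = {}
    for node in sorted(neighbors, key=lambda i: bookings[i][0]):
        taken = {coloring[u] for u in neighbors[node] if u in coloring}
        coloring[node] = next(c for c in range(len(bookings)) if c not in taken)
    return coloring


def decode(coloring):
    """Decode: map each color (resource) to the bookings it serves."""
    schedule = {}
    for node, color in sorted(coloring.items()):
        schedule.setdefault(color, []).append(node)
    return schedule


bookings = [(0, 3), (2, 6), (2, 5), (5, 8), (7, 10), (9, 12)]  # hypothetical (start, end)
edges = build_interval_graph(bookings)
schedule = decode(color_by_start_time(bookings, edges))
print(edges)
print(schedule)
```

Each key of the resulting schedule is one resource (e.g., one car), and the bookings listed under it do not overlap in time, so three resources suffice for this toy instance.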

VI. NUMERICAL EXPERIMENTS
We now turn to systematic numerical experiments using standard benchmark problems for graph coloring. In particular, we provide results for the publicly available COLOR dataset [46], as well as well-known citation datasets (Cora [47], Citeseer [48], and Pubmed [49]) often used for graph-based benchmark experiments. The former provide small and medium-sized dense problem instances with relatively large known chromatic numbers ($\chi \sim 10$), while the latter are large, but sparse real-world graphs (which, for the purpose of graph coloring, we consider as undirected graphs, dismissing any potential node or edge features). Our basic GNN architecture is very similar to the one detailed in Ref. [25], except for the dimension of the final GNN layer, set here to the number of colors $q$. We provide results for two standard types of GNN architectures, namely graph convolutional networks (GCN) [15] and GraphSAGE [13]. Model configurations (hyperparameters) are detailed in the Supplemental Material. We compare our results (as given by the cost directly reported by the Potts Hamiltonian) to previously published results sourced from Ref. [33], including results based on the tabu-search based heuristic called Tabucol [50], a local search algorithm which tracks single moves within a tabu list. We complement these with our own benchmark results obtained with Tabucol and a greedy coloring algorithm. The latter parses through the graph's vertices one by one according to some vertex ordering and greedily assigns the first available color. If none of the colors used so far is available, a new color is added, thus (by design) always providing a feasible coloring with a corresponding upper bound on the chromatic number, denoted as $\chi_{\mathrm{greedy}}$. Here, we have implemented a greedy algorithm with the largest-first ordering strategy as further detailed in Refs. [1,51]. Our greedy results for $\chi_{\mathrm{greedy}}$ largely agree with results presented in Ref. [52].

TABLE I: Numerical results for COLOR graphs [46] (surviving column headers: graph, nodes, edges, density, colors $q$, $\chi_{\mathrm{greedy}}$, $\chi_{\mathrm{GNN}}$, Tabucol, Tabucol [33], GNN [33]). For a given number of colors $q$, we report the cost $H_{\mathrm{Potts}}$, that is, the number of conflicts in the best coloring result, as achieved with our physics-inspired GNN solvers (PI-GCN and PI-SAGE), together with results for the Tabucol algorithm, as partially sourced from Ref. [33].
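A greedy coloring with largest-first ordering, as described above, can be sketched in a few lines. This is an illustrative implementation under simple assumptions (0-based labels, edge-list input, ties in the degree ordering broken by vertex index), not the code used for the reported baselines.

```python
def greedy_largest_first(n, edges):
    """Color vertices in order of decreasing degree with the first free color."""
    adj = [set() for _ in range(n)]
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    order = sorted(range(n), key=lambda v: -len(adj[v]))  # stable sort
    color = [None] * n
    for v in order:
        used = {color[u] for u in adj[v] if color[u] is not None}
        c = 0
        while c in used:   # open a new color only if all current ones are taken
            c += 1
        color[v] = c
    return color


cycle5 = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
coloring = greedy_largest_first(5, cycle5)
print(coloring, "colors used:", max(coloring) + 1)
```

The number of colors used is an upper bound on the chromatic number; the result is always feasible, but depends on the vertex ordering and need not be optimal.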
For a given graph and a fixed number of colors $q$, we report the total number of color clashes as achieved with our physics-inspired GNN solvers (dubbed PI-GCN and PI-SAGE, respectively), and we assess the solution quality with the normalized error $\epsilon = H_{\mathrm{Potts}}/|E|$, quantifying the number of color clashes normalized by the number of edges $|E|$. Accordingly, the quantity $\Xi = 1 - \epsilon$ can be regarded as the coloring accuracy achieved (that is, the number of edges without coloring conflicts divided by the total number of edges). In addition we report upper bounds on the chromatic number, denoted as $\chi_{\mathrm{GNN}}$. To this end we have implemented a simple search as well as a randomized post-processing heuristic. For a given GNN solution, the latter tries to remove remaining color clashes at the expense of one additional color, by randomly going through existing clashes, and randomly assigning the new color to one of the two nodes at hand. This process is repeated until a feasible coloring (with zero cost) has been found. Our method has been implemented in Python, leveraging the open-source libraries Deep Graph Library [53] and PyTorch (for GNN handling), and NetworkX (for graph handling). All reported experiments have been run on p3.2xlarge AWS instances, with 1 GPU, 8 virtual CPUs, 61 GiB memory, 16 GiB GPU memory, and 2.3 GHz (base) and 2.7 GHz (turbo) Intel Xeon E5-2686 v4 processors.
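The normalized error and the randomized post-processing heuristic can be sketched as follows. This is one possible reading of the procedure described above, assuming 0-based colors, an edge-list graph, and one fresh color per pass; the function names and the seeding scheme are illustrative.

```python
import random


def normalized_error(edges, coloring):
    """Fraction of edges whose endpoints share a color (clashes / |E|)."""
    clashes = sum(1 for u, v in edges if coloring[u] == coloring[v])
    return clashes / len(edges)


def purify_once(edges, coloring, new_color, rng):
    """Visit clashes in random order; recolor one random endpoint of each."""
    coloring = list(coloring)
    clashes = [(u, v) for u, v in edges if coloring[u] == coloring[v]]
    rng.shuffle(clashes)
    for u, v in clashes:
        if coloring[u] == coloring[v]:  # may already have been resolved
            coloring[rng.choice((u, v))] = new_color
    return coloring


def make_feasible(edges, coloring, q, seed=0):
    """Repeat passes, each with one fresh color, until no clash remains."""
    rng = random.Random(seed)
    colors_used = q
    while any(coloring[u] == coloring[v] for u, v in edges):
        coloring = purify_once(edges, coloring, colors_used, rng)
        colors_used += 1
    return coloring, colors_used


triangle = [(0, 1), (1, 2), (0, 2)]
start = [0, 0, 0]  # worst case: every edge clashes
fixed, colors_used = make_feasible(triangle, start, q=1)
print(normalized_error(triangle, start), fixed, colors_used)
```

Each pass leaves at least one previously clashing node untouched, so the set of clashing nodes shrinks and the loop terminates; the final number of colors is an upper bound on the chromatic number.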
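COLOR graphs. We study several benchmark instances from the COLOR data set [46], which can be categorized as follows [46]:
(i) Book graphs: Given a work of literature, a graph is created with each node representing a character. Two nodes are connected by an edge if the corresponding characters encounter each other in the book. This type of graph is publicly available for Tolstoy's Anna Karenina (anna), and Hugo's Les Misérables (jean), among others.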
(ii) Myciel graphs: This family of graphs is based on the Mycielski transformation. The Myciel graphs are known to be difficult to solve because they are triangle-free (clique number 2) but the coloring number increases with problem size [46].
(iii) Queens graphs: This family of graphs is constructed as follows. Given an $n$ by $n$ chessboard, a queens graph is a graph made of $n^2$ nodes, each corresponding to a square of the board. Two nodes are then connected by an edge if the corresponding squares are in the same row, column, or diagonal. In other words, two nodes are adjacent if and only if queens placed on these two nodes can attack each other in a single move. In all cases, the maximum clique in the graph is no more than $n$, and the coloring value is lower-bounded by $n$.
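The construction of the queens graphs described in item (iii) can be sketched directly from its definition. The snippet assumes squares are numbered row by row starting from zero; the function name is illustrative. For the 7 by 7 board it yields 476 edges, so four remaining clashes correspond to a normalized error of about 0.84%.

```python
def queens_graph(n):
    """Return (number_of_nodes, edge_list) for the n-by-n queens graph."""
    squares = [(r, c) for r in range(n) for c in range(n)]
    edges = []
    for i, (r1, c1) in enumerate(squares):
        for j in range(i + 1, len(squares)):
            r2, c2 = squares[j]
            same_line = r1 == r2 or c1 == c2
            same_diagonal = abs(r1 - r2) == abs(c1 - c2)
            if same_line or same_diagonal:  # queens attack each other
                edges.append((i, j))
    return len(squares), edges


for size in (5, 7):
    num_nodes, edge_list = queens_graph(size)
    print(size, num_nodes, len(edge_list))
```

Every row of the board forms a clique of size $n$, which is the origin of the lower bound of $n$ on the number of colors.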
Our numerical results are summarized in Tab. I. We consistently find sub-one-percent normalized errors (i.e., < 1%) across all COLOR instances, some of which have been deemed hard [54,55], with the GraphSAGE-based architecture typically outperforming the GCN-based baseline architecture. This observation appears to be in agreement with existing literature [41] showing that GCN architectures tend to be more susceptible to over-squashing (bottleneck) effects than other GNN architectures. In this work, with its inherent neighborhood sampling strategy, GraphSAGE is seen to be more robust to potential over-squashing effects, as relevant for the larger and dense COLOR instances. This increased performance of PI-SAGE comes at the price of extended training times, with per-epoch training times being ∼ 5-50x longer than for PI-GCN on the same graph. Whereas the PI-GCN model takes anywhere from ∼ 0.167 to 2 hr to train, the PI-SAGE model takes anywhere from ∼ 1 to 8 hr to train for the COLOR graphs considered. With potentially multiple factors contributing to this disparity, a more detailed analysis of this observation is left for future research. Finally, we find that PI-SAGE performs on par with Tabucol across the COLOR instances. In addition, the estimated chromatic numbers found with PI-SAGE are on par with or better than the greedy baseline results. For example, for queen7-7 we find χ_GNN = 7, while χ_greedy = 9. For solutions with non-zero cost we find that a simple post-processing heuristic can provide a fully purified (feasible) solution at the expense of a small number of additional colors, thereby providing a simple estimate for the chromatic number. For example, solutions with just one remaining color clash, as is the case for queen8-8 and queen9-9, are trivial to purify at the expense of one color (yielding estimates of χ_GNN = q + 1). For a larger number of color clashes, as is the case for queen11-11 and queen13-13, several iterations of this simple post-processing routine may be necessary until a feasible coloring is found. While further improvements may be possible through additional GNN runs at colors q + 1, q + 2, . . ., we observe on-par or better performance compared to the greedy baseline already with this simple post-processing only. The core of this post-processing routine is illustrated in Fig. 5.
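To make the purification idea concrete, the following is a minimal sketch of one plausible variant of such a routine; it is not necessarily the routine shown in Fig. 5. The names `count_clashes` and `purify` are hypothetical, nodes are assumed to be labelled 0, ..., n-1, and colors are assumed to be integers 0, ..., q-1. For every clashing edge the sketch recolors one endpoint, reusing an existing color when one is free among its neighbors and opening a new color otherwise.

```python
def count_clashes(edges, colors):
    """Number of edges whose two endpoints share a color."""
    return sum(1 for u, v in edges if colors[u] == colors[v])


def purify(edges, colors, q):
    """Remove all color clashes, opening new colors only when needed.

    Returns the repaired coloring and the number of colors in use
    (an upper bound on the chromatic number).
    """
    colors = list(colors)
    neighbors = [set() for _ in colors]
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    num_colors = q
    for u, v in edges:
        if colors[u] != colors[v]:
            continue
        # Recolor v with a color unused by all of its neighbors, so the
        # move fixes this clash and cannot create a new one.
        taken = {colors[w] for w in neighbors[v]}
        free = [c for c in range(num_colors) if c not in taken]
        if free:
            colors[v] = free[0]
        else:
            colors[v] = num_colors
            num_colors += 1
    return colors, num_colors


# Complete graph on four nodes, colored with q = 3 colors: one clash remains.
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
start = [0, 1, 2, 0]
fixed, used = purify(k4_edges, start, q=3)
print(count_clashes(k4_edges, start), count_clashes(k4_edges, fixed), used)
```

In this toy case the single clash is removed at the expense of one additional color, mirroring the q + 1 estimate discussed above.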

Citation graphs.
Next we provide results for publicly available, real-world citation graphs, with up to n ∼ 2 × 10^4 nodes. While Cora and Citeseer refer to networks of computer science publications (with nodes representing publications and edges referring to citations), the Pubmed citation network is a set of articles related to diabetes from the PubMed database [47][48][49]. Following Ref. [33], the number of available colors q for the Cora, Citeseer, and Pubmed graphs has been set to 5, 6, and 8, respectively. The results of our analysis are displayed in Tab. II, with the greedy coloring algorithm providing optimal baseline results for these sparse instances [with graph densities in the range ∼ (0.02 − 0.15)%]. We find that the basic PI-GCN solver displays consistent, small errors ∼ 10^-4, close to the global optimum. For the Cora and Citeseer graphs PI-GCN finds solutions with just one single color clash within ∼ 5 × 10^3 edges, while PI-SAGE finds optimal solutions at zero cost. Note that local optimality has been verified for these solutions through a series of simple local spin flips. Similarly, even for the largest instance (Pubmed with n ∼ 2 × 10^4 nodes) we obtain a small error of ∼ 3 × 10^-4, while Tabucol fails to color the graph within a 24-hour time limit [33]. Conversely, we find that PI-GCN converges for the Cora and Citeseer instances in ∼ 5 and 40 min, respectively, while Pubmed takes ∼ 6.7 hr for training completion. The comparatively long training time for Pubmed is arguably due to separate logic being used to calculate the loss function: Because of memory constraints on the training instances, here we implemented sparse tensor calculations to avoid memory overload, at the expense of training time. A more thorough analysis together with the investigation of warm-starting (transfer learning) strategies is left for future research.
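The two quantities quoted above can be illustrated with a short sketch. It assumes that the normalized error is the fraction of edges with a color clash and that the graph density follows the standard definition 2m / (n(n − 1)) for an undirected graph with n nodes and m edges. The function names and the toy path graph are illustrative only and are not one of the citation graphs.

```python
def normalized_error(edges, colors):
    """Fraction of edges whose endpoints share a color."""
    clashes = sum(1 for u, v in edges if colors[u] == colors[v])
    return clashes / len(edges)


def density(num_nodes, num_edges):
    """Density of a simple undirected graph: 2m / (n (n - 1))."""
    return 2 * num_edges / (num_nodes * (num_nodes - 1))


# Toy instance: a path with 5000 edges, properly 2-colored except for
# the last node, which is given the color of its only neighbor.
path_edges = [(i, i + 1) for i in range(5000)]
path_colors = [i % 2 for i in range(5001)]
path_colors[5000] = 1  # node 4999 also has color 1, so one edge clashes

err = normalized_error(path_edges, path_colors)
dens = density(5001, len(path_edges))
print(f"normalized error = {err:.1e}, density = {100 * dens:.3f}%")
```

A single clash among 5 × 10^3 edges corresponds to a normalized error of 2 × 10^-4, the order of magnitude reported for the PI-GCN solutions.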
Overall, we find that the general-purpose PI-GNN solver shows the potential to provide competitive coloring results compared to established, state-of-the-art heuristics such as the Tabucol algorithm [50] or greedy coloring algorithms, in particular for large graphs (where Tabucol runtimes become excessively long) or dense graphs (where greedy algorithms may show performance drops). In addition, the PI-GNN solver offers the possibility to scale to problems with millions of nodes [25], as well as the ability to solve other multi-class problems such as community detection or data clustering within the very same framework.

VII. CONCLUSION AND OUTLOOK
In summary, we have shown how graph neural networks can be used to solve graph coloring problems using insights from statistical physics as our guiding principle. In our approach we frame graph coloring as a multi-class (multi-color) node classification problem, with the Potts model providing a canonical choice for the loss function with which we train the GNN. Natively extending our framework as presented in Ref. [25], we apply a relaxation strategy to the Potts model by dropping integrality constraints on the decision variables in order to generate a differentiable loss function with which we perform unsupervised training on the node representations of the GNN. The GNN is then trained to generate soft assignments that predict, for each vertex in the graph, the likelihood of belonging to one of q classes. Post training we use simple projection heuristics to find a coloring solution consistent with the original problem.
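The relaxation and projection steps summarized above can be sketched as follows. The sketch assumes that the relaxed Potts loss is the sum over edges of the inner product of the two endpoint probability vectors, so that it reduces to the number of color clashes for one-hot assignments. The function names are hypothetical, and no GNN or gradient-based training is included; the logits stand in for the output of a trained network.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one node's q logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_potts_loss(edges, probs):
    """Sum over edges (u, v) of the inner product p_u . p_v.

    For one-hot (hard) assignments this equals the number of clashes.
    """
    return sum(
        sum(a * b for a, b in zip(probs[u], probs[v])) for u, v in edges
    )


def project(probs):
    """Argmax projection from soft assignments to integer colors."""
    return [max(range(len(p)), key=p.__getitem__) for p in probs]


# Triangle graph with q = 3 colors and logits favoring distinct colors.
triangle = [(0, 1), (1, 2), (0, 2)]
logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
probs = [softmax(row) for row in logits]

soft_loss = relaxed_potts_loss(triangle, probs)
coloring = project(probs)
hard = [[1.0 if c == k else 0.0 for k in range(3)] for c in coloring]
hard_loss = relaxed_potts_loss(triangle, hard)
print(coloring, hard_loss)
```

Because the loss is evaluated edge by edge rather than through a dense adjacency matrix, its cost scales with the number of edges, which is one simple way to keep memory usage low on sparse graphs.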
Finally, we highlight possible extensions of research going beyond our present work.
First, using the unifying framework established by the Potts model, it would be interesting to apply our approach to other multi-class problems such as community detection, data clustering, and the minimum clique cover problem. Beyond that, one could apply our approach to other large-spin (non-binary) problems such as, for example, variations of the Blume-Capel model, which can be seen as a generalization of QUBO to large-spin (i.e., multi-color) variables. Furthermore, one could systematically study the potential existence of sharp coloring thresholds and the onset of hardness with associated critical graph connectivities, both for families of random graphs, as done in Ref. [37], and for more structured real-world problems. Finally, there are several ways to potentially boost the performance of our GNN-based optimizer. For example, one could explore alternative GNN implementations, potentially in combination with graph rewiring techniques, as recently proposed and analyzed in Ref. [56], thereby decoupling the training graph from the original problem graph and providing additional GNN design choices. In addition, post training one could replace the simple deterministic (argmax) projection scheme used here with more sophisticated strategies such as local search routines that further refine the mapping from soft (probabilistic) class assignments to hard (integer) variables.


FIG. 2: Example 4-coloring solution to the graph coloring problem for a random 3-regular graph with n = 100 nodes. At training completion the GNN provides color (class) assignments to each vertex. The optimization problem is to assign the colors such that adjacent nodes are assigned different colors, while using the smallest number of colors possible (corresponding to the antiferromagnetic ground state of the underlying Potts model).

FIG. 3: Example end-to-end application of graph coloring for a task scheduling problem. (a) The problem is specified in terms of a schedule detailing six resource requests (vertical axis) as a function of time, spread out over the course of 24 hours (horizontal axis). (b) Encoding: The problem is encoded in the form of an interval graph where every node represents one request labelled by the corresponding time interval, and edges refer to clashes within the resource requests whenever two requests overlap in time. (c) Processing: We solve the graph coloring problem on this interval graph using a graph neural network with a Potts-type loss function as detailed in the main text. Once the algorithm has converged, we obtain a graph colored with the smallest number of color clashes for the given number of colors. In this example we find a feasible coloring with χ = 3 colors, as expected based on the clique of size three. (d) Decoding: Finally the proposed colors are mapped back to the original resource requests. In this example we find that three resources are sufficient in order to satisfy all requests.
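The encode-process-decode pipeline of Fig. 3 can be sketched on a toy instance. The six requests and their time windows below are hypothetical and are not the data shown in the figure. The processing step uses a greedy coloring in order of start time as a stand-in for the GNN solver, which is a different technique from the one used in the main text; the function names `encode`, `process`, and `decode` are illustrative.

```python
def encode(requests):
    """Build the interval graph: one node per request, one edge per overlap."""
    names = sorted(requests, key=lambda k: requests[k][0])
    edges = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (s1, e1), (s2, e2) = requests[a], requests[b]
            if s1 < e2 and s2 < e1:  # half-open intervals [start, end) overlap
                edges.append((a, b))
    return names, edges


def process(names, edges):
    """Greedy coloring in order of start time (stand-in for the GNN)."""
    neighbors = {name: set() for name in names}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    colors = {}
    for name in names:
        taken = {colors[m] for m in neighbors[name] if m in colors}
        colors[name] = next(c for c in range(len(names)) if c not in taken)
    return colors


def decode(colors):
    """Map each color back to a resource and the requests it serves."""
    schedule = {}
    for name, color in colors.items():
        schedule.setdefault(color, []).append(name)
    return schedule


# Hypothetical requests as (start hour, end hour) within a 24-hour day.
requests = {"A": (0, 6), "B": (4, 10), "C": (5, 12),
            "D": (11, 16), "E": (13, 20), "F": (18, 24)}

names, edges = encode(requests)
colors = process(names, edges)
schedule = decode(colors)
print(schedule)
```

In this toy instance requests A, B, and C overlap pairwise and form a clique of size three, so three resources are needed, and the greedy stand-in finds a schedule that uses exactly three.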
Upper bounds on the chromatic number χ as found by a greedy algorithm as well as PI-SAGE are reported as χ_greedy and χ_GNN, respectively. Best results are marked in boldface. The last column gives the normalized error (for the best PI-GNN result) specifying the relative fraction of edges with color clashes. Example solutions are displayed in Fig. 4. Further details are provided in the main text.