Spatial Applications of Topological Data Analysis: Cities, Snowflakes, Random Structures, and Spiders Spinning Under the Influence

Spatial networks are ubiquitous in social, geographic, physical, and biological applications. To understand their large-scale structure, it is important to develop methods that allow one to directly probe the effects of space on structure and dynamics. Historically, algebraic topology has provided one framework for rigorously and quantitatively describing the global structure of a space, and recent advances in topological data analysis (TDA) have given scholars a new lens for analyzing network data. In this paper, we study a variety of spatial networks --- including both synthetic and natural ones --- using novel topological methods that we recently developed specifically for analyzing spatial networks. We demonstrate that our methods are able to capture meaningful quantities, with specifics that depend on context, in spatial networks and thereby provide useful insights into the structure of those networks, including a novel approach for characterizing them based on their topological structures. We illustrate these ideas with examples of synthetic networks and dynamics on them, street networks in cities, snowflakes, and webs spun by spiders under the influence of various psychotropic substances.


I. INTRODUCTION
Many complex systems have a natural embedding in a low-dimensional space or are otherwise influenced by space, and it is often insightful to study such spatial complex systems using the formalism of networks [1,2]. In spatial networks, the location of nodes and edges in space can heavily inform both the structure of the network and the behavior of dynamical processes on it. Obtaining a meaningful understanding of power grids [3][4][5], granular systems [6], rabbit warrens [7], and many other systems is impossible without considering the physical relationships between nodes in a network. For example, to examine traffic patterns on a transportation network in a meaningful way, it is important to include information about the physical distances between points and about the locations and directions of paths between heavily trafficked areas [8].
There are a variety of existing perspectives for studying spatial networks [1,9]. Many of them hail from quantitative geography [10,11]. In the 1970s, geographers were already studying the role of space in the formation of networks and in the activities of individuals and goods over geographic networks. As data have become richer and more readily available, it has become possible to take increasingly intricate computational approaches to the study of spatial networks, and a variety of complexsystems approaches have contributed greatly to the literature on spatial networks [1]. Researchers have also proposed various random models for spatial networks, and studying them yields baseline examples to compare to empirical networks [12][13][14][15]. There have also been investigations of the effects of certain spatial network properties on the behaviors of several well-known dynamical processes, including the Ising model [16], coupled oscillators [17], and random walks [18].
Although there is much existing work on the properties of spatial networks (e.g., degree distributions, short-est paths, and so on), there are relatively few network tools that leverage "global" structure in the traditional topological sense of the word. Current tools for studying global network structure tend to rely on aggregating local information in some way to paint a global picture of a network. By contrast, methods for understanding the global structure of a topological space rely intrinsically on information about the entire space. To illustrate the difference, consider a sphere. If we sample a neighborhood of any point on a sphere, we obtain a surface with the same properties as a plane. If we take a collection of a sphere's neighborhoods (which each resemble a plane) and stitch them together, we are able to obtain a lot of information about the sphere, but we are unable to describe the void in the center of the sphere. (For example, a stereographic projection of a sphere covers the sphere's entire surface, but it fails to capture the void.) To fully understand the structure of a sphere, we must consider the entire sphere at once. Over the last few decades, algebraic topology has been very useful for characterizing the global structure of mathematical spaces [19,20] through its use of mathematical tools that consider spaces as global objects. By reframing spatial networks into the language of topological spaces, we can leverage existing topological tools to better understand their structures. For a case study with voting data, see our recent paper [21].
In particular, homology groups, which were defined originally in algebraic topology and have been applied insightfully to a broad range of mathematical topics, provide one way of distinguishing between mathematical spaces based on their "holes" [19]. Moreover, the extension of homology to so-called "persistent homology" (PH) allows one to quantify holes in data in a meaningful way and has made it possible to apply homological ideas to a wide variety of empirical data sets [22,23]. PH helps for characterizing the "shape" of data, and the myriad applications of it include protein structure [24][25][26][27], DNA structure [28], computer vision [29], diurnal cycles in hurricanes [30], inferring symbolic dynamics in chaotic systems [31], spatial percolation problems [32], and many others.
Because it is so natural to apply PH to the study of the shape of data, many successful applications of it have been to spatial networks. One particular area of interest has been the study of granular materials, because PH is able to effectively capture geometric properties of granular substances [6,33,34]. Many biological applications to proteins and DNA also rely on the ability of PH to describe multiscale spatial relationships, as the structures of these molecules are extremely important to their functions. PH has also been applied to larger-scale biological systems, including leaf-venation patterns [35], aggregation models [36], transportation [37], human migration [38], and the effects of psychoactive substances on brain activity [39].
One confounding factor in the use of PH to study spatial networks is that although PH is able to capture information at multiple scales, traditional distance-based PH constructions can have difficulty with applications in which differences in scale may not be meaningful. For example, in most applications to human geographic data, the difference in population density between urban and rural areas can dominate analyses that employ traditional PH constructions, and they thereby miss signals that do not rely on this variation in density. (For approaches other than PH for analyzing maps while accounting for density variation, see [40,41].) In particular, in a recent paper [21], we examined the shape of voting patterns in the state of California and found that traditional methods for computing PH are more likely to capture disparities in population than to detect the presence of interesting voting patterns. To address this issue, we developed two novel PH constructions -one based on network adjacency and one based on the physical geometry of a map -that were more successful at capturing these voting patterns.
In the present paper, we apply our new PH constructions to a variety of spatial complex systems to demonstrate their usefulness across many domains. We show that these methods are well-adapted to capturing interesting structural properties of spatial networks and can thereby yield new insights into such networks, especially with respect to their global structure. Our examples include several synthetic graph models and dynamics on them, road structures in cities (which we compare both within a city and across different cities), snowflakes, and webs spun by spiders under the influence of various psychotropic substances.
Our paper proceeds as follows. In Section II, we give technical background on PH and on our particular constructions. In Section III, we discuss computational results from computing the PH of (1) several well-known models of synthetic networks and (2) a variety of empirical data sets from diverse applications. We conclude in Section IV.

A. Computing Persistent Homology
We now give a brief introduction to persistent homology and tools for computing it. See [22,42,43] for more details. We begin by defining k-simplices and simplicial complexes. A k-simplex is a convex hull of k + 1 nodes. A face of a k-simplex is any subset of the simplex that is itself a simplex. A simplicial complex K is a set of simplices that satisfy the following requirements: (1) if σ ∈ K is a k-simplex, then every face of σ is in K; and (2) if σ and τ are simplices in K, then σ ∩ τ is a face of both σ and τ .
Given a data set X, we construct a sequence X 1 ⊂ X 2 ⊂ · · · ⊂ X l of simplicial complexes of dimension d. We call the sequence {X i } a "filtered simplicial complex", and we call each X i a "subcomplex" of the filtered simplicial complex. We equip each relation X i ⊂ X i+1 with an inclusion map. We call the filtered simplicial complex, along with its inclusion maps and the chain and boundary maps of each subcomplex, a "persistence complex". The inclusion maps X i → X j induce a map f i,j : H m (X i ) → H m (X j ) between homology groups. The map f i,j allows us to track an element of H m (X i ) (the mth homology group of the subcomplex X i ) to an element of H m (X j ). The mth homologies of the entire persistence complex are given by the pair and we call them the "mth persistent homology" of X.
We refer to the collection of all mth persistent homologies as the "persistent homology" (PH) of X.
Consider a generator x ∈ H m (X i ) for some m and i. If x is not in the image of f i−1,i , we say that x is "born" at time i. Correspondingly, if x ∈ H m (X i ) and f i,i+1 (x) = 0 ∈ H m (X i+1 ), we say that x "dies" at time i + 1. If for every j ≤ l, we have that f i,j (x) = 0, then we say that x never dies, and we assign a death time of ∞ to the element x. For each element x of the PH of X, there is a birth time b x and a death time d x , and the collection of intervals {[b x , d x )} is the "barcode" of X. Generators with longer associated half-open intervals [b x , d x ) are more persistent. It is traditional to construe more-persistent intervals as better indicators of a signal and construe less-persistent intervals as noise, although recent work (including our own [21,44]) indicates that it is not always possible to interpret persistence in this way.
In this paper, we use the software package Gudhi [45,46] to compute PH from the filtered simplicial complex {X i }. We construct the filtered simplicial complex {X i } from X using two different constructions, which we developed recently in a paper on voting data [21].

B. Adjacency Construction of PH
We now describe a way to construct a filtered simplicial complex based on network adjacency. We consider a network in the form of a graph (V, E), with numerical data f (v) associated with each node v. For a given filtration step X i , let the 0-simplices of X i be given by v ∈ V such that f (v) ≤ for some value . For any edge (u, v) ∈ E, if u ∈ X i and v ∈ X i , we add (u, v) to X i . Finally, to X i , we add all triangles (u, v, w) such that (u, v), (v, w), and (u, w) are in X i . We repeat this process for X i+1 , but now we use a larger value of . By construction, each X i ⊂ X i+1 , and we have a valid filtered simplicial complex. See Fig. 1 for an illustration of such a filtered simplicial complex.
In some of our applications, we use an alternate adjacency construction in which we associate data g(u, v) to the edges, instead of to the nodes. This construction differs from the one above only in that we can define the function f (v) = min u:(u,v)∈E {g(u, v)}. We then proceed with the above adjacency construction, but we substitute f for f . We recently introduced our main adjacency construction in [21], and we introduce this adaptation of it to edge-based data in the present work.
FIG. 1: We illustrate an adjacency construction of PH on (a) a planar graph, whose nodes we color according to a function value from yellow to dark blue. At each filtration step (see panels (b)-(e)), we add all nodes with a given range of function values. We also add any edges between these new nodes, as well as any edges between these new nodes and existing nodes, and we fill in any triangles that form. Only cycles of length three form triangles, so the graph panel (a) yields 5 infinite-length features in H 1 (as one can see from the five holes that remain in panel (e).

C. Level-Set Construction of PH
The other PH construction that we use (again see [21] for details) involves describing data as a manifold, rather than as a graph. Let M denote a two-dimensional (2D) manifold, such as data in an image format. We consider the boundary Γ of M and construct a sequence M 0 ⊂ M 1 ⊂ · · · ⊂ M n of manifolds, where at each time step, we evolve the boundary Γ t of M t outward according to the level-set equation. (See [47] for a thorough exposition of the level-set equation and level-set dynamics.) That is, if the manifold M is embedded in R 2 , we define a function φ( x, t) : is the signed distance function from x to Γ t at time t. We propagate Γ t outward at velocity v using the equation Because this evolution gives a signed distance function at every time step t, we take M t to be the set of points x such that φ( x, t) > 0. (This corresponds to points inside the boundary Γ t .) We show an example of this evolution in Fig. 2.
FIG. 2: Illustration of level-set dynamics. Starting from (a) an initial back and white image, we apply level-set evolution (2) for several steps to obtain the image in (b) and then the one in (c). In these images, the white space in the center of the image shrinks until it eventually it is completely covered by the expanding black surface.
By imposing {M i } over a triangular grid of points as described in [21], we obtain a corresponding simplicial complex X i for each M i . In Fig. 3, we give a visualization of this simplicial complex. Because the level-set equation (2) evolves continually outward, this automatically satisfies that condition that X i ⊂ X i+1 , so {X i } is a filtered simplicial complex.

III. APPLICATIONS
We now discuss applications of PH to both synthetic networks and empirical spatial networks from a diverse variety of applications.

A. Synthetic Networks
In this section, we discuss applications of our adjacency PH construction to a dynamical process on synthetic networks in which space plays an important role. For each network (V, E), we run the Watts threshold model (WTM) [48] on it. Given a graph, we select a fraction ρ 0 = 0.05 of its nodes uniformly at random to be "infected" at time 0. At every time step, we then we compute the fraction of each node's neighbors that are infected. (That is, we synchronously update the states of FIG. 3: Illustration of a level-set adjacency construction of PH. In (a), we show a synthetic image that we use as an initial manifold for level-set evolution.
In (b)-(d), we show various filtration steps of the filtered simplicial complex that we generate by performing a level-set evolution on the image in panel (a). Panel (b) shows the simplicial complex that we obtain by overlaying the image in panel (a) on a triangular grid. In panels (c) and (d), we add new nodes, edges, and triangles to the image as it evolves outward. Darker colors indicate simplices that enter the filtration at a later time step.
the nodes [49].) If the fraction of a node's neighbors that are infected meets or exceeds a threshold (in our case, the threshold is υ = 0.18 for all nodes), the node becomes infected. We take this implementation of the WTM to be the generator of a function f : V → N, where f (v) is the time at which node v becomes infected. We say that infected nodes are in the set I. If v never becomes infected, we set f (V ) = max v∈I f (v)+1, so that we eventually add all nodes to a filtered simplicial complex. The resulting filtered simplicial complex consists of the subgraphs that are generated by I at each time step. See [50][51][52] for existing work on the properties of the Watts threshold model on spatial networks. We examine topological changes in the infected subgraph of three different types of synthetic networks. We first examine random geometric graphics (RGGs) [53]. For each instance of an RGG, we pick 100 nodes uniformly at random from the unit square. If the Euclidean distance between two nodes is less than or equal to 0.1, we add an edge between them. Our second type of synthetic network is a square lattice network with 100 nodes. We arrange the 100 nodes in a 10 × 10 grid on the unit square, and we then connect the nodes along the grid lines (as in Fig. 4). Our third type of synthetic network is a Watts-Strogatz (WS) small-world network [54,55].
We begin with a ring of 100 nodes. We then connect each node to its k = 5 nearest neighbors on each side. We then rewire each edge uniformly at random with a probability of p = 0.1 using the implementation of the WS model in NetworkX [56].
For each type of synthetic network, we consider 100 instances, which we generate using NetworkX. For the RGG and WS networks, each instance is a different graph; the square lattice network is deterministic in nature. For all three types of networks, each instance has a different initial set of infected nodes. We show visualizations of each of these types of networks (with WTM dynamics on it) in Fig. 4. Our adjacency construction begins by selecting the initially infected nodes and the the edges between them of a network to create an infected subgraph that we call an "infection network". As the infection spreads, we add more nodes and edges to the infection network until eventually we have added all nodes and edges to it.
Examining the PHs of the RGGs (see Fig. 5), we see that for our parameter values, the infection network tends to have several connected components, resulting in a large number of features in H 0 . However, because of the spreading behavior of the WTM, new nodes can become infected only via their infected neighbors. Because features in H 0 record connected components of a graph, new infected nodes join existing connected components. Therefore, features can only be born at time 0 or in the last step, which is when we add all remaining uninfected nodes to our filtered simplicial complex. By contrast, features in H 1 are relatively rare, as most cycles that occur in an RGG are filled because of the uniform probability distribution of the node locations.
For a square lattice network (see Fig. 6 for a PD of the WTM on such a network), we first note that there is only a single infinite-length feature in H 0 , as the final infection network necessarily consists of a single connected compo- nent. Consequently, H 0 consists of a set of features that are born at time 0 and eventually merge (and therefore die), resulting in a single infinite-length feature. Additionally, there are a constant number (81, to be precise) of features in H 1 , because when we construct a simplicial complex, every grid cell of the lattice is a feature in H 1 at the final filtration. However, these features can be born at a variety of times, as all nodes must enter the filtration for the filtration to incorporate every lattice cell. From Fig. 7, we see a WS small-world network also eventually has an infection network that consists of a single connected component. However, the WS networks consistently have more features in H 1 than the RGG networks, because the former's (non-geometric) shortcut edges usually result in splitting an existing cycle (and hence a feature in H 1 ) into two cycles. We summarize our observations about the various synthetic networks in Table I, in which we give the means and standard deviations of the number of features during the temporal evolution of the WTM in each type of synthetic graph. Our counts include features that appear at any time during the WTM dynamics.

B. Street Networks in Cities
The field of urban analytics has grown rapidly in the last several years [1,11,57]. To give one example that is of central interest to us, increasingly powerful computational tools have allowed researchers to characterize cities in terms of their street networks [58]. Indeed, a variety of tools from network analysis have been applied to the study of urban street networks [8,[59][60][61]. In the present subsection, we use city street networks as base manifolds for our level-set construction, and we thereby characterize cities based on their PHs. We use these PHs to compare city morphologies both within a single city and across a variety of cities.
We use our level-set construction to obtain topological descriptors in the form of persistence for city street networks. We then use these city persistences to compare (1) different regions of the same city and (2) different cities to each other. We obtain all of our city street networks with the software package OSMnx [62] using latitude-longitude coordinates and taking a 1 km block that is centered at specified coordinates. In each example below, we will indicate how we choose these coordinates.
The initial filtration of the filtered simplicial complex that results from our PH construction consists only of the streets in a network. As we increase the filtration time, we slowly add city blocks to the complex, and the topology changes as those blocks are filled in. More regular city blocks are more likely to be filled in without creating any new homological features, and larger blocks take longer to be filled in. Our construction is thereby able to capture information about the size and regularity of city blocks. The existence of dead ends tends to lead to the "pinching" of blocks into multiple homological features -as dead ends expand, they lengthen and eventually meet with nearby streets, cutting through blocks in the process -so our approach also yields information about dead ends.

Comparing Different Regions of the Same City
We sample 169 points from the city of Shanghai using a shapefile of Shanghai's administrative-district boundaries that we downloaded from ArcGIS [63]. From the shapefile, we obtain a bounding box for each point. We sample uniformly within this bounding box, discarding points that do not lie within the polygon district geometry that is defined in the shapefile. Sampling ends when we reach the desired number of points. In total, we sample ten points from each administrative district, and we also include nine historical landmarks with coordinates from Google Maps [64]. In Fig. 8  After computing PH (in the form of a PD) for each map, we compute the bottleneck distance between each pair of maps. Bottleneck distance is a metric that is defined on the space of PDs. It gives the shortest distance d for which there exists a perfect matching between the points of the two PDs and all diagonal points, such that any pair of matched points are at most a distance d from each other, where we use the supremum norm in R 2 to compute the distance between points. Once we have pairwise bottleneck distances between PDs, we perform average-linkage hierarchical clustering into three clusters. (We chose to have three clusters based on looking at the dendrogram.) In Fig. 9, we show the sampled points (which we color according to their cluster). We observe that the three clusters consist largely of historical areas ("City center"), concession-era areas ("Transition areas"), and modern areas ("New construction"). In Fig. 10, we show administrative districts along with the year that they were constructed. We break them down by the percentage of the sample points that are in each cluster.
FIG. 9: Sampled points in Shanghai. We color these points according to their cluster assignment from average-linking hierarchical clustering of areas of Shanghai into three clusters.

Comparing Street Networks from Different Cities
We continue our analysis of cities by characterizing and comparing the structures of street networks of 306 cities across the globe. We downloaded latitude and longitude coordinates from SimpleMaps [65] and selected all cities with a population at least 1.5 million people. Given these latitude and longitude coordinates, we use OSMnx [62] to obtain street networks. We then compute PH for each city and cluster their PDs using average-linkage hierarchical clustering with three clusters. We sometimes refer to a city in a given cluster as a city of a certain "type". ) Most of the older districts have a larger percentage of points that are assigned to the "City center" cluster, whereas the points in the "Transition areas" cluster tend to occur in districts that included development in the 19th and early 20th centuries. The "New construction" cluster is the most common assignment for administrative districts that were built in the 1950s or later.
Our results depend on the specific latitude and longitude coordinates in our downloaded data set. Accordingly, our results are influenced by the particular location of a city's coordinates, which are the standard ones in SimpleMaps.
In the following paragraphs, we describe our clusters of cities. We define "blocks" to be the cells of a planar street graph. Although our level-set construction for computing PH is not designed explicitly to characterize blocks, we take advantage of the fact that our level-set construction takes the set of streets as its initial manifold. As the streets expand outward according to the level-set equation (2), they fill in the blocks. Larger blocks take longer to fill in, and blocks fill in more evenly when they are closer to circular in shape. Roughly, we characterize block sizes based on the death times of H 1 features: small sizes correspond to early death times (specifically, less than 10), medium sizes correspond to death times between 10 and 15, and large sizes correspond to late death times (specifically, more than 15). We also designate blocks as "regular" (when they are close to a regular polygon) or "irregular" (for blocks that do not resemble a rectangle or some other convex polygon). If a block is very irregular, then as its streets expand, it is possible that narrow parts of the block will shrink and close off, such that the block segments into smaller blocks. We refer to this phenomenon as "pinching". Our three main clusters are dominated by (1) gridlike cities, (2) cities with gridlike patches that are interspersed with larger, non-gridlike blocks, and (3) cities that have a large num-ber of non-gridlike structures (specifically, dead ends or large holes) that interrupt other structures. We use the term "interrupted grid" for cities that are either mostly gridlike with some patches that are not gridlike or which consist of patches of disparate grids that are stitched together (with other features between them).
Our first major cluster has 99 cities and is dominated by cities with small, gridlike blocks. All regions of the world have some cities of this type, but North America has the largest percentage (relative to all of the cities that we sample from North America) of these gridlike cities and Europe has the smallest percentage of them. The block sizes in this cluster tend to be small or medium, resulting in filtrations whose maximum filtration value tends to be small in comparison to cities in the other two clusters. In the PDs, we also observe that the distributions of death times of features in H 1 tends to be close to uniform and over a small range. Such distributions occur because these gridlike cities tend to have even distributions of block sizes, even though they include some areas with slightly smaller and/or slightly larger grid sizes. They do not have large blocks, so they do not have features in H 1 with late death times.
Our second major cluster has cities with patches of grids that are interspersed with structures that are not gridlike. This cluster, with 149 cities, is the largest of our three clusters. The PDs in this cluster tend to have larger maximum death times than the PDs for the cities in our first cluster. In the PDs, gridlike blocks yield collections of features in H 1 with early death times, and the larger, non-gridlike structures yield features in H 1 with late death times. The non-gridlike areas in these cities do tend to have fairly regular shapes, resulting in a relatively small number of features in H 1 with late birth times. Such late-birth-time features usually correspond to the pinching of blocks, which can occur either via dead ends or via shape irregularities. By examining the dendrogram from our hierarchical clustering, we can further separate the second cluster into two subclusters, which we show in Fig. 12. The first of these subclusters consists mostly of cities that have large patches of gridlike structure, with a small number of large blocks that interrupt the grids. The PDs for cities in this subcluster tend to have a large number of features in H 1 with early death times, and they tend to have only a small number of isolated features in H 1 with late death times. The second subcluster of our second major cluster consists mostly of cities with small patches of grids that are mixed with large irregular blocks. The PDs for cities in this subcluster tend to have a larger number of features in H 1 with late death times than is the case for the cities in the other subcluster of cluster two.
Our third major cluster, with 58 cities, consists of cities with a large number of non-gridlike structures. In particular, many of these cities contain a large number of dead ends, rectangular blocks that are not arranged in a grid, or both. Some of our observations include streets that do not continue through particular blocks (e.g., there is FIG. 11: Cities in our first major cluster have gridlike street layouts. One example of such a city is Los Angeles, which we show in this figure. We show its street network in the top row and its associated PD in the bottom row. a street, it is obstructed, but then it continues after the obstruction), which leads to a mixture of block sizes even in areas of a city that tend to have regular blocks. We refer to these situations as "obstructions". The PDs of the cities in this cluster have a larger number of features in H 1 with medium death times (specifically, in the range 10-15), and many of these features are closer to the diagonal than t = 0. This is common when large blocks are pinched into several regions, as the smaller regions are born at the pinching time, rather than near the beginning of the filtration. Therefore, they do not survive long enough to have a late death time. By examining the dendrogram from our hierarchical clustering, we see two clear subclusters. However, one of these subclusters consists of only two cities (Beirut and Nanyang). The PDs of both of these cities are dominated by two features in H 1 with late death times, and they also have several fea- in a large grid and is an example of a city in the first subcluster of cluster two. Barcelona, which is in the second subcluster, is an example of a city with small patches of gridlike structure.
tures in H 1 with medium death times. In Fig. 13, we show examples of cities in cluster three. We color-code our cities according to their major cluster and show them on a world map in Fig. 14. In Fig. 15, we show the breakdown of cities from each continent into the various clusters. We calculate PH for only four major cities in Oceania, so we cannot draw strong conclusions from the cluster assignments of those cities. Among the other regions, we observe that North America has the largest proportion of cities with gridlike street layouts and the smallest proportion of cities with non-gridlike layouts. By contrast, Europe has the smallest proportion of cities with gridlike street layouts. This is consistent with the common perception that North American cities are much more gridlike than European cities. In all regions, we also observe that a large fraction of the cities are interrupted grids. Additionally, we observe that South America, Africa, and Asia have similar distributions of city types.
Interestingly, from the map in Fig. 14, South America, Asia, and Africa appear to have areas that are dominated by specific major clusters. We observe non-gridlike cities in the northern part of South America, whereas we see gridlike cities along its east coast. In Africa, most of the non-gridlike cities occur along the western coastline. In Asia, we see few non-gridlike cities throughout most of Southeast Asia. Across the map, there appears to be a This leads to a large range of block sizes and hence to features in H 1 that have medium death times. Such features are rare in the other two major clusters. For example, Nanyang has several streets with obstructions, and London has dead ends and a broad distribution of block sizes.
potential equatorial band of non-gridlike cities. We do not have an explanation for these patterns, but they are fascinating and seem worthy of future research efforts.

Comparison of Our Classification to that of Louf and
Barthelemy [67] We compare our results to the city classification of Louf and Barthelemy [67], who associated each city with a conditional probability distribution that captures the area and shape of its blocks. We choose their method as a point of comparison because they studied a wide range of cities and (like us) codified cities from a block-based perspective. They used the word "fingerprint" as a monicker for their block-based representation of cities. In our method, we codify cities according to their PHs, which we generate using the level-set construction of Section II C. Both the approach of [67] and our approach capture information that is based on city blocks, although our PH representation differs substantially from the fingerprints of [67].
Louf and Barthelemy clustered cities into four groups, whereas we have chosen to cluster our cities into three groups. In [67], European and North American cities largely inhabit the same cluster (group three in [67]), but they appear in distinct subclusters, demonstrating that there is a substantive difference between cities from the two regions. Our method finds that North America has largest proportion of cities with gridlike streets among all of the regions and that Europe has the smallest proportion of such cities.
In contrast to the above situation, Africa, Asia, and South America have a fairly balanced composition of city types, with a potential equatorial band of non-gridlike cities. Louf and Barthelemy observed several clusters (groups one, two, and four in [67]) that occur predominantly in Africa, Asia and Oceania (which they combined into one entity), and South America. Notably, none of our clusters are as dominant as group three (which they described as having heterogenous block sizes and shapes) in [67], although we do observe that our cluster of cities with interrupted grids (such cities are characterized in part by their heterogeneous block sizes) is also our largest cluster. Now that we have compared our results to those of [67], we briefly compare and contrast the types of information that the two methods can capture. Recall that our level-set construction for PH generates filtered simplicial complexes that first consist of streets and then expand outward to absorb the blocks between them. The PH of these filtered simplicial complexes thereby gives a lowdimensional representation of the original image of a city street network. Because irregularly shaped blocks are absorbed into the surrounding streets at a different pace than regular blocks, we capture information about the regularity of each block. Louf and Barthelemy's method also uses information about the regularity of block shape. See Eq. (3.2) in [67] for a precise mathematical statement of how they measured the regularity of blocks. It is related to so-called "compactness measures" [68], which are used in the study of gerrymandering [69,70], that compare the area of a shape to the area of a circle in which the shape is circumscribed.
Because the original image of a street network includes information about the spatial relationships between blocks, the PH that results from our approach also encodes some of this information. By contrast, Louf and Barthelemy's fingerprints do not encode information about the spatial relationships of blocks to each other. Additionally, our method captures information from dead ends, which Louf and Barthelemy discarded.
Overall, although both our approach and that of [67] use a block-based representation to characterize cities, there are subtle differences in the way that the two approaches encode block information. The commonality of a block-based perspective results in some similarities. For example, the clusters that result from the two approaches seem to be based heavily on block size and regularity. However, our approach appears to prioritize spatial relationships between different clusters of blocks (specifically, whether blocks are arranged in a grid); such information is not captured in the approach of [67]. Consequently, the two approaches capture different city morphologies, and we expect them to be useful as complementary techniques for studying structures in spatial complex systems. As a second application that uses empirical data, we consider snowflake crystals [71]. We start with twelve different images (from [71]) of snowflakes with different crystalline structures. (See Fig. 20 in Appendix A.) From these images, we compute level-set complexes and PHs; we then perform average-linkage hierarchical clustering on the PDs to produce the dendrogram in Fig. 17.

C. Snowflakes
The images of snowflakes consist of edges (the black lines in our images) and cells (the white spaces that are bounded by the edges). We refer to the outer areas that extend from the center of the snowflakes as their "points." The twelve snowflakes have fairly regular crystalline structures, so our computation of PH predominantly records information about the distribution of cell sizes in a snowflake. The inherent hexagonal nature of snowflakes and the regularity of their crystalline structures largely overwhelms our ability to use PH to glean information about their spatial relationships and irregularities.
Examining the clusters (see Fig. 17) reveals that snowflake A and snowflake B each reside in their own cluster, and the remaining snowflakes split into two more clusters. Snowflake A's PD (see Fig. 16a) is dominated by a feature in H 1 with an early birth and a late death (see the point at the top left corner of the diagram). This arises from the large feature that is formed by the bold ring near the center of the snowflake. None of the other snowflakes have a bold central ring. More generally, we observe few features in the PD for snowflake A. By contrast, snowflake B's features are largely close to the diagonal (see Fig. 16b) because the initial manifold of the snowflake does not have large holes. Notably, we do not observe any points in the top-left region of its PD. The cell sizes in snowflake B are smaller than those in most of the other snowflakes, and even its central ring structure includes a large number of small holes. The remaining snowflakes either have more large holes than snowflake B, or they do not have a bold central ring like that in snowflake A. Note that snowflake B has a PD that is much closer than that of snowflake A to those of the other snowflakes.

D. Spiderwebs
Our final application is to the topology of spiderwebs. In 1948, Peter Witt began research on the effects of drugs on spiders to test whether garden spiders would shift their web-building hours if they were administered drugs. Witt found that drugs affect the size and shape of the webs that are produced by spiders [72]. He also found that higher doses of most drugs (100 µg per spider, as opposed to 10 µg per spider) tend to lead to larger changes in the shapes of webs, including yielding more irregular webs. Witt eventually published more than 100 papers and several books on the behavior of spiders and spider webs. For more information on his experiments with psychotropic substances and spiders, see his 1971 review article [73]. In a 1995 technical briefing [74], NASA (which was inspired by Witt's research) proposed that spiders who were administered more toxic substances produce webs that are more deformed (in comparison to a web that is spun by a drug-free spider) than less toxic substances. Additionally, using techniques from statistical crystallography, they concluded that spiders fail to complete more sides of their webs when they are under the influence of more toxic substances.
In our case study of PH in spiderwebs, we use five images from the NASA technical briefing [74] and two images from Witt [73] of various webs that were spun by spiders under the influence of a variety of psychotropic substances, apply a level-set construction to compute PH, and perform average-linkage hierarchical clustering to yield the dendrogram in Fig. 18. We show the images of the spiderwebs and their associated PDs in Fig. 19.
Our classification places the drug-free spider into into its own cluster. The drug-free spiderweb is characterized by a clear central hole, threads radiating outward at approximately even intervals, and completed rings of threads that surround the center. We place marijuana, peyote, and LSD in the same cluster. In these webs, there is a clearly identifiable center, and most of the radial threads are evenly-spaced, straight, and radiate outward directly from the center. However, for the webs in this cluster, rings of threads are either difficult to see or are incomplete. The final cluster consists of chloral hydrate, caffeine, and speed. In the caffeinated spider's web, one cannot even clearly identify a center [75]. One can locate a center in the webs of the spiders that were under the influence of speed or chloral hydrate (a sedative that is used in sleeping pills), but many of the radial threads do not join the center and some of the radial threads are not straight. Almost no complete rings of thread are visible in any of the three webs in this cluster.

IV. CONCLUSIONS
In the study of spatial complex systems, it is important to exploit spatial information when studying them. In this paper, by using new methods of computing persistent homology that take spatial information into account, we presented several applications of topological data analysis to spatial networks. We showed that topological methods are capable of characterizing network structures and detecting structural differences from images of various spatial networks. We also demonstrated, using both synthetic examples and networks from empirical data, that such methods are able to provide insights into large-scale FIG. 18: Classification of webs that were spun by spiders under the influence of various psychotropic substances.
network structures that complement those from traditional techniques of network analysis. As an extended case study, we examined the morphology of street networks in cities, and we used spatial TDA to compare and contrast (1) different regions of the same city and (2) different cities. We hope that our examples help illustrate some ways in which topological methods, especially ones that directly incorporate spatial information in their formulation, can be useful for the analysis of spatial complex systems. [The images for panels (a)-(e) are from [74], and the images for panels (f) and (g) are from [73].] FIG. 20: The full set of twelve snowflake images that we examined in Section III C. We label these snowflakes using the panel labels from this figure: we show Snowflake A in panel (a), Snowflake B in panel (b), and so on. [These images are from [71].]