Combining machine learning with physics: A framework for tracking and sorting multiple dark solitons

In ultracold-atom experiments, data often comes in the form of images which suffer information loss inherent in the techniques used to prepare and measure the system. This is particularly problematic when the processes of interest are complicated, such as interactions among excitations in Bose-Einstein condensates (BECs). In this paper, we describe a framework combining machine learning (ML) models with physics-based traditional analyses to identify and track multiple solitonic excitations in images of BECs. We use an ML-based object detector to locate the solitonic excitations and develop a physics-informed classifier to sort solitonic excitations into physically motivated subcategories. Lastly, we introduce a quality metric quantifying the likelihood that a specific feature is a longitudinal soliton. Our trained implementation of this framework, SolDet, is publicly available as an open-source python package. SolDet is broadly applicable to feature identification in cold-atom images when trained on a suitable user-provided dataset.


I. INTRODUCTION
Machine learning (ML) techniques promise improved data analysis and enhanced performance for today's quantum devices and technologies. Ultracold atomic gases are a nearly ideal system to deploy ML-driven analysis, where the automated exploration and interpretation of a very large dataset, in the form of images, can lead to scientific enhancements and experimental optimization [1] as well as new discoveries. Here we focus on the general problem of feature identification, a commonly recurring task in the analysis of such data, from locating vortices [2][3][4] or tracking solitons [5,6], identifying spin textures or magnetic domain walls [7][8][9] to locating topological singular points [10]. While data from these examples have been individually analyzed using task-specific algorithms (or even manual inspection), they are all feature identification problems that can be solved using a single ML-enhanced analysis framework. This paper introduces such a framework, and demonstrates its utility on the specific problem of identifying solitonic excitations in atomic Bose-Einstein condensates (BECs), as well as quantifying the quality of each identified feature.
Traditional statistical analysis using physics-based models, such as least-square fitting and hypotheses testing, have been go-to techniques for data analysis since the 1800's [11] and remain widely applied in quantum coldatom image analysis [12][13][14]. The outcome of physicsmodel-based algorithms and fits are intuitive, physically meaningful, and can help identify patterns present in the data; even fits based on more heuristic functions can have coefficients that are derived in obvious ways from the data. By contrast, ML methods work as "black boxes," * jpzwolak@nist.gov making their operation difficult to interpret. Conventional statistical methods use fixed algorithms in conjunction with preconceived models for data reduction. Overfitting occurs when the number of fit parameters is comparable or larger than the number of independent data points. In this context, the process of training an ML tool essentially codesigns the fitting algorithm and the data model, as encoded by a large number of internal parameters. Training ML models is itself a fitting process that can be susceptible to overfitting, for example when the training dataset has too little variability or the ML model has too many internal parameters. ML involves a class of data-driven techniques that do not rely on preexisting models, but also add additional opportunities for overfitting that can make them less reliable on new data than conventional techniques.
Here, we describe the hybrid two-module feature identification framework shown in Fig. 1, that combines the flexibility of ML techniques with the intuition and robustness of conventional fitting methods. Furthermore the separate outputs of these two very different modules allow us to assess data quality by cross-validation. Hybrid approaches have been employed in other settings, for example for landslide prediction [15], medical image processing [16], and cyber attack detection [17].
The framework begins with a labeled dataset that is used to train the ML module and initialize the physicsbased module. Before trusting either module, we independently validate each module on a subset of the labeled data that was not used for training. Model redesign may be needed until satisfactory performance of each module is reached. We then combine both modules into an integrated system able to analyze new data.
We demonstrate the performance of our framework using data from atomic BECs, quintessential quantum systems. Quantum research with BECs, and coldatom quantum gases more broadly, is multifaceted with examples ranging from realizing collective many-body physics [18] to creating today's most accurate atomic clocks [19]. In the vast majority of these experiments, data is acquired in the form of noisy images that typically have undergone evolution, such as a time of flight, before measurement. This often obfuscates the computation of the quantities of interest. Cold quantum gases therefore make an ideal testbed for our methodology that combines physically motivated, but heuristic, fitting functions with established computer vision techniques. We focus on the specific problem of locating dark solitons (spatially compact excitations that manifest as reductions in the atomic density) as they move in BECs [13,20,21]. This allows us to leverage our established soliton dataset [22,23] to train and validate our framework; representative elements of the dataset are shown in Fig. 2. These data consist of elliptical atom clouds (top row) where solitons appear as vertically aligned density depletions (bottom row). Not all vertically aligned density depletions are created equal: deep depletions mark the location of slowly moving kink solitons; shallow depletions are associated with rapidly moving kink solitons or "longitudinal" solitonic vortices (where the vortex core is aligned in the image plane); asymmetric depletions can result from "transverse" solitonic vortices [24] (where the vortex core is aligned perpendicularly to the image plane); and chains of stripes can result from highly excited phonon modes. Our framework is a tool that can automatically locate all the solitonic excitations in each image and distinguish between longitudinal solitons and transverse solitonic vortices. Here we introduce the term "longitudinal soliton" to include both kink solitons and longitudinal solitonic vortices.
Our ML module leverages and extends established computer vision techniques. Computer vision is a broad field with applications ranging from image classification to semantic segmentation and object detection [25]. Object detection refers to the capability of software systems to locate and identify objects in an image. Convolutional  neutral networks (CNNs) underlie solutions to all of these tasks, and unsurprisingly were employed in our previous work classifying soliton image data into three categories: no solitonic excitation, one solitonic excitation, and other excitations [22]. Our ML module goes beyond simple classification and uses a CNN-based object detector (OD) to provide the location of all candidate excitations in a given image. By contrast our physics-based module employs a leastsquares fit of an inverted and skewed Mexican-hat function to one-dimensional (1D) background-subtracted projections of soliton candidates (shown in bottom row in Fig. 2). We initialized this module using our previously labeled single soliton data and employ a Yeo-Johnson transformation [26] to produce a multivariate normal distribution yielding the likelihood that an unknown feature is a soliton.
This approach yielded three immediate benefits. First, a careful analysis of the coefficients from the physics based-module identified previously overlooked correlations that allow us to distinguish between some solitonic excitations (longitudinal solitons and transverse solitonic vortex [20,21,24,27]). Second, combining the results of the ML and fitting modules allowed us to automatically create a larger, more reliable dataset that includes finegrained information such as the soliton position and type of excitation. This dataset is described in Ref. [28] and published in the NIST data repository [23]. Third, our hybrid framework was prepared solely from a training dataset whose images contain either zero or one solitonic excitation; however, it is performant on complex data containing multiple excitations.
The remainder of this paper is structured as follows: Section II introduces both modules and describes their training and initialization. Section III describes the validation of both modules and their performance on new data that include multiple solitonic excitations. In Sec. III E, we describe an open-source python reference implementation of our framework: soldet [29]. Lastly, in Sec. IV we conclude and discuss the potential applications of the framework as well as the possible future directions.

II. DATA AND MODULES
In addition to the recent success of ML methods [22,30,31], solitonic excitations have also been located and characterized using traditional fitting techniques. For example, Ref. [13] began with the background-removed atom density profiles (blue curves in Fig. 2) described in Sec. II A, then identified the deepest depletion (orange dashed line), and fit to a Gaussian function (a physically motivated, but heuristic choice) centered near the deepest depletion. This yielded physical information including soliton width, depth, and position. Unfortunately, this simple approach is failure prone, as for example in Fig. 2(b)(ii), where the deepest depletion is far from the actual soliton. Moreover, it detects only single solitonic features, making human intervention necessary when many excitations are present. Rather than finding the deepest minimum, our framework first uses an OD (described in Sec. II B) to provide an initial estimate of all solitonic excitation positions, and then uses a skewed Mexican-hat fit function (Sec. II C) that accurately describes their density profiles. The resulting fit coefficients serve two purposes: qualitative likelihood assessment and fine-grained categorization.

A. Data
Our framework is trained and initialized using a revised dataset consisting of about 5.5 × 10 3 manually labeled experimental images of BECs with and without solitonic excitations [23,28]. The experimental setup and preprocessing techniques are described in [13]. Figure 2 shows six selected sample images from the labeled dataset. The dataset includes labels for five classes: "no solitonic excitation," images that do not contain any excitations; "single solitonic excitation," images containing one solitonic excitation; "other excitations," images not in the preceding classes (including those with multiple solitonic excitations, high degrees of noise, and those annotators could not agree on); "mislabeled", data determined to be potentially mislabeled during the curation process; and "unlabeled," images that have not been manually annotated. Additionally, for the single excitation class the dataset includes the horizontal position of excitations within BEC. Horizontal 1D profiles (bottom row of Fig. 2) also have features associated with vertically aligned solitonic excitations and are amenable to least-squares fitting. We obtain these profiles by first summing the pixel values vertically to compress two-dimensional (2D) images to 1D; this sum can be over all (green curves) or part (see Sec. II C 1) of the vertical extent of the image. We then fit a 1D Thomas-Fermi (TF) model to each summed 1D profile, where i is the horizontal pixel index, and n 0 , i 0 , R 0 , and δ n are fitting parameters representing peak density, center position, TF radius, and an overall offset, respectively. This fit (black curves) serves as an overall background that we subtract from the 1D profiles, leaving behind the 1D density fluctuations (blue curves). The orange dashed lines represent the location of deepest depletion in the 1D fluctuations.

B. ML modules
Our previous dark soliton classifier [22] consisted of a CNN model that returned one of the three predefined classes: no solitonic excitation, single solitonic excitation, or other excitations. However, this detector did not locate the excitations. To compare with experimental data, we located the soliton by identifying the deepest depletion and fitting to a Gaussian, as described above. This algorithm has two limitations: (1) The soliton may not be the deepest depletion [as in Fig. 2(b)(ii)]; and (2) multiple solitons cannot be located [as in Fig. 2 Here we retain the CNN classifier to globally organize the data, but inspired by a highly successful recent result using an OD to locate vortices in numerically simulated 2D BECs [30], we employ an OD to locate solitonic excitations in experimental images of highly elongated BECs.
The OD complements the CNN classifier in two ways: (1) it identifies soliton positions rather than classifying; and (2) even though it is trained with single-soliton data, it can locate multiple excitations in the same image. We employ a neural network based OD with six convolution layers and four max-pooling layers but no fully connected layers (see the Appendix A for more details). The OD has an order of magnitude fewer trainable parameters than our previous CNN (7 × 10 4 versus ∼ 10 6 parameters), accelerating the training process and making it lightweight to deploy. Because the OD simply requires a dataset with many representative instances of the object to be detected, it requires far less training data than the CNN classifier (which by design required substantial data from all considered classes).
In our data, the solitonic excitations are roughly four pixels in width. Since our images are 164 pixels wide, we designed our OD to aggregate the image into 41 spatial cells, each with two outputs in the range [0, 1]; the OD therefore returns a 41 × 2 array Y. For our dataset this aggregation guarantees that each output cell can describe the state of at most one soliton. Y ,1 is a probability estimate that cell contains a soliton, and Y ,2 is the fractional position of the soliton center within that cell, where 0 or 1 correspond to the left or right edge of the cell, respectively. The OD considers any cell with Y ,1 > 0.5 as containing an excitation, and then obtains its position from Y ,2 .
When comparing to the training dataset with labels denoted by Y, we use the cost function [30] for each training image, where the label Y ,1 identifying the presence of an excitation in a cell is fully confident, i.e., either 0 or 1. The coefficients w 1 , w 2 are hyperparameters controlling the relative importance of each term. The logarithmic terms increase the cost function when the OD misidentifies solitons, while the quadratic term contributes when a soliton is mislocated within a cell. Our training set uses images with at most one soliton, so cells with Y ,1 = 1 are much less frequent than those with Y ,1 = 0; as a result we expect that w 1 , w 2 1 to give similar overall weight to the three terms in Eq. (2). We train the OD by minimizing the cost function summed over all training images, updating the predicted OD values Y in each iteration. Because the cell size is comparable to the soliton size, a single soliton can span two cells. To prevent double counting, we merge detections occurring in adjacent cells and take the position to be their average.
We deem the OD's detection successful if our training data contains a labeled soliton close to the detected one (within three pixels in our implementation). The two failure modes are failing to detect a solitonic excitation and reporting an excitation that is not present.

C. Physics-based modules
In this section, we introduce our physics-based module that uses constrained least-squares fitting to estimate soliton parameters, and following a Yeo-Johnson transformation [26], produces a quality estimate giving the likelihood of a given feature being solitonic.
We fit the Ricker wavelet [32], i.e., a Mexican-hat function to the 1D density fluctuations described Sec. II A, where n TF (i c ) is evaluated with δ n = 0. The function takes six parameters: normalized logarithmic amplitude A, center position i c , width σ, logarithmic symmetrical shoulder height a, asymmetrical shoulder height b, and an offset δ. When a and b are zero this function is a simple Gaussian, making a nonzero adds symmetric shoulders to the distribution, and b introduces an asymmetry. Our solitonic features are well described by this function; since our excitations manifest as density depletions, the second term in Eq. (3) is negative. Our constrained least-squares fit requires initial guesses for all of these parameters. The guess for the center position i c also provides the initial guess for A by setting it equal to the 1D density fluctuations evaluated at i c . We found the initial values σ = 4, a = 0.2, b = 0, and δ = 0 to lead to convergent fits across the whole dataset. In order to produce reliable fits we apply the following constraints: i c must remain within three pixels from the initial guess, 10 −13 < A < 10 4 , and 10 −13 < a < 10 4 to prevent numerical fitting errors.

Physics-informed excitation classifier
Many candidate solitonic excitations are not vertically symmetric as might be expected [see, e.g., Fig. 2 The location of the largest "shoulder" in the top half of the excitation is reversed with respect to the bottom half; in addition, the location of the minimum is slightly displaced going from the top half to the bottom. Inspired by these differences, we bisect each image into top and bottom halves (labeled by + and −, respectively) and separately apply the Mexican-hat fit to fluctuations in these data, giving vectors Θ ± . Using this observation, we develop a physics-informed excitation (PIE) classifier based on the single-soliton dataset and discover that correlations between these vectors allow for a more finegrained excitation classification. Figure 3 shows the distribution of parameters from a single-soliton dataset that were useful for classifying excitations. No meaningful correlations were found for parameters σ ± and a ± , thus these did not assist in classification. The markers in the top panel show the amplitude ratio and show that they are not correlated. By contrast, the bottom panel shows that the asymmetric shoulder height difference δb = b + /σ + − b − /σ − is clearly anticorrelated with δi c . Both panels are colored based on the cut-off points discussed in Sec. III B (see also Fig. 5).
This distribution and its correlation guide the classification rules described in Sec. III B, yielding a PIE classifier based on cutoffs defined by human examination of the data.

Quality estimation
Here we describe a quality estimate that a candidate excitation in an image is solitonic. We derive the likelihood that a vector of fit outcomes Θ = [A, i c , σ, a, b] is drawn from a k = 5 dimensional prior distribution spanning the set of representative solitonic excitations [33]. Ideally this distribution would be an uncorrelated multivariate normal distribution, but it is not. As a result, we developed the following procedure to bring the distribution into this desired form.
We first fit a Yeo-Johnson power transformation [26] to each separate parameter distribution (having summed the five-dimensional distribution along the remaining parameters) to transform them into independent zero-mean 1D Gaussian distributions with unit variance. Note that this treatment cannot transform the parameter distributions into perfect Gaussians; nevertheless, each resulting distribution is balanced, contains a single peak, and has long tails. The covariance matrix Σ k is uncorrelated after this treatment and the distribution is qualitatively Gaussian in shape.
To calculate the quality estimate for a candidate excitation detected in an image, we 1. fit the subtracted background 1D profile to Mexican-hat function given in Eq.
(3) to obtain Θ; 2. use the established power transformation on Θ to obtain Θ ; and 3. return the quality estimate: M (Θ ) = 1 − χ 2 k D 2 (Θ ) , the likelihood between 0 and 1 that the excitation is solitonic. The chi-squared cumulative distribution function χ 2 k (p) relates the Mahalanobis distance [34] D 2 (Θ ) = Θ † Σ −1 k Θ to the likelihood that an outcome was drawn from the specified distribution [35]. D(Θ ) is unbounded above and decreases to zero as Θ approaches Θ , the average over the prior distribution.

A. ML modules
We train both the CNN classifier and the OD using the refined dataset with added soliton position labels (see Ref. [28]). The CNN classifier is trained using the full dataset while the OD training uses only the no solitonic excitation and single solitonic excitation classes. We assess the performance of both modules using five- fold cross-validation, that is using 80 % of the data to train a given module and the remaining 20 % to test it, and repeating the process five times to fully cover the dataset (see the Appendix A for training details).
The results are summarized in the two cumulative confusion matrices plotted in Fig. 4. The top panel compares the outcome of the OD to the initial labels, showing near perfect delineation between the no excitations and single excitations classes. However, the OD further subdivides the other excitations class, counting anywhere from zero to four candidate solitonic excitations within it. This results from the existence of excitations in this class that are not solitonic, as well as the possibility of having multiple solitons in the same image. The analogous comparison to CNN classification labels in the bottom panel is nearly indistinguishable from the one presented in the top panel, evidencing the quality of the CNN predictions.
Together, these ML tools effectively classify these data and locate excitations; however, they do not provide any fine-grained information on the nature or the quality of the identified excitations. This is addressed in the following sections.

B. PIE classifier
The PIE classifier operates by applying a sequence of "cuts" driven by different combinations of the top-bottom fit outcomes Θ ± . The exact parameter values described below are arrived at manually by exploring the data accepted and rejected by the cut to minimize the number of false-positive longitudinal soliton identifications.
The following cuts are applied sequentially, and the PIE classifier stops as soon as a classification is assigned.
A cut: The amplitude parameters A ± , and their ratio ρ A allow us to identify excitations that do not span the whole cloud. Data with ρ A > 1.57 are classified as "top partial excitation" and those with 1/ρ A > 1.57 are classified as "bottom partial excitation." This threshold identifies large fractional jumps in depth between the top and bottom that likely are off-axis solitonic vortices. Applying A cuts first is important because partial excitations interfere with the subsequent steps.
We therefore add images with δi c < −3.0 and δb > 0.61 to the counterclockwise solitonic vortex class and those with δi c > 1.14 and δb < −0.41 to the clockwise solitonic vortex class.
Other data: The remaining images have δi c = 0 but δb ≈ 0 are labeled as "canted excitations," likely kink solitons in the process of decay. The flow chart in Fig. 5 shows the application of this classifier to a single-soliton dataset. We found that of the initial 3 212 images, about 1/3 failed a cut and were rejected as longitudinal soliton candidates.
This classification was also used in the preparation of Ref. [28] in which we present a refined soliton dataset, which includes improved single longitudinal soliton labels. The cuts above are fairly aggressive to avoid false positives in the longitudinal soliton classification. This implies possible misclassification in the other categories in order to ensure a high quality longitudinal soliton subset and a reliability of the quality metric.

C. Quality estimator
The quality estimator is initialized on the subset of the single excitation class identified as longitudinal soliton using the PIE classifier. Figure 6(a) shows the powertransformed distribution of Mexican-hat fit coefficients Θ , with nontransformed coordinates marked on the top axis for reference. As would be expected, the data from the initialization dataset (orange) are nearly normally distributed; interestingly, the remaining elements of the single excitation class (partial solitons, canted excitations, and solitonic vortices, as labeled by the PIE filter) collectively follow very similar distributions (green). By contrast, the coefficients from every local minimum [37] in the initialization set except solitonic excitations (blue curve) obey a qualitatively different distribution.
Using this initialization, we compare quality estimates M obtained from the single excitation class in Fig. 6(b). The orange data show M for longitudinal solitons, and as intended the majority of this data is associated with larger values of M . The green data for the remaining solitonic excitations are nearly uniformly distributed, and the nonsoliton minima (blue) are highly concentrated at small M . We note that the small peak in longitudinal soliton distribution near-zero M contains a negligible fraction of the longitudinal soliton dataset (about 1.3 %). However, this peak is more pronounced for the remaining excitations, which is not surprising because the power transform was initialized using longitudinal soliton data. These distributions demonstrate the ability of the quality estimator to discriminate between solitonic excitations and other features in the data, reinforcing the importance of the PIE filter for fine-grained classification. We quantify the performance of the quality estimator in terms of the F1 scores plotted in Fig. 6(c), for longitudinal solitons (orange) and all other solitonic excitations (green). The F1 score for longitudinal solitons is maximized with a threshold of just M = 0.02 (stars); however, in practice we minimized false positives and assign a feature to be solitonic when M > 0.2 (circles). This choice gives only small change in the F1 score; however, it gives a marked increase in precision with only a small reduction in recall, as shown in the inset. The performance of the quality estimate on the other solitonic excitations, while far better than random, is subpar; this reemphasizes the importance of the PIE classifier in our framework.

D. Application to other excitation and mislabeled data class
Here we discuss the performance of our soldet framework applied to two classes of data from the dark soliton dataset: other excitations (1 036 images) and mislabeled data (879 images). These classes consist of images with multiple solitonic excitations, such as shown in Fig. 2(c), as well as confusing structures that made human annotation difficult. As such, they are an ideal test dataset since they defeated previous labeling attempts.
As a reminder, after the CNN classification step, the framework first uses the OD to locate all soliton candidates that are then sorted by the PIE classifier. Here, we focus only on features identified as longitudinal solitons. Figure 7(a) plots the frequency of transformed Mexican hat fit outcomes Θ , giving distributions that for both classes are qualitatively the same as those in Fig. 6(a) for the labeled single solitons. By contrast, histograms of the quality estimate for longitudinal solitons detected in these two classes [panel (b)] have important differences. For the other excitations class (N longitudinal = 877, N images = 669), the distribution is nearly uniform, with a potential increase for the higher quality estimates (M > 0.4). For the mislabeled data (N longitudinal = 415, N images = 398), on the other hand, the quality estimate distribution follows a trend consistent with that observed in Fig. 6(b).
To better understand this discrepancy it is important to consider more carefully the differences between the two classes. According to the OD module, nearly 78 % of images in the other excitation class contains two or more excitations. While for excitation spaced apart within the BEC, as in Fig. 7(c)(i), the individual fits to Mexican hat do not affected one another, the contrary holds for excitation captured in close proximity, as shown in Fig. 7(c)(ii). Qualitative differences between these images are quantified by the quality estimate. The quality estimates for the two well separated excitation in image (i) is 0.74 and 0.86. In image (ii), in contrast, even though both excitations are reminiscent of a longitudinal soliton, they are assigned a low quality, with M (ii) = [0.00, 0.01] from left to right. This is likely because the overlap in the adjacent shoulders significantly affects the relative fits. Given that the majority of data in this class contains multiple excitations, the unusually high frequency of the low quality is to be expected.
The mislabeled class, on the other hand, consists of images determined to be potentially mislabeled during the manual annotation (see Ref. [28] for details about the data curation process). These include over 320 images that the annotators found confusing (but in which ODs consistently found exactly one candidate excitation); over 190 images removed during curation from the single excitation class; and about 30 images originally assigned to the no excitation class (but in which the ODs also consistently found exactly one candidate excitation). Unsurprisingly, in almost 83 % of these images the OD module found only one excitation. Two representative images from this set are shown in Figs. 7(c)(iii) and 7(c)(iv), with M (iii) = [0.92, 0.02] and M (iv) = 0.82. The distribution of non-longitudinal soliton quality estimate, shown in the inset in Fig. 7(b), is consistent with that depicted in Fig. 6(b).
The performance on these qualitatively different test sets emphasizes the power of soldet. By combining the CNN and OD modules, soldet autonomously and reliably locates multiple excitations within the BECs, which goes beyond the traditional state-of-the-art deepestdepletion-based approach. The PIE classifier enables further systematic validation that the desired type of excitation (here, longitudinal solitons) has been observed, which previously required visual inspection of each acquired image. Finally, the quality metric provides a quantitative assessment of the excitation quality, further reinforcing the classification reliability. Put together, these tools provide a robust and reliable analysis framework, capable of processing data significantly more complex than possible given the current traditional state-ofthe-art approaches. In this section, we describe our software package soldet that integrates both the ML modules (CNN classifier and OD) with the fitting physics-based modules (PIE classifier and quality estimator), as we described in previous sections. The above discussion showed that the ML modules classify images effectively and can accurately locate one or many candidate solitons. The physics-based modules can sort these candidates into subclasses and provide a quality estimate for longitudinal soliton candidates. Therefore, the ML and physics-based modules contribute to the task of soliton detection in different ways, and the soldet infrastructure leverages their complementing strengths. We emphasize that soliton detection is one of a larger class of feature identification in quantum gases and that soldet was designed to be broadly applicable.
The soldet distribution includes a CNN classifier, OD, PIE classifier, and quality estimator trained and initialized using the soliton dataset [23]. In addition, we provide training scripts to enable the ready application to user-defined data with custom preprocessors, ML models, fitting functions, and even the overall process flow. Figure 8 illustrates a single use of soldet for the specific example of longitudinal soliton detection, where the individual blocks operate as follows: Data processing: Preprocess raw data into a 164 × 132 image format that just encloses the elliptical atom clouds [22]. The preprocessing particulars are not generic and instead are specific to both our task as well as the experimental parameters.
CNN classifier: Apply a trained CNN classifier to processed data and yield labels no excitation, single excitation, or other excitations.
Object detection: Apply trained OD to processed data and yield a list of positions of solitonic excitations.
CNN:0 OR OD:0: If either the CNN classifier or OD finds no soliton, soldet terminates.

PIE classifier:
The PIE classifier is applied to each solitonic excitation.
Quality estimator: The quality estimator is applied to each excitation identified as "longitudinal soliton" by the PIE classifier.
This algorithm is designed to be usable in a laboratory environment where one needs real-time identification, as well as for automated labeling of large datasets, as in Ref. [28].
Our high level framework combines ML methods with physics-based analysis, providing an integrated platform for studying experimental data. Our implementation of this framework, soldet, currently targets the identification, classification, and tracking of features in image data generated by cold-atom experiments. We demonstrated its initialization and performance using a publicly available dark soliton dataset [28]. This investigation focused only on properties of individual images; however, the dataset also includes a label giving time elapsed since the excitation's were created. This opens the door for studies correlating system control parameters and the soldet labels.
While our initialization used only the no excitation and single excitation classes, soldet's feature detection successfully generalizes the learned patterns. This is confirmed by its performance on the other excitations and mislabeled classes that were not part of training, where the CNN classifier gave ambiguous results and human classifiers often disagreed. Going beyond simple classification tasks, soldet allowed us to identify unexpected structure in the data, enabling a fine-grained division of the single excitation class into physically relevant subclasses, including solitonic vortices and partial solitons.
Moreover, for the multiple excitations class, the distribution of the quality metric in Fig. 7 reveals a possible correlation between the quality metric and the excitations relative proximity. These observations illustrate the power of our combined framework as a data analysis tool for discovery.
An interesting application of soldet would be an offline optimization of the experimental setup. Such opti-mization strategy, successfully implemented to, e.g., improve fabrication of quantum dot devices [1], requires an efficient analysis of large volumes of data to find the appropriate correlations in a high-dimensional parameter space. The ML toolbox described in our paper allows one to automatically locate multiple solitonic excitations in the same cloud and produces a fine classification that goes beyond longitudinal solitons. An analysis of the correlations between the various control parameter ranges used in our experiments and the resulting class of data (as determined by soldet) could enable a controlled generation of a desired number, type, and configurations of excitations, with soldet integrated online to provide real-time data analysis and control feedback. Another interesting extension of this work would be to train an OD on a dataset containing a single subclass found by the PIE classifier, e.g., longitudinal solitons, or solitonic vortices.
From the ML perspective, adding modules based on unsupervised [57], active learning [58], and synthetic data generation with generative models [59] may further enhance the performance of the soldet framework.
Going beyond solitonic excitations, the wakefield for sub-and supersonic impurities moving in atomic superfluids have characteristic patterns that could be identified by ML techniques [60][61][62][63]. This might be implemented using a template-based method such as used in the Laser Interferometer Gravitational-Wave Observatory (LIGO) where a large set of numerical simulations provide a library of patterns to correlate with the data [64]. This pattern matching is a form of object detection, and in our context a CNN based object detector could also be trained on such a template set. In this way, our methodology could be employed with a trained OD followed by a LIGO-like algorithm playing the role of our quality estimator and PIE classifier.
In the final analysis, soldet improves the data analysis pipeline for feature identification and classification problems in physically derived image data, but leaves the remainder of the scientific discovery process unchanged. For example, in our studies the PIE classifier module provided a fresh way to process data and enabled us to identify new patterns in the reduced data. The step beyond this is ML-driven discovery, where the identification of previously unknown patterns and physical reasoning are both implemented by ML. An emerging area of ML is the derivation of effective hydrodynamic equations of motion for biological, colloidal, and active fluids based on time-series data [65]. Owing to the complexity of full 3D simulations of nonzero temperature BECs, this datadriven approach could also be applied to create effective kinetic theory of solitons as well as the hydrodynamics of the underlying fluid.

ACKNOWLEDGEMENTS
This work was partially supported by NIST and NSF through the Physics Frontier Center at the JQI. The views and conclusions contained in this paper are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright noted herein.

Appendix A: Parameters of Machine Learning Models
Both machine learning modules are built and trained using the tensorflow (v.2.5.0) keras python api [66]. Figures 9(a) and 9(b) show the visualization of the network architecture for the OD and the CNN classifier, respectively. The model parameters of OD are presented in Table I. The model parameters for the CNN classifier are presented in the Appendix of Ref. [22].
As can be seen in Fig. 9, there are three main differences between the two architectures: (1) the OD outputs 41 local probabilities and positions while the CNN classifier only outputs one of three possible classes; (2) the CNN classifier contains three fully connected layers, which dramatically increase the number of trainable parameters, while OD does not; (3) the OD has asymmetric pool size and strides for vertical and horizontal directions, which are customized to the features in our dataset; the pool size and strides are symmetric for the CNN classifier. As a result, the OD has more than an order of magnitude fewer trainable parameters (7 × 10 4 ) than the CNN classifier (10 6 ).