Validation of model predictions of pore-scale fluid distributions during two-phase flow

Pore-scale two-phase flow modeling is an important technology to study a rock’s relative permeability behavior. To investigate if these models are predictive, the calculated pore-scale fluid distributions which determine the relative permeability need to be validated. In this work, we introduce a methodology to quantitatively compare models to experimental fluid distributions in flow experiments visualized with microcomputed tomography. First, we analyzed five repeated drainage-imbibition experiments on a single sample. In these experiments, the exact fluid distributions were not fully repeatable on a pore-by-pore basis, while the global properties of the fluid distribution were. Then two fractional flow experiments were used to validate a quasistatic pore network model. The model correctly predicted the fluid present in more than 75% of pores and throats in drainage and imbibition. To quantify what this means for the relevant global properties of the fluid distribution, we compare the main flow paths and the connectivity across the different pore sizes in the modeled and experimental fluid distributions. These essential topology characteristics matched well for drainage simulations, but not for imbibition. This suggests that the pore-filling rules in the network model we used need to be improved to make reliable predictions of imbibition. The presented analysis illustrates the potential of our methodology to systematically and robustly test two-phase flow models to aid in model development and calibration.


I. INTRODUCTION
Multiphase flow through permeable rock is a crucial aspect of several important geo-engineering challenges such as petroleum recovery, geological CO 2 sequestration, and environmental remediation of groundwater resources. Large-scale models of these applications require the input of constitutive properties, such as capillary pressure (P c ) and relative permeability (k r ), that describe the rock's behavior during flooding operations [1,2]. While image-based pore-scale models have been used to investigate the P c and k r behavior of reservoir rocks since the seminal work by Bakke and Øren [3], the capacity of these models to capture the appropriate physics and predict experimental data remains contentious [4][5][6]. The models are usually validated by direct comparison to P c and k r measurements at the core scale [7,8], which brings forth different layers of uncertainty: (1) Uncertainty in the experimental input properties to the model: pore space images and spatial wettability distribution [5,9] (2) Uncertainty due to model simplifications, incorporated physics, and numerics [10] (3) Uncertainty due to sampling and scale difference between the model, typically a few mm 3 , and the experiment, which may be performed on a rock volume of several cm 3 [11] * t.bultreys@imperial.ac.uk (4) Uncertainty due to random and systematic variations in the core-scale experiments [12].
In many image-based modeling studies, there is still commonly a philosophy of attempting purely a priori P c and k r predictions, with a strong focus on reducing uncertainties of the second type [13]. While this is an import effort, ignoring the other uncertainties makes a thorough validation of these models extremely difficult. In image-based pore network modeling, it is generally accepted that a model has to be calibrated against experimental measurements of continuum properties to reduce the first two types of uncertainty [4,5,14,15]. The idea is that calibration to cheap experimental data (e.g., absolute permeability, mercury intrusion capillary pressure, and even oil-brine drainage or imbibition capillary pressure) allows models to extrapolate to either other samples or more complex situations (e.g., waterflooding relative permeability when the sample is in a mixed-wet state). However, current models do not always succeed at this, and it is not always clear when and why this is the case [16]. Direct simulation approaches face similar challenges, for example, when trying to constrain the wettability distribution in the model [17].
Summarizing these issues from an uncertainty point of view, it seems that both a priori and continuum-calibrated two-phase flow model predictions are currently at best unproven. The main reason for this is that the information in the P c and k r curves which are often used as the only validation is extremely densely encoded: many aspects of the porous medium and the fluid behavior are lumped into too few parameters to completely characterize the behavior. P c and k r curves are traditionally described as functions of only water saturation, rather than taking all true state variables into account [18,19].
A closely related issue with continuum-calibrated models is that they are very likely overfitted: there are too many parameters to tune for the information we are tuning to [4]. This makes it hard to assess whether model failure is due to inappropriately trained model parameters or due to a fundamentally inadequate description of the pore-scale flow physics.
To unravel this, there is a need for validation criteria based on experimental data which probes the underlying properties that influence relative permeability. Due to recent innovations in microcomputed tomography (micro-CT), it is now possible to image a rock's pore-scale fluid distribution during flow experiments; see, e.g., reviews in Refs. [20][21][22]. This provides a much richer data set with which to validate and calibrate models, and it addresses the third type of uncertainty as the experiments are at the same scale as the models. While qualitative comparisons between experimental fluid distributions and model predictions have been shown [23][24][25], direct quantitative pore-by-pore comparison of micro-CT-based fluid distributions to model predictions are rarer [6,26,27]. Systematic studies of whether an image-based model of a rock captures the appropriate physics or merely suffers from uncertainties on the experimental input properties are lacking, due to the complexity of extracting comprehensive information from such a comparison. Valuable progress has been made to provide pore-scale validation with micromodels [23,[28][29][30], yet these simple two-dimensional pore structures do not capture the three-dimensional structural complexity of natural rocks.
We propose that a rigorous validation of model performance can be done, if the issue is addressed by investigating three questions in specific order (of decreasing fundamentality): (1) Does the model predict the correct pore-filling states (i.e., whether the center of a pore or throat is filled with oil or brine)? This relies on the accuracy of the estimates of the pore and throat radii, on the assigned contact angles in each pore and throat, and on the physics implemented in the invasion algorithm, but also on the reproducibility of the experimental filling states.
(2) Does the model predict the correct saturation for a certain fluid distribution? In a pore network model, for example, this depends on the volume partitioning between pores and throats, as well as on the description of wetting layers.
(3) Does the model predict the correct flow rate for a certain fluid distribution? If the pore-filling states are correctly predicted, this depends chiefly on calculation of the pore, throat, and layer conductivity in the model.
In this work, we focus on the first question. First, we assess the uncertainty on the fluid distribution in flooding experiments by performing an analysis on an unsteady-state flow experiment where imbibition was performed several times on the same Bentheimer sample. This experiment, which was previously presented by Andrew et al. [31,32], is briefly outlined in Sec. II A, and results are shown in Sec. III A. Then, we present a combined modeling and experimental methodology to analyze the pore-by-pore fluid distributions in a two-phase flow model (in this case a quasistatic pore network model [7]) from micro-CT-based steady-state flow experiments (Sec. II B). Using this analysis, we assess the importance of errors in the filling state of pores and throats to the flow during drainage (Sec. III B) and imbibition (Sec. III C), while seeking to split off the influence of volume and conduc-tivity assignment. Section IV offers conclusions and an outlook on future work.

A. Repeatability of pore-filling states in experiments
To investigate the repeatability of fluid distributions measured in micro-CT core-flooding experiments, we perform an analysis of an unsteady-state brine-CO 2 flooding experiment presented by Andrew et al. [31,32]. This particular micro-CT data set was selected because drainage and imbibition were run five times on each sample, allowing the repeatability of the pore-scale fluid distribution to be tested. In the following, a brief description of the experiment is provided (Sec. II A 1). The detailed work flow can be found in Andrew et al. [32]. The analysis method is outlined in Sec. II A 2.

Experimental procedure
A cylindrical Bentheimer sample with diameter 6.5 mm and length 20 mm was placed into a carbon-fiber core holder, specially designed to withstand high pressures and be nearly transparent to x rays. The porosity and permeability were 0.20 and 1.88 × 10 −12 m 2 . The sample was first fully saturated with brine that was equilibrated with CO 2 and doped with potassium iodide (KI) (7 wt%) at reservoir temperature and pressure. KI was used for its high x-ray attenuation coefficient, which provides contrast with the supercritical CO 2 that was used as the nonwetting phase. Then, drainage was performed by injecting 10 pore volumes of supercritical CO 2 into the core at a capillary number around 10 −6 . This was followed by an imbibition step with 10 pore volumes of equilibrated KI brine at the same low flow rate. Finally, the sample was imaged in the postimbibition state using a Zeiss Versa XRM-500 x-ray microscope (Zeiss X-ray Microscopy, Pleasanton, CA, USA). The resulting images had a voxel size of 6.16 μm and an image size of 6.5 × 6.5 × 22 mm. The sample was then flooded with unequilibriated brine to dissolve all the CO 2 , and the flooding cycles was repeated a total of five times.

Analysis of the experimental data
The acquired postimbibition images were spatially registered to each other so that the same voxel in each of the images corresponds to the same physical location in the sample. To this end, the mean square gray value differences between the different images were minimized. Due to experimental constraints at the time these experiments were performed, the resolution and the quality of the images are lower than in more recent experiments (e.g., the ones presented in Sec. II B). To analyze the images, they have to be segmented into brine-filled, CO 2 -filled, and solid voxels. Brine-filled and CO 2 -filled voxels together make up the pore space, i.e., the complement of the solid voxels (which remain the same in all five experiments). To compensate for the low contrast between brine-filled and solid voxels, the five images were averaged and noise filtered by a bilateral filter [33], after which the solid voxels were segmented using the Otsu algorithm [34]. Then the widest regions in the void space (usually referred to as "pore bodies" or simply "pores") were identified from the maximal ball method described by Dong and Blunt [35], using the generalized network extraction algorithm of Raeini et al. [36].
After identifying the pore space and solid voxels, the five imbibition images were treated separately. For each image, the pore space voxels were segmented into brine and CO 2 using Otsu's algorithm, which was much easier than the previous segmentation step due to the large contrast between the CO 2 and the other phases in the image. Then the inscribed spheres in the pores were overlayed on the segmented images, and we determined for each sphere which phase occupied the majority of its voxels. In other words, we determined if each pore's central region was filled with brine or CO 2 in every one of the five experiments. This will be referred to as the filling state in the remainder of this work. The analysis here was found to have a low sensitivity to the segmentation, because (a) we are not attempting an exact determination of pore sizes but are rather interested in approximate sizes, especially relative to each other, and (b) the majority criterion for pore filling in each inscribed sphere effectively acts as a median filter on the segmented image, thereby removing noise.

B. Validation of drainage and imbibition simulations with micro-CT experiments
To thoroughly evaluate the accuracy of imbibition simulations, experiments which can provide high-quality images of the fluid distributions during the imbibition process are needed. To this end, we performed steady-state co-injection imbibition experiments on two Bentheimer sandstone miniplugs (6 mm diameter, 5 cm long) in a micro-CT flow cell. These steadystate experiments will be referred to as experiments SSE A and SSE B . The sample used in experiment SSE A will be referred to as sample A, while the sample used in experiment SSE B will be called sample B. Note that these two samples are not twin samples but are cored from different blocks: sample B is tighter and has a narrower pore and throat size distribution than sample A (these can be found in Sec. III B). The permeabilities measured by monitoring the pressure drop over these samples during (single-phase) brine flow were 1.91 × 10 −12 m 2 (sample A) and 1.47 × 10 −12 m 2 (sample B).

Steady-state imbibition experiments
A similar approach and experimental apparatus as described in Gao et al. [37] was followed. We briefly reiterate the experimental work flow and highlight differences with the experiments described there: (1) The sample was placed into the flow cell and micro-CT scanned a first time while it was dry.
(2) Next, the sample was saturated with 3.5 wt% KI-brine. A micro-CT image was then made to check that the sample was fully saturated with brine.
(3) Oil (n-decane or decahydronaphthalene) was pumped through the sample at a flow rate of 1 ml/min in experiment SSE A and 5 ml/min in experiment SSE B to establish an initial oil saturation of approximately 65% and 95%, respectively (measured on the micro-CT scan). This provides us with the opportunity to test the influence of the initial wetting saturation used in the model. We used n-decane in experiment SSE A and decahydronaphthalene in experiment SSE B , because decahydronaphthalene (also called decalin) is a commonly used mineral oil in core-scale experiments [38]. The flow rate was then set to 0.02 ml/min. After the pressure drop over the sample stabilized (as measured by a differential pressure transducer), a postdrainage micro-CT scan was acquired without turning off the flow.
(4) In both experiments, the fractional flow of brine (the ratio of the brine to the total oil plus brine flow rate) was then successively increased to 0.05, 0.2, 0.5, 0.7, and 1.0, while maintaining the same total oil and brine flow rate of 2 ml/min. This corresponds to a capillary number of approximately 3 × 10 −7 in both experiments [37]. After each fractional flow increment, we waited until the pressure drop over the sample stabilized and acquired a micro-CT image, while maintaining the oil and brine flow.
The micro-CT imaging was performed using a Zeiss Versa 510 micro-CT scanner with a flat-panel detector. We acquired 3201 projections with an integration time of 0.8 s and an accelerating voltage of 75 kV for the x-ray source. The total scanning time was approximately 80 min, and the voxel size was 3.56 μm in all scans. For each image, three separate scans at different heights of the sample were performed and then stitched together with Avizo 9.2 software (ThermoFisher Scientific), resulting in images of approximately 6.2 × 6.2 × 12 mm (approximately 1800 × 1800 × 3400 voxels). For both experiments, all images were spatially registered to the dry scan acquired at the start of the experiment using Avizo.

Pore network extraction and modeling
For both experiments, the scan of the dry sample was denoised with a nonlocal means filter and segmented by applying the watershed segmentation algorithm on the gradient image in Avizo 9.2. A sensitivity study on the the segmentation based on the workflow described in Leu et al. [39] can be found in the Appendix. From the segmentation, which represents the pore space in the Bentheimer samples, a network of pores and throats was extracted using the generalized network extraction code of Raeini et al. [36]. This algorithm first calculates the distance map for the pore space: this is the closest distance from each pore space voxel to the solid surface. Local maxima in the distance map define pores in the pore network extraction; this is equivalent to the pore-finding algorithm employed previously in this paper. Every voxel is assigned to a pore, such that the distance map increases in the direction of the pore center. The algorithm also finds throat surfaces by looking for restrictions in the void space that bound pores: here the distance map increases on either side of the throat surface. More details on the throat-finding procedure can be found in Raeini et al. [36]. For throats, the shape factor is defined as where R is the inscribed radius and A is the throat's crosssectional area. For pores, the shape factor is calculated as the average of the neighboring throat shape factors, weighted by the cross-sectional area they share with the pore. The pore network simulations are performed on the extracted network structures using the approach outlined in Valvatne and Blunt [7]. This model simulates drainage and imbibition in network elements with circular, triangular, and 053104-3 square cross sections, allowing for the presence of corner wetting layers and snap-off. The simulations started with a drainage cycle until a prescribed oil saturation was reached. This saturation was chosen to provide the optimal match with the postdrainage experimental fluid distribution obtained from micro-CT (see Sec. II B 3). During drainage, the receding contact angles were set to 0 • in all pores and throats. This is motivated by the advancing-receding contact angle model by Morrow [40], which predicts receding contact angles below 10 • for all intrinsic contact angles below 60 • . Receding contact angles in this water-wet sample are thus unlikely to deviate from 0 • that significantly to affect the fluid distributions during drainage.
After reaching the initial oil saturation for imbibition at the end of the drainage cycle, several imbibition simulations were performed with different contact angle distributions. Unlike the receding contact angles in drainage simulations, the assignment of advancing contact angles in imbibition is generally considered to be one of the main sources of uncertainty in pore-scale modeling [14]. In most previous work, advancing contact angles were randomly distributed within certain ranges that are assumed to be plausible, and simulations were then mainly used for sensitivity assessment [41][42][43]. A approach which carries the potential of being more predictive is to measure contact angles on an image of the rock with the wetting and nonwetting fluids present [44][45][46][47]. Here we assign the advancing contact angles in the pore network model randomly in the range between 15 • and 45 • , because the image-based measurements by Khishvand et al. [48] suggest this is a plausible range for a Bentheimer-brine-decane system. To test the model sensitivity to the contact angle assignment, we also consider a second series of simulations with an advancing contact angle range of 15 • to 25 • .

Comparison of model-predicted to experimental fluid distributions
When comparing model predictions of the fluid distribution in the two Bentheimer samples to the steady-state experiments, the extracted pore network structures can be used as an image analysis tool as well as a computational grid for the network model. During pore network extraction from the dry scans, the inscribed spheres in each pore and throat identified by the pore network extraction are calculated. In pores, these are the largest spheres, centered on each pore center, which lie completely in the pore space. In throats, these are the largest spheres that can fit in the void space centered on the throat surface.
The inscribed spheres represent the local maxima (dilations) and minima (constrictions) in the distance map of the pore space: these are the locations where a nonwetting phase resides when it has just entered a pore or throat during drainage, or the last location within a pore or throat where it can reside during imbibition [49]. Therefore, the fluid which fills the center of a pore or throat in the pore network model can be directly compared to the fluid which resides in the corresponding inscribed sphere in the micro-CT image. It should be noted that the identification of these inscribed spheres is a rather robust process: most of the ambiguity related to the network modeling enters the process later, when volumes and conductivities have to be assigned to the network elements.
To measure which fluid resides in each inscribed sphere, the images containing the inscribed spheres-each labeled with their corresponding pore number or throat number (as extracted from the dry scan)-are overlayed on each image acquired during the flow experiments. Then, the gray-scale values of the voxels in the fractional flow image that fall within each sphere are investigated. For each sphere i we compute the average gray-scale valueḡ i and the 10th-percentile gray-scale value g 10 i (i.e., 10% of the gray-scale values in the sphere are lower than this value). This allows us to automatically segment oil-filled spheres from brine-filled spheres by K-means clustering the (ḡ i ,g 10 i ) couples (see Fig. 1) [50]. This algorithm classifies each of these couples into clusters so that each couple belongs to the cluster with the nearest mean. Considering both (ḡ i ,g 10 i ) values together allows a more accurate segmentation of spheres with intermediate average gray-scale values, because oil-filled spheres are expected to have a "core" of dark gray-scale values, which can be seen from the g 10 i value. Note that the 053104-4 FIG. 2. A slice through the Bentheimer sample from the experiment presented by Andrew et al. [31,32], color-coded with the frequency of voxels being filled with CO 2 or brine in five repeated experiments. Red voxels are filled with CO 2 in every experiment, blue voxels are always filled with brine. Some pores are filled with a different fluid in the repeated experiments and are assigned an intermediate color accordingly. Green represents the largest uncertainty. segmentation of oil-and brine-filled spheres does not rely on a user-defined threshold but happens fully automatically. An example of the segmentation result can be seen in Fig. 1.

A. Repeatability of pore-filling states in experiments
When comparing image-based simulations to pore-scale imbibition experiments on a pore-by-pore basis, it is important to understand the repeatability of the experiments. Several experiments on Bentheimer have been presented in the literature [37,[51][52][53], typically finding a residual nonwetting phase saturation between 0.3 and 0.4 and similar power law-shaped size distributions of the trapped nonwetting phase clusters. This agreement in the literature suggests that the average properties of the fluid distribution after imbibition are reproducible.
However, when comparing a model prediction to an experiment on the same sample, the obvious test is to compare the filling state (i.e., the pore center being water-filled or brinefilled) of individual pores. We investigate the repeatability of the filling state of individual pores explained in Sec. II A. The variation of the fluid distribution over the five experiments is visualized in Fig. 2. Figure 3(a) shows that pores with small radii were filled with brine after all five imbibition experiments performed on the same sample. Similarly, the pores with the largest radii were filled with supercritical CO 2 in all the experiments. This is expected, due to the capillary forces in this strongly water-wet sample. However, the filling state of intermediately sized pores was much less repeatable. Figure 3(b) shows the probability of finding a pore with a certain radius to be brine-filled in the 5 experiments. Pores with intermediate sizes, around 40 μm, have the highest variability. Figure 3 also shows the standard deviation of the filling state in the five experiments, averaged over all pores within the same pore size bin. The averaged standard deviation peaks at 25% for pores between 42 and 44 μm. This standard deviation can be used to calculate the 2σ confidence interval for the probability of a pore to be brine-filled: in a large number of repeated experiments of this kind, we can reasonably (i.e., within a 95% confidence interval) expect pores with radii below 30 μm and above 60 μm to be filled with the same fluid in more than 80% of the cases. However, the filling state of intermediate pore sizes is clearly very uncertain, as the confidence interval spans almost the entire range of possible values for pore radii around 40 μm.
The observed variability comes from several sources. In this experiment, the repeated flooding sequences were all performed on the same sample, and so there is no variability due to using different cores of the same rock type. However, two-phase flow is very sensitive to perturbations in the boundary conditions [28,54]. Also, small differences in the initial saturations for imbibition at the end of drainage may have led to discrepancies in the observed fluid distributions. A further possible source of variability is the effect of chemical and structural pore space alteration over time. Depending on the flow conditions, on the salinity, acidity, and temperature of the brine and on the minerals present in the rock, fines migration and mineral dissolution or precipitation can take place [55][56][57].
In the experiments investigated in this work, no such alterations were observed on the micro-CT images; however, small, subresolution variations in pore space geometry or surface properties cannot be completely excluded. We hypothesize that the combined effect of these sources of variability is the largest in intermediate-sized pores because these are less strongly predisposed to be filled with a certain fluid due to capillary forces, compared to very large or very small pores.
Despite the variability in the fluid distribution on a pore-bypore basis, averaged properties such as residual gas saturation (0.320 ± 0.010), trapped gas cluster sizes (fitted Fisher exponent 2.106 ± 0.011), and nonwetting phase Euler numbers did not show strong variability in the experiments investigated here [32], nor in other experiments in the literature [58][59][60]. This suggests that the different pore-scale fluid distributions in the experiment are statistically similar and have globally similar connectivity properties. The consequence for validation of pore-scale models is that the analysis should go beyond comparing the filling state of individual pores and throats, in which discrepancies are unavoidable because of the variability in the experiments, but should also include an investigation of appropriate connectivity statistics.

B. Validation of drainage simulations
To validate the pore network model predictions of the fluid distributions after drainage, we compare these to the postdrainage micro-CT scans from experiments SSE A and SSE B . However, to predict the drainage end state with these models, the user typically has to prescribe a final capillary pressure or a final saturation for drainage. Otherwise, the model will continue draining down to a brine saturation of nearly zero at an infinite capillary pressure (and, since it is quasistatic, this represents the situation after an infinite amount of physical time). This is the case because brine trapping is assumed to be negligible due to the presence of connected wetting layers in the corners of nearly all pores and throats, and the absence of a time-scale in quasistatic models. In steady-state 053104-5 experiments, the capillary pressure is generally unknown due to the difficulty of measuring this parameter. If, instead, the brine saturation measured in the micro-CT experiment was prescribed, the analysis would be complicated because this would introduce discrepancies coming from the pore and throat volume assignment and layer thickness calculations, which we explicitly seek to avoid here (see Sec. I). Therefore, rather than draining the models to a prescribed capillary pressure or saturation, we rather check whether they reproduce the fluid distribution in the experiments at any point during the drainage simulation. In the postdrainage micro-CT scan, experiments A and B have a brine saturation of approximately 45% and 5%, respectively, due to different flow rates used for the oil flooding. The experiments therefore allow us to test how the simulations behave for different drainage states and, during imbibition, for different initial water saturations.

Filling discrepancy analysis
We define the filling discrepancy as the percentage of all pores and throats for which the model predicted the wrong filling state. The minimal total filling discrepancy during the drainage simulation on sample A was 24%. Figure 4 shows the pore and throat radius distribution of the pore network structure for this sample, along with the filling discrepancy as a function of the radius. The filling discrepancy is largest for intermediate-sized pores and throats, and rather insignificant for the largest and smallest ones. This can likely be explained by similar arguments as seen for the repeatability of imbibition experiments presented in Sec. III A. Small errors in imaging, segmentation, network extraction, and receding contact angle assignment can be seen as perturbations which cause the model to invade different-but similarly sized-throats than in the experiment. Furthermore, it is expected that the experiment itself suffers from similar uncertainty characteristics as the imbibition experiments in Sec. III A. Figure 4 confirms that the radii distributions of the oil-filled network elements in the model match the experiment fairly well. In steady-state flow, the capillary pressure needed to perform drainage is achieved by injecting oil at relatively high flow rates, which may cause deviations from the purely capillary-dominated situation in the model and hence which may explain some of the discrepancies, particularly for the smaller throats. We are planning future work to perform unsteady-state drainage experiments, which do not suffer from this problem, to supplement the steady-state experiments presented here.
The filling discrepancy after drainage in experiment SSE B is shown in Fig. 5. As the water saturation in this case was much lower than in experiment SSE A , the total filling discrepancy, 14%, is significantly smaller. Around 80% of all pores and throats are oil-filled in both model and experiment.

Flow path analysis
We noted in Sec. III A that a nonzero filling discrepancy is not only unavoidable, but, more importantly, does not necessarily translate into significant errors in the resulting up-scaled properties. What matters is that the filling states and connectivity as a function of the pore and throat size are well captured in a statistical sense when averaged over a volume approaching the representative elementary volume (REV). Therefore, it is necessary to also study how oil-filled pores and throats of different sizes are connected in the experiment and in the model. We can use a graph structure to describe the experimentally measured fluid distribution as well as the model results. Each inscribed sphere for which we measured the filling state on the micro-CT image represents a node (for pores) or a link (for throats) in a graph which is directly comparable to the pore network structure on which we model the fluid distributions.
A first descriptor we define for this purpose is the widest percolating oil path (WPOP): the path through the oil with the largest possible minimum throat size, without loops. This is the path to follow if one imagines pushing the largest possible sphere from inlet to outlet of the rock, through the pores and throats that are deemed oil-filled either by the model or by the experiment. Conceptually, this bears resemblance to invasion percolation approaches [61], but applied to characterize the pore sizes occupied by a single fluid rather than to simulate fluid invasion. Figure 6 shows the collection of WPOPs calculated from each input and output pore. As expected from the filling discrepancy analysis, these paths are not identical in the model and the experiment. If we look at the distribution of throat sizes in both paths, however, we find that they are similar, indicating that these paths are essentially statistically identical.
To investigate this further, we analyzed the throat size distributions in progressively smaller paths. We first calculated 053104-7 FIG. 7. Analysis of (a) minimum and (b) average and standard deviation of the throat sizes in a persistence analysis of the WPOP in experiment SSE A (removing the widest path and calculating the second widest, and so on) shows that the oil phase in the experiment is more disconnected than in the model. (c) This is confirmed by the connectivity function in terms of the normalized Euler number of the oil network, which decreases faster and to a lower minimum for the experiment.
the WPOP between the largest oil-filled input and output pores (i.e., the main path out of the collection depicted in Fig. 6), calculated its statistics, and then removed the throats in this path from the oil phase to calculate the second largest path, and so on. Figure 7 shows that the largest paths are quite similar in the model and the experiments, yet as we look at smaller and smaller paths the oil phase in the experiment turns out to be less well connected. This is confirmed by the Euler number of the oil network, which is in both cases strongly negative, but is 25% lower (more negative) in the model, and thus indicates a higher degree of connectivity than in the experiment.
A WPOP analysis of experiment SSE B shows the same behavior as for experiment SSE A and is not shown here for brevity.

Euler number analysis
The Euler number is a topological invariant which can be used to characterize the topology of complex shapes and has therefore been used extensively to characterize the nonwetting phase topology in two-phase flow studies [19,21,59]. The Euler number χ of a general nonplanar graph can be defined as χ = v − e with v the number of vertices (in our case, oil-filled pores) and e the number of edges (oil-filled throats) [62]. χ is strongly negative if the oil is very well connected, and positive if it is broken up into many separate clusters. To investigate the connectivity of the oil over different pore and throat sizes, Vogel et al. [63] defined the connectivity function as the Euler number of the oil-filled network as a function of the smallest pore and throat radius considered. We calculate the normalized connectivity function χ (R) as with v O (r > R) and e O (r > R) the number of oil-filled pores and throats with radius larger than R, respectively, and v O and e O the total number of oil-filled pores and throats. The connectivity function for experiment SSE A and the associated simulation after drainage is shown in Fig. 7. For both cases, the connectivity function decreases as the subnetwork becomes more disconnected, because it is normalized by the Euler number of the full oil-filled network, which is negative. For small cutoff radii, more throats than pores are deleted, so the connectivity function decreases. When there are an equal amount of throats and pores left in the oil graph, the connectivity function is zero. Then removing any single throat breaks up the oil graph and increases the amount of disconnected clusters in it. The part of the curve around zero therefore represents the "backbone" of the oil cluster: the pore and throat sizes that are essential for the oil flow. Finally, The connectivity function of the oil distribution in the experiment and the model (Fig. 7) match well, especially around the zero crossing. The observed minor mismatch indicates that the modeled oil distribution is slightly better connected and less heterogeneously distributed than the experimental one. These observations are consistent with the previously discussed oil path analysis.
The analysis of the connectivity function of experiment SSE B leads to the same conclusions as for experiment SSE A (yet with a lower filling discrepancy) and is not shown here for brevity.
From our analysis, we conclude that the model captures the main oil flow paths well but mispredicts some of the less important aspects of the oil cluster. This is in accordance with previous analysis by Berg et al. [6], who found a good match between the drainage relative permeability calculated from fluid distributions using quasistatic simulation and those derived from direct flow calculations on images. Some of the discrepancies we see can be ascribed to boundary effects: the image does not contain the full rock sample used in the experiment, and the wettability of the sleeve around the sample is not explicitly taken into account. Another cause of potential errors is the relatively high flow rate used to establish the oil saturation in the experiment.

C. Validation of imbibition simulations
To validate the model for imbibition, contact angles have to be assigned to each pore and throat. Clean quarried sandstones, such as the Bentheimer samples used here, are generally water-wet. Khishvand et al. [48] measured contact angles on micro-CT images of brine and n-decane in a Bentheimer sandstone after imbibition and found contact angles distributed between approximately 15 • and 45 • . Therefore, we test the situation where each pore and throat is randomly assigned an advancing contact angle in this range. The initial oil saturation for the imbibition simulations was set to 30% and 7.5% for experiments SSE A and SSE B , respectively, as these saturations provided the lowest filling discrepancy after drainage (Sec. III B). The residual oil saturations calculated by the simulations were 0.39 (sample A) and 0.34 (sample B), which agrees with the literature [51,53]. The oil saturations measured in the micro-CT scans at the end of the imbibition experiments were 0.41 (experiment SSE A ) and 0.40 (experiment SSE B ). In both experiments, the oil phase still contained a percolating path through the imaged part of the sample, and this part of the sample therefore does not seem to be at residual oil saturation. This could be due to end effects outside of the limited field of view of the scans, or the experiment was not run for a sufficiently long time to reach a true end state.
Like in the drainage analysis, we select the imbibition simulation point at which the model has the lowest possible filling discrepancy to check whether the simulation at any point reproduces the experimental fluid distribution. At these points, the model mispredicts, respectively, 18% and 21% of all pores and throats in experiments SSE A and SSE B . Figures 8 and 9 show the fraction of the pores and throats filled with the invading brine as a function of their radii. In both cases, the throats matched rather well, yet at the expense of overestimating the amount of brine-filled pores. Filling discrepancies shifted to larger radii compared to the situation after drainage.
To get an indication of how the simulation discrepancy relates to experimental uncertainty, we infer the 2σ -confidence interval from the repeat experiments presented in Sec. III A. The standard deviation histogram from Fig. 3 was scaled to the pore radius range in samples A and B, and the 2σ confidence interval around the experimental pore-filling probability was then plotted in Fig. 10. The simulated brine-filling probabilities fall on the edge of the confidence interval, suggesting they are about as different from the experiment as a repeat experiment could possibly be from the first. The deviation between models and experiment is clearly systematic, with higher brine-filled probabilities in the models.
The filling discrepancy in experiment SSE A was lower after imbibition than after drainage; yet the connectivity metrics indicate a worse match for the former. Figure 11 shows the cluster size distribution of the oil phases and the connectivity function for the experiment and the model. Table I contains the number and properties of the oil clusters, as well as the Euler number. All these indicators show that the model with contact angles between 15 • and 45 • overestimates the connectivity of the oil phase after imbibition. The match is clearly much poorer after imbibition compared to drainage, including around the zero crossing. Experiment SSE B shows similar behavior ( Fig. 12 and Table II): in the model there is still one very large cluster which connects 18 % of the oil-filled pores. The cluster size distribution therefore contains fewer small clusters than the experiment. The connectivity function for experiment SSE B confirms the oil phase connectivity overestimation.
In both experiments, there are many more brine-filled throats adjacent to two oil-filled pores than in the model, thereby reducing the connectivity (Tables I and II). These are potentially locations where snap-off happened [64], and therefore it seems that the model fills too many pores with brine while not allowing sufficient snap-off in the throats. This is supported by the brine filling histogram in Fig. 8. The discrepancies in the connectivity can thus be counteracted by setting the advancing contact angles in the model closer to zero, as this promotes snap-off compared to cooperative pore filling [65]. Simulations with contact angles between 15 • and  (Tables I and II).
Despite the better match, advancing contact angles smaller than the range 15 • to 45 • are not supported by in situ measured contact angles in Bentheimer [25,48], nor are they expected from theoretical considerations based on measurements on rough surfaces [40]. This is likely caused by the overly simplistic parametric cooperative pore-filling algorithm in this model [7]. More advanced methods to calculate cooperative pore-filling pressures have recently been introduced by Ruspini et al. [66] and Raeini et al. [67]. The disconnection events could also be the signature of ganglion dynamics [68,69]. Further research will point out if the observed mismatch can be solved by technical adaptations to the quasistatic network modeling framework, or that it requires the incorporation of more advanced pore-scale physics (in the form of nonlocal viscous and inertial effects).

IV. CONCLUSIONS AND OUTLOOK
In this work, we show how to use experimental fluid distributions to validate pore-scale two-phase flow models. The presented work flow allows us to probe if simulation discrepancies are caused by the fluid distribution prediction, the most fundamental and difficult aspect of multiphase flow simulations, or rather by subsequent volume and conductivity calculations. While this study investigates how to validate the predicted fluid distributions, it does not look into the calculation of the associated fluid saturations and flow rates, which translate the fluid distribution into relative permeability. This will be studied in further work.
The analysis of repeated unsteady-state drainage and imbibition experiments performed by Andrew et al. [31,32] shows that while the global characteristics of the fluid distribution in pore-scale flow experiments are repeatable, the exact microscopic arrangement is not. Intermediate-sized pores have a variable fluid-filling state, due to small perturbations in the experimental circumstances and in the properties of the porous medium itself. This means global connectivity statistics should be included in the validation, rather than only the filling states of individual pores and throats.
Building on this insight, we present steady-state flow experiments on Bentheimer sandstone cores and compare the fluid distributions in these experiments to pore network simulations on the same sample both on a pore-by-pore and on a global basis. During drainage, we find a mismatch between predicted and the experimental filling state (i.e.. the fluid occupying the center) of up to 24% of the network elements. The filling state of the smallest and largest pores and throats in the network structure are predicted accurately, while the model assigns oil to different (but similarly sized) pores and throats with intermediate radii. This is in general agreement with the repeatability study, which predicts large experimental uncertainties for these pores. Despite the filling discrepancy, the simulations succeed at reproducing the main oil flow paths in a statistical sense. This confirms studies that show quasistatic simulations successfully predict drainage relative permeability [6]. It also suggests that matching the exact microscopic fluid distribution may not be a necessary condition to provide a good estimate of up-scaled properties (in a statistical sense), as long as the model places the fluids in regions with the right average pore sizes and the topology of these regions is predicted well. During imbibition, we find a similar mismatch on a poreby-pore basis as during drainage, but the connectivity statistics of the oil phase do not correspond to the experiment. If we reduce the advancing contact angles in the model to values smaller than the experimentally measured range of 15 • -45 • [48], the model prediction is improved due to an increase of snap-off events. The underestimation of snapoff in simulations with this contact angle range could be explained by the overly simplified parametric cooperative pore-filling algorithm. Another potential cause are viscous effects that are not taken into account in quasistatic models. The underlying fundamental question is whether the quasistatic displacement processes simply need to be implemented with a higher accuracy or if the quasistatic assumption is flawed altogether even at low capillary numbers. The latter hypothesis could be related to dynamic effects during pore-scale filling events [23]. Further work on validating models with enhanced cooperative pore-filling algorithms will shed more light on this. In this context it is also necessary to treat a wider range of samples than in this study to allow making general statements. Furthermore, the imbibition validation illustrates the need for direct measurement of advancing contact angles (including potential wetting heterogeneity) to decrease the simulation uncertainty.

ACKNOWLEDGMENTS
Steffen Berg, Øve Wilson, and Apostolos Georgiadis (Shell) are thanked for valuable comments and discussions on this work. Shell is acknowledged for financial support and for permission to publish this paper.

APPENDIX: SEGMENTATION SENSITIVITY STUDY FOR EXPERIMENTS SSE A AND SSE B
We performed a sensitivity study on the parameters of the watershed segmentation algoritm from Avizo 9.2 (Ther-moFisher Scientific), which was used to segment the dry scans of experiments SSE A and SSE B . Due to computational considerations, we performed the sensitivity study only on a subset of experiment SSE B , of 500 × 500 × 500 voxels in size. We varied the seed parameters and then extracted a pore network model from the obtained segmentation in the same way as described in Sec. II B. The obtained porosity and permeability values were then obtained from network model simulations. Figure 13 shows the gray value histogram for the investigated image. The seeds were varied by steps of 5% up and down of the central "best" value, which was selected by visual inspection. Note that this can be considered a large variation compared to the variations normally made when optimizing the values by visual inspection. The selected upper and lower seed gray value thresholds can be found in Table III. The results for the porosity and permeability variation are depicted in Fig. 14. The conclusion is that the segmentation is not strongly sensitive within small variations of 1%-2% around the center point, though one has to be careful not to stray too far from this, especially for the permeability.  Table III. Permeability values were obtained using PNM simulations.