Stochastic sensing of polynucleotides using patterned nanopores

The effect of the microscopic structure of a pore on polymer translocation is studied using Langevin dynamics simulation, and the consequence of introducing patterned stickiness inside the pore is investigated. It is found that the translocation process is extremely sensitive to the detailed structure of such patterns with faster than exponential dependence of translocation times on the stickiness of the pore. The stochastic nature of the translocation process leads to discernable differences between how polymers with different sequences go through specifically patterned pores. This notion is utilized to propose a stochastic sensing protocol for polynucleotides, and it is demonstrated that the method, which would be significantly faster than the existing methods, could be made arbitrarily robust.


I. INTRODUCTION
The quest for efficient sequencing of single stranded DNA using synthetic nanopores has recently led to the development of a plethora of novel theoretical and experimental design ideas that use a variety of different approaches [1][2][3][4][5][6][7].Experiments have demonstrated that the current blockade readout from single stranded DNA and RNA molecules that are electrophoretically driven through biological and synthetic nanopores could in principle reflect a signature of the underlying sequence [8][9][10][11][12][13][14].It is now possible to design solid-state nanopores [15,16] with tailored surface properties that could regulate DNA-pore surface interaction [17][18][19] and also reduce noise [20,21].A number of recent experiments have been successful in discriminating between polynucleotides [22] and identifying single nucleotides [23][24][25].However, more remains to be done to resolve issues involving stability, sensitivity, and resolution, before they can be integrated into fast and efficient devices for sequencing purposes [26,27].
Theoretical studies of polymer translocation through nanopores  have revealed that the process is intrinsically stochastic and features a rather wide distribution for the translocation time.The inherent noise acts as an overwhelming source of error for the sequence detection strategies that are based on deterministic patterns in the translocation time readout, unless the process is sufficiently slowed down such that time-averaging eliminates the noise [8,22].In other words, achieving robustness in sequencing using deterministic strategies has intrinsic limitations, and might require significant compromise in translocation speed [3].
Here, we propose a strategy to control the transloca-tion time and its statistics by using pores that have patterned surface energetics.We then address the question of whether it is possible to engineer distinct stochastic features for the translocation of heteropolymers with any given sequence through different pores, such that the statistical readout from combined translocation events of a particular sample through a collection of different pores could quickly and accurately reveal its sequence by synergistic exclusion of unlikely sequences.We start by studying the translocation of a homopolymer that is driven from the cis (entrance) to the trans (exit) side of a narrow pore by a uniform external field, F (Fig. 1; see Appendix A).We vary the stickiness of the pore (characterized by the attractive strength, ǫ pm ) along its length and consider three different examples (Fig. 1a-c).A uniformly attractive pore, Pore α, serves as the control case.Pore β is structured to have an attractive entrance and exit separated by a repulsive core, while Pore γ is designed to have an attractive entrance and a repulsive exit.These apparently minor changes in the pore patterning turn out to have significant effects on the translocation times.

II. TRANSLOCATION TIME DISTRIBUTIONS FOR HOMOPOLYMERS
The translocation time (τ ) is divided into (i) the initial filling time (τ 1 ): the time taken by the first monomer of the polymer to reach the exit without returning to the pore, (ii) the transfer time (τ 2 ): the time taken from the exit of the first monomer into the trans side to the entry of the last monomer from the cis side, and (iii) the escape time (τ 3 ): the time between entry of the last monomer in the pore and its escape to the trans side; see Fig. 1a-c.These definitions are better characterized by counting the number of monomers of the polymer on the cis side, N cis , inside the pore, N pore , and on the trans side, N trans , as functions of time (Fig. 1d), with N = N cis + N pore + N trans (see Supplementary Movie 1).
For Pores α and β, the filling time, τ 1 , depends weakly on the stickiness of the pore (Fig. 2d).In the presence of the weak driving force, Pore β has a shallower potential well near the cis end which reduces trapping time making filling easier (see Appendix B).The barrier encountered near the core is small enough to be overcome by the fluctuations of the polymer.With increasing ǫ pm , the effect of trapping becomes more dominant and thus the difference in τ 1 between pores α and β increases.Pore γ, which has a repulsive exit, takes a relatively longer time to fill.The large potential barrier beyond the cis side slows down the polymer increasingly more as the entrance becomes stickier (with increasing ǫ pm ).The distribution of filling times shows a relatively longer exponential tail for Pore γ due to this potential barrier.In sharp contrast to the filling time, the transfer time τ 2 has a much more regular behavior with increasing stickiness of the pores (Fig. 2e).The transfer of the polymer over the length of the pore depends on the potential landscape inside the pore: Pore α, which is attractive throughout, has the longest transfer time, while Pore γ, which is the least attractive pore, has the shortest τ 2 .Figure 2e shows that the transfer time depends exponentially on ǫ pm , and the difference in scales across the three pores is consistent with the number of attractive beads inside each pore.The escape time, τ 3 , depends strongly on the pore interaction near the exit, and differs most dramatically across the three pores (Fig. 2f).In this time interval, the polymer is already inside the pore and to escape the pore it needs to overcome the potential barrier near the exit.The dependence of the escape time on ǫ pm turns out to be faster than exponential, which suggests that seemingly insignificant changes in the stickiness pattern and strength of the wall of the pore could modify the translocation time by several orders of magnitude.The average total translocation time for the homopolymer across all the pores is plotted in Fig. 2g, which shows that for the relatively weak external force used here the translocation process is controlled by the escape mode (see Fig. 2d-g).The overall translocation time distributions for the three pores are also very different (Fig. 2a-c), despite the fact that the general shape of the distributions for each mode of the translocation process were similar.The extreme sensitivity of the translocation dynamics of the homopolymer on the pore patterning and stickiness suggests that it might be possible to engineer pores such that heteropolymers of any given sequence will have distinct statistical features that could be used for stochastic sequence detection.

III. HETEROPOLYMER SEQUENCE SENSING
To examine the feasibility of this sequencing strategy, we replace the homopolymer with heteropolymers constructed in accordance with earlier experimental [12] and theoretical [34] studies of polynucleotide translocation through nanopores; those containing symmetric purinepyrimidine blocks of the form A n C n , with the block length M = 2n (Fig. 1; see Appendix A).We assign different values to the attractive interactions of the sticky The total translocation time for the three pores as a function of ǫpm indicates that for small forces, the escape time dominates the translocation process.The orders of magnitude differences in the translocation times between different pores shows the extraordinary sensitivity of the translocation dynamics on pore patterning.
beads in the pore with base A (ǫ pA ) and base C (ǫ pC ), with ǫ pA > ǫ pC .
The translocation time distributions for five different sequences are shown in Fig. 3a-b for Pores β and γ.We find that the different modes of translocation across the two pores respond differently to variations in the block length, such that the outcome for the total translocation time exhibits distinct features (see Supplementary Movies 2 and 3 and Appendix C).To simplify the picture, we summarize the distributions for each pore in a scatter plot by using only the two basic characteristics of mean and standard deviation (Fig. 4a-d).We observe a number of interesting features.For example, both mean time and standard deviation seem to roughly increase with block length for Pore β, whereas for Pore γ mean time initially increases with block length, peaks at n = 4 and goes back to smaller values for longer blocks.While Pore β cannot easily distinguish between (A 4 C 4 ) 4 and (A 2 C 2 ) 8 , Pore γ can, and the reverse is true for (A 4 C 4 ) 4 and (A 8 C 8 ) 2 .We have also examined the effect of the orientation of the heteropolymer when it enters the pore, and considered polymers of total length N = 32 and N = 64 (Fig. 4a-d).The differences in the scatter immediately suggests that a combined translocation time measurement across the two pores and comparison with the statistics of the known sequences could help identify an unknown sequence to a high accuracy.
To demonstrate this idea and probe its statistical robustness as a sequencing strategy, we run a test on a model sequencing device that would be made up of multiple copies of Pores β and γ that are arranged inseries, such that the readout from translocation of a given polynucleotide with an unknown sequence through all of them can be independently recorded.We calculate the average and standard deviation of the translocation times through the Pores β and γ, separately, using their corresponding multiple readouts.Using the difference between the measured means and standard deviations and the tabulated values for known sequences through each pore, we calculate the relative error for each sequence and minimize it for all sequences across both pores to find the closest match, which will be returned as the predicted sequence.The ratio of the number of successful sequence detection events and the total number of attempts, which is defined as the accuracy of the statistical sequence detection algorithm, turns out to be remarkably high (Fig. 4e).For N = 32 and fixed orientation of the polynucleotide for all pores, the accuracy starts off at 75% with just the minimum two copies of each pore and rises quickly to above 95% when there are ten copies of each pore.
Instead of just using the first two moments, we can choose to use the full translocation time distributions for the sequence detection, using the following method.If we make a measurement of the translocation time (τ ) of a polymer with an unknown sequence through a given pore (Pore β, say), then the probability of the time being part of a distribution of a known sequence (say n) is P β n (τ ), where P n is the known probability distribution.After m measurements, the likelihood of the translocation times being part of a given distribution can be defined as L n = Π m i=1 P β n (τ i ).The structure of the unknown heteropolymer is determined by finding the n with the maximum likelihood, seq ≡ seq[max{L n }].For multiple pores and fixed orientation of the polymer through all of them, the likelihood can be generalized to 4e shows the resulting accuracy plots as obtained using the full translocation time distributions, which exhibit a considerably faster convergence in the algorithm.The corresponding results are very similar for N = 64 (Fig. 4e).This shows that an inherently statistical DNA sequencing strategy could be designed to have an arbitrary accuracy.
During the sequence detection process the heteropolymer could enter the pore with either base A or base C entering first.Therefore it is imperative to consider orientation effects on the translocation time distributions, as seen in Fig. 4a-d, and hence on our sequencing strategy.Due to the possibility of orientation flips during multiple readouts of the unknown sequence, we need to consider all permutations of the two orientations in a given set of readouts.When we incorporate the possibility of different orientation in the translocation time measurements, then we would need to consider the sum of all possible permutations of orientations in determining the likelihood of the translocation times being part of a given distribution.This leads For the method that uses the moments, the sample average and standard deviation are calculated and used to find the relative error of the sample average and standard deviation compared to the known values and compound them into an error metric for each pore and each sequence.The error metric is subsequently used for each pore to define a closeness metric, which will be minimized to predict the sequence.The accuracy is the ratio between the number of successful predictions and the total number of attempts.In (e) the orientation of the polymer is known and preserved when it passes through the different pores.In (f) the orientation of the polymer is not known and randomly changes when it passes through the different pores.
result of this calculation shows that using the full distribution is surprisingly robust with respect to the randomization of the orientation, which is of paramount importance in practice.For the set of measurements which do not involve orientation effects we observe a distinctly faster detection of a sequence (95% when there are 4 copies of each pore) as compared to the detection using the scatter plots (Fig. 4e).With the orientations of the polymer as it enters the pore taken into account, the accuracy of detection rises to 95% with just 5 copies of each pore (Fig. 4f).

IV. CONCLUSION
In contrast to the generally accepted notion of suppressing the stochastic element of polynucleotide motion through nanopores to achieve efficient DNA sequencing, we propose to extract information from the statistical fluctuations towards sequence detection.Our strategy is based on designing distinguishable translocation time statistics for any given sequence by engineering the polymer-pore interactions and combining readouts from multiple pores for rapid convergence.The desired patterns in surface interaction could be achieved by using biological nanopores with appropriate modification [53,54] or those with known hydrophobic-hydrophilic pattern structure [55], as well as solid-state nanopores with tailormade surface interactions [15][16][17][18][19].The proposed approach could potentially improve the overall speed of sequence detection by orders of magnitude, and could be integrated in high throughput microfluidic devices.Homopolymer model.We model the polymer as a self avoiding chain by using beads and springs (Fig. 1).The beads represent monomer groups of the polymer and we model the excluded volume interaction between a pair of monomers by a truncated repulsive Lennard-Jones (rLJ) potential of the form where ǫ is the potential depth and σ is the monomer diameter.The cut-off distance, r min = 2 1/6 σ, is set at the potential minimum.The bonding springs between monomer groups are modelled by a finite extension nonlinear elastic (FENE) potential of the form where k = 7ǫ/σ 2 and R = 2σ are the spring constant and bond length respectively.FENE potentials are convenient as the bond length effectively sets the maximum allowed separation between monomer groups.We use polymers of length N = 32 and N = 64 in our simulations.
Heteropolymer model.We model the heteropolymers similarly using beads and springs (Fig. 1) with the polymer beads representing the bases A and C arranged in symmetric blocks A n C n .With a DNA of length N = 32, the minimum value of n = 1 is for poly(dAdC) 16 and the maximum value of n = N/2 for poly(dA 16 dC 16 ).The bases A and C are only distinguished by their relative interactions with the pore.
Pore model.The pore and wall are constructed from stationary monomers separated by a distance of σ from each other.The pore is made up of two of monomers symmetric about the coordinate system with a length L = 5σ and separated by a width of W = 2.25σ.The pore width is chosen to allow only single file translocation of the polymer and avoid hair-pin configurations.The polymer translocates from the cis (entrance) end to the trans (exit) end of the pore (Fig. 1).The walls of the pore extend in the y direction.
Polymer-pore interaction.The interaction of the pore with the polymer is tuned such that the interaction varies along the length of the pore.This interaction could either be the short-range repulsive form described above or the standard LJ form: with ǫ pm denoting the potential depth and r c = 2.5σ denoting the cut-off distance.We choose three different pore patterns with the patterning symmetric about the x-axis: (1) Pore α is an attractive pore with all the monomers of the pore interacting with the polymer by the LJ potential.( 2) Pore β has an attractive entrance and exit with the first two monomers and the last two monomers of the pore interacting with the polymer by the LJ potential and the middle monomer being repulsive.
(3) Pore γ has an attractive entrance (first two monomers attractive) and a repulsive exit (last three monomers repulsive).Note that in all the three cases the pore entrance is chosen to be attractive to successfully initiate translocation.The stickiness of the pore (ǫ pm ) is varied during homopolymer translocation.During the translocation of the heteropolymer the stickiness differs for base A (ǫ pA ) and base C (ǫ pC ).We fix these values to ǫ pA = 3.0 and ǫ pC = 1.0 respectively.The polymer interacts with the wall (U LJ mw ) with the same rLJ potential as used for the intra-monomer excluded volume interaction.In addition the polymer experiences a driving force, F e = F x directed along the pore axis with magnitude F , Polymer injection.In our simulation we are not concerned with injection of the polymer into the pore, but only with the dynamics of the polymer during translocation.We initially place the first bead of the polymer chain at the entrance of the pore and allow the remaining beads to fluctuate.Once the polymer relaxes to its equilibrium configuration, the bead is released and the translocation of the polymer across the pore is monitored.The translocation time is defined as the time that elapses between the entrance of the first bead of the polymer and the exit of the last bead.All failed translocation events are discarded.
Integration algorithm.The equations of motion of the monomers of the polymer were integrated using a Langevin dynamics (LD) algorithm that includes a velocity Verlet update [52].Within the LD formalism, the interaction of the monomers with a solvent is simulated by a viscous drag term proportional to the monomer velocity and a random force term modeled by Gaussian white noise with an auto-correlation function that satisfies the fluctuation-dissipation theorem.The equation of motion for a monomer therefore takes the form: where m is the monomer mass, U i = U LJ mm + U FENE ch + U LJ wm + U LJ pm is the total potential experienced by a monomer, ζ is the friction coefficient, v i is the monomer velocity, and η i is the random force with η i (t)•η j (t 0 ) = 4k B T ζδ ij δ(t − t 0 ), T being the temperature.A time step of ∆t = 0.01 is used in all simulation runs.
Reduced units.The units of energy, length, and mass are set by ǫ, σ, and m, respectively.These set the scale for the time as (mσ 2 /ǫ) 1/2 .Following Luo et al. [34], we assume that the size of each bead in our coarse-grained polymer model corresponds to the Kuhn length of a single stranded DNA, which is approximately three nucleotide bases.This sets the bead size, σ ≈ 1.5 nm, the mass of the bead, m ≈ 936 amu (given that the mass of a base in DNA is ≈ 312 amu) and the charge of a bead, q ≈ 0.3 e (each base having a charge of 0.1 e effectively [56]).We set ζ = 0.7 and k B T = 1.2 to allow comparison with known results.Therefore, the interaction strength at T = 295 K is given by ǫ = k B T /1.2 ≈ 3.4 × 10 −21 J.This gives the time scale of (mσ 2 /ǫ) 1/2 ≈ 30 ps and a force scale of ǫ/σ ≈ 2.3 pN.Therefore an external driving force in the range 0.5 − 1.0 corresponds to a voltage range V = F L/q ≈ 190 − 380 mV across the pores.Note, however, that higher values of up to 0.5 e for the effective base charge have also been reported in the literature [57], which suggest that the appropriate voltage range could be lower than the above-mentioned values.

As a rough indication of how much the patterning could
Appendix C: Optimizing the pore for sequencing: the effect of driving force and pore width To understand the effects of patterning the pore on sequencing, we considered the translocation of heteropolymers through the pores.Following Luo et al. [34], the polymers were represented as consisting of symmetric blocks A n C n of A and C bases, which interact differently with the pore.The time distributions for the three pores show a varying degree of sensitivity on the specific sequence of the polymer (Figs. 9, 10, 11, and 12), depending on the strength of the external force and the pore width.In Fig. 9, we show the dependence of the distributions for F = 1.0 and W = 2.5.For Pore α, the difference in distributions for short block lengths is relatively small.As the block lengths are increased, the distribution changes sharply.However, for larger block lengths it becomes difficult again to distinguish them.For Pores β and γ, the distributions have a high degree of overlap and are not suitable for sequencing.
As the width is decreased (Fig. 10, W = 2.25), Pore α takes extremely long to translocate for larger block lengths.The potential barrier proves difficult to surmount and the polymer is stuck for long periods inside the pore.However, lowering the width has a positive impact on Pore β which leads to translocation time distributions that can distinguished from one another.Although the translocation times are much longer, the distributions are well separated by their means and standard deviations.This impact is much less for Pore γ.
On the other hand, we could keep the pore width fixed (W = 2.5) and lower the strength of the external driving force (Fig. 11).The effect on Pore α is drastic as the polymers fail to cross the potential barrier.Pore β and γ, on the other hand, still translocate polymers although their distributions for the different sequences are far from distinguishable.
Finally, we keep the width at W = 2.25 and lower the force to F = 0.5 (Fig. 12).This completely takes out Pore α from consideration as the translocation times become prohibitively long.The translocation time scales are now much longer for Pores β and γ as well.However, the mean and standard deviations for Pore γ are again well separated making it easier to distinguish between the distributions, and hence make it a suitable candidate for sequencing.In our simulations, we use Pore β at F = 1.0,W = 2.25 and Pore γ at F = 0.5, W = 2.25 as the two most suitable pores for our sequencer.

FIG. 1 :
FIG.1:The schematics of the polymer and nanopore models.Simulation snapshots showing the translocation of a homopolymer and heteropolymers with different block lengths across the three patterned pores at various stages of the translocation process, namely, filling, transfer and escape.(a) A homopolymer (yellow) translocating from the cis to the trans side of Pore α.The interaction of the pore monomers (red) with the polymer has an attractive well whereas the monomers that make up the walls of the pore (blue) have an excluded volume interaction with the polymer.The pore width is fixed and there is a constant force driving the polymer that acts inside the pore.(b) A heteropolymer (poly(dAdC)16) of block length M = 2 with alternating bases A (yellow) and C (green) translocating across Pore β.Pore β consists of two sticky monomers on either end of the pore that are separated by a wall monomer.The bases A and C have different interactions with the sticky monomers.(c) A heteropolymer (poly(dA4dC4)8) of block length M = 8 translocating across Pore γ, which has two sticky monomers on the cis side and three repulsive monomers on the trans side that result in an attractive entrance and a repulsive exit.(Inset) Shows the pore-polymer potentials.(d) A trace of the monomer count at the trans end, middle, and cis end of the pore as functions of time for a homopolymer translocating through Pore γ, with ǫpm = 3 and F = 0.5.τ1, τ2 and τ3 change dramatically when pore patterning is introduced.

FIG. 2 :
FIG. 2:Translocation time statistics for homopolymers.(a-c) Comparison of translocation time distributions for the three patterned pores, for F = 0.5 and ǫpm = 2.0.The filling, transfer and escape distributions are similar across the three pores, but have distinctly different scales (e.g.average and variance), such that the overall translocation time distribution for the three pores are discernably different.(d-f) Comparison of average filling, transfer and escape times for the three different pores for F = 0.5 as a function of ǫpm.While the filling time shows only a moderate dependence on the stickiness, and the transfer time exhibits an exponential dependence on ǫpm, the dependence of the the escape time is even faster than exponential.(g) The total translocation time for the three pores as a function of ǫpm indicates that for small forces, the escape time dominates the translocation process.The orders of magnitude differences in the translocation times between different pores shows the extraordinary sensitivity of the translocation dynamics on pore patterning.

FIG. 3 :
FIG. 3: Translocation time statistics for heteropolymers.Comparison of filling, transfer, escape and translocation time distributions for Pores β (a) and γ (b) and five different sequences of the heteropolymer.The distributions correspond to F = 1.0 for Pore β, and F = 0.5 for Pore γ, respectively.

FIG. 4 :
FIG. 4: Using translocation time statistics to detect polynucleotide sequences.(a-d) Scatter plots showing the distinctive mean and standard deviations of the different sequences and for different ends of the polynucleotide entering the pore.The mean translocation time, τ , and its standard deviation, τ 2 − τ 2 , for the different sequences are calculated from the distributions.The scatter plots reveal the distinctive characteristics of the translocation events for the different sequences through each pore.The plots correspond to (a) Pore β with F = 1.0 and N = 32 (b) Pore β with F = 1.0 and N = 64 (c) Pore γ with F = 0.5 and N = 32 and (d) Pore γ with F = 0.5 and N = 64.(e-f) Accuracy of sequence detection using multiple joint translocation events through Pores β and γ.The plots are constructed by recording a given number of translocation times through Pores β and γ, and using a comparison with either the full distribution or the first two moments of the distribution shown in the scatter plots (a-d).For the method that uses the moments, the sample average and standard deviation are calculated and used to find the relative error of the sample average and standard deviation compared to the known values and compound them into an error metric for each pore and each sequence.The error metric is subsequently used for each pore to define a closeness metric, which will be minimized to predict the sequence.The accuracy is the ratio between the number of successful predictions and the total number of attempts.In (e) the orientation of the polymer is known and preserved when it passes through the different pores.In (f) the orientation of the polymer is not known and randomly changes when it passes through the different pores.

FIG. 5 :
FIG.5: Comparison of the polymer-pore interaction potentials.(Top left) A schematic of the pore.Pore monomers could either have an attractive (LJ) interaction (red) or a short range repulsive (rLJ) interaction (blue) with the polymer inside the pore.(Top right) The potential energy landscape in the center (y = 0) along the length of the channel (blue) is modified (green) in the presence of an external driving force F = 0.5.(Bottom) The complete potential energy landscape experienced by the polymer inside the pore.Blue to red represents increasing potential depth.

FIG. 6 :
FIG. 6: Comparison of the filling, transfer, escape and translocation time distributions for the three pores at F = 0.5 and ǫ = 1.0

FIG. 7 :FIG. 8 :
FIG. 7: Comparison of the filling, transfer, escape and translocation time distributions for the three pores at F = 0.5 and ǫ = 1.5