Derivation of the Boltzmann Equation for Financial Brownian Motion: Direct Observation of the Collective Motion of High-Frequency Traders

A microscopic model is established for financial Brownian motion from the direct observation of the dynamics of high-frequency traders (HFTs) in a foreign exchange market. Furthermore, a theoretical framework parallel to molecular kinetic theory is developed for the systematic description of the financial market from microscopic dynamics of HFTs. We report first on a microscopic empirical law of traders' trend-following behavior by tracking the trajectories of all individuals, which quantifies the collective motion of HFTs but has not been captured in conventional order-book models. We next introduce the corresponding microscopic model of HFTs and present its theoretical solution paralleling molecular kinetic theory: Boltzmann-like and Langevin-like equations are derived from the microscopic dynamics via the Bogoliubov-Born-Green-Kirkwood-Yvon hierarchy. Our model is the first microscopic model that has been directly validated through data analysis of the microscopic dynamics, exhibiting quantitative agreements with mesoscopic and macroscopic empirical results.

Introduction.-In physics, the study of colloidal Brownian motion has a long history beginning with Einstein's famous work [1]; the understanding of its mechanism has been systematically developed in kinetic theory [2,3]. Specifically, from microscopic Newtonian dynamics, the Boltzmann and Langevin equations are derived for the mesoscopic and macroscopic dynamics, respectively. This framework is a rigid foundation for various nonequilibrium systems (e.g., active matter, granular gas, Feynman ratchets, and traffic flow [4][5][6][7][8][9][10]), and its direct experimental foundation has been revisited because of recent technological breakthroughs [11,12]. In light of this success, it is natural to apply this framework beyond physics to social science [13], such as finance. Indeed, the concept of random walks was historically invented for price dynamics by Bachelier earlier than Einstein [14], and its similarities to physical Brownian motion (e.g., the fluctuation-dissipation relation) are intensively studied by recent high-frequency data analysis [15]. As an idea in statistical physics, the dynamics of financial markets are expected to be clarified from first principles by extending kinetic theory.
Although this idea is attractive, a kinetic description has not yet been established for financial Brownian motion. Why has this idea not been realized? In our view, the biggest problem is the absence of established microscopic models: there exist empirical validations of mesoscopic [15][16][17][18][19][20][21] and macroscopic models [22][23][24][25][26][27][28], whereas no microscopic model has been validated by direct empirical analysis. Indeed, previous microscopic models [29][30][31][32][33] were purely theoretical and have no quantitative microscopic evidence. To overcome this crucial problem as an empirical science, two missing links have to be connected: (i) establishment of a microscopic model by direct observation of traders' dynamics (Fig. 1a) and (ii) construction of a kinetic theory to show its consistency with mesoscopic and macroscopic findings, i.e., the order-book and price dynamics (Fig. 1b, c).
* Corresponding author: kanazawa.k.ae@m.titech.ac.jp
In this Letter, we present the corresponding solutions by direct observation of high-frequency trader (HFT) dynamics in a foreign exchange (FX) market: (i) a microscopic model of HFTs is established by direct microscopic evidence, and (ii) a corresponding kinetic theory is developed to show its consistency with mesoscopic and macroscopic evidence. We analyzed order-book data with anonymized trader identifiers (IDs) to track the trajectories of all individuals. We found an empirical law concerning trend following among HFTs, which has not been captured by previous order-book models. Remarkably, this property induces the collective motion of the order book and naturally leads to the layered order-book structure [15]. We then introduce a corresponding microscopic model of trend-following HFTs. Starting from their "equations of motion," Boltzmann-like and Langevin-like equations are derived for the order-book and price dynamics. Finally, a quantitative agreement is shown with all our empirical findings. Our work opens the door to systematic descriptions of finance based on microscopic evidence.
Observed microscopic dynamics.-We analyzed the high-frequency FX data between the U.S. dollar (USD) and the Japanese yen (JPY) on Electronic Broking Services for a week in June 2016 (see Appendix A 1). The currency unit used in this study is 0.001 yen, called the tenth pip (tpip). Here we particularly focus on the dynamics of HFTs [34], who frequently submit or cancel orders according to algorithms (see Appendix A 2). The typical trajectories of bid and ask quoted prices are illustrated in Fig. 2a-c for the top 3 HFTs. They modify their quoted prices by successive submission and cancellation at high speed, typically within seconds; almost 99% of their submissions were finally canceled without transactions (see Appendix A 4). With their two-sided quotes they also play the role of liquidity providers [35,36] according to the market rule, keeping the balance between the bid and ask order books. Buy-sell spreads, the difference between the best bid and ask prices for a single HFT, were observed to fluctuate around values that are constant in time (see the insets for their distributions and Appendix A 5).

We then report the empirical microscopic law for the trend-following strategy of individual traders. The bid and ask quoted prices of the $i$th top HFT are denoted by $b_i$ and $a_i$ (see Appendix A 6). We investigated the average movement of the trader's quoted midprice $z_i \equiv (b_i + a_i)/2$ between transactions, conditional on the previous movement of the market transacted price (Fig. 2d). Here we introduce the tick time $T$ as an integer time incremented by every transaction. The mean transaction interval was 9.3 seconds during this week. Because typical HFTs frequently modify their prices between transactions, we study HFTs' trend following at one-tick precision. For the top 20 HFTs (Fig. 3), we found that the average and variance of the movement $\Delta z_i$ are described by

$$\langle \Delta z_i \rangle_{\Delta p} \approx c_i \tanh\frac{\Delta p}{\Delta p_i^*}, \qquad V_{\Delta p}[\Delta z_i] \approx \sigma_i^2, \tag{1}$$

where the conditional average $\langle \cdots \rangle_{\Delta p}$ is taken when the last price change is $\Delta p(T-1) \equiv p(T) - p(T-1)$ and $\Delta z_i \neq 0$ (see Appendix A 6), and the conditional variance is defined by $V_{\Delta p}[\Delta z_i] \equiv \langle (\Delta z_i - \langle \Delta z_i \rangle_{\Delta p})^2 \rangle_{\Delta p}$. Here, $p(T)$ is the market transacted price at the $T$th tick, and $c_i$, $\Delta p_i^*$, and $\sigma_i^2$ are characteristic constants unique to the trader and independent of $\Delta p$. Their typical values were found to be $c_i \approx 6.0$ tpip, $\Delta p_i^* \approx 7.5$ tpip, and $\sigma_i \approx 14.5$ tpip. Our finding (1) implies that the reaction of traders is linear for small trends but saturates for large trends, and it quantifies the collective motion of HFTs. Remarkably, a similar behavior was reported from a price movement data analysis at one-month precision [37].
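The statement that the response is linear for small trends and saturates for large ones can be made concrete with a short numerical sketch. It assumes the hyperbolic-tangent form $c_i \tanh(\Delta p/\Delta p_i^*)$ for the average movement and uses the typical values quoted above as defaults; the function name `mean_response` is hypothetical.

```python
import math

# Typical values quoted in the text (tpip); used here only as illustrative defaults.
C_TYPICAL = 6.0
DP_STAR_TYPICAL = 7.5


def mean_response(dp, c=C_TYPICAL, dp_star=DP_STAR_TYPICAL):
    """Assumed average quote movement: c * tanh(dp / dp_star)."""
    return c * math.tanh(dp / dp_star)


if __name__ == "__main__":
    # Compare the assumed tanh law with its small-trend linearization c * dp / dp_star.
    for dp in (0.5, 2.0, 7.5, 30.0):
        linear = C_TYPICAL * dp / DP_STAR_TYPICAL
        print(f"dp={dp:5.1f}  tanh law={mean_response(dp):6.3f}  linear={linear:6.3f}")
```

For $|\Delta p| \ll \Delta p_i^*$ the two columns nearly coincide, whereas for $|\Delta p| \gg \Delta p_i^*$ the response is capped at $c_i$.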
Microscopic model.-Here we introduce a minimal microscopic model of HFTs incorporating the above characteristics. We make four assumptions: (i) the number of traders is sufficiently large; (ii) traders always quote both bid and ask prices (for the $i$th trader, $b_i$ and $a_i$) simultaneously with a unit volume; (iii) buy-sell spreads are constant in time and unique to each trader, with distribution $\rho(L)$, so that the trader dynamics are characterized by the midprice $z_i \equiv (b_i + a_i)/2$; and (iv) trend-following random walks are assumed in the microscopic dynamics (Fig. 4a-c),

$$\frac{dz_i}{dt} = c \tanh\frac{\Delta p}{\Delta p^*} + \sigma \eta_i^R, \tag{2}$$

with strength of trend following $c$, previous price movement $\Delta p$, and white Gaussian noise $\sigma \eta_i^R$ with variance $\sigma^2$. Here, $c$, $\Delta p^*$, and $\sigma$ are assumed to be shared by all traders for simplicity. In this model, HFTs frequently modify their quoted prices by successive submission and cancellation. Indeed, this model can be reformulated as a Poisson price modification process with a high cancellation rate (see Appendix B 2). After a transaction $a_j(t) = b_i(t)$ (Fig. 4b), the updated market price and its corresponding movement are recorded as

$$p(t+0) = b_i(t), \qquad \Delta p(t+0) = p(t+0) - p(t), \tag{3}$$

and a requotation jump occurs (Fig. 4c),

$$z_i(t+0) = z_i(t) - \frac{L_i}{2}, \qquad z_j(t+0) = z_j(t) + \frac{L_j}{2}, \tag{4}$$

where $L_i$ is the buy-sell spread of the $i$th trader. Here, $t+0$ denotes the time just after the transaction. A unique feature of this model is the order-book collective motion due to trend following (Fig. 4d). For $\Delta p > 0$, the bid (ask) volume change tends to be positive (negative) near the best price (Fig. 4e), consistent with the layered order-book structure [15].

[Fig. 3: (a) Trend-following movement on average (hyperbolic function) and (b) standard deviation of the random noise effect (independent of $\Delta p$), with scaled plots for the 1st to 20th HFTs; axes in tpip.]
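The model above can be explored with a minimal Euler-Maruyama sketch. It is an illustration under stated assumptions rather than the simulation code behind the figures: all traders share one spread, the drift is taken as $c\tanh(\Delta p/\Delta p^*)$, after a transaction the bidder's midprice is lowered and the asker's raised by half a spread, and at most one transaction is processed per time step. The function name `simulate_prices` and all parameter values are hypothetical.

```python
import math
import random


def simulate_prices(n_traders=20, spread=10.0, c=1.0, dp_star=2.0,
                    sigma=1.0, dt=0.01, n_steps=10000, seed=0):
    """Euler-Maruyama sketch of trend-following traders with two-sided quotes.

    Trader i quotes the bid z[i] - spread/2 and the ask z[i] + spread/2.
    Returns the list of transacted prices. All numbers are illustrative.
    """
    rng = random.Random(seed)
    half = spread / 2.0
    # Initial midprices lie in a window narrower than one spread,
    # so no bid crosses any ask at t = 0.
    z = [rng.uniform(-half / 2.0, half / 2.0) for _ in range(n_traders)]
    price, dp = 0.0, 0.0          # last transacted price and its last movement
    prices = []
    noise_scale = sigma * math.sqrt(dt)
    for _ in range(n_steps):
        drift = c * math.tanh(dp / dp_star) * dt      # common trend-following drift
        z = [zi + drift + noise_scale * rng.gauss(0.0, 1.0) for zi in z]
        i = max(range(n_traders), key=z.__getitem__)  # trader holding the best bid
        j = min(range(n_traders), key=z.__getitem__)  # trader holding the best ask
        if z[i] - half >= z[j] + half:                # best bid meets best ask
            new_price = z[i] - half                   # record the bidder's quote
            dp, price = new_price - price, new_price
            prices.append(price)
            z[i] -= half                              # assumed requotation jumps
            z[j] += half
    return prices


if __name__ == "__main__":
    transacted = simulate_prices()
    print(len(transacted), "transactions")
```

Because the drift is common to all traders, it shifts the whole order book at once, which is the collective motion discussed in the text.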
Kinetic formulation.-We next present an analytical solution to this model (2) according to kinetic theory [2,3]. Let us first introduce the relative distance $r_i \equiv z_i - z_{\rm c.m.}$ from the "center of mass" $z_{\rm c.m.} \equiv \sum_i z_i/N$ (Fig. 4a), where the trend-following effect in Eq. (2) is absorbed into the dynamics of $z_{\rm c.m.}$. The dynamics of $r_i$ become simpler because trend-following effects disappear in this moving frame (see Appendix B 3). We next introduce the one-body (two-body) probability distribution $\phi_L(r)$ ($\phi_{LL'}(r, r')$) conditional on the traders' buy-sell spreads. From the microscopic model (2), the lowest-order hierarchy equation is derived as

$$\frac{\partial \phi_L}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2 \phi_L}{\partial r^2} + N \sum_{s=\pm 1} \int dL'\, \rho(L') \left[ J^s_{LL'}(r + sL/2) - J^s_{LL'}(r) \right]$$

with $J^s_{LL'}(r) \equiv (\sigma^2/2)\,|\tilde\partial_{rr'}|\,\phi_{LL'}(r, r')\big|_{r - r' = s(L+L')/2}$ and $|\tilde\partial_{rr'}| f \equiv |\partial f/\partial r| + |\partial f/\partial r'|$ (see Appendix B 5). By assuming "molecular chaos," we derive the Boltzmann-like equation with collision integrals for the order book:

$$\frac{\partial \phi_L}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2 \phi_L}{\partial r^2} + N \sum_{s=\pm 1} \int dL'\, \rho(L') \left[ \tilde J^s_{LL'}(r + sL/2) - \tilde J^s_{LL'}(r) \right] \tag{6}$$

with $\tilde J^s_{LL'}(r) \equiv (\sigma^2/2)\,|\tilde\partial_{rr'}|\,\{\phi_L(r)\phi_{L'}(r')\}\big|_{r - r' = s(L+L')/2}$. Here, $s = +1$ ($s = -1$) represents transactions as bidder (asker). Because traders exhibit collective motion arising from trend following, a Langevin-like equation is also derived as the macroscopic description of the model (2),

$$\Delta p(T+1) = c\,\tau(T) \tanh\frac{\Delta p(T)}{\Delta p^*} + \zeta(T), \tag{7}$$

where $\tau(T)$ and $\zeta(T)$ are the transaction time interval and the random noise at the $T$th tick time, respectively. The first, trend-following term corresponds to the momentum inertia in the conventional Langevin equation. Equations (6) and (7) can be analytically assessed for $N \to \infty$. We first set the buy-sell spread distribution as

$$\rho(L) = \frac{L^3}{6 L^{*4}}\, e^{-L/L^*} \tag{8}$$

with decay length $L^* = 15.5 \pm 0.2$ tpip, empirically validated in our data set (Fig. 5a). The solution to Eq. (6) for $N \to \infty$ is given by $\phi_L(r) = (4/L^2)\max\{L/2 - |r|, 0\}$.
The average order-book profile $f_A(r) = \int dL\, \rho(L)\, \phi_L(r - L/2)$ is then given for $r > 0$ by

$$f_A(r) = \frac{4}{3L^*}\, e^{-3r/(2L^*)} \left[ \left( 2 + \frac{r}{L^*} \right) \sinh\frac{r}{2L^*} - \frac{r}{2L^*}\, e^{-r/(2L^*)} \right]. \tag{9}$$

The statistics of $\tau(T)$ in the macroscopic model (7) is derived from the mesoscopic model (6), and the tail of the price movement is approximately given by

$$P(\geq |\Delta p|; \kappa) \approx e^{-|\Delta p|/\kappa} \qquad (|\Delta p| \to \infty) \tag{10}$$

with decay length $\kappa \approx 2\Delta z^*/3$, average movement from trend following $\Delta z^* \equiv c\tau^*$, average transaction interval $\tau^* \approx 3L^{*2}/(N\sigma^2)$, and complementary cumulative distribution function (CDF) $P(\geq |\Delta p|; \kappa)$ (see also Appendices B 6 and B 7 for numerical validation).

Mesoscopic and macroscopic data analysis.-We next investigated whether our microscopic model is consistent with our data set. The empirical daily profile was first studied for the average ask order book for the best prices of HFTs, $f_A(r)$ (Fig. 5b). Surprisingly, we found a quantitative agreement with our theory (9) without any fitting parameters, which strongly supports the validity of our description.
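The convolution $f_A(r) = \int dL\, \rho(L)\, \phi_L(r - L/2)$ can be checked numerically. The sketch below assumes a γ-distribution of spreads with shape parameter 4, $\rho(L) = L^3 e^{-L/L^*}/(6L^{*4})$ (an assumption chosen to be consistent with $\tau^* \approx 3L^{*2}/N\sigma^2$), together with the tent-function solution, and compares a simple midpoint quadrature with the closed form that follows from carrying out the integral. All function names are hypothetical.

```python
import math


def rho(L, L_star):
    """Assumed spread distribution: gamma form with shape 4 and decay length L_star."""
    return L**3 * math.exp(-L / L_star) / (6.0 * L_star**4)


def tent(r, L):
    """Tent-function solution phi_L(r) = (4 / L^2) * max(L/2 - |r|, 0)."""
    return (4.0 / L**2) * max(L / 2.0 - abs(r), 0.0)


def profile_numeric(r, L_star, n=20000, cutoff=40.0):
    """Midpoint-rule estimate of the integral of rho(L) * tent(r - L/2, L) over L."""
    h = cutoff * L_star / n
    total = 0.0
    for k in range(n):
        L = (k + 0.5) * h
        total += rho(L, L_star) * tent(r - L / 2.0, L)
    return total * h


def profile_closed(r, L_star):
    """Closed form obtained by evaluating the same integral analytically (r > 0)."""
    x = r / L_star
    return (4.0 / (3.0 * L_star)) * math.exp(-1.5 * x) * (
        (2.0 + x) * math.sinh(x / 2.0) - (x / 2.0) * math.exp(-x / 2.0))


if __name__ == "__main__":
    for r in (5.0, 15.5, 40.0):
        print(r, profile_numeric(r, 15.5), profile_closed(r, 15.5))
```

The only parameter is the decay length $L^*$, which is why the comparison with the empirical profile needs no additional fitting once $L^*$ is fixed from the spread distribution.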
The two-hourly segmented CDF for the price movement, $P_{2h}(\geq |\Delta p|; \kappa)$, is also evaluated at one-tick precision (Fig. 5c); it obeys an exponential law that is qualitatively consistent with our theoretical prediction (10). The value of the two-hourly decay length $\kappa$ fluctuates significantly during the week. To remove this nonstationary feature, we introduced the two-hourly scaled CDF $\tilde P_{2h}(\geq |\Delta p|) \equiv P_{2h}(\geq \kappa|\Delta p|; \kappa)/Z$ with scaling parameters $\kappa$ and $Z$ (Fig. 5d), thereby incorporating the two-hourly exponential law for the whole week.
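The price movements obey an exponential law for short periods but simultaneously obey a power law over long periods with exponent $\alpha = 3.6 \pm 0.13$ (Fig. 5e). This apparent discrepancy originates from the power-law nature of the decay length $\kappa$. Because $\kappa$ approximately obeys a power-law CDF $Q(\geq \kappa) \sim \kappa^{-m}$ over the week with $m = 3.5 \pm 0.13$ (Fig. 5f), the one-week CDF $P_w(\geq |\Delta p|)$ asymptotically obeys the power law as a superposition of the two-hourly segmented exponential CDFs,

$$P_w(\geq |\Delta p|) = \int_0^\infty d\kappa\, Q(\kappa)\, P_{2h}(\geq |\Delta p|; \kappa) \sim |\Delta p|^{-m}, \tag{11}$$

where $Q(\kappa) \equiv -dQ(\geq \kappa)/d\kappa$ is the probability density of $\kappa$. Our result is therefore consistent with the previously reported power law [24][25][26][27], which arises here as a nonstationary property of $\kappa$.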
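The superposition argument can be illustrated numerically. The sketch below assumes a pure Pareto law for the decay length, $Q(\geq \kappa) = (\kappa/\kappa_{\min})^{-m}$ for $\kappa \geq \kappa_{\min}$, and a pure exponential CDF $e^{-|\Delta p|/\kappa}$ within each segment; both are idealizations, and the names `weekly_ccdf` and `local_exponent` are hypothetical.

```python
import math


def weekly_ccdf(x, m=3.5, kappa_min=1.0, n=20000, u_max=40.0):
    """Mixture CCDF: integral of pdf(kappa) * exp(-x / kappa) over kappa >= kappa_min.

    The integral is evaluated by the midpoint rule in u = ln(kappa / kappa_min),
    for which pdf(kappa) d(kappa) = m * exp(-m * u) du.
    """
    h = u_max / n
    total = 0.0
    for k in range(n):
        u = (k + 0.5) * h
        kappa = kappa_min * math.exp(u)
        total += m * math.exp(-m * u) * math.exp(-x / kappa)
    return total * h


def local_exponent(x, factor=2.0, **kwargs):
    """Local log-log slope of the CCDF between x and factor * x, with sign flipped."""
    ratio = weekly_ccdf(factor * x, **kwargs) / weekly_ccdf(x, **kwargs)
    return -math.log(ratio) / math.log(factor)


if __name__ == "__main__":
    for x in (10.0, 100.0, 1000.0):
        print(x, local_exponent(x))
```

For large $|\Delta p|$ the local exponent approaches $m$, so a mixture of exponentials with power-law distributed decay lengths inherits the exponent of $Q(\geq \kappa)$.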
Since our trend-following HFT model exhibits the order-book collective motion (Fig. 4d and e), this model can reproduce the layered order-book structure [15] (see Appendix B 8). Let us define $c^-_r$ ($c^+_r$) and $a^-_r$ ($a^+_r$) as the numbers of bid (ask) submissions and cancellations, respectively, during one tick at the relative distance $r$ from the market midprice. We also define $N^-_r \equiv c^-_r - a^-_r$ ($N^+_r \equiv c^+_r - a^+_r$) as the number change at the distance $r$ for the bid (ask) side. Figure 5g shows the correlation between $N^-_r$ ($N^+_r$) and $\Delta p$: for the bid side the correlation is positive in the inner layer and negative in the outer layer, whereas for the ask side it is negative in the inner layer and positive in the outer layer. We further show a linear correlation between the price movement $\Delta p$ and the total number change in the inner layer, $N_{\rm inner} \equiv \int_{-\infty}^{\gamma_c} dr\,(N^-_r - N^+_r)$. The trend-following HFT model is thus qualitatively consistent with the previous findings [15] (see Appendix B 9 for data analyses), implying that the layered structure is a direct consequence of the collective motion.
Discussion.-We have empirically studied the trend following of HFTs, which induces the collective motion of the order book. This property has not been captured in previous order-book models [16][17][18][19][20][21] and was critical in reproducing our empirical findings. Indeed, none of our empirical findings (the order-book profile, the exponential price movement, and the layered order-book structure [15]) were reproduced by the previous order-book models under realistic parameters in the absence of the collective motion (see Appendix B 10). We expect that introducing this collective motion into order-book models would be the key to replicating these empirical findings.
Conclusion.-We have established both a microscopic model and a kinetic theory for FX traders by direct observation of the HFTs' dynamics, quantitatively agreeing with empirical results under minimal assumptions. In the stream of econophysics, our model (2) is the first microscopic model directly supported by microscopic dynamical evidence and exhibiting agreement with mesoscopic and macroscopic findings. We expect that a new stream will arise toward systematic descriptions of the financial market based on microscopic evidence. Interested readers are referred to Ref. [38] for more mathematical details.

Appendix A

We analyzed high-frequency trading data in Electronic Broking Services (EBS), one of the biggest financial markets in the world. This market is continuously open except for weekends and operates under few regulations. All trader activities were recorded for our data set with anonymized trader IDs and with one-millisecond time precision from the 5th 18:00 to the 10th 22:00 GMT June 2016. The minimum price precision was 0.005 yen for the USD/JPY pair at that time, and the currency unit used in this study is 0.001 yen, called the tenth pip (tpip). The minimum volume unit for transactions was one million USD, and the total monetary flow was about 68 billion USD during this week. The EBS market is a hybrid market combining both quote-driven and order-driven systems, where traders have three options: limit order, market order, and cancellation. A limit order is an order quoting a price with a certain volume; the quoted price is displayed on the order book. A market order is an order to buy or sell currencies immediately at the best available price.
Here we define the terminology used in this paper. The highest bid and lowest ask quoted prices are called the market best bid and ask prices (denoted by $b_M$ and $a_M$), respectively (see Fig. 6a). The average of the market best bid and ask prices is called the market midprice (denoted by $z_M$). Also, the market transacted price $p$ (or the market price for short) means the price at which a transaction occurs in the market.
We note a central trading rule regarding the mutual credit lines between traders [36]. All market participants are required to set credit lines to counterparties in advance, and they cannot transact with each other in the absence of mutual credit. Therefore, traders sometimes transact at the worse price than the best market price.

Definition of the high frequency traders
For this paper, a high frequency trader (HFT) is defined as a trader who submits more than 500 times a day on average (i.e., more than 2500 times for the week). This definition is similar to that introduced in Ref. [39]. As a few traders are unwilling to transact and often interrupt orders at the instant of submission, we excluded traders with live orders of less than 0.5% of the transaction time. With this definition, the number of HFTs was 134 during this week, whereas the total number of traders was 1015. We note that the total number of traders who submitted limit orders was 922; the other 93 traders submitted only market orders. We also note that the presence of HFTs has rapidly grown recently and 87.8% of the total orders were submitted by the HFTs in our data set.
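The selection rule above can be written as a small predicate. This is a sketch of the stated criteria only; the function name `is_hft`, its arguments, and the way the live-order fraction is measured are assumptions, since the text does not specify an implementation.

```python
def is_hft(n_submissions, n_days, live_time_fraction,
           min_daily_submissions=500, min_live_fraction=0.005):
    """Return True if a trader satisfies the HFT definition used in the text.

    n_submissions:      total number of order submissions in the period
    n_days:             number of trading days in the period (5 for the week)
    live_time_fraction: fraction of time during which the trader had live orders
    """
    daily_average = n_submissions / n_days
    return (daily_average > min_daily_submissions
            and live_time_fraction >= min_live_fraction)


if __name__ == "__main__":
    print(is_hft(12000, 5, 0.60))   # frequent submitter with persistent quotes
    print(is_hft(12000, 5, 0.001))  # frequent submitter whose orders rarely stay live
```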
Here we note a regulation on cancellations in this market, which is related to motivating HFTs to play the role of key liquidity providers (KLPs) [36]. For market stability, all traders are required not to cancel orders frequently; there is a threshold on the ratio between the number of dealt quotes and the total number of quotations, called the quote fill ratio (QFR). If the QFR of a trader is lower than the threshold, penalties are imposed on the trader in this market. However, there is a special rule to lower the threshold. If a trader maintains two-sided quotes continuously for a fixed time interval (called key liquidity hours), the trader qualifies as a KLP and is subject to a lower threshold QFR. Because HFTs tend to cancel orders frequently, they are typically KLPs, as illustrated in Fig. 2a.

We also note the typical number of HFTs in snapshots of the order book. We took snapshots of the order book after every transaction and counted the total number of different trader identifiers (IDs) for both bid and ask sides. The counting weight for an HFT quoting both sides is 1 and that for an HFT quoting one side is 1/2. We then plotted the two-hourly average of the number of trader IDs for both bid and ask sides in Fig. 6b, showing the periodic intraday activity pattern of HFTs (i.e., $N$ tends to be small during 20:00-22:00 GMT). The typical number of HFTs was about 35 in our data set with this definition. The total volume quoted by HFTs is typically about 80. Admittedly, there is room for debate on which number is appropriate for the calibration of the total number of traders in our model; it remains a topic for future study.

Percentage of two-sided quotes
We calculated the percentage of two-sided quotes as follows: when a bid (ask) order is submitted by a trader, we check whether a corresponding ask (bid) order of the same trader exists. We then count the number of two-sided quotes for all traders at the submission of every order and finally divide it by the total number of submissions.
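The counting procedure can be sketched as follows. The event format, a tuple `(trader_id, action, side, order_id)` with actions `"submit"`, `"cancel"`, and `"fill"`, is a hypothetical schema introduced only for illustration; the actual data layout is not described in the text.

```python
from collections import defaultdict


def two_sided_percentage(events):
    """Percentage of submissions made while the same trader has a live opposite quote.

    events: iterable of (trader_id, action, side, order_id) in time order, where
    action is "submit", "cancel", or "fill" and side is "bid" or "ask".
    """
    live = defaultdict(lambda: {"bid": set(), "ask": set()})
    two_sided = 0
    total = 0
    for trader, action, side, order_id in events:
        book = live[trader]
        if action == "submit":
            opposite = "ask" if side == "bid" else "bid"
            total += 1
            if book[opposite]:          # an opposite-side order is currently live
                two_sided += 1
            book[side].add(order_id)
        else:                           # a cancel or fill removes the order
            book[side].discard(order_id)
    return 100.0 * two_sided / total if total else 0.0


if __name__ == "__main__":
    sample = [
        ("A", "submit", "bid", 1),
        ("A", "submit", "ask", 2),
        ("A", "cancel", "bid", 1),
        ("A", "submit", "bid", 3),
        ("B", "submit", "ask", 4),
    ]
    print(two_sided_percentage(sample))
```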

Cancellation ratio for individual traders
For each trader, we calculated the cancellation ratio as the total canceled volume divided by the total submitted volume. The cancellation ratios for the first, second, and third top HFTs were 98.59%, 99.93%, and 98.70%, respectively (or equivalently, their QFRs were 1.41%, 0.07%, and 1.30%, respectively). The total cancellation ratio among all the HFTs was 94.42% (or equivalently, the total QFR was 5.58%).
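The relation between the two quantities, as used in the numbers above, is simply that the QFR is the complement of the cancellation ratio. A minimal sketch, with hypothetical function names and volumes given in arbitrary units:

```python
def cancellation_ratio(canceled_volume, submitted_volume):
    """Cancellation ratio in percent: canceled volume over submitted volume."""
    return 100.0 * canceled_volume / submitted_volume


def quote_fill_ratio(canceled_volume, submitted_volume):
    """Quote fill ratio in percent, taken as the complement of the cancellation ratio."""
    return 100.0 - cancellation_ratio(canceled_volume, submitted_volume)


if __name__ == "__main__":
    # Illustrative volumes chosen to give the ratio quoted for the top HFT.
    print(cancellation_ratio(9859, 10000), quote_fill_ratio(9859, 10000))
```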

Buy-sell spread
The difference between the best bid and ask prices was studied as the buy-sell spread for an HFT. Only samples in which both bid and ask prices exist are taken, at one-second time intervals, for the insets in Fig. 2a-c and for Fig. 5a. We plotted standard deviations of the averages as error bars for each point.

Trend-following effect
We explain the precise definition of the bid (ask) price of individual HFTs for the analysis of trend following. If a trader quotes both single-bid and single-ask orders at any time, the bid and ask prices are defined literally. In the presence of multiple bid or ask orders, we use the best value for the bid or ask orders as b i or a i . In the absence of any bid or ask or both orders, we use the most recent bid or ask price as b i or a i for interpolation.
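The rules above (best value among multiple orders, most recent value when a side is empty) can be sketched as a small helper. The function names and the representation of live orders as lists of prices are assumptions made for illustration.

```python
def quoted_prices(live_bids, live_asks, last_bid=None, last_ask=None):
    """Return (b_i, a_i) for one trader at one instant.

    live_bids / live_asks: prices of the trader's currently live orders.
    last_bid / last_ask:   most recent values, used when a side is empty.
    """
    bid = max(live_bids) if live_bids else last_bid   # best bid is the highest
    ask = min(live_asks) if live_asks else last_ask   # best ask is the lowest
    return bid, ask


def midprice(bid, ask):
    """Quoted midprice z_i = (b_i + a_i) / 2."""
    return (bid + ask) / 2.0


if __name__ == "__main__":
    b, a = quoted_prices([100.0, 101.0], [103.0, 102.0])
    print(b, a, midprice(b, a))
    # With no live bid, the most recent bid price is carried forward.
    print(quoted_prices([], [102.0], last_bid=100.5))
```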
Because of the discrete nature of this data analysis, the probability that traders do not move at all (i.e., $\Delta z_i = 0$) is estimated to be high. We therefore excluded the samples during an inactive time interval $\Delta z_i = 0$ from the calculation of representative values in the following. This exception handling does not have a large impact on the hyperbolic structure in Eq. (1). Exceptional samples for which the bid or ask price is more than 0.1 yen away from the market price (0.02% of the total) are also excluded from the calculation of the conditional ensemble average $\langle \cdots \rangle_{\Delta p}$. In Fig. 3a, only data points with more than 100 samples in each bin are plotted. The standard deviations of the conditional averages are plotted for each point as error bars. Also, median values among the top 20 HFTs are given as $c_i \sim 6.0$ tpip/tick and $\Delta p_i^* \sim 7.5$ tpip, which are estimated by the least squares method implemented in gnuplot.
We have also calculated the standard deviation of quoted price movements for individual traders at one-tick precision in Fig. 3b. For the $i$th top HFT, we calculated the conditional variance $V_{\Delta p}[\Delta z_i] \equiv \langle (\Delta z_i - \langle \Delta z_i \rangle_{\Delta p})^2 \rangle_{\Delta p}$ and took its square root. As can be seen from Fig. 3b, the standard deviation is approximately independent of $\Delta p$ for the top 20 HFTs. We note that the median value was $\sigma_i \sim 14.5$ tpip/tick. This observation is consistent with the assumption in our microscopic model that only the drift term depends on $\Delta p$, whereas the random noise does not.
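The conditional average and standard deviation described in this appendix can be estimated by binning. The following sketch assumes the samples are given as two parallel lists (previous price movement and quote movement); the function name `conditional_stats` and the bin width are hypothetical, while the exclusion of $\Delta z_i = 0$ and the 100-sample threshold follow the text.

```python
import math
from collections import defaultdict


def conditional_stats(dp_prev, dz, bin_width=1.0, min_samples=100):
    """Conditional mean and standard deviation of dz, binned by the previous dp.

    Samples with dz == 0 (inactive intervals) are excluded, and only bins with
    more than min_samples samples are returned.
    Returns {bin_center: (mean, std)} sorted by bin center.
    """
    groups = defaultdict(list)
    for p, z in zip(dp_prev, dz):
        if z == 0:
            continue
        groups[round(p / bin_width)].append(z)
    stats = {}
    for k in sorted(groups):
        values = groups[k]
        if len(values) <= min_samples:
            continue
        mean = sum(values) / len(values)
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        stats[k * bin_width] = (mean, math.sqrt(variance))
    return stats


if __name__ == "__main__":
    dp_prev = [1.0] * 200 + [2.0] * 50
    dz = [1.0, 3.0] * 100 + [5.0] * 50
    print(conditional_stats(dp_prev, dz))
```

The resulting bin means are the quantities to which a saturating curve such as Eq. (1) would be fitted, and the bin standard deviations correspond to the points in Fig. 3b.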

Average order-book profile
The daily average order-book profile is calculated for the best prices of the HFTs. We took snapshots of the order book for the best prices of the HFTs every second and we calculated its ensemble average every day. We also plotted standard deviations of the averages as error bars for each point.

Price movement distributions and decay length
The two-hourly segmented complementary cumulative distribution functions (CDFs) for the price movement $\Delta p$ are calculated at one-tick precision in Figs. 5c, d, where $\Delta p(T) \equiv p(T+1) - p(T)$ with market price $p(T)$ at tick time $T$. The decay length $\kappa$ and its error were estimated by the least squares method implemented in gnuplot (Figs. 5d, f), and the two-hourly scaled CDFs were plotted in Fig. 5d with the maximum samples excluded as outliers. The time series of the estimated decay length $\kappa$ is plotted in Fig. 6b, showing that $\kappa$ was the longest just after the opening of the EBS market (the 5th 18:00-20:00 GMT). We conjecture that the decay length $\kappa$ is related to the market activity, as represented, for example, by the number of HFTs during the corresponding period. Indeed, the number of HFTs was also the smallest during the 5th 18:00-20:00 GMT in the week.
Appendix B: Theoretical Analysis

Model dynamics
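We have explained the model dynamics as trend-following random walks (2) with jump rules (3) and (4). These dynamics can be represented within the framework of Markovian stochastic processes using δ-functions. The stochastic dynamics can be written as

$$\frac{dz_i}{dt} = c \tanh\frac{\Delta p}{\Delta p^*} + \sigma \eta_i^R + \eta_i^T, \qquad \eta_i^T \equiv \sum_{j \neq i} \sum_{k=1}^{\infty} \Delta z_{ij}\, \delta(t - \tau_{k;ij}), \tag{B1}$$

where we have used the Itô convention. Here, $\tau_{k;ij}$ is the $k$-th collision time; the jump size $\Delta z_{ij}$ between traders $i$ and $j$, the post-collisional price $p^{\rm post}$, and the price movement $\Delta p^{\rm post}$ are defined by

$$|z_i(\tau_{k;ij}) - z_j(\tau_{k;ij})| = \frac{L_i + L_j}{2}, \quad \Delta z_{ij} \equiv -\frac{L_i}{2}\,{\rm sgn}(z_i - z_j), \quad p^{\rm post} \equiv z_i - \frac{L_i}{2}\,{\rm sgn}(z_i - z_j), \quad \Delta p^{\rm post} \equiv p^{\rm post} - p^{\rm pre}, \tag{B2}$$

with the pre-collisional price $p^{\rm pre}$ and the signature function ${\rm sgn}(x)$ defined by ${\rm sgn}(x) = x/|x|$ for $x \neq 0$ and ${\rm sgn}(0) = 0$. Remarkably, the jump rule Eq. (B2) corresponds to the contact condition and momentum exchange in conventional kinetic theory. In the following, we present effective descriptions of this model for the mesoscopic and macroscopic hierarchies.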

Note on a Poisson price modification process
Since Gaussian noise can be obtained by taking the high-frequency small-jump limit of Poisson noise [40], the model (B1) can be reformulated as a Poisson price modification process with a high cancellation rate. Here, let us focus on the quoted price dynamics of HFTs in the absence of transactions. As shown in Fig. 2a-c, HFTs tend to frequently and continuously modify their prices by successive order cancellation and submission, possibly due to the market rule (i.e., they are required to maintain continuous two-sided quotes for a fixed time interval [36]). On the basis of these characteristics, we can consider a Poisson cancellation model corresponding to the model (B1). Let us introduce the order cancellation rate $\lambda$, which gives the cancellation probability during $[t, t+dt]$ as $\lambda\,dt$. The mean cancellation interval is characterized by $\Delta t_{\rm can} \equiv 1/\lambda$. After cancellation, we assume that HFTs instantaneously requote their prices to maintain continuous limit orders. In the absence of transactions, the requoted price is assumed to be described by a discrete version of Eq. (B1), Eq. (B3), with a standard Gaussian random number $\eta_i^R$ according to our empirical finding (1). The transaction rule is also assumed to be the same as in the continuous model (B1). Here, the infinitesimal time step $dt$ is different from the mean cancellation interval $\Delta t_{\rm can}$. A schematic trajectory described by this Poisson dynamics is illustrated in Fig. 7. The continuous model (B1) is obtained in the high-frequency cancellation limit $\lambda \to \infty$ of the discrete model (B3). The nature of HFTs, namely their high-frequency price modifications, is thus reflected in the continuous model (B1).
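The high-frequency limit can be illustrated by comparing the drift and diffusion rates of a compound Poisson process with those of the continuous model. The sketch below assumes a specific requotation jump, $c\,\Delta t_{\rm can}\tanh(\Delta p/\Delta p^*) + \sigma\sqrt{\Delta t_{\rm can}}\,\eta$ with standard Gaussian $\eta$, occurring at rate $\lambda = 1/\Delta t_{\rm can}$; this form is an assumption made here for illustration and is not taken from Eq. (B3). The function name `poisson_model_rates` is hypothetical.

```python
import math


def poisson_model_rates(lam, c, dp, dp_star, sigma):
    """Drift and diffusion rates of a compound Poisson requotation process.

    Assumed jump at each cancellation (rate lam, interval dt_can = 1 / lam):
        c * dt_can * tanh(dp / dp_star) + sigma * sqrt(dt_can) * eta,
    with eta a standard Gaussian number. For a compound Poisson process the mean
    per unit time is lam * E[jump] and the variance per unit time is lam * E[jump^2].
    """
    dt_can = 1.0 / lam
    jump_mean = c * dt_can * math.tanh(dp / dp_star)
    jump_var = sigma ** 2 * dt_can
    drift = lam * jump_mean
    diffusion = lam * (jump_var + jump_mean ** 2)
    return drift, diffusion


if __name__ == "__main__":
    for lam in (1.0, 1e2, 1e4, 1e6):
        print(lam, poisson_model_rates(lam, c=6.0, dp=7.5, dp_star=7.5, sigma=14.5))
```

Under this assumption the drift equals $c\tanh(\Delta p/\Delta p^*)$ for any $\lambda$, while the diffusion rate approaches $\sigma^2$ as $\lambda \to \infty$, which is the sense in which the continuous model is recovered.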

Introduction of the center of mass and the corresponding relative price
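We here introduce the center of mass (c.m.) and the corresponding relative price (see Fig. 8 for a schematic):

$$z_{\rm c.m.} \equiv \frac{1}{N}\sum_{i=1}^{N} z_i, \qquad r_i \equiv z_i - z_{\rm c.m.}. \tag{B4}$$

The dynamics of the c.m. and the relative price are given by

$$\frac{dz_{\rm c.m.}}{dt} = c \tanh\frac{\Delta p}{\Delta p^*} + \xi, \qquad \frac{dr_i}{dt} = \sigma \eta_i^R + \eta_i^T - \xi, \qquad \xi \equiv \frac{1}{N}\sum_{j=1}^{N} \left( \sigma \eta_j^R + \eta_j^T \right). \tag{B5}$$

Remarkably, trend following appears only in the dynamics of the c.m. and does not appear in that of the relative price. This is natural because trend following induces a collective behavior of traders and can be absorbed into the dynamics of the c.m. Furthermore, the contribution of $\xi$ is much smaller than that of $\sigma \eta_i^R$ and $\eta_i^T$ for $N \to \infty$. In the moving frame of the c.m., the dynamics of the relative price $r_i$ is thus simplified and approximately obeys the following dynamical equation:

$$\frac{dr_i}{dt} \approx \sigma \eta_i^R + \eta_i^T. \tag{B7}$$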

BBGKY Hierarchical equation for two-body problem: N = 2
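Before deriving the Bogoliubov-Born-Green-Kirkwood-Yvon (BBGKY) hierarchical equation for $N \gg 1$, we first consider the two-body system of traders to specify the collision integrals. The extension to the many-body problem is studied in the next subsection. Let us denote the relative midprices of the first and second traders by $r_1$ and $r_2$ with constant spreads $L_1$ and $L_2$. The dynamics is given by

$$\frac{dr_i}{dt} = \sigma \eta^R_{i;\varepsilon} + \sum_{k} \Delta r_i\, \delta(t - \tau_k) \qquad (i = 1, 2)$$

with jump sizes $\Delta r_1$, $\Delta r_2$ and $k$-th transaction time $\tau_k$. Here, $\eta^R_{i;\varepsilon}$ is the colored Gaussian noise satisfying $\langle \eta^R_{i;\varepsilon}(t)\, \eta^R_{j;\varepsilon}(s) \rangle = \delta_{ij}\, e^{-|t-s|/\varepsilon}/(2\varepsilon)$ for $i, j = 1, 2$. Later, we shall take the $\varepsilon \to 0$ limit, whereby the colored Gaussian noise $\eta^R_{i;\varepsilon}$ converges to white Gaussian noise as $\lim_{\varepsilon \to 0} \langle \eta^R_{i;\varepsilon}(t)\, \eta^R_{j;\varepsilon}(s) \rangle = \delta_{ij}\, \delta(t - s)$. The $k$-th transaction time $\tau_k$ and the jump sizes $\Delta r_1$, $\Delta r_2$ are determined using the collision rule,

$$|r_1(\tau_k) - r_2(\tau_k)| = \frac{L_1 + L_2}{2}, \qquad \Delta r_1 = -\frac{L_1}{2}\,{\rm sgn}(r_1 - r_2), \qquad \Delta r_2 = +\frac{L_2}{2}\,{\rm sgn}(r_1 - r_2).$$

We first derive the master equation for this system. For the two-body probability distribution function (PDF) $P_{12}(r_1, r_2)$, we exactly obtain the time-evolution equation (B9), in which $|\tilde\partial_{12}|\, g(r_1, r_2) \equiv |\partial g(r_1, r_2)/\partial r_1| + |\partial g(r_1, r_2)/\partial r_2|$ is the sum of the absolute values of the partial derivatives for arbitrary $g(r_1, r_2)$.

This equation can be derived as follows. For an arbitrary function $f(r_1, r_2)$, we obtain an identity for its time evolution by using the expansion of the δ-function, $\delta(g(t)) = \sum_{k=0}^{\infty} \delta(t - \tau_k)/|g'(\tau_k)|$, with the $k$-th zero point $\tau_k$ such that $g(\tau_k) = 0$ and $\tau_k < \tau_{k+1}$. Here we consider the direction of the collision; that is, $\eta_{1;\varepsilon} - \eta_{2;\varepsilon}$ must be positive just before the collision $r_1 - r_2 = (L_1 + L_2)/2$. Conversely, $\eta_{1;\varepsilon} - \eta_{2;\varepsilon}$ must be negative just before the collision $r_1 - r_2 = -(L_1 + L_2)/2$. We then take the ensemble average of both sides of the resulting identity. Here the two-body PDF $P_{12}(x_1, x_2)$ characterizes the probability of finding $(r_1, r_2)$ around $(x_1, x_2)$. We thereby obtain the master equation (B13), where the abbreviation symbol involving the derivatives is defined as $\tilde\partial_{12} \equiv \partial/\partial x_1 - \partial/\partial x_2$, using Novikov's theorem [41] for an arbitrary function $g(r_1, r_2)$. We note that $\tilde\partial_{12}$ is a slightly different symbol from $|\tilde\partial_{12}|$ in terms of signs (see Eq. (B15) for their relation).

We comment on the sign of the derivatives. Considering that $P_{12}(x_1, x_2) \geq 0$ for all $x_1, x_2$ and $P_{12}(x_1, x_2) = 0$ for $|x_1 - x_2| \geq (L_1 + L_2)/2$, we obtain $\partial P_{12}/\partial x_1 \leq 0$ and $\partial P_{12}/\partial x_2 \geq 0$ on the boundary $x_1 - x_2 = +(L_1 + L_2)/2$. We also obtain $\partial P_{12}/\partial x_1 \geq 0$ and $\partial P_{12}/\partial x_2 \leq 0$ on the boundary $x_1 - x_2 = -(L_1 + L_2)/2$. In summary, we have

$$-s\, \tilde\partial_{12} P_{12}(x_1, x_2)\Big|_{x_1 - x_2 = s(L_1 + L_2)/2} = |\tilde\partial_{12}|\, P_{12}(x_1, x_2)\Big|_{x_1 - x_2 = s(L_1 + L_2)/2} \qquad (s = \pm 1). \tag{B15}$$

By a change of notation $x_1 \to r_1$ and $x_2 \to r_2$, we obtain Eq. (B9). By integrating over $r_2$ on both sides, we obtain a hierarchical equation for the one-body PDF $P_1(r_1) \equiv \int dr_2\, P_{12}(r_1, r_2)$ as

$$\frac{\partial P_1}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2 P_1}{\partial r_1^2} + \sum_{s = \pm 1} \left[ J^s_{12}(r_1 + sL_1/2) - J^s_{12}(r_1) \right], \tag{B16}$$

where $J^s_{12}(r_1) \equiv (\sigma^2/2)\, |\tilde\partial_{12}|\, P_{12}(r_1, r_2)\big|_{r_1 - r_2 = s(L_1 + L_2)/2}$ is the transaction probability per unit time as bidder ($s = +1$) or asker ($s = -1$). The first and second terms on the right-hand side account for the self-diffusion and collision terms, respectively. This is the lowest-order BBGKY hierarchical equation for the special case of $N = 2$. Remarkably, the collision term has a mathematical structure quite similar to that of the collision integral in the conventional Boltzmann equation.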

BBGKY hierarchical equation for many-body problem: N ≫ 1
We have derived the hierarchical equation for the one-body PDF for the special case $N = 2$. Here we extend the hierarchical equation to the many-body problem with $N \gg 1$. We first assume that the number of traders $N$ is sufficiently large that the spread distribution $\rho(L)$ can be approximated as a continuous function. The one-body and two-body PDFs conditional on buy-sell spreads $L$ and $L'$ are denoted by $\phi_L(r)$ and $\phi_{LL'}(r, r')$, respectively. We note that the relations $P_i(r_i) = \phi_{L_i}(r_i)$ and $P_{ij}(r_i, r_j) = \phi_{L_i L_j}(r_i, r_j)$ hold for the one-body and two-body PDFs $P_i(r_i)$ and $P_{ij}(r_i, r_j)$ of traders $i$ and $j$, considering the symmetry between traders. Within the spirit of the Boltzmann equation, the dynamical equation for the one-body distribution $\phi_L(r)$ can be decomposed into two parts:

$$\frac{\partial \phi_L}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2 \phi_L}{\partial r^2} + C(\phi_{LL'}) \tag{B17}$$

with the self-diffusion term $(\sigma^2/2)(\partial^2 \phi_L/\partial r^2)$ and the collision integral $C(\phi_{LL'})$. By extending the collision term in Eq. (B16) to large $N \gg 1$, we can specify the collision integral as

$$C(\phi_{LL'}) = N \sum_{s = \pm 1} \int dL'\, \rho(L') \left[ J^s_{LL'}(r + sL/2) - J^s_{LL'}(r) \right], \tag{B18}$$

with the collision probability per unit time $J^s_{LL'}(r) \equiv (\sigma^2/2)\, |\tilde\partial_{rr'}|\, \phi_{LL'}(r, r')\big|_{r - r' = s(L + L')/2}$ as bidder ($s = +1$) or asker ($s = -1$) against a trader with spread $L'$. This is the Boltzmann-like equation, Eq. (6). We note that this BBGKY hierarchical equation can be systematically derived via the pseudo-Liouville equation. The derivation will be given in another technical paper in preparation [38].

Boltzmann-like equation for finance
We next derive a closed equation for the one-body distribution function $\phi_L$ by assuming a mean-field approximation. Let us truncate the two-body correlation (i.e., molecular chaos in kinetic theory),

$$\phi_{LL'}(r, r') \approx \phi_L(r)\, \phi_{L'}(r'). \tag{B19}$$

A closed mean-field equation for the one-body distribution $\phi_L$ is thereby obtained,

$$\frac{\partial \phi_L}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2 \phi_L}{\partial r^2} + N \sum_{s = \pm 1} \int dL'\, \rho(L') \left[ \tilde J^s_{LL'}(r + sL/2) - \tilde J^s_{LL'}(r) \right], \tag{B20}$$

with the mean-field collision probability per unit time as bidder ($s = +1$) or asker ($s = -1$)

$$\tilde J^s_{LL'}(r) \equiv \frac{\sigma^2}{2}\, |\tilde\partial_{rr'}|\, \{\phi_L(r)\, \phi_{L'}(r')\}\Big|_{r - r' = s(L + L')/2}. \tag{B21}$$

Equation (B20) is a closed equation for the one-body distribution function and corresponds to the Boltzmann equation in molecular kinetic theory. Equation (B20) can be analytically solved for $N \to \infty$, and the steady solution $\psi_L(r)$ is given by the tent function,

$$\psi_L(r) = \frac{4}{L^2} \max\left\{ \frac{L}{2} - |r|,\, 0 \right\}. \tag{B22}$$

Here, a technicality on the appropriate boundary condition will be summarized in another technical paper in preparation [38]. Note that the tent function (B22) for the traders' midprice order book implies tent functions for both the bid and ask order books in shifted coordinates (see Fig. 9 for a schematic). The average order-book profile for the ask side $f_A(r)$ is then given by convolution with the tent function,

$$f_A(r) = \int_0^\infty dL\, \rho(L)\, \psi_L(r - L/2). \tag{B23}$$

We discuss here the intuitive meaning of the mean-field solution (B22). The mean-field solution (B22) is exactly zero at $r = \pm L/2$, as $\psi_L(+L/2) = \psi_L(-L/2) = 0$, implying that the edge points $r = \pm L/2$ effectively play the role of hopping barriers at which the particle hops to $r = 0$. Indeed, Eq. (B22) is exactly the solution to the problem of Brownian motion confined by hopping barriers, as shown in Sec. C 2. This is a reasonable result for the $N \to \infty$ limit, where the market is sufficiently liquid and most of the transactions occur just around $r = \pm L/2$. If the spreads are distributed in accordance with the γ-distribution $\rho(L) = L^3 e^{-L/L^*}/(6L^{*4})$, as empirically studied in the main text, the average order-book profile is given by

$$f_A(r) = \frac{4}{3L^*}\, e^{-3r/(2L^*)} \left[ \left( 2 + \frac{r}{L^*} \right) \sinh\frac{r}{2L^*} - \frac{r}{2L^*}\, e^{-r/(2L^*)} \right] \qquad (r > 0). \tag{B24}$$

To check the validity of this formula, we performed Monte Carlo simulations of the microscopic model (B1) (Fig. 10a), where the theoretical formula (B24) works for various $N$.
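The hopping-barrier picture can be tested with a single-particle Monte Carlo sketch: a Brownian particle diffuses with variance $\sigma^2 t$, and whenever it reaches $r = \pm L/2$ it is returned to $r = 0$. If the stationary density is the tent function $(4/L^2)\max\{L/2 - |r|, 0\}$, the particle should spend three quarters of its time in $|r| < L/4$, compared with one half for a uniform density. The function names and the parameter values below are hypothetical, and the time discretization introduces a small bias.

```python
import math
import random


def inner_fraction(L=2.0, sigma=1.0, dt=2.5e-3, n_steps=400000, seed=1):
    """Fraction of time spent in |r| < L/4 by a Brownian particle that is
    reset to r = 0 whenever it reaches the barriers at r = +L/2 or r = -L/2."""
    rng = random.Random(seed)
    step = sigma * math.sqrt(dt)
    r = 0.0
    inside = 0
    for _ in range(n_steps):
        r += step * rng.gauss(0.0, 1.0)
        if abs(r) >= L / 2.0:      # barrier reached: hop back to the center
            r = 0.0
        if abs(r) < L / 4.0:
            inside += 1
    return inside / n_steps


def tent_inner_fraction():
    """Integral of (4 / L^2) * max(L/2 - |r|, 0) over |r| < L/4; independent of L."""
    return 0.75


if __name__ == "__main__":
    print(inner_fraction(), tent_inner_fraction())
```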
In the figure, we denote the relative price by $r_{\mathrm{c.m.}}$ to stress that it is defined from the c.m. as $r_{\mathrm{c.m.}} \equiv z - z_{\mathrm{c.m.}}$. The average order-book profile as a function of $r_{\mathrm{c.m.}}$ is theoretically more tractable than $f^{\mathrm{mid}}_A(r_{\mathrm{mid}})$. Here $r_{\mathrm{mid}} \equiv a_i - z_M$ is the relative distance from the market midprice $z_M$ for the ask price $a_i$ of the $i$th trader. Fortunately, the two profiles are asymptotically equivalent in the large-$N$ limit, so the above formulation is sufficient for understanding the average order book $f^{\mathrm{mid}}_A(r_{\mathrm{mid}})$ measured from the market midprice. To validate this asymptotic equivalence, we numerically evaluated the average order-book profile $f^{\mathrm{mid}}_A(r_{\mathrm{mid}})$ from the market midprice $z_M$ in Fig. 10b. This figure shows that the average order-book formula (B24) remains valid for the order book measured from the market midprice.
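The mean-field picture of the order book can be illustrated numerically. The following minimal sketch makes three assumptions: the tent function is the normalized triangle on $[-L/2, L/2]$ (fixed by $\psi_L(\pm L/2) = 0$ and unit normalization), a trader's ask price sits at $r + L/2$ relative to the c.m., and the spreads follow a γ-distribution. The values of `shape` and `scale` are hypothetical illustrations, not fitted parameters. The sketch averages the shifted tents over sampled spreads to approximate the ask-side profile.

```python
import random

def tent(r, L):
    """Normalized tent function on [-L/2, L/2], vanishing at r = +/- L/2."""
    return (4.0 / L**2) * max(L / 2.0 - abs(r), 0.0)

def integrate(f, lo, hi, n=2000):
    """Midpoint-rule integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))

def ask_profile(r, spreads):
    """Average of tents shifted by +L/2 (ask side) over the sampled spreads."""
    return sum(tent(r - L / 2.0, L) for L in spreads) / len(spreads)

rng = random.Random(0)
shape, scale = 4.0, 1.0  # hypothetical gamma parameters (assumption)
spreads = [rng.gammavariate(shape, scale) for _ in range(2000)]

# Normalization of a single tent with L = 2.
tent_norm = integrate(lambda r: tent(r, 2.0), -1.0, 1.0)

# Normalization of the averaged ask-side profile, supported on [0, max L].
r_max = max(spreads)
profile_norm = integrate(lambda r: ask_profile(r, spreads), 0.0, r_max, n=400)
```

Both `tent_norm` and `profile_norm` should be close to one, and `ask_profile(0.0, spreads)` vanishes because every shifted tent is zero at its lower edge.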

c. Statistics of transaction interval
We comment on the statistics of the transaction interval $\tau$. In the mean-field approximation, the average transaction interval $\tau^*$ is given by Eq. (B26), which is derived phenomenologically in Sec. C 1 and validated numerically in Fig. 10c. Note that this formula can be derived more systematically from the pseudo-Liouville equation [38]. Based on the average transaction interval (B26), the CDF of the transaction interval $P(\geq \tau)$, defined from the transaction-interval PDF $P(\tau)$, is approximately given by the phenomenological formula (B27). This formula is derived in Sec. C 2 and validated numerically in Fig. 10d.

Langevin-like equation for finance
Here we derive phenomenologically the fundamental equation for financial Brownian motion, which corresponds to the conventional Langevin equation. Let us denote the $T$th transaction price by $p(T)$ and the $T$th price movement by $\Delta p(T) \equiv p(T + 1) - p(T)$. We focus on the effect of trend following obtained from Eq. (B5), which induces inertia-like collective motion in the macroscopic dynamics. The dynamical equation for the price movement is thus given by $\Delta p(T + 1) = c\tau(T)\tanh(\Delta p(T)/\Delta p^*) + \zeta(T)$, where $\tau(T)$ is the time interval between the $T$th and $(T + 1)$th transactions. The first and second terms originate from trend following and random noise, respectively. Note that the statistics of $\tau(T)$ are derived from the mesoscopic model (B20) as Eqs. (B26) and (B27).

We next study the price movement distribution using the financial Langevin equation (B28), which is, however, a stochastic difference equation that cannot be solved exactly. Nonetheless, its qualitative behavior can be assessed approximately by making the following two assumptions.
(ii) The average movement by trend following $\Delta z^* \equiv c\tau^*$ is much larger than the saturation threshold: $\Delta z^* \gg \Delta p^*$.
Under condition (i), the qualitative behavior is governed by the statistics of the transaction interval $\tau(T)$. Under condition (ii), furthermore, the hyperbolic function in Eq. (1) can be approximated for large fluctuations as $\tanh(\Delta p/\Delta p^*) \approx \mathrm{sgn}(\Delta p)$, and the term $\zeta(T)$ is irrelevant for the tail. Based on Eq. (B27) for the transaction interval $\tau$, the price movement distribution $P(\geq |\Delta p|)$ is approximately obtained for $|\Delta p| \to \infty$ as $P(\geq |\Delta p|) \sim e^{-|\Delta p|/\kappa}$, with an estimated decay length of $\kappa \approx 2\Delta z^*/3$. The validity of this formula was checked numerically in Fig. 11.
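The tail estimate can be explored with a small simulation. The sketch below is illustrative only and rests on several assumptions. It takes the difference equation in the form $\Delta p(T+1) = c\tau(T)\tanh(\Delta p(T)/\Delta p^*) + \zeta(T)$, as suggested by the description of its two terms. It draws $\tau(T)$ as the maximum of two exponential variables with mean $a = 2\tau^*/3$, following the picture of Sec. C, and it takes $\zeta(T)$ to be Gaussian. All parameter values are hypothetical. The decay length is estimated from the mean excess above a high threshold, which equals $\kappa$ for a purely exponential tail.

```python
import math
import random

def simulate_price_moves(n_steps, c=1.0, tau_star=1.0, dp_star=0.05,
                         noise_std=0.02, seed=1):
    """Iterate dp(T+1) = c*tau(T)*tanh(dp(T)/dp_star) + zeta(T) (assumed form)."""
    rng = random.Random(seed)
    a = 2.0 * tau_star / 3.0   # chosen so that E[max(tau_A, tau_B)] = tau_star
    dp = dp_star               # arbitrary initial price movement
    moves = []
    for _ in range(n_steps):
        tau = max(rng.expovariate(1.0 / a), rng.expovariate(1.0 / a))
        zeta = rng.gauss(0.0, noise_std)
        dp = c * tau * math.tanh(dp / dp_star) + zeta
        moves.append(dp)
    return moves

def mean_excess(values, threshold):
    """Mean of (|x| - threshold) over the samples with |x| > threshold."""
    tail = [abs(x) - threshold for x in values if abs(x) > threshold]
    return sum(tail) / len(tail)

moves = simulate_price_moves(200_000)
dz_star = 1.0 * 1.0                    # c * tau_star for the defaults above
kappa_theory = 2.0 * dz_star / 3.0
kappa_est = mean_excess(moves, 3.0 * kappa_theory)
```

With $\Delta z^* \gg \Delta p^*$ as in condition (ii), `kappa_est` is expected to land near `kappa_theory`, up to sampling noise and a small correction from the subleading term of the interval distribution.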
8. Numerical analysis of the layered order-book structure for the HFT model
Here we present the detailed analysis of the layered order-book structure for the HFT model, following the method in Ref. [15]. At the instant of an order submission and cancellation for the bid (ask) side, with a bin width of 1 tpip, we measured the relative depth $r$ from the market midprice $z_M$, defined as $r \equiv z_M - b_i$ ($r \equiv a_i - z_M$), and we incremented the numbers $c^-_r$ ($c^+_r$) and $a^-_r$ ($a^+_r$) by one, respectively. We then accumulated these numbers between the $T$th and $(T + 1)$th ticks to obtain one sample of $N^-_r(T)$ ($N^+_r(T)$). We also studied the movement of the market price $\Delta p(T) \equiv p(T + 1) - p(T)$ and calculated Pearson's correlation coefficient $C^-_r$ ($C^+_r$) between $\Delta p(T)$ and $N^-_r(T)$ ($N^+_r(T)$), as shown in Fig. 5g. The crossover point was estimated to be $\gamma_c \approx 16.5$ tpip in the numerical simulation. We next studied the linear correlation between the number change $N_{\mathrm{inner}}$ in the inner layer and the price movement $\Delta p$. Between the $T$th and $(T + 1)$th ticks, we took a sample of both $N^-_r(T)$ and $N^+_r(T)$ and calculated their integral over the inner layer, $N_{\mathrm{inner}}(T)$.

We show the empirical layered structure of the HFTs' order book in our data set. Following essentially the same method as in Sec. B 8, we calculated the layered structure as shown in Fig. 12a and b. The volume change in the inner layer $N_{\mathrm{inner}}(T)$ has a significant correlation with the price movement $\Delta p(T)$, with a Pearson's coefficient of 0.616.
For consistency throughout this Letter, we have focused on the best prices of HFTs for the correlation analysis in Fig. 12a and b. In other words, we incremented $c^-_r$ and $c^+_r$ when the newly quoted price was the best price of the trader. Likewise, we incremented $a^-_r$ and $a^+_r$ when the price of the canceled order was the best price of the trader.
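The correlation measurement can be sketched in a few lines. The example below is a simplified illustration, not the procedure used for the figures. It treats a single side of the book and assumes a hypothetical event format of `(kind, depth)` pairs per tick. It also assumes that the inner-layer number change is the count of submissions minus cancellations at depths below $\gamma_c$. The toy data are made up.

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def inner_layer_change(events, gamma_c):
    """Submissions minus cancellations at depth < gamma_c within one tick.

    `events` is a list of (kind, depth) pairs, with kind "submit" or
    "cancel". This event format is hypothetical.
    """
    change = 0
    for kind, depth in events:
        if depth < gamma_c:
            change += 1 if kind == "submit" else -1
    return change

# Toy data for three ticks (made up for illustration only).
ticks = [
    [("submit", 3.0), ("submit", 10.0), ("cancel", 40.0)],
    [("cancel", 5.0), ("submit", 30.0)],
    [("submit", 2.0), ("cancel", 8.0), ("submit", 12.0)],
]
price_moves = [4.0, -2.0, 1.0]

n_inner = [inner_layer_change(ev, gamma_c=16.5) for ev in ticks]
coefficient = pearson(n_inner, price_moves)
```

Events deeper than `gamma_c` are ignored here, so only the inner layer contributes to `n_inner`; a depth-resolved version would instead keep one counter per bin of width 1 tpip.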

Numerical comparison with the zero-intelligence order-book models
Here we compare our empirical findings with the zero-intelligence order-book (ZI-OB) model [18][19][20]. The basic ZI-OB model is the uniform decomposition order-book model introduced in Refs. [18,19], where both submission and cancellation are assumed to obey homogeneous Poisson processes. To reproduce real average order-book profiles, Ref. [20] replaced the uniform submission rate with a realistic nonuniform submission rate obeying a power law. Here we study the improved ZI-OB model of Ref. [20] from the viewpoint of its consistency with our empirical findings. The inputs to the ZI-OB model are the following three components.
1. Submission rate density $\mu(r_{\mathrm{mid}})$: limit order submissions are assumed to obey an inhomogeneous Poisson process characterized by the submission rate $\mu(r_{\mathrm{mid}})$ at relative depth $r_{\mathrm{mid}}$ from the market midprice. In other words, a new limit order is submitted in the range $[r_{\mathrm{mid}}, r_{\mathrm{mid}} + dr_{\mathrm{mid}}]$ during the time interval $[t, t + dt]$ with probability $\mu(r_{\mathrm{mid}})\,dr_{\mathrm{mid}}\,dt$.
The empirical submission histogram in our data set is depicted in Fig. 12c, showing a power-law tail with exponent 2.9. For our numerical implementation, the limit order submission rate is directly fixed from the empirical submission histogram for $r_{\mathrm{mid}} \geq 0$. The total submission rate $\mu_{\mathrm{tot}}$ characterizes the frequency of total submissions. The side of the order (i.e., buy or sell) is randomly selected with equal probability.

FIG. 13. Simulation of the ZI-OB model [18][19][20] to examine its consistency with the empirical findings in the main text. (a-d) We first studied a simulation under a realistic parameter set satisfying $(N_{\mathrm{vol}}, \mathrm{QFR}) \approx (100, 5\%)$. The price movement obeyed the Gaussian law (Fig. a), which contradicts the exponential law in our data set. The average order-book profile did not quantitatively fit the real order book during this week (Fig. b). The layered order-book structure was also not observed (Figs. c and d). (e-h) By adjusting the market order rate $\omega$, we attempted to fit the real order-book profile with the ZI-OB model. Although the average order-book profile was replicated by parameter adjustment (Fig. f), neither the price movement statistics nor the layered order-book structure was consistent with our data set (i.e., Fig. e shows power-law statistics for $\Delta p$, and Figs. g and h show the absence of the layered structure). We also note that the adjusted parameter implies QFR ≈ 75%, which is over ten times larger than the real QFR. In this sense, the ZI-OB model was not consistent with the empirical findings under realistic parameters.

2. Cancellation rate $\lambda$: any order is assumed to be canceled according to a homogeneous Poisson process with intensity $\lambda$. In other words, an order is canceled during the time interval $[t, t + dt]$ with probability $\lambda\,dt$.
3. Market order rate $\omega$: market orders are assumed to obey a Poisson process with intensity $\omega$. In other words, a buy or sell market order is submitted during the time interval $[t, t + dt]$ with probability $\omega\,dt$. The side of the order is randomly selected with equal probability.
These parameters characterize the order-book dynamics in the steady state. For example, the average total order-book volume $N_{\mathrm{vol}}$ on both sides and the QFR (i.e., the probability that an order is finally transacted) are given by $N_{\mathrm{vol}} \approx (\mu_{\mathrm{tot}} - \omega)/\lambda$ and $\mathrm{QFR} \approx \omega/\mu_{\mathrm{tot}}$, respectively. These relations are deduced from the conservation of order flux in the steady state: $\mu_{\mathrm{tot}} \approx \lambda N_{\mathrm{vol}} + \omega$.
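The flux balance $\mu_{\mathrm{tot}} \approx \lambda N_{\mathrm{vol}} + \omega$ can be turned into a small calculator. The sketch below assumes that the QFR is the fraction of submitted orders removed by market orders, $\omega/\mu_{\mathrm{tot}}$, and that $N_{\mathrm{vol}}$ follows by solving the balance for the volume. The function names are illustrative, and rates are measured in units of $\mu_{\mathrm{tot}}$.

```python
def steady_state_summary(mu_tot, lam, omega):
    """Return (N_vol, QFR) implied by mu_tot ~ lam * N_vol + omega."""
    n_vol = (mu_tot - omega) / lam   # cancellations absorb the remaining flux
    qfr = omega / mu_tot             # share of submissions that get executed
    return n_vol, qfr

def cancellation_rate_for(mu_tot, omega, n_vol_target):
    """Cancellation rate lam that yields a target average volume."""
    return (mu_tot - omega) / n_vol_target

# Rates quoted in the text, in units of mu_tot.
n_vol, qfr = steady_state_summary(mu_tot=1.0, lam=9.5e-3, omega=0.05)

# Cancellation rate implied by omega / mu_tot = 0.75 at N_vol = 100.
lam_adjusted = cancellation_rate_for(mu_tot=1.0, omega=0.75, n_vol_target=100.0)
```

The first call recovers the volume and fill rate quoted for the realistic parameter set; the second shows how strongly the cancellation rate must drop once the market order rate is raised at fixed volume.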

a. Numerical simulation with realistic parameters
We first consider a numerical simulation based on realistic parameters in our data set with a minimum price precision of 1 tpip. The submission rate density $\mu(r_{\mathrm{mid}})$ is directly fixed from the empirical submission histogram in our data set (Fig. 12c). The cancellation and market order rates are fixed as $\lambda/\mu_{\mathrm{tot}} = 9.5 \times 10^{-3}$ and $\omega/\mu_{\mathrm{tot}} = 0.05$ to satisfy $N_{\mathrm{vol}} \approx 100$ and QFR ≈ 5%. Although these parameters are realistic for our data set, the numerical results in Fig. 13a-d were not consistent with the empirical findings. Indeed, the price movement in the ZI-OB model obeyed Gaussian statistics (Fig. 13a), which differs from the empirical exponential law in our data set. The numerical average order-book profile did not quantitatively fit the real order-book profile (Fig. 13b). In addition, the layered structure of the order book did not emerge from the ZI-OB model (Fig. 13c and d). To replicate these empirical findings, in particular the layered order-book structure, we conjecture that the collective motion of the limit order book (i.e., the microscopic trend-following behavior) needs to be incorporated into conventional order-book models.
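As a rough consistency check on the quoted rates, the total number of resting orders can be simulated as a birth-death process. This is a deliberately reduced sketch, not the ZI-OB simulation itself. It ignores prices and sides and assumes three event types: a submission adds one order, each resting order is canceled independently, and a market order removes one order whenever the book is non-empty. Rates are in units of $\mu_{\mathrm{tot}}$.

```python
import random

def simulate_total_volume(mu_tot, lam, omega, n_events, seed=0):
    """Gillespie simulation of the total number of resting orders.

    Returns the time-averaged volume and the fraction of submitted
    orders that were removed by market orders.
    """
    rng = random.Random(seed)
    n, t, area = 0, 0.0, 0.0
    submitted = executed = 0
    for _ in range(n_events):
        total_rate = mu_tot + lam * n + omega
        dt = rng.expovariate(total_rate)   # waiting time to the next event
        area += n * dt                     # accumulate volume over time
        t += dt
        u = rng.random() * total_rate
        if u < mu_tot:                     # limit order submission
            n += 1
            submitted += 1
        elif u < mu_tot + lam * n:         # cancellation of a resting order
            n -= 1
        elif n > 0:                        # market order hits a resting order
            n -= 1
            executed += 1
    return area / t, executed / submitted

mean_volume, fill_fraction = simulate_total_volume(
    mu_tot=1.0, lam=9.5e-3, omega=0.05, n_events=400_000)
```

Under these assumptions `mean_volume` should fluctuate around $(\mu_{\mathrm{tot}} - \omega)/\lambda$ and `fill_fraction` around $\omega/\mu_{\mathrm{tot}}$, in line with the conservation of order flux.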

b. Numerical simulation with adjusted parameters
In Ref. [20], the ability of the ZI-OB model to fit realistic order-book profiles by adjusting parameters was studied. In the same way, we examine whether the model parameters can be adjusted to replicate the real order-book profile in our data set. Fixing the average order-book volume at $N_{\mathrm{vol}} \approx 100$, we adjusted the market order rate $\omega$ as a fitting parameter (see Fig. 13e-h). With $\omega/\mu_{\mathrm{tot}} = 0.75$ (or equivalently QFR ≈ 75%), the ZI-OB model replicated the real order-book profile, as shown in Fig. 13f. However, the other numerical results of the ZI-OB model were consistent with neither the exponential price movement statistics (Fig. 13e shows power-law price movement) nor the layered order-book structure (Fig. 13g and h). In addition, the parameter adjustment implied QFR ≈ 75%, which is over ten times larger than the real QFR (see Sec. A 4). We thus conclude that our empirical results were not consistently replicated by the ZI-OB model under realistic parameters, at least in our data set.
Appendix C: Technical Issues for Derivation

Brownian motion confined by hopping barriers
In this subsection, we study Brownian motion confined by hopping barriers at $r = \pm L/2$ (see Fig. 14 for a schematic). Let us assume that a particle moves randomly in the absence of collisions for $r \in (-L/2, L/2)$. We then place hopping barriers at $r = \pm L/2$ and assume that the particle moves to the origin $r = 0$ after a collision. The particle's position $r(t)$ then obeys the dynamical equation (C1), where $\eta_R$ is the white Gaussian noise with unit variance, and $\eta_T^+$ and $\eta_T^-$ are the jump terms originating from the hopping barriers at $r = +L/2$ and $r = -L/2$, respectively. Here, the $k$th collision times $\tau^+_k$ and $\tau^-_k$ at the barriers $r = \pm L/2$ satisfy the relation $r(\tau^\pm_k) = \pm L/2$. By a calculation parallel to that in Sec. B 4, the dynamical equation for the probability distribution function $P(r)$ is given by

$$\frac{\partial P(r)}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2 P(r)}{\partial r^2} + \sum_{s=\pm 1}\left[J_s(r - sL/2) - J_s(r)\right], \qquad J_s(r) = \frac{\sigma^2}{2}\,\delta(r + sL/2)\,|\partial_r P(r)|. \quad \mathrm{(C2)}$$

FIG. 14. Schematic of Brownian motion confined by the hopping barriers at $r = \pm L/2$. When the Brownian particle collides with a hopping barrier, it hops to the origin $r = 0$.
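The hopping-barrier dynamics are easy to simulate directly. The sketch below uses a simple Euler discretization with step `dt`, which is an assumption of this example and slightly delays the detection of barrier crossings. It records two quantities: the hop rate, to be compared with $4\sigma^2/L^2$ (the inverse of the mean exit time $L^2/4\sigma^2$ from the interval), and the fraction of time spent in $|r| < L/4$, which equals $3/4$ if the steady distribution is a normalized tent function on $[-L/2, L/2]$.

```python
import math
import random

def simulate_hopping(L=1.0, sigma=1.0, dt=1e-4, n_steps=1_000_000, seed=0):
    """Brownian particle that hops back to r = 0 on reaching |r| >= L/2."""
    rng = random.Random(seed)
    step = sigma * math.sqrt(dt)   # standard deviation of one Euler increment
    r, hops, inner = 0.0, 0, 0
    for _ in range(n_steps):
        r += step * rng.gauss(0.0, 1.0)
        if abs(r) >= L / 2.0:      # collision with a hopping barrier
            r = 0.0
            hops += 1
        if abs(r) < L / 4.0:       # occupation of the inner half
            inner += 1
    total_time = n_steps * dt
    return hops / total_time, inner / n_steps

hop_rate, inner_fraction = simulate_hopping()
expected_rate = 4.0 * 1.0**2 / 1.0**2   # 4 * sigma^2 / L^2 for the defaults
```

A finer `dt` reduces the discretization bias in `hop_rate` at the cost of a longer run.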
The number of collisions $n_L$ of a trader with spread $L$ during the time interval $T$ yields $n_L = T/(L^2/4\sigma^2)$ when $T$ is sufficiently large. The total number of collisions $n_{\mathrm{tot}}$ is then obtained by summing $n_L$ over all traders, where there are duplicate counts because any transaction occurs as a binary collision. Considering the duplicate counts, the mean transaction interval $\tau^*$ for the whole system is given by $\tau^* = T/(n_{\mathrm{tot}}/2)$, which implies Eq. (B26).
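The counting argument can be written out step by step. The lines below are a sketch that assumes the total is obtained by summing the per-trader collision counts over the spread distribution $\rho(L)$. The shorthand $\langle L^{-2} \rangle \equiv \int dL\, \rho(L)/L^2$ is introduced here only for compactness.

```latex
\begin{align}
  n_L &= \frac{T}{L^2/(4\sigma^2)} = \frac{4\sigma^2 T}{L^2}, \\
  n_{\mathrm{tot}} &\approx N \int dL\, \rho(L)\, n_L
                    = 4 N \sigma^2 T\, \langle L^{-2} \rangle, \\
  \tau^* &= \frac{T}{n_{\mathrm{tot}}/2}
          = \frac{1}{2 N \sigma^2 \langle L^{-2} \rangle}.
\end{align}
```

In this form the mean transaction interval scales as $1/N$, so transactions become more frequent in proportion to the number of traders.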

Transaction interval distribution
The phenomenological estimation of the cumulative distribution for the transaction interval (B27) is presented here. Let us assume that the arrival-time intervals of a bidder and an asker at the center of mass obey Poisson statistics, $P_A(\geq \tau_A) = e^{-\tau_A/a}$ and $P_B(\geq \tau_B) = e^{-\tau_B/a}$, with the characteristic time interval $a$. Here $P_A(\tau_A)$ ($P_B(\tau_B)$) and $P_A(\geq \tau_A)$ ($P_B(\geq \tau_B)$) are the PDF and CDF of the arrival-time intervals for an asker (a bidder), respectively. We also assume that a transaction occurs when both the bidder and the asker have arrived at the center of mass. This picture implies that the transaction interval $\tau$ is approximately given by $\tau \approx \max\{\tau_A, \tau_B\}$, so that $P(\geq \tau) \approx 1 - (1 - e^{-\tau/a})^2 = 2e^{-\tau/a} - e^{-2\tau/a}$, where we have used a formula for the order statistics [42]. Considering the consistency between Eq. (C5) and the mean transaction interval (B26), we obtain the self-consistent condition $\tau^* = 3a/2$. Equation (B27) then follows.
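The order-statistics step can be checked by sampling. The sketch below assumes that the transaction interval is the maximum of two independent exponential arrival intervals with the same characteristic time `a`. It compares the sample mean with $3a/2$ and the empirical tail probability with $2e^{-\tau/a} - e^{-2\tau/a}$. The value of `a` is arbitrary.

```python
import math
import random

def sample_transaction_intervals(a, n_samples, seed=0):
    """Draw max(tau_A, tau_B) for two i.i.d. exponentials with mean a."""
    rng = random.Random(seed)
    return [max(rng.expovariate(1.0 / a), rng.expovariate(1.0 / a))
            for _ in range(n_samples)]

def ccdf_theory(tau, a):
    """P(>= tau) for the maximum of two i.i.d. exponentials with mean a."""
    return 2.0 * math.exp(-tau / a) - math.exp(-2.0 * tau / a)

a = 2.0  # arbitrary characteristic time interval
intervals = sample_transaction_intervals(a, 100_000)

mean_interval = sum(intervals) / len(intervals)
ccdf_empirical = sum(1 for t in intervals if t >= a) / len(intervals)
```

Matching `mean_interval` to a measured mean interval $\tau^*$ is what fixes $a = 2\tau^*/3$ in the self-consistent condition.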