Can we infer microscopic financial information from the long memory in market-order flow?: a quantitative test of the Lillo-Mike-Farmer model

In financial markets, the market order sign exhibits strong persistence, widely known as the long-range correlation (LRC) of order flow; specifically, the sign correlation function displays long memory with power-law exponent $\gamma$, such that $C(\tau) \propto \tau^{-\gamma}$ for large time-lag $\tau$. One of the most promising microscopic hypotheses is the order-splitting behaviour at the level of individual traders. Indeed, Lillo, Mike, and Farmer (LMF) introduced in 2005 a simple microscopic model of order-splitting behaviour, which predicts that the macroscopic sign correlation is quantitatively associated with the microscopic distribution of metaorders. While this hypothesis has been a central issue of debate in econophysics, its direct quantitative validation has been missing because it requires large microscopic datasets with high resolution to observe the order-splitting behaviour of all individual traders. Here we present the first quantitative validation of this LMF prediction by analysing a large microscopic dataset in the Tokyo Stock Exchange market for more than nine years. By classifying all traders as either order-splitting traders or random traders via statistical clustering, we directly measured the metaorder-length distributions $P(L)\propto L^{-\alpha-1}$ as the microscopic parameter of the LMF model and examined the theoretical prediction on the macroscopic order correlation: $\gamma \approx \alpha - 1$. We find that the LMF prediction agrees with the actual data even at the quantitative level. Our work provides the first solid support of the microscopic model and directly solves a long-standing problem in the field of econophysics and market microstructure.

Introduction. Can a statistical-physics approach help in understanding macroscopic phenomena in financial markets from their microscopic dynamics [1,2]? In posing this challenging question, physicists have greatly benefitted from recent high-frequency data for econophysics modelling of market microstructure [3,4], even at the level of individual traders [5,6]. In this Letter, we provide the first quantitative evidence of a historic econophysics theory regarding the long-range correlation (LRC) in the market order flow [7-9].
Let us briefly review the trading rules in recent financial markets, where traders have two options. The first option is the limit order, by which traders provide market liquidity and show the potential prices at which they are willing to transact. The second option is the market order, by which traders immediately consume the liquidity and transact at the best prices (i.e., the highest bid or the lowest ask prices). This Letter tests an econophysics microscopic model for the market-order flow, particularly its statistical persistence.
The strong persistence of the market-order flow is an established empirical law in financial markets [3,8,9]: once a buy (sell) market order is observed, the next market order is also likely to be a buy (sell) (see Fig. 1). This predictability regarding market orders is mathematically characterised by a power-law decay of the order-sign autocorrelation function (ACF):

$$C(\tau) := \langle \epsilon(t)\epsilon(t+\tau) \rangle \simeq c_0 \tau^{-\gamma} \tag{1}$$

for large time-lag $\tau \gg 1$. Here $\epsilon(t)$ is the market order sign at time $t$, defined by $\epsilon(t) = +1$ ($\epsilon(t) = -1$) for a buy (sell) market order, $\langle \dots \rangle$ represents the ensemble average, $c_0$ is the prefactor, and $\gamma$ is the power-law exponent for the LRC. Because it is ubiquitously observed across a broad range of markets, the LRC is believed to be an essential feature of market microstructure.
Then, what is the microscopic origin of the LRC as a macroscopic phenomenon? One promising response is the order-splitting hypothesis for individual traders' behaviours [7] (Fig. 1(a)). This hypothesis claims that the LRC appears because some traders split large metaorders into a long series of small child orders. Because all the child orders share the same sign for a while, there is weak predictability of the future order sign, which is ultimately reflected in the power-law decay of the ACF as summarised in Eq. (1) (Fig. 1(b)). Furthermore, Lillo, Mike, and Farmer (LMF) proposed a simple microscopic theory based on the order-splitting hypothesis. They assumed (i) the presence of splitting traders (STs), and (ii) a power-law probability density function (PDF) for the metaorder length $L$ such that $P(L) \propto L^{-\alpha-1}$ with microscopic exponent $\alpha > 1$. Assuming random order submissions, they showed that the ACF macroscopically exhibits the power-law decay (1). Specifically, they showed

$$\gamma = \alpha - 1, \tag{2}$$

which in this Letter we refer to as the quantitative LMF prediction. The prediction (2) is beautiful and quantitatively powerful because it connects the macroscopic and microscopic parameters in alignment with the central spirit of statistical physics.
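To make the two assumptions concrete, the following is a minimal simulation sketch of an order-splitting market, not the authors' code. It assumes a fixed number of traders who are selected uniformly at random at each step (homogeneous intensities), metaorder lengths drawn as the integer part of a continuous Pareto variate so that $P(L' \geq L) = L^{-\alpha}$, and independent fair signs for new metaorders. The function names `pareto_length` and `simulate_lmf` are hypothetical.

```python
import random


def pareto_length(alpha, rng):
    """Draw an integer metaorder length L >= 1 with P(L' >= L) = L**(-alpha)."""
    u = 1.0 - rng.random()             # uniform on (0, 1]
    return int(u ** (-1.0 / alpha))    # floor of a continuous Pareto variate


def simulate_lmf(n_orders, n_traders, alpha, seed=0):
    """Generate an order-sign sequence from a toy order-splitting model.

    Assumption (illustrative only): every trader is equally likely to
    submit the next child order, and each new metaorder gets a fair sign.
    """
    rng = random.Random(seed)
    # Per-trader state: [sign of current metaorder, remaining child orders].
    state = [[rng.choice((+1, -1)), pareto_length(alpha, rng)]
             for _ in range(n_traders)]
    signs = []
    for _ in range(n_orders):
        trader = state[rng.randrange(n_traders)]
        signs.append(trader[0])
        trader[1] -= 1
        if trader[1] == 0:             # metaorder finished: start a new one
            trader[0] = rng.choice((+1, -1))
            trader[1] = pareto_length(alpha, rng)
    return signs


signs = simulate_lmf(n_orders=10_000, n_traders=20, alpha=1.5, seed=1)
```

Sequences generated this way can be fed into any ACF estimator to check qualitatively that a smaller $\alpha$ produces a more slowly decaying sign correlation.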
While the plausibility of this scenario was confirmed qualitatively in [10] (i.e., a decomposition of the ACF into an order-splitting component and the remainder), the detailed verification of the quantitative prediction (2) has been missing for 18 years. The original LMF paper [7] reported an initial attempt to test their prediction. However, lacking suitably large datasets, they only confirmed a minimum consistency of their theory (i.e., the theoretical line passes through the centre of mass in the scatterplot; see Fig. 5 and Sec. III B in [11] for a brief review).

FIG. 1: Schematic of the LRC of the market-order flow and the order-splitting hypothesis (particularly, the LMF model); as a shorthand notation for +1 (−1), "+" ("−") signifies a buy (sell). (a) As a microscopic model, we assume the presence of STs. STs successively submit child orders with the same sign $L$ times, where $L$ is called the metaorder length and obeys power-law statistics, $P(L) \propto L^{-\alpha-1}$. (b) Consequently, the LRC appears as a macroscopic phenomenon. The LMF theory predicts a quantitative relation $\gamma = \alpha - 1$, which we empirically establish in this Letter through data analysis.
In this Letter, together with the companion paper [11], we solve this long-standing econophysics problem by analysing a large comprehensive microscopic dataset of the Tokyo Stock Exchange (TSE). We accessed a special microscopic dataset, including trading-account identifiers (IDs) on the TSE, enabling us to track effectively the behaviour of trading accounts. Using our microscopic dataset, we first applied a strategy clustering of individual traders to test assumption (i). In regard to market orders, and after classifying all traders as STs or random traders (RTs), we confirmed the presence of STs in most of the TSE markets. We next studied the empirical metaorder-length PDF $P(L)$ to test assumption (ii), which we validated from our dataset. With the measured microscopic parameter $\alpha$, we generated a scatterplot between $\alpha$ and $\gamma$ to test the quantitative LMF prediction (2). We found that the prediction (2) agreed with our dataset, providing the first solid quantitative support for the LMF model as the minimal microscopic description of the order-splitting behaviour. Finally, we discuss how to estimate the total number of STs from the observed prefactor $c_0$. Our findings imply that the long memory in the market-order ACF is useful in inferring microscopic financial information.
Data description. Let us briefly describe our dataset provided by the Japan Exchange (JPX) Group, the platform manager of the TSE. The TSE is the biggest stock market in Japan, and our dataset covers all the order flows in the TSE (market orders, limit orders, and cancellations), enabling us to track their complete life cycle for all the stocks for nine years (from 4 January 2012 to 30 December 2020). Furthermore, this dataset includes virtual server IDs (VSIDs), a unit of trading accounts on the TSE. The VSID is not technically equivalent to the membership ID, because any trader may have several VSIDs. However, we can effectively define trader IDs to track individual trader behaviour with high resolution by appropriately aggregating VSIDs [12,13] (e.g., if a limit order is submitted by VSID 1 and is cancelled from VSID 2, both VSIDs are associated with the same trader); see also [11] for more technical details.
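The aggregation example above (a limit order submitted by one VSID and cancelled from another) can be read as building connected components over VSIDs. The following is a minimal sketch of that reading using a union-find structure; it is an illustration under this assumption, not the procedure of Refs. [12,13], and the function name `aggregate_vsids` and the input format (pairs of linked VSIDs) are hypothetical.

```python
def aggregate_vsids(links):
    """Map each VSID to a representative trader ID.

    `links` is an iterable of (vsid_a, vsid_b) pairs, each meaning that the
    two VSIDs acted on the same order and are assumed to share one trader.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a         # merge the two groups

    return {vsid: find(vsid) for vsid in list(parent)}


# VSIDs 1, 2, 3 end up with one trader ID; VSIDs 4, 5 with another.
trader_of = aggregate_vsids([(1, 2), (2, 3), (4, 5)])
```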
Our study focused on the sign sequences of market orders during double auctions from 09:00-11:30 and 12:30-15:00 Japan Standard Time. A yearly segmented order-sign sequence was extracted for each stock to obtain one market datapoint. We only used datapoints with more than 0.5 million transactions and removed transaction data from the opening and closing ten minutes of auctions to suppress intraday-seasonality effects.
Assumptions of the LMF model. As summarised in Fig. 1, there are two key assumptions in the LMF model: (i) the presence of STs who have large latent demand (metaorders) and split them into small child orders, which are assumed to share the same sign for $L$ successive times, and (ii) the metaorder length $L$ obeys a power law $P(L) \propto L^{-\alpha-1}$ with $\alpha > 1$.
In previous literature, there was no solid direct evidence of assumption (i), although [10] shows indirect but promising evidence based on the ACF decomposition. Also, the plausibility of assumption (ii) was studied in [7] by analysing the off-book data for the London stock exchange market as an "imperfect proxy". However, in the absence of appropriate datasets at that time, the precise estimation of $\alpha$ remained a technical problem for LMF verification. To verify assumptions (i) and (ii) directly, it is necessary to identify STs by strategy clustering at the level of individual traders and then study their metaorder-length PDF to measure $\alpha$ precisely.
Presence of STs. We proceeded with strategy clustering to identify STs. We studied the order-sign sequence for each ST (Fig. 1(a)) to construct the metaorder length $L$, defined as the length of a run of successively equal signs. As an exception, if there was more than one business day between two successive orders, we assumed that they belonged to different metaorders [14] to avoid overestimating the metaorder length.
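The run-length definition of $L$ can be sketched as follows. This is an illustrative implementation under stated assumptions: each order carries an integer business-day index, and a gap of more than `max_gap_days` (set to 1 here, as one reading of the rule above) between successive orders closes the current metaorder. The function name and argument names are hypothetical.

```python
def metaorder_lengths(signs, days, max_gap_days=1):
    """Split one trader's order-sign sequence into metaorder lengths.

    signs: list of +1 / -1, in submission order.
    days:  business-day index of each order (same length as `signs`).
    A metaorder is a run of equal signs; the run is also closed when the
    gap to the previous order exceeds `max_gap_days` business days.
    """
    lengths = []
    run = 0
    for k, (sign, day) in enumerate(zip(signs, days)):
        same_run = (
            k > 0
            and sign == signs[k - 1]
            and day - days[k - 1] <= max_gap_days
        )
        if same_run:
            run += 1
        else:
            if run:
                lengths.append(run)    # close the previous metaorder
            run = 1
    if run:
        lengths.append(run)            # close the last metaorder
    return lengths


example = metaorder_lengths([1, 1, -1, -1, -1, 1], [0, 0, 0, 1, 1, 1])
split_by_gap = metaorder_lengths([1, 1, 1], [0, 0, 5])
```

In the first call the signs alone determine the runs; in the second, the five-day gap splits what would otherwise be a single metaorder of length three.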
For a given metaorder-length sequence, we applied the binomial test for strategy clustering; the null hypothesis is that the order-sign sequence is purely random (obeying a symmetric Bernoulli process) and, thus, the trader belongs to the RT set. The trader is regarded as an ST if the null hypothesis is rejected at a significance level $\theta := 0.01$.

FIG. 3: [...] exponent $\alpha$ for all the markets. The power-law exponents were evaluated systematically using Clauset's algorithm [15,16] across all the markets. The exponent $\alpha$ typically satisfies $1 < \alpha < 2$, consistent with the standard assumption for the LMF model.
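One way such a binomial test could be set up is sketched below. The test statistic is an assumption made for illustration, since the exact statistic is not specified here: we count how many successive order pairs share the same sign, which under a symmetric Bernoulli process follows a Binomial$(n, 1/2)$ law, and use a one-sided exact p-value. The function names are hypothetical, and the exact sum is only practical for short sequences.

```python
from math import comb


def binomial_tail_pvalue(k, n):
    """Exact P(K >= k) for K ~ Binomial(n, 1/2), using integer arithmetic."""
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n


def is_splitting_trader(signs, theta=0.01):
    """Classify a trader as ST if sign persistence is significant.

    Assumed statistic (illustrative): the number of successive pairs with
    equal signs. Under the null of i.i.d. fair signs, each pair matches
    with probability 1/2 independently of the others.
    """
    n_pairs = len(signs) - 1
    if n_pairs < 1:
        return False                   # too little data to reject the null
    n_same = sum(1 for a, b in zip(signs, signs[1:]) if a == b)
    return binomial_tail_pvalue(n_same, n_pairs) < theta


persistent = is_splitting_trader([1] * 20)        # 19 of 19 pairs match
alternating = is_splitting_trader([1, -1] * 10)   # 0 of 19 pairs match
```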
On the basis of this clustering scheme, we identified the ST set for each market datapoint. With summary statistics across all the markets over the nine years, we evaluated the empirical PDF for the ST percentage in each market [Fig. 2(a)] and the contribution to market orders from the ST set [Fig. 2(b)]. We concluded that typically a quarter of all traders are STs, but they dominate the total market orders. Via this strategy clustering, we thus validated assumption (i) directly.
Metaorder-length PDF. Having identified the set of STs, we measured the aggregated empirical PDF for the metaorder length of all STs. Most of the aggregated complementary cumulative distribution functions (CCDFs) for the metaorder length of STs obey a power law $P_>(L) \propto L^{-\alpha}$, with the CCDF defined by $P_>(L) := \int_L^\infty P(L')\,dL'$. As a typical example, we plotted the metaorder-length CCDF for the Toyota Motor Corporation (with ticker number 7203) in 2020 [Fig. 3(a)]; it features a power-law asymptotic tail for large $L$. We then evaluated $\alpha$ using Clauset's algorithm [15,16] to plot the empirical PDF of $\alpha$ [Fig. 3(b)] across all the stocks. Typically, the exponent $\alpha$ is distributed over values $1 < \alpha < 2$, in agreement with the standard assumption for the LMF model. We thus validated assumption (ii) for our dataset.
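As a simplified illustration of tail-exponent estimation, the sketch below applies the continuous maximum-likelihood (Hill-type) estimator $\hat{\alpha} = n / \sum_i \ln(L_i / L_{\min})$ for a fixed lower cutoff $L_{\min}$. This is only one ingredient of Clauset's algorithm [15,16], which additionally selects the cutoff from the data; the fixed cutoff, the continuous approximation for integer lengths, and the function names are assumptions made here.

```python
import math
import random


def estimate_alpha(lengths, l_min):
    """Continuous MLE of alpha for a tail P(L' >= L) ~ L**(-alpha), L >= l_min."""
    tail = [length for length in lengths if length >= l_min]
    log_sum = sum(math.log(length / l_min) for length in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("not enough tail data above l_min")
    return len(tail) / log_sum


def synthetic_lengths(alpha, n, seed=0):
    """Continuous Pareto samples with P(L' >= L) = L**(-alpha) for L >= 1."""
    rng = random.Random(seed)
    return [(1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]


samples = synthetic_lengths(alpha=1.5, n=100_000, seed=0)
alpha_hat = estimate_alpha(samples, l_min=1.0)   # should be close to 1.5
```

For integer-valued metaorder lengths the continuous estimator is biased at small $L_{\min}$, which is one reason a data-driven cutoff is preferable in practice.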
Power-law exponent in the ACF. Having measured the microscopic power-law exponent $\alpha$, we next measured the macroscopic power-law exponent $\gamma$ of the ACF, which we did by directly fitting the sample order-sign ACF as follows (see [11] for details of the method): We first calculated the sample ACF $C_{\rm sample}(\tau)$ from each order-sign sequence and then fitted the power law (1) to it by nonlinear least squares (NLLS), which yields the estimator $\gamma_{\rm NLLS}$. Although the NLLS estimator $\gamma_{\rm NLLS}$ gives numerical consistency for the LMF model, we noticed that it has a finite-sample-size bias. To remove this bias, we heuristically constructed an approximate unbiased estimator $\gamma_{\rm unbiased}$ based on the LMF model (see the companion paper [11] for details). For this Letter, we used this unbiased estimator $\gamma_{\rm unbiased}$ for the final scatterplot.
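A minimal sketch of the ACF measurement is given below. The sample ACF is taken as the plain average of $\epsilon(t)\epsilon(t+\tau)$ over the available pairs, which is an assumed normalisation. The fit is an ordinary least-squares line on log-log axes, a simpler stand-in that is not the NLLS fit or the bias-corrected estimator of Ref. [11]. Function names are hypothetical.

```python
import math


def sample_acf(signs, max_lag):
    """Return [C(1), ..., C(max_lag)], C(tau) = mean of eps(t) * eps(t + tau).

    Requires max_lag < len(signs).
    """
    n = len(signs)
    acf = []
    for tau in range(1, max_lag + 1):
        products = [signs[t] * signs[t + tau] for t in range(n - tau)]
        acf.append(sum(products) / len(products))
    return acf


def fit_power_law(acf, min_lag=1):
    """Fit C(tau) ~ c0 * tau**(-gamma) by least squares on log-log axes."""
    points = [(math.log(tau), math.log(c))
              for tau, c in enumerate(acf, start=1)
              if tau >= min_lag and c > 0.0]   # the logarithm needs c > 0
    if len(points) < 2:
        raise ValueError("need at least two positive ACF values")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope         # (c0, gamma)


# Sanity check on an exact power law: recovers c0 = 0.5 and gamma = 0.6.
exact = [0.5 * tau ** -0.6 for tau in range(1, 51)]
c0_hat, gamma_hat = fit_power_law(exact)
```

Log-log regression discards non-positive ACF values and weights noisy large lags heavily, so it should be treated as a rough check rather than a replacement for a dedicated estimator.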
As a robustness check, we also measured the power-law exponent $\gamma$ via the power-spectral density (PSD) method (see [11]). The exponents measured by the ACF and PSD fittings are distinguished in what follows.
Scatterplot. Having evaluated the microscopic and macroscopic power-law exponents $\alpha$ and $\gamma$ using our huge TSE dataset, we are ready to draw the scatterplot between $\alpha$ and $\gamma$ and test the LMF prediction (2). As the main result, we provide the scattered boxplots (Figs. 4(a) and (b) for the ACF and PSD methods, respectively) between $\alpha$ and $\gamma_{\rm unbiased}$, focusing on the range $1 < \alpha < 2$ in accordance with the standard LMF assumption [21]. These figures exhibit excellent agreement with the theoretical line (2). From these figures, we conclude that the LMF prediction (2) is quantitatively valid for our microscopic dataset.
Discussion on the prefactor. While we extracted the microscopic information $\alpha$ from the ACF power-law exponent $\gamma$ via Eq. (2), is it possible to extract other microscopic information from the prefactor $c_0$? The LMF theory predicts

$$c_0 \simeq \frac{N_{\rm ST}^{\alpha-2}}{\alpha} \tag{3a}$$

with the total number of the STs $N_{\rm ST}$, implying that $N_{\rm ST}$ can be estimated by the LMF estimator

$$N_{\rm ST}^{\rm LMF} := \left[(\gamma+1)\,c_0\right]^{-1/(1-\gamma)}, \tag{3b}$$

where $\gamma$ and $c_0$ can be observed from public data.
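The inversion of the prefactor relation can be checked numerically. The sketch below simply solves $c_0 \simeq N_{\rm ST}^{\alpha-2}/\alpha$ for $N_{\rm ST}$ after substituting $\alpha = \gamma + 1$; it assumes $0 < \gamma < 1$ and omits the finite-sample bias correction of Ref. [11]. The function names are hypothetical.

```python
def lmf_prefactor(n_st, alpha):
    """Prefactor predicted by the homogeneous LMF model: N_ST**(alpha-2) / alpha."""
    return n_st ** (alpha - 2.0) / alpha


def lmf_estimate_n_st(c0, gamma):
    """Invert the prefactor relation with alpha = gamma + 1 (needs 0 < gamma < 1)."""
    if not 0.0 < gamma < 1.0:
        raise ValueError("gamma must lie strictly between 0 and 1")
    alpha = gamma + 1.0
    return (alpha * c0) ** (-1.0 / (1.0 - gamma))


# Round trip: 100 splitting traders with alpha = 1.5, hence gamma = 0.5.
c0 = lmf_prefactor(n_st=100, alpha=1.5)
n_st_hat = lmf_estimate_n_st(c0, gamma=0.5)
```

Because the exponent $-1/(1-\gamma)$ grows in magnitude as $\gamma$ approaches 1, small errors in $c_0$ or $\gamma$ are strongly amplified in that regime.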
Note that the original LMF work [7] assumed homogeneity of the order-splitting intensities $\{\lambda^{(i)}\}_i$ among traders, such that $\lambda^{(i)} = 1/N_{\rm ST}$ for all $i$. Although this homogeneity assumption is unrealistic, we tested this prediction on our dataset by drawing the scattered boxplots (Fig. 4(c) and (d), based on the ACF and PSD methods, respectively) between $\log_{10} N_{\rm ST}^{1-\gamma}$ and the LMF estimator $\log_{10} (N_{\rm ST}^{\rm LMF})^{1-\gamma}$ with the finite-sample-size bias removed (see Ref. [11]). We found that the LMF estimator $N_{\rm ST}^{\rm LMF}$ is highly correlated with the true value $N_{\rm ST}$, implying that the ACF prefactor is a useful resource for inferring $N_{\rm ST}$. At the same time, the LMF estimator systematically underestimates the true value, such that $N_{\rm ST}^{\rm LMF} \lesssim N_{\rm ST}$. Interestingly, our finding is consistent with a generalised LMF model with a heterogeneous intensity distribution $\{\lambda^{(i)}\}_i$. Indeed, Ref. [17] shows that the ACF formula (3a) is not robust, being sensitive to the heterogeneous intensity distribution, while the power-law-exponent formula (2) is robust. Furthermore, the LMF estimator $N_{\rm ST}^{\rm LMF}$ is shown to provide a lower bound on the true value of $N_{\rm ST}$, such that $N_{\rm ST}^{\rm LMF} \leq N_{\rm ST}$, which is consistent with Fig. 4(c, d). Thus, we have confirmed the qualitative validity of the LMF picture even for the estimation of $N_{\rm ST}$, while a better quantitative understanding might require theoretical updates regarding the heterogeneity of trading strategies.

Conclusion. Although the power-law memory in the order-sign ACF has been a central issue in econophysics, no quantitative evidence had been provided for the corresponding microscopic model (the LMF model) because an appropriate huge microscopic dataset was absent. In this Letter, we have provided the first solid evidence for the LMF model at the quantitative level (2), at least for the TSE market, and have thus solved this long-standing problem.
Let us briefly discuss the implications of our findings. Our work shows that the microscopic parameters $\alpha$ and $N_{\rm ST}$ (usually unobservable because their direct estimation requires special microscopic datasets like ours) can be inferred via the LMF predictions (2) and (3), where $\gamma$ and $c_0$ are observable even from public data. This is reminiscent of Einstein's theory for physical Brownian motion: Avogadro's number $N_A$ (unobservable) was indirectly estimated from thermal fluctuations via the Einstein relation for the diffusion constant. The LMF theory can play a similar role in inferring microscopic financial parameters from financial fluctuations. The microscopic parameter set $(\alpha, N_{\rm ST})$ quantifies how the latent demand is hidden in the long term. For markets with small $\alpha$, the revealed liquidity on the limit-order book is insufficient for liquidity takers, and takers have no choice but to split their large metaorders into a longer series of child orders (see also [3] for a standard interpretation of the order-splitting behaviour from the viewpoint of practitioners). In this sense, markets with small $\alpha$ and large $N_{\rm ST}$ might not be liquid enough because many large institutional investors are waiting for the liquidity to replenish during their order-splitting. This characteristic of liquidity has not been captured in practice through conventional metrics such as market spread (the difference between the best bid and ask prices), market depth (the typical volume size at the best prices), and market impact (the average price movement after a market order). Thus, the parameter set $(\alpha, N_{\rm ST})$ is a new measure quantifying how the market is potentially illiquid due to the hidden demand of large institutional investors.
Remarkably, successful strategy clustering was the key to our data analysis at the individual-trader level in revealing the market ecology from a microscopic viewpoint. This research direction aligns with the previous literature [18-20] proposing the need for market-ecology analyses. We believe that this direction of research holds promise, particularly for econophysics and sociophysics modelling [4], as it benefits from recent microscopic financial datasets.
YS was supported by JST SPRING (Grant Number JPMJSP2110). KK was supported by JST PRESTO (Grant

FIG. 2: Presence of the STs by our strategy clustering. (a) Empirical PDF for the percentage of STs in each market, showing direct evidence of the presence of STs. Typically, 25% of all traders were STs. (b) Empirical PDF for STs' contribution to market orders in each market. Typically, 80% of all the market orders were issued by the STs, implying their overwhelming contribution to market orders.
Both the ACF and PSD methods exhibit reasonable and consistent results, implying the statistical robustness of our results.

FIG. 4: (a, b) Scattered boxplots between $\alpha$ and $\gamma$ with the median and the first and third quartiles for (a) the ACF and (b) the PSD methods, exhibiting excellent agreement with the LMF prediction (2) (black line). $\gamma$ was evaluated using the approximate unbiased estimator $\gamma_{\rm unbiased}$, based on the NLLS estimator and the LMF model. (c, d) Scattered boxplots between the LMF estimator $N_{\rm ST}^{\rm LMF}$ and the actual total number of the STs $N_{\rm ST}$ for (c) the ACF and (d) the PSD methods for the datapoints with $\alpha < 2$. The LMF estimator is highly correlated with the true value of $N_{\rm ST}$ as the classical theory predicts, but systematically underestimates $N_{\rm ST}$, such that $N_{\rm ST}^{\rm LMF} \lesssim N_{\rm ST}$. This observation is consistent with a generalised LMF theory [17] with heterogeneous intensities $\{\lambda^{(i)}\}_i$.