Data-driven analysis of annual rain distributions

Rainfall is an important component of the climate system and its statistical properties are vital for prediction purposes. In this study, we have developed a statistical method for constructing the distribution of annual precipitation. The method is based on the convolution of the measured monthly rainfall distributions and does not depend on any presumed annual rainfall distribution. Using a simple statistical model, we demonstrate that our approach allows for a better prediction of extremely dry or wet years with a recurrence time several times longer than the original time series. The method that has been proposed can be utilized for other climate variables as well.


I. INTRODUCTION
Terrestrial precipitation is a crucial component for the survival of humans and has a significant impact, e.g., on water supplies, river flow, and dryland farming.Additionally, precipitation has a significant impact on water usage in irrigated agriculture, industry, and hydroelectric power generation.It is thus important to be able to make reliable quantitative predictions of rain distributions, to design and manage irrigation systems, etc.One may be particularly interested in estimating the likelihood of rare events such as droughts or floods, which can have devastating effects.These problems are becoming of growing importance, and in particular, water scarcity is a significant problem faced by many societies today [1].According to the recent sixth IPCC (Intergovernmental Panel on Climate Change) report, about half the world population faces severe water scarcity each year due to climate and other factors [2] and humans are contributing more to extreme weather, especially with regard to drought and precipitation [3].It was also predicted that global warming will lead to increased heavy precipitation and worsened droughts in certain regions [3].Throughout history, civilizations have collapsed and settlements have shifted due to extreme changes in precipitation [4][5][6][7][8][9].
Accurately understanding the accumulated precipitation statistics over extended periods such as annual rainfall, as focused on in this work, is crucial, e.g., for efficient water management and crop planning.Typically, forecasts are based on a statistical analysis of the precipitation time series [10].This approach is expected to work reasonably well because the annual rain statistics appear relatively robust to current climate change in some regions.For instance, the annual rain over most of the global land surface was found to be fairly stationary over the last century [11,12], although over shorter timescales, extreme weather events are becoming likelier [3,13].
With regards to precipitation in Israel which is investigated in this paper, one study showed a change over the period  in the shape parameters of fitted rain distributions in Israel [14], although a subsequent study showed no significant trend in Israel [15].A third study showed mixed results [16].Yet another study showed a shortening of the rainy season in Israel [17].Our analysis (not shown) indicates a drop in annual precipitation in Israel, between 1980 and 2019 (including) where only 5 stations out of 110 exhibited increasing trends.The reason for the above, somehow contradicting, studies may be rooted in the fact that the different studies analyzed different periods, capturing different parts of much slower decadal and centennial cycles.
Predicting the likelihood of atypical annual rainfall events, like extreme droughts, has been a challenge for existing forecasts.This is because such events have occurred only a few times in the historical data or not at all.One could attempt to do this by fitting the data to some standard distribution [18,19] (often, gamma distributions are used), and then extrapolating the fitted distribution to its tails.However, this would be problematic for two reasons, which are both related to the fact that reliable historical rainfall data exists for the last century or two at most.First, different studies used different distributions to fit the annual rain data where the choice was arbitrary to some degree; the goodness of fit may be of similar level when the number of observations (years) rarely exceeds 100; see for example Fig. 1 and Fig. S1.Second, the fit is performed on the main part of the distribution, describing typical fluctuations.However, the fit is expected to break down in the distribution tails, as for a wide variety of statistical models in which the distributions display very different behaviors in the central part and the tails [20][21][22][23][24].
One alternative approach would be to use climate models to make the predictions [25][26][27][28][29].This approach has some merit; however, one may still be concerned about how well the models correctly describe the real system, especially in its large-deviation regime.Moreover, highresolution models (e.g., atmospheric General Circulation Models, GCMs) are computationally expensive to run for extended periods of simulation time, which is necessary for predicting rare events (this difficulty can be some-what mitigated by using large deviations simulation algorithms [25][26][27][28][29]).Another approach is to combine data from many different locations.This increases the amount of data considerably but involves assuming that different locations behave similarly.For instance, one can assume that, up to scaling factors, the annual rainfall in different locations follows the same distribution.This is, however, not the case for annual precipitation in Israel since the spatial variability is relatively large-see Supplementary Table S1.
In this paper, we develop an alternative method for predicting annual rainfall, which is based solely on measured data, without a need for performing any fits on observed distributions (using one of many distributions that were used to fit rain data).Our method does not involve any parameters that need to be fitted to observed data either.The method exploits the timescale separation between the correlation time of weather (synoptic) events-typically on the order of one week-and the timescale of one year over which the rain accumulates.It enables us to make quantitative predictions for annual rainfall statistics, outside the central part of the distribution.In particular, it enables us to predict the likelihood of extremely dry or wet years beyond observed data.
While we focus on annual rain distributions in Israel, our method is quite versatile, and can be used for any geographical region, and for any quantity that is accumulated/averaged over a period of time much longer than the typical correlation time of the dynamical system in question.The method is based on dividing the time series under consideration into time windows of intermediate duration, which are approximated to be uncorrelated.Some modifications may needed to apply the method, e.g., on seasonally averaged temperatures [27,28,30].

II. ANNUAL RAINFALL DISTRIBUTION
Let us consider, as a particular example, the annual rainfall in Sede Boker in Israel, for the period 1952-2022; see Fig. 1a.The data of this station as well as the data of 15 additional stations are all downloaded from the Israeli Meteorological Service web page https://ims.gov.il/en/datagov; see Supplementary Table S1 and Fig. S1.Sede Boker is located in the Negev desert, in an extreme arid region with an average annual rainfall of ≈90 mm/year.The annual rainfall time series does not appear to follow a clear increasing or decreasing temporal trend; this is also the situation for the other 15 stations' data we studied.Following this observation and Hasselmann [31] we have calculated the correlation coefficient R 2 of the annual precipitation time series (i.e., 1year lag cross-correlation) and found no significant correlations in all 16 stations under consideration.Moreover, the rainfall of consecutive months also appears uncorrelated (Fig. 1e,f).We thus approximate the rainfall of the different months as independent random variables.The goal of this work is to estimate, based on the data, the PDF P (r) that describes the statistics of the annual rainfall r in a given location (e.g., Sede Boker).
A simple estimate P h (r) for the annual rain PDF is obtained by constructing a histogram of the annual rainfall distribution; see Fig. 1b,c.However, since this histogram is not based on a very large number of years (typically less than 100 years), it does not give a good estimate of the P (r); see the bi-modal distribution presented in Fig. 1b (and the PDFs of the other stations analyzed here, Fig. S2) while the PDF is expected to be uni-modal.As mentioned in the Introduction, a common practice is to fit P h (r) to a known distribution (like the gamma or some other standard distributions); such a fit (i.e., fit to the gamma distribution) is plotted in Fig. 1b,c.One can then treat this fit, P γ (r), as a prediction for the true PDF P (r).However, the choice of the fitted distribution is in some sense arbitrary and may lead to different predictions, especially of extreme precipitation years.In addition, it is difficult to assess the reliability of such predictions.This difficulty becomes especially pronounced in the tails of the distribution, where the probabilities are very small, making it impossible to quantitatively test the accuracy of P γ (r) due to insufficient data.

A. The convolution PDF method
The key idea behind the method we propose is to use additional information, beyond the total annual rainfall data; an example of annual rain time series is plotted in Fig. 1a and Fig. S1.Indeed, rainfall data is often recorded at much higher resolutions (like 10 minutes), and as we show below, one can make use of this additional information to predict the annual rain PDF to much higher accuracy, including, to a certain extent, the prediction of extreme annual precipitation corresponding to the tails of the distribution.
We exploit the separation of scales between the typical timescale τ for correlations in weather data (typically, of the order of one week) and the much larger timescale of an entire year.We do this by dividing the year into intermediate units of time that are much shorter than a year but still much longer than τ .A convenient choice we adopt here is to take these time units to be the 12 months of the year, but in principle, we would expect any choice between ∼ 15 and ∼ 60 days to work reasonably well.(One advantage of dividing the data into months is that some historical rainfall data is given at a monthly resolution.)Our method can be summarized as follows: We write the annual rainfall as the sum r = r 1 + • • • + r 12 of the monthly rainfalls r i , where r i are regarded as statistically independent.We tested our approximation of absence of correlation between the monthly rains by calculating the R 2 of the Feb. versus Jan. rains and Mar.versus Feb. rains (Fig. 1d,e); the R 2 of all 16 stations under consideration (Supplementary Table S1) is very low, indicating the absence of correlations, validating our approximation.We note that precipitation in Israel falls  S1).Note the very small R 2 , indicating the absence of correlations between the different months.
during the winter months (Nov.to May) and is nearly absent during the summer months; see Supplementary Fig. S4.The next step in the analysis is to construct monthly rain histograms P h 1 (r 1 ), . . ., P h 12 (r 12 ) corresponding to each of monthly rainfall time series r 1 , . . ., r 12 .We use these histograms as approximations of the true monthly PDFs P 1 (r 1 ), . . ., P 12 (r 12 ).Based on the approximation of statistical independence between the rainfall of the different months (Fig. 1d,e), our estimate for the annual rainfall PDF P (r) is given by the convolution of the 12 monthly histograms, i.e., Before testing the quantitative predictions of our method, let us briefly discuss its expected advantages and disadvantages.First, we do not assume an underlying model or theoretical distribution.Our only assumptions are (as discussed above) that, to a good approximation, the system's statistics do not change from year to year and that r 1 , . . ., r 12 are statistically independent (Fig. 1d,e).Under these assumptions, one expects our method to yield a more accurate and reliable prediction than the rough estimate of the annual rain histogram P h (r) (bars in Fig. 1b,c) that is based on annual rainfall data.In particular, our method yields predictions of the annual rain PDF for a wider range of r's, including in the distribution tails.
We now conduct a quantitative analysis of our results for data from the Sede Boker station and 15 other sta-tions (Supplementary Table S1).Moreover, to validate the proposed method, we test its predictions on a (very) simplified statistical model of annual rainfall.

B. Results
Our prediction P conv (r) for the PDF of the annual rainfall in Sede Boker is shown in Fig. 1b,c (red line).We note that the predicted PDF is a smooth curve, although no fitting to a smooth theoretical curve was performed.Moreover, unlike the histogram P h (r) of the annual rainfall data (blue bars) which is bi-modal, the PDF calculated by our convolution method is uni-modal; this is the situation for some other stations analyzed in this study (Fig. S2).In the regime of typical fluctuations, r = 30 − 200 mm/year, our prediction is close to the fitted gamma distribution P γ (r) (black line), as seen in Fig. 1b,c.However, in the left (r ≲ 10 mm/year) and right (r ≳ 250 mm/year) tails of the distribution, the two predictions differ by orders of magnitude: our predicted PDF is much larger (smaller) than the gamma distribution in the left (right) tail.The inset in Fig. 1c indicates that the left tail of the gamma distribution, P γ (r) lays below the the convolution distribution, P conv (r); this is expected as the probability for zero annual rainfall for the extreme arid Sde-Boqer station is finite while the probability density of the gamma PDF is by definition zero.
To estimate the size of the errors in our prediction P conv (r), we also performed bootstrapping as follows.We generated a large number of realizations of N = 72 years, where each year was created by randomly sampling each of the monthly precipitations r i from the data.For each realization, we then created the histogram P h (r).The results of the bootstrapping are plotted in Fig. S3.As is seen in the figure, in the central part of the distribution, the errors are of order 10%, while in the tails they become much larger.
There is no reason to believe that the gamma distribution, or other theoretical distributions that are used to fit the annual rain distribution, captures the distribution tails correctly.Generically, in large deviation theory, one finds that theoretical curves that correctly capture the regime of typical fluctuations do not faithfully describe the tails of the distribution [20][21][22][23][24].For example, the distribution of a particle undergoing a random walk, and the distribution of the largest eigenvalue of a Gaussian unitary ensemble (GUE) random matrix are described by the Gaussian and Tracy-Widom distributions, respectively, in the typical fluctuations regimes, but these predictions break down in the tails of the distributions [24,33].

C. Rate of convergence of our method
The accuracy of our method is expected to improve as more data is accumulated.However, for any finite amount of data, there will be discrepancies between the predicted PDF and the true annual rain PDF.Generally speaking, we expect relatively larger discrepancies in the tails of the annual rain distribution.Still, the two tails of the annual rain distribution may have different qualitative behaviors as the left tail is bounded by zero precipitation r = 0, making its prediction, most probably, more accurate, as we explain below.
We expect the predicted annual PDF P conv (r) to underestimate the true PDF P (r) in the right tail.One way to see this is to notice that P conv (r) has a finite support: It is truncated at some maximal value that equals the sum of the recorded maximum monthly values.However, there must be some nonzero probability for r to exceed this value-for instance, for a longer time series, this maximum value is expected to be higher.Moreover, an unusually large total annual rain may involve just one of r 1 , . . ., r 12 being much larger than expected.Indeed, if the monthly PDFs P i (r i ) decay exponentially or slower at r i → ∞, then the r → ∞ tail is expected to be dominated by this scenario.This situation is called the "big-jump principle", in analogy with discrete-time random walks, in which the tail of the position distribution of the particle after many steps is dominated by scenarios in which a single step contributes more than the sum of all the others.The big-jump principle is known to occur when considering the distributions of sums of random variables (and in analogous situations involving continuous-time dynamical systems) in many physical systems and mathematical models [34][35][36][37][38][39][40][41][42][43][44][45][46][47][48][49][50][51].This leads to the (conservative) estimate that, in the right tail, our predictions are reliable up to events whose recurrence time is approximately the number of years of data; however, as shown below, it appears that our predictions are reliable well beyond this conservative estimate.In any case, our prediction gives a lower bound for the true PDF in the right tail.
In contrast, in the left tail, the "big jump principle" cannot possibly hold, since the distribution is bounded from below by r = 0.An unusually small annual rain r must therefore be the result of all (or most) of r 1 , . . ., r 12 being below average.Such a combination of slightly unusual events (namely, small monthly rainfalls), which all may occur within the observed data, may thus enable us to accurately predict the likelihood of an extremely rare event (small annual rainfall).
We quantitatively tested the validity of our method by applying it to a simple toy model.In this model, the monthly rainfall distributions P 1 (r 1 ), . . ., P 12 (r 12 ) are assumed to be statistically-independent gamma distributions, whose parameters were estimated through the Maximum Likelihood Estimation (MLE), using the measured data of each station.The sum of independent gamma variables, the "hypogamma" PDF, P (r), is given The PDF calculated by our convolution method, Pconv(r) (blue line) using the time series shown in (a).The theoretical hypogamma distribution [32] (red line) of the suggested model that was used to generate the time series shown in (a).For comparison, we also include the convolution PDF based on the actual Sede Boqer annual rainfall data (also shown in Fig. 1b).(c) Same as (b) in semi-log representation.The inset shows an enlargement of the left side of the distribution.
in [32].We then simulate N years of data and apply our method to the simulated time series.More specifically, we create monthly histograms P h 1 (r 1 ), . . ., P h 12 (r 12 ) and convolute them to obtain the estimate P conv (r).We then compare this estimated PDF to the exact PDF, P (r).A typical model time series generated using parameters that are based on Sede Boqer data (Fig. 1a) (and of the same duration, N = 72 years) is shown in Fig. 2a.In Fig. 2b,c we depict the corresponding convolution PDF of the model time series shown in Fig. 2a that was constructed using our method (blue line).We also show the theoretical hypogamma PDF (red line) and the convolution PDF of the original Sede Boqer time series (Fig. 1b,c).It is apparent that the convolution PDFs of the model and the data are similar.It is also clear from Fig. 2c, as sug-gested above, that the convolution PDF P conv (r) (blue) reliably predicts the theoretical PDF P (r) (red) outside the range of the model data shown in Fig. 2a.As expected, P conv (r) underestimates P (r) sufficiently far into the right tail.In the left tail, the prediction P conv (r) is very accurate and only breaks down at r which is of the order of a few mm/year (corresponding to extremely rare events).
We next systematically analyzed how the length N of the annual precipitation time series affects the accuracy of predicted PDFs.In Fig. 3a, we present a few examples of the convolution PDFs, based on time series of different lengths that were generated by the model.For comparison, we also depict the theoretical hypogamma distribution.As expected, the predicted PDF converges to the theoretical one as the number of years N of the model's time series increases.Fig. 3b depicts the ratio between the theoretical hypogamma distribution P (r) and the convolution PDFs P conv (r) obtained from the model's time series as a function of the precipitation rate r.As expected, longer time series exhibit a better match with the theoretical distribution.It is also noticeable that the ratio is larger (smaller) than one on the right (left) side of the graph, i.e., the predicted PDF underestimates (overestimates) the theoretical distribution at the right (left) tail.To remind the reader, the right tail of the distribution provides an estimation for the recurrence time of very wet years while the left tail provides an estimation for the recurrence time of very dry years.
Let us describe first the analysis of the right tail of the distribution.In Fig. 3c,d we study the maximum annual precipitation r + that can be predicted by our method.We arbitrarily chose a factor of 1.5, 2, and 3-time difference (horizontal dashed lines in Fig. 3b) as the limit of validity; i.e., the ratio between the theoretical and calculated PDFs, P (r)/P conv (r) or P conv (r)/P (r) equal to a specific value.To capture these two cases, we studied | ln P conv (r)/P (r)|.Fig. 3c depicts the maximal precipitation rate r + that can be predicted by our method for the different mismatch factors.More specifically, Fig. 3c depicts the median (circles) and the 25%-75% quantile range (errorbars) based on 1000 realizations, as a function of the length of the simulated time series, N , in model's years (see also Figs.S11 for the curves of the other 15 stations).r + appears to increase logarithmically as the length of the time series N increases, a behavior that one may expect to observe due to the exponential tail of P (r).The recurrence rate of r + as a function of the length of the time series is presented in Fig. 3d and it appears to follow a power law with an exponent of ∼ −1.Importantly, our method provides prediction beyond the range of the available data.The first point in Fig. 3c,d presents the results for time series of lengths N = 72 years (as in Fig. 2).The observed maximal precipitation for Sede-Boqer time series is less than 200 mm/yr-the predicted maximum annual precipitation is much larger with r + ≃ 270 mm/yr (for factor 2) corresponding to events with a recurrence time of about 1000 yrs, far beyond the 72 years of the time series.We elaborate more on the recurrence time and precipitation below.
We found that, for all other 15 stations under consideration, the recurrence rate of r + (in 1/yr) versus the length N of the simulated time series (in years) approximately follows a power law too (Fig. S7,S9).We estimated the power law exponents for all the 16 stations under consideration and found that different stations are characterized by different exponents, within the range [−2.5, −1] (Fig. S5a).The stations are sorted according to the annual mean precipitation and it is apparent that the exponent is more negative as the precipitation rate increases.In addition, the larger mismatch factors (represented by different colors) result in a more negative exponent.We also studied an additional model in which the monthly rainfall is modeled by an exponential distribution (instead of the gamma distribution).The sum of exponential PDF is the hypoexponential distribution; we performed the same analysis shown in Fig. 3 using the hypoexponential distribution as the theoretical distribution and found the corresponding power law exponents (Fig. S5b).Here the exponents span a smaller range of [−1.6, −1] where we did not observe a clear relation between the annual mean precipitation and the exponent.Yet, also here larger mismatch factors result in a more negative exponent.The individual curves for each of the stations from which we calculated the exponents presented in Fig. S5 are shown in Figs.S11,S13.We note that the model for which the monthly rainfall is gamma-distributed exhibits better performance than the model for which the monthly rainfall is exponentially distributed.The recurrence time of r + for the gamma and exponential PDF models using ratios of 1.5, 2, and 3, for the 16 stations under consideration is depicted in Fig. 4a,c.
The left tail of the distribution exhibits very different behavior in comparison to the right tail; see Fig. 3a,b and Figs.S8, S10 in comparison to Figs.S7, S9.This is expressed in the predictions in the left tail which are remarkably accurate, even for very rare events; see Fig. 4a,c versus Fig. 4b,d.In the left tail, our method provides accurate predictions up to a minimal annual precipitation rate r − whose recurrence time exceeds several thousand years, much longer than the length of the simulated time series [52] of ∼ 100 years.This is valid for both gamma and exponential PDF models, and even when r − is calculated using a relatively small mismatch factor of 1.5.Notably, except for station no. 1 (Elat) which is characterized by a very low annual precipitation rate of less than 30 mm/yr and for which the precipitation at some years was only a few mm (Fig. S1), the prediction time of r − for the other stations exceeds 5000 years.The prediction time for a rare dry year for station no. 1 is relatively not long simply because such a year is more probable in this station than for the other stations.Furthermore, Fig. 3d indicates that the ratio for the left tail is less than one.This is valid for the extreme arid station of Sde-Boqer while for other, less arid, stations this ratio is not necessarily smaller than one.
We note that the prediction time for a rare wet year is longer for the station with higher annual mean precipitation (Fig. 4b).From Fig. 4 we find that, for a mismatch of a factor of two, the predicted recurrence times, both for wet and dry years (r ± ), are much longer than the length of the observed annual precipitation time series; i.e., the length of the time series of the 16 stations we consider is around 100 years while the predicted recurrence times are at least 400 years, for both models and both tails.As expected, the smaller ratio of 1.5 yielded a shorter prediction time, yet, larger than the length of the observed time series.For a mismatch factor 3 (yellow circles) the prediction time is much longer than the length of the original time series, more than 1000 years.In Fig. S6  the minimum/maximum predicted annual precipitations r ∓ versus the corresponding minimum/maximum annual precipitations observed in the simulated data.For both gamma and exponential PDF models, and for mismatch ratios of 1.5, 2, and 3, the maximum predicted annual precipitation is larger than the maximum observed precipitation for almost all stations.The situation is clearer for the minimum precipitation rate where the predicted minimum annual precipitation is well below the observed value, consistent with the much longer prediction time for minimum precipitation.An exception is station no. 1 (Elat) for which a minimum zero annual precipitation is a probable scenario as this station experienced a few mm annual precipitation for some years (Fig. S1).
FIG. 4. The recurrence times (in years) of the annual precipitation rates r∓ at which the ratio between the theoretical and convoluted PDFs is equal to 1.5, 2, or 3 (blue, red, and yellow symbols); i.e., P (r)/Pconv(r) or Pconv(r)/P (r) equal to 1.5, 2, 3 (the crossing points between the solid and dashed lines in Fig. 3b).The analysis was performed for the 16 stations under consideration based on the first point of the curves of Fig. 3d  We thus conclude that the method we proposed not only provides prediction for the central part of the annual rain distribution but also provides reliable predictions far outside the range of the actual data, especially in the left tail of the distribution.As explained above, the difference in the quality of our predictions in the two tails follows from the qualitative difference between their behaviors.The right tail of the annual rain distribution (wet years) is not bounded such that a longer time series yields a longer range of ratio that is close to one (Fig. 3b).The left tail is bounded by r = 0 and the annual precipitation time series of very different lengths have almost identical left tails and mismatch ratios (Fig. 3a,b and Figs.S8, S7).

III. DISCUSSION AND CONCLUSION
Precipitation plays an important role in the climate system.As such, it is necessary to develop reliable statistical methods to analyze the precipitation data.These can be used to quantify the statistical properties of the data, as well as, to provide better predictions for future events.This is especially important in light of the ongoing climate change which is expected to experience more frequent rare events (droughts and floods) [2,3].We aimed to improve the prediction of the distribution of annual rain.In particular, we aimed to provide reliable statistical predictions of rare dry or wet years, including those that fall outside the range of the observed annual rain.
The proposed method is based on the approximation that the rainfall of different months is independent-this assumption is supported by the data.We then regarded the rainfall of the different months as independent random variables and constructed the annual rainfall distribution by convoluting the measured monthly PDFs.This is in contrast to previous studies that assumed that the annual rainfall follows a specific distribution (typically the gamma distribution) and fitted the distribution to the annual precipitation data.The method we proposed provides reliable estimation for the central part of the distribution as well as outside the range of the measured data.In particular, the method we propose predicts the likelihood of extremely dry years remarkably accurately, including events whose recurrence time is very long.Our method also predicts the likelihood of unusually (but not extremely) wet years.In contrast to our analysis, previous studies predicted extremely wet or dry years based on the tails of assumed distributions-as the tails of the real distribution are not necessarily related to the central part of the distribution, such predictions may be unreliable with a mismatch of orders of magnitudes.
While our predictions yield very large recurrence times T (e.g., of the order of tens of thousands of years) for certain stations/cases, they should of course not be interpreted as an estimate of the typical time that will elapse until a wet or dry year will happen or already happened.This is because it is known that the climate system exhibits a very long temporal correlations of tens of thousands of years (like the glacial-interglacial oscillations), and, in fact, these correlations are stronger for longer times [53].Instead, a recurrence time T should be interpreted as a probability of 1/T (where T is measured in years) that the wet or dry year in question will occur in a given year with today's climate.
The method we proposed here can be applied to predict probabilities of events involving climate variables other than rainfall, e.g., temperature (including prolonged heatwaves or cold spells) and winds.Moreover, one could apply it to other dynamical systems, when considering statistical fluctuations of quantities that are integrated or averaged over a period that is much longer than the correlation time of the dynamical system in question [54].
In the present study we focused on precipitation in Israel, covering a wide range of precipitation rate, from extreme arid to semi arid environment.Precipitation in Israel falls during the winter.The conclusions of the present study are thus valid for the Israeli environment (or similar environments) but may be different for other environments for which the precipitation of different months may be correlated.In arid regions, like Israel, the prediction of extremely dry years is probably more important than the prediction of extremely wet years.However, the prediction of high precipitation rate is probably more important in regions with high precipitation rate, which are prone to flooding in populated areas.Ap-plying the proposed method to temperature time series may not be straightforward due to temporal correlations.Yet, there are several several daily temperature records that extend more than two centuries that may strengthen the statistical significance of the results.
A basic limitation that our method has is its assumption that the monthly rainfalls are uncorrelated.Although this is a good approximation, it would be interesting to improve our method by taking the existing possible small correlations into account, and/or by changing the size of the time windows.The climate system exhibits long-term fluctuations or trends (such as the global warming trend) and the extension of our method to handle these conditions may be important to assess the impact of phenomena like global warming on the distribution and extreme events of different climate variables.that the predicted annual precipitation is smaller (larger) than the observed minimum (maximum) precipitation.In addition, as expected, larger ratios yield a more pronounced difference between the observed and the predicted minimum/maximum precipitation.Station no.two (Sde Boker) is the station analyzed in the main text.

FIG. 1 .
FIG. 1. (a)Annual rainfall (Sep.to August) versus time (year) in Sede Boker, for the period 1952-2023, and the corresponding histogram, P h (r), in (b) regular and (c) semi-log plots.In (b) and (c), the red line indicates the PDF, Pconv(r), constructed based on the method reported in this paper, and the black line represents a fit to a gamma distribution, P γ (r).The inset in (c) depicts an enlargement of the left side of the distribution.(d) Feb. precipitation versus Jan. precipitation (blue circles), Mar.precipitation versus Feb. precipitation (red circles), and their corresponding linear fits (blue and red solid line).Note the absence of correlation between the rains of different months.(e) The R 2 of Feb. rain (over the different years) versus Jan. rains (empty blue circles) and March rain versus Feb. rain (full red circles) of 16 stations scattered over Israel (see details in Supplementary TableS1).Note the very small R 2 , indicating the absence of correlations between the different months.

5 FIG. 2 .
FIG. 2. (a)A typical annual rainfall time series generated by our toy model with parameters based on Sede Boker data shown in Fig.1.(b) The PDF calculated by our convolution method, Pconv(r) (blue line) using the time series shown in (a).The theoretical hypogamma distribution[32] (red line) of the suggested model that was used to generate the time series shown in (a).For comparison, we also include the convolution PDF based on the actual Sede Boqer annual rainfall data (also shown in Fig.1b).(c) Same as (b) in semi-log representation.The inset shows an enlargement of the left side of the distribution.

FIG. 3 .
FIG. 3. (a)The predicted PDF for model data using an increasing number of data points (years, see legend).The theoretical hypogamma PDF is indicated by the solid black line.(b) The ratio between the theoretical hypogamma PDF P (r) and the convolution PDFs Pconv(r), based on different length time series (see legend of panel a).Note that the ratio is larger (smaller) than one at the right (left) side of the graph, corresponding to the right (left) tail of the distributions shown in (a).(c) The annual precipitation value r+ at which the ratio P (r)/Pconv(r) or Pconv(r)/P (r) exceeds 1.5 (blue), 2 (red), and 3 (yellow) versus the length of the time series (in model years).The results are based on 1000 model realizations (for each N ) where the errorbars indicate the 25%-75% quantile range and the circles indicate the median.(d) Same as (c) for the recurrence rate (in 1/yr) corresponding to r+.Note the power law decay of the graphs with exponents of -1.1, -1.2, and -1.3 (black dashed lines) for the factors 1.5, 2, and 3 respectively.
FIG.4.The recurrence times (in years) of the annual precipitation rates r∓ at which the ratio between the theoretical and convoluted PDFs is equal to 1.5, 2, or 3 (blue, red, and yellow symbols); i.e., P (r)/Pconv(r) or Pconv(r)/P (r) equal to 1.5, 2, 3 (the crossing points between the solid and dashed lines in Fig.3b).The analysis was performed for the 16 stations under consideration based on the first point of the curves of Fig.3dand Figs.S7-S10.(a) Left tail based on the gamma PDF model, (b) right tail based on the gamma PDF model, (c) left tail based on the exponential PDF model, (d) right tail based on the exponential PDF model.The stations are ordered according to the mean annual precipitation and it is apparent that: (i) the predicted length increases with the precipitation rate in panels b and d, (ii) as expected larger ratios yield longer prediction time, and (iii) the dry years' prediction time (left tail) is much longer than the wet years' prediction time (right tail).Station no.two (Sde Boker) is the station analyzed in the main text.
FIG. S3.The results of bootstrapping of Sde-Boqer station based on 13888888 sets of 72 years (all together slightly less than one billion years).(a) The median P h (r) (blue circles) and the 25%-75% quantiles (error bars) of the bootstrapping sets, together with the convolution pdf, Pconv(r) (red) and the gamma pdf, P γ (r) (yellow).(b) Same as a depicting the mean and the standard deviation.(c) Same as a in semi-log plot.(d) Same as b in semi-log plot.Note the large error bars.Also note that in panels c, d some of the symbols or error bars are now shown as they are equal or smaller than zero.

TABLE S1 .
The details of the 16 stations analyzed in this study.The rainfall data was downloaded from the Israeli Meteorological Survey (IMS) web page (https://ims.gov.il/en/datagov or https://data.gov.il/dataset/481).Note that for some stations the data for some years is missing.Annual rains in Israel accumulate from the beginning of September to the end of August; e.g., the rain of 2023 is the sum of precipitation that occurred between Sep. 1, 2022, and August 31, 2023.The results that are shown in the main text are those of station no. 2 (Sede Boqer).Annual precipitation rate versus time (year) for the 16 stations analyzed in this study (see TableS1).The horizontal red line indicates the mean precipitation rate (in mm/yr).The main text presents the results of station no. 2.