## 1 Introduction

Seismic prospecting is the most widely used method for oil–gas and mineral exploration. With the depletion of easily explored resources, exploration must be carried out in deep reservoirs and irregular layers, which presents significant challenges (Wu et al., 2016). High-quality seismic records help to achieve high-accuracy seismic prospecting, especially in complex strata (Li et al., 2017). Random noise is the key factor that degrades the signal-to-noise ratio (SNR) of seismic records; further, its irregular interference complicates seismic data processing. Here, random noise refers to the incoherent noise in seismic records, which primarily originates from wind motion, environmental noise, noise from recording instruments, and geophones loosely coupled with the ground (Yilmaz, 2001). Understanding the properties of random noise is a prerequisite for designing noise mitigation methods (Xiong et al., 2014; Zhong et al., 2015; Zhuang et al., 2015). Recently, the properties of random seismic noise have attracted increasing attention in geophysics, and several important results have been obtained (Akhouayri et al., 2011; Groos and Ritter, 2009; Groos et al., 2012). However, most of these studies focused on the statistical properties of seismic noise, which can only provide a qualitative judgment of the random noise, such as whether it is stationary. Obtaining quantitative results by investigating the random noise modeling problem is of more practical importance (Li and Li, 2015b; Zhong et al., 2016; Zhong et al., 2017). Hence, we herein focus on a parametric modeling algorithm for seismic-prospecting random noise.

Modeling research helps to understand the nature of seismic noise and to analyze its generation mechanisms. Recently, the properties of seismic noise sources have been extensively discussed, and the findings can be used as the theoretical foundation for the modeling algorithm. In noise modeling, the use of wave equation theory to simulate seismic noise has yielded a significant advance in recent years (Li and Li, 2015a; Li et al., 2017). According to the random noise characteristics, the functions of different noise sources, which propagate according to the wave equation, can be determined. Hence, random noise can be viewed as a superimposed wave field excited by several independent sources in a homogeneous isotropic half-space. Similarly, a modeling algorithm for random seismic noise in a desert area has been investigated (Li and Li, 2015a), and the influences of environmental conditions, such as wind speed and surface roughness, have also been discussed.

However, the wave-equation-based modeling algorithm is complex, and its findings cannot be directly applied to noise attenuation. Therefore, research on statistical parametric modeling algorithms for random seismic noise, which aims at revealing the underlying stochastic processes, appears to have more practical significance. Specifically, if a known stochastic process can be used to represent the seismic noise, the known properties of that process can be exploited in seismic noise attenuation. Random-noise modeling from a stochastic perspective was first proposed in seismic noise-field analysis. When processing the seismic records collected at seismological stations, Caserta et al. (2007) proposed that seismic noise has a super-diffuse nature, which means that noise has a persistent Markovian character with a memory longer than ordinary Brownian motion. Similarly, Mulargia (2012) and Pilz and Parolai (2014) also proposed that the seismic noise wave field is not diffuse. Owing to the differences in the frequency bands concerned, these findings cannot be directly used to represent the seismic-prospecting random noise. However, all these studies indicate that random noise is related to fractional Brownian motion (FBM) (Mandelbrot and Ness, 1968). Recently, some studies have shown that seismic noise is similar to FBM series in power spectral density (PSD), with energy primarily concentrated in the low-frequency bands (Li and Li, 2015a; Mandjes, 2008; Zhong et al., 2015). Based on these findings, it is feasible to simulate the seismic-prospecting random noise using the FBM theory.

The rest of this paper is organized as follows: the theoretical background of the proposed modeling algorithm is introduced in Section 2, which includes descriptions of the FBM theory and the principle of the modeling methodology; Section 3 describes the real noise data and acquisition conditions; Section 4 presents the numerical experiments performed to verify the efficiency of the proposed methodology using both time-domain analysis and spatio-temporal analysis; finally, we present our conclusions in Section 5.

## 2 Theory

In this section, we briefly describe the principle and properties of the FBM theory, which is used as the basis of the modeling algorithm. Then we explain in detail the methodology developed.

### 2.1 Fractional Brownian motion

The FBM is one of the most commonly used stochastic processes. According to probability theory, a generalized Brownian motion can be viewed as a continuous Gaussian process over time. Generally, the FBM is a moving average in which the past increments are weighted by kernel functions. Mandelbrot and Ness (1968) presented a precise expression for the FBM, which is denoted as B_{H}(t):

$$B_{H}(t)-B_{H}(0)=\frac{1}{\Gamma(H+1/2)}\left\{\int_{-\infty}^{0}\left[(t-s)^{H-1/2}-(-s)^{H-1/2}\right]\,\mathrm{d}B(s)+\int_{0}^{t}(t-s)^{H-1/2}\,\mathrm{d}B(s)\right\} \tag{1}$$

where B(t), which is equal to B_{1/2}(t), denotes an ordinary Brownian motion; B_{H}(0) is a known value; and Γ(·) is the gamma function. The parameter H, called the Hurst exponent, is used as a measure of the long-term memory of a time series. This exponent varies from 0 to 1 and describes the irregularity of the resultant motion; the higher the value of H, the smoother the motion. Thus, FBM series with different properties can be obtained by modifying the Hurst exponent.

The FBM has several important characteristics, namely:

(a) the increment B_{H}(t+1)–B_{H}(t) is stationary and follows a Gaussian distribution,

$$B_{H}(t+1)-B_{H}(t)\sim N(0,\sigma^{2}), \tag{2}$$

(b) the FBM is the only self-similar Gaussian process with stationary increments,

$$B_{H}(at)\sim |a|^{H}B_{H}(t), \tag{3}$$

$$E\left[\left|B_{H}(t+1)-B_{H}(t)\right|^{2}\right]\propto \frac{1}{|\tau|^{2H}}E\left[\left|B_{H}(t+\tau)-B_{H}(t)\right|^{2}\right], \tag{4}$$

(c) the mean square of the increment is proportional to the time lag τ raised to the power 2H,

$$E\left[\left|B_{H}(t+\tau)-B_{H}(t)\right|^{2}\right]\propto |\tau|^{2H}. \tag{5}$$
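The scaling law in Eq. (5) can be checked numerically from the covariance function of a standard FBM. The sketch below is an illustration only: it assumes B_{H}(0) = 0 and unit variance at t = 1, and the function names `fbm_cov` and `increment_variance` are hypothetical helpers, not part of the method described here.

```python
import numpy as np

def fbm_cov(t, s, hurst):
    """Covariance of a standard FBM (assumes B_H(0) = 0, unit variance at t = 1)."""
    return 0.5 * (abs(t) ** (2 * hurst) + abs(s) ** (2 * hurst)
                  - abs(t - s) ** (2 * hurst))

def increment_variance(t, tau, hurst):
    """E[|B_H(t + tau) - B_H(t)|^2] expanded in terms of the covariance function."""
    return (fbm_cov(t + tau, t + tau, hurst) + fbm_cov(t, t, hurst)
            - 2.0 * fbm_cov(t + tau, t, hurst))

hurst = 0.9
for tau in (0.5, 1.0, 4.0):
    # The value does not depend on t (stationary increments) and matches tau^(2H).
    print(tau, increment_variance(3.0, tau, hurst), tau ** (2 * hurst))
```

The two printed columns should agree up to rounding for any starting time t, which is the combined content of properties (a) and (c).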

Several effective algorithms have been proposed for the FBM series generation, such as the wavelet-based synthesis algorithm (Abry and Sellan, 2008), successive random additions methods (Mcgaughey and Aitken, 2000), and the random midpoint displacement (RMD) algorithm (Norros et al., 2000). The RMD algorithm has a relatively low computation cost and can provide an accurate simulation series. Therefore, in this study, we used the RMD algorithm to generate the FBM series. The basic principle of the RMD algorithm is based on bisections and interpolations. Owing to the self-similarity of the FBM process, the midpoint and endpoint values for a given interval can be randomly selected according to the corresponding distributions. Therefore, the realization of an FBM process could be obtained iteratively. More details about this process can be found in Norros et al. (2000).
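The bisection-and-interpolation idea can be sketched as follows. This is the plain textbook form of RMD on the interval [0, 1], offered as an illustration under stated assumptions; the variant described by Norros et al. (2000) may differ in how each new midpoint is conditioned on neighboring points. The function name `fbm_rmd` and its parameters are hypothetical.

```python
import numpy as np

def fbm_rmd(levels, hurst, sigma=1.0, rng=None):
    """Approximate FBM on [0, 1] with 2**levels + 1 samples via plain RMD."""
    rng = np.random.default_rng() if rng is None else rng
    n = 2 ** levels
    x = np.zeros(n + 1)                       # x[0] = B_H(0) = 0
    x[n] = sigma * rng.standard_normal()      # endpoint B_H(1) ~ N(0, sigma^2)
    step = n
    for level in range(1, levels + 1):
        half = step // 2
        # Displacement std chosen so that half-interval increments have
        # variance sigma^2 * (2**-level)**(2H), as required by Eq. (5).
        std = sigma * np.sqrt(1.0 - 2.0 ** (2 * hurst - 2)) / 2.0 ** (level * hurst)
        mids = np.arange(half, n, step)       # midpoints of the current intervals
        x[mids] = (0.5 * (x[mids - half] + x[mids + half])
                   + std * rng.standard_normal(mids.size))
        step = half
    return x

series = fbm_rmd(levels=10, hurst=0.9, rng=np.random.default_rng(0))
print(series.shape)
```

Each pass halves the interval length, so the cost grows linearly with the number of samples. In this plain form the increments are only approximately stationary, which is the usual trade-off for the low computation cost.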

### 2.2 Modeling algorithm

An FBM series always has a colored spectrum, and its PSD can be approximated as f^{1–2H}, where f represents the frequency and H is the Hurst exponent. Similarly, the seismic-prospecting random noise is also a type of “1/f noise,” whose energy is primarily concentrated in the low-frequency band. Fig. 1 shows the comparison between a field noise series recorded in a desert and an FBM series with a Hurst exponent of 0.9. As the respective PSD functions in this figure show, the FBM series and the noise data have a similar energy distribution, especially in the middle-frequency range [50–400 Hz]. Therefore, using the FBM to model random seismic noise in desert areas is reasonable.

Here, the multitaper method (Thomson, 2005), which is a popular signal processing algorithm, is used to estimate the PSD of the series. This method overcomes some of the limitations of the conventional Fourier analysis, especially in terms of spectrum leakage, estimation error, and frequency resolution. In general, the result of the PSD estimation for a given series x(n) can be denoted as

$$S(f)=\frac{1}{K}\sum_{k=1}^{K}\left|\sum_{n=0}^{N-1}d_{k}(n)\,x(n)\exp(-\mathrm{j}2\pi fn)\right|^{2} \tag{6}$$

where d_{k}(n), which is selected from the discrete prolate spheroidal sequences, denotes the k-th window function, K is the number of windows, and N is the length of the series.
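A minimal NumPy sketch of the estimator in Eq. (6) is given below. It builds the discrete prolate spheroidal sequences from the standard symmetric tridiagonal eigenproblem and averages the tapered periodograms on the FFT frequency grid. The function names and the default time-bandwidth product `nw = 4` and taper count `k = 7` are illustrative assumptions; the text does not state which values were used.

```python
import numpy as np

def dpss_tapers(n, nw, k):
    """First k discrete prolate spheroidal sequences as unit-energy rows."""
    w = nw / n                                    # half-bandwidth, cycles/sample
    idx = np.arange(n)
    diag = ((n - 1 - 2 * idx) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = idx[1:] * (n - idx[1:]) / 2.0
    tri = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(tri)                 # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k].T                 # k most concentrated tapers

def multitaper_psd(x, fs, nw=4.0, k=7):
    """Eq. (6): average of k tapered periodograms on the real-FFT grid."""
    x = np.asarray(x, dtype=float)
    tapers = dpss_tapers(x.size, nw, k)           # d_k(n), shape (k, N)
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)   # frequency axis in Hz
    return freqs, spectra.mean(axis=0)

fs = 1000.0                                       # 1 ms sampling interval
noise = np.random.default_rng(0).standard_normal(512)
freqs, psd = multitaper_psd(noise, fs)
print(freqs.shape, psd.shape)
```

As in Eq. (6), no extra scaling by the sampling rate is applied, so the result is comparable between series of equal length but is not in physical units.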

In practice, the amplitude response of the geophone used and the truncation effect of the recording instrument affect the seismic record. To demonstrate the effects of the geophones and recording instruments in detail, we calculated the normalized PSD of a field noise series, and the results are shown in Fig. 2. As can be seen, the components below 10 Hz and above 400 Hz are conspicuously attenuated, so these effects should be treated as band-pass filtering in the modeling process.

Therefore, in this study we follow FBM theory to simulate background seismic noise and use a band-pass filter to represent the effects of the recording systems. Hence, the random noise modeling algorithm can be generalized as

$$y(n)=\sum_{l=-\infty}^{\infty}B_{H}(l)\,h(n-l) \tag{7}$$

where y(n) denotes the result of the noise simulation, and B_{H}(l) is an FBM series with a known Hurst exponent (H). Moreover, h(n) represents a band-pass filter with corner frequencies of 10 Hz and 400 Hz. In Eq. (7), the Hurst exponent is used to adjust the irregularity of the simulation outputs. The optimal Hurst exponent value is determined by comparing the simulation results to the real noise data in terms of PSD. Generally, the optimal value of the parameter H is the solution of the following optimization problem:

$$\begin{array}{l}\min_{H}E\left[\left(\frac{1}{K}\sum_{k=1}^{K}\left|\sum_{n=0}^{N-1}d_{k}(n)\,x(n)\,\mathrm{e}^{-\mathrm{j}2\pi fn}\right|^{2}-\frac{1}{K}\sum_{k=1}^{K}\left|\sum_{n=0}^{N-1}d_{k}(n)\sum_{l=-\infty}^{\infty}B_{H}(l)\,h(n-l)\,\mathrm{e}^{-\mathrm{j}2\pi fn}\right|^{2}\right)\right]\\ \text{subject to}\quad\sum_{f=0}^{M-1}\sum_{k=1}^{K}\left|\sum_{n=0}^{N-1}d_{k}(n)\,x(n)\,\mathrm{e}^{-\mathrm{j}2\pi fn}\right|^{2}=\sum_{f=0}^{M-1}\sum_{k=1}^{K}\left|\sum_{n=0}^{N-1}d_{k}(n)\sum_{l=-\infty}^{\infty}B_{H}(l)\,h(n-l)\,\mathrm{e}^{-\mathrm{j}2\pi fn}\right|^{2}\end{array} \tag{8}$$

From these equations, we can conclude that the optimal simulation result is the one whose PSD is most similar to that of the noise data analyzed. In summary, once the optimal value H_{opt} of the Hurst exponent is determined, the synthetic noise can be formulated as follows:

$$y_{\text{opt}}(n)=\sum_{l=-\infty}^{\infty}B_{H_{\text{opt}}}(l)\,h(n-l) \tag{9}$$
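The workflow of Eqs. (7)–(9) can be sketched as a grid search over candidate Hurst exponents. The code below is a rough illustration, not the implementation used in this study, and it makes several assumptions that should be read as such: the FBM is generated exactly by Cholesky factorization of the increment covariance (chosen for brevity instead of RMD), h(n) is approximated by a brick-wall band-pass applied in the frequency domain, the PSD is a plain averaged periodogram standing in for the multitaper estimate, and the misfit is taken as the mean squared PSD difference after matching total energy. All function names are hypothetical.

```python
import numpy as np

def fbm_exact(n, hurst, rng, realizations=1):
    """Exact FBM samples B_H(1..n), one realization per row."""
    k = np.arange(n)
    # Autocovariance of the unit-variance FBM increments.
    gamma = 0.5 * ((k + 1.0) ** (2 * hurst) - 2.0 * k ** (2 * hurst)
                   + np.abs(k - 1.0) ** (2 * hurst))
    chol = np.linalg.cholesky(gamma[np.abs(k[:, None] - k[None, :])])
    increments = rng.standard_normal((realizations, n)) @ chol.T
    return np.cumsum(increments, axis=1)

def bandpass_fft(x, fs, f_lo=10.0, f_hi=400.0):
    """Brick-wall band-pass along the last axis (assumed stand-in for h(n))."""
    x = np.asarray(x, dtype=float)
    spec = np.fft.rfft(x, axis=-1)
    freqs = np.fft.rfftfreq(x.shape[-1], d=1.0 / fs)
    spec[..., (freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spec, n=x.shape[-1], axis=-1)

def mean_psd(records):
    """Periodogram averaged over rows (stand-in for the multitaper estimate)."""
    records = np.atleast_2d(records)
    spectra = np.abs(np.fft.rfft(records, axis=-1)) ** 2
    return spectra.mean(axis=0) / records.shape[-1]

def select_hurst(target_psd, simulate_psd, candidates):
    """Return the candidate H whose energy-matched PSD is closest to the target."""
    target_psd = np.asarray(target_psd, dtype=float)
    best_h, best_cost = None, np.inf
    for h in candidates:
        sim = np.asarray(simulate_psd(h), dtype=float)
        sim = sim * target_psd.sum() / sim.sum()    # energy constraint of Eq. (8)
        cost = np.mean((target_psd - sim) ** 2)     # assumed misfit measure
        if cost < best_cost:
            best_h, best_cost = h, cost
    return best_h, best_cost

rng = np.random.default_rng(0)
fs, n = 1000.0, 512                                 # 1 ms sampling interval
# Synthetic stand-in for a field record; real data would be loaded here instead.
field = bandpass_fft(fbm_exact(n, 0.9, rng, realizations=20), fs)
simulate = lambda h: mean_psd(bandpass_fft(fbm_exact(n, h, rng, realizations=20), fs))
h_opt, cost = select_hurst(mean_psd(field), simulate, np.linspace(0.5, 0.95, 10))
y_opt = bandpass_fft(fbm_exact(n, h_opt, rng), fs)[0]   # Eq. (9)
print(h_opt, y_opt.shape)
```

Because each candidate PSD is estimated from a finite number of random realizations, the selected exponent fluctuates from run to run; averaging more realizations or using a finer candidate grid would be natural refinements.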

## 3 Real noise data

The dataset for analysis comprises passive noise records collected by a receiver array deployed in the Tarim Basin in West China. This basin is mostly flat with no vegetation. The random noise is considered to be driven primarily by wind friction over the ground surface. The experimental seismic array, which meets the requirements generally adopted in seismic prospecting, consists of 512 geophones arranged in a single survey line at intervals of 50 m. The sampling interval is 1 ms. The geophone used is the JF-20DX-10, whose amplitude response is similar to a high-pass filter with a corner frequency of 10 Hz.

As is well known, when no shooting is being done, the sources of coherent noise in land-seismic prospecting are mainly vehicle traffic, machinery, and 60-Hz (or 50-Hz) power lines (Cooper and Cook, 1984). Since our acquisitions are performed in remote areas, the cultural noise is relatively low. As an example, Fig. 3 shows a 5-s noise record acquired in the Tarim Basin, in which no obvious coherent noise can be observed.

## 4 Results

We applied the proposed modeling algorithm to simulate seismic-prospecting random noise. To verify the performance of the modeling method, the waveforms of the real noise and synthetic noise data and their respective properties are compared in the time domain and the spatio-temporal domain. In general, these properties are investigated through the spectral characteristics, the statistical characteristics, and the phase space.

### 4.1 Checking results in the time domain

Here, we perform a detailed comparison between the real noise data and synthetic noise records in the time domain, in terms of properties such as the PSD and the phase space. Fig. 4a shows the waveforms for a real noise series (top) and a synthetic noise record with a Hurst exponent of 0.95 (bottom). This figure shows that the real noise data and the synthetic noise series have almost the same fluctuation tendency and a strong resemblance. Fig. 4b shows the respective PSD functions of both noise series, which reveal similar spectral characteristics. The real and simulated noise closely resemble each other, with the energy concentrated in the [0–30 Hz] frequency band. All these results support the efficiency of the proposed modeling method.

Fig. 5 allows the analysis of the statistical properties of the real noise data and the synthetic noise series. In this illustration, we compare the two noise series by means of their amplitude distributions and cumulative distribution functions. The plots in Fig. 5a present the probability distributions of the real noise and synthetic record, respectively. The comparison reveals that the histograms of the corresponding noise data are similar in trend, and the amplitudes are concentrated within approximately [–0.02, 0.02]. Moreover, the cumulative distribution curves presented in Fig. 5b are almost the same, except for a slight difference in the interval [0.005, 0.015]. In summary, the field and synthetic noise records have similar statistical characteristics.

The chaotic properties attributed to random seismic noise (Wang et al., 2016) also serve to verify the efficiency of the modeling algorithm. In Fig. 6 we have drawn the orbits in the phase space corresponding to the series of field noise (Fig. 6a) and synthetic noise (Fig. 6b). These orbits, as may well be appreciated, are very similar. From the previous experiments, we conclude that the field noise data and the synthetic noise records have similar characteristics, and that the modeling algorithm is feasible.

### 4.2 Checking results in the spatio–temporal domain

To further verify the efficiency of the simulation algorithm, we selected a noise record acquired in the desert and simulated it with the proposed method. In Fig. 7, we show a real noise record composed of 50 traces up to a time of 1 s, collected in the course of a seismic survey in the Tarim Basin, and next to it we show the synthetic noise record. As before, the two records have similar vibration patterns.

Because the seismic noise data can be seen as a 2-D dataset, the properties of the random seismic noise change along both the time direction and the space direction. The noise characteristics can therefore also be explored through the signature of the f–k spectrum, obtained by applying a 2-D Fourier transform that converts the noise data from the spatio–temporal domain to the frequency–wavenumber domain. Fig. 8 shows the f–k spectra of the real and simulated noise records. Again, we find that the two f–k spectra are quite similar.
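For readers who wish to reproduce this kind of comparison, the conversion to the frequency–wavenumber domain can be sketched with a 2-D FFT. The sketch assumes a record stored as an array of shape (time samples, traces); the function name `fk_spectrum` is hypothetical, and the 1 ms sampling interval and 50 m trace spacing are taken from the acquisition description in Section 3.

```python
import numpy as np

def fk_spectrum(record, dt, dx):
    """Amplitude f-k spectrum of a record shaped (n_samples, n_traces)."""
    record = np.asarray(record, dtype=float)
    # 2-D FFT over time (axis 0) and space (axis 1), zero frequency centered.
    spec = np.fft.fftshift(np.fft.fft2(record))
    freqs = np.fft.fftshift(np.fft.fftfreq(record.shape[0], d=dt))        # Hz
    wavenumbers = np.fft.fftshift(np.fft.fftfreq(record.shape[1], d=dx))  # cycles/m
    return freqs, wavenumbers, np.abs(spec)

dt, dx = 0.001, 50.0                       # 1 ms sampling, 50 m trace spacing
# Random placeholder for a 1 s, 50-trace record; real data would be used instead.
record = np.random.default_rng(0).standard_normal((1000, 50))
freqs, wavenumbers, amplitude = fk_spectrum(record, dt, dx)
print(amplitude.shape)
```

The amplitude array can then be displayed with the wavenumber on the horizontal axis and the frequency on the vertical axis, usually restricted to non-negative frequencies.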

In addition to qualitative comparisons, a quantitative analysis is more informative in that it characterizes the noise data more precisely. It is known that statistical properties can be expressed by statistical moments. In this sense, we calculated several statistical moments, namely the mean, variance, kurtosis, and skewness, which summarize the data properties to a certain extent. The comparison of statistical moments for real noise data and synthetic noise data is shown in Table 1. We see that the differences between these statistical moments are relatively small; in particular, the variance is almost the same, while the other moments are slightly smaller in magnitude for synthetic noise than for real noise. The kurtosis and skewness values indicate that both the real noise and the synthetic data are nearly symmetric and close to a Gaussian distribution. Given the similarity of the spatio-temporal characteristics of the real and synthetic noise data, the efficiency and accuracy of the FBM-based noise-modeling algorithm appear to be supported.

**Table 1**

Statistical moments estimated for real noise and synthetic noise.

| | Mean | Variance | Kurtosis | Skewness |
|---|---|---|---|---|
| Real noise | –3.67 × 10^{−4} | 8.12 × 10^{−4} | 3.154 | 0.07 |
| Synthetic noise | –2.02 × 10^{−4} | 8.13 × 10^{−4} | 2.929 | –0.06 |
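The moments in Table 1 can be computed along the following lines. This is a sketch under assumptions: the population (biased) estimators are used and the kurtosis is the non-excess form, which equals 3 for a Gaussian distribution and is consistent with the values near 3 reported in the table. The exact estimators used in this study are not stated, and the function name `sample_moments` is hypothetical.

```python
import numpy as np

def sample_moments(x):
    """Mean, variance, kurtosis, and skewness of all samples in x."""
    x = np.asarray(x, dtype=float).ravel()
    mean = x.mean()
    centered = x - mean
    variance = np.mean(centered ** 2)                    # population variance
    skewness = np.mean(centered ** 3) / variance ** 1.5
    kurtosis = np.mean(centered ** 4) / variance ** 2    # 3 for a Gaussian
    return {"mean": mean, "variance": variance,
            "kurtosis": kurtosis, "skewness": skewness}

# Gaussian placeholder data; a real or synthetic noise record would be used instead.
demo = np.random.default_rng(0).standard_normal(50000)
print(sample_moments(demo))
```

Applying the same function to the real record and to the synthetic record gives the two rows of the comparison.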

## 5 Conclusions

In this study, we investigate the seismic-prospecting random noise-modeling problem from a statistical viewpoint, by analyzing the stochastic process underlying the random noise. We follow the FBM theory to simulate background seismic noise, and we use a band-pass filter to represent the effects of recording systems. The optimal Hurst exponent value is determined by comparing the simulation results to the real noise data in terms of PSD. The parametric modeling algorithm of random seismic noise that we propose is based on the similarity between the PSD of the real noise and that of the FBM process. We verify the performance of the modeling scheme by comparing the simulated noise with the real noise data in both the time domain and the spatio-temporal domain. First, we analyze the properties of real and simulated noise, such as the PSD, amplitude distribution, cumulative distribution curve, and behavior in the phase space. Second, we explore the noise characteristics through the frequency–wavenumber spectrum. The results reveal the similarity between the synthetic noise records and the real noise dataset. Hence, the noise generation algorithm, which uses the FBM theory to perform the modeling task, is feasible. In other words, the seismic-prospecting random noise can be considered as the product of a stochastic FBM process. The findings of this study are useful for developing better models of land-seismic-prospecting random noise, and also for noise reduction and signal-detection algorithms.

## Acknowledgments

This research is financially supported by the National Natural Science Foundation of China (41730422), Postdoctoral Projects of the Science Foundation of China (2018M631839), the Foundation for Youth Excellence of the Jilin Scientific Committee (20180520091JH), and a Research Project of Jilin Province Education Department (JJKH20180420KJ).