## 1 Introduction

Solar activity forecasting is an important tool used in various scientific and technological fields, such as the operation of low-Earth orbiting satellites, electric power transmission lines, high-frequency radio communications and geophysical applications (e.g. Alexandris et al., 1999; Katsambas et al., 1997). The particles and electromagnetic radiation emitted during solar activity outbursts are also important for studying long-term climatic variations (Sello, 2001). The most easily observable and longest-recorded manifestations of solar activity are the dark regions on the solar disk, i.e., the sunspots. Their number per time interval is an index of the general solar magnetic activity (Echer et al., 2004a). The sunspot number (SN) series is the longest running direct record of solar activity, with reliable observations dating back to 1610, soon after the invention of the telescope (Usoskin et al., 2003). SN is strongly connected with solar flares, which affect the Earth's magnetosphere and radiation environment. The mean solar magnetic field is related to the interplanetary magnetic field, which controls the average conditions of the Earth's magnetosphere (Dmitriev et al., 1999). The number of sunspots exhibits a long-term temporal variation, reaching a maximum or minimum approximately every 11 years (the solar cycle). This variation, in turn, affects the global climate (Tzanis and Varotsos, 2008; Varotsos and Cracknell, 2004). Many atmospheric phenomena exhibit decadal variability on both regional and global scales, and such phenomena have often been related empirically to solar cycle variations on the order of 11 or sometimes 22 years (Rind, 2002). Shindell et al. (1999) showed how upper stratospheric ozone changes may amplify the observed 11-year solar cycle irradiance changes to affect climate.
The latter is evidence of the interplay between the photochemistry and dynamics of the ozone layer, especially over the middle and high latitudes of both hemispheres (Varotsos, 2002). A connection between the 11-year solar cycle, the QBO and total ozone was discussed by Varotsos (1989). Varotsos and Cracknell (2004) identified the existence of the Gutenberg-Richter seismic law within the solar cycle, which was further validated by Chattopadhyay and Chattopadhyay (2008) using an Artificial Neural Network (ANN) and factor analysis. Sunspots grow over a few days and last from several days to a few months, indicating enhanced solar activity that modulates the weather and the climate (Chattopadhyay and Chattopadhyay, 2008). The association between climatic variation and the solar cycle has been studied by several authors, such as Willett (1951), Friis-Christensen and Lassen (1991), Fröhlich and Lean (1998) and many others. Haigh (2006) used a general-circulation model (GCM) to investigate the impact of the 11-year solar-activity cycle on the climate of the lower atmosphere.

The prediction of solar cycles, a major research thrust for a long time, requires knowledge of the solar dynamo, a term that covers the processes involved in the production, transport and destruction of the solar magnetic field (Pesnell, 2008). There are two broad approaches to predicting future solar activity. The first is the purely numerical approach, which exploits the periodicity of the SN time series without reference to the underlying physics; examples include Rigozo and Nordemann (1998), Rigozo et al. (2001) and Echer et al. (2004b). The second is the precursor approach, which correlates a solar/geophysical parameter from earlier in the solar cycle with the SN at solar maximum, or at some other later point in the cycle (Echer et al., 2004a); examples include Kane (2001) and Ahluwalia and Ygbuhay (2009). The work presented in this article belongs to the first category.

The rate of solar flares and the amount of energy they release are well correlated with the SN, as is the rate of coronal mass ejections (Pesnell, 2008). The SN time series provides the longest existing record of solar activity, and is thus the best available dataset for studying the long-term evolution of solar activity and, in particular, of the 11-year activity cycle (Baranovski et al., 2008). Recently, Gil and Luis (2009) studied the monthly SN time series and showed that it displays long memory, with a large degree of dependence between observations that disappears only very slowly in time. Standard autoregressive models have been used to predict solar cycles (Hathaway et al., 1999). However, in non-linear and non-stationary situations, conventional autoregressive processes are of limited use. The ANN, a mathematical analogue of the nervous system, has been recognized by geoscientists as a powerful alternative to conventional statistical approaches for developing predictive models of complex geophysical systems (e.g. Gardner and Dorling, 1998; Hsieh and Tang, 1998). The method is able to identify a relationship from given patterns, which makes it possible for ANNs to solve large-scale complex problems such as pattern recognition, non-linear modelling, classification, association and control (Tayfur, 2002). Forecasting time series with ANNs has been discussed in papers such as Lin et al. (1994) and Dorffner (1996). The complexity of the solar cycle is documented in Kurths and Ruzmaikin (1990), Price et al. (1992) and Sello (2001). Consequently, the study of the solar cycle has also been an area where the ANN has gained significant importance. In a multivariate environment, Lundstedt (1992) implemented an ANN for predictions of solar-terrestrial effects.
Sello (2001) showed that the monthly mean SNs contain true non-linear dependencies with the use of the method of surrogate data combined with the computation of linear and non-linear redundancies. Calvo et al. (1995) showed how the ANN can capture the intrinsic dynamics of solar activity, producing good long-term forecasting for periods up to a complete cycle. Conway et al. (1998) examined the use of feed forward neural networks in the long-term prediction of SNs. Park and Woo (2009) compared the performance of two types of ANN in forecasting the monthly SNs.

Analysis of trends in geophysical time series has long been carried out to examine various climatic aspects. Trends in the Indian summer monsoon rainfall have been analysed by Ghosh et al. (2009), Guhathakurta and Rajeevan (2008), Pal and Al-Tabbaa (2011) and Rana et al. (2011). Both parametric and non-parametric methods have been employed for identifying trends in data. However, recent studies have shown that non-parametric tests are more suitable for non-normally distributed and censored data, including missing values, which are frequently encountered in hydrological time series; these methods are also less influenced by the presence of outliers. Among them, the Mann-Kendall (MK) test (Mann, 1945; Kendall, 1975) is one of the most popular methods for trend analysis in meteorological time series (Chiew and McMahon, 1993; Dinpashoh et al., 2011; Jhajharia et al., 2011; Yue and Pilon, 2004; Zetterqvist, 1991). The influence of serial correlation on the MK test in trend-detection studies of hydrological time series has been studied by Yue and Wang (2002). In a recent study, Jhajharia and Singh (2011) applied the MK test to study trends in temperature, diurnal temperature range and sunshine duration in Northeast India. In another recent study, Jhajharia et al. (2011) investigated trends in the reference evapotranspiration estimated through the Penman-Monteith method over the humid region of Northeast India by using the MK test, after removing the effect of significant lag-1 serial correlation from the time series by pre-whitening. Using long-term ionosonde measurements at mid-latitudes, Bremer (1992) performed a trend analysis and identified a decrease in the peak height of the ionospheric F2-layer. Love (2011) applied the MK test to geomagnetic activity, which is controlled by the Sun and the solar wind.
In the present article, we first study the 11-year solar cycle in terms of the autocorrelation function (ACF) and the Euclidean distance (ED), which are used here for the first time to study the SN series and solar activity. Afterwards, we analyse the trend in the SN time series using the MK test; details of the test procedure are presented in subsequent sections. In the next phase, instead of dealing with the raw time series, we transform the monthly SN data by the Box-Cox transformation (Box and Cox, 1964) and study the probability distribution of the transformed series. After removing the irregularities and trends from the time series through the Box-Cox transformation, stationary and non-stationary methodologies are applied to the time series. Lastly, we implement the autoregressive neural network (ARNN) model and compare it with the ARMA and ARIMA models.

## 2 Data analysis

For the present study, the mean annual SNs for the period 1749 to 2007 have been collected from the website of NOAA, Boulder, Colorado (http://www.ngdc.noaa.gov/stp/solar/ssndata.html, last accessed on 24 December 2010). First, the nature of the ACF (Wilks, 2006) of the SN time series has been investigated up to lag 200; it is presented in Fig. 1a. The ACF is observed to decay gradually in a sinusoidal manner, which indicates that the time series is stationary. The most interesting feature of the series is that the maximum autocorrelation coefficients (ACC) occur at lags separated by 11, meaning that the entries of the time series attain their highest positive association at 11-year intervals. In the subsequent sections, we view the 11-year cycle using the concept of ED and generate univariate models for the SN time series by various methods. In Fig. 1b we present a periodogram, whose most prominent peak is found at period 11; this further supports the presence of an 11-year cycle in the SN time series.
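To make the two diagnostics concrete, the following minimal pure-Python sketch computes a sample ACF and picks the dominant integer period from a crude periodogram. The series used is a synthetic 11-year sinusoid, a hypothetical stand-in for the NOAA record, not the actual data.

```python
import math

def acf(x, max_lag):
    """Sample autocorrelation coefficients for lags 0..max_lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x)
    return [sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / var
            for k in range(max_lag + 1)]

def dominant_period(x, max_period):
    """Integer period whose sinusoid carries the most periodogram power."""
    n = len(x)
    mean = sum(x) / n
    best_p, best_power = None, -1.0
    for p in range(2, max_period + 1):
        w = 2.0 * math.pi / p
        c = sum((x[t] - mean) * math.cos(w * t) for t in range(n))
        s = sum((x[t] - mean) * math.sin(w * t) for t in range(n))
        if c * c + s * s > best_power:
            best_p, best_power = p, c * c + s * s
    return best_p

# Synthetic stand-in with a built-in 11-year cycle (not the NOAA record):
series = [50.0 + 40.0 * math.sin(2.0 * math.pi * t / 11.0) for t in range(259)]
```

On such a series, `acf` shows high positive coefficients at lags that are multiples of 11, and `dominant_period(series, 30)` recovers the cycle length.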

## 3 Trend estimation in Sunspot number time series

Chattopadhyay et al. (2011) discussed that parametric methods require the data to be independent and normally distributed, whereas non-parametric methods are often preferable even for small departures from normality. Non-parametric methods use the ranks of observations rather than their actual values, which relaxes the requirements on the distribution of the data. The MK test, a non-parametric test, detects positive or negative trends that are not necessarily linear; it is robust and free from assumptions about the mathematical form of the trend or the probability distribution of the errors (Hasanean, 2001). The mathematical background of the MK test is discussed at length in papers such as Chattopadhyay et al. (2011), Jhajharia et al. (2009), Onoz and Bayazit (2003) and Yue et al. (2002). The MK trend test first computes the S statistic, given by (Dinpashoh et al., 2011; Jhajharia et al., 2011)

$$S=\sum_{i=1}^{n-1}\sum_{j=i+1}^{n}\mathrm{sgn}\left(x_{j}-x_{i}\right)$$

where x_{j} is the j^{th} observation, and sgn(·) is the sign function, which can be defined as

$$\mathrm{sgn}\left(\theta\right)=\begin{cases}1 & \text{if }\theta>0\\ 0 & \text{if }\theta=0\\ -1 & \text{if }\theta<0\end{cases}$$

Under the assumption that data are independent and identically distributed, the mean and variance of the S statistic are given by:

$$E\left(S\right)=0$$

$$V\left(S\right)=\frac{n\left(n-1\right)\left(2n+5\right)-\sum_{i=1}^{m}t_{i}\left(t_{i}-1\right)\left(2t_{i}+5\right)}{18}$$

where m is the number of tied groups and t_{i} is the number of observations in the i^{th} tied group. The original MK statistic, designated by Z, can be computed as

$$Z=\begin{cases}\dfrac{S-1}{\sqrt{V\left(S\right)}} & S>0\\ 0 & S=0\\ \dfrac{S+1}{\sqrt{V\left(S\right)}} & S<0\end{cases}$$

If $-Z_{1-\alpha/2}\le Z\le Z_{1-\alpha/2}$, the null hypothesis of no trend is accepted at the significance level α; otherwise, the null hypothesis is rejected and the alternative hypothesis of a trend is accepted at the significance level α. In the present article we deal with the time series of annual SNs; however, to obtain a clearer view of the data, we have also examined the trends in the SN time series at the monthly scale. The MK non-parametric test was used to examine the trends because the SN time series is non-normally distributed. The results are displayed in Table 1.
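As a hedged illustration, the MK procedure described above (the S statistic, the tie-corrected variance and the Z score) can be sketched in a few lines of pure Python; the toy series below is illustrative, not the sunspot record.

```python
import math

def sgn(theta):
    """Sign function: 1, 0 or -1."""
    return (theta > 0) - (theta < 0)

def mann_kendall_z(x):
    """Standardized Mann-Kendall Z statistic for a trend in series x."""
    n = len(x)
    s = sum(sgn(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    # Tie correction: t is the size of each group of tied values.
    counts = {}
    for v in x:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t * (t - 1) * (2 * t + 5) for t in counts.values() if t > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0

print(mann_kendall_z([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # strongly positive
```

Comparing the returned Z against the normal quantile $Z_{1-\alpha/2}$ (e.g. 1.96 at the 5% level) reproduces the decision rule stated above.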

**Table 1**

Results of the MK test.

| Series | MK-test statistic | Observation |
| --- | --- | --- |
| January | 1.98231 | Rising trend at 5% significance level |
| February | 1.95139 | Rising trend at 10% significance level |
| March | 2.26572 | Rising trend at 5% significance level |
| April | 1.97941 | Rising trend at 5% significance level |
| May | 1.90333 | Rising trend at 10% significance level |
| June | 2.41074 | Rising trend at 1% significance level |
| July | 2.47742 | Rising trend at 1% significance level |
| August | 2.38199 | Rising trend at 1% significance level |
| September | 2.0655 | Rising trend at 5% significance level |
| October | 2.17385 | Rising trend at 5% significance level |
| November | 1.39374 | No trend |
| December | 1.60689 | No trend |
| Annual | 2.13507 | Rising trend at 5% significance level |

From Table 1, it is observed that there is a rising trend in the annual SN time series, mirroring the behaviour found in most of the individual months.

## 4 Euclidean distance approach

The approach of ED is not new in the field of meteorology. Elmore and Richman (2001) used ED as a similarity metric for Principal Component Analysis. Geometrically, the ED between two points is the shortest possible distance between them. The majority of the cluster analyses in the climatological literature have been based upon EDs because of the many useful properties of the Euclidean metric, and most recent developments in clustering algorithms predominantly involve EDs as well (Mimmack et al., 2001). From the point of view of ED, the time series of SNs is given as X = {x_{1}, x_{2}, x_{3},…, x_{259}}, where x_{1} indicates the mean SN of the year 1749 and x_{2}, x_{3},…, x_{259} indicate the mean SNs of the subsequent years. From the series X, we have generated new series X_{1}, X_{2}, etc., where X_{k} indicates the time series starting from x_{k}. The ED between X and X_{k} is defined as:

$$d_{k}=\sqrt{(x_{1}-x_{k})^{2}+(x_{2}-x_{k+1})^{2}+\cdots+(x_{259-k+1}-x_{259})^{2}}\qquad(1)$$

After computing the EDs for several lags k, we have plotted them in Fig. 2. The figure shows that, like the ACF, d_{k} follows a wave pattern with peaks at 11-year intervals. This further indicates the existence of the 11-year solar cycle, which is well established in the literature.
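A short sketch of the lagged-distance diagnostic of Eq. (1) may help: for an idealized sinusoid the distance nearly vanishes whenever the shift equals a whole number of periods, so d_k oscillates with the cycle length. The synthetic series below is a stand-in, not the real SN data.

```python
import math

def lag_distance(x, k):
    """Euclidean distance between a series and its k-shifted copy (Eq. 1)."""
    return math.sqrt(sum((x[i] - x[i + k]) ** 2 for i in range(len(x) - k)))

# Idealized 11-year sinusoid standing in for the sunspot series:
series = [50.0 + 40.0 * math.sin(2.0 * math.pi * t / 11.0) for t in range(259)]
distances = [lag_distance(series, k) for k in range(1, 45)]
```

Plotting `distances` against k shows the same 11-year periodic signature that the ACF displays, with the smallest distances at multiples of the cycle length.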

So far, we have considered the yearly mean SNs in raw form. We now examine whether the time series of annual SNs is normally distributed, using the Anderson-Darling test (Anderson and Stern, 1996; Elmore, 2005) of the null hypothesis that the data are normally distributed. The value of the test statistic is 6.078, and the null hypothesis is consequently rejected: the data are not normally distributed. For further study, we transformed the time series by means of the Box-Cox transformation, which removes the trend components and other irregularities from the time series. The Box-Cox transformation is given by (Seleshi et al., 1994):

$$Z_{t}=\frac{X_{t}^{\lambda}-1}{\lambda G^{\lambda-1}}\qquad(2)$$

where X_{t} is the original time series, G is the sample geometric mean, λ is the transformation parameter, and Z_{t} is the transformed series. In this paper, we have taken λ = 0.359. The autocorrelation structure after the Box-Cox transformation is presented in Fig. 3; it exhibits the same pattern as that of the original time series. After transforming the time series, we test it for normality and find that the transformed series follows a normal distribution with mean 0 and standard deviation 3.5.
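The transform of Eq. (2) is straightforward to sketch. The snippet below is a minimal illustration with toy positive values (the raw SN series contains zeros, for which the geometric mean degenerates); lam = 0.359 follows the value used in the text.

```python
import math

def box_cox(x, lam):
    """Geometric-mean-scaled Box-Cox transform (Eq. 2); x must be positive."""
    g = math.exp(sum(math.log(v) for v in x) / len(x))  # sample geometric mean
    if lam == 0:
        return [g * math.log(v) for v in x]  # limiting form as lam -> 0
    return [(v ** lam - 1.0) / (lam * g ** (lam - 1.0)) for v in x]

# lam = 0.359 as in the text; the data here are toy values, not SNs:
z = box_cox([12.0, 55.0, 103.0, 47.0, 8.0, 151.0], 0.359)
```

Because the transform is monotone for λ > 0, the ordering of the observations is preserved; only the shape of the distribution changes.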

The plot presented in Fig. 4 shows that the Box-Cox-transformed time series follows a normal distribution. Before the transformation, however, the mean and the standard deviation were 52.423 and 41.491, and when we try to fit a normal distribution to the raw series, we obtain the plot in Fig. 5, which is not symmetric about the mean. This indicates that before the transformation the data series was not normal.

## 5 Development of ARMA and ARIMA models

After applying the Box-Cox transformation, the data series is free from trend components and other irregularities. As the ACF shows a wave pattern with significant positive and negative spikes gradually tending to 0, we apply the stationary autoregressive moving average (ARMA) approach (Moran, 1954) to this time series. An ARMA (p, q) process is expressed as (Box et al., 2007):

$$\varphi(B)\bar{x}_{t}=\theta(B)a_{t}\qquad(3)$$

where B is the backshift operator and $\bar{x}_{t}=x_{t}-\mu$, μ being the sample mean. A detailed description of the ARIMA process and its theoretical background is available in Box et al. (2007).

As we have already seen that there is an 11-year cycle within the time series, we examine autoregressive orders up to 11, with moving average orders of 1 and 2. Therefore, 22 ARMA (p, q) models are tested in all. Goodness of fit is assessed through the Akaike Information Criterion (AIC) (Wilks, 2006); the model minimizing the AIC is the best fit. We find that the AIC attains its minimum (= 913.11) for ARMA (11, 1). Therefore, the yearly mean SNs are best represented by ARMA (11, 1), with autoregressive order 11 and moving average order 1.
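Since fitting a full ARMA(p, q) likelihood is involved, the following simplified, assumption-laden sketch illustrates only the model-selection idea: pure AR(p) models are fitted by least squares and the order is chosen by minimizing the AIC. The pseudo-cyclic AR(2) data are synthetic; this is not the estimation actually used for the SN series.

```python
import math, random

def solve(a, b):
    """Gaussian elimination with partial pivoting for the system A x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ar_aic(x, p):
    """Least-squares AR(p) fit of x; returns (AIC, coefficient list)."""
    rows = [x[t - p:t][::-1] for t in range(p, len(x))]  # [x_{t-1},...,x_{t-p}]
    y = x[p:]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atb = [sum(r[i] * yv for r, yv in zip(rows, y)) for i in range(p)]
    coef = solve(ata, atb)
    rss = sum((yv - sum(c * ri for c, ri in zip(coef, r))) ** 2
              for r, yv in zip(rows, y))
    n = len(y)
    return n * math.log(rss / n) + 2 * (p + 1), coef  # AIC = n ln(RSS/n) + 2k

# Synthetic pseudo-cyclic AR(2) series standing in for the transformed SNs:
random.seed(7)
x = [0.0, 0.0]
for _ in range(300):
    x.append(1.5 * x[-1] - 0.9 * x[-2] + random.gauss(0, 1))
series = x[2:]
best_p = min(range(1, 12), key=lambda p: ar_aic(series, p)[0])
```

On data generated by an AR(2), the AIC search recovers a low order rather than the maximum one considered, which is the behaviour the grid search over 22 candidate models exploits.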

We mentioned previously that, because of the sinusoidal and gradually decaying pattern of the ACF, we adopted a stationary approach. However, the autocorrelation increases again after lag 56, and we therefore also try a non-stationary approach in this section: autoregressive integrated moving average (ARIMA) modelling (Box et al., 2007). In general, this process is denoted ARIMA (p, d, q), where d denotes the number of times the stationary process is summed (equivalently, the number of times the series is differenced). An ARIMA (p, d, q) process is expressed as

$$\varphi(B)\nabla^{d}x_{t}=\theta\left(B\right)a_{t}\qquad(4)$$

The difference operator ∇ on the left-hand side of (4) equals (1 − B). Here also, we have examined autoregressive orders up to 11 and moving average orders up to 2, keeping d fixed at 1. Consequently, 22 models have been generated. The AIC attains its minimum (= 921.13) at ARIMA (8, 1, 1). Thus, in the case of ARIMA (p, d, q) we require only 8 previous values of the time series as predictors, whereas ARMA (p, q) requires 11. Examples of the application of ARIMA to non-stationary meteorological time series include El-Fandy et al. (1994), Visser and Molenaar (1995) and Woodward and Gray (1993, 1995).

Now the question naturally arises: which is better, ARMA (11, 1) or ARIMA (8, 1, 1)? To answer it, we require a statistic that measures the potential of a model to make forecasts. For this purpose, we compute Willmott's index, advocated by Willmott (1982) to measure the degree of agreement between actual and predicted values. It is given as (Willmott, 1982)

$$d=1-\frac{\sum_{i}\left(P_{i}-O_{i}\right)^{2}}{\sum_{i}\left(\left|P_{i}-\bar{O}\right|+\left|O_{i}-\bar{O}\right|\right)^{2}}\qquad(5)$$

where P_{i}, O_{i} and $\bar{O}$ denote the i^{th} predicted value, the i^{th} observed value and the mean of the observed values, respectively. Willmott (1982) recommended this statistic to assess the degree to which model output fits an observed dataset; it has since been used by Chattopadhyay and Chattopadhyay (2009) and Comrie (1997). For both ARMA (11, 1) and ARIMA (8, 1, 1), the value of d is approximately 0.96, indicating that the two models are equally potent in forecasting the time series of yearly mean SNs. However, as fewer predictors are needed, ARIMA (8, 1, 1) appears more useful than ARMA (11, 1) for predicting the yearly mean SNs.
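Willmott's index translates directly into code. The sketch below uses toy values and a squared-difference numerator, following Willmott's (1982) definition; d = 1 indicates perfect agreement.

```python
def willmott_d(pred, obs):
    """Willmott's (1982) index of agreement between predictions and observations."""
    ob = sum(obs) / len(obs)  # mean of the observed values
    num = sum((p - o) ** 2 for p, o in zip(pred, obs))
    den = sum((abs(p - ob) + abs(o - ob)) ** 2 for p, o in zip(pred, obs))
    return 1.0 - num / den

obs = [10.0, 20.0, 30.0, 40.0]
print(willmott_d(obs, obs))                       # perfect forecast -> 1.0
print(willmott_d([12.0, 18.0, 33.0, 38.0], obs))  # close forecast, near 1
```

Values near 1, such as the 0.96 reported above for both models, indicate close agreement between forecasts and observations.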

## 6 Development of Autoregressive Neural Network (ARNN) model

ANN is now applied to the time series under consideration in an autoregressive manner. The form of ANN used in this article is the multilayer perceptron (MLP), the most popular form of ANN for prediction purposes. The application of ANNs to forecasting SNs is not new. The computations carried out inside the ANN amount to: (i) performing the weighted average of the impinging inputs; (ii) sending this average through a (biased) sigmoid function; and (iii) forwarding the sigmoid-function output to the next layer of neurons (Calvo et al., 1995). The novelty of our approach is that we examine the possibility of forecasting the yearly mean SNs by an ANN in an autoregressive manner. In the autoregressive approach, the values of a time series are predicted on the basis of previous values of the same series; an autoregressive model of order k predicts the current value of the time series from the past k values of the same random variable. Thus, a linear autoregressive model of order p can be expressed using p previous values of the time series, together with a noise term, as follows (Chattopadhyay and Chattopadhyay, 2009):

$$x\left(t\right)=\sum_{i=1}^{p}\alpha_{i}x\left(t-i\right)+\varepsilon\left(t\right)\qquad(6)$$

In the above equation, a linear function ${F}^{L}$ can be introduced, in which case the equivalent form of the above equation would be (Chattopadhyay and Chattopadhyay, 2009):

$$x\left(t\right)=F^{L}\left(x\left(t-1\right),x\left(t-2\right),\dots,x\left(t-p\right)\right)+\varepsilon\left(t\right)\qquad(7)$$

Replacing the linear function $F^{L}$ by an MLP leads to the ARNN model. Since we have already identified the 11-year cycle in the ACF, we vary the number of predictors from 1 to 11. We denote the proposed neural network model as ARNN (n), where n is the number of predictors derived from the time series; thus, 11 ARNN (n) models are generated by varying n from 1 to 11. The ARNN (n) models are trained as MLPs (Gardner and Dorling, 1998) with sigmoid non-linearity. In mathematical form, the adaptive procedure of a feed-forward MLP can be presented as (Kamarthi and Pittner, 1999):

$$w_{k+1}=w_{k}+\eta d_{k}\qquad(8)$$

The above equation represents an iterative process that finds the optimal weight vector by adapting the initial weight vector w_{0}. This adaptation is performed by sequentially presenting to the network a set of pairs of input and target vectors. The positive constant η is called the learning rate. The direction vector d_{k} is the negative gradient of the output error function E. Mathematically, it is denoted as:

$$d_{k}=-\nabla E\left(w_{k}\right)\qquad(9)$$

We use the adaptive gradient learning method to train the ARNN (n) models. The stopping criterion is the minimization of the mean squared error. The entire dataset, in each case, is divided into training and test cases in the ratio 7:3, and variable selection is carried out through stepwise multiple regression analysis. The type of non-linearity in the ANN depends upon the activation function chosen to train it; Gardner and Dorling (1998) have shown that the sigmoid activation function $\left(f\left(x\right)={\left(1+{e}^{-x}\right)}^{-1}\right)$ performs better than other types. After generating the ARNN (n) models, the Willmott's indices are plotted in Fig. 6. The figure shows that the Willmott's indices attain their maximum for ARNN (1) and ARNN (11); thus, these two models produce the best forecasts. The most important fact emerging from this observation is that the neural network can perform very well even when it is provided with only one predictor. It can therefore be concluded that the ARNN is, in general, more effective than the ARMA or ARIMA models (which require more predictors) for producing univariate forecasts of the yearly mean SN time series. To view the quality of the forecasts pictorially, we present scatterplots in Figs. 7 and 8, which show that ARNN (1) produces forecasts with coefficient of determination R^{2} equal to 0.98.
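As a hypothetical, much-simplified sketch of the ARNN idea, the following pure-Python MLP with one sigmoid hidden layer is trained by plain gradient descent to map n lagged values to the next value of a synthetic cycle. The paper's actual model (adaptive-gradient training, stepwise variable selection, 7:3 split) is richer than this toy.

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class ARNN:
    """One-hidden-layer MLP mapping n lagged values to the next value."""
    def __init__(self, n_in, n_hid, seed=0):
        rnd = random.Random(seed)
        self.w1 = [[rnd.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hid)]
        self.b1 = [0.0] * n_hid
        self.w2 = [rnd.uniform(-0.5, 0.5) for _ in range(n_hid)]
        self.b2 = 0.0

    def forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        return h, sum(w * hi for w, hi in zip(self.w2, h)) + self.b2

    def train(self, xs, ys, eta=0.05, epochs=2000):
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                h, out = self.forward(x)
                err = out - y                     # dE/d(out) for E = err^2 / 2
                for j, hj in enumerate(h):
                    grad_h = err * self.w2[j] * hj * (1.0 - hj)  # dE/dz_j
                    self.w2[j] -= eta * err * hj
                    for i, xi in enumerate(x):
                        self.w1[j][i] -= eta * grad_h * xi
                    self.b1[j] -= eta * grad_h
                self.b2 -= eta * err

# Train ARNN(3) on a synthetic 11-year cycle scaled into (0, 1):
series = [0.5 + 0.4 * math.sin(2.0 * math.pi * t / 11.0) for t in range(120)]
n = 3
xs = [series[t - n:t] for t in range(n, len(series))]
ys = series[n:]
net = ARNN(n, 5)
net.train(xs, ys)
rmse = math.sqrt(sum((net.forward(x)[1] - y) ** 2
                     for x, y in zip(xs, ys)) / len(ys))
```

Even this minimal network drives the in-sample error well below the variability of the series, illustrating why small-n ARNN models can compete with higher-order linear models.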

## 7 Concluding remarks

Finally, it is our observation that there is a prominent 11-year cycle in the mean yearly SN time series, supported by the pattern of the ACF and by the ED measures. Trend analysis by means of the non-parametric MK test reveals that the yearly SN time series exhibits an increasing trend. It is further revealed that, although the series has a systematic autocorrelation pattern, it is not normally distributed; application of the Box-Cox transformation converts it into a normally distributed time series. Lastly, it has been shown that for univariate forecasting of the yearly mean SNs, the ARNN model performs better than the autoregressive moving average and autoregressive integrated moving average models.

## Acknowledgements

The first author wishes to acknowledge the warm hospitality provided by Inter University Centre for Astronomy and Astrophysics, Pune, India, where a part of this study was carried out during the visit from 30 December, 2010 to 10 January, 2011. The authors wish to sincerely acknowledge the constructive suggestions from the reviewers to enhance the quality of the work.