12.07.2015 Views

NCC Report No. 1 - (IMD), Pune

NCC Report No. 1 - (IMD), Pune

NCC Report No. 1 - (IMD), Pune

SHOW MORE
SHOW LESS

Create successful ePaper yourself

Turn your PDF publications into a flip-book with our unique Google optimized e-Paper software.

NEW STATISTICAL MODELS FORLONG RANGE FORECASTING OFSOUTHWEST MONSOON RAINFALLOVER INDIAM. RajeevanD. S. PaiAnil Kumar RohillaNational Climate CentreIndia Meteorological Department<strong>Pune</strong> – 411 005.


1. IntroductionIn an agricultural country like India, the success or failure of the crops andwater scarcity in any year is always viewed with the greatest concern. A majorportion of annual rainfall over India is received during the southwest monsoonseason (June-September). There are known vagaries of the monsoon as regardsthe time of onset as also of the nature and amount of monsoon rains and itsdistribution in different parts of the country. Although the regional rainfall has largeyear to year fluctuations, the southwest monsoon rainfall over the country as a wholevaries by about 10% of the mean rainfall. However, this small fluctuation in theseasonal rainfall can have devastating impacts on India’s economy. In spite ofincrease in share from the service sector to India’s growth, the performance of farmsector ultimately is a decisive factor in the growth of GDP of India. The recent twodroughts, 2002 and 2004 made an adverse impact on India’s economy. Therefore,long range prediction (seasonal prediction) of southwest monsoon rainfall deserveshigh priority in India. Since farmers need forecast for sub-regional level, ultimatelyoperational forecasts should target at sub-regiona0.l level. Nevertheless, anaccurate prediction of monsoon performance averaged over the country as a wholeis also very vital for better planning of finance, power and water resources.The India Meteorological Department (<strong>IMD</strong>) has been issuing long rangeforecasts of the southwest monsoon rainfall since 1886. It was, however theextensive and pioneering work of Sir. Gilbert Walker (1923,1924), that led to thedevelopment of the first objective models based on statistical correlations betweenmonsoon rainfall and antecedent global atmosphere, land and ocean parameters.Since then, <strong>IMD</strong>’s operational long range forecasting system has undergonechanges in its approach and scope from time to time. There are good detailedreviews on the long range forecasting of Indian southwest monsoon rainfall (ISMR)(<strong>No</strong>rmand 1953, Jagannathan, 1960, Thapliyal and Kulshrestha 1992, Hastenrath1995, Krishna Kumar et al. 1995, Rajeevan 2001, Gadgil et al. 2005). In a veryrecent study, Gadgil et al. (2005) addressed the major problems of the statistical anddynamical methods for long range prediction of monsoon rainfall. The study revealedthat <strong>IMD</strong>’s operational forecast skill has not improved over the seven decadesdespite continued changes in the operational models.2


For the long range forecasting (LRF) of the ISMR, mainly three approachesare used. The first is the statistical method, which uses the historical relationshipbetween the ISMR and global atmosphere-ocean parameters (Walker 1914 & 1923,Thapliyal 1982, Gowariker et al. 1989 & 1991, Navone and Cecatto 1995, Singh andPai 1996, Rajeevan et al. 2000 & 2004, Delsole and Shukla 2002, Sahai et al. 2003,Pai and Rajeevan 2005 etc.). The second approach is the empirical method basedon time series analysis. This method uses only the time series of past rainfall data(Goswami and Srividya 1996, Iyengar and Raghukanth 2004, and Kishtawal et al2003) and do not use any predictors. The third approach is based on the dynamicalmethod, which uses General Circulation Models (GCM) of the atmosphere andoceans to simulate the summer monsoon circulation and associated rainfall. In spiteof its inherent problems, at present, statistical models perform better than thedynamical models in seasonal forecasts of ISMR. The dynamical models have notshown the required skill to accurately simulate the salient features of the meanmonsoon and its interannual variability (Latif et al. 1994, Gadgil and Sajini 1998,Krishnamurti et al. 2000, Kang et al. 2002, Gadgil et al. 2005, Krishna Kumar et al.2005, Wang et al. 2005).During the period 1988-2002, <strong>IMD</strong>’s operational forecasts were based on the16 parameter power regression and parametric models (Gowariker et al. 1989 &1991). The forecasts issued during this period were qualitatively correct. However,the mean forecast error during this period was more than the mean error of theforecasts based on climatology alone. This model failed to predict the severedrought of 2002. However, none of the other experimental forecasts based onstatistical and dynamical models also could predict the severe drought of 2002.Following the failure of forecast in 2002, a critical evaluation of the 16-parameterpower regression and parametric models was made and in 2003, two new models (8and 10 parameter models) were introduced for the operational work. Further a twostageforecasting strategy also was adopted with the provision for a forecast updateby end of June/first week of July (Rajeevan et al. 2004). According to this newstrategy, forecasts for the seasonal ISMR for the country as whole are issued in twostages. The first stage forecast is issued in mid April and an update or second stageforecast is issued by the end of June. The 2003 operational forecasts for thesouthwest monsoon rainfall based on these new models were proved correct.3


However, in 2004, the forecast for a normal monsoon did not materialize, due tounexpected poor performance of monsoon.Following the forecast failure in 2004, <strong>IMD</strong> has critically analyzed two majorissues, a) a re-visit to the identification of suitable and stable predictors, which havephysical relationships with monsoon rainfall and b) critical way of modeldevelopment like optimum number of predictors and model development period etc.<strong>IMD</strong> also further explored new statistical methods with the objective for an improvedmodel performance.In this report, we discuss the details of new experimental models developedat the National Climate Centre, <strong>IMD</strong>, <strong>Pune</strong>, for the long range forecasts of ISMR.These new statistical models are based on the following techniques.1. Ensemble Multiple Linear Regression (EMR)2. Projection Pursuit Regression (PPR)3. Principal Component Regression (PCR)4. Artificial Neural Network (ANN)In the section 2, we discuss the data used for this study and in the section 3,we discuss the identification of predictors. Methodology of model development isdiscussed in section 4. In the section 5, results from the various prediction modelsare discussed. Finally, the conclusions are presented in section 6.2. Data usedThe southwest monsoon season (June-September) rainfall over the countryas whole (ISMR) is calculated as the area weighted average of the seasonalmonsoon rainfall data of all 36 meteorological subdivisions in India. The long periodaverage (LPA) (1901-1970) of the seasonal rainfall is 88 cm and the coefficient ofvariation is about 10%. The ISMR was expressed as the percentage departure fromthe long period average (LPA). When the ISMR during a year is more (less) than4


computed as the average of surface air temperature anomalies of five land stationsin Europe (Orland, Oslo/Gendermon, Ostursund/Froson, Karlstad and De Bilt). Theseasonal tendency in the NINO3.4 anomaly index was computed by subtractingmonthly anomalies averaged over the winter season (DJF) from that averaged overthe spring season MAM (March to May). The WWV anomaly over Pacific (February+ March) was derived from the monthly averages of WWV computed between 5 o Nand 5 o S integrated across the Pacific basin including all ocean areas between 80 o Wand 120 o E (Rajeevan and McPhaden 2004). The lower boundary for this integrationis the depth of the 20 o C isotherm, which is located in the middle of the upperthermocline. All anomalies were computed using the climatological base period of1971-2000. One of the SST predictors (southeast Indian Ocean) common in both thepredictor sets showed significant warming trend during the data period (1951-2005).Hence, time series of this predictor was de-trended by removing the linear trendfitted for the period 1951-2000 from the time series.The physical relationships of the selected predictors with the ISMR are brieflydescribed below :The positive Correlation Coefficient (C.C) between the surface airtemperature anomaly over Europe and ISMR is associated with the strength &phase of <strong>No</strong>rth Atlantic Oscillation (NAO) as well as with the anomalous winter snowcover. Rajeevan et al. (1998) and Pai (2004) discussed the relationships betweenthe NW Europe temperature anomalies in winter and ISMR. Positive temperatureanomaly is associated with the positive phase of NAO. It may also reflect reducedsnow cover over NW Europe. Bamzai and Shukla (1999) have observed significantinverse correlations between winter snow cover and surface temperatures over NWEurope.The negative relationship between the seasonal tendency in the surfacemean sea level pressure (MSLP) anomaly over the <strong>No</strong>rth Europe and ISMR isexplained through the changes in the mid latitude wind pattern over Eurasia.Rajeevan (2002) had shown that during the deficient monsoon years, the northsouthpressure gradient over Europe is weaker than normal. The reduced surfacepressure gradient reduces the mid-latitude zonal flow across Eurasia and increases7


the chances of blocking highs and intrusion of dry westerlies into Indian regionduring the spring season (Raman and Rao 1981, Raman and Maliekal 1985).Pressure anomalies over the East Asian region during February and Marchare associated with the relative shifting of the climatological low pressure area overthe north Pacific to its east and the high pressure area over middle Asia to its west.During a deficient (excess) monsoon year, the pressure over East Asia is below(above) normal due to the shifting of the north Pacific low towards west (east) of itsnormal position. This may cause weakening (strengthening) of the climatologicalhigh pressure over north Pacific and its eastward (westward) shift during thesubsequent monsoon season. This modifies the circulation pattern and the typhoontracks over the <strong>No</strong>rth Pacific, which may influence monsoon performance (Saha etal. 1981).The positive relationship between the SST anomalies over the southeastIndian Ocean during February - March was discussed by Rajeevan et al. (2002) andTerray et al. (2003). Positive (negative) SST anomaly induces more (less) thannormal evaporation and hence more (less) than normal moisture availability to becarried to the monsoon region by the low- level monsoon cross equatorial flow (Pai2003). The interannual variation in the SST anomalies over the Southeast Indianregion can be due to the changes in the solar surface heating caused by thechanges in the cloud cover or resulted from the changes in the low level dynamicalflow pattern (Behera and Yamagata 2001, Rajeevan et al. 2002).The concurrent robust inverse relationship between ISMR and SSTanomalies over various regions over the equatorial Pacific is well known (Sikka1980, Pant and Parthasarathy 1981, Angell 1981, Rasmusson and Carpenter 1983,Krishna Kumar et al. 1999). The positive (negative) NINO 3.4 anomaly tendencyrepresents the strengthening (weakening) of the El Nino over the Pacific which isassociated with the weakening (strengthening) of the monsoon circulation over Asia.Several studies indicated that variability in ocean heat content or itsequivalent of volume of warm water (WWV) in the tropical Pacific is important ingoverning the evolution of the ENSO cycle. Warm Water Volume (WWV) in the8


tropical Pacific Ocean in boreal winter is a good precursor of ENSO warm and coldevents. Rajeevan and McPhaden (2004) examined the relationship between WWVand ISMR and found significant negative correlations between WWV anomalies inboreal winter and ISMR, thus providing a much longer lead time than ENSO SSTindicators for prediction of ensuing monsoon rainfall.The negative relationship between the surface Mean Sea Level Pressureanomaly over the <strong>No</strong>rth Atlantic (May) and ISMR can be explained through thevariation in the strength of the subtropical level climatological high pressure areaover the Atlantic. The variation in the strength or position of the climatological high inthe north Atlantic may be associated to the NAO.Another predictor from the Atlantic having significant and stable negativerelationship with ISMR was the <strong>No</strong>rth Atlantic SST anomalies (December +January). This predictor did not show any inter-correlation with other predictors(Table-3). East Asia pressure anomaly (February +March) also did not show intercorrelations.However, the possible physical mechanism behind the negativecorrelation between the SST anomalies over the <strong>No</strong>rth Atlantic and ISMR is not veryclear and is to be further explored.Fig.3 shows the 21 year moving C.C between ISMR and the 9 predictorsselected for model development. The horizontal dotted lines represent the C.C.significant at 5% level. As seen in the Fig.3, the relationship of each of the predictorswith ISMR was stable (C.C. near or above 5% significant level) during most of thedata period. Thus, we have chosen predictors having statistically significant andstable relationship, which are also physically linked with the ISMR variability.However, there were inter-correlations among some of the predictors. This can beseen in the Table-3, which depicts the lower triangle of the inter-correlation matrix ofall the 9 predictors used in both the SET-I and SET-II together. As the predictor J6has data only from 1958, the correlations were computed using the data for theperiod 1958-2000. Significant C.C values are shown by bold letters.Before using these predictors in the models, all the predictor time series werestandardized using the base period 1971-2000. The last predictor in the SET-II is a9


The most sensible approach for selecting a best subset out of availablevariables in a complex linear model is to compare all possible subsets. Thisprocedure simply fits all the possible regression models (i.e. all possiblecombinations of predictors) and chooses the best one (or more than one). Standardselection procedure like stepwise regression (forward or backward selection) hassome serious problems of selecting the best model. The problems in stepwiseregression method are described in detail by Quinn and Keough (2002). With theavailability of fast computing facilities, trying all possible combination of thepredictors is not a difficult task.In this study, we have fitted all the possible regression models (63 (= 2 6 -1) for6 predictors) and used jacknife method for the selection of 5 best models amongthem. For ‘m’ standardised predictors for ‘n’ years, the multiple linear regression(MR) model equation can be written as Y = BZ + ε. Where Y is the (nX1) predictand(ISMR) matrix, B is the (1Xm) matrix of regression coefficients, Z is the (nXm)matrix of predictors of model size m and ε is the (nX1) error matrix.Jacknife method (Crask and Perreault 1977; Tukey 1958) is the most suitablefor comparing the performance of different models when the data period is smalland testing is to be made using relatively longer period. In this method,development and testing of the models is possible in the same period. For SET-I,data for the period 1951-2000 and for SET-II, data for 1958-2000 were used for thispurpose. In the Jacknife method, for each of the possible models, predictions foreach of the years (say i th year) within the given data period of k years were doneusing the remaining k-1 years. In the case of SET-I, k is 50 (1951 to 2000) and thatfor SET-II, k is 43 (1958-2000).After preparing the hindcasts by a number of different regression models forthe given period, the best performing models can be selected using a suitablecriterion. This criterion is based on the error measure statistics computed for the testperiod. While comparing the performance, it is possible that the error measures ofvarious models particularly with different model size are approximately equal. Insuch cases, the principle of parsimony (Box et al. 1994) suggests to give preference11


to a model with smallest model size. There are also various error measures which inits definition include adjustment for the number of predictors used in the model.Some of the commonly used error measures for selecting among models withdifferent model sizes (Quinn and Keough, 2002) are adjusted R 2 2( R ), Mallow’s C p ,Akaike Information Criterion (AIC) and Schwarz Bayesian Information Criterion(BIC). R is the estimated multiple correlation coefficient (R) adjusted for the numberof predictors used in the model. Among various models, the best model is the onehaving the highest R2value. The Mallow’s C p works by comparing a specific reducedmodel to the full model with all predictors included. Among various models, the bestmodel is the one that having the lowest C p value. Delsole and Shukla (2002) usedthis criterion to select best subset of predictors. The remaining two measures (AICand BIC) are in the category information criteria, accounting for both sample sizeand number of predictors.In this study, the selection of best among the all possible regression modelswas done based on the R 2 criteria.equation.2R is related to the R 2 by the followingR2= R22p(1−R )−(k − p − 1)size. TheWhere k is the number of years in the validation period and p is the model2R can also be computed using the following formula.R2SS1−SS/(k − p − 1)residual=,total/(k− 1))2Where, SS ( = ∑ (Y − Y ) is the residual sum of squares andresidual)2SStotal( = ∑ (Y − Y) ) is the sum of squares of deviation from mean. Y, Y & Y ) arerespectively the actual, mean and predicted values of the predictand. SS residual & R 222in turn are related by the equation SS(Y − Y) (1−R )resudual= ∑ .12


The second step in the EMR model development involved of determination ofthe correct model development period for each of the selected best models assuggested by Nicholls (1984). Once the best model development period wasdetermined, forecast for each year was made by the selected best models, bykeeping this parameter constant.For this purpose, the performance of the model predictions for a fixedcommon period 1986-2000 was examined using a sliding development periodtechnique (retroactive technique) for all the possible development periods (Mason etal. 1996). For example, for a case of development period of 15 years, data for 1971to 1985 were used for predicting rainfall in 1986 and data for 1982 to 1986 wereused for predicting the 1987 value and so on. Similarly for a case of developmentperiod of 25 years, data for 1961 to 1985 were used for predicting the 1986 value.In this case, the comparison needs to be done of the performance of predictionsover a fixed time period obtained with the same set predictors but using differentdevelopment periods. Therefore, we did not use an error measure that is adjustedfor model size for the comparison of performance. We used R and root meansquared error (RMSE) as the error measure statistics for the purpose. The RMSEwas computed using the following equation;∑ (Y − Y))RMSE =k − 12In this report, the unit of the RMSE is percentage departure from LPA of theseasonal ISMR. For each of the selected best models, the RMSE and R for theperiod 1986-2000 were computed for all the possible development periods (startingfrom 8 years). <strong>No</strong>w for each of the selected 5 models, the best development periodwas that period for which the RMSE of predictions during 1986 to 2000 was theleast.The final step in the development of the EMR models was the computation ofensemble forecasts. Forecasts for each of the 5 selected models were prepared by13


using the best development period and retroactive technique. The ensembleforecast for each year was then computed as the simple average of the forecasts forthat year obtained from the best 5 models.The skill of the models was also measured by calculating the model statisticssuch as C.C. between actual and predicted ISMR (R) and bias in the modelpredictions (BIAS) in addition to RMSE. The BIAS has also the same units as that ofRMSE. The BIAS was computed using the following equation.)∑ (Y − Y)BIAS =k4.b.Principal Component - Regression (PCR) ModelsThe PCR technique is recommended when there is significant intercorrelationamong the predictors. The PCR model avoids the inter-correlation andhelps to reduce the degrees of freedom by restricting the number of predictors. Asseen in the Table-3, there is significant inter-correlation among some of thepredictors in SET-I and SET-II. Singh and Pai (1996) used Principal Component-Regression (PCR) model for the prediction of the seasonal (June-September) ISMRfor the country as a whole based on predictors from the Indian Ocean only.Rajeevan et al. (2000) on the other hand used PCR model for the prediction ofseasonal summer monsoon rainfall over two homogeneous regions of India.To develop a PCR model, Principle Component Analysis (PCA) is first appliedon the predictor set and then few PCs (principle components) having highest C.Cwith the predictand (ISMR) are selected. The selected PCs are than related to therainfall time series using a multiple regression equation. In this study, PCR modelsfor the first and second stage forecasts (PCR-I and PCR-II) were developed usingpredictor sets SET-I and SET-II respectively. The general mathematical formulationof PCR model is given below in brief.A set of time series of ‘m’ inter-correlated standardised variables for ‘n’ yearscan be represented by (nXm) matrix, Z= [Z ij : i=1,…..,n;j=1,…..,m]. Z is transformed14


into E via the matrix transformation, eij=Z ij a ji , using empirical orthogonal functionanalysis (EOFA). Here, e ij is empirical orthogonal variable for j th empirical orthogonalvariable for i th year and a ji for i th empirical orthogonal weight for j th variable. In matrixform above equation can be written as: E = ZAIf in addition, the matrices E and A obtained from EOFA are rescaledaccording to: L= AD 1/2 and F = ED 1/2 ; the complete procedure is called PCA. Here Fis (nXm) matrix of PC scores each having zero mean and unit variance (i.e. the PCscores are standardized), L is (mXm) matrix of PC loadings and D is the (mXm)diagonal matrix whose elements are eigen values of correlation matrix C (= 1/nZ T Z)arranged in descending order of magnitude.If we select any ‘p’ modes of the PC scores (p


the most “interesting” projections. Huber (1985) proposed Projection pursuit (PP) asan alternative to the traditional principal component technique to reduce a highdimensionaldata set into a lower-dimensional one. One such application of thismethod was the Projection Pursuit Regression. This new method of nonparametricmultiple regression was proposed by Friedman and Stuetzle (1981). Chan et al.(1998) demonstrated that this technique of regression analysis is superior totraditional multiple regression analysis, because the PPR technique is capable ofidentifying nonlinear relationships between the predictand and the predictors. Chanand Shi (1999) used this technique to develop models for prediction of summermonsoon rainfall over south China and obtained useful results.The mathematical formulation of the PPR technique is explained below.A classical linear model expresses the predictand Y as linear functions of thepredictor variablesY(zp1, z2, z3, − − − zp) = α0+ ∑ αjzj=1jThe PPR technique has the formY(z1,z2,z3,− − − zp) = ∑ fMmm=1(αTmZ)TmWhere Z is the column vector of predictor variables (z 1 , z 2 , ………, z p ) andα is the row vector of the coefficients α m(j), j = 1,2,.....p for the j th response variable.That isα = ∑α (j)z and f m is the single-valued (ridge) functions of a single variable,Tmmjwith M being the number of such functions (i.e., the number of projections). Thus inthe PPR, each response is expressed as the sum of functions of linear combinationsof predictor variables; whereas in linear regression, each response is expressed asthe linear combination of predictor variables. The estimation of the parameters oflinear combination α T mas well as the functions f m was done by the least squaresmethod (Chan et al. 1998).16


Two PPR models were developed, one for the first stage forecasts (PPR-I) inApril using the SET-I data and second for the second stage forecast (PPR-II) in Juneusing the SET-II data. In this study, M was set to 2 (representing two projections, themaximum being p, which is the number of potential projections). Thus the predictandwas expressed as the sum of the functions of first two projections. In this case also,retroactive technique was used to make the independent predictions.4.d.Artificial Neural Network (ANN) ModelsThe methodology adopted for the ANN models was almost similar to thatused for the PCR model. As the in the case of the PCR models, PCA was firstapplied on the predictor set to select best PCs that having highest C.C with ISMR.The selected PCs were then related to the predictand through an Artificial NeuralNetwork (ANN), whereas in the case of PCR model, a multiple regression equationwas used for this purpose. For this case also retroactive technique was used for theindependent predictions.There are many studies (Navone and Ceccatto 1995, Venkatesan et al. 1997,Guhathakurta et al. 1999) which used ANN for long range forecasting of seasonalmonsoon rainfall over India. Rajeevan et al. (2000) used ANN with one hidden layerof 3 neurons for the Longrange forecasts of summer monsoon rainfall over<strong>No</strong>rthwest and Peninsular India.For the development of ANN model, we used a simple feed forward mappingnetwork with one hidden layer consisting of two tangent sigmoid neurons. Theneurons of the hidden layer were connected to the output layer of single linearneuron. The symbolic diagram showing the architecture of the neural network usedin this study is shown in the Fig.4.In the hidden layer, the value of the j th neuron is H j , is computed in terms ofthe input values x i by a tangent sigmoidal function as,H j = 2/(1+exp(-2 * Net j ))-117


Where, Net j denotes the weighted sum of inputs for j th hidden layer neuron which isdefined asNetpj= ∑ Wjixi+ θj,i=1Where W ji denotes the weight between i th input layer neuron & j th hiddenlayer, x i represents the i th element of input pattern and θ j represents the bias weightfor the j th hidden layer neuron. <strong>No</strong>w the output Y is computed as:Y = W H1+ W2H21+δW 1 & W 2 are the weights between output neuron and H 1 & H 2 respectivelyand δ is the bias weight.There are many back propagation training algorithms. We have used theResilient Back propagation algorithm (Riedmiller and Braun 1993) to train thenetwork. For each case, 20 model runs were made by changing the initial guessvalues of weights. The predicted output was then assumed as the average of thevalues obtained during the 20 independent model runs. For the development of theANN models we have used algorithms available in the Matlab software.5. Results and Discussions5.a. EMR Models2Fig.5a shows the R plotted against the number of predictors for all possiblemodels (63 models) derived from the SET-I. Fig.5 a clearly shows that the increasein the number of model predictors need not essentially improve the modelperformance. Fig.5 b shows the same details like Fig.5a but for all possible modelsfrom SET-II. The Tables 4a & 4b respectively show the details of 5 best modelsselected using the maximum2R criteria.18


Further, the second task of identification of best development period wasundertaken. The RMSE and R were calculated for all the possible developmentperiods (8 to 35 years) for each of the 10 MR models listed in the Tables 4a & 4b.Fig.6 shows the variation of R & RMSE with the development period for Model 4shown in Table -4a, which has 5 predictors of SET-I as the input. It is seen that foraround 23 years of development period, the RMSE has its minimum value and the Rhas maximum value indicating the best model performance around this period. Thus,we can assume that the best model development period for this particular modelwas around 23 years. Similar exercise on other models revealed that thebest/correct model development period for all the models were in general around 21-25 years. To keep uniformity in the further analysis, we have taken 23 years as themodel development period for all the models.For the independent verification of the models, independent forecasts weregenerated with the retroactive technique with development period as 23 years. Asthe data period of SET-I was 1951-2005, the independent predictions bycorresponding 5 best models were possible for the period of 32 years (1974 to2005). In the case of SET-II, as the data period was 1958-2005, the independentpredictions by the corresponding 5 best models were possible only for the period of25 years (1981 to 2005).Fig.7 shows the model predictions for the period 1974-2005 by the 5 EMR-1models, and their ensemble average. Similarly, Fig.8 shows the model predictionsfor the period 1981-2005 by the 5 EMR-II models and their ensemble average.Table-5 shows the model performance statistics of EMR-I and EMR-II models. Asseen in the Table-5, RMSE for the period 1981-2004 of the best 5 regression modelsused in the EMR-I model varied in the range of 7.18-8.37 % of LPA. The RMSE ofthe ensemble model was 7.54% of LPA. In the case of EMR-II, the RMSE for thesame period (1981-2004) of the selected 5 best models was in the range of 5.91-6.82 % of LPA and that of the ensemble forecast was 5.89 % of LPA. This indicatesthat the performance of the EMR-II and the individual MR models used in itsensemble was better than that of EMR-I model and the individual MR models usedin its ensemble. The RMSE of predictions based on climatology was 10.3% of LPA.19


The climatological forecast for a given year was obtained as the average of ISMRduring the just preceding 23 years to a given year. Thus the performance of modelsEMR-I & II was far better than that of the model based on climatology.As seen in the Fig.7, during all the extreme years like 1974, 1975, 1979,1982, 1983, 1986, 1987, 2002 & 2004, both the individual predictions as well as theirensemble average showed same sign as that of the actual ISMR. Even themagnitudes were closer to actual values. But the actual advantage of the ensembleforecast was more evident in the recent drought years of 2002 and 2004. In the caseof 2004, the individual ISMR forecasts by the best 5 MR models ranged from -8% to-16%, whereas the ensemble forecast was -13%, which was closer to actual value.In the case of 2002, two of the five MR model predictions for ISMR were in the rangeof -1% to -2 % and the remaining MR model predictions were in the range of -7% to -8%. The resultant ensemble average was -5%. Thus the ensemble method providespractically better predictions then the individual MR models. However, there alsoyears like 1985, 1992, 1996 & 1997 when the sign of the ensemble average wasopposite to that of actual. Of these 4 years, during the latter two years both theensemble model forecast and actual ISMR values were close to zero.As seen in the Fig.8, predictions from EMR-II as well as the individual MRmodels in the ensemble showed same sign as that of the actual ISMR. Themagnitudes were also closer to actual values. However, in 3 years (1985, 1989 and1997) the sign of the EMR-II predictions were opposite to that of the actual ISMR.On comparing the Figures 8 & 9, it is clear that the year to year performance ofEMR-II during most of the years (particularly in recent years from 1998), was betterthan that of EMR-I. This was also evident from model statistics presented in theTable-5.5.b.PCR ModelsFor the independent predictions using the PCR model, retro active techniquewith development period of 23 years was used. The performance of the PCR-I &PCR-II models is shown in Fig.9. The corresponding model performance parameters20


are shown in Table-6. The RMSE calculated for the models PCR-I & PCR-II for thecommon period of 1981-2004 were 7.93% and 6.23% respectively. This indicatesbetter performance of PCR-II model in comparison to that of PCR-I. These RMSEvalues were also smaller than the RMSE of the climatology based model. Onexamining the Fig.9, it can be seen that both the models were able to predict thecorrect sign as well as the magnitude of the ISMR during most of the years.Particularly during the extreme monsoon rainfall years, the predictions were close toactual value. However, the PCR-I failed to predict correct sign of the actual ISMRduring the years 1975, 1981, 1984 & 1992. The year 1975 was an excess monsoonyear and other three years were normal years. Similarly, the model PCR-II failed topredict correct sign of the actual ISMR during the years 1981, 1985, 1989 & 1997,which were all normal monsoon years. In the year 1993, when the actual value ofISMR was 0, the PCR-I predicted positive sign; whereas the PCR-II predictednegative sign. However, 1998 onwards, both the models were able predict the signof the ISMR every year. Further, both these models were able to predict the recenttwo drought years (2002 & 2004) correctly.5.C.PPR ModelsAs in the case of EMR and PCR models, 23 years were taken as thedevelopment period in the retro-active technique for the independent predictions byTPPR models. The relationship between the first projection predictor Z 1= α X ) and(1ISMR is shown in the Fig. 10a. It is clear that there is a linear relationship betweenTZ 1 and ISMR. However, relationship between the residual δ x)= Y − f 1( α X )(1obtained after removing the contribution of Z 1 in the predictand showed a nonlinearTrelationship with the second projection predictor Z 2= α X ) . This is depicted in(2Fig.10b. Thus the first two most interesting projections of the predictor setrepresented the linear and nonlinear relationships respectively between the ISMRand the predictors. This is one of the import advantages of PPR models incomparison to the models based on linear regression.The model performance parameters of both the PPR models are given in theTable-7. The RMSE calculated for the models PPR-I & PPR-II for the common21


period of 1981-2004 were 6.89% and 6.42% respectively. The year to yearperformance of the PPR models is given in the Fig.11. As seen in Fig.11, both themodels were successful in predicting the ISMR during most of the years andparticularly during the extreme years. However, the PPR-I model failed to predict thesign of ISMR correctly I four normal monsoon years (1981, 1992, 1996 & 1999).Similarly, The PPR-II model failed to correctly predict the sign of ISMR in threenormal monsoon years (1985, 1990 &1996). During the year 1993 (ISMR =0), thesign of the predictions from both models were negative. On the other hand, during1995 (ISMR=0), the sign of the predictions were positive. Like the EMR and PCRmodels, the predictions by PPR models were all correct during the recent years. TheRMSE of predictions by both the models for the common period (1981-2004) weresmaller than that of the predictions by climatology based model. However, as seenin the Table-7, the PPR-II performed better than the PPR-I.5.d.ANN ModelsThe performance of the ANN-I & ANN-II models is shown in Fig. 12. For ANNmodels also, 23 years of data were used as the development period in the retroactivetechnique. The performance parameters of the ANN models are shown inTable-8. The RMSE of the predictions by the models ANN-I & ANN-II for thecommon period 1981-2004 were 7.68% and 5.54% respectively. By taking othermodel statistics given in the Table-8 also into account, it is clear that theperformance of ANN-II model was better than that of ANN-I. RMSE values of boththe ANN models were also smaller than the RMSE values of model based onclimatology.Like PCR models, both the ANN models were able to make correctpredictions of ISMR during most of the years. Particularly during the extrememonsoon rainfall years, the predictions were close to actual values.6. ConclusionsAs per the present operational Long Range Forecasting system, <strong>IMD</strong> issuesfirst stage forecast for the seasonal monsoon rainfall over the country as a whole in22


April and the second stage or update forecast in June. These forecasts are based onthe statistical models developed indigenously. Because of the inherent problems inthe statistical models such as epochal variation in the predictand-predictorrelationship, inter-correlation between the predictors, changing predictability etc.,there is necessity of subjecting statistical models to a constant scrutiny and changesif necessary (Rajeevan et al. 2004). The changes can be brought out by differentways like changing the model size, use of new predictors, changing the combinationof the predictors, changing the model development period etc. The changes areacceptable when the modified model shows better performance compared to theexisting model during a common test period. The continuing efforts towards this endand attempt to adopt better statistical techniques have resulted in the developmentof number of new models that make use novel statistical ideas for forecasting ofseasonal ISMR over the country as whole.Four models namely EMR-I, PCR-I, PPR-I & ANN-I which were based on apredictor set needing data up to March only were developed to support the firststage forecast. Another four models namely EMR-II, PCR-II, PPR-II & ANN-II basedon another data set requiring data up to May were developed to support the secondstage forecast. All the models showed good skill in forecasting the ISMR during mostof the years correctly. Particularly, during most of the extreme ISMR years, thepredicted ISMR was close to the actual value. All these models were also able tocorrectly predict the recent two drought monsoon years (2002 & 2004). The RMSEduring the independent forecasting period for all the models (used for both first andsecond stage forecast) was relatively smaller than that for the model based onclimatology. The models developed for the second stage or update forecast showedbetter skill in predicting year to year variation of ISMR in comparison to thatdeveloped for the first stage forecast.This is the first time that the ensemble technique was applied in statisticalmodels for predicting ISMR. The use of ensemble average of predictions from the 5best MR models having different predictor combinations as the final predictioneffectively reduces the error resulting from the use of single model that having lowerRMSE. However, it may be mentioned that the reduction of error by the combineduse of a set (ensemble) of models in place of a single model means only less23


chance of failure but no guarantee of correct forecasts in all the occasions. Here wehave used predictions from the best 5 models among all possible multiple regressionmodels. But other techniques such as neural network, projection pursuit regressionetc can also be used in place of multiple regression technique. We have alreadymade such experiments successfully and found that there are not much significantchanges from the results that we have presented here.The PPR is a new nonparametric technique, used for the first time for theprediction of ISMR. In this method, the ISMR was expressed as sum of two parts.The first part was computed from the first most interesting projection thatrepresented the linear relationship between predictors and the predictand. Thesecond part was computed from the second most interesting projection thatrepresented non linear relationship between predictor and ISMR. Iyengar and RaguKanth (2003) suggested a method of separating the ISMR series into linear and nonlinear parts using a decomposition technique and then predicting them individually.But their method used the time series of ISMR itself only for this purpose and did notuse any predictor parameters. In the PPR method, the predictors (derived fromslowing varying boundary conditions) that having significant physical and statisticalrelationship with ISMR were used to predict the linear and non linear parts of theISMR. Thus the PPR model is conceptually better placed to take into account thereal non linear relationship that may exist between the predictors and the predictandwhile making prediction.In both the PCR and ANN models, the use of selected PCs as the input wasto reduce the predictor dimension and effect due to the inter-correlation among thepredictors if any. Although the ANN method generally claims to have superiorproperties such as adaptivity, robustness, ruggedness, speed (via massiveparallelism), non-linearity and optimality with respect to error compared to linerregression the results in this study did not show much difference as far as theprediction of the ISMR is concerned (see the figures 9 and 12) during the dataperiod considered here. However, it may be mentioned that in this study we haveused only a simple feed forward network having only a single hidden layer. Use ofmore complex neural network with better error propagation algorithm may be able to24


account the variability more accurately but only at the cost of reduced degree offreedom as the more complex network will add more nodes and more dimensions.Availability of predictions from 4 different models for the similar period butbased on different technqiue provided us an ideal opportunity to try super ensemblemethod of prediction like that proposed by Krishnamurti et al. (2000) for dynamicalmodels. The Fig.13 shows the super-ensemble prediction of ISMR for the first stageforecasts derived as the average of the predictions generated from the EMR, PCR,PPR and ANN the super-ensemble prediction of ISMR for the second stageforecasts. Table-9 shows the model statistics of the super ensemble models for thefirst stage and second stage forecasts. As seen in the Figures 13 & 14, noticeablegeneral improvement was not observed with super ensemble method over thepredictions by the individual models used in the super ensemble. However, as seenin the Fig.13, in some years like 1984, 1993, 1996 & 1997, when the predictionsusing the 4 different models showed large deviation on either side of the actualvalues, the super ensemble average was closer to actual value. This only indicatesscope for the use of super ensemble technique in the predictions based on statisticalmodels if number of models is available in the ensemble and there is large deviationin the predictions derived from these models.It is interesting to note that we have used many predictors from Europe andAtlantic Ocean for the development of LRF models. This is due to improvedstatistical relationship of ISMR with the Atlantic and Europe climate anomalies.Recent studies (Gadgil et al. 2003 and Gadgil et al. 2004) have shown the influenceof equatorial Indian Ocean on the variability of ISMR. We need to explore precursorsfor the events in the equatorial Indian Ocean (like EQUINOO) to use as predictors inthe empirical models along with other ENSO-related predictors.Experimental forecasts for 2005In view of the satisfactory performance of these models during the recentyears, experimental forecasts for the year 2005 were generated. To generateensemble forecast (using EMR-I) for the year 2005, the best models were derivedusing data of 1951-2000 as we discussed earlier. Similarly, the best models for25


EMR-II were based on the data of 1958-2000. The forecasts for the 2005 monsoonseason are shown in Table -10. As seen in the Table-10, the first stage forecasts byvarious models varied from 101 to 103 and the second stage forecasts varied from102 to 107.Various components of the Indian monsoon exhibit significant inter-decadalvariability (see the references given in Goswami, 2005). Modulation of inter-annualvariability by the inter-decadal variability influences predictability of the seasonalmean monsoon (Goswami 2005). A recent study by Goswami (2004) revealed thatpotential predictability of monthly mean summer monsoon climate has decreased byalmost a factor of two during the recent decades (1980s and 1990s) compared to thedecades of 1950s and 1960s associated with the major inter-decadal transition ofclimate in mid 1970s. Therefore, to generate forecasts for 2005 monsoon using thelatest data, say after mid 1970s, we have done yet another experiment. Wegenerated the first and second stage forecasts for the year 2005 using the followingsteps. i) Best models were identified by Jacknife method using predictor series oflatest 23 years (1982-2004) and ii) the ensemble forecast was computed as thesimple average of the predictions using the selected 5 models. We found that theselected 5 models were entirely different from the best five models shown in theTables-4a and 4b. The first stage forecast for the year 2005 using new EMR-I was100% and that for the second stage forecast using new EMR-II was 103% of LPA.For the PPR model, corresponding forecasts were 98% and 102% of LPArespectively.Preliminary forecasts from these experimental models were included in theofficial press releases made by <strong>IMD</strong> on the long range forecasts for the 2005southwest monsoon rainfall (http://www.imd.ernet.in).AcknowledgementsWe are thankful to Prof. V. S. Ramamurthy, Secretary, DST, Shri. B. Lal,LACD DGM, Shri. S R. Kalsi, ADGM(S) and Dr.(Mrs) N. Jayanthi, LACD ADGM(R)for encouragements and kind support given to us. Some of these results were26


presented to the scientists at the Indian Institute of Science and Indian StatisticalInstitute, Bangalore. We are thankful to them for their valuable comments. We alsoappreciate the sincere efforts made by Shri. J. D. Kale, AMI and Shri. S. G.Nargund, AM II and other staff members of LRF division. We thank them for theirsupport.27


ReferencesAngell J. K., 1981, Comparison of variations in atmospheric quantities with SSTVariations in equatorial eastern Pacific. Mon. Wea. Rev. 109, 230-243.Bamzai A. S., Shukla J., 1999, Relationship between Eurasian snow cover, snowdepth, and the Indian summer monsoon: An observational study. J Climate12, 3117-3132.Behera, S.K. and Yamagata, T., 2001, Subtropical SST dipole events in thesouthern Indian Ocean, Geophys. Res. Letters., 28, 327-330.Box, G. E. P., Jenkins, G. M. and Reinsel, G.C., 1994, Time Series Analysis,Forecasting and Control, 3rd ed. Prentice Hall, Englewood Clifs, NJ.Chan, J. C.L. and Shi, J., 1999, Prediction of the summer monsoon rainfall overSouth China. Int. J. Climatol., 19, 1255Chan, J. C. L., Shi, J. and Lam, C., 1998, Seasonal Forecasting of Tropical CycloneActivity over the Western <strong>No</strong>rth Pacific and the South China Sea, J. Wea.Foreca., 13, 997-1004.Crask, M. R. and Perreault, W. D., Jr., 1977, Validation of discriminant analysis inmarketing research. Journal of Marketing Research, 14, 60-68.Delsole, T., Shukla J., 2002, Linear prediction of the Indian monsoon rainfall. Centrefor Ocean-Land-Atmosphere Studies (COLA) Technical <strong>Report</strong>. CTR 114-58.Friedman, J. H. and Stuetzle, W., 1981, Projection pursuit regression. J. Amer.Statist. Assoc. 76, 817-823.Friedman, J. H. and Tukey, J. W., 1974, A projection pursuit algorithm forexploratory data analysis. IEEE Trans. Comput. 23, 881-890.Gadgil, S., Vinaychandran, P.N., Francis, P.A., 2003, Droughts of the Indiansummer monsoon: Role of clouds over the Indian ocean, Curr. Science, 84,1713-1719.28


Gadgil, S., Vinaychandran, P.N., Francis, P.A., Gadgil, S., Extremes of Indiansummer monsoon rainfall, ENSO, equatorial Indian Ocean Oscillation,Geophys.Res.Letters, 2004, 31, doi.10.1029/2004GL019733.Gadgil, S. and Sajani, S., 1998, Monsoon precipitation in the AMIP runs, ClimateDynamics, 14, 659-689.Gadgil, S., Rajeevan, M. and Nanjundiah, R., 2005, Monsoon prediction - Why yetanother failure?. Current Science 88(9), 1389-1400.Goswami, B,N., 2005, The Asian Monsoon: Interdecadal Variability, In GlobalMonsoon System: Research and Forecast, Edited by C.P Chang, Bin Wangand N.C.Lau, Chapter 26, 455.Goswami, B.N., 2004, Interdecadal changes in potential predictability of the Indiansummer monsoon, Geophys.Res.Letters, 31, L16208, doi, 10.1029/2004GL020337.Goswami, P. and Srividya, P., 1996, A Neural Network design for long rangeprediction of rainfall pattern, Curr. Sci., 70, 447-457.Gowariker, V., Thapliyal, V., Kulshrestha, S. M., Mandal, G. S., Sen Roy, N andSikka, D. R., 1991, A power regression model for long range forecast ofsouthwest monsoon rainfall over India, Mausam, 42, 125-130.Gowariker, V., Thapliyal, V., Sarker, R. P., Mandal G. S., Sikka, D. R., 1989,Parametric and power regression models: New approach to long rangeforecasting of monsoon rainfall in India, Mausam, 40, 115-122.Guhathakurta, P., Rajeevan, M. and Thapliyal, V., 1999, Long range forecastingIndian summer monsoon rainfall by a Principal component Neural NetworkModel. Met and Atmos. Phys, 71, 255-266.Hastenrath, S. and L. Greischar, 1993, Further work on the prediction of northeastBrazil rainfall anomalies. J. Climate, 6, 743-758Hastenrath, S., 1995, Recent advances in tropical climate prediction, J. Climate, 8,1519-1532.Huber, P., 1985, Projection pursuit, The Annals of Statistics 13, 435–475.Iyengar R.N and Raghukanth, S.T.G., 2003, Empirical Modelling and Forecastingof Indian Monsoon Rainfall. Current Science, 85, 8, 1189-1201.29


Iyengar R.N and Raghukanth, S.T.G., 2004, Intrinsic mode functions and a strategyfor forecasting Indian monsoon rainfall . Met. and Atmos. Physics. 90, 17-36,Jagannathan, P., 1960, Seasonal Forecasting in India: A review, <strong>IMD</strong> <strong>Pune</strong>, 80.Kalnay, E. et al., 1996, The NCEP/NCAR reanalysis project. Bull. Amer. Meteor.Soc. 77, 437-471.Kang, I.S. and co authors, 2002, Intercomparision of the Climatological variations ofAsian Summer Monsoon Precipitation simulated by 10 GCMS, ClimateDynamics, 19, 383-395.Kishtawal, C. M., Basu, S., Patadia, F. and Thapliyal, P. K., 2003, Forecastingsummer monsoon rainfall over India using genetic algorithm. Geophys. Res.Lett., 2003, 30, doi.10.1029/2003GL018504.Krishna Kumar, K., Rajagopalan B., Cane M. A., 1999, On the weakeningrelationship between the Indian monsoon and ENSO. Science 284, 2156-2159.Krishna Kumar, K., Soman M. K. and Rupa Kumar, K., 1995, Seasonal forecastingof Indian summer monsoon rainfall. Weather. 50, 449-467.Krishna Kumar, K., Hoerling, M. and Rajagopalan, B. 2005: Advancing dynamicalprediction of Indian monsoon rainfall. Geophys. Res. Lett., 32(8), L08704,10.1029/2004GL021979, 4.Krishnamurti, T.N., Kishtawal, C.M., LaRow, T.E., Bachiochi, D.R., Zhang, Z.,Williford, C.E., Gadgil, S., and Surendran, S., 2000, Multimodel ensembleforecasts for weather and seasonal climate, J. Climate, 13, 4196-4216.Kung, E. C. and Sharif, T. A., 1982, Long range forecasting of the Indian Summermonsoon onset and rainfall with upper air parameters and SST, J. Met. Soc.Japan, 60, 672-681.Latif, M., Stockdale, T., Wolff, J., Burgers, G., Maier-Reimer, E., Junge, M. M., Arpe,K. and Bengtsson, L., 1994, Climatology and Variability in the ECHO coupledGCM. Tellus, 46A, 351-366.Mason, S. J., Joubert, A. M. C. Cosijn, and Crimp, S.J. 1996: Review of seasonalforecasting techniques and their applicability to southern Africa. Water SA,22, 203-209.30


McBride, J.L. and Nicholls, N., 1983, Seasonal relationships between Australianrainfall and the Southern Oscillation. Monthly Weather Review, 111, 1998-2004.McPhaden, M.J., 2003, Tropical Pacific Ocean heat content variations and ENSOpersistence barriers, Geophys. Res. Letters, 30, 1480, doi: 10.1029/2003GL016872.McPhaden, M.J., Busalacchi, A.J. Cheney, R. Donguy, J.R. Gage, K.S. Halpern, D.Ji, M. Julian, P. Meyers, G. Mitchum, G.T. Niiler, P.P. Picaut, J. Reynolds,R.W. Smith, N. and Takeuchi, K. 1998, The Tropical Ocean GlobalAtmosphere (TOGA) observing system: a decade of progress. J. Geophys.Res., 103, 14, 169-14, 240.Meinen, C.S. and McPhaden, M.J. 2000, Observations of warm water volumechanges in the equatorial Pacific and their relationship to El Niño and LaNiña, J. Clim., 13, 3551-3559.Navone, H. P. and Ceccato, H. A., 1995, Predicting Indian monsoon rainfall: Aneural network approach, Climate Dynamics, 10, 305-312.Nicholls, N., 1984, The stability of Empirical Long Range Forecast Techniques: Acase study, J. Clim. A l. Meteorol., 143-147.<strong>No</strong>rmand, C., 1953, Monsoon seasonal forecasting. Quart. J. Roy. Meteor. Soc.,79, 463–473.Pai D.S., 2003, Teleconnections of Indian summer monsoon with global surface airtemperature anomalies. Mausam 54, 407-418.Pai D.S., 2004, A possible mechanism for the weakening of the El Nino-Monsoonrelationship during recent decade. Met. Atmos. Physics 86, 143-157.Pai. D.S. and Rajeevan, M., 2005, Long range prediction models for the Indiansummer monsoon rainfall with different lead time periods based on the globalSST anomalies (Meteorol. Atmos. Physics [in Press]).Pant G.B., Parthasarathy B., 1981, Some aspects of an association between thesouthern oscillation and Indian summer monsoon. Arch. Met. Geophy. Biok.l.,B 29, 2, 245-252.31


Parthasarathy, B., Rupa Kumar, K. and Munot, A. A., 1996, Homogeneous regionalsummer monsoon rainfall over India: Interannual variability andteleconnections, IITM, <strong>Pune</strong>, RR <strong>No</strong>. 070.Quinn, G. P. and Keough, M.J. 2002, Experimental Design and Data Analysis forBiologists. Cambridge University Press. 520 .Rajeevan M., 2002, Winter surface pressure anomalies over Eurasia and Indiansummer monsoon, Geophys. Res. Lett. 29, 94.1-94.4.Rajeevan M., Pai D.S., Dikshit S.K., Kelkar R.R., 2004, <strong>IMD</strong>’s New OperationalModels for Long Range Forecast of South-west Monsoon Rainfall over Indiaand their Verification for 2003. Current Science 86, 422-431.Rajeevan M., Pai D.S., Thapliyal V., 1998, Spatial and Temporal Relationshipsbetween Global Land Surface Air Temperature Anomalies and IndianSummer Monsoon. Met. Atmos. Physics 66: 157-171.Rajeevan M., Pai D.S., Thapliyal V., 2002, Predictive relationships between IndianOcean sea surface temperatures and Indian summer monsoon rainfall.Mausam 53, 337-348.Rajeevan, M., 2001, Prediction of Indian summer monsoon: Status, problems andprospects, Current Science, 11, 1451-1457.Rajeevan, M., and McPhaden, M.J. 2004, Tropical Pacific upper ocean heat contentvariations and Indian summer monsoon rainfall, Geophys.Res.Letters, 31,L18203, doi.10.1029/2994GL020631.Rajeevan, M., Guhathakurta, P. and Thapliyal, V., 2000, New models for longrange forecasts of summer monsoon rainfall over <strong>No</strong>rthwest and PeninsularIndia, Meteorol. Atmos Phys., 73, 211-225.Raman C.R.V., Maliekal J.A., 1985, A northern oscillation relating <strong>No</strong>rthernHemisphere pressure anomalies and the Indian summer monsoon?. Nature314 , 430-432.Raman C.R.V., Rao Y.P. ,1981, Blocking highs over Asia and droughts over India.Nature 289, 271-273.Rao C.R., 1964, The use and interpretation of principal component analysis inapplied research. Sankhya (Indian Journ. of Statistics) 26, 329-358.32


Rasmusson, E.M., Carpenter T.H., 1983, The relationship between easternequatorial Pacific sea surface temperature and rainfall over India and SriLanka. Mon. Wea. Rev. 111, 517-528.Riedmiller, M. Braun, H. 1993, A direct adaptive method for faster back propagationlearning: The rprop algorithm, In Proceedings of the IEEE InternationalConference on Neural Networks, San Francisco, CA, 586--591.Saha, K, Sanders, F. and Shukla, J. 1981, Westward propagating predecessors ofmonsoon depressions. Mon. Wea. Rev., 109, 303–343.Sahai A.K., Grimm A.M., Satyan V., Pant G.B., 2003, Long-lead prediction of Indiansummer monsoon rainfall from global SST evolution, Climate Dynamics 20,855-863.Sikka D. R. (1980) Some aspects of the large scale fluctuations of summer monsoonrainfall over India in relation to fluctuations in the planetary and regional scalecirculation parameters. Proc Ind Acad Sci (Earth and Planetary Sci) 89,179-195.Singh, O. P. and Pai , D.S. , 1996, An oceanic model for the prediction of SWmonsoon Rainfall Over India, Mausam, 47, 91-98.Smith, T.M., and R.W. Reynolds, 2004, Improved Extended Reconstruction of SST(1854-1997). Journal of Climate, 17, 2466-2477.Terray, P., Delecluse, P., Labattu, S. and Terray. L., 2003, Sea surface temperatureassociations with the late Indian summer monsoon, Climate Dynamics, doi.10.1007/s00382-003-0354-0.Thapliyal, V., 1982, Stochastic Dynamic model for long range forecasting of summermonsoon rainfall in peninsular India, Mausam, 33, 399-404.Thapliyal, V. and Kulshrestha, S.M., 1992, Recent models for long-range forecastingof south-west monsoon rainfall over India, Mausam, 43, 239-248.Tukey, J. W., 1958, Bias and confidence in not-quite large samples. Annals ofMathematical Statistics, 29, 614.Venkatesan, C., Raskar, S. D., Tambe, S. S., Kulkarni, B. D. and Keshavamurty, R.N., 1997, Predication of all India summer monsoon rainfall using error-backpropagation Neural Networks, Meteorol. Atmos. Phys., 62, 225-240.Walker, G. T., 1914, A further study of relationships with Indian monsoon rainfall, II,Mem. India Meteorol. Dept., 23, 123-129.33


Walker, G. T., 1923, Correlation in seasonal variations of weather, VIII, Apreliminary study of world weather, Mem. India Meteorol. Dept., 24, 75-131.Walker, G. T., 1924, Correlation in seasonal variations of weather, IV, A furtherstudy of world weather, Mem. India Meteorol. Dept., 24, 275-332.Wang,B., Ding, Q., Fu, X., Kang, I., Jin, K., Shukla, J., Doblas-Reyes, F., 2005,Fundamental challenge in simulation and prediction of summer monsoonrainfall, Geophys. Res. Lett., 32, <strong>No</strong>. 15, L15711, doi. 10.1029/2005GL022734.Woodruff S.D., Diaz H.F., Elms J.D., Worley S.J., 1998, COADS release 2 data andmetadata enhancements for improvements of marine surface flux fields.Phys. Chem. Earth 23, 517-527.34


Table – 1 : Details of Predictors used for First Stage Forecast (SET-I)<strong>No</strong>. Parameter PeriodA1 <strong>No</strong>rth Atlantic SST Anomaly DEC+JANA2A3A4A5A6Equatorial SE Indian OceanSST AnomalyEast Asia Surface PressureAnomalyEurope Land Surface AirTemperature Anomaly<strong>No</strong>rthwest Europe surfacePressure Ano. TendencyWarm Water VolumeAnomalyFEB+MARFEB+MARSpatialDomain20N-30N,100W-80W20S-10S,100E-120E35N-45N,120E-140EC.C with ISMR(1951-2000)-0.400.450.33JAN 5 Stations 0.34DJF(0) –SON (-1)FEB+MAR65N-75N,20E-40E5S-5N,120E-80W-0.40-0.37Table – 2 : Details of Predictors used for Second Stage Forecast (SET-II)<strong>No</strong>. Parameter PeriodJ1 <strong>No</strong>rth Atlantic SST Anomaly DEC+JANJ2J3J4J5J6Equatorial SE Indian OceanSST AnomalyEast Asia Surface PressureAnomalyNino3.4 SST AnomalyTendency<strong>No</strong>rth Atlantic PressureAnomaly<strong>No</strong>rth Central Pacific ZonalWind Anomaly at 850hpaFEB+MARFEB+MARMAM(0) -DJF(0)MayMaySpatialDomain20N-30N,100W-80W20S-10S,100E-120E35N-45N,120E-140E5S-5N,170W-120W35N-45N,30W-10W5N-15N,180E-150WC.C with ISMR(1951-2000)-0.400.450.33-0.40-0.30-0.4035


Table – 3 : Inter-correlation among the predictors used in SET-I & SET-II. For the computation ofthe C.Cs the data for the period 1958-2000 were used. C.C. values significant at and above 5%level are shown in bold letters.A1/J1 A2/J2 A3/J3 A4 A5 A6 J4 J5 J6A1/J1 1.00A2/J2 0.02 1.00A3/J3 -0.29 0.18 1.00A4 -0.04 0.16 0.23 1.00A5 0.05 -0.06 0.04 -0.69 1.00A6 0.22 -0.47 -0.07 -0.06 -0.01 1.00J4 0.24 -0.53 -0.18 -0.14 -0.03 0.61 1.00J5 0.22 -0.06 -0.17 -0.39 0.46 0.21 0.01 1.00J6 0.08 -0.47 -0.04 -0.20 0.06 0.17 0.12 0.03 1.00Table - 4a : Best 5 models selected by Jacknife method on SET –I and theirerror statistics for the period 1951-2000.Regression Model<strong>No</strong> ofPredictorsError Measures(1951-2000)PredictorsR RMSE RModel 1 4 A1, A2, A4and A5 0.47 7.59 0.71Model 2 3 A1, A2 and A5 0.43 7.91 0.68Model 3 5 A1, A2, A3, A4 and A5 0.43 7.76 0.70Model 4 5 A1, A2, A3, A5 and A6 0.43 7.75 0.70Model 5 4 A1, A2, A4, and A6 0.40 8.02 0.67Table - 4b : Best 5 models selected by Jacknife method on SET -II and their errorstatistics for the period 1958-2000.Regression Model<strong>No</strong> ofPredictorsError Measures(1958-2000)PredictorsR RMSE RModel 1 5 J1, J3, J4, J5 and J6 0.56 6.76 0.78Model 2 4 J1, J3,J4 and J5 0.55 6.88 0.77Model 3 5 J1, J2, J3, J5 and J6 0.55 6.80 0.78Model 4 6 J1, J2, J3, J4, J5 and J6 0.55 6.74 0.78Model 5 4 J1, J2, J5and J6 0.54 6.99 0.7636


Table – 5 : Model statistics of the Ensemble Multiple Regression Models (EMR-I and EMR-II).The combination of predictors in the selected 5 MR models used forthe ensemble models are given in Table-3 a & b.ModelC.CEMR – IBias(% of LPA)RMSE(% of LPA)C.CEMR - IIBias(% ofLPA)RMSE(% ofLPA)1974-20041981-20041974-20041981-20041974-20041981-20041981-20041981-20041981-2004Model 1 0.84 0.84 2.56 4.11 6.78 7.34 0.81 -1.25 6.04Model 2 0.82 0.80 3.42 4.67 7.27 7.96 0.73 -0.08 6.82Model 3 0.85 0.84 2.35 3.80 6.94 7.73 0.76 0.53 6.67Model 4 0.82 0.86 2.52 4.12 6.79 7.18 0.83 -0.99 5.91Model 5 0.85 0.81 3.22 4.54 7.41 8.37 0.78 1.26 6.60Ensemble 0.84 0.84 2.81 4.23 6.82 7.54 0.81 -0.10 5.89Table – 6 : Model statistics of the Principle Component Regression Models (PCR-I and PCR-II).Model Performance ParametersModel C.CBiasRMSE(% of LPA) (% of LPA)1974-20041981-20041974-20041981-20041974-20041981-2004PCR-I 0.81 0.86 3.85 5.52 7.63 7.93PCR-II 0.81 0.47 6.23Table – 7 : Model statistics of the Projection Pursuit Regression Models (PPR-I and PPR-II).ModelC.CModel Performance ParametersBias(% of LPA)RMSE(% of LPA)1974-20041981-20041974-20041981-20041974-20041981-2004PPR-I 0.81 0.82 1.52 3.03 6.70 6.89PPR-II 0.78 -0.92 6.4237


Table - 8: Model statistics of the Artificial Neural Network Models (ANN-I and ANN-II).ModelC.CModel Performance ParametersBias(% of LPA)RMSE(% of LPA)1974-20041981-20041974-20041981-20041974-20041981-2004ANN-I 0.79 0.80 3.4 4.84 7.32 7.68ANN-II 0.83 -0.25 5.54Table - 9: Model statistics of the Super Ensemble Models (SEM-I and SEM-II).ModelC.CModel Performance ParametersBias(% of LPA)RMSE(% of LPA)1974-20041981-20041974-20041981-20041974-20041981-2004first Stage 0.84 0.86 2.89 4.44 6.51 6.88secondStage 0.83 -0.20 5.69Table - 10: Experimental forecasts for the 2005 Seasonal (June –September) Indian SummerMonsoon Rainfall for the country as whole by various models developed in the study.Modelfirst StageForecast(% of LPA)secondStageForecast(% of LPA)Empirical Multiple Regression (EMR) 101 102Principle Component Regression (PCR) 103 104Projection Pursuit Regression (PPR) 101 107Artificial Neural Network (ANN) 103 104Super Ensemble Model (SEM) 102 10438


Fig.1. Geographical locations of the 9 predictors used3N. ATLANTIC SST ANOMALY2STD. ISMR10-1-2-3-3 -2 -1 0 1 2 3STD. SST ANOMALYFig.2. Scatter plots of the ISMR against the 9 predictorsof SET-I & SET-II39


SE INDIAN OCEAN SST ANOMALYSTD. ISMR3210-1-2-3-3 -2 -1 0 1 2 3STD. SST ANOMALYEAST ASIA MSLP ANOMALYSTD. ISMR3210-1-2-3-4 -3 -2 -1 0 1 2 3 4STD. MSLP ANOMALYNW EUROPE SUR.AIR TEMP. ANOM.STD. ISMR3210-1-2-3-3 -2 -1 0 1 2 3STD. SUR. AIR TEMP. ANOMALYFig. 2. (contd.) Scatter plots of the ISMR against the 9 predictorsof SET-I & SET-II40


N. EUROPE MSLP TENDENCY ANOM.STD. ISMR3210-1-2-3-2 -1 0 1 2STD. MSLP TENDENCY ANOMALYWARM WATER VOLUME ANOM.STD. ISMR3210-1-2-3-3 -2 -1 0 1 2 3STD. WWV ANOMALYNINO3.4 SST ANOMALYSTD. ISMR3210-1-2-3-2 -1 0 1 2STD. SST ANOMALYFig. 2. (contd.) Scatter plots of the ISMR against the 9 predictorsof SET-I & SET-II41


N. ATLANTIC MSLP ANOMALYSTD. ISMR3210-1-2-3-3 -2 -1 0 1 2 3STD. MSLP ANOMALYNE NE PACIFIC U850 U850 ANOMALYSTD. ISMR3210-1-2-3-3 -2 -1 0 1 2 3STD. ZONAL WIND ANOMALYFig. 2. (contd.) Scatter plots of the ISMR against the 9 predictorsof SET-I & SET-II42


21 YEARS MOVING C.C BETWEEN ISMR AND THE PREDICTORS0.800.60N ATLEQSEINDO0.40NINO3.4Correlation Coefficient0.200.00-0.20-0.40EUROPEE ASIANATL-0.60NWEUR-0.80WWV-1.00NCPC_U850197019721974197619781980198219841986198819901992199419961998200020022004YEAR ENDING 21 YEARS MOVING WINDOWFig. 3.21 years moving C.C between ISMR and the 9 predictors ofSET-I & SET-IIFig. 4. A simple feed forward neural network with 1 hidden layer of 2 tangentsigmoid neurons and output layer with one linear neuron.43


ADJUSTED R-SQUARE Vs MODEL SIZE: DATA SET-I0.6Adjusted R-Square0.50.40.30.20.100 1 2 3 4 5 6 7Model SizeFig. 5a. Scatter plot of adjusted R-square (1951-2000) againstmodel size of all possible models from SET-IADJUSTED R-SQUARE Vs MODEL SIZE: DATA SET- II0.6Adjusted R-Square0.50.40.30.20.100 1 2 3 4 5 6 7Model SizeFig. 5b. Scatter plot of adjusted R-square against model size of allpossible models from SET-II44


SELECTION OF DEVELOPMENT PERIOD12RMSE0.911C.C0.85ADJUSTED RMSE109870.80.750.70.65CORRELATION COEFFICIENT(C.C)68 10 12 14 16 18 20 22 24 26 28 30 32 340.6DEVELOPMENT PERIODFig. 6. Line plots of RMSE and C.C computed for the predictionsduring the period 1986-2000 for different development periods.The predictions were done using retro-active techniquePERFORMANCE OF ENSEMBLE MODEL : APRIL FORECASTSRAINFALL (% DEP. FROM LPA)2520151050-5-10-15-20-2519741976197819801982198419861988YEAR19901992199419961998200020022004ACTUALMODEL1MODEL3MODEL5ENS_AVEMODEL2MODEL4Fig. 7. Performance of Ensemble MR Model for April Forecasts45


PERFORMANCE OF ENSEMBLE MODEL: JUNE FORECASTS2520RAINFALL (% DEP. FROM LPA)151050-5-10-15-20-2519811982198319841985198619871988198919901991ACTUALMODEL1MODEL3MODEL519921993199419951996YEARENS_AVEMODEL2MODEL4199719981999200020012002200320042005Fig. 8. Performance of Ensemble MR Model for April ForecastsPERFORMANCE OF PCR MODELSRAINFALL (% DEP. FROM LPA)2520151050-5-10-15-20-2519741976197819801982198419861988YEAR19901992199419961998200020022004ACTUAL PCR-I PCR-IIFig. 9. Performance of PCR Model Forecasts46


Fig. 10. The left figure shows graph between linear combination of firstprojection and the predictand. Right figure shows graph between linearcombination of second projection and residuals25PERFORMANCE OF PPR MODELS FOR THEPREDICTION OF ALL INDIA SEASONAL RAINFALLRAINFALL (% DEP. FROM LPA)20151050-5-10-15-20Fig.11-2519741976197819801982198419861988YEAR19901992199419961998200020022004ACTUAL PPR-I PPR-IIFig.11. Performance of PPR Model Forecasts47


25PERFORMANCE OF ANN MODELS FOR THEPREDICTION OF ALL INDIA SEASONAL RAINFALL20151050-5-10-15-20-251974197619781980198219841986198819901992199419961998200020022004RAINFALL (% DEP. FROM LPA)YEARACTUAL ANN-I ANN-IIFig.12. Performance of ANN Model ForecastsPERFORMANCE OF SUPER ENSEMBLE MODEL2520RAINFALL (% DEP. FROM LPA)151050-5-10-15-20-251974197519761977197819791980198119821983198419851986198719881989YEAR199019911992199319941995199619971998199920002001200220032004ACTUALEMR-IPPR-ISUPER_ENS_AVEPCR-IANN-IFig.13. Performance of Super Ensemble Model for April Forecasts48


25PERFORMANCE OF SUPER ENSEMBLE MODEL:JUNE FORECASTSRAINFALL (% DEP. FROM LPA)20151050-5-10-15-20-251981198219831984198519861987198819891990199119921993YEAR19941995199619971998199920002001200220032004ACTUAL ENS_AVE EMR-IIPCR-II PPR-II ANN-IIFig.14. Performance of Super Ensemble Model for June Forecasts49

Hooray! Your file is uploaded and ready to be published.

Saved successfully!

Ooh no, something went wrong!