Long range lake water level estimation using artificial intelligence methods
Akyuz, D. E., Cigizoglu, H. K. e-Zbornik 20/2020.

This paper covers the estimation of the water levels of Beysehir Lake, located in central Turkey, using artificial intelligence (AI) methods such as neural networks (NN) and fuzzy logic (FL). The study investigates in detail the effect of the estimation range on long-term lake water level estimation. The analysed estimation ranges were 1 day, 30 days, 60 days and 90 days. Lake parameters, namely the shortwave radiation, the lake total inflow rate, the lake total outflow rate and the past lake water levels, constituted the input layer of the AI configurations. The study showed that the estimation performance of the AI methods decreases as the estimation range increases, and that the best performance criteria are obtained by different AI methods for different estimation ranges. For the long estimation ranges of 60 and 90 days, the Generalized Regression Neural Network (GRNN) performed better than the other two artificial neural networks, i.e. the Radial Basis Function (RBF) network and the Feed Forward Back Propagation (FFBP) network, and better than the Adaptive Neuro-Fuzzy Inference System (ANFIS). The second-best overall performance was obtained by FFBP.


INTRODUCTION
Forecasting hydrologic variables is one of the main issues in hydrology for the management and planning of reservoirs, watersheds and land. Physics-based computer models require detailed spatial and temporal environmental data, which are often not available. Therefore, artificial intelligence (AI) techniques such as artificial neural networks (ANN), fuzzy logic (FL) and genetic algorithms (GA) are frequently used in the literature to forecast hydrological events and parameters. ANN and FL are non-linear models and can be used to identify the relations between hydrological variables. They are increasingly used in diverse engineering applications because of their ability to solve non-linear problems successfully. This feature is an important aspect of neural and linguistic computing, as it can be used to model a function about which one has little information or an incomplete understanding.
Although ANNs have been applied successfully to many hydrological variables, the accuracy of the model predictions is subjective and highly dependent on the user's ability, knowledge and understanding of the model [26]. In addition, one of the major criticisms of ANN hydrologic models is that they do not explain the underlying physical processes in a watershed, which has led to them being labelled as black-box models. In recent years, studies of the physics captured by ANNs have been published. Jain et al. investigated the physics embedded within the correlation weights of ANNs [27]. Sudheer and Jain tried to explain the internal behaviour of artificial neural network river flow models [28]. Sudheer studied knowledge extraction from trained neural network river flow models [29].
The number of fuzzy logic applications in hydrology is increasing rapidly [2-3, 9, 30-31]. Their use is likely to grow further in the form of hybrid models.
Studies on forecasting the water level (WL) of various water bodies use forecasting ranges that vary from 1 hour to 30 months (Table 1). The water level has been forecast for the ocean or the sea [1,14,17], reservoirs or lakes [12,31], rivers [2,3,9,10] and groundwater [4,8,11,16,20,32]. The forecasting ranges for groundwater level estimation are longer than the others because groundwater velocities are quite slow and the conditions are almost stable; variations in groundwater systems can be observed on a monthly basis. The forecasting range for water levels during a flood event is shorter because the system is dynamic and its properties change on a time scale of minutes.
In the presented study the water levels of Beysehir Lake, located in central Turkey, are investigated. The lake is a freshwater lake and the surrounding area has a karstic structure [33]. The importance of Beysehir Lake for the economy is mainly due to agriculture, fishery and the discharge of wastewater in the surrounding region. Small changes in the elevation of large lake surfaces can lead to enormous changes in the amount of land surface. When the lake level decreases or increases by one centimetre, the average lake volume changes by seven million cubic metres for Lake Beysehir.
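As a quick consistency check on the volume figure, the following worked estimate assumes the surface area of about 700 km² quoted in the Study Area section and treats the shoreline as vertical over a 1 cm change (the symbols A, Δh and ΔV are introduced here for illustration only):

```latex
\Delta V \approx A \,\Delta h
         = \left(700 \times 10^{6}\ \mathrm{m^{2}}\right) \times \left(0.01\ \mathrm{m}\right)
         = 7 \times 10^{6}\ \mathrm{m^{3}}
```

Under these assumptions the result agrees with the seven million cubic metres per centimetre stated above.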
The aim of the presented research is to employ AI methods to forecast the daily lake water level (WL) for long time ranges. The work comprised four studies, i.e., 1 day-ahead, 30 days-ahead, 60 days-ahead and 90 days-ahead estimation. Three types of ANN and one Adaptive Neuro-Fuzzy Inference System (ANFIS) are used to determine the best model for long-term forecasting.

ARTIFICIAL INTELLIGENCE METHODS
In this study three artificial neural network methods and one adaptive-network-based fuzzy method are used to forecast the water level of Lake Beysehir.

FFBP
The FFBP is the most popular ANN training method in the water resources literature. A FFBP network is distinguished by the presence of one or more hidden layers, whose computation nodes are correspondingly called hidden neurons or hidden units (Figure 1). The function of the hidden neurons is to intervene between the external input and the network output in some useful manner. By adding one or more hidden layers, the network is enabled to extract higher-order statistics. If a training set of input-output data is given, the most common learning rule for the multilayer perceptron is the back-propagation algorithm. Back propagation involves two phases: a feed-forward phase, in which the external input information at the input nodes is propagated forward to compute the output information signal at the output unit, and a backward phase, in which the connection strengths are modified based on the differences between the computed and observed information signals at the output units. Different types of activation function can be employed for computing the outputs of the network layers. In this study the "tangent sigmoid" function and the "logsig" function are used and the corresponding estimation performances are compared.
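The feed-forward phase described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a single hidden layer with the tangent sigmoid and a logistic ("logsig") output node, the weights are arbitrary illustrative values, and the backward (weight update) phase is omitted. A logistic output lies in (0, 1), so targets would need to be scaled accordingly; that scaling is an assumption here.

```python
import math


def logsig(x):
    """Logistic sigmoid, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, hidden_weights, hidden_biases, output_weights, output_bias):
    """Feed-forward pass: tanh hidden layer followed by one logsig output node."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(weights, inputs)) + bias)
        for weights, bias in zip(hidden_weights, hidden_biases)
    ]
    return logsig(sum(w * h for w, h in zip(output_weights, hidden)) + output_bias)


# Hypothetical 4-4-1 network with made-up weights (not values from the study).
hidden_weights = [
    [0.10, -0.20, 0.05, 0.30],
    [-0.15, 0.25, 0.10, -0.05],
    [0.20, 0.10, -0.30, 0.15],
    [0.05, -0.10, 0.20, 0.40],
]
hidden_biases = [0.0, 0.1, -0.1, 0.05]
output_weights = [0.5, -0.4, 0.3, 0.2]
estimate = forward([0.2, 0.1, 0.7, 0.5], hidden_weights, hidden_biases,
                   output_weights, 0.0)
```

In a full back-propagation loop, the difference between `estimate` and the observed value would then drive the weight updates.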

GRNN
The GRNN consists of four layers: the input layer, the pattern layer, the summation layer and the output layer [34]. The first layer holds the input parameters and is fully connected to the second layer, i.e. the pattern layer (Figure 2). Each pattern layer unit is connected to the two neurons in the summation layer. The optimal value of the spread (s) is often determined experimentally [7]. The larger the spread, the smoother the function approximation will be. The GRNN approximates any arbitrary function between input and output vectors, drawing the function estimate directly from the training data. Furthermore, it is consistent; that is, as the training set size becomes large, the estimation error approaches zero, with only mild restrictions on the function. The GRNN is used for the estimation of continuous variables, as in standard regression techniques. It is related to the radial basis function network and is based on a standard statistical technique called kernel regression.
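Because the GRNN is a form of kernel regression, its estimate can be sketched in a few lines. The version below is only an illustration under stated assumptions: it uses a Gaussian kernel of the form exp(-d²/(2s²)), which is one common convention and not necessarily the one used in the study, and the function name `grnn_predict` is hypothetical. The two sums correspond to the two neurons of the summation layer.

```python
import math


def grnn_predict(x, train_inputs, train_targets, spread):
    """Kernel-weighted average of the training targets (GRNN-style estimate)."""
    weights = []
    for pattern in train_inputs:  # one pattern unit per training sample
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, pattern))
        weights.append(math.exp(-sq_dist / (2.0 * spread ** 2)))
    numerator = sum(w * y for w, y in zip(weights, train_targets))
    denominator = sum(weights)  # may underflow to 0 if x is far from all patterns
    return numerator / denominator


# Toy data: two one-dimensional training samples.
train_inputs = [[0.0], [1.0]]
train_targets = [0.0, 10.0]
midpoint_estimate = grnn_predict([0.5], train_inputs, train_targets, spread=0.8)
```

With a small spread the estimate follows the nearest training target closely; with a large spread it tends towards the mean of all targets, which is the smoothing effect described above.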

RBF
RBF networks were introduced into the neural network literature by Broomhead and Lowe [35]. The RBF network model is motivated by the locally tuned response observed in biological neurons (Figure 3). The theoretical basis of the RBF approach lies in the field of interpolation of multivariate functions [26]. The solution of the exact interpolating RBF mapping passes through every data point. Different spread constants were tried in the study.
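The statement that the exact interpolating RBF mapping passes through every data point can be checked with a small sketch. It assumes one Gaussian basis function centred on each training point, scalar inputs for brevity, and an illustrative spread value; the helper names (`solve`, `fit_exact_rbf`, `rbf_predict`) are hypothetical and this is not the configuration used in the study.

```python
import math


def gaussian(distance, spread):
    """Gaussian radial basis function of the distance to a centre."""
    return math.exp(-(distance / spread) ** 2)


def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    solution = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * solution[k] for k in range(row + 1, n))
        solution[row] = (aug[row][n] - tail) / aug[row][row]
    return solution


def fit_exact_rbf(centres, targets, spread):
    """Find weights so that the RBF expansion reproduces every target exactly."""
    phi = [[gaussian(abs(a - b), spread) for b in centres] for a in centres]
    return solve(phi, targets)


def rbf_predict(x, centres, weights, spread):
    """Evaluate the weighted sum of basis functions at x."""
    return sum(w * gaussian(abs(x - c), spread) for w, c in zip(weights, centres))


centres = [0.0, 1.0, 2.0]   # toy training inputs
targets = [1.0, 3.0, 2.0]   # toy training outputs
weights = fit_exact_rbf(centres, targets, spread=0.67)
```

Evaluating `rbf_predict` at each centre returns the corresponding target up to rounding error; between the centres the shape of the curve depends on the spread, which is why several spread constants are usually tried.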

ANFIS Method
Fuzzy algorithms for complex systems and decision processes were presented by Lotfi Asker Zadeh in 1973 [36]. ANFIS, which is based on fuzzy algorithms, was proposed in 1993 by Jyh-Shing Roger Jang as a way of allowing fuzzy systems to learn [37]. It constructs an input-output mapping based on both human knowledge and stipulated input-output data pairs, so it is able to deal with non-linear and complex mathematical problems (Figure 4). In hydrological applications ANFIS is mostly used for modelling and prediction. Some researchers have used ANFIS to forecast the water level [2-3, 9, 31]. In this study ANFIS is employed for lake water level estimation over long time ranges.

Study Area
Lake Beysehir lies north of Konya-Beysehir and south of Isparta-Sarkikaraagac, in the tectonic pit between the Sultan and Anamas mountains, and is the largest freshwater lake in Turkey. It has a surface area of 700 km². The deepest location has an approximate depth of 11 m, and the average depth is 6 m. The lake plays an important role in irrigation and in supplying drinking water for central Anatolia. The location of Lake Beysehir is shown in Figure 5. The lake and the surrounding area are under protection as a National Park and as a Protection Area of Drinking and Irrigation Reservoir; they are classified in group "A" as inland water of international importance and have historical and cultural significance. The lake has a special importance among other lakes because of its wildlife and outstanding nature. Lake Beysehir is an attractive lake, with islands of different sizes, sandy beaches, karstic caves and flora. However, the lake faces many problems, such as decreasing water levels, an incorrect water use policy, uncontrolled fishery, urbanization and excessive lake use. It has socio-economic importance for fishery, irrigation and bird life.
The daily climatic and hydrological data (1991-2001) used in this study were provided by the State Meteorological Service of Turkey (DMI) and the State Water Works of Turkey (DSI). The lake parameters considered in this study are the shortwave radiation, the total outflow, the total inflow and the water level. The daily mean data belong to Beysehir Lake in the central part of Turkey. The common data period for all lake parameters runs from 1 September 1991 to 2 July 2001 (Figure 6). The data in this period are divided into a training period and a testing period. The testing data cover the last 20% of all daily values (Table 2).
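The chronological split described above, with the last 20% of the daily values held out for testing, can be sketched as follows. The function name `chronological_split` and the toy series are hypothetical; the point is only that the series is cut at one position rather than shuffled, so the test period lies entirely after the training period.

```python
def chronological_split(series, test_fraction=0.2):
    """Split a time-ordered sequence; the last `test_fraction` becomes the test set."""
    n_test = int(round(len(series) * test_fraction))
    split_at = len(series) - n_test
    return series[:split_at], series[split_at:]


# Toy series of ten daily values standing in for the real record.
daily_levels = [float(day) for day in range(10)]
train_levels, test_levels = chronological_split(daily_levels)
```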

Statistics of the Lake Parameters
The basic descriptive statistics, namely the maximum (X_max), the minimum (X_min), the mean (X_mean), the standard deviation (s_x) and the skewness (c_sx), are computed for the training, the testing and the whole data period (Table 3). The skewness of the lake water level is close to zero for all periods (0.06-0.14), whereas that of the total inflow is far from zero (2.66-5.07, Table 3). Like the lake water level, the shortwave radiation has a skewness range (-0.06 to 0.004) quite close to zero. The total outflow skewness, on the other hand, varies between 0.30 and 0.97 (Table 3). It can be concluded that the shortwave radiation and the WL have nearly symmetrical marginal probability distributions (close to the Normal distribution), whereas the total inflow and the total outflow deviate from the Normal distribution with positive skewness. Except for the total inflow, the testing and the training maximum values are close to each other (Table 3).
The autocorrelations of the water level are given in Figure 7. The autocorrelation ranges from 0.58 to 1 for the first 100 lags, i.e. 100 days. The cross-correlations between the lake parameters show that the water level (WL_t) has the lowest correlation with the outflow (r=0.01, Table 4). The correlations of the water level with the shortwave radiation and with the total inflow are both equal to 0.30 (Table 4). The autocorrelations of the water level time series are also provided in Table 4. Accordingly, the lag-1, lag-30, lag-60 and lag-90 autocorrelations (between WL_t and WL_{t+1}, WL_{t+30}, WL_{t+60} and WL_{t+90}, respectively) are found as 1.00, 0.94, 0.81 and 0.64 (Table 4).
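The skewness coefficient and the lag-k autocorrelation discussed above can be computed as in the sketch below. The source does not state which estimators were used, so the population (biased) forms here are an assumption, and the function names are hypothetical.

```python
import math


def skewness(values):
    """Skewness coefficient: third central moment divided by the cubed std. deviation."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum((v - mean) ** 3 for v in values) / (n * std ** 3)


def autocorrelation(values, lag):
    """Lag-k autocorrelation of a series around its overall mean."""
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    numerator = sum(
        (values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag)
    )
    return numerator / denominator


# Toy series: a symmetric one and a positively skewed one.
symmetric_skew = skewness([1.0, 2.0, 3.0, 4.0, 5.0])
positive_skew = skewness([1.0, 1.0, 1.0, 10.0])
lag1 = autocorrelation([1.0, 2.0, 3.0, 4.0, 5.0], 1)
```

A slowly varying series such as a lake level keeps a high autocorrelation over many lags, which is what makes its own past value a strong predictor.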

RESULTS
In the study three ANN methods, FFBP, GRNN and RBF, with the Levenberg-Marquardt learning algorithm [25], and the ANFIS method [37] are employed for each selected long-term estimation case. Each method is trained and tested for four WL estimation ranges, i.e., 1 day, 30 days, 60 days and 90 days, so that 16 simulations are carried out in total. All four methods are trained with an input layer of 4 inputs, i.e., the total inflow, the total outflow, the shortwave radiation and the WL, all measured at time "t". The single output represents the water level at time "t+1", "t+30", "t+60" or "t+90" (WL_{t+1}, WL_{t+30}, WL_{t+60}, WL_{t+90}). The training and testing periods of the models are as presented in Table 2. The ANN and ANFIS model parameters and the model configurations which provided the best testing performances are summarized in Table 5. In this table FFBP (4,4,1) denotes a FFBP configuration with an input layer of 4 neurons, a hidden layer of 4 neurons and an output layer with a single node (Table 5, second column). The number of training iterations for FFBP is 1000. The best activation functions are found to be the tangent sigmoid between the input layer and the hidden layer and the logarithmic sigmoid between the hidden layer and the output layer. The estimation results for the four time ranges are summarized in the following sections.
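The input-output structure described above, four inputs at time t and one water level at time t plus the estimation range, can be assembled as in this sketch. The function name `make_samples` and the toy series are hypothetical and only illustrate the pairing.

```python
def make_samples(inflow, outflow, radiation, level, horizon):
    """Pair the four inputs at day t with the water level at day t + horizon."""
    n = len(level)
    inputs = [
        (inflow[t], outflow[t], radiation[t], level[t])
        for t in range(n - horizon)
    ]
    targets = [level[t + horizon] for t in range(n - horizon)]
    return inputs, targets


# Toy daily series of length five (made-up numbers, not study data).
inflow = [1.0, 1.1, 1.2, 1.3, 1.4]
outflow = [0.5, 0.5, 0.6, 0.6, 0.7]
radiation = [200.0, 210.0, 220.0, 230.0, 240.0]
level = [10.0, 11.0, 12.0, 13.0, 14.0]
sample_inputs, sample_targets = make_samples(inflow, outflow, radiation, level, horizon=2)
```

Note that a horizon of h days shortens the usable record by h samples, since the last h days have no observed target.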

The performance evaluation criteria, i.e. the root mean square error (RMSE) and the determination coefficient (R²), obtained for the testing period are listed in Tables 6 and 7 for each AI method.
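The performance evaluation criterion RMSE is formulated as below:

\[ \mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(WL_{i}^{obs} - WL_{i}^{est}\right)^{2}} \]

The second performance evaluation criterion, i.e. the determination coefficient (R²), is computed in its usual form as presented below:

\[ R^{2} = 1 - \frac{\sum_{i=1}^{N}\left(WL_{i}^{obs} - WL_{i}^{est}\right)^{2}}{\sum_{i=1}^{N}\left(WL_{i}^{obs} - \overline{WL}^{obs}\right)^{2}} \]

where N is the number of values, WL_i^{obs} and WL_i^{est} are the observed and estimated water levels, and \overline{WL}^{obs} is the mean of the observed water levels.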
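The two criteria can be computed as in the sketch below. The RMSE follows its standard definition; for R² the form one minus the ratio of the residual sum of squares to the total sum of squares is assumed, which is the usual definition of the determination coefficient but may differ from a squared correlation coefficient. The function names are hypothetical.

```python
import math


def rmse(observed, estimated):
    """Root mean square error between observed and estimated values."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)


def r_squared(observed, estimated):
    """Determination coefficient, assumed here as 1 - SS_residual / SS_total."""
    mean_observed = sum(observed) / len(observed)
    ss_residual = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_total = sum((o - mean_observed) ** 2 for o in observed)
    return 1.0 - ss_residual / ss_total


# Toy check: a perfect estimate, and an estimate equal to the observed mean.
perfect_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
perfect_r2 = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mean_only_r2 = r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0])
```

An estimate that always returns the observed mean gives R² equal to zero under this definition, which is a useful baseline when reading the tables.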

1 day Ahead Estimation (WL_{t+1})
The GRNN and RBF had spread values equal to 0.8 and 0.67, respectively (Table 5, third row). The RMSE and R² values for the 1 day ahead estimation of WL_{t+1} are given in Tables 6 and 7 under the heading "t+1". The lowest RMSE is obtained with ANFIS (0.007, 0.006) for the training data, the testing data and the whole data (Table 6). Except for GRNN, all methods provided RMSE values equal to or less than 0.010. The best RMSE values are shown in bold font and underlined (Table 6). All of the R² values obtained by the 4 AI methods are equal to 1.00, showing quite high performance for the training, testing and whole series for the 1 day estimation range (Table 7). The WL plots in the form of the water level hydrograph and scatter plot are illustrated in Figures 8 and 9 for all artificial intelligence methods. Except for GRNN, the model estimations and the observed values are nearly indistinguishable (Figures 8 and 9). For GRNN, however, deviations from the observed values can be noticed (Figures 8b and 9b).

30 days Ahead Estimation (WL_{t+30})
The next step of the study was extending the estimation range from 1 day to 30 days (1 month). The estimation results are again expressed in terms of RMSE and R² (Tables 6 and 7), and the water level hydrographs and the scatter plots are illustrated in Figures 10 and 11. The ANN and ANFIS configurations with the best performances are presented in Table 5. The FFBP method provided the best RMSE performance for the testing series (Table 6). The GRNN method had the lowest RMSE for the whole and training series. The FFBP again had the best performance in terms of R² (Table 7). The R² values vary between 0.931 and 1.000, indicating that all methods perform well for the 30 days ahead estimation (Table 7). The water level hydrographs and scatter plots show that the lake level estimates for the testing stage are quite close to the observed values, with acceptable deviations from the trend line (Figures 10 and 11).

60 days Ahead Estimation (WL_{t+60})
In this part of the estimation work the estimation range is extended to 60 days (2 months). The GRNN method dominated with the best RMSE and R² performances for the training data and the whole series (Tables 6 and 7). The FFBP method had the second-best performance.

The RMSE and R² values for the 60 days ahead estimation of WL_{t+60} are given in Tables 6 and 7 under the heading "t+60". For the whole series, the lowest RMSE is obtained with GRNN (0.123), with RBF having the second-best performance (0.148) (Table 6). The highest R² value for the whole series is obtained with GRNN (0.966), with FFBP having the second-best performance (0.961) (Table 7). The water level hydrographs and the scatter plots show that the estimates deviate from the observed values but stay within the acceptable error range (Figures 12 and 13).

90 days Ahead Estimation (WL_{t+90})
The final part of the estimation analysis comprised the 90 days (3 months) ahead estimation. The RMSE and R² values for the 90 days ahead estimation of WL_{t+90} are given in Tables 6 and 7 under the heading "t+90". For the whole series, the lowest RMSE is obtained with GRNN (0.150), with RBF having the second-best performance (0.190) (Table 6). GRNN again had the highest R² (0.926), followed by FFBP (0.887), for the whole data (Table 7). The performance of the ANFIS method was relatively inferior to the three ANN methods in terms of these two performance evaluation criteria, except for the testing case (Tables 6 and 7). ANFIS demonstrates lower deviations for the testing stage (Figures 14 and 15).

CONCLUSIONS
The RMSE performances of the models for all estimation ranges are given in Table 6. All of the AI methods provided satisfactory estimation performances in predicting the lake water level for estimation ranges between 1 day and 90 days. The GRNN method had the best performance evaluation criteria values for the estimation ranges longer than 1 day. It can be deduced that the GRNN increasingly dominates the other AI methods as the estimation range increases. Unlike FFBP and ANFIS, the GRNN approach does not require an iterative training procedure. It approximates any arbitrary function between input and output vectors, drawing the function estimate directly from the training data [34]. Although the performance of GRNN was also found superior to FFBP in previous studies [38,39], its performance in long-term estimation of a hydrological variable, i.e. lake water level, was investigated for the first time in the presented study. The performance of the ANFIS method was relatively inferior to the three ANN methods in terms of the two performance evaluation criteria for the 90-day estimation range (Tables 6 and 7). However, ANFIS demonstrates lower deviations from the observed values for the 90-day estimation range in the testing stage (Figures 14 and 15). On the test data, FFBP seems to be superior to the other three AI methods (Tables 6 and 7). For the testing data, the RMSE of ANFIS increases linearly with the estimation range, whereas the other methods show different trends in RMSE between consecutive estimation ranges. The reason for the relatively inferior performance of the GRNN on the testing data might be the high skewness coefficient of the inflow and outflow testing data (Table 3). The testing flow skewness is noticeably higher than the training value (Table 3). The spread parameter of the GRNN is completely dependent on the skewness of the considered time series [34]. Since the same spread is employed for both training and testing, the performance of GRNN was quite good for the training data, but its estimation performance decreased for the testing data owing to the nearly doubled flow skewness values (Table 3).
The accurate estimation of long-term lake water levels is important both for the ecological activities within the lake and for the human-made projects that depend on the water levels and the water budget of the lake. With close estimates for long time ranges covering several months, decisions about the future of water resources projects can be taken in advance, providing sufficient time for the local people involved in or affected by these projects.
The lake water level represents the lake depth and hence the lake water volume: if the water level increases, the water volume increases in parallel. The water volume has a dominant role in the stability of the lake. Even a 1 cm variation in water level can cause a large change in water volume for lakes with a big surface area. A lake with a low water level is more sensitive to external effects than one with a high water volume. The quantity of heat needed to increase the temperature of a deep lake is higher than that for a shallow lake. Therefore, the quantity of energy and the time required for warming or cooling the lake depend on the lake water level. This directly affects the microorganisms, the chemical activities and the stratification/circulation pattern, which in turn affect organisms such as fish and phytoplankton. For example, if mixing is strong, stratification decreases, so the number of microorganisms increases and the water quality decreases. In the opposite situation, i.e. in the case of strong stratification, the hypolimnion layer (the bottom of the lake) has nutrients but lacks oxygen, while the epilimnion layer (the top of the lake) has oxygen but lacks nutrients. So, as the microorganism number decreases, the water quality improves [40].
Briefly, several ecological activities within the lake are related to the water depth either directly or indirectly. This shows how significant the long-term estimation of the lake water level is for lake management plans, considering that the biological and chemical activities are influenced by the lake water volume either positively or negatively. Under normal conditions the lake water level follows a cycle, and this cycle has a different structure when different time intervals are examined. On an hourly basis, during a sunny day in a lake where evaporation is dominant, the water level is high in the morning and low in the evening hours. On rainy days, however, the water level is low before the rainfall and high after it. On a monthly basis, the WL is high in the rainy seasons (winter/spring) and low in the dry season (summer). On a yearly basis, the WL is high in a rainy year and low in a dry year. Short-duration cycles can be comprehended better. For long-term cycles, however, the observations are limited and the number of influencing parameters is high, so the water level behaviour is more complex and can vary in many ways. This study nevertheless confirmed the successful employment of AI methods for this purpose.
The long-term estimation performance of the AI methods should also be tested for the other hydro-meteorological variables such as the river flow discharge, the precipitation, the suspended sediment, the temperature etc. Since most of these variables have lower autocorrelations compared with the lake water level it would be quite challenging to analyse the estimation performances of the AI methods for these variables.