Prediction of the Bicarbonate Amount in Drinking Water in the Region of Médéa Using Artificial Neural Network Modelling

The region of Médéa (Algeria), located in an agricultural area, requires a large amount of drinking water, which makes regular analysis of that water imperative. To examine the evolution of drinking water quality in this region, an experimental protocol was first carried out to obtain a data set covering several physicochemical parameters. Secondly, the data set was divided into two parts to build the artificial neural network (ANN): 70 % of the data were used for training, and the remaining 30 % were divided into two equal parts, one for testing and the other for validation of the model. The model obtained was evaluated on the basis of the correlation coefficient closest to 1 and the lowest root mean square error (RMSE). A set of 84 data points was used in this study. The ANN comprised eighteen parameters in the input layer, five neurons in the hidden layer, and one parameter in the output layer. The Levenberg-Marquardt (LM) learning algorithm was used, with a logarithmic sigmoid transfer function for the hidden layer and a linear transfer function for the output layer. The results showed a correlation coefficient of R = 0.99276 with a root mean square error RMSE = 11.52613 mg dm⁻³. These results show that the ANN model is accurate: its relative error is small and its correlation coefficient is close to unity. It can therefore be concluded that the model can effectively predict the amount of soluble bicarbonate in drinking water in the Médéa region.


Introduction
Industrial, agricultural, and urban development alters the quality of water and makes it unsafe. This is the case in the Médéa region, which is subject to a growing diversity and quantity of pollutants released into the aquatic environment without treatment. Depending on the origin of the waste, the pollution may be of a chemical nature, especially heavy metals. To predict the amount of a pollutant that depends on many physicochemical parameters, we refer to methods based on mathematical and artificial intelligence models, such as multiple linear regression (MLR) and artificial neural networks (ANN). 1,2 Neural networks have been highly successful in modelling and predicting environmental parameters. 3 They have been applied for various purposes, for example: prediction of groundwater remediation costs for drinking use based on the quality of the water resource; 4 modelling of nitrate concentration in groundwater; 5 prediction of groundwater suitability for irrigation; 6 modelling of contaminated water treatment processes by homogeneous and heterogeneous nanocatalysis; 7 prediction of the quality of public water supply; 8 modelling of TDS concentrations in river water; 9 prediction of aluminium pitting in natural waters; 10 prediction of fluoride concentration; 11 prediction of bromate removal in drinking water; 12 estimation of the relationship between rainfall and river pollution; 13 modelling of total dissolved solids; 14 forecasting of nitrate concentration in groundwater; 15 assessment by ANN of the effect of drought on pollution at a river station; 16 and prediction of heavy metal concentrations. 17 All these works are linked to different physicochemical parameters of water.
In this work, we propose the use of an ANN for the prediction of the bicarbonate content of surface waters based on their physicochemical parameters.

Artificial neural network
An artificial neural network (ANN) is a set of algorithms whose design derives from, and is schematically inspired by, the functioning of biological neurons. ANNs are now used as a very powerful tool for modelling and analysing processes, as well as for predicting the behaviour of a given system. 18 The ANN structure consists essentially of an input layer (independent variables), a number of hidden layers, and an output layer. Each of these layers consists of a number of interconnected processing units called neurons. These neurons interact by sending signals, and each is connected to all the neurons in the previous and next layers by weighted links. 19 The architecture of the ANN model is shown in Fig. 1.

Materials and methods
In this work, the procedure for developing and optimising the ANN architecture, implemented in MATLAB R2013a, is described by the flowchart presented in Fig. 2. 20,21

Database
The data used for this study were obtained from the experimental analysis of water samples collected on 84 different days across all seasons in the period from 2018 to 2019 in the Médéa region, Algeria. Analyses were done according to Jean Rodier's book on water analysis, 9th Ed. 22 The dependent variable is the content of bicarbonate (HCO₃⁻) dissolved in water. The independent variables are the following physicochemical parameters: conductivity, turbidity, pH, hardness, calcium, magnesium, chlorides, total alkalinity titer (TAC), organic matter, nitrogen dioxide, nitrates, sodium, sulphates, potassium, heavy metals (Mn²⁺, Fe³⁺, and Al³⁺), and dry residues.

Normalisation and data pre-processing
The primary purpose of the data transformation is to rescale the input variables so that they are better matched to the outputs. Before training and validation, the inputs and targets were scaled so that the data always fall within a specified range. The experimental data were normalised to the interval [−1, 1] using the mapminmax function: 23

$$x_N = \frac{(y_{\max} - y_{\min})\,(x - x_{\min})}{x_{\max} - x_{\min}} + y_{\min} \qquad (1)$$

where $x_N$ is the data value after normalisation, $x_{\max}$ and $x_{\min}$ denote the maximum and minimum of the data, respectively, $y_{\max}$ and $y_{\min}$ are taken as 1 and −1, respectively, and $x$ represents the original value.
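As an illustration of this scaling step, the following minimal Python sketch applies the same linear min-max mapping to one variable. It is not the MATLAB mapminmax implementation itself; the function names, the sample pH values, and the handling of a constant column are assumptions made for this example.

```python
def minmax_scale(values, y_min=-1.0, y_max=1.0):
    """Map values linearly so that min(values) -> y_min and max(values) -> y_max."""
    x_min, x_max = min(values), max(values)
    if x_max == x_min:
        # Constant column: nothing to scale. Mapping to the midpoint is a
        # choice made for this sketch only.
        return [(y_min + y_max) / 2.0 for _ in values]
    scale = (y_max - y_min) / (x_max - x_min)
    return [(x - x_min) * scale + y_min for x in values]


def minmax_unscale(scaled, x_min, x_max, y_min=-1.0, y_max=1.0):
    """Invert the mapping to recover values in the original units."""
    scale = (x_max - x_min) / (y_max - y_min)
    return [(y - y_min) * scale + x_min for y in scaled]


# Hypothetical pH readings, used only to exercise the functions
ph_values = [6.8, 7.2, 7.6, 8.0]
ph_scaled = minmax_scale(ph_values)
ph_restored = minmax_unscale(ph_scaled, min(ph_values), max(ph_values))
```

The inverse mapping matters in practice because the network output is produced on the normalised scale and has to be converted back to mg dm⁻³ before errors are reported.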

Data modelling techniques
Data modelling was carried out with ANNs. To assess the predictive quality of the models, 70 % of the samples, chosen at random from the whole set, formed the learning group used to train a predictive model of the dependent variable. The remaining 30 % of the samples, which did not take part in model learning, were divided in two (15 % for testing and 15 % for validation) to examine the validity and prediction performance of these models. 24
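The random 70/15/15 partition described above can be sketched as follows. This is a minimal illustration, not the routine used in the study: the function name, the fixed seed, and the rounding rule for the subset sizes are assumptions.

```python
import random


def split_indices(n_samples, train_frac=0.70, val_frac=0.15, seed=0):
    """Shuffle sample indices and cut them into training, validation and test subsets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # fixed seed only for reproducibility
    n_train = round(train_frac * n_samples)
    n_val = round(val_frac * n_samples)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test subset
    return train, val, test


# 84 samples, as in the data set described in the text
train_idx, val_idx, test_idx = split_indices(84)
subset_sizes = (len(train_idx), len(val_idx), len(test_idx))
```

With 84 samples and this rounding rule the subsets contain 59, 13, and 12 samples; a different rounding convention would shift one or two samples between subsets.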

Modelling performance
In the current study, we present the different statistical parameters used to evaluate the effectiveness of each network and its ability to make precise predictions. [25][26][27] The correlation coefficient (R), root mean square error (RMSE), mean square error (MSE), mean absolute error (MAE), standard error of prediction (ESP), and model prediction error (EPM) were used to estimate the performance of the model. These statistical parameters were calculated according to Eq. (2), 28,29 where $N$ is the number of data points; $y_{\mathrm{exp}}$ and $y_{\mathrm{pred}}$ are the experimental and the predicted values, respectively; and $\bar{y}_{\mathrm{exp}}$ and $\bar{y}_{\mathrm{pred}}$ are the average values of the experimental and the predicted values, respectively.
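For orientation, the sketch below computes four of these indicators (R, MSE, RMSE, MAE) from their standard textbook definitions, using the symbols defined above. It assumes that the study uses these standard forms; ESP and EPM are left out because their exact definitions are not restated here. The function name and the sample values are hypothetical.

```python
import math


def regression_metrics(y_exp, y_pred):
    """Return R, MSE, RMSE and MAE for paired experimental and predicted values."""
    n = len(y_exp)
    mean_exp = sum(y_exp) / n
    mean_pred = sum(y_pred) / n
    errors = [e - p for e, p in zip(y_exp, y_pred)]

    mse = sum(err ** 2 for err in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(err) for err in errors) / n

    # Pearson correlation coefficient between experimental and predicted values
    s_xy = sum((e - mean_exp) * (p - mean_pred) for e, p in zip(y_exp, y_pred))
    s_xx = sum((e - mean_exp) ** 2 for e in y_exp)
    s_yy = sum((p - mean_pred) ** 2 for p in y_pred)
    r = s_xy / math.sqrt(s_xx * s_yy)

    return {"R": r, "MSE": mse, "RMSE": rmse, "MAE": mae}


# Hypothetical bicarbonate values in mg dm^-3, not taken from the study
sample_exp = [210.0, 250.0, 305.0, 340.0]
sample_pred = [215.0, 245.0, 300.0, 345.0]
metrics = regression_metrics(sample_exp, sample_pred)
```

Note that RMSE and MAE carry the units of the target (mg dm⁻³), whereas MSE is in squared units and R is dimensionless.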

Results and discussion
Preliminary tests showed that improving the performance of an ANN model requires modifying the network architecture, mainly by varying the number of hidden layers, the number of hidden neurons, and/or the number of training cycles (iterations). To this end, the number of hidden neurons was varied successively (NH = 3-15). The results of these tests are shown in Table 1. The minimum MSE, the lowest number of iterations, and the maximum correlation coefficient were all reached at NH = 5. The optimal ANN architecture therefore has a topology of 18-5-1 (Table 1), as illustrated in Figs. 3 and 4, and five neurons were chosen for the hidden layer to predict the bicarbonate concentrations. Once the architecture of the neural network model has been obtained, the model should be validated by comparing its results with the experimental data obtained in the laboratory. The results given in Fig. 7 show the performance and efficiency of the developed ANN model.
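To make the 18-5-1 topology concrete, the following sketch runs one forward pass through a network of that shape, with a logarithmic sigmoid hidden layer and a linear output neuron as stated in the abstract. The weights are random placeholders, not the trained weights of the study, and all names are hypothetical.

```python
import math
import random

N_INPUTS, N_HIDDEN = 18, 5  # 18-5-1 topology reported in the text


def logsig(z):
    """Logarithmic sigmoid transfer function, with output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: logsig hidden layer followed by a linear output neuron."""
    hidden = [
        logsig(sum(w * xi for w, xi in zip(weights, x)) + bias)
        for weights, bias in zip(w_hidden, b_hidden)
    ]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out


# Placeholder parameters: random values, NOT the trained weights of the study
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(N_INPUTS)] for _ in range(N_HIDDEN)]
b_hidden = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
w_out = [rng.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]
b_out = rng.uniform(-0.5, 0.5)

# Hypothetical sample, already normalised to [-1, 1]
x_scaled = [rng.uniform(-1.0, 1.0) for _ in range(N_INPUTS)]
y_scaled = forward(x_scaled, w_hidden, b_hidden, w_out, b_out)

# Adjustable parameters: hidden weights + hidden biases + output weights + output bias
n_params = N_HIDDEN * N_INPUTS + N_HIDDEN + N_HIDDEN + 1
```

A network of this shape has 101 adjustable parameters (90 hidden weights, 5 hidden biases, 5 output weights, and 1 output bias), which is the quantity that changes when NH is varied.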

Prediction performance
To test the precision of the developed and optimised ANN model, an interpolation was performed. For this purpose, a database was constructed from data points located midway between the experimental bicarbonate points for drinking water in the Médéa region. The regression between the predicted and experimental bicarbonate values, together with the interpolation performance in terms of error and correlation coefficient, is shown in Fig. 8.
The prediction performance, in terms of all errors and of the agreement vector [α (slope), β (y-intercept), R (correlation coefficient)], is summarised in Table 2.
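The agreement vector can be obtained from an ordinary least-squares line fitted to the predicted values as a function of the experimental values. The sketch below assumes that this is how α and β are defined (predicted = α · experimental + β); the function name and the test values are hypothetical.

```python
import math


def agreement_vector(y_exp, y_pred):
    """Fit y_pred = alpha * y_exp + beta by least squares and return (alpha, beta, R)."""
    n = len(y_exp)
    mean_exp = sum(y_exp) / n
    mean_pred = sum(y_pred) / n
    s_xx = sum((e - mean_exp) ** 2 for e in y_exp)
    s_yy = sum((p - mean_pred) ** 2 for p in y_pred)
    s_xy = sum((e - mean_exp) * (p - mean_pred) for e, p in zip(y_exp, y_pred))

    alpha = s_xy / s_xx                   # slope
    beta = mean_pred - alpha * mean_exp   # y-intercept
    r = s_xy / math.sqrt(s_xx * s_yy)     # correlation coefficient
    return alpha, beta, r


# Hypothetical values: predictions that follow y_pred = 0.5 * y_exp + 10 exactly
targets = [200.0, 240.0, 300.0, 360.0]
outputs = [0.5 * t + 10.0 for t in targets]
alpha, beta, r = agreement_vector(targets, outputs)
```

An ideal model gives α = 1, β = 0, and R = 1, so the distance of the fitted vector from (1, 0, 1) summarises systematic bias as well as scatter.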
These results show a good correlation between the values predicted by the ANN and the experimental values, with a high correlation coefficient (R = 0.99276). Fig. 9 further illustrates the performance of the ANN configuration [18-5-1] through the close superposition of the experimental curve and the curve predicted by the model. This good agreement is reflected in a correlation coefficient close to unity. In addition, the error values (MSE, RMSE, ESP, EPM, and MAE) are very low.

Residual analysis
The errors made by the artificial neural network model on the samples used in this study are called residuals. 30 Studying the relationship between the bicarbonate contents estimated by the model and their residuals ($y_{\mathrm{exp}} - y_{\mathrm{pred}}$) allowed us to assess the performance of the model and to check it against the experimental values. Fig. 10 shows the relationship between the estimated levels of bicarbonate in the water of the Médéa region and the residuals obtained using the neural network (ANN). It shows that the residuals are only weakly dispersed (close to zero) and are well distributed across the samples. This distribution supports the predictive power of the neural network model in predicting the bicarbonate content from the environmental parameters. In general, the result obtained is very satisfactory and justifies the use of the neural network approach for the prediction of bicarbonate contents in the Médéa region.
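A residual check of this kind can be sketched in a few lines. The example below computes the differences between experimental and predicted values, together with their mean and sample standard deviation; the function name and the data are hypothetical and are not taken from Fig. 10.

```python
import math


def residual_summary(y_exp, y_pred):
    """Return the residuals (y_exp - y_pred) with their mean and sample standard deviation."""
    residuals = [e - p for e, p in zip(y_exp, y_pred)]
    n = len(residuals)
    mean_res = sum(residuals) / n
    std_res = math.sqrt(sum((r - mean_res) ** 2 for r in residuals) / (n - 1))
    return residuals, mean_res, std_res


# Hypothetical bicarbonate values in mg dm^-3
obs = [210.0, 250.0, 305.0, 340.0]
est = [215.0, 245.0, 300.0, 345.0]
residuals, mean_res, std_res = residual_summary(obs, est)

# Pairs (estimated value, residual), i.e. the points of a residual plot
residual_points = list(zip(est, residuals))
```

A mean residual near zero with no visible trend against the estimated value is what one would expect from an unbiased model; a trend would indicate systematic over- or under-prediction in part of the concentration range.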

Conclusion
In the present study, the bicarbonate concentrations in drinking water in the Médéa region were predicted using artificial neural networks (ANN) with supervised learning, involving the Levenberg-Marquardt algorithm. This algorithm gives good results in terms of speed, convergence, and generalisation performance. The results showed a high learning and predictive capacity for bicarbonate concentrations, with a very high correlation coefficient of 0.99276 and a low root mean square error (11.52613 mg dm⁻³) for the whole database. In addition, the statistical indicators of robustness supported the choice of learning algorithm, activation functions, and network architecture [18-5-1]. The good correlation between the experimental and predicted values indicates that the ANN model has good predictive power. This performance seems to be due to the fact that the concentrations of bicarbonates in drinking water in the Médéa region are linked to the physicochemical characteristics of the environment by non-linear relationships.
This encourages us to consider, in future work, extending this study to more parameters and to the prediction of other properties.