Optimisation and Prediction of the Coagulant Dose for the Elimination of Organic Micropollutants Based on Turbidity

In this study, four mathematical models were compared for predicting the coagulant dose required for turbidity removal: response surface methodology (RSM), artificial neural networks (ANN), support vector machine (SVM), and adaptive neuro-fuzzy inference system (ANFIS). All models fitted the experimental data accurately, although the ANN model slightly outperformed the others. The SVM model gave results very close to those of the ANN model; the only notable difference appeared in the validation phase, where the ANN model showed a higher correlation coefficient and lower statistical error indicators than the SVM model. From an economic point of view, however, the SVM model was more appropriate than the ANN model, since it involved only 22 parameters, almost half the 43 parameters of the ANN model, for nearly identical results over the whole dataset. To reduce costs further, the RSM model can also be used: it remained very useful owing to its high coefficients relative to its number of parameters, only 13, and its statistical indicators remained acceptable.


Introduction
Water treatment is essential owing to the scarcity of drinking water in developing countries, notably in Africa, a scarcity largely due to human and industrial pollution. Treated water requires specific attention to comply with drinking water standards. 1 Treatment includes several methods, such as coagulation-flocculation, sedimentation or flotation, filtration, and disinfection. 2 Among these processes, coagulation-flocculation has been classified as the best available technology for large-scale drinking water production. 3 It has played, and will continue to play, an important role, directly or indirectly, in the control of particles, microorganisms, natural organic matter (NOM), synthetic organic carbon, disinfection by-product (DBP) precursors, and certain inorganic ions and metals, and thus in the control of drinking water quality. 4 Coagulation destabilises particles in water by neutralising the negative charge on the particle surface; the neutralised micro-flakes then agglomerate into settleable and filterable macro-flocs. 5,6 The coagulation process is primarily used to remove turbidity in a water treatment plant, 7 and is used in more than half of the water purification plants in Algeria. Coagulation efficiency depends on the water characteristics, such as colour, turbidity, temperature, pH, alkalinity, and dissolved salts, not least the degree of mineralisation. 8 It also depends on the treatment itself, such as the nature and dosage of the coagulant and the agitation conditions. 9 Jar tests are generally conducted to determine optimal coagulant doses, 10,11 with the goal of reducing costly chemical wastage and operating costs while achieving drinking water quality objectives. 12 To date, however, this process remains difficult to control because it is non-linear and complex.
Several studies have been conducted to apply different control strategies to the coagulation process. [13][14][15] In recent years, mathematical models have emerged as a viable method to model complex processes of water treatment. 16 These models have been very successful in modelling and predicting environmental parameters, and in predicting coagulant quantity as a function of many physicochemical parameters. 17 For example, a study was focused on the use of Moringa oleifera seed as a coagulant for the treatment of surface water using response surface methodology (RSM). 18 Four parameters were varied viz. stabilisation time, stirring time, stirring speed, and concentration of Moringa oleifera seed extract (MOSE).

The model predicted the lowest turbidity of 5.49 NTU under optimal conditions of 120 min settling time, a stirring speed of 100 rpm, 10 min stirring time, and 3 g l−1 concentration of MOSE. This condition was verified experimentally, and a turbidity of 5.51 NTU was obtained. 18 Other works reported the effectiveness of three coagulants, CuSO4, FeCl3, and CuSO4 + FeCl3 (the two salts mixed in 1 : 1 ratio (v/v)), for the treatment of wastewater from the effluent treatment plant of an oil refinery by the coagulation-flocculation process. 19 Independent parameters, such as pH and coagulant dosage, were optimised using response surface methodology with the central composite design technique, considering final pH, COD reduction, turbidity, TDS, and colour as dependent variables. 19 Results showed that the mixed coagulant CuSO4 + FeCl3 gave better results than CuSO4 and FeCl3 individually. CuSO4 + FeCl3, along with adsorption and sweep flocculation, forms tribasic copper chloride (TBCC) as an intermediate in the pH range 5 to 7. TBCC formation improved flocculation by destabilising colloidal and suspended particles owing to its structural, catalytic, bleaching, and octahedral properties. A maximum reduction in COD (76.77 %), turbidity (89.47 %), TDS (94.16 %), and colour (95.29 %) was observed at pH 7.12 and a dosage of 0.20 g l−1 of the CuSO4 + FeCl3 coagulant. The final pH of the solution under optimal conditions for all three coagulants was below the removal limits. 19 Another application of the response surface method (RSM) was to find the optimal combination of coagulant dose and pH relative to the highest removal efficiency of turbidity and dissolved organic carbon (DOC). 20 The results obtained with polyaluminum chloride (PACl) were compared with those obtained using a conventional coagulant such as alum.
Quadratic models developed for both responses (turbidity removal and DOC removal) indicated optimal conditions of 0.11 mM PACl at pH 7.4 and 0.15 mM alum at pH 6.6. The compromise to optimise both responses simultaneously resulted in 91.4 % turbidity removal and 31.2 % DOC removal using PACl, while 86.3 % turbidity removal and 34.3 % DOC removal were obtained using alum. 20 Another study was conducted to predict both turbidity and removal of dissolved organic matter (DOM) during the coagulation process at the Akron Water Treatment Plant (Akron, Ohio, USA) with four different neural network models. 21 DOM was monitored and characterised using fluorescence spectroscopy and parallel factor analysis (PARAFAC), building on previous research that identified three fluorescence components (C1, C2, and C3). Neural network models were constructed using operational data to predict each of the fluorescence components and the turbidity after coagulation as a function of variable raw water quality and chemical doses. Correlation coefficients between measured and model-predicted values for the final turbidity, C1, C2, and C3 models on an unseen test data set were 0.91, 0.95, 0.97, and 0.51, respectively. 21 Other researchers used extreme learning machine (ELM) coupled with radial basis function (RBF) neural networks to predict coagulant doses. 22 The coagulation data were divided into two categories based on low and high turbidity. The optimal number of input parameters for low turbidity water coagulation modelling was found to be 3, while that for high turbidity water coagulation modelling was found to be 4. Re-selection of the number of input parameters was necessary, since the alkalinity of the raw water was an important factor in improving the performance of the high turbidity model. The low turbidity model was able to predict coagulant dosage with a correlation coefficient greater than 0.97, 22 and the high turbidity model with a reasonably acceptable correlation coefficient of at least 0.80. 22 Another study used a hybrid of k-means clustering and an adaptive neuro-fuzzy inference system (k-means-ANFIS) for the prediction of settled-water turbidity and the optimal determination of the coagulant dose using large-scale historical data. 23 To construct a model well adapted to the different states of the inflow water, the raw water quality data were classified into four groups according to their properties by a k-means clustering technique, and sub-models were developed individually on each clustered data set. The results revealed that the sub-models constructed by the k-means-ANFIS hybrid performed better than not only a single ANFIS model but also seasonal artificial neural network (ANN) models. The final model composed of sub-models showed more precise and consistent prediction capacity than a single ANFIS model and a single ANN model on all five evaluation indices. 23 In another report, artificial neural network (ANN) and adaptive network-based fuzzy inference system (ANFIS) models were used to model the dosage of polyaluminum chloride (PAC) for surface water in northern Taiwan. 24 Each model was built on 819 process-controlled data sets. Input parameters included yesterday's PAC dosage, the day before yesterday's PAC dosage, and the temperature, turbidity, colour, and pH of the raw, flocculated, sedimented, and treated water. 24 The ANN model outperformed the ANFIS model in predicting the optimal PAC dosage in real time when storm water brought high turbidity to the source water.
In this work, response surface methodology (RSM), artificial neural networks (ANN), support vector machine (SVM), and adaptive neuro-fuzzy inference system (ANFIS) were selected to predict coagulant doses as a function of physical and chemical parameters, and were compared to select the most efficient one. Indeed, to our knowledge, the ANN, SVM, and ANFIS models have never been used in this kind of study based on a database built from RSM. In addition, such a comparison has not been made so far.

Jar test
Aluminium salt was used as coagulant in this study. The jar tests were carried out in beakers filled with 1 l of raw water collected in different localities of the Médéa region; the turbidity of the samples was adjusted with humic acid. Acidic and oligotrophic waters often have a brown to yellow "tea" tint, 25 due in particular to humic acids, one of the most important fractions of humus; thanks to their hydrophilic carboxylic-acid functions, humic acids can retain about fifteen times their weight in water, and thus play a fundamental role in water retention and in the useful water reserve of a soil. 26 After turbidity was created with humic acid, the water took on a "tea" hue; the turbid water also contained various mineral and organic matter in suspension or in solution, with dissolved and colloidal matter alone constituting 60 to 80 % of the organic load (humic acids). Removing the turbidity therefore also removes organic and mineral matter (by complexation), [27][28][29] because removal of the colour created by humic acid implies removal of dissolved and colloidal organics and organic micropollutants. 30 After addition of the coagulant, the samples were mixed at 180 rpm for 2 min to ensure rapid mixing; the speed was then reduced to 40 rpm for 30 min to ensure flocculation. 31 The mixing time and speed were set with an automatic controller. After 45 min of settling, the supernatant of each jar was withdrawn through the sampling port and analysed for pH, temperature, turbidity, conductivity, and total alkalimetric titre (TAC). The pH, conductivity, temperature, and turbidity were measured directly; the TAC was analysed immediately following the assay method described by Rodier. 32

Database
The data used in this study were those obtained from the analysis of samples purified with different doses of coagulant that were collected during several campaigns conducted in the Médéa region. The database that was built by RSM was used for the other models (ANN, SVM, and ANFIS), in order to be able to compare them.
The dependent variable was the coagulant dose. The independent variables were the physicochemical parameters: pH, temperature, conductivity, turbidity, and total alkalinity titre (TAC).

Prediction methods
Several methods have been applied to solve problems related to the prediction and modelling of complex non-linear systems. These methods are particularly useful when such systems are difficult to model using conventional methods. 33 In this study, four methods were tested for predicting coagulant doses from physicochemical water parameters, namely RSM, ANN, SVM, and ANFIS.
After constructing the database according to RSM, the data were normalised in the interval [−1, +1] using the mapminmax function (Eq. 1) in MATLAB software. 34 For the three other models (ANN, SVM, and ANFIS), the dataset was divided in two parts: 70 % of the samples were used for training, while the remaining 30 %, which did not participate in model learning, were used to validate and assess the prediction performances of the models: 35,36

x_N = (y_max − y_min) (x − x_min) / (x_max − x_min) + y_min (1)

where x_N is the data value after normalisation; x_max and x_min are the maximum and minimum of the data, respectively; y_min and y_max are taken as −1 and +1, respectively; and x represents the original value.
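The normalisation of Eq. (1) and the 70/30 split can be sketched as follows; this is a minimal Python illustration of what the study did with MATLAB's mapminmax, using synthetic data as a stand-in for the jar-test measurements:

```python
import numpy as np

def mapminmax(x, y_min=-1.0, y_max=1.0):
    """Rescale each column of x linearly into [y_min, y_max] (Eq. 1)."""
    x_min = x.min(axis=0)
    x_max = x.max(axis=0)
    return (y_max - y_min) * (x - x_min) / (x_max - x_min) + y_min

# Synthetic stand-in data: 20 samples of 5 physicochemical parameters
# (e.g. pH, temperature, conductivity, turbidity, TAC)
rng = np.random.default_rng(0)
data = rng.uniform(5.0, 9.0, size=(20, 5))
x_n = mapminmax(data)

# 70/30 split into training and validation sets, as in the study
n_train = int(0.7 * len(x_n))
train, valid = x_n[:n_train], x_n[n_train:]
print(x_n.min(), x_n.max())  # → -1.0 1.0
```

Each column attains exactly −1 at its minimum and +1 at its maximum, so all inputs enter the models on a common scale.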

RSM
The principle of RSM has been described by Khuri and Cornell 20 as a set of mathematical and statistical methods for evaluating the relationships between a group of independent variables and one or more responses. 42 To obtain adequate and reliable measures of the responses of interest, the design of the experiment is necessary. Normally, the relationship between the response and the independent variables cannot be well modelled by a linear function.
A model that incorporates curvature is generally necessary to approximate the response in the region close to the optimum, and in most cases a second-order model is adequate. 43 A central composite design (CCD), a very effective design tool for fitting second-order models, was therefore selected for this study; this technique is generally used when several input parameters affect the output (response). 43 Two levels, an axial distance of α = 1, and 5 replicates at the centre point were considered. Finally, the input variables were coded in the interval [−1, +1] (Table 1).
The number of experiments is determined according to Eq. (2) 42 :

N = 2^k + 2k + c (2)

where k is the number of independent variables, 2^k is the number of factorial experiments, 2k is the number of axial experiments, and c is the number of experiments at the centre point. 44 The model used to predict the responses and describe the relationship between the response and the independent variables was the second-order polynomial (Eq. (3)) 42 :

Y = B0 + Σ Bi Xi + Σ Bii Xi² + Σ Bij Xi Xj (3)
where Y represents the response function (in our case, the coagulant dose); B0 is a constant coefficient; Bi, Bii, and Bij are the coefficients of the linear, quadratic, and interactive terms, respectively. 45
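Eqs. (2) and (3) can be made concrete with a short Python sketch that counts the CCD runs and the terms of the full second-order polynomial for the five variables of this study:

```python
def ccd_runs(k, c):
    """Number of CCD experiments (Eq. 2): factorial + axial + centre points."""
    return 2**k + 2 * k + c

def quadratic_terms(k):
    """Terms in the full second-order polynomial (Eq. 3):
    1 constant + k linear + k quadratic + k(k-1)/2 two-way interactions."""
    return 1 + k + k + k * (k - 1) // 2

# Five independent variables (pH, temperature, conductivity, turbidity, TAC)
# and five replicates at the centre point:
print(ccd_runs(5, 5))      # → 47
print(quadratic_terms(5))  # → 21
```

The full quadratic model thus starts with 21 coefficients; the reduced model of Eq. (15) retains only the 13 significant ones.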

Artificial neural network
In this work, the optimisation of the ANN architecture, performed using MATLAB R2013a software, is described by the flowchart shown in Fig. 1.
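The architecture search of Fig. 1 can be sketched as a simple grid search; the following Python sketch uses scikit-learn's MLPRegressor instead of the MATLAB toolbox, and synthetic data in place of the jar-test database, so it illustrates the procedure rather than reproducing the study's model:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

# Synthetic stand-in data: 5 normalised inputs (pH, temperature,
# conductivity, turbidity, TAC) -> coagulant dose
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(100, 5))
y = X @ rng.uniform(-1, 1, 5) + 0.1 * rng.standard_normal(100)

X_tr, X_va, y_tr, y_va = train_test_split(X, y, train_size=0.7, random_state=1)

best = None
for activation in ("logistic", "tanh", "relu"):   # 3 candidate activation functions
    for n_hidden in range(3, 11):                 # 3 to 10 hidden neurons
        net = MLPRegressor(hidden_layer_sizes=(n_hidden,), activation=activation,
                           max_iter=2000, random_state=1)
        net.fit(X_tr, y_tr)
        rmse = mean_squared_error(y_va, net.predict(X_va)) ** 0.5
        if best is None or rmse < best[0]:
            best = (rmse, activation, n_hidden)

print(best)  # lowest validation RMSE and the corresponding architecture
```

As in the study, when two architectures give nearly equal RMSE, the one with fewer hidden neurons (and hence fewer parameters) would be preferred.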

Support vector machine
Support vector machine was developed by Vapnik in the 1990s, and is based on statistical learning theory (SLT) and the structural risk minimisation (SRM) principle. 48 It is mainly used for non-linear classification and regression analysis. 49 In this approach, a training data set of N points {(x_i, y_i)}, i = 1, ..., N, is considered, where X represents the inputs of the model and Y its output. An SVM model takes the form (Eq. 4):

y(x) = ω^T φ(x) + b (4)

where φ(·): R^n → R^m is a non-linear function that maps the finite-dimensional input space into an implicitly created higher-dimensional space, ω is a weight vector, and b is the bias. 50,51 SVM problems are solved using quadratic programming techniques, which however show some drawbacks: 52
1. They are practically difficult to use.
2. They are time-consuming.
3. They require large memory and CPU time.
In this article, SVM was used for non-linear modelling with different kernel functions, since there is no guarantee that a given kernel will be more efficient than another for specific data. It is therefore necessary to optimise different kernel functions and test their performances. These kernel functions are available in the MATLAB Toolbox R2018a. 53 The kernel functions selected in this article were the linear (Eq. 5), polynomial (Eq. 6), and Gaussian (Eq. 7) kernels: 53

K(x_i, x_j) = x_i^T x_j (5)

K(x_i, x_j) = (x_i^T x_j + c)^d (6)

K(x_i, x_j) = exp(−||x_i − x_j||^2 / (2σ^2)) (7)

with C = BoxConstraint, ε = epsilon, d = polynomial order, and σ^2 = sigma, where d, c, ε, and σ^2 are user-defined kernel parameters.

Adaptive neuro-fuzzy inference system
The ANFIS was developed by Jang. 54 The architecture of ANFIS is shown in Fig. 3. It consists of six key components: input and output data, a data division unit, an algorithm unit, a fuzzy generating system, a fuzzy inference system, and an adaptive network of neurons representing the fuzzy system. After division of the processing data, each set of training and validation data is assigned to the clusters formed by the algorithm. 23,24 Each algorithm is trained independently, with training and validation together, by ANFIS to obtain the optimal fuzzy inference system. 23 In addition, ANFIS is composed of 5 layers, and each layer can include different nodes. 55
Layer 1: Each neuron calculates the degree to which the inputs X and Y belong to the different fuzzy sets (Eqs. 8 and 9). 56 The parameters inherent to these sets are called premise network parameters. For this system:

O_1,k = μ_Ak(x) for k = 1 and 2 (8)

O_1,k = μ_Bk−2(y) for k = 3 and 4 (9)

Layer 2: This layer calculates the degree of activation of the premises. Each neuron receives the outputs of the previous fuzzification neurons and calculates its activation. The conjunction of the antecedents is performed with the product operator, which satisfies the derivability constraint required by the learning algorithms. Each node performs a fuzzy T-norm (Eq. 10):

O_2,k = w_k = ∏ O_1,j , j ∈ Ant(2,k) (10)

where Ant(2,k) indicates the antecedent nodes of node (2,k).
Layer 3: Each neuron calculates the normalised degree of truth of a given fuzzy rule (Eq. 11). The value obtained represents the contribution of the fuzzy rule to the final result:

w̄_k = w_k / Σ_j w_j (11)

Layer 4: Each neuron calculates the weighted output of the corresponding rule (Eq. 12):

O_4,k = w̄_k f_k = w̄_k (m_k0 + m_k1 x + m_k2 y) (12)

where the parameters m_k0, m_k1, and m_k2 are called the consequent parameters of the neuro-fuzzy system.
Layer 5: This is the output layer comprising a single neuron that provides the output of ANFIS by calculating the sum of the outputs of all output neurons.
Y = Σ_k w̄_k f_k (13)

Results and discussion
RSM, ANN, SVM, and ANFIS are the four approaches that were used for the prediction of the coagulant dose from the physicochemical water parameters. All four methods were performed, evaluated, and compared.

RSM
A statistical analysis by RSM was performed on the entire database using the "JMP 13" software. This method allowed finding the mathematical relationship (Eq. (14)) between the coagulant dose and the independent variables corresponding to the physicochemical parameters. Firstly, the equation took into account all 5 variables (Table 2), even those that did not seem to have a significant impact on the dependent variable. The relationship obtained was therefore evaluated in order to keep only the independent variables characterised by high statistical significance (Table 2).
Next, only the 13 terms with high explanatory power for the dependent variable were retained (P_r < 0.05), so the relationship could be reduced to Eq. (15). The value of the coefficient of determination decreased slightly, but the equation became simpler after removal of the variables with little explanatory power. The coefficient R = 0.98691 indicates that the model's correlation was strongly positive (Fig. 5).
The probability (P_r < 0.0001) was strictly less than 0.5 %, confirming that the model was significant.
The significance level p and the F ratio, which provide a measure of the statistical significance of the regression model, were also determined. A high F value combined with a low p value means that the equation is significant. 57 The proposed model can infer the effect of the factors (X_i), their interactions, and their quadratic effects simultaneously. Table 2 shows the effect of each independent factor, its interactions with the other factors, and its quadratic effect on the coagulant dose (Y).
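The F ratio and its p-value can be computed as in the following Python sketch; this is an illustrative least-squares fit on synthetic data (not the study's jar-test database), showing how the overall significance of a regression model is judged:

```python
import numpy as np
from scipy import stats

# Illustrative data: n observations of p predictors with a known linear signal
rng = np.random.default_rng(2)
n, p = 47, 5
X = rng.standard_normal((n, p))
y = X @ np.array([1.5, -0.8, 0.0, 0.6, -0.3]) + 0.5 * rng.standard_normal(n)

A = np.column_stack([np.ones(n), X])      # design matrix with intercept
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
resid = y - A @ beta

sse = resid @ resid                       # residual sum of squares
sst = ((y - y.mean()) ** 2).sum()         # total sum of squares
ssr = sst - sse                           # explained sum of squares

F = (ssr / p) / (sse / (n - p - 1))       # overall F ratio
p_value = stats.f.sf(F, p, n - p - 1)     # upper-tail probability
print(F, p_value)  # a high F with a low p indicates a significant model
```

The same logic applies term by term: predictors whose individual p-values exceed 0.05 are the ones dropped when reducing Eq. (14) to Eq. (15).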
Indeed, the coefficients of each factor in the model make it possible to assess the impact of each factor on the response. 58 Thus, it is evident that increases in turbidity and TAC in the treated effluent increased the required coagulant dose (Fig. 4).
On the other hand, changes in pH and conductivity had a negative effect on the dose of coagulant (Fig. 4).

Fig. 4 -Effects of the operating factors and their interactions
The interactions and quadratic effects of the factors show that the relationship between the response and the factors is not always linear. 58 When the factors are used at different levels in the coagulation process, and when several factors are modified simultaneously, one factor can produce different degrees of response. Thus, the interactions between factors (pH - temperature, pH - conductivity, pH - TAC, and turbidity - TAC) showed positive effects on the coagulant dose (Fig. 4), whereas the interactions between factors (temperature - conductivity and temperature - turbidity) had negative effects (Fig. 4). In addition, the results revealed two factors with notable quadratic effects, namely conductivity and TAC, showing positive and negative quadratic effects, respectively (Fig. 4).
In light of the obtained results, it appears that the required coagulant dose changed with each increase in pH. The results of the RSM performance in terms of all errors and of the agreement vector values (R, slope: α, and intercept: β) are given in Table 3.

ANN
In this part, 3 activation functions were optimised together with the number of hidden neurons (from 3 to 10). The results of these tests are presented in Table 4, which lists the best architectures found: the correlation coefficients and the errors for learning and validation according to the number of neurons in the hidden layer and the network topology, as well as the activation functions of the hidden and output layers. Architecture 1 (Table 4) appears to be the most relevant ANN model for predicting coagulant doses. Since the correlation coefficient and RMSE were almost equal between architectures 1 and 3, the lower number of hidden neurons was chosen in order to have the lowest possible number of parameters. This architecture was used in the learning and validation phases, and the corresponding results are given in Fig. 6. The results were denormalised to actual values for comparison with the other models.
The results of the ANN performance in terms of all errors and in terms of agreement vector values (R, slope: α, and y-intercept: β) are given in Table 5.

SVM
In this part, 3 kernel functions were optimised, each characterised by parameters that also had to be optimised: the linear function by BoxConstraint, epsilon, and sigma; the Gaussian function by BoxConstraint, epsilon, and sigma; and the polynomial function by BoxConstraint, epsilon, and PolynomialOrder, where the PolynomialOrder was optimised from 2 to 5.
Once the result of the learning phase was obtained, it was validated against the validation database. The predicted values were compared with the experimental values in both stages (learning phase and validation phase) to obtain the correlation coefficient and RMSE. The results of these tests are shown in Table 6. It should be noted that the results were denormalised to actual values for comparison with the other models.
It is obvious that the Gaussian function gave the best results in terms of correlation coefficient and RMSE; this result is illustrated graphically in Fig. 7.
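The kernel comparison above can be sketched as follows; this Python illustration uses scikit-learn's SVR in place of the MATLAB toolbox (MATLAB's BoxConstraint, epsilon, and sigma correspond roughly to SVR's C, epsilon, and kernel scale) and synthetic data, so it shows the selection procedure rather than the study's actual fit:

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.svm import SVR

# Synthetic stand-in data: 5 normalised inputs -> coagulant dose,
# with a non-linear dependence on one input (e.g. turbidity)
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(100, 5))
y = np.sin(X[:, 3] * np.pi) + 0.3 * X[:, 0]

X_tr, X_va, y_tr, y_va = train_test_split(X, y, train_size=0.7, random_state=3)

candidates = [
    SVR(kernel="linear", C=1.0, epsilon=0.05),
    SVR(kernel="rbf", C=1.0, epsilon=0.05, gamma="scale"),  # Gaussian kernel
] + [SVR(kernel="poly", degree=d, C=1.0, epsilon=0.05) for d in range(2, 6)]

results = {}
for model in candidates:
    model.fit(X_tr, y_tr)
    rmse = mean_squared_error(y_va, model.predict(X_va)) ** 0.5
    key = model.kernel + (str(model.degree) if model.kernel == "poly" else "")
    results[key] = rmse

print(min(results, key=results.get))  # kernel with the lowest validation RMSE
```

Each candidate is scored on the held-out validation set, mirroring the study's selection of the Gaussian kernel by correlation coefficient and RMSE.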
The results of the SVM performance in terms of all errors and in terms of agreement vector values (R, slope: α and intercept: β) are given in Table 7.

ANFIS
In this work, the weighting algorithms, the membership functions for the inputs (trimf, trapmf, gbellmf, gaussmf, gauss2mf, pimf, dsigmf, and psigmf), the output functions (constant and linear), and the number of nodes for each input were optimised to obtain the most accurate result. These algorithms and functions are available in the MATLAB R2013 toolbox.
As described previously, two steps were followed for the ANFIS modelling, using two sets of data: the training and verification data. Firstly, training was performed using the training dataset. Secondly, the verification dataset was used to check the accuracy and effectiveness of the ANFIS model.
Two criteria were adopted for optimising the ANFIS model: the number of membership functions assigned to the inputs and output, and the RMSE value. In addition, four aspects were considered for the ANFIS learning phase: the number of membership functions used, the membership function type, overfitting, and the training options.
The results of these tests are presented in Table 8. The best results obtained are illustrated in Fig. 8 by denormalising the values to the actual values. The results of the ANFIS performances in terms of all errors and in terms of agreement vector values (R, slope: α, and intercept: β) are given in Table 9.
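To make the ANFIS layer structure of the methods section concrete, the following Python sketch runs one forward pass through a minimal two-rule first-order Sugeno system with Gaussian membership functions; the premise centres/widths and the consequent coefficients are arbitrary illustrative values, not those of the fitted model:

```python
import numpy as np

def gauss_mf(x, c, s):
    """Gaussian membership function (Layer 1)."""
    return np.exp(-((x - c) ** 2) / (2.0 * s ** 2))

def anfis_forward(x, y, premise, consequent):
    """One forward pass through a two-rule first-order Sugeno ANFIS."""
    # Layer 1: membership degrees of the inputs x and y
    mu_a = [gauss_mf(x, c, s) for c, s in premise["A"]]
    mu_b = [gauss_mf(y, c, s) for c, s in premise["B"]]
    # Layer 2: rule firing strengths (product T-norm)
    w = np.array([mu_a[0] * mu_b[0], mu_a[1] * mu_b[1]])
    # Layer 3: normalised firing strengths
    w_bar = w / w.sum()
    # Layer 4: rule consequents f_k = m_k0 + m_k1*x + m_k2*y
    f = np.array([m0 + m1 * x + m2 * y for m0, m1, m2 in consequent])
    # Layer 5: weighted sum of all rule outputs
    return float(np.dot(w_bar, f))

premise = {"A": [(0.0, 1.0), (1.0, 1.0)], "B": [(0.0, 1.0), (1.0, 1.0)]}
consequent = [(0.5, 1.0, 0.0), (0.2, 0.0, 1.0)]
print(anfis_forward(0.3, 0.7, premise, consequent))
```

Training an ANFIS then amounts to tuning the premise parameters (centres and widths) and consequent coefficients so that this output matches the target coagulant dose.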

Comparison of the models
The comparison of the coefficients of correlation, determination, and adjustment, and of the statistical indicators (RMSE, MSE, ESP, EPM, and MAE), obtained by the four models (RSM, ANN, SVM, and ANFIS) for all data is presented in Table 10. The coefficients calculated by the ANN model were slightly higher than those obtained by the SVM model, and significantly higher than those obtained by the RSM and ANFIS models. These results were also confirmed by the statistical indicators, since those given by the ANN model were slightly lower than those obtained by the SVM model, followed by the RSM model and finally the ANFIS model.
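The indicators used in this comparison can be computed as in the following Python sketch; the example values are illustrative, not the study's data:

```python
import numpy as np

def agreement_stats(y_exp, y_pred):
    """Indicators used to compare the models: R, MSE, RMSE, MAE,
    and the agreement line y_pred = alpha * y_exp + beta."""
    y_exp, y_pred = np.asarray(y_exp, float), np.asarray(y_pred, float)
    err = y_pred - y_exp
    mse = float(np.mean(err ** 2))
    stats = {
        "R": float(np.corrcoef(y_exp, y_pred)[0, 1]),
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "MAE": float(np.mean(np.abs(err))),
    }
    alpha, beta = np.polyfit(y_exp, y_pred, 1)  # slope and intercept
    stats["alpha"], stats["beta"] = float(alpha), float(beta)
    return stats

# Illustrative experimental vs. predicted doses
print(agreement_stats([10, 20, 30, 40], [11, 19, 31, 39]))
# RMSE = MAE = 1.0, slope alpha = 0.96, intercept beta = 1.0
```

A model is better when R and alpha approach 1, beta approaches 0, and the error indicators approach 0, which is how the ranking ANN > SVM > RSM > ANFIS is read off Table 10.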
The comparison of the different models is illustrated in Fig. 9, confirming the superiority of the ANN and SVM models over the other models (RSM, ANFIS), with very good superposition of the experimental values and those given by the models. Although the ANN and SVM models were almost equal, the ANN model was slightly superior. The correlation coefficient and the statistical indicators of the learning phase of the ANN model were almost equal to those given by the SVM model, with a slight superiority of the SVM model (Tables 5 and 7).

Residues study
Residue analysis is an efficient way to reveal the performance of the optimised model. It consists of measuring the absolute or relative error between experimental and predicted values. 55 Residue analysis methods are mainly graphical. Fig. 10 shows the residuals related to each model (RSM, ANN, SVM, and ANFIS). This figure shows that the residuals obtained by the neural network method were less scattered (closer to zero) than those obtained by the other models. Again, this result demonstrates the accuracy and robustness of the ANN model compared to the other models, and justifies the use of the ANN approach for the prediction of coagulant doses. However, from an economic point of view, only three models appeared relevant: the ANN model, which involved 43 parameters; the SVM model, which was almost similar to the ANN model in terms of coefficients and statistical indicators but involved only 22 parameters; and finally the RSM model, which is very useful in terms of reducing economic costs owing to its high coefficient (R²adj = 0.96878) relative to its number of parameters, only 13, while still leading to acceptable statistical indicators. In contrast, the ANFIS model was considered unacceptable owing to its very high number of parameters, 309.