Multicomponent Adsorption Capacity Forecasting Based on Support Vector Machine with Dragonfly Algorithm

The predictability of the adsorption capacity of multicomponent adsorption systems was modelled using a Support Vector Machine (SVM). Two SVM models were built and compared. In the first model, the SVM was tuned with its built-in optimisation algorithm. In the second model, the SVM was tuned with a recent and efficient optimisation algorithm, the Dragonfly Algorithm (DA). The models' accuracy was evaluated with three well-established statistical metrics: root mean squared error (RMSE), determination coefficient (R²), and correlation coefficient (R). The data were collected from previously published experimental papers covering a variety of pollutants, such as heavy metal ions, dyes, and organic compounds, and different natural and synthetic adsorbents. The dataset contained 1023 points and five variables: four inputs (molecular weight, equilibrium concentration of adsorbate, specific area of adsorbent, and temperature) and one output (adsorption capacity at equilibrium). The data were divided using the holdout method into two subsets (80 % for training and 20 % for testing). The programming stage was carried out in MATLAB. The results showed that the optimised DA-SVM model with the RBF-Gaussian kernel function combined good global search ability with high prediction accuracy, with R² = 0.997, R = 0.998, and RMSE = 2.539. The obtained model can be used to predict the efficiency of the adsorption system, and provides a tool for process optimisation in response to changes in operating conditions. A new graphical user interface (GUI) was developed in MATLAB to estimate the desired responses accurately using the best DA-SVM model.


Introduction
Several separation techniques have been used to remove different pollutants from water, among them multicomponent adsorption.[1] This technique has attracted the interest of many industrial sectors because of its simplicity, high efficiency, and cost-effectiveness.[2-6] Multicomponent adsorption equilibrium is complex due to the nonlinear relationships between dependent variables and the nature of the interactions between the adsorbent and the adsorbate (synergism, antagonism, or non-interaction).[7] The chemical species are adsorbed at the same time with different degrees of competition, depending on several thermo-physicochemical and morphological parameters.[8] In this sense, different theoretical and empirical models (kinetic models such as pseudo-first-order, pseudo-second-order, and liquid film diffusion, and isotherms such as competitive Langmuir, Freundlich, Temkin, Dubinin-Radushkevich, and Elovich)[9] have been proposed in the literature to model this phenomenon. However, the application of these models is limited because they rest on restrictive assumptions about the physicochemical nature of the adsorption system.[10]
Due to this complexity, machine learning algorithms have emerged as a powerful tool, compared to classical methods, for capturing the nonlinear relationships directly from samples with no prior knowledge of the chemical or physical nature of the system.[10-13] Different machine learning algorithms have been used in the literature as advanced mathematical tools to model the adsorption capacity of single and multicomponent adsorption systems, such as artificial neural networks (ANN)[3,7,11-15] and support vector machines (SVM).[12,15-17] The SVM method can overcome some disadvantages of the ANN model, such as limited robustness, and avoids falling into local optima.[18] However, the SVM's parameters are usually tuned with built-in optimisers, which are generally selected by trial and error.[18] This approach can lead to unreliable predictions and large errors.[18] Therefore, many methods have been proposed for improving parameter optimisation.
A new, simple, and effective optimisation method, the dragonfly algorithm (DA), has been published in the literature.[19] When it is combined with the SVM, the SVM's parameters are taken as the solution position of the DA, and the computing performance of the algorithm is taken as the current fitness value of the DA. The algorithm then iterates over those parameters to obtain the optimal location of the dragonfly, i.e., the best parameters of the SVM.[18]

To the best of our knowledge, no research has been published on the development of a single model capable of predicting the adsorption capacity for any set of pollutants (heavy metals, dyes, pharmaceuticals, hydrocarbons, organic, and inorganic matter).
Therefore, the aim of this study was to use a novel hybrid DA-SVM algorithm for hyperparameter optimisation, to obtain the best prediction model in terms of statistical metrics for modelling the multicomponent adsorption capacity at equilibrium. This model can predict the adsorption capacity of ternary adsorption systems of different types of pollutants.

Dataset selection
The experimental data of this study were collected from a large body of scientific literature. Table 1 presents the influencing parameters. The literature sources and the number of data points used for each system are given in Table 2. The dataset was collected from various published papers; each paper contained different ternary competitive adsorption systems on different adsorbents. The published papers reported single or competitive adsorption, such as in binary or ternary systems. In this work, only the ternary competitive adsorption data were collected, and they were extracted from graphs using digitiser software. The three metals or antibiotics shown in Table 2 represent the three pollutants to be removed from the aqueous solution.
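The abstract states that the 1023 data points were divided by a holdout split into 80 % training and 20 % test data. A minimal Python sketch of such a split is given below; the authors worked in MATLAB, so the function name `holdout_split`, the fixed seed, and the rounding rule for the test-set size are assumptions made only for illustration.

```python
import random

def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) for a random holdout partition."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(indices)
    # Rounding to the nearest integer is an assumption; other tools may floor.
    n_test = round(n_samples * test_fraction)
    test_idx = sorted(indices[:n_test])
    train_idx = sorted(indices[n_test:])
    return train_idx, test_idx

# 1023 points as reported in the paper; only the indices are split here.
train_idx, test_idx = holdout_split(1023, test_fraction=0.2)
print(len(train_idx), len(test_idx))
```

Splitting indices rather than the data rows keeps the sketch independent of how the four inputs and the output are stored.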

SVM modelling approach
The SVM algorithm was first proposed by Cortes and Vapnik in 1995.[53] The SVM algorithm presents some advantages, as follows:
• It can model complex nonlinear behaviours,
• It can be implemented for regression,
• It can deal with missing data, etc.
The SVM algorithm uses the trial and error optimisation method to tune its parameters. Therefore, some disadvantages can be encountered, such as low efficiency, low accuracy, and slow calculation. To overcome these problems, recent optimisation methods have been used to tune the SVM parameters, namely, Genetic Algorithm (GA), Particle Swarm Optimisation (PSO),[54] Cuckoo Optimisation Algorithm (COA), Artificial Bee Colony (ABC), Simulated Annealing (SA),[55] Ant Colony Optimisation (ACO),[56] Grid search, Firefly Algorithm (FFA), and Dragonfly Algorithm (DA).[57] The theory of SVM has been explained in detail by numerous researchers.[58] The output expression of the SVM model is given by Eq. (1), where w and b are the weight vector and the bias, respectively, and φ(x) represents the nonlinear mapping function, which maps x into a higher-dimensional feature space.
To obtain w, the regularised function given in expression (2) must be minimised, subject to the constraints of expressions (3)-(4) and Eq. (5). Here, ε is equivalent to the function approximation accuracy placed on the training data samples,[15] C is the capacity parameter, and ξ_i and ξ_i* represent the positive slack variables.[15] K is a kernel function defined by an inner product of the nonlinear mapping function; the most widely used kernel function is the Gaussian Radial Basis Function (RBF), which is given by Eq. (6), where σ is the width of the RBF function.
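For reference, the standard ε-insensitive support vector regression formulation from the general SVM literature, which matches the symbols defined above, can be written as follows. This is a sketch under stated assumptions, not a reproduction of the authors' equations. The numbering is assumed to follow the references in the text, Eq. (5) is assumed to be the kernel (dual) form of the model, the Lagrange multipliers α_i and α_i* are symbols introduced here, the factor 2σ² in the kernel is one common convention, and ε denotes the approximation-accuracy parameter described in the text.

```latex
\begin{align}
f(x) &= w^{T}\phi(x) + b \tag{1}\\
\min_{w,\,b,\,\xi,\,\xi^{*}}\quad & \tfrac{1}{2}\lVert w\rVert^{2}
      + C\sum_{i=1}^{n}\left(\xi_i + \xi_i^{*}\right) \tag{2}\\
\text{subject to}\quad & y_i - w^{T}\phi(x_i) - b \le \varepsilon + \xi_i \tag{3}\\
& w^{T}\phi(x_i) + b - y_i \le \varepsilon + \xi_i^{*},
      \qquad \xi_i,\ \xi_i^{*} \ge 0 \tag{4}\\
f(x) &= \sum_{i=1}^{n}\left(\alpha_i - \alpha_i^{*}\right)K(x_i, x) + b \tag{5}\\
K(x_i, x_j) &= \exp\!\left(-\frac{\lVert x_i - x_j\rVert^{2}}{2\sigma^{2}}\right) \tag{6}
\end{align}
```

In this form, C trades model flatness against training errors larger than ε, and σ controls how quickly the kernel similarity decays with distance, which is why these three quantities are the ones that need tuning.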
The dragonfly algorithm is a nature-inspired algorithm reported to overcome shortcomings of other optimisation algorithms, offering high efficiency, rapid convergence, lower computational complexity, and a better capability of determining the global optimum.
The accuracy of the SVM model depends on C, ε, and σ. The DA was employed in this study to tune these three SVM parameters. Fig. 1 illustrates the flowchart of the hybrid DA-SVM methodology implemented in MATLAB.
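The idea of treating the three SVM parameters as a dragonfly position and the model error as its fitness can be sketched in a few lines of Python. The code below is a simplified dragonfly-style swarm search, not the authors' MATLAB implementation: it uses the whole swarm as the neighbourhood, omits the random-walk step of the published algorithm, and uses illustrative behaviour weights. The fitness `toy_validation_mse` is a hypothetical smooth stand-in for the validation MSE of an SVM trained with C = 10^p[0], ε = 10^p[1], and σ = 10^p[2]; its minimum location and the search ranges are invented for the example.

```python
import random

def dragonfly_minimise(fitness, bounds, n_agents=20, n_iter=60, seed=1):
    """Simplified dragonfly-style search; returns (best_pos, best_score, history)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_agents)]
    step = [[0.0] * dim for _ in range(n_agents)]
    scores = [fitness(p) for p in pos]
    best = min(range(n_agents), key=scores.__getitem__)
    food, food_score = pos[best][:], scores[best]  # best position found so far
    history = [food_score]
    s, a, c, f, e = 0.1, 0.1, 0.7, 1.0, 0.1  # behaviour weights (illustrative)
    for t in range(n_iter):
        w = 0.9 - 0.5 * t / n_iter  # inertia weight decays from 0.9 towards 0.4
        enemy = pos[max(range(n_agents), key=scores.__getitem__)][:]
        centre = [sum(p[d] for p in pos) / n_agents for d in range(dim)]
        mean_step = [sum(v[d] for v in step) / n_agents for d in range(dim)]
        for i in range(n_agents):
            for d, (lo, hi) in enumerate(bounds):
                x = pos[i][d]
                move = (s * (x - centre[d])    # separation: away from swarm centre
                        + a * mean_step[d]     # alignment: match the mean step
                        + c * (centre[d] - x)  # cohesion: towards swarm centre
                        + f * (food[d] - x)    # attraction to the best position
                        + e * (x - enemy[d])   # avoidance of the worst position
                        + w * step[i][d])      # inertia from the previous step
                limit = 0.2 * (hi - lo)        # cap the step for stability
                step[i][d] = max(-limit, min(limit, move))
                pos[i][d] = max(lo, min(hi, x + step[i][d]))  # stay in the box
            scores[i] = fitness(pos[i])
            if scores[i] < food_score:
                food, food_score = pos[i][:], scores[i]
        history.append(food_score)
    return food, food_score, history

def toy_validation_mse(p):
    # Hypothetical stand-in for the validation MSE of an SVM; NOT real data.
    target = (2.0, -2.0, 0.0)
    return sum((x - t) ** 2 for x, t in zip(p, target))

# Assumed search box in log10 units for (C, epsilon, sigma).
search_box = [(-1.0, 4.0), (-4.0, 0.0), (-2.0, 2.0)]
best_pos, best_score, history = dragonfly_minimise(toy_validation_mse, search_box)
print(best_pos, best_score)
```

In a real workflow the toy fitness would be replaced by a function that trains the SVM on the training set with the candidate parameters and returns its MSE. Searching in log10 units is a design choice of this sketch, made because C, ε, and σ typically span several orders of magnitude.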

Statistical evaluation and uncertainty in models
The performance of the best model was measured using three statistical metrics: R, R², and RMSE. The RMSE can be expressed as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(q_{e,i}^{\mathrm{exp}} - q_{e,i}^{\mathrm{cal}}\right)^{2}}$$

where $q_{e,i}^{\mathrm{exp}}$ and $q_{e,i}^{\mathrm{cal}}$ are the experimental and the calculated adsorption capacities. The coefficients R and R² are also employed to confirm the accuracy of the best developed models; R² can be expressed by the following equation:[59]

$$R^{2} = 1 - \frac{\sum_{i=1}^{n}\left(q_{e,i}^{\mathrm{exp}} - q_{e,i}^{\mathrm{cal}}\right)^{2}}{\sum_{i=1}^{n}\left(q_{e,i}^{\mathrm{exp}} - \bar{q}_{e}^{\mathrm{exp}}\right)^{2}}$$

where $\bar{q}_{e}^{\mathrm{exp}}$ is the mean of the experimental values and n is the number of data samples.
Table 3 presents the performance of each model in modelling the adsorption phenomena in terms of RMSE, R, and R². The RBF-Gaussian kernel function gave the best results, yielding the lowest error and the highest correlation coefficient among all kernel functions (Table 4).
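The three metrics can be computed with a few lines of Python, as sketched below. R² follows the residual-sum-of-squares definition with the experimental mean in the denominator, and R is taken to be the Pearson correlation coefficient between experimental and calculated values, which is an assumption because the text does not give its formula. The sample values are made up purely to exercise the functions and are not adsorption data from the study.

```python
import math

def rmse(q_exp, q_cal):
    """Root mean squared error between experimental and calculated values."""
    n = len(q_exp)
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(q_exp, q_cal)) / n)

def r_squared(q_exp, q_cal):
    """Determination coefficient: 1 - SS_res / SS_tot."""
    mean_exp = sum(q_exp) / len(q_exp)
    ss_res = sum((e - c) ** 2 for e, c in zip(q_exp, q_cal))
    ss_tot = sum((e - mean_exp) ** 2 for e in q_exp)
    return 1.0 - ss_res / ss_tot

def pearson_r(q_exp, q_cal):
    """Pearson correlation coefficient (assumed definition of R)."""
    n = len(q_exp)
    mean_e = sum(q_exp) / n
    mean_c = sum(q_cal) / n
    cov = sum((e - mean_e) * (c - mean_c) for e, c in zip(q_exp, q_cal))
    var_e = sum((e - mean_e) ** 2 for e in q_exp)
    var_c = sum((c - mean_c) ** 2 for c in q_cal)
    return cov / math.sqrt(var_e * var_c)

# Made-up illustrative values, not data from the paper.
q_exp = [10.0, 20.0, 30.0, 40.0]
q_cal = [12.0, 18.0, 33.0, 39.0]
print(rmse(q_exp, q_cal), r_squared(q_exp, q_cal), pearson_r(q_exp, q_cal))
```

With these definitions, R² is not in general the square of R, so reporting both values gives slightly different information about the fit.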
Scatter plots comparing the measured and predicted adsorption capacity values obtained with the best model for the training and testing datasets are shown in Fig. 2. The correlation and determination coefficients were found to be 0.998 and 0.997, respectively, for the DA-SVM model using RBF-Gaussian as the kernel function. Hence, the obtained DA-SVM model with the RBF-Gaussian kernel function is statistically significant. The MSE was used as the fitness function to evaluate how close a solution was to the optimal solution. The model reached high R² values of 0.997 for the three outputs. The results confirmed the high prediction ability of the developed model, and the possibility of integrating it into water treatment and purification units.

MATLAB-based GUI
To provide a simple and user-friendly way of using the optimal DA-SVM model to estimate the target values, and to avoid dealing with the complex architecture of the model with its many weights and biases, a practical GUI was designed with the MATLAB R2018a GUI Toolbox. The construction procedure of the GUI is presented in Fig. 6.
The GUI presented in Fig. 7 was built using the optimised model with high accuracy; the input parameters are written directly in the related textboxes whose descriptions are stated above the boxes. The inputs are then used with weights and biases of the optimised model to estimate the adsorption capacity of ternary systems as outputs in the range of the used inputs.
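The prediction step behind such an interface can be sketched as follows. For a kernel SVM, a stored model typically amounts to support vectors, dual coefficients, a bias, and the kernel width; the sketch assumes that structure and an RBF kernel with the 2σ² convention. The function names, the input ranges, and all numerical values are hypothetical placeholders, not parameters of the authors' model. The range check reflects the statement that predictions are intended for inputs within the range of the data used.

```python
import math

def rbf_kernel(x, z, sigma):
    """Gaussian RBF kernel with width sigma (2*sigma**2 convention assumed)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svr_predict(x, support_vectors, dual_coefs, bias, sigma):
    """Kernel-form prediction: sum_i coef_i * K(sv_i, x) + bias."""
    return sum(coef * rbf_kernel(sv, x, sigma)
               for sv, coef in zip(support_vectors, dual_coefs)) + bias

def within_ranges(x, ranges):
    """True if every input lies inside its training range (lo, hi)."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, ranges))

# Hypothetical placeholder model with four normalised inputs.
support_vectors = [[0.2, 0.5, 0.1, 0.3], [0.8, 0.4, 0.9, 0.6]]
dual_coefs = [1.5, -0.5]
bias, sigma = 2.0, 0.5
input_ranges = [(0.0, 1.0)] * 4  # assumed normalised ranges

x_new = [0.3, 0.5, 0.2, 0.4]
if within_ranges(x_new, input_ranges):
    print(svr_predict(x_new, support_vectors, dual_coefs, bias, sigma))
else:
    print("input outside the range covered by the training data")
```

Rejecting out-of-range inputs before predicting is a design choice of this sketch; it avoids reporting extrapolated values that the model was never fitted to.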

Conclusions
The developed DA-SVM model can be used to predict the adsorption capacity of any ternary pollutant system (heavy metals, dyes, pharmaceuticals, hydrocarbons, organic, and inorganic matter) involved in water treatment and purification. To simplify the use of the obtained model, a single MATLAB-based GUI was developed from the best model's parameters, and it can be used for response prediction with high accuracy.