Soft Computing Prediction of Oil Extraction from Huracrepitan Seeds

This study models the extraction of huracrepitan seed oil using an Adaptive Neuro-Fuzzy Inference System (ANFIS) and an Artificial Neural Network (ANN). The experiments were conducted at temperatures of 60–80 °C, extraction times of 4–6 h, and solute/solvent ratios of 0.05–0.10, with oil yield as the output parameter. Sensitivity analysis showed that temperature and time had the most significant effect on oil yield. The performance indicators for oil yield estimation were R² = 0.999 and MSE = 5.63192E-13 for ANN, and R² = 0.36945 and MSE = 0.42331 for ANFIS. The results show that ANN gave a better prediction than ANFIS.


Introduction
Hura crepitans (sandbox tree) is a perennial crop of the family Euphorbiaceae. It is a tropical crop grown in North and South America and is also common in Africa, especially in Nigeria, where it is abundant. 1 Oil extracted from its seeds contains a volatile colourless liquid called "Hurin". Although it is a vegetable oil, it is poisonous if ingested, hence the oil remains underutilised. 2 Recent research has shown that huracrepitan seed oil can be used as an industrial raw material in pharmaceuticals and is a prospect for biodiesel production. 3 Muhammed et al. 4 analysed the fatty acid composition (wt%) of huracrepitan seed oil; their assessment indicates that the oil is a good base material for the plastic and paint industries.
Several studies have reported the extraction of oil from huracrepitan seed. [4][5][6] Oniya et al. 2 optimised the process parameters for oil extraction from huracrepitan seed using the response surface technique, but neither a comparison with other predictive models nor an analytical treatment of the non-linear relationship between the input and output variables of the extraction process has been reported.
Researchers have increasingly asserted the potential of soft computing models such as Artificial Neural Networks (ANNs) and the Adaptive Neuro-Fuzzy Inference System (ANFIS). 7,8 ANN is a modelling tool inspired by biological neural networks, while ANFIS integrates the strengths of ANN and fuzzy logic (FL). ANN and ANFIS are increasingly applied where the dependency between dependent and independent variables is either unknown or very complex. 9 They provide more accurate results for process control of a complicated system than conventional mathematical models. [10][11][12] However, only a limited number of researchers have modelled the extraction of oil from plant seeds using soft computing techniques.
Onoji et al. 13 compared RSM and ANN in modelling and optimising the rubber seed oil extraction process. Both models were effective in describing the parametric effect of the considered operating variables on the extraction; however, ANN described the effect more accurately than the RSM model. Olajide et al. 14 applied RSM and ANFIS to optimise oil yield from shea kernels in a hydraulic press, considering the effects of moisture content, heating temperature, heating time, applied pressure, and pressing time on oil yield. RSM gave the better prediction performance, with an R² of 0.9998, while ANFIS had an R² of 0.9865; hence, both models gave a good prediction. Eletta et al. 15 modelled and optimised oil extraction from Luffa cylindrica seeds using a binary solvent mixture. The oil yield values predicted by the ANN model were more accurate than those of RSM when compared with experimental values. Therefore, this study aims to fill this gap in scientific research by predicting oil yield from huracrepitan seed using ANN and ANFIS.

Materials and method

Sandbox (Hura crepitans) seeds
Sandbox (Hura crepitans) seeds/fruits were collected from different locations in Nigeria (Auchi in Edo State and Umudike in Abia State). Dried seeds were collected from under the tree and, in some cases, the matured and dried fruits were harvested from a tree in bulk quantity. The seeds were released naturally by sun-drying, which makes the pods break open on their own. The seeds were sun-dried for one week and subsequently oven-dried for 8 h at 60 °C to constant weight to minimise moisture content before extraction. 16
The extraction of the oil was carried out at the analysis laboratory of the Department of Chemical Engineering, Michael Okpara University of Agriculture, Umudike, Abia State, Nigeria, using the solvent extraction method. It was done with a Soxhlet apparatus of 250 cm³ capacity using n-hexane of analytical grade as the solvent. The parameters were 10–20 g of Hura crepitans (sandbox) seed, an extraction time of 4–6 h, an extraction temperature of 60–80 °C, and a ratio of solute (biomass) to solvent (n-hexane) from 0.05 to 0.10. The solvent was recovered at every interval by distillation or with a rotary evaporator, and the oil obtained was weighed. The experiment was repeated for the other parameter settings, and the percentage yield was calculated using Eq. (1):
$$Y = \frac{M_o}{M_s} \times 100 \qquad (1)$$
where Y = oil yield (%); M_o = mass of oil extracted (g); and M_s = mass of Hura crepitans seed (g).
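As a small worked illustration of the yield calculation, the sketch below assumes that Eq. (1) is the usual mass ratio, Y = 100 × M_o / M_s. The function name and the masses are hypothetical and are not measurements from this study.

```python
def oil_yield_percent(mass_oil_g, mass_seed_g):
    """Percentage oil yield, assuming Y = 100 * M_o / M_s."""
    if mass_seed_g <= 0:
        raise ValueError("seed mass must be positive")
    return 100.0 * mass_oil_g / mass_seed_g


# Hypothetical masses for illustration only (not values from the study).
example_yield = oil_yield_percent(mass_oil_g=5.0, mass_seed_g=10.0)
print(f"{example_yield:.1f} %")  # prints "50.0 %"
```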

Experimental design
A Box-Behnken experimental design with three factors at three levels, generated using response surface methodology in Design-Expert version 6.0.8, was employed; it produced 17 experimental runs. The three independent factors considered in this study were extraction time, extraction temperature, and seed/solvent ratio; the response was the oil yield determined using Eq. (1). The design data coding is presented in Table 1. The data generated from the experimental runs were used for the ANN and ANFIS simulations. Multiple regression analysis gave a second-order polynomial equation. The resulting quadratic model, Eq. (2), describes the relationship between the oil yield Y (%) and the coded values of the independent variables A, B, and C (temperature, time, and seed/solvent ratio).
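The structure of such a design can be sketched as follows. This is an illustrative reconstruction, not the Design-Expert output: it assumes five centre points (which, with the 12 edge-midpoint runs of a three-factor Box-Behnken design, gives the 17 runs stated above), it lists runs in standard order rather than randomised order, and the function and variable names are hypothetical.

```python
from itertools import combinations


def box_behnken(n_factors, n_center):
    """Coded runs: +/-1 on each pair of factors, 0 on the rest, plus centre points."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a in (-1, 1):
            for b in (-1, 1):
                run = [0] * n_factors
                run[i], run[j] = a, b
                runs.append(tuple(run))
    runs.extend([(0,) * n_factors] * n_center)
    return runs


# Factor ranges taken from the text; the dictionary keys are illustrative names.
RANGES = {"temperature_C": (60.0, 80.0), "time_h": (4.0, 6.0), "ratio": (0.05, 0.10)}


def decode(run):
    """Map coded levels (-1, 0, +1) to actual values: centre + level * half-range."""
    actual = {}
    for level, (name, (low, high)) in zip(run, RANGES.items()):
        actual[name] = (low + high) / 2 + level * (high - low) / 2
    return actual


design = box_behnken(n_factors=3, n_center=5)  # 5 centre points is an assumption
print(len(design))  # 17
print(decode(design[0]))
```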

ANN model development
The artificial neural network (ANN) architecture was developed in the MATLAB 8.4 (R2014b) software environment, where the training, validation, and testing of the ANN model were carried out. The three-layer ANN (Fig. 1) comprised a tangent sigmoid transfer function (tansig) at the hidden layer, a linear transfer function (purelin) at the output layer, and the Levenberg-Marquardt back-propagation algorithm with 1000 iterations. The input layer corresponded to the three experimental parameters: solute/solvent ratio (g mol⁻¹), temperature (°C), and time (h). The output layer was oil yield. All the data derived from the convective extraction of oil were randomly divided into three groups (training, validation, and testing) in a ratio of 70 %, 15 %, and 15 %, respectively. Ten hidden neurons were used as the default when testing to determine the most suitable training algorithm for the prediction. One to fifteen neurons in the hidden layer and one neuron in the output layer were then applied, and the data used were obtained from the multiple-factors-at-a-time experiment.
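The network structure and data division described above can be sketched in NumPy as follows. This is only an outline under stated assumptions: the weights are random rather than trained (Levenberg-Marquardt training is not implemented here), the inputs are placeholder values, the rounding used for the 70/15/15 split may differ from MATLAB's own division routine, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable


def split_indices(n_samples, fractions=(0.70, 0.15, 0.15)):
    """Randomly partition sample indices into training, validation and test sets."""
    order = rng.permutation(n_samples)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Three-layer network: tanh (tansig) hidden layer, linear (purelin) output."""
    hidden = np.tanh(x @ w_hidden + b_hidden)
    return hidden @ w_out + b_out


n_inputs, n_hidden = 3, 10  # ratio, temperature, time; ten hidden neurons
w_hidden = rng.normal(size=(n_inputs, n_hidden))  # untrained, random weights
b_hidden = np.zeros(n_hidden)
w_out = rng.normal(size=(n_hidden, 1))
b_out = np.zeros(1)

train, val, test = split_indices(17)  # 17 runs, as in the experimental design
x = rng.uniform(-1.0, 1.0, size=(17, n_inputs))  # placeholder coded inputs
y_hat = forward(x, w_hidden, b_hidden, w_out, b_out)
print(len(train), len(val), len(test), y_hat.shape)  # 12 3 2 (17, 1)
```

With only 17 runs, the validation and test sets hold three and two samples under this rounding, which is worth keeping in mind when reading very small error values.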
In the fuzzy if-then rules of the ANFIS, A and B are user-defined linguistic terms that each represent a range of values. The sequence and functions of the layers are as follows:
Layer 1: Each node is a square node equipped with a node function. Assuming x and y are the two typical input values fed to the two input nodes, the nodes transform those values into membership grades using membership functions such as triangular, generalised bell-shaped, or Gaussian functions:
$$O_{1,i} = \mu_{A_i}(x)$$
where $\mu_{A_i}$ is the membership function of $A_i$, x is the input parameter to the node, and $A_i$ is the linguistic label connected with the node function.
Layer 2: Each node multiplies the incoming signals and sends the product out. Each node output is the firing strength of a rule:
$$w_i = \mu_{A_i}(x)\,\mu_{B_i}(y)$$
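The first two layers can be illustrated with a short sketch. It assumes Gaussian membership functions and two rules that pair the i-th fuzzy set of x with the i-th fuzzy set of y; the set centres, widths, and function names are hypothetical choices for illustration and are not parameters fitted in this work.

```python
import numpy as np


def gaussian_mf(x, centre, sigma):
    """Gaussian membership grade of x in a fuzzy set with the given centre and width."""
    return np.exp(-0.5 * ((x - centre) / sigma) ** 2)


def firing_strengths(x, y, sets_a, sets_b):
    """Layer 1 (membership grades) followed by Layer 2 (product for each rule i)."""
    mu_a = np.array([gaussian_mf(x, c, s) for c, s in sets_a])  # Layer 1, input x
    mu_b = np.array([gaussian_mf(y, c, s) for c, s in sets_b])  # Layer 1, input y
    return mu_a * mu_b  # Layer 2: firing strength of rule i


# Hypothetical fuzzy sets given as (centre, sigma) pairs.
sets_a = [(60.0, 5.0), (80.0, 5.0)]  # "low" and "high" temperature (deg C)
sets_b = [(4.0, 0.5), (6.0, 0.5)]    # "short" and "long" time (h)

w = firing_strengths(60.0, 4.0, sets_a, sets_b)
print(w)  # rule 1 fires fully, rule 2 hardly at all
```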

Performance of developed ANN and ANFIS techniques
To measure the efficiency and performance of the models developed for oil yield, several statistical parameters were used to estimate the generalisation error. In the present work, R², RMSE, and MSE were used, as defined in Eqs. 5 to 7. RMSE and MSE values close to zero and an R² value (correlation coefficient) close to one indicate a high degree of predictability and reliability of the model. 19

Results and discussion

Physicochemical properties analyses of the extracted HCSO
The physicochemical properties of the extracted huracrepitan seed oil were analysed, and most of the properties conformed to ASTM standards. The iodine value, which indicates the degree of unsaturation of the triglycerides present in the oil, was 46.50 I₂ g⁻¹; hence, the oil can be modified into bio-based resins for the plastic and paint industries.

ANN model simulation
Eleven backpropagation (BP) algorithms were compared to select the most suitable one. For all BP algorithms, a three-layer ANN with a tangent sigmoid transfer function (tansig) at the hidden layer and a linear transfer function (purelin) at the output layer was used, with ten neurons in the hidden layer. The benchmark comparison showed a loss of optimality in the estimates produced by some BP training algorithms. As shown in Table 3, Bayesian regularisation was the best of the 11 BP algorithms, with the smallest MSE of 1.01810E-11, whereas traingd produced the greatest error of 321.1121. The loss of optimality in the estimates produced by some BP training algorithms can be attributed to the combinatorial nature and non-linear structure of the experimental data. Hence, the results of the various training algorithms used in the benchmark comparison confirmed the complexity of the problem.

Optimisation of the ANN structure
The optimal architecture of the ANN model and its parameter variation were determined based on the minimum MSE of the training and prediction sets obtained with the Bayesian regularisation algorithm. In optimising the network, one neuron was used in the hidden layer as an initial guess. As the number of neurons increased, the network gave several local minima, and different MSE values were obtained for the training set. Table 4 shows that the minimum MSE (5.63192E-13) for oil yield was attained with 12 neurons. A further increase in the number of neurons beyond this point gave no lower MSE, and the network with the minimum MSE also had the best correlation coefficient, R², closest to one.
ANFIS models using different input variable combinations were investigated with an exhaustive search method to determine the input variable with the greatest effect on the extraction, using RMSE as the performance indicator. Table 5 shows the exhaustive ANFIS model results with a single input variable; temperature had the lowest RMSE, which indicates that it was the variable most relevant to the response. Table 6 shows that temperature and time were the pair of input variables that most affected the oil yield. Hence, these two inputs were used for the FIS structure.
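The idea of an exhaustive search over input combinations can be sketched as follows. To keep the example short and runnable, an ordinary least-squares fit stands in for training an ANFIS on each subset, and the data are a synthetic three-level grid with a made-up response in which the first input dominates; neither the surrogate model nor the data come from this study, and the names are hypothetical.

```python
from itertools import combinations, product

import numpy as np


def rmse_of_linear_fit(x, y):
    """Training RMSE of a least-squares fit (a stand-in for training an ANFIS)."""
    design = np.column_stack([np.ones(len(x)), x])
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coeffs
    return float(np.sqrt(np.mean(residuals ** 2)))


def exhaustive_search(x, y, names, subset_size):
    """Score every input subset of the given size; return them sorted by RMSE."""
    scores = []
    for cols in combinations(range(x.shape[1]), subset_size):
        score = rmse_of_linear_fit(x[:, list(cols)], y)
        scores.append((score, tuple(names[c] for c in cols)))
    return sorted(scores)


names = ["temperature", "time", "ratio"]
# Synthetic coded inputs (full three-level grid) and a made-up response.
x = np.array(list(product((-1.0, 0.0, 1.0), repeat=3)))
y = 3.0 * x[:, 0] + 1.0 * x[:, 1]

ranked_single = exhaustive_search(x, y, names, subset_size=1)
ranked_pairs = exhaustive_search(x, y, names, subset_size=2)
print(ranked_single[0][1])  # ('temperature',)
print(ranked_pairs[0][1])   # ('temperature', 'time')
```

The ranking here follows from how the synthetic response was built; it illustrates the procedure only and is not evidence about the real extraction data.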
To obtain the best prediction of huracrepitan seed oil extraction, the developed ANFIS structure was simulated with various input membership functions (mf), such as gauss mf, gauss2 mf, gbell mf, tri mf, trap mf, psig mf, and dsig mf. The correlation coefficient (R²) and the root mean square error (RMSE) were used as the statistical criteria to evaluate the degree of reliability of the network.
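For readers unfamiliar with these shapes, three of the membership functions named above can be written out directly. The sketch below uses the commonly cited closed forms for the generalised bell, triangular, and trapezoidal functions; the parameter names and the example values on the 60–80 °C temperature range are hypothetical.

```python
import numpy as np


def gbell_mf(x, a, b, c):
    """Generalised bell: 1 / (1 + |(x - c) / a|^(2b)), centred at c with width a."""
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2 * b))


def tri_mf(x, a, b, c):
    """Triangle with feet at a and c and peak at b (requires a < b < c)."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)


def trap_mf(x, a, b, c, d):
    """Trapezoid with feet at a and d and a plateau on [b, c] (a < b <= c < d)."""
    rising = np.minimum((x - a) / (b - a), 1.0)
    falling = (d - x) / (d - c)
    return np.maximum(np.minimum(rising, falling), 0.0)


temps = np.linspace(60.0, 80.0, 5)  # 60, 65, 70, 75, 80 deg C
print(tri_mf(temps, 60.0, 70.0, 80.0))
print(trap_mf(temps, 60.0, 65.0, 75.0, 80.0))
print(gbell_mf(temps, 5.0, 2.0, 70.0))
```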

Prediction efficiency of ANFIS model for oil yield
Tables 7 and 8 summarise the results for the different input membership function types with linear and constant output MFs. The tables also give the RMSE computed on the training data for all the model structures considered, together with the corresponding correlation coefficient (R²) between the measured and computed outputs; RMSE and R² are the statistical criteria used to judge the performance of the model. The effects of different input membership functions (MF), such as gbell, gauss, gauss2, trap, pi, dsig, and psig, on oil yield were tested by training with a single output MF type, "linear" or "constant", to determine the best input MF.
For the linear output membership function, the MSE ranged from 0.42385 to 0.48885, with the corresponding R² ranging from 0.3429 to 0.34383, as shown in Table 7. The R² values are not close to 1; the tri membership function had the MSE closest to zero, while the pi membership function had the best R². For the constant output membership function, the gauss2 mf had the best R², while the tri mf had the lowest MSE. The MSE ranged from 0.69012 to 2.103, with the corresponding R² ranging from 0.35199 to 0.36945, as shown in Table 8.
The R² values for the constant output membership function are closer to 1 than those for the linear output membership function, whereas the MSE values for the linear output membership function are closer to zero. From a statistical performance point of view, the prediction of oil yield for huracrepitan seed is poor compared with other existing investigations on ANFIS modelling. 20-22

Prediction model comparison
The predictive capabilities of ANN and ANFIS for oil yield were assessed using statistical parameters such as MSE (mean square error) and R² (correlation coefficient). The results showed that the Bayesian regularisation algorithm optimised at 12 neurons yielded the best prediction for the ANN model, with an R² of 0.9999 and an MSE of 5.63192E-13. For the ANFIS model, the gauss2 mf had the best R² of 0.36945 and the tri mf had the lowest MSE of 0.42331. These statistical comparisons suggest that ANN performed better than the ANFIS model.
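The three indicators can be computed as in the sketch below. It assumes the standard definitions (MSE as the mean squared residual, RMSE as its square root, and R² as one minus the ratio of residual to total sum of squares), which may differ in detail from the form used in this work if R² was computed as a squared correlation coefficient. The sample values are hypothetical.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean square error between measured and predicted values."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))


def rmse(y_true, y_pred):
    """Root mean square error."""
    return float(np.sqrt(mse(y_true, y_pred)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Hypothetical measured and predicted yields, for illustration only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
print(mse(measured, predicted), rmse(measured, predicted), r_squared(measured, predicted))
```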

Conclusions
This study provided some insight into the effects of varying the process conditions of temperature, time, and solute/solvent ratio on huracrepitan seed oil extraction. The physicochemical properties of the oil revealed its potential to serve as a raw material for the synthesis of bio-based materials/biopolymers. This inference could be attributed to the high iodine value obtained for the oil. The parametric analysis using exhaustive search showed that temperature had the highest influence on the oil yield. The work also showed that the combination of temperature and time had the most significant effect on the oil yield. The statistical comparison of the ANN and ANFIS results suggests that ANN predicted oil extraction from huracrepitan seed better than ANFIS.