e-ISSN 2231-8526
ISSN 0128-7680
Olayemi Joshua Ibidoja, Fam Pei Shan, Mukhtar Eri Suheri, Jumat Sulaiman and Majid Khan Majahar Ali
Pertanika Journal of Science & Technology, Volume 31, Issue 6, October 2023
DOI: https://doi.org/10.47836/pjst.31.6.09
Keywords: Big data, drying, machine learning, seaweed, variable selection
Published on: 12 October 2023
Identifying the parameters that determine moisture content removal has become essential in seaweed research, as it can reduce cost and improve the quality and quantity of the seaweed. Many drying parameters are involved in the seaweed drying process, which makes it difficult to find a model that identifies the significant ones. This study compares the performance of machine learning algorithms on seaweed big data. Four machine learning algorithms (bagging, boosting, support vector machine, and random forest) were used to determine the significant parameters from data obtained from the v-GHSD (v-Groove Hybrid Solar Drier). The mean absolute percentage error (MAPE) and the coefficient of determination (R²) were used to assess the models. Variable selection is especially important in big data, where the number of variables and parameters can exceed the number of observations. It reduces model complexity, avoids the curse of dimensionality, reduces cost, removes irrelevant variables, and increases precision. A total of 435 drying parameters were considered as determinants of moisture content removal, and each algorithm was used to select the 15, 25, 35 and 45 most significant parameters. With the 45 parameters of highest variable importance, random forest achieved a MAPE of 2.13 and an R² of 0.9732, the lowest error and the highest R² among the four algorithms. These results indicate that random forest is the best of the compared algorithms for identifying the vital drying parameters for moisture content removal.
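The two assessment metrics and the top-k selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it uses the standard textbook definitions of MAPE (expressed here as a percentage, which is an assumption about the reported value) and R², and the function names, variable names, and importance scores are all hypothetical.

```python
from typing import Dict, List, Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (MAPE is undefined there).
    """
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def top_k(importance: Dict[str, float], k: int) -> List[str]:
    """Return the k variable names with the highest importance scores."""
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical moisture-content values and importance scores, for illustration only.
actual = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]
scores = {"temperature": 0.50, "humidity": 0.30, "solar_radiation": 0.19, "noise": 0.01}

print(f"MAPE = {mape(actual, predicted):.2f}%")
print(f"R^2  = {r_squared(actual, predicted):.4f}")
print("Top 2 variables:", top_k(scores, 2))
```

In the study's setting, the importance scores would come from each fitted algorithm, and `k` would take the values 15, 25, 35 and 45. A lower MAPE and a higher R² indicate a better fit.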