e-ISSN 2231-8526
ISSN 0128-7680
Zuraira Libasin, Wan Suhailah Wan Mohamed Fauzi, Ahmad Zia ul-Saufie, Nur Azimah Idris and Noor Azizah Mazeni
Pertanika Journal of Science & Technology, Volume 29, Issue 4, October 2021
DOI: https://doi.org/10.47836/pjst.29.4.46
Keywords: Air pollution, imputation, linear interpolation, missing data, performance indicator
Published on: 29 October 2021
Missing values in a dataset have always been a critical issue for accurate prediction, and they may lead to a misleading understanding of the air pollution situation. Only a small proportion of values (5% to 10%) may be missing, but the characteristics of the missing data may vary. This research focuses mainly on long gaps of missing data. Single imputation replaces each blank in the monitoring dataset from a selected Department of Environment (DoE) monitoring station with a value calculated by the technique that performs best for long gaps of missing hours. The main variable monitored is PM10. The single imputation techniques were tested on the Tanjung Malim monitoring station dataset and evaluated with five performance indicators, and the results were compared with a previous study to determine which technique is best suited to long-gap hourly data. The research was completed in four stages: data acquisition, characteristic analysis of missing values, application of the single imputation approach, and verification of the approach with suggestion of the best technique. Four existing imputation techniques were used: series mean (SM), mean of nearby points (MNP), linear trend (LT), and linear interpolation (LIN). The results show that linear interpolation is the best technique for replacing missing particulate matter data, giving the lowest mean absolute error and better performance accuracy than the other techniques.
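The characteristic analysis of missing values starts from where the gaps in an hourly record fall and how long each one lasts. The sketch below, in plain Python, finds every run of consecutive missing hours and reports the overall missing percentage. It assumes missing readings are stored as `None`; the function names and the `min_hours` threshold used to call a gap "long" are hypothetical, since the abstract does not define a cut-off.

```python
from typing import List, Optional, Tuple

# Hourly PM10 readings; None marks a missing hour (assumed encoding).
Series = List[Optional[float]]


def missing_gaps(series: Series) -> List[Tuple[int, int]]:
    """Return (start_index, length_in_hours) for every run of consecutive missing values."""
    gaps: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(series):
        if value is None and start is None:
            start = i  # a new gap begins here
        elif value is not None and start is not None:
            gaps.append((start, i - start))  # the gap ended at i - 1
            start = None
    if start is not None:  # the series ends inside a gap
        gaps.append((start, len(series) - start))
    return gaps


def long_gaps(series: Series, min_hours: int) -> List[Tuple[int, int]]:
    """Keep only gaps of at least min_hours (hypothetical threshold for a 'long' gap)."""
    return [gap for gap in missing_gaps(series) if gap[1] >= min_hours]


def missing_percentage(series: Series) -> float:
    """Percentage of hours with no reading."""
    return 100.0 * sum(value is None for value in series) / len(series)


if __name__ == "__main__":
    pm10 = [52.0, None, 49.0, 47.0, None, None, None, None, 61.0, 58.0]  # made-up values
    print(missing_gaps(pm10))        # [(1, 1), (4, 4)]
    print(long_gaps(pm10, 3))        # [(4, 4)]
    print(missing_percentage(pm10))  # 50.0
```

Separating gap detection from imputation keeps the two stages independent, so the same gap list can be reused to decide which hours each imputation method is asked to fill.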
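The four single imputation techniques can be sketched as follows, together with a mean absolute error (MAE) check on values removed on purpose from a complete series. The definitions are assumptions based on how these methods are commonly described (for example, in SPSS's Replace Missing Values procedure): SM uses the mean of all observed hours; MNP averages up to `span` valid values on each side of a missing hour; LT regresses the observed values on a time index 1..n and reads the missing hours off the fitted line; LIN draws a straight line between the last valid value before a gap and the first valid value after it. The abstract does not state the paper's exact implementation, its span setting, or its other four performance indicators, so the function names, `span=2` and the toy PM10 values are hypothetical.

```python
from typing import Callable, Dict, List, Optional

# Hourly PM10 readings; None marks a missing hour (assumed encoding).
Series = List[Optional[float]]


def series_mean(series: Series) -> Series:
    """SM: replace every missing hour with the mean of all observed hours."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]


def mean_of_nearby_points(series: Series, span: int = 2) -> Series:
    """MNP: mean of up to `span` valid values before and `span` valid values after."""
    filled = list(series)
    for i, v in enumerate(series):
        if v is None:
            before = [x for x in reversed(series[:i]) if x is not None][:span]
            after = [x for x in series[i + 1:] if x is not None][:span]
            nearby = before + after
            filled[i] = sum(nearby) / len(nearby)
    return filled


def linear_trend(series: Series) -> Series:
    """LT: least-squares line of the observed values against time index 1..n."""
    points = [(t, v) for t, v in enumerate(series, start=1) if v is not None]
    n = len(points)
    t_mean = sum(t for t, _ in points) / n
    v_mean = sum(v for _, v in points) / n
    sxx = sum((t - t_mean) ** 2 for t, _ in points)
    sxy = sum((t - t_mean) * (v - v_mean) for t, v in points)
    slope = sxy / sxx
    intercept = v_mean - slope * t_mean
    return [intercept + slope * t if v is None else v
            for t, v in enumerate(series, start=1)]


def linear_interpolation(series: Series) -> Series:
    """LIN: straight line between the last valid value before and the first after a gap.
    Missing hours at either end of the series have no such pair and stay None."""
    filled = list(series)
    known = [i for i, v in enumerate(series) if v is not None]
    for i, v in enumerate(series):
        if v is None:
            left = max((k for k in known if k < i), default=None)
            right = min((k for k in known if k > i), default=None)
            if left is not None and right is not None:
                fraction = (i - left) / (right - left)
                filled[i] = series[left] + fraction * (series[right] - series[left])
    return filled


def mean_absolute_error(actual: List[float], predicted: Series) -> float:
    """MAE, the one performance indicator named in the abstract."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


METHODS: Dict[str, Callable[[Series], Series]] = {
    "SM": series_mean,
    "MNP": mean_of_nearby_points,
    "LT": linear_trend,
    "LIN": linear_interpolation,
}


def evaluate(complete: List[float], gap: range) -> Dict[str, float]:
    """Blank out `gap` in a complete series, impute it with each method, score by MAE.
    The gap must lie strictly inside the series so LIN has values on both sides."""
    masked: Series = [None if i in gap else v for i, v in enumerate(complete)]
    actual = [complete[i] for i in gap]
    scores: Dict[str, float] = {}
    for name, method in METHODS.items():
        filled = method(masked)
        scores[name] = mean_absolute_error(actual, [filled[i] for i in gap])
    return scores


if __name__ == "__main__":
    toy = [30.0, 34.0, 50.0, 62.0, 58.0, 40.0, 36.0, 32.0]  # made-up hourly PM10
    for name, mae in evaluate(toy, range(2, 5)).items():
        print(f"{name}: MAE = {mae:.2f}")
```

On this made-up series, with a three-hour gap at indices 2 to 4, the script prints MAE values of 22.27 (SM), 21.67 (MNP), 22.69 (LT) and 19.67 (LIN). This only illustrates the mechanics of the comparison and says nothing about the paper's Tanjung Malim results. Scoring on deliberately removed values is what makes an error measure such as MAE computable at all, because the true values of genuinely missing hours are unknown.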