e-ISSN 2231-8526
ISSN 0128-7680
Zun Liang Chuan, Wan Nur Syahidah Wan Yusoff, Azlyna Senawi, Mohd Romlay Mohd Akramin, Soo-Fen Fam, Wendy Ling Shinyie and Tan Lit Ken
Pertanika Journal of Science & Technology, Volume 30, Issue 1, January 2022
DOI: https://doi.org/10.47836/pjst.30.1.18
Keywords: Anderson-Darling statistical test, bootstrap, hierarchical, non-hierarchical, regionalisation algorithm, unbiased statistical test
Published on: 10 January 2022
Descriptive data mining has been widely applied in hydrology in the form of regionalisation algorithms that identify statistically homogeneous rainfall regions. However, the agglomerative hierarchical and non-hierarchical regionalisation algorithms employed in previous studies require post-processing techniques to validate and interpret the analysis results. The main objective of this study is to investigate the effectiveness of automated agglomerative hierarchical and non-hierarchical regionalisation algorithms in identifying homogeneous rainfall regions based on a new statistically significant difference regionalised feature set. To pursue this objective, this study collected 20 historical monthly rainfall time series from rain gauge stations located in the Kuantan district. In practice, these 20 rain gauge stations can be categorised into two statistically homogeneous rainfall regions with distinct spatial and temporal variability in rainfall amounts. The results of the analysis show that the Forgy K-means non-hierarchical (FKNH), Hartigan-Wong K-means non-hierarchical (HKNH), and Lloyd K-means non-hierarchical (LKNH) regionalisation algorithms are superior to the other automated agglomerative hierarchical and non-hierarchical regionalisation algorithms considered, yielding the highest regionalisation accuracy among them. Based on these regionalisation results, the reliability and accuracy of risk assessments for extreme hydro-meteorological events in the Kuantan district can be improved. In particular, regional quantile estimates obtained with an appropriate statistical distribution can be more accurate than at-site quantile estimates.
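As a rough illustration of the K-means style of regionalisation named in the abstract, the following minimal sketch runs a Lloyd-type iteration with Forgy-style random initialisation on a synthetic stand-in for 20 stations split into two regions. It is not the study's implementation: the two-dimensional feature vectors, the function names (`forgy_init`, `lloyd_kmeans`, `wcss`, `two_region_accuracy`), the number of restarts, and the accuracy measure are all assumptions made for this example, and the study's actual regionalised feature set is not reproduced here.

```python
import random


def forgy_init(points, k, rng):
    """Forgy-style initialisation: use k distinct observations as starting centroids."""
    return [list(p) for p in rng.sample(points, k)]


def sq_dist(p, c):
    """Squared Euclidean distance between a point and a centroid."""
    return sum((x - y) ** 2 for x, y in zip(p, c))


def lloyd_kmeans(points, centroids, max_iter=100):
    """Lloyd-type K-means: alternate assignment and centroid update until labels stop changing."""
    centroids = [list(c) for c in centroids]  # work on a copy
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each station joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def wcss(points, labels, centroids):
    """Within-cluster sum of squares, used here to compare restarts."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


def two_region_accuracy(labels, truth):
    """Share of stations placed correctly, allowing the two label names to be swapped."""
    agree = sum(a == b for a, b in zip(labels, truth))
    return max(agree, len(truth) - agree) / len(truth)


# Synthetic stand-in (an assumption): 10 stations per region, bounded noise around two centres.
data_rng = random.Random(42)
truth = [0] * 10 + [1] * 10
features = [
    [10.0 * t + data_rng.uniform(-1, 1), 10.0 * t + data_rng.uniform(-1, 1)]
    for t in truth
]

# Several Forgy restarts; keep the partition with the lowest within-cluster sum of squares.
init_rng = random.Random(0)
best = None
for _ in range(5):
    labels, centroids = lloyd_kmeans(features, forgy_init(features, 2, init_rng))
    score = wcss(features, labels, centroids)
    if best is None or score < best[0]:
        best = (score, labels, centroids)

accuracy = two_region_accuracy(best[1], truth)
print(f"regionalisation accuracy on the synthetic stations: {accuracy:.2f}")
```

Restarting from several random initialisations is a common precaution because a single K-means run can settle on a poor local optimum. The accuracy helper only handles the two-region case, which matches the number of regions reported for the Kuantan stations.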
Ahmad, N. H., Othman, I. R., & Deni, S. M. (2013). Hierarchical cluster approach for regionalisation of Peninsular Malaysia based on the precipitation amount. Journal of Physics: Conference Series, 423, 1-10. https://doi.org/10.1088/1742-6596/423/1/012018
Awan, J. A., Bae, D. H., & Kim, K. J. (2014). Identification and trend analysis of homogeneous rainfall zones over the East Asia monsoon region. International Journal of Climatology, 35(7), 1422-1433. https://doi.org/10.1002/joc.4066
Burn, D. H., Zrinji, Z., & Kowalchuk, M. (1997). Regionalization of catchments for regional flood frequency analysis. Journal of Hydrologic Engineering, 2(2), 76-82. https://doi.org/10.1061/(ASCE)1084-0699(1997)2:2(76)
Chuan, Z. L., Deni, S. M., Fam, S. F., & Ismail, N. (2020). The effectiveness of a probabilistic principal component analysis model and expectation maximisation algorithm in treating missing daily rainfall data. Asia-Pacific Journal of Atmospheric Sciences, 56, 119-129. https://doi.org/10.1007/s13143-019-00135-8
Chuan, Z. L., Ismail, N., Shinyie, W. L., Ken, T. L., Fam, S. F., Senawi, A., & Yusoff, W. N. S. W. (2018a). The efficiency of average linkage hierarchical clustering algorithm associated multi-scale bootstrap resampling in identifying homogeneous precipitation catchments. IOP Conference Series: Materials Science and Engineering, 342, 1-10. https://doi.org/10.1088/1757-899X/342/1/012070
Chuan, Z. L., Ismail, N., Yusoff, W. N. S. W., Fam, S. F., & Romlay, M. A. M. (2018b). Identifying homogeneous rainfall catchments for non-stationary time series using TOPSIS algorithm and bootstrap k-sample Anderson-Darling test. International Journal of Engineering & Technology, 7(4), 3228-3237.
Chuan, Z. L., Senawi, A., Yusoff, W. N. S. W., Ismail, N., Ken, T. L., & Chuan, M. W. (2018c). Identifying the ideal number Q-components of the Bayesian principal component analysis model for missing daily precipitation data treatment. International Journal of Engineering & Technology, 7(4.30), 5-10. https://doi.org/10.14419/ijet.v7i4.30.21992
Dash, M., & Liu, H. (2003). Feature selection for clustering. In T. Terano, H. Liu & A. L. P. Chen (Eds.), Knowledge discovery and data mining current issues and new applications (pp. 110-121). Springer. https://doi.org/10.1007/3-540-45571-X_13
Forgy, E. (1965). Cluster analysis of multivariate data: Efficiency versus interpretability of classification. Biometrics, 21(3), 768-769.
Guttman, N. B. (1993). The use of L-moments in the determination of regional precipitation climates. Journal of Climate, 6(12), 2309-2325. https://doi.org/10.1175/1520-0442(1993)006<2309:TUOLMI>2.0.CO;2
Hamdan, M. F., Suhaila, J., & Jemain, A. A. (2015). Clustering rainfall pattern in Malaysia using functional data analysis. AIP Conference Proceedings, 1643, 349-355. https://doi.org/10.1063/1.4907466
Hartigan, J. A., & Wong, M. A. (1979). Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics), 28(1), 100-108. https://doi.org/10.2307/2346830
Lloyd, S. P. (1982). Least squares quantization in PCM. IEEE Transactions on Information Theory, 28(2), 129-137. https://doi.org/10.1109/TIT.1982.1056489
MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In L. M. Le Cam & J. Neyman (Eds.), Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability (pp. 281-297). University of California Press.
Ngongondo, C. S., Xu, C. Y., Tallaksen, L. M., Alemaw, B., & Chirwa, T. (2011). Regional frequency analysis of rainfall extremes in Southern Malawi using the index rainfall and L-moments approaches. Stochastic Environmental Research and Risk Assessment, 25(7), 939-955. https://doi.org/10.1007/s00477-011-0480-x
Nnaji, C. C., Mama, C. N., & Ukpabi, O. (2014). Hierarchical analysis of rainfall variability across Nigeria. Theoretical and Applied Climatology, 123(1-2), 171-184. https://doi.org/10.1007/s00704-014-1348-z
Saeed, G. A. A., Chuan, Z. L., Zakaria, R., Yusoff, W. N. S. W., & Salleh, M. Z. (2016). Determination of the best single imputation algorithm for missing rainfall data treatment. Journal of Quality Measurement and Analysis, 12(1-2), 79-87.
Sahrin, S., Ismail, N., & Alias, N. E. (2018). Regional frequency analysis of Peninsular Malaysia using L-moments. Far East Journal of Mathematical Sciences, 103(8), 1379-1398. https://doi.org/10.17654/MS103081379
Scholz, F. W., & Stephens, M. A. (1987). K-sample Anderson-Darling tests. Journal of the American Statistical Association, 82(399), 918-924. https://doi.org/10.1080/01621459.1987.10478517
Shimodaira, H. (2002). An approximately unbiased test of phylogenetic tree selection. Systematic Biology, 51(3), 492-508. https://doi.org/10.1080/10635150290069913
Tan, P. N., Steinbach, M., & Kumar, V. (2006). Introduction to data mining. Pearson Addison Wesley.
Terassi, P. M. D. B., & Galvani, E. (2017). Identification of homogeneous rainfall regions in the Eastern watersheds of the State of Paraná, Brazil. Climate, 5(3), 1-13. https://doi.org/10.3390/cli5030053