e-ISSN 2231-8526
ISSN 0128-7680
Friday Zinzendoff Okwonu, Nor Aishah Ahad, Innocent Ejiro Okoloko, Joshua Sarduana Apanapudor, Saadi Ahmad Kamaruddin and Festus Irimisose Arunaye
Pertanika Journal of Science & Technology, Volume 30, Issue 4, October 2022
DOI: https://doi.org/10.47836/pjst.30.4.29
Keywords: Classification, least squares, linear prediction, prediction errors, robust
Published on: 28 September 2022
Sample-mean-based classifiers, such as the nearest mean classifier (NMC) and the Bayes classifier, are not robust because they are sensitive to outliers. Enhancing their robustness through weighting or data deletion may discard vital information. This study therefore develops robust hybrid univariate classifiers that rely on neither data weighting nor data deletion. Two data transformation methods, the least squares approach (LSA) and the linear prediction approach (LPA), are applied to estimate the parameters of interest. The LSA and LPA estimates are used to develop two groups of univariate classifiers, and the predicted estimates from both methods are further combined to develop four hybrid classifiers. These classifiers are applied to investigate whether horn length and base width can determine cattle gender, and whether shape can classify banana variety. The NMC, LSA, LPA, and hybrid classifiers all showed that cattle gender can be determined from horn length and base width measurements, and the analysis further revealed that shape can determine banana variety. Comparative results on the two data sets showed that every method achieves over 90% prediction accuracy. The findings affirm that the NMC, LSA, LPA, and hybrid classifiers satisfy the data-dependent theory and are suitable for classifying agricultural products; the proposed methods could therefore be applied efficiently to classification tasks in many fields of study.
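The abstract does not spell out the LSA, LPA, or hybrid constructions, but the baseline NMC it references is standard: assign an observation to the group whose sample mean is nearest. A minimal univariate sketch in Python follows; the group labels and data values are hypothetical illustrations, not the paper's measurements.

```python
import numpy as np

def nearest_mean_classify(group_a, group_b, x):
    """Univariate nearest mean classifier (NMC): assign x to the
    group whose sample mean is closest in absolute distance."""
    mu_a, mu_b = np.mean(group_a), np.mean(group_b)
    return "A" if abs(x - mu_a) <= abs(x - mu_b) else "B"

# Hypothetical horn-length measurements (cm) for two groups
male = np.array([38.0, 41.5, 40.2, 39.8])
female = np.array([25.1, 27.3, 26.0, 24.8])

print(nearest_mean_classify(male, female, 39.0))  # nearer the male mean -> "A"
print(nearest_mean_classify(male, female, 26.5))  # nearer the female mean -> "B"
```

Because the decision depends only on the two sample means, a single extreme outlier in either group can shift its mean and flip classifications, which is the sensitivity the study's LSA/LPA-based classifiers are designed to avoid.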