Preliminary Analysis of Machine Learning Performance and the Effect of Outliers in Daily Rainfall Classification in Jambi City

Authors

  • Muhammad Risyad Naufal, Universitas Sriwijaya
  • Marathur Rodhiyah, Universitas Sriwijaya
  • Erni, Universitas Sriwijaya

DOI:

https://doi.org/10.32734/jotp.v8i1.24702

Keywords:

Classification, Daily Rainfall, Machine Learning, Macro F1-Score, Outlier Removal

Abstract

Rainfall is a key meteorological parameter that affects many sectors, particularly in tropical regions such as Jambi City. Daily rainfall data, however, often contain outliers and imbalanced class distributions, both of which can degrade the performance of machine learning classification models. This study presents a preliminary analysis of several machine learning algorithms for daily rainfall classification in Jambi City and examines the effect of outlier removal on their performance. The five algorithms evaluated are Support Vector Machine with a radial basis function (RBF) kernel, K-Nearest Neighbor, Naive Bayes, Decision Tree, and Random Forest. Daily rainfall was grouped into four classes: no rain, light rain, moderate rain, and heavy rain. Model performance was assessed using accuracy and macro F1-score. The results indicate that outlier removal improves the accuracy of all evaluated algorithms. The largest improvement was observed for the Decision Tree model, whose accuracy rose from 45.71% to 57.36% and whose macro F1-score rose from 28.99% to 38.78%. Overall, outlier removal yields more balanced and representative rainfall classification results and may serve as a basis for future quantitative rainfall regression studies.
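The abstract reports both accuracy and macro F1-score because the four rainfall classes are imbalanced. The following is a minimal sketch of why the two metrics can diverge, not the authors' evaluation code. The labels are hypothetical, made-up values rather than the study's data, and the helper names `accuracy` and `macro_f1` are illustrative. A class with no true positives is assigned an F1 of 0 here, which is an assumption of this sketch.

```python
# Class names taken from the abstract; everything else is illustrative.
CLASSES = ["no rain", "light rain", "moderate rain", "heavy rain"]


def accuracy(y_true, y_pred):
    """Fraction of days whose predicted class matches the observed class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred, classes=CLASSES):
    """Unweighted mean of per-class F1 scores.

    Per-class F1 is computed as 2*TP / (2*TP + FP + FN).
    Assumption: a class with a zero denominator scores 0.
    """
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels (not the study's data): 10 days, mostly dry.
y_true = ["no rain"] * 6 + ["light rain"] * 2 + ["moderate rain", "heavy rain"]
# A trivial model that always predicts the majority class.
y_pred = ["no rain"] * 10

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy = {acc:.2f}, macro F1 = {f1:.4f}")
# prints: accuracy = 0.60, macro F1 = 0.1875
```

In this toy case the majority-class predictor reaches 60% accuracy while its macro F1 is below 19%, because the three rain classes each contribute an F1 of 0 to the unweighted average. This is the kind of gap that macro F1 is meant to expose on imbalanced rainfall classes.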

Published

2026-03-10