Precision Document Transaction Type Classifier Using Machine Learning Techniques

Authors

  • Jay Carlou C. Sabado, Don Mariano Marcos Memorial State University
  • Sheena I. Sapuay-Guillen, Don Mariano Marcos Memorial State University

DOI:

https://doi.org/10.32734/jocai.v9.i1-19945

Keywords:

Precision Document Transaction Type Classifier, Agile Methodology, Software Quality, Machine Learning, Ease of Doing Business Law, Republic Act 11032

Abstract

This paper aimed to develop a Precision Document Transaction Type Classifier that uses machine learning to identify transaction types, in line with the Ease of Doing Business Law (RA 11032), which aims to streamline government services and improve service delivery. A dataset was built from existing government documents and processed to train and evaluate three models: Naïve Bayes, Bidirectional Long Short-Term Memory (Bi-LSTM), and Bidirectional Encoder Representations from Transformers (BERT). BERT was the most accurate, efficient, and precise of the three. The software application was developed using the Agile methodology to ensure iterative progress and adaptability during development. Its software quality was evaluated against ISO/IEC 25010:2011 and achieved an overall mean score of 4.25, corresponding to a descriptive equivalent of Excellent across the software quality characteristics assessed, demonstrating reliability, efficiency, and overall performance.
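The abstract names Naïve Bayes as the baseline that Bi-LSTM and BERT were compared against on accuracy and precision. As a rough illustration only, the sketch below implements a multinomial Naïve Bayes text classifier with Laplace smoothing in plain Python, together with accuracy and macro-averaged precision. It is not the authors' implementation: the transaction-type labels (`business_permit`, `building_permit`, `civil_registry`), the toy sentences, the simplified tokenizer, and the choice of macro averaging are all hypothetical assumptions, and the paper's actual preprocessing and its Bi-LSTM and BERT models are not shown.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase text and split it into alphanumeric tokens (simplified, hypothetical tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over word counts with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)       # documents per class, used for priors
        self.word_counts = defaultdict(Counter)   # class -> token frequencies
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict_one(self, text):
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(doc_count / self.n_docs)
            denom = self.total_words[label] + self.alpha * v
            for tok in tokenize(text):
                if tok in self.vocab:  # tokens never seen in training are skipped
                    score += math.log((self.word_counts[label][tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, texts):
        return [self.predict_one(t) for t in texts]


def accuracy(y_true, y_pred):
    """Fraction of documents whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_precision(y_true, y_pred):
    """Unweighted mean of per-class TP / (TP + FP); a never-predicted class scores 0."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        predicted = sum(1 for p in y_pred if p == label)
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        scores.append(tp / predicted if predicted else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data; the paper's dataset of government documents is not reproduced here.
train_texts = [
    "application for new business permit registration",
    "renewal of business permit for retail store",
    "request for building permit and construction plans",
    "building permit application with structural plans",
    "request for certified true copy of birth certificate",
    "issuance of birth certificate copy",
]
train_labels = ["business_permit", "business_permit", "building_permit",
                "building_permit", "civil_registry", "civil_registry"]
test_texts = ["business permit renewal request",
              "construction plans for building permit",
              "copy of birth certificate"]
test_labels = ["business_permit", "building_permit", "civil_registry"]

model = NaiveBayesTextClassifier(alpha=1.0).fit(train_texts, train_labels)
predictions = model.predict(test_texts)
print(predictions)
# prints: accuracy=1.00 macro precision=1.00
print(f"accuracy={accuracy(test_labels, predictions):.2f} "
      f"macro precision={macro_precision(test_labels, predictions):.2f}")
```

Macro averaging weights every transaction type equally, which matters when some transaction types are rare; a weighted average would instead favor frequent types. For real data, scikit-learn's `TfidfVectorizer` and `MultinomialNB` offer better-tested equivalents of these building blocks.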

Published

2025-01-31

How to Cite

Sabado, J. C. C., & Sapuay-Guillen, S. I. (2025). Precision Document Transaction Type Classifier Using Machine Learning Techniques. Data Science: Journal of Computing and Applied Informatics, 9(1), 57–75. https://doi.org/10.32734/jocai.v9.i1-19945