Comparative Classification of Hepatitis C Disease Using Naïve Bayes and Random Forest with SMOTE-Based Class Balancing
DOI: https://doi.org/10.55681/armada.v4i6.2961
Keywords: Hepatitis C, Class Imbalance, Naïve Bayes, Random Forest, SMOTE
Abstract
Hepatitis C is a liver disease caused by the Hepatitis C Virus (HCV) that can progress to serious complications such as cirrhosis, liver failure, and liver cancer. Accurate early detection is essential for improving treatment success and patients' quality of life. This study compares the performance of the Naïve Bayes and Random Forest algorithms for classifying Hepatitis C, using clinical data on 615 patients from the UCI Machine Learning Repository. Preprocessing consisted of data cleaning, data transformation, and class balancing with the Synthetic Minority Over-sampling Technique (SMOTE). The dataset was split 80:20 into training and test sets. In the experiments, Random Forest achieved an accuracy of 99.06%, a precision of 0.99, a recall of 0.98, and an F1-score of 0.99, far exceeding Naïve Bayes, which reached an accuracy of 84.06%. Feature-importance analysis identified AST, ALT, GGT, age, and albumin as the most significant clinical predictors. On this dataset, combining Random Forest with SMOTE proved to be a highly effective approach to Hepatitis C classification, with strong potential to support accurate clinical decision-making for early detection of the disease.
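The SMOTE class-balancing step described in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes numeric feature vectors, Euclidean distance, k = 5 nearest neighbours (the value used by Chawla et al., 2002), and oversampling every smaller class up to the size of the largest one. The helper names `smote` and `balance` are hypothetical, and only the standard library is used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic minority-class rows by SMOTE-style interpolation.

    Each new row lies on the line segment between a randomly chosen
    minority row and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class rows")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` inside the minority class (Euclidean distance)
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, k=5, seed=0):
    """Oversample every smaller class up to the size of the largest class.

    Intended to be called on the training split only; the original rows
    are kept first, followed by the synthetic ones.
    """
    counts = {}
    for label in y:
        counts[label] = counts.get(label, 0) + 1
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, count in counts.items():
        if count < target:
            minority = [list(row) for row, lab in zip(X, y) if lab == label]
            new_rows = smote(minority, target - count, k=k, seed=seed)
            X_out.extend(new_rows)
            y_out.extend([label] * len(new_rows))
    return X_out, y_out
```

With a library, imbalanced-learn's `SMOTE(k_neighbors=5).fit_resample(X_train, y_train)` performs the same kind of resampling. Applying it to the training portion only keeps synthetic rows derived from test patients out of the evaluation; the abstract does not specify whether SMOTE was applied before or after the 80:20 split.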
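The abstract does not name the Naïve Bayes variant used. Because the clinical predictors it highlights (AST, ALT, GGT, age, albumin) are continuous, the sketch below assumes Gaussian Naïve Bayes: each class is summarized by its prior and a per-feature mean and variance, and a row is assigned to the class with the highest log posterior. The class name `GaussianNaiveBayes` and the `eps` variance floor are illustrative assumptions, not details from the study.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes with one normal distribution per (class, feature) pair."""

    def __init__(self, eps=1e-9):
        self.eps = eps  # variance floor so a constant feature cannot divide by zero
        self.params = {}

    def fit(self, X, y):
        groups = defaultdict(list)
        for row, label in zip(X, y):
            groups[label].append(row)
        self.params = {}
        for label, rows in groups.items():
            n = len(rows)
            columns = list(zip(*rows))
            means = [sum(col) / n for col in columns]
            variances = [
                sum((v - m) ** 2 for v in col) / n + self.eps
                for col, m in zip(columns, means)
            ]
            log_prior = math.log(n / len(y))
            self.params[label] = (log_prior, means, variances)
        return self

    def _log_posterior(self, row, label):
        # log P(class) + sum of per-feature log N(x | mean, var); the shared
        # evidence term P(x) is dropped because it does not change the argmax
        log_prior, means, variances = self.params[label]
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances)
        )
        return log_prior + log_likelihood

    def predict(self, X):
        return [
            max(self.params, key=lambda label: self._log_posterior(row, label))
            for row in X
        ]
```

Summing log-likelihoods instead of multiplying raw densities avoids numerical underflow. scikit-learn's `GaussianNB` implements the same model, with a variance floor of `var_smoothing` times the largest feature variance rather than a fixed constant.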
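The accuracy, precision, recall, and F1-score reported in the abstract are computed from test-set predictions. The abstract does not say whether the task is binary or multi-class, nor how per-class scores were averaged, so this sketch assumes macro averaging (an unweighted mean over classes). The function name `classification_scores` is hypothetical.

```python
def classification_scores(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1-score."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # a score with an empty denominator is reported as 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy labels: accuracy 5/6, macro precision 8/9, macro recall 8/9, macro F1 = 2.6/3
    print(classification_scores([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 2]))
```

Macro averaging gives every class equal weight, so a small disease class counts as much as a large healthy group, whereas accuracy is dominated by the majority class. scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average="macro")` returns the same three macro scores.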
References
Furizal, F., Ma’arif, A., & Rifaldi, D. (2023). Application of Machine Learning in Healthcare and Medicine: A Review. Journal of Robotics and Control (JRC), 4(5), 621–631. https://doi.org/10.18196/jrc.v4i5.19640
Ahmed, H., Yasin, S., Khan, M. A., & Tariq, U. (2023). Machine learning techniques for liver disease prediction: A systematic review. Healthcare, 11(4), 567–580.
Arslan, A. K., Colak, C., & Sarihan, M. E. (2022). Hepatitis C disease classification using machine learning algorithms. Biomedical Signal Processing and Control, 76, 103675.
Bagur, A., & Pratama, A. (2025). Performance evaluation of machine learning algorithms for healthcare classification.
Blach, S., Kondili, L. A., Aghemo, A., Crespo, J., Feeney, E., Papatheodoridis, G., Puoti, M., Ryder, S., Semela, D., & Razavi, H. (2023). Global change in hepatitis C virus prevalence and cascade of care between 2015 and 2020: A modelling study. Journal of Hepatology, 78(4), 733–745. https://doi.org/10.1016/S2468-1253(21)00472-6
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32. https://doi.org/10.1023/A:1010933404324
Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321–357. https://doi.org/10.1613/jair.953
Damayanti, A., & Testiana, G. (2023). Penerapan Data Mining untuk Prediksi Penyakit Hepatitis C Menggunakan Algoritma Naïve Bayes. Jurnal Manajamen Informatika Jayakarta, 3(2), 177–186. https://doi.org/10.52362/jmijayakarta.v3i2.1098
Fan, Y., Lu, X., & Sun, G. (2023). IHCP: Interpretable hepatitis C prediction system based on black-box machine learning models. BMC Bioinformatics, 24, 333. https://doi.org/10.1186/s12859-023-05456-0
Fernández, A., García, S., Herrera, F., & Chawla, N. V. (2018). SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. Journal of Artificial Intelligence Research, 61, 863–905. https://doi.org/10.1613/jair.1.11192
Ghosh, M., Raihan, M. M. S., Raihan, M., Akter, L., Bairagi, A. K., Alshamrani, S. S., & Masud, M. (2021). A Comparative Analysis of Machine Learning Algorithms to Predict Liver Disease. Intelligent Automation & Soft Computing, 30(3). https://doi.org/10.32604/iasc.2021.017989
Haixiang, G., Yijing, L., Shang, J., Mingyun, G., Yuanyue, H., & Bing, G. (2017). Learning from class-imbalanced data: Review of methods and applications. Expert Systems with Applications, 73, 220–239. https://doi.org/10.1016/j.eswa.2016.12.035
Han, J., Pei, J., & Kamber, M. (2022). Data mining: Concepts and techniques (4th ed.). Morgan Kaufmann.
Hendrayana, I. G., Dewi, N. P. D. A. S., Aryasa, J. A. D., Prayoga, I. M. A., & Raharjo, R. A. (2025). The implementation of the Random Forest Algorithm with Resampling and Without Resampling on the Hepatitis C Disease Dataset. Journal of Computer Networks, Architecture and High Performance Computing, 7(3), 614–628. https://doi.org/10.47709/cnahpc.v7i3.6089
Lilhore, U. K., Simaiya, S., Dalal, S., & Ahuja, N. J. (2023). Machine learning-based prediction of Hepatitis C disease using clinical data.
Lingala, S., & Ghany, M. G. (2015). Natural history of hepatitis C. Gastroenterology Clinics of North America, 44(4), 717. https://doi.org/10.1016/j.gtc.2021.12.002
Probst, P., Wright, M. N., & Boulesteix, A. L. (2019). Hyperparameters and tuning strategies for random forest. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 9(3), e1301. https://doi.org/10.1002/widm.1301
Safdari, R., Deghatipour, A., Gholamzadeh, M., & Maghooli, K. (2022). Applying data mining techniques to classify patients with suspected hepatitis C virus infection. Intelligent Medicine, 2(4), 193–198. https://doi.org/10.1016/j.imed.2021.12.003
Senbagamalar, K., & Logeswari, K. (2024). Comparative analysis of machine learning models for Hepatitis disease prediction.
Shameer, K., Johnson, K. W., Glicksberg, B. S., Dudley, J. T., & Sengupta, P. P. (2018). Machine learning in cardiovascular medicine: Are we there yet? Heart, 104(14), 1156–1164. https://doi.org/10.1136/heartjnl-2017-311198
World Health Organization. (2024). Global hepatitis report 2024: Action for access in low- and middle-income countries. WHO. https://www.who.int/publications/i/item/9789240091672
Zhang, H. (2004). The optimality of Naïve Bayes. Proceedings of the Seventeenth International Florida Artificial Intelligence Research Society Conference (FLAIRS 2004), 562–567. AAAI Press. https://aaai.org/papers/flairs-2004-097/
Zulfiqar, H., Sikandar, Z., Shafique, R., & Ahmad, S. (2024). An intelligent prediction system for Hepatitis C diagnosis using machine learning techniques.
Copyright (c) 2026 ARMADA : Jurnal Penelitian Multidisiplin

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.