Comparative Classification of Hepatitis C Disease Using Naïve Bayes and Random Forest with SMOTE-Based Class Balancing

Authors

  • Ida Oktavia Salsavana, Program Studi Magister Informatika, Universitas Islam Negeri Maulana Malik Ibrahim Malang
  • Mokhamad Amin Hariyadi, Program Studi Magister Informatika, Universitas Islam Negeri Maulana Malik Ibrahim Malang
  • Fresy Nugroho, Program Studi Magister Informatika, Universitas Islam Negeri Maulana Malik Ibrahim Malang

DOI:

https://doi.org/10.55681/armada.v4i6.2961

Keywords:

Hepatitis C, Class Imbalance, Naïve Bayes, Random Forest, SMOTE

Abstract

Hepatitis C is a liver disease caused by the hepatitis C virus (HCV) that can progress to serious complications such as cirrhosis, liver failure, and liver cancer. Accurate early detection is essential for improving treatment success and patients' quality of life. This study compares the performance of the Naïve Bayes and Random Forest algorithms in classifying Hepatitis C, using clinical data from the UCI Machine Learning Repository comprising 615 patient records. Preprocessing consisted of data cleaning, data transformation, and class balancing with the Synthetic Minority Over-sampling Technique (SMOTE). The dataset was split 80:20 into training and test data. The experiments show that Random Forest achieved an accuracy of 99.06%, a precision of 0.99, a recall of 0.98, and an F1-score of 0.99, far exceeding Naïve Bayes, which reached an accuracy of 84.06%. Feature importance analysis identified AST, ALT, GGT, age, and albumin as the most significant clinical predictors. The combination of Random Forest and SMOTE proved highly effective for Hepatitis C classification and has considerable potential to support accurate clinical decision-making for early detection of the disease.
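
The class-balancing step named in the abstract can be illustrated with a minimal sketch of the core SMOTE idea: each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its k nearest minority-class neighbours. The abstract does not state which implementation or parameters the study used, so the function name `smote`, the value of `k`, the seed, and the toy data below are all assumptions made for illustration; libraries such as imbalanced-learn provide a full implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from minority-class feature vectors.

    Hypothetical helper for illustration; not the study's code.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(base, minority[j]))
        neighbour = minority[rng.choice(others[:k])]
        # Interpolate at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy two-feature data, invented for this example (not from the UCI dataset).
majority = [[30.0, 25.0], [28.0, 22.0], [35.0, 30.0],
            [32.0, 27.0], [29.0, 24.0], [31.0, 26.0]]
minority = [[90.0, 110.0], [120.0, 95.0], [100.0, 130.0]]

# Generate enough synthetic minority samples to match the majority class size.
new_points = smote(minority, n_new=len(majority) - len(minority), k=2)
balanced_minority = minority + new_points
print(len(majority), len(balanced_minority))  # 6 6
```

Because every synthetic point is an interpolation between two real minority samples, it always lies within the range spanned by the original minority data. Oversampling is commonly applied to the training portion only, so that synthetic points do not end up in the test set; the abstract does not say in which order the study performed balancing and splitting.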

Published

2026-06-30

How to Cite

Salsavana, I. O., Hariyadi, M. A., & Nugroho, F. (2026). Comparative Classification of Hepatitis C Disease Using Naïve Bayes and Random Forest with SMOTE-Based Class Balancing. ARMADA : Jurnal Penelitian Multidisiplin, 4(6), 2319–2331. https://doi.org/10.55681/armada.v4i6.2961