Hadith Text Classification Based on Topic Using Convolutional Neural Network (CNN) and TF-IDF

Muhammad Rafi Athallah, Kemas Muslim Lhaksmana

Abstract


This study develops a hadith classification system that uses a Convolutional Neural Network (CNN) to categorize texts by topic. It compares two text representation techniques, Term Frequency-Inverse Document Frequency (TF-IDF) and Word2Vec, each with and without stemming. The dataset uses category IDs 0-5, and about 2,845 records were processed for the experiments. The data was split 80:20 into training and test sets. Three models were then evaluated: Word2Vec with stemming, TF-IDF-CNN without stemming, and TF-IDF-CNN with stemming. Performance was assessed with accuracy, precision, recall, and F1 score. The TF-IDF-CNN model without stemming performed best, reaching 85% accuracy in topic-based text classification, which the authors attribute to the model's stability and efficiency in processing data.
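The pipeline steps named in the abstract (TF-IDF weighting, an 80:20 split, and the four evaluation metrics) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names `tfidf`, `split_80_20`, and `macro_scores` are hypothetical, the unsmoothed idf = log(N / df) and macro averaging are assumptions (the abstract specifies neither), and the CNN classifier itself is omitted.

```python
import math
import random
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: tf is the count divided by document length and
    idf = log(N / df) without smoothing; libraries often differ.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def split_80_20(items, seed=0):
    """Shuffle a copy of items and split it 80:20 into train and test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * 4 // 5  # integer arithmetic avoids float rounding
    return shuffled[:cut], shuffled[cut:]


def macro_scores(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    n = len(labels)
    return accuracy, sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy usage with made-up tokens and labels (not data from the study).
vectors = tfidf([["prayer", "fasting"], ["prayer", "charity"]])
train, test = split_80_20(range(10))
scores = macro_scores([0, 0, 1, 1], [0, 1, 1, 1])
```

In this sketch a term that occurs in every document, such as "prayer" in the toy corpus, receives a weight of zero, which is the property that lets TF-IDF down-weight words shared across all topics. A classifier would consume these weights as input features in place of the Word2Vec embeddings.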

Keywords


Convolutional Neural Network (CNN), Hadith, TF-IDF, Word2Vec






DOI: https://doi.org/10.29103/jreece.v5i1.20354



Copyright (c) 2025 Muhammad Rafi Athallah, Kemas Muslim Lhaksmana

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
