Cervical Cancer Prediction Using SMOTE Algorithm and Machine Learning Approaches

Cervical cancer is one of the cancer types that can be treated most successfully when diagnosed early. This study aims to detect and classify the disease by applying data mining methods to a digitized dataset obtained from Pap smear test results. A two-stage architecture is proposed for the diagnosis of cervical cancer. In the first stage, records with missing data were removed from the dataset; in the second stage, a new dataset was obtained by using the Synthetic Minority Oversampling Technique (SMOTE) to balance the target classes. By applying majority voting (MV) to the dataset, its four target variables were reduced to a single target variable. Artificial Neural Network (ANN), Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and K-Nearest Neighbors (KNN) algorithms were applied to both datasets for the diagnosis of cervical cancer, and the results obtained on the original dataset were compared with those obtained on the dataset produced with SMOTE. ANN was the best method in terms of classification success and F-score, and the most successful result was obtained with the majority-voted target variable on the balanced dataset produced by SMOTE. The experimental results showed that using MV and SMOTE together increased classification success from 93% to 99%.
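The preprocessing described above (discarding records with missing values and collapsing the four target variables into one by majority voting) can be sketched in plain Python as follows. This is a hedged illustration, not the authors' code: the target column names `Hinselmann`, `Schiller`, `Citology`, and `Biopsy` are an assumption (they are the four targets of the UCI "Cervical cancer (Risk Factors)" dataset, which the abstract does not name), missing values are assumed to be stored as `None`, and because the abstract does not say how a two-to-two tie among four votes is resolved, the `tie_positive` flag is a hypothetical choice.

```python
# Hypothetical target column names (assumed to follow the UCI
# "Cervical cancer (Risk Factors)" dataset; not confirmed by the abstract).
TARGETS = ("Hinselmann", "Schiller", "Citology", "Biopsy")


def drop_missing(rows):
    """Keep only records in which no field is missing (encoded here as None)."""
    return [row for row in rows if all(value is not None for value in row.values())]


def majority_vote(row, targets=TARGETS, tie_positive=True):
    """Collapse several binary target columns into a single 0/1 label.

    With four voters a two-to-two tie is possible; labelling a tie as
    positive (tie_positive=True) is an assumption, not the paper's rule.
    """
    positives = sum(int(row[t]) for t in targets)
    negatives = len(targets) - positives
    if positives == negatives:
        return 1 if tie_positive else 0
    return 1 if positives > negatives else 0


records = [
    {"Age": 30, "Hinselmann": 0, "Schiller": 1, "Citology": 0, "Biopsy": 0},
    {"Age": None, "Hinselmann": 1, "Schiller": 1, "Citology": 1, "Biopsy": 1},
    {"Age": 45, "Hinselmann": 1, "Schiller": 1, "Citology": 0, "Biopsy": 1},
]
clean = drop_missing(records)               # the record with Age=None is dropped
labels = [majority_vote(r) for r in clean]  # one positive vote of four -> 0; three of four -> 1
print(len(clean), labels)                   # 2 [0, 1]
```

Treating a tie as positive favors sensitivity in a screening setting; requiring at least three positive votes (`tie_positive=False`) is the stricter alternative.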
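SMOTE, introduced by Chawla et al. (2002), balances the classes by creating synthetic minority-class samples on the line segment between a minority sample and one of its k nearest minority-class neighbours. The pure-Python sketch below illustrates the technique under stated assumptions and is not the authors' implementation: features are assumed to be numeric lists, `k=5` follows the usual default, and base samples are drawn at random rather than visited in turn as in the original paper. The function name `smote` is hypothetical; in practice a library implementation such as the `SMOTE` class of imbalanced-learn would normally be used.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=None):
    """Return n_synthetic synthetic minority samples via SMOTE-style interpolation.

    Each new point is x + u * (nn - x), where x is a randomly chosen minority
    sample, nn is one of its k nearest minority neighbours (Euclidean), and
    u is drawn uniformly from [0, 1).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        x = minority[i]
        # k nearest neighbours of x among the other minority samples
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        u = rng.random()
        synthetic.append([xi + u * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


# Toy example: 4 minority samples vs. 10 majority samples -> add 6 synthetic points.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
n_majority = 10
new_points = smote(minority, n_synthetic=n_majority - len(minority), k=3, seed=42)
print(len(minority) + len(new_points))  # 10, now equal to the majority class
```

Because every synthetic point is an interpolation between two real records, oversampling is commonly applied to the training split only when the balanced data are used to train and evaluate a classifier.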
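The comparison above is reported in terms of classification success and F-score. Assuming that "classification success" means accuracy and that the F-score is the F1 score of the positive (cancer) class, both can be computed from confusion-matrix counts as in this minimal sketch; the function name `accuracy_and_f1` is hypothetical. The toy case shows why the F-score is reported alongside accuracy on imbalanced data: a model that only ever predicts the majority class can still reach high accuracy.

```python
def accuracy_and_f1(y_true, y_pred, positive=1):
    """Return (accuracy, F1 of the positive class) for two equal-length label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Imbalanced toy case: 9 negatives, 1 positive, and a model that always predicts "negative".
y_true = [0] * 9 + [1]
y_pred = [0] * 10
print(accuracy_and_f1(y_true, y_pred))  # (0.9, 0.0): high accuracy, but the positive case is missed
```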

Iğdır Üniversitesi Fen Bilimleri Enstitüsü Dergisi
  • ISSN: 2146-0574
  • Issues per year: 4
  • First published: 2011