Document Type: Original Article


1 Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran

2 Department of Health Information Technology, School of Paramedical, Ilam University of Medical Sciences, Ilam, Iran

3 Health Management Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran

4 Department of Health Information Management and Technology, Abadan University of Medical Sciences, Abadan, Iran


BACKGROUND: Breast cancer (BC) is the most common cause of cancer‑related deaths in women globally. Currently, many machine learning (ML)‑based predictive models have been developed to assist clinicians in decision-making for the prediction of BC. However, preventing the development of risk factors through healthy lifestyle behaviors, or preventing the disease at early stages, can substantially improve BC health across the population. Thus, we aimed to develop a prediction model that uses a genetic algorithm (GA) combined with several ML algorithms for the prediction and early warning of BC.
MATERIAL AND METHODS: Data from 3168 healthy individuals and 1742 patient case records in the BC Registry Database of Ayatollah Taleghani Hospital, Abadan, Iran, were analyzed. First, a modified hybrid GA was used to perform feature selection and to optimize the selected features. Then, using the selected features, several ML algorithms were trained to predict BC. Afterward, the performance of each model was measured in terms of accuracy, precision, sensitivity, specificity, and receiver operating characteristic (ROC) curve metrics. Finally, a clinical decision support system based on the best-performing model was developed.
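The GA step described above treats feature selection as a search over binary masks, one bit per candidate feature. The sketch below shows a plain generational GA of that kind (tournament selection, single-point crossover, bit-flip mutation, elitism). It is not the authors' modified hybrid GA, whose details are not given here. The names `ga_feature_selection`, `toy_fitness`, and `INFORMATIVE`, as well as all parameter values, are hypothetical. In a wrapper setting, `fitness` would typically return the validation accuracy of a classifier (for example, a decision tree) trained only on the columns the mask keeps. Here a toy fitness stands in so the example runs on its own.

```python
import random
from typing import Callable, List, Tuple

Mask = List[int]  # chromosome: 1 = feature kept, 0 = feature dropped


def ga_feature_selection(
    n_features: int,
    fitness: Callable[[Mask], float],
    pop_size: int = 40,
    generations: int = 60,
    crossover_rate: float = 0.9,
    mutation_rate: float = 0.05,
    seed: int = 0,
) -> Tuple[Mask, float]:
    """Plain generational GA over binary feature masks; returns (best mask, its fitness).

    Assumes n_features >= 2 and pop_size >= 2.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best_mask: Mask = list(population[0])
    best_score = float("-inf")

    def tournament(scored: List[Tuple[float, Mask]]) -> Mask:
        # Size-2 tournament: the fitter of two randomly drawn individuals wins.
        a, b = rng.sample(scored, 2)
        return a[1] if a[0] >= b[0] else b[1]

    for _ in range(generations):
        scored = [(fitness(ind), ind) for ind in population]
        for score, ind in scored:
            if score > best_score:
                best_score, best_mask = score, list(ind)
        next_pop: List[Mask] = [list(best_mask)]  # elitism: keep the best mask seen so far
        while len(next_pop) < pop_size:
            p1, p2 = tournament(scored), tournament(scored)
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # single-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = list(p1)
            # Bit-flip mutation: each gene flips independently with prob. mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        population = next_pop

    # Score the final generation as well before returning.
    for ind in population:
        score = fitness(ind)
        if score > best_score:
            best_score, best_mask = score, list(ind)
    return best_mask, best_score


# Hypothetical toy fitness: pretend features 0, 2 and 5 are the informative ones.
INFORMATIVE = {0, 2, 5}


def toy_fitness(mask: Mask) -> float:
    hits = sum(1 for i, g in enumerate(mask) if g and i in INFORMATIVE)
    extras = sum(mask) - hits
    return hits - 0.1 * extras  # reward informative features, penalise extra ones


if __name__ == "__main__":
    mask, score = ga_feature_selection(10, toy_fitness)
    print("selected feature indices:", [i for i, g in enumerate(mask) if g], "fitness:", score)
```

The small penalty on extra features nudges the search toward compact subsets. A real wrapper fitness would use a similar trade-off between predictive performance and the number of features kept.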
RESULTS: After feature selection, age, consumption of dairy products, BC family history, breast biopsy, chest X‑ray, hormone therapy, alcohol consumption, being overweight, having children, and education status were selected as the most important features for predicting BC. The experimental results showed that the decision tree outperformed the other ML models, with an accuracy of 99.3%, a specificity of 99.5%, and a sensitivity of 98.26%.
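The accuracy, specificity, and sensitivity reported above, together with the precision and ROC metrics named in the methods, follow the standard confusion-matrix definitions, with BC as the positive class. The sketch below computes them from counts and estimates the area under the ROC curve by pairwise ranking. The function names and all numbers are hypothetical and chosen only for illustration. They are not the study's data.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class ConfusionCounts:
    tp: int  # BC cases predicted as BC
    tn: int  # healthy individuals predicted as healthy
    fp: int  # healthy individuals predicted as BC
    fn: int  # BC cases predicted as healthy


def classification_metrics(c: ConfusionCounts) -> Dict[str, float]:
    total = c.tp + c.tn + c.fp + c.fn
    return {
        "accuracy": (c.tp + c.tn) / total,
        "precision": c.tp / (c.tp + c.fp),
        "sensitivity": c.tp / (c.tp + c.fn),  # true positive rate (recall)
        "specificity": c.tn / (c.tn + c.fp),  # true negative rate
    }


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise ranking (Mann-Whitney); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up counts for illustration only; these are NOT the study's results.
    print(classification_metrics(ConfusionCounts(tp=45, tn=90, fp=10, fn=5)))
    print(roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

With these made-up counts, accuracy, sensitivity, and specificity each come out to 0.9, and precision comes out to about 0.818. The toy scores give an AUC of 0.75.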
CONCLUSION: The developed predictive system can accurately identify persons at elevated risk for BC. It can be used as an essential clinical screening tool for the early prevention of BC and can support the development of preventive health strategies.


  1. World Health Organization. Cancer. 2018 Available fr
    2. Dhahri H, Al Maghayreh E, Mahmood A, Elkilani W,
    Faisal Nagi M. Automated breast cancer diagnosis based on
    machine learning algorithms. J Healthc Eng 2019;2019. doi:
    3. Namini S, Elahi SA, Seirafi MR, Sabet M, Azadeh P. Predicting
    post‑traumatic growth inventory (PTGI) based on the perceived
    social support; the mediating role of resilience in women with
    breast cancer: A structural equation modeling approach. Iran J
    Health Educ Health Promot 2021;9:172‑86.
    4. Salod Z, Singh Y. Comparison of the performance of machine
    learning algorithms in breast cancer screening and detection:
    A protocol. J Public Health Res 2019;8. doi: 10.4081/jphr.
    5. Key TJ, Verkasalo PK, Banks E. Epidemiology of breast cancer.
    Lancet Oncol 2001;2:133‑40.
    6. Cheraghi Z, Poorolajal J, Hashem T, Esmailnasab N, Irani AD.
    Effect of body mass index on breast cancer during premenopausal
    and postmenopausal periods: A meta‑analysis. PLoS One
    2012;7:e51446. doi: 10.1371/journal.pone. 0051446.
    7. Colditz GA, Willett WC, Hunter DJ, Stampfer MJ, Manson JE,
    Hennekens CH, et al. Family history, age, and risk of breast
    cancer: Prospective data from the Nurses’ Health Study. JAMA
    8. Farvid MS, Eliassen AH, Cho E, Liao X, Chen WY, Willett WC.
    Dietary fiber intake in young adults and breast cancer risk.
    Pediatrics 2016;137:e20151226.
    9. Kotepui M. Diet and risk of breast cancer. Contemp Oncol
    10. Wolf I, Sadetzki S, Catane R, Karasik A, Kaufman B. Diabetes
    mellitus and breast cancer. Lancet Oncol 2005;6:103‑11.
    11. Yancik R, Wesley MN, Ries LA, Havlik RJ, Edwards BK, Yates JW.
    Effect of age and comorbidity in postmenopausal breast cancer
    patients aged 55 years and older. JAMA 2001;285:885‑92.
    12. Park Y‑MM, O’Brien KM, Zhao S, Weinberg CR, Baird DD,
    Sandler DP. Gestational diabetes mellitus may be associated with
    increased risk of breast cancer. Br J Cancer 2017;116:960‑3.
    13. Tehard B, Clavel‑Chapelon F. Several anthropometric
    measurements and breast cancer risk: Results of the E3N cohort
    study. Int J Obes 2006;30:156‑63.
    14. Tian Y‑F, Chu C‑H, Wu M‑H, Chang C‑L, Yang T, Chou Y‑C,
    et al. Anthropometric measures, plasma adiponectin, and breast
    cancer risk. Endocr Related Cancer 2007;14:669‑77.
    15. Barlow WE, White E, Ballard‑Barbash R, Vacek PM,
    Titus‑Ernstoff L, Carney PA, et al. Prospective breast cancer
    risk prediction model for women undergoing screening
    mammography. J Natl Cancer Inst 2006;98:1204‑14.
    16. Concato J, Feinstein AR, Holford TR. The risk of determining risk
    with multivariable models. Ann Intern Med 1993;118:201‑10.
    17. Chaurasia V, Pal S. Data mining techniques: To predict and resolve
    breast cancer survivability. International Journal of Computer
    Science and Mobile Computing IJCSMC 2014;3:10-22.
    18. LokeshkumarR, Mishra OA, Kalra S. Social media data analysis to
    predict mental state of users using machine learning techniques.
    J Educ Health Promot 2021;10:301.
    19. Amirhajlou L, Sohrabi Z, Alebouyeh MR, Tavakoli N,
    Haghighi RZ, Hashemi A, et al. Application of data mining
    techniques for predicting residents’ performance on pre‑board
    examinations: A case study. J Educ Health Promot 2019;8.
    20. Boeri C, Chiappa C, Galli F, De Berardinis V, Bardelli L, Carcano G,
    et al. Machine Learning techniques in breast cancer prognosis
    prediction: A primary evaluation. Cancer Med 2020;9:3234‑43.
    21. MarianiMC, TweneboahOK, BhuiyanMAM. Supervised machine
    learning models applied to disease diagnosis and prognosis. AIMS
    Public Health 2019;6:405.
    22. Valvano G, Santini G, Martini N, Ripoli A, Iacconi C, Chiappino D,
    et al. Convolutional neural networks for the segmentation of
    microcalcification in mammography imaging. J Healthc Eng
    2019;2019:9360941. doi: 10.1155/2019/9360941.
    23. Sarvestani AS, Safavi A, Parandeh N, Salehi M. Predicting
    breast cancer survivability using data mining techniques.
    2010 2nd International Conference on Software Technology and
    Engineering. IEEE, 2010. p. V2‑227‑V2‑231.
    24. Chaurasia V, Pal S, Tiwari B. Prediction of benign and malignant
    breast cancer using data mining techniques. J Algorithm Comput
    Technol 2018;12:119‑26.
    25. Akay MF. Support vector machines combined with feature
    selection for breast cancer diagnosis. Expert Syst Appl
    26. Cruz JA, Wishart DS. Applications of machine learning
    in cancer prediction and prognosis. Cancer Inform
    2006;2:117693510600200030. doi: 10.1177/117693510600200030.
    27. Liu H, Yu L. Toward integrating feature selection algorithms
    for classification and clustering. IEEE Trans knowl Data Eng
    28. Medjahed SA, Saadi TA, Benyettou A. Breast cancer diagnosis
    by using k‑nearest neighbor with different distances and
    classification rules. Int J Comput Appl 2013;62.
    29. Odajima K, Pawlovsky AP. A detailed description of the use of
    the kNN method for breast cancer diagnosis. 2014 7th International
    Conference on Biomedical Engineering and Informatics. IEEE;
    2014. p. 688‑692.
    30. Ting F, Sim K. Self‑regulated multilayer perceptron neural
    network for breast cancer classification. 2017 International
    Conference on Robotics, Automation and Sciences (ICORAS).
    IEEE; 2017. p. 1‑5.
    31. Jouni H, Issa M, Harb A, Jacquemod G, Leduc Y. Neural Network
    architecture for breast cancer detection and classification. 2016
    IEEE International Multidisciplinary Conference on Engineering
    Technology (IMCET). IEEE; 2016. p. 37‑41.
    32. Afrash MR, Khalili M, Salekde MS. A comparison of data mining
    methods for diagnosis and prognosis of heart disease. Int J Adv
    Intell Paradig 2020;16:88‑97.
    33. Sumbaly R, Vishnusri N, Jeyalatha S. Diagnosis of breast cancer
    using decision tree data mining technique. Int J Comput Appl
    34. Naghibi S, Teshnehlab M, Shoorehdeli MA. Breast cancer
    classification based on advanced multi dimensional fuzzy neural
    network. J Med Syst 2012;36:2713‑20.
    35. Azar AT, El‑Said SA. Probabilistic neural network for breast
    cancer classification. Neural Comput Appl 2013;23:1737‑51.
    36. Engelbrecht AP. Computational Intelligence: An Introduction.
    Hoboken, New Jersey: John Wiley & Sons; 2007.
  2. 37. Umbarkar DA, Sheth P. Crossover operators in genetic algorithms:
    A review. ICTACT J Soft Comput 20156;6. doi: 10.21917/ijsc.
    38. Lloyd‑Jones DM, Hong Y, Labarthe D, Mozaffarian D, Appel LJ,
    Van Horn L, et al. Defining and setting national goals for
    cardiovascular health promotion and disease reduction: The
    American Heart Association’s strategic impact goal through 2020
    and beyond. Circulation 2010;121:586‑613.
    39. Williams K, Idowu PA, Balogun JA, Oluwaranti AI. Breast cancer
    risk prediction using data mining classification techniques. Tran
    Networks Commun 2015;3:1.
    40. Higa A. Diagnosis of breast cancer using decision tree and artificial
    neural network algorithms. Cell 2018;1 (7):23‑27.
    41. Jebarani PE, Umadevi N, Dang H, Pomplun M. A novel hybrid
    K‑means and GMM machine learning model for breast cancer
    detection. IEEE Access 2021;9:146153‑62.
    42. Solanki YS, Chakrabarti P, Jasinski M, Leonowicz Z, Bolshev V,
    Vinogradov A, et al. A hybrid supervised machine learning
    classifier system for breast cancer prognosis using feature
    selection and data imbalance handling approaches. Electronics
    43. Antonie ML, Zaiane OR, Coman A. Application of data mining
    techniques for medical image classification. In Proceedings of the
    Second International Conference on Multimedia Data Mining
    2001. p. 94-101.
    44. Sinthia P, Malathi M. An effective two way classification of
    breast cancer images: A detailed review. Asian Pac J Cancer Prev
    45. Muthuselvan S, Sundaram KS. Prediction of breast cancer using
    classification rule mining techniques in blood test datasets. 2016
    International Conference on Information Communication and
    Embedded Systems (ICICES). IEEE; 2016.