Document Type : Original Article
Authors
- Mohammad R. Afrash 1
- Azadeh Bayani 1
- Mostafa Shanbehzadeh 2
- Mohammadkarim Bahadori 3
- Hadi Kazemi‑Arpanahi 4
1 Department of Health Information Technology and Management, School of Allied Medical Sciences, Shahid Beheshti University of Medical Sciences, Tehran, Iran
2 Department of Health Information Technology, School of Paramedical, Ilam University of Medical Sciences, Ilam, Iran
3 Health Management Research Center, Baqiyatallah University of Medical Sciences, Tehran, Iran
4 Department of Health Information Management and Technology, Abadan University of Medical Sciences, Abadan, Iran
Abstract
BACKGROUND: Breast cancer (BC) is the most common cause of cancer‑related death in women
worldwide. Many machine learning (ML)‑based predictive models have been developed to support
clinicians' decision making in predicting BC. However, preventing the development of risk factors
through healthy lifestyle behaviors, or preventing the disease at an early stage, can contribute
substantially to optimal population‑wide BC health. We therefore aimed to develop a prediction model
that uses a genetic algorithm (GA) combined with several ML algorithms for the prediction and early
warning of BC.
MATERIALS AND METHODS: Records of 3168 healthy individuals and 1742 patients from the BC
Registry Database of Ayatollah Taleghani Hospital, Abadan, Iran, were analyzed. First, a
modified hybrid GA was used to select features and to optimize the selected feature set.
Then, several ML algorithms were trained on the selected features to predict BC. Afterward,
the performance of each model was measured in terms of accuracy, precision, sensitivity, specificity,
and receiver operating characteristic (ROC) curve metrics. Finally, a clinical decision support system
based on the best model was developed.
RESULTS: Feature selection identified age, consumption of dairy products, BC family history,
breast biopsy, chest X‑ray, hormone therapy, alcohol consumption, being overweight, having children,
and education status as the most important features for the prediction of BC. The
experimental results showed that the decision tree outperformed the other ML
models, with values of 99.3%, 99.5%, and 98.26% for accuracy, specificity, and sensitivity, respectively.
CONCLUSION: The developed predictive system can accurately identify persons who are at elevated
risk for BC. It can be used as a clinical screening tool for the early prevention of BC and
can support the development of preventive health strategies.
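The pipeline outlined in the abstract (GA-based feature selection, then model training, then evaluation by accuracy, sensitivity, specificity, and precision) can be illustrated with a minimal sketch. It is not the authors' modified hybrid GA. The synthetic data, the nearest-centroid classifier standing in for the ML models, the GA operators (truncation selection, single-point crossover, bit-flip mutation), and all names and parameter values below are assumptions chosen for illustration.

```python
import random

def make_data(n, n_features, informative, rng):
    """Hypothetical synthetic records: informative features match the label 90% of the time."""
    rows, labels = [], []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.randint(0, 1) for _ in range(n_features)]
        for j in informative:
            x[j] = y if rng.random() < 0.9 else 1 - y
        rows.append(x)
        labels.append(y)
    return rows, labels

def centroid_predict(train_x, train_y, test_x, mask):
    """Nearest-centroid classifier using only the features switched on in mask."""
    cols = [j for j, keep in enumerate(mask) if keep]
    cents = {}
    for c in (0, 1):
        members = [x for x, y in zip(train_x, train_y) if y == c]
        cents[c] = [sum(x[j] for x in members) / len(members) for j in cols]
    preds = []
    for x in test_x:
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(cols, cents[c])) for c in (0, 1)}
        preds.append(0 if dist[0] <= dist[1] else 1)
    return preds

def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and precision from confusion-matrix counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
    }

def ga_select(train, valid, n_features, rng,
              pop_size=20, generations=15, p_mut=0.1, penalty=0.01):
    """Search for a binary feature mask; fitness is validation accuracy minus a size penalty."""
    (tx, ty), (vx, vy) = train, valid

    def fitness(mask):
        acc = confusion_metrics(vy, centroid_predict(tx, ty, vx, mask))["accuracy"]
        return acc - penalty * sum(mask)

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]             # truncation selection: keep the fitter half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)   # single-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=fitness)

rng = random.Random(0)
X, y = make_data(400, 8, informative=(0, 3, 5), rng=rng)
train, valid, test = (X[:200], y[:200]), (X[200:300], y[200:300]), (X[300:], y[300:])
best_mask = ga_select(train, valid, 8, rng)
metrics = confusion_metrics(test[1], centroid_predict(train[0], train[1], test[0], best_mask))
print(best_mask, metrics)
```

The mask is fitted on the training and validation splits and scored on a separate test split, so the reported metrics are not computed on the data used to choose the features. In a real study, the stand-in classifier would be replaced by the models under comparison, such as a decision tree.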
Keywords