Document Type: Original Article


1 Department of Anesthesiology and Pain Medicine, Iran University of Medical Sciences, Tehran, Iran

2 Department of Emergency Medicine, Iran University of Medical Sciences, Tehran, Iran

3 Deputy of Specialty and Subspecialty Education and

4 Department of Medical Ethics, Iran University of Medical Sciences, Tehran, Iran

5 Health Laboratories Administration, Birjand University of Medical Sciences, Birjand, Iran


CONTEXT: Predicting residents’ academic performance is critical for medical educational institutions
to plan strategies for improving their achievement.
AIMS: This study aimed to predict the performance of residents on preboard examinations based
on the results of in‑training examinations (ITE) using various educational data mining (DM) techniques.
SETTINGS AND DESIGN: This research was a descriptive cross‑sectional pilot study conducted at
Iran University of Medical Sciences, Iran.
PARTICIPANTS AND METHODS: A sample of 841 residents in six specialties participating in the
ITEs between 2004 and 2014 was selected through convenience sampling. Data were collected
from the residency training database using a researcher‑made checklist.
STATISTICAL ANALYSIS: Analysis of variance was performed to compare mean scores
between specialties, and multiple regression was conducted to examine the relationship between
the independent variables (ITE scores in postgraduate 1st year [PGY1] to postgraduate 3rd year [PGY3],
sex, and type of specialty training) and the dependent variable (scores on the postgraduate 4th year
examination, called the preboard). Next, three DM algorithms, namely multi‑layer perceptron artificial neural
network (MLP‑ANN), support vector machine, and linear regression, were used to build
prediction models of preboard examination scores. The performance of the models was assessed
using the root mean square error (RMSE) and mean absolute error (MAE). In the final step,
the MLP‑ANN was employed to find the association rules. Data analysis was performed in SPSS
22 and RapidMiner 7.1.001.
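The two error measures and the ten‑fold cross‑validation named in this abstract can be illustrated with a minimal Python sketch. This is not the authors' workflow, which ran in SPSS and RapidMiner. The single‑predictor least‑squares model, the function names, and the synthetic scores are all assumptions made for illustration only, and no study data are reproduced.

```python
import math
import random

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared residual."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (slope, intercept)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def cross_validate(xs, ys, k=10, seed=0):
    """Fit on k-1 folds, score on the held-out fold, and average RMSE and MAE."""
    rmses, maes = [], []
    for fold in k_fold_indices(len(xs), k, seed):
        held = set(fold)
        train = [i for i in range(len(xs)) if i not in held]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        preds = [slope * xs[i] + intercept for i in fold]
        truth = [ys[i] for i in fold]
        rmses.append(rmse(truth, preds))
        maes.append(mae(truth, preds))
    return sum(rmses) / k, sum(maes) / k

# Synthetic, made-up scores (hypothetical, NOT the study's data):
# one earlier-year score predicting a later examination score.
rng = random.Random(42)
pgy3 = [rng.uniform(0, 1) for _ in range(200)]
preboard = [0.5 * x + 0.3 + rng.gauss(0, 0.1) for x in pgy3]

cv_rmse, cv_mae = cross_validate(pgy3, preboard)
print(f"10-fold RMSE={cv_rmse:.3f}, MAE={cv_mae:.3f}")
```

RMSE is never smaller than MAE on the same residuals, because squaring weights large errors more heavily. A wide gap between the two therefore suggests that a few residents are predicted poorly.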
RESULTS: ITE scores in PGY2 and PGY3 and the type of specialty training were the
predictors of scores on the preboard examination (R² = 0.129, P < 0.01). The algorithm with the
lowest overall error was MLP‑ANN under ten‑fold
cross‑validation (RMSE = 0.325, MAE = 0.212). Finally, MLP‑ANN was utilized to find the efficient
CONCLUSIONS: According to the results of the study, MLP‑ANN was found to be useful in
evaluating resident performance on the ITEs. It is suggested that medical education databases
be enhanced to benefit from the potential of the DM approach in identifying residents at risk,
allowing instructors to offer constructive advice in a timely manner.

