A Comparative Analysis of Machine Learning Models and Traditional Statistical Models for Continuous-Time Survival Analysis

Contributors: Mulaudzi, T. B.; Bere, A.
Author: Tshisikule, Ompha
Date available: 2026-06-17
Date issued: 2026-05-19
Citation: Tshisikule, O. 2026. A Comparative Analysis of Machine Learning Models and Traditional Statistical Models for Continuous-Time Survival Analysis.
URI: https://univendspace.univen.ac.za/handle/11602/3189
Degree: M.Sc. in e-Science
Department: Department of Mathematical and Computational Sciences

Survival analysis is a statistical technique used to model time-to-event data, commonly applied in fields such as healthcare, engineering, and finance. Traditional approaches, including the Cox Proportional Hazards (CoxPH) model, have long been dominant because of their interpretability and theoretical foundation. However, recent advances in machine learning have shown promise in handling complex, high-dimensional datasets with nonlinear relationships. Despite this, there is still a lack of systematic studies comparing traditional survival models with modern approaches such as regularized regression, ensemble methods, and deep learning architectures, particularly across diverse datasets with varying characteristics. This study conducts a comparative analysis of traditional, machine learning, and deep learning-based survival models, evaluating their predictive performance and computational efficiency on continuous-time survival data. The models considered are LASSO-regularized Cox regression, CoxPH, Random Survival Forest (RSF), and Long Short-Term Memory (LSTM) networks. Model performance was assessed using the concordance index (C-index), the integrated Brier score (IBS), and the time-dependent area under the curve (AUC) on three secondary datasets with different characteristics: a breast cancer dataset from the SEER Program of the National Cancer Institute (November 2017 update), the North Carolina Recidivism dataset (ICPSR 8987) from ICPSR, and a heart failure clinical records dataset from Kaggle.
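The C-index used above to rank the models can be computed from follow-up times, event indicators, and predicted risk scores alone. The following is a minimal pure-Python sketch of Harrell's concordance index, assuming that a higher score means higher predicted risk; the function name `harrell_c_index` is hypothetical, and this is not necessarily how the dissertation computed the metric (tested library implementations exist, for example `sksurv.metrics.concordance_index_censored` in scikit-survival). IBS and time-dependent AUC additionally require inverse-probability-of-censoring weights and are not shown.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index (C-index) for right-censored data.

    times       -- observed follow-up time for each subject
    events      -- 1 if the event was observed, 0 if the subject was censored
    risk_scores -- predicted risk; a higher score means an earlier expected event
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # Only an observed event can anchor a comparable pair: subject i is known
        # to have failed before subject j, who was still event-free at times[i].
        if not events[i]:
            continue
        for j in range(n):
            # Simplification: pairs with identical observed times are skipped.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier failure was given higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs: need an event before another observed time")
    return concordant / comparable


if __name__ == "__main__":
    times = [1, 2, 3, 4]
    events = [1, 0, 1, 0]  # subjects 2 and 4 are censored
    risk = [0.9, 0.8, 0.2, 0.5]
    print(harrell_c_index(times, events, risk))  # 0.75
```

A value of 0.5 corresponds to a random ordering of risk, and 1.0 to a perfect one; values below 0.5 mean the predicted risks put comparable pairs in the wrong order more often than in the right one.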
A rigorous statistical framework was employed: 100 iterations of stratified train-test splits were used to generate robust performance distributions for each model. Distributional assumptions were tested with the Shapiro-Wilk (normality) and Levene’s (equality of variances) tests to select the appropriate procedures, followed by omnibus tests (ANOVA, Welch’s ANOVA, or Kruskal-Wallis) and post-hoc pairwise comparisons with Bonferroni correction to control the family-wise error rate. The analysis revealed that traditional survival models consistently outperformed deep learning-based approaches across all datasets. Random Survival Forest achieved the highest predictive accuracy, followed closely by CoxPH, with C-index values between 0.66 and 0.73 and lower IBS values, indicating better calibration. In contrast, the LSTM models performed poorly, often no better than random ranking (C-index 0.30–0.42, where 0.5 corresponds to random ordering), despite extensive optimization efforts including hyperparameter tuning, class balancing, and architectural modifications. Statistical testing confirmed that performance differences were highly significant across models and datasets (all p < 0.001), and post-hoc analyses showed that RSF and CoxPH consistently outperformed LSTM on both discrimination and calibration metrics. These results suggest that traditional survival models remain the most reliable choice for moderate-sized datasets with censored observations and weak predictive signals, while LSTM networks are limited by dataset size, heavy censoring, and an architectural mismatch with static survival data.

Extent: 1 online resource (xvii, 162 leaves): color illustrations
Language: English
Publisher: University of Venda
Subjects: Survival Analysis; Machine Learning; UCTD; Deep Learning; Censored Data; Predictive Performance
Type: Dissertation
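The assumption-driven choice of omnibus test described above can be sketched as follows, assuming NumPy and SciPy are available and that each model's metric (for example the C-index) has already been collected over the 100 train-test splits. The names `compare_models` and `welch_anova` are hypothetical; Welch's ANOVA is written out here from its standard formula rather than taken from a library. The abstract does not name the pairwise post-hoc test, so using t-tests when the data look normal and Mann-Whitney U tests otherwise is an assumption, as is the significance level of 0.05.

```python
import itertools

import numpy as np
from scipy import stats


def welch_anova(*groups):
    """Welch's one-way ANOVA, which does not assume equal group variances."""
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    means = np.array([np.mean(g) for g in groups])
    variances = np.array([np.var(g, ddof=1) for g in groups])
    w = n / variances  # precision weights
    grand_mean = np.sum(w * means) / np.sum(w)
    between = np.sum(w * (means - grand_mean) ** 2) / (k - 1)
    tmp = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * tmp)
    df2 = (k ** 2 - 1) / (3 * tmp)
    return f_stat, stats.f.sf(f_stat, k - 1, df2)


def compare_models(scores, alpha=0.05):
    """scores maps model name -> metric values from repeated train-test splits."""
    names = list(scores)
    groups = [np.asarray(scores[name], dtype=float) for name in names]

    # 1. Assumption checks: Shapiro-Wilk per model, Levene across models.
    normal = all(stats.shapiro(g).pvalue > alpha for g in groups)
    equal_var = stats.levene(*groups).pvalue > alpha

    # 2. Omnibus test selected from the assumption checks.
    if normal and equal_var:
        omnibus, p_omnibus = "ANOVA", stats.f_oneway(*groups).pvalue
    elif normal:
        omnibus, p_omnibus = "Welch ANOVA", welch_anova(*groups)[1]
    else:
        omnibus, p_omnibus = "Kruskal-Wallis", stats.kruskal(*groups).pvalue

    # 3. Pairwise post-hoc tests, Bonferroni-adjusted (p * m, capped at 1).
    pairs = list(itertools.combinations(range(len(names)), 2))
    posthoc = {}
    for i, j in pairs:
        if normal:
            p = stats.ttest_ind(groups[i], groups[j], equal_var=equal_var).pvalue
        else:
            p = stats.mannwhitneyu(groups[i], groups[j], alternative="two-sided").pvalue
        posthoc[(names[i], names[j])] = min(1.0, p * len(pairs))
    return omnibus, p_omnibus, posthoc


if __name__ == "__main__":
    # Synthetic C-index values for illustration only, not the dissertation's results.
    rng = np.random.default_rng(0)
    scores = {
        "CoxPH": rng.normal(0.70, 0.02, 100),
        "RSF": rng.normal(0.72, 0.02, 100),
        "LSTM": rng.normal(0.40, 0.05, 100),
    }
    omnibus, p_omnibus, posthoc = compare_models(scores)
    print(omnibus, p_omnibus)
    for pair, p_adj in posthoc.items():
        print(pair, p_adj)
```

Because every model is scored on the same splits, a paired design (for example `scipy.stats.friedmanchisquare` followed by Wilcoxon signed-rank tests) is a possible alternative; the sketch keeps the independent-groups tests named in the abstract.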