Deep Learning vs. Tree Ensembles for Life Expectancy Prediction: Evidence from Ukrainian Life Calculator
DOI: https://doi.org/10.32628/IJSRST2613333

Keywords: Life Expectancy Prediction, Deep Learning, MLP (Multilayer Perceptron), LSTM (Long Short-Term Memory), Random Forest, XGBoost

Abstract
Accurate life expectancy prediction supports effective healthcare planning, resource allocation, and public policy. This paper compares four predictive models for the Ukrainian humanitarian project Life Calculator: two standard tree-based ensemble models, Random Forest (LCR) and XGBoost (LCX), and two deep learning models, a Multilayer Perceptron (LCM) and a Long Short-Term Memory network (LCL). All models were trained on data from 260,000 Ukrainian participants through a single pipeline and evaluated on common regression metrics (RMSE, MAE, MSE, R², MAPE) and the C-Index. The findings indicate that the deep learning models (LCM, LCL) consistently reduce error relative to LCR and LCX on most metrics, with the MLP achieving the highest concordance score. Despite this progress, negative R² values across all models show that noisy and incomplete health data remain a challenge. Deep learning models capture finer-grained nonlinear associations among demographic and health variables; however, additional feature enrichment and more advanced modeling are needed for clinically viable life expectancy predictions.
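For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how they can be computed for a regression-style life expectancy model. The function names and the toy data are illustrative assumptions, not the paper's actual pipeline; the C-Index here is the simple uncensored form of Harrell's concordance index (fraction of comparable pairs ranked in the same order).

```python
import math

def regression_metrics(y_true, y_pred):
    """MSE, RMSE, MAE, R², and MAPE for predicted vs. observed values."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1.0 - (mse * n) / ss_tot  # can go negative when the model underperforms the mean
    mape = sum(abs(e / t) for e, t in zip(errors, y_true)) / n * 100
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2, "MAPE": mape}

def c_index(y_true, y_pred):
    """Concordance index for uncensored outcomes: the fraction of
    comparable pairs whose predictions are ordered like the true values,
    counting tied predictions as half-concordant."""
    concordant, ties, comparable = 0, 0, 0
    n = len(y_true)
    for i in range(n):
        for j in range(i + 1, n):
            if y_true[i] == y_true[j]:
                continue  # tied outcomes are not comparable
            comparable += 1
            diff_true = y_true[i] - y_true[j]
            diff_pred = y_pred[i] - y_pred[j]
            if diff_pred == 0:
                ties += 1
            elif diff_true * diff_pred > 0:
                concordant += 1
    return (concordant + 0.5 * ties) / comparable
```

A negative R², as reported for all four models, simply means the model's squared error exceeds that of always predicting the sample mean; the C-Index, by contrast, rewards correct ranking even when absolute predictions are off.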
License
Copyright (c) 2026 International Journal of Scientific Research in Science and Technology

This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0).