Integrating traditional and machine-learning methods for predictive insurance modeling: A hybrid approach to balancing the trade-off between interpretability and accuracy

Science and Research

Project period: 1 March 2025 - 28 February 2027
Principal investigator: Ing. Ondřej Vít
Workplace: Faculty of Informatics and Statistics
Department of Statistics and Probability (4100)

Sole investigator
Funding provider: Ministry of Education, Youth and Sports
Programme: Internal Grant Agency of VŠE
Total budget: 183,810 CZK
Registration number: F4/36/2025
Order number: IG410025
This project aims to advance risk estimation in third-party liability insurance by integrating traditional statistical models with machine-learning techniques in a hybrid modeling framework. The primary objective is to improve the predictive modeling of insurance risks by combining the interpretability of Generalized Linear Models (GLMs) with the predictive accuracy of machine-learning methods, drawing on the strengths of both approaches. The project will refine the Combined Actuarial Neural Network (CANN) model by adding further machine-learning components, namely regression random forests and the XGBoost algorithm, to improve its predictive performance. It will also explore the integration of actuarial copulas with neural networks to improve risk assessment. Both copulas and neural networks work with variables scaled onto the [0, 1] interval, which should make it easier to apply copulas on the basis of theory developed for neural networks. The project also pursues theoretical contributions, in particular a formal optimization of the trade-off between the interpretability provided by the GLM component and the predictive accuracy of the machine-learning component of the CANN model. The total variability to be explained is fixed, so if the machine-learning part explains a large share of it, only a limited share remains for the highly interpretable GLM part, and vice versa. Viewed against recently published research on CANN models, this interpretability-accuracy trade-off offers scope for publishable theoretical contributions.
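For orientation, the CANN structure mentioned above is usually written as a GLM linear predictor plus a neural-network adjustment inside the same log link. The notation below is an assumption made for illustration and is not taken from the project text: $\mu(\mathbf{x})$ is the expected claim frequency, $\boldsymbol{\beta}$ are the GLM coefficients, and $\mathrm{NN}_{\boldsymbol{\theta}}$ is a network with weights $\boldsymbol{\theta}$.

\[
\mu(\mathbf{x}) \;=\; \exp\Bigl\{ \underbrace{\beta_0 + \langle \boldsymbol{\beta}, \mathbf{x} \rangle}_{\text{GLM part (interpretable)}} \;+\; \underbrace{\mathrm{NN}_{\boldsymbol{\theta}}(\mathbf{x})}_{\text{machine-learning part}} \Bigr\}
\]

If the output weights of the network are initialized at zero, training starts from the fitted GLM, and the network only learns what the GLM leaves unexplained.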
The project will emphasize robust model validation, evaluate the methods on different types of insurance datasets, and seek to optimize the balance between interpretability and accuracy, contributing insights to both the theory and the practice of insurance risk management.
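The hybrid idea and the trade-off described in the abstract can be sketched on synthetic data. This is a minimal illustration under stated assumptions, not the project's method. The data are simulated. The machine-learning part is a small tanh network trained by plain gradient descent, not a random forest or XGBoost. The GLM offset is held fixed. The "share of deviance reduction" is only one possible way to quantify how the explained variability splits between the two parts. All function and variable names are hypothetical.

```python
import numpy as np

def poisson_deviance(y, mu):
    """Poisson deviance; the y*log(y/mu) term is taken as 0 when y == 0."""
    ratio = np.where(y > 0, y / mu, 1.0)
    return 2.0 * np.sum(y * np.log(ratio) - (y - mu))

def fit_poisson_glm(X, y, n_iter=25):
    """Poisson GLM with log link, fitted by iteratively reweighted least squares."""
    A = np.column_stack([np.ones(len(y)), X])
    beta = np.zeros(A.shape[1])
    beta[0] = np.log(y.mean())
    for _ in range(n_iter):
        eta = A @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu  # working response
        beta = np.linalg.solve(A.T @ (mu[:, None] * A), A.T @ (mu * z))
    return beta, A @ beta

def fit_nn_adjustment(X, y, glm_eta, hidden=8, lr=0.05, epochs=1000, seed=1):
    """One-hidden-layer tanh network added to a fixed GLM offset on the log scale."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, size=(X.shape[1], hidden))
    b1 = rng.normal(0.0, 0.5, size=hidden)
    w2 = np.zeros(hidden)  # zero output weights: training starts exactly at the GLM
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)
        mu = np.exp(glm_eta + H @ w2 + b2)
        g = (mu - y) / n  # gradient of the mean Poisson NLL w.r.t. the log-mean
        dH = np.outer(g, w2) * (1.0 - H**2)
        grad_w2 = H.T @ g
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
        w2 -= lr * grad_w2
        b2 -= lr * g.sum()
    return np.exp(glm_eta + np.tanh(X @ W1 + b1) @ w2 + b2)

# Synthetic claim counts: a linear effect plus an interaction the plain GLM omits.
rng = np.random.default_rng(0)
n = 4000
X = rng.uniform(-1.0, 1.0, size=(n, 2))
log_mu = -1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] + 0.8 * X[:, 0] * X[:, 1]
y = rng.poisson(np.exp(log_mu))

beta, glm_eta = fit_poisson_glm(X, y)
mu_cann = fit_nn_adjustment(X, y, glm_eta)

d_null = poisson_deviance(y, np.full(n, y.mean()))  # intercept-only model
d_glm = poisson_deviance(y, np.exp(glm_eta))        # GLM alone
d_cann = poisson_deviance(y, mu_cann)               # GLM offset + network

# Split of the total in-sample deviance reduction between the two parts.
glm_share = (d_null - d_glm) / (d_null - d_cann)
nn_share = (d_glm - d_cann) / (d_null - d_cann)
print(f"GLM share: {glm_share:.3f}, network share: {nn_share:.3f}")
```

By construction the two shares sum to one, which mirrors the fixed total described in the abstract: whatever the network explains beyond the GLM reduces the share attributable to the interpretable part. The figures are in-sample and would need out-of-sample validation before any conclusion is drawn.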
