Grants and contributions :
Grant or scholarship awarded that applies to more than one fiscal year. (2017-2018 to 2022-2023)
This proposal is for joint work with graduate students and industrial partners on modern statistical modeling for insurance losses. It is divided into three subprojects.
Predictive methods in Bayesian credibility : Classical credibility theory answers the two following questions: (1) “How many observations are needed in a risk class before its premium can be based solely on its sample values?” (full credibility), and (2) “If the class is not fully credible, how can out-of-sample information be mixed in to improve the risk class's sample-based premium estimator?” (partial credibility). In the 1960s and 1970s a Bayesian answer was given to these questions, with an emphasis on analytical solutions, linear estimators of the posterior mean and asymptotic results for the variance (confidence intervals).
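As a concrete instance of question (1), the classical limited-fluctuation standard for Poisson claim frequency can be computed in a few lines. This is a hedged illustration, not part of the proposal itself; the confidence level and tolerance values below are assumed for the example.

```python
from scipy.stats import norm

# Limited-fluctuation full-credibility standard for Poisson claim frequency:
# the expected number of claims must satisfy n >= (z / k)^2, where k is the
# tolerated relative error of the sample mean and z the standard normal
# quantile for the chosen two-sided confidence level.
p = 0.90   # confidence level (assumed for illustration)
k = 0.05   # tolerate a 5% deviation from the true mean (assumed)

z = norm.ppf((1 + p) / 2)   # two-sided quantile
n_full = (z / k) ** 2       # classical answer: about 1082 expected claims
print(n_full)
```

A risk class with fewer expected claims than `n_full` is only partially credible, which is where the out-of-sample blending of question (2) comes in.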
It is time to revisit the theory using modern computational tools. We use GLMs for a segmented portfolio of insurance policies, set in a general Bayesian framework. Our prior and model distributions need not be natural conjugates, nor are any linearity constraints imposed on premiums, apart from the GLM assumption. The posterior and predictive distributions are evaluated through MCMC simulations to approximate the required integrals, and are used to answer the two credibility questions above, and much more.
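In the special conjugate case that the general framework relaxes, the Bayesian premium has a closed form that exactly matches the classical credibility-weighted estimator, and posterior simulation recovers it. A minimal sketch, assuming a Poisson claim-count model with a Gamma prior (hyperparameters and data are illustrative):

```python
import numpy as np

# Poisson counts with a Gamma(alpha, beta) prior (rate parametrization) on
# the risk parameter: the posterior is Gamma(alpha + sum(x), beta + n), so
# the Bayesian premium (posterior mean) is the credibility-weighted average
# Z * xbar + (1 - Z) * prior_mean with credibility factor Z = n / (n + beta).
alpha, beta = 3.0, 2.0           # prior hyperparameters (assumed)
x = np.array([1, 0, 2, 1, 3])    # observed claim counts for one risk class
n, xbar = len(x), x.mean()

posterior_mean = (alpha + x.sum()) / (beta + n)

Z = n / (n + beta)
credibility_premium = Z * xbar + (1 - Z) * (alpha / beta)

# Simulation-based check: sample the posterior directly and average,
# mimicking what MCMC does when no closed form is available.
rng = np.random.default_rng(0)
mc_premium = rng.gamma(alpha + x.sum(), 1.0 / (beta + n), size=200_000).mean()
```

Outside the conjugate case the closed form disappears, but the simulation step on the last lines still applies, which is the point of the proposed approach.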
Machine learning techniques for interaction terms : The GLMs fitted to insurance portfolios use a large number of covariates (100+) to segregate policies into risk classes. These covariates enter the GLM mean linearly, before being transformed by a link function. Introducing non-linear terms, such as interactions between covariates, may give a better representation of the data. Choosing the most significant interactions then becomes a very high-dimensional selection problem.
Regularization is used in high-dimensional models to help automate variable selection. We propose to generalize insurance GLMs to include regularization, such as Ridge Regression, the Lasso, the Group Lasso or the Elastic Net, and to compare them to generalized boosted models (GBMs), which aggregate simple tree-based models.
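To make the regularized-GLM idea concrete, here is a hedged, numpy-only sketch of a ridge (L2) penalty added to a Poisson GLM with log link, fitted by gradient descent. All data, names and tuning values are illustrative, not the proposal's actual methodology:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 5
X = rng.normal(size=(n, p))
beta_true = np.array([0.5, -0.3, 0.0, 0.0, 0.2])   # two irrelevant covariates
y = rng.poisson(np.exp(X @ beta_true))

def fit_poisson_ridge(X, y, lam, n_iter=2000, lr=0.3):
    """Minimize the average Poisson negative log-likelihood plus an L2
    penalty, (1/n) * sum(exp(X b) - y * (X b)) + lam * ||b||^2,
    by plain gradient descent (illustrative, not production-grade)."""
    n = len(y)
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        mu = np.exp(X @ b)                      # Poisson mean, log link
        grad = X.T @ (mu - y) / n + 2 * lam * b # penalized gradient
        b -= lr * grad
    return b

b_mle = fit_poisson_ridge(X, y, lam=0.0)    # unpenalized fit
b_ridge = fit_poisson_ridge(X, y, lam=1.0)  # coefficients shrunk toward 0
```

Swapping the L2 penalty for an L1 (Lasso) or mixed (Elastic Net) penalty changes the update rule but not the overall structure; the L1 case additionally drives small coefficients exactly to zero, which is what performs the variable selection.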
Hidden Markov chains in insurance GLMs : GLMs are static, in the sense that the risk characteristics (covariates) are based on past information over a fixed period of time to determine a policyholder's risk class for the next year. For instance, in auto insurance a driver's risk classification may depend on her/his number of accidents in the last 3 years. This classification can only change once the model is fitted again in a future year.
We study a time-dependent loss model where the policyholder's driving ability can change, perhaps due to increased safety awareness following a recent accident, or to a less safe, overconfident driving behaviour after years without accidents. We propose a hidden Markov model (HMM), where the current behaviour (good/bad driving) is not observable by the insurer, but its impact on the number/severity of claims is.
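The likelihood of such a model can be evaluated without enumerating the unobserved state paths, using the standard forward algorithm. A minimal sketch with two hidden states and Poisson claim counts; the state labels, transition probabilities and claim rates below are assumed for illustration only:

```python
import numpy as np
from scipy.stats import poisson

# Two hidden driving states (labels and all numbers are assumed):
# state 0 = careful driver, state 1 = risky driver.
A = np.array([[0.9, 0.1],     # transition probabilities between states
              [0.3, 0.7]])
pi = np.array([0.8, 0.2])     # initial state distribution
lam = np.array([0.1, 0.8])    # Poisson claim rate in each state

def forward_loglik(counts):
    """Log-likelihood of an observed claim-count sequence under the HMM,
    computed with the (scaled) forward algorithm."""
    alpha = pi * poisson.pmf(counts[0], lam)   # joint prob. of state and obs.
    loglik = np.log(alpha.sum())
    alpha /= alpha.sum()                       # rescale to avoid underflow
    for c in counts[1:]:
        alpha = (alpha @ A) * poisson.pmf(c, lam)
        loglik += np.log(alpha.sum())
        alpha /= alpha.sum()
    return loglik
```

For instance, a claim-free history is far more likely under this model than repeated multi-claim years, and the forward recursion marginalizes over every possible good/bad state path in linear time.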