Comptes Rendus
Risk analysis: Survival data analysis vs. machine learning. Application to Alzheimer prediction
Comptes Rendus. Mécanique, Volume 347 (2019) no. 11, pp. 817-830.

We present the statistical models most commonly used in survival data analysis. Parametric models are based on explicit distributions that depend only on unknown real parameters, whereas the preferred models are semi-parametric, like the Cox model, and involve unknown functions to be estimated. Now that big data sets are available, two types of methods are needed to deal with the resulting curse of dimensionality, in which non-informative factors spoil the informative part relative to the target: on the one hand, methods that reduce the dimension while maximizing the information retained in the reduced data, after which classical stochastic models are applied; on the other hand, algorithms that apply directly to big data, i.e. artificial intelligence (AI, or machine learning). In fact, those algorithms have a probabilistic interpretation. We present several of the former methods. As for the latter, which comprise neural networks, support vector machines, random forests and more (see the second edition, January 2017, of Hastie, Tibshirani et al. (2005) [1]), we present the neural network approach. Neural networks are known to be efficient for prediction on big data. Having analyzed, with a classical stochastic model, risk factors for Alzheimer's disease on a data set of around 5000 patients and p = 17 factors, we were interested in comparing its prediction performance with that of a neural network on this relatively small sample.
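
To make the semi-parametric idea concrete, the sketch below evaluates the standard Cox partial log-likelihood, in which the unknown baseline hazard function cancels out and only the regression parameters remain. It is an illustration under stated assumptions, not the authors' code: the function name `cox_partial_log_likelihood`, the toy data, and the assumption of no tied event times are all hypothetical choices made for this example.

```python
import math


def cox_partial_log_likelihood(times, events, covariates, beta):
    """Cox partial log-likelihood, assuming no tied event times.

    times      -- observed follow-up times (event or censoring)
    events     -- 1 if the event was observed, 0 if right-censored
    covariates -- one list of covariate values per subject
    beta       -- regression coefficients, one per covariate
    """
    # Linear predictor beta^T z for every subject.
    scores = [sum(b * z for b, z in zip(beta, row)) for row in covariates]

    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if not d_i:
            # Censored subjects contribute only through the risk sets.
            continue
        # Risk set: subjects still under observation at time t_i.
        risk = sum(
            math.exp(scores[j]) for j, t_j in enumerate(times) if t_j >= t_i
        )
        total += scores[i] - math.log(risk)
    return total


# Hypothetical toy data: three subjects, one covariate, one censored time.
toy_times = [2.0, 5.0, 7.0]
toy_events = [1, 0, 1]
toy_covariates = [[1.0], [0.0], [0.5]]
print(cox_partial_log_likelihood(toy_times, toy_events, toy_covariates, [0.3]))
```

In practice this quantity would be maximized over `beta` with a numerical optimizer, and tied event times would need a correction such as Breslow's or Efron's; both are left out here to keep the sketch short.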

DOI: 10.1016/j.crme.2019.11.007
Keywords: Survival data analysis, Stochastic models, Machine learning, Neural networks, Nonlinear modeling, Alzheimer disease
Catherine Huber-Carol 1; Shulamith Gross 2; Filia Vonta 3

1 MAP5 CNRS 8145, Université de Paris, 45 rue des Saints-Pères, 75270, Paris cedex 06, France
2 Lab VC-170, Baruch College of CUNY, One Baruch way, NY, NY 10010, USA
3 Department of Mathematics, National Technical University of Athens, 9 Iroon Polytechneiou Str., 15780, Athens, Greece
@article{CRMECA_2019__347_11_817_0,
     author = {Catherine Huber-Carol and Shulamith Gross and Filia Vonta},
     title = {Risk analysis: {Survival} data analysis vs. machine learning. {Application} to {Alzheimer} prediction},
     journal = {Comptes Rendus. M\'ecanique},
     pages = {817--830},
     publisher = {Elsevier},
     volume = {347},
     number = {11},
     year = {2019},
     doi = {10.1016/j.crme.2019.11.007},
     language = {en},
}
Catherine Huber-Carol; Shulamith Gross; Filia Vonta. Risk analysis: Survival data analysis vs. machine learning. Application to Alzheimer prediction. Comptes Rendus. Mécanique, Volume 347 (2019) no. 11, pp. 817-830. doi: 10.1016/j.crme.2019.11.007. https://comptes-rendus.academie-sciences.fr/mecanique/articles/10.1016/j.crme.2019.11.007/

[1] T. Hastie; R. Tibshirani; J. Friedman; J. Franklin The elements of statistical learning: data mining, inference and prediction, Math. Intell., Volume 27 (2005) no. 2, pp. 83-85

[2] J.F. Lawless Statistical Models and Methods for Lifetime Data, Vol. 362, John Wiley & Sons, 2011

[3] P.K. Andersen; O. Borgan; R.D. Gill; N. Keiding Statistical Models Based on Counting Processes, Springer Science & Business Media, 2012

[4] E.L. Lehmann; J.P. Romano Testing Statistical Hypotheses, Springer Science & Business Media, 2006

[5] C. Huber-Carol; N. Balakrishnan; M. Nikulin; M. Mesbah Goodness-of-Fit Tests and Model Validity, Springer Science & Business Media, 2012

[6] O. Pons Estimation in a Cox regression model with a change-point according to a threshold in a covariate, Ann. Stat., Volume 31 (2003) no. 2, pp. 442-463

[7] T.M. Therneau; P.M. Grambsch Modeling Survival Data: Extending the Cox Model, Springer Science & Business Media, 2013

[8] Y. Le Cun; Y. Bengio; G. Hinton Deep learning, Nature, Volume 521 (2015) no. 7553, p. 436

[9] C. Huber; V. Solev; F. Vonta Interval censored and truncated data: rate of convergence of NPMLE of the density, J. Stat. Plan. Inference, Volume 139 (2009) no. 5, pp. 1734-1749

[10] S.T. Gross; C. Huber Matched pair experiments: Cox and maximum likelihood estimation, Scand. J. Stat. (1987), pp. 27-41

[11] S.T. Gross; T.L. Lai Bootstrap methods for truncated and censored data, Stat. Sin. (1996), pp. 509-530

[12] P. Hall The Bootstrap and Edgeworth Expansion, Springer Science & Business Media, 2013

[13] M. Nikulin; F. Haghighi A chi-squared test for the generalized power Weibull family for the head-and-neck cancer censored data, J. Math. Sci., Volume 133 (2006) no. 3, pp. 1333-1341

[14] B. Efron Logistic regression, survival analysis, and the Kaplan-Meier curve, J. Amer. Stat. Assoc., Volume 83 (1988) no. 402, pp. 414-425

[15] G.S. Mudholkar; D.K. Srivastava Exponentiated Weibull family for analyzing bathtub failure-rate data, IEEE Trans. Reliab., Volume 42 (1993) no. 2, pp. 299-302

[16] C. Huber; M.S. Nikulin Remarques sur le maximum de vraisemblance, Qüestiió: Quaderns d'Estad. Investig. Oper., Volume 21 (1997) no. 1

[17] S.T. Gross; C. Huber-Carol Regression models for truncated survival data, Scand. J. Stat. (1992), pp. 193-213

[18] C. Huber; V. Solev; F. Vonta Estimation of density for arbitrarily censored and truncated data, Probability, Statistics and Modelling in Public Health, Springer, 2006, pp. 246-265

[19] C. Huber Efficient regression estimation under general censoring and truncation, Mathematical and Statistical Models and Methods in Reliability, Springer, 2010, pp. 235-241

[20] D.R. Cox Regression models and life-tables, J. R. Stat. Soc., Ser. B, Methodol., Volume 34 (1972) no. 2, pp. 187-202

[21] D.R. Cox Analysis of Survival Data, Chapman and Hall/CRC, 2018

[22] J. Bretagnolle; C. Huber-Carol Effects of omitting covariates in Cox's model for survival data, Scand. J. Stat. (1988), pp. 125-138

[23] F. Vonta Efficient estimation in a non-proportional hazards model in survival analysis, Scand. J. Stat. (1996), pp. 49-61

[24] C. Huber-Carol; F. Vonta Frailty models for arbitrarily censored and truncated data, Lifetime Data Anal., Volume 10 (2004) no. 4, pp. 369-388

[25] C. Huber-Carol; F. Vonta Semiparametric transformation models for arbitrarily censored and truncated data, Parametric and Semiparametric Models With Applications to Reliability, Survival Analysis, Quality of Life, Springer, 2004, pp. 167-176

[26] V. Bagdonavičius; M. Nikulin Accelerated Life Models: Modeling and Statistical Analysis, Chapman and Hall/CRC, 2001

[27] V. Bagdonavičius; M.S. Nikulin Goodness-of-fit tests for accelerated life models, Goodness-of-Fit Tests and Model Validity, Springer, 2002, pp. 281-297

[28] J.-J. Droesbeke; Société mathématique de France; Association pour la statistique et ses utilisations (France) Analyse statistique des durées de vie: modélisation des données censurées, Journées d'Étude en Statistique, vol. 3, Marseille-Luminy, 1988

[29] S.T. Gross; T.L. Lai Nonparametric estimation and regression analysis with left-truncated and right-censored data, J. Amer. Stat. Assoc., Volume 91 (1996) no. 435, pp. 1166-1180

[30] P.J. Huber; E.M. Ronchetti Robust Statistics, Wiley, New York, 1981

[31] C. Huber Robust versus nonparametric approaches and survival data analysis, Advances in Degradation Modeling, Springer, 2010, pp. 323-337

[32] M.-L. Ting Lee; G.A. Whitmore Threshold regression for survival analysis: modeling event times by a stochastic process reaching a boundary, Stat. Sci., Volume 21 (2006) no. 4, pp. 501-513

[33] M.-L. Ting Lee; G.A. Whitmore; B.A. Rosner Threshold regression for survival data with time-varying covariates, Stat. Med., Volume 29 (2010) no. 7–8, pp. 896-905

[34] A. Chambaz; D. Choudat; C. Huber; J.-C. Pairon; M.J. Van der Laan Analysis of the effect of occupational exposure to asbestos based on threshold regression modeling of case–control data, Biostatistics, Volume 15 (2013) no. 2, pp. 327-340

[35] X. He; M.-L. Ting Lee First-hitting-time based threshold regression, International Encyclopedia of Statistical Science, 2011, pp. 523-524

[36] R. Ohayon Reduced models for fluid–structure interaction problems, Int. J. Numer. Methods Eng., Volume 60 (2004) no. 1, pp. 139-152

[37] F. Chinesta; A. Ammar; A. Leygue; R. Keunings An overview of the proper generalized decomposition with applications in computational rheology, J. Non-Newton. Fluid Mech., Volume 166 (2011) no. 11, pp. 578-592

[38] F. Chinesta; P. Ladevèze; E. Cueto A short review on model order reduction based on proper generalized decomposition, Arch. Comput. Methods Eng., Volume 18 (2011) no. 4, p. 395

[39] A. Nouy Low-rank tensor methods for model order reduction, Handbook of Uncertainty Quantification, 2017, pp. 857-882

[40] C. Huber; J. Lellouch Estimation dans les tableaux de contingence à un grand nombre d'entrées, Int. Stat. Rev. (1974), pp. 193-203

[41] C. Huber-Carol; S.T. Gross; A. Alpérovitch Within the sample comparison of prediction performance of models and submodels: application to Alzheimer's disease, Statistical Models and Methods for Reliability and Survival Analysis, 2013, pp. 95-109

[42] Y. Le Cun, Personal communication, December 2018.

[43] V. Rykov Reliability of Engineering Systems and Technological Risk, John Wiley & Sons, 2016

[44] F. Vonta; M.S. Nikulin; N. Limnios; C. Huber-Carol Statistical Models and Methods for Biomedical and Technical Systems, Springer Science & Business Media, 2008

[45] B. Harlamov Stochastic Risk Analysis and Management, John Wiley & Sons, 2017

[46] K.P. Murphy Machine Learning: A Probabilistic Perspective, MIT Press, 2012

[47] M.-L. Ting Lee Analysis of Microarray Gene Expression Data, Springer Science & Business Media, 2007
