Comptes Rendus
Statistiques
An elementary analysis of ridge regression with random design
Comptes Rendus. Mathématique, Volume 360 (2022), pp. 1055-1063.

In this note, we provide an elementary analysis of the prediction error of ridge regression with random design. The proof is short and self-contained. In particular, it bypasses the use of Rudelson’s deviation inequality for covariance matrices, through a combination of exchangeability arguments, matrix perturbation and operator convexity.

Reçu le :
Accepté le :
Publié le :
DOI : 10.5802/crmath.367
Classification : 62J07, 62G08, 60B20

Jaouad Mourtada 1 ; Lorenzo Rosasco 2, 3

1 CREST, ENSAE, Palaiseau, France
2 Center for Brains, Minds and Machines, MIT, Cambridge, United States
3 Universita’ di Genova & Istituto Italiano di Tecnologia, Genova, Italy
Licence : CC-BY 4.0
Droits d'auteur : Les auteurs conservent leurs droits
@article{CRMATH_2022__360_G9_1055_0,
     author = {Jaouad Mourtada and Lorenzo Rosasco},
     title = {An elementary analysis of ridge regression with random design},
     journal = {Comptes Rendus. Math\'ematique},
     pages = {1055--1063},
     publisher = {Acad\'emie des sciences, Paris},
     volume = {360},
     year = {2022},
     doi = {10.5802/crmath.367},
     language = {en},
}
TY  - JOUR
AU  - Jaouad Mourtada
AU  - Lorenzo Rosasco
TI  - An elementary analysis of ridge regression with random design
JO  - Comptes Rendus. Mathématique
PY  - 2022
SP  - 1055
EP  - 1063
VL  - 360
PB  - Académie des sciences, Paris
DO  - 10.5802/crmath.367
LA  - en
ID  - CRMATH_2022__360_G9_1055_0
ER  - 
%0 Journal Article
%A Jaouad Mourtada
%A Lorenzo Rosasco
%T An elementary analysis of ridge regression with random design
%J Comptes Rendus. Mathématique
%D 2022
%P 1055-1063
%V 360
%I Académie des sciences, Paris
%R 10.5802/crmath.367
%G en
%F CRMATH_2022__360_G9_1055_0
Jaouad Mourtada; Lorenzo Rosasco. An elementary analysis of ridge regression with random design. Comptes Rendus. Mathématique, Volume 360 (2022), pp. 1055-1063. doi : 10.5802/crmath.367. https://comptes-rendus.academie-sciences.fr/mathematique/articles/10.5802/crmath.367/

[1] Rudolf Ahlswede; Andreas Winter Strong converse for identification via quantum channels, IEEE Trans. Inf. Theory, Volume 48 (2002) no. 3, pp. 569-579 | DOI | MR | Zbl

[2] Nachman Aronszajn Theory of reproducing kernels, Trans. Am. Math. Soc., Volume 68 (1950) no. 3, pp. 337-404 | DOI | MR | Zbl

[3] Francis Bach; Éric Moulines Non-strongly-convex smooth stochastic approximation with convergence rate O(1/n), Advances in Neural Information Processing Systems 26 (2013), pp. 773-781

[4] Peter L. Bartlett; Olivier Bousquet; Shahar Mendelson Local rademacher complexities, Ann. Stat., Volume 33 (2005) no. 4, pp. 1497-1537 | MR | Zbl

[5] Peter L. Bartlett; Philip M. Long; Gábor Lugosi; Alexander Tsigler Benign overfitting in linear regression, Proc. Natl. Acad. Sci. USA, Volume 117 (2020) no. 48, pp. 30063-30070 | DOI | MR | Zbl

[6] Gilles Blanchard; Nicole Mücke Optimal Rates for Regularization of Statistical Inverse Learning Problems, Found. Comput. Math., Volume 18 (2018) no. 4, pp. 971-1013 | DOI | MR | Zbl

[7] Andrea Caponnetto; Ernesto De Vito Optimal rates for the regularized least-squares algorithm, Found. Comput. Math., Volume 7 (2007) no. 3, pp. 331-368 | DOI | MR | Zbl

[8] Eric Carlen Trace inequalities and quantum entropy: an introductory course, Entropy and the quantum (Contemporary Mathematics), Volume 529, American Mathematical Society, 2010, pp. 73-140 | DOI | MR | Zbl

[9] Olivier Catoni PAC-Bayesian bounds for the Gram matrix and least squares regression with a random design (2016) (https://arxiv.org/abs/1603.05229)

[10] Ernesto De Vito; Andrea Caponnetto; Lorenzo Rosasco Model selection for regularized least-squares algorithm in learning theory, Found. Comput. Math., Volume 5 (2005) no. 1, pp. 59-85 | DOI | MR

[11] Aymeric Dieuleveut; Francis Bach Nonparametric stochastic approximation with large step-sizes, Ann. Stat., Volume 44 (2016) no. 4, pp. 1363-1399 | MR | Zbl

[12] Sara van de Geer Empirical Processes in M-estimation, Cambridge University Press, 1999

[13] Alon Gonen; Shai Shalev-Shwartz Average stability is invariant to data preconditioning. Implications to exp-concave empirical risk minimization, J. Mach. Learn. Res., Volume 18 (2018) no. 222, pp. 1-13 | MR | Zbl

[14] Arthur E. Hoerl Application of Ridge Analysis to Regression Problems, Chem. Eng. Prog., Volume 58 (1962), pp. 54-59

[15] Daniel Hsu; Sham M. Kakade; Tong Zhang Random design analysis of Ridge regression, Found. Comput. Math., Volume 14 (2014) no. 3, pp. 569-600 | MR | Zbl

[16] Prateek Jain; Sham M. Kakade; Rahul Kidambi; Praneeth Netrapalli; Aaron Sidford Parallelizing stochastic gradient descent for least squares regression: Mini-batching, averaging, and model misspecification, J. Mach. Learn. Res., Volume 18 (2018) no. 223, pp. 1-42 | MR | Zbl

[17] Vladimir Koltchinskii Local Rademacher complexities and oracle inequalities in risk minimization, Ann. Stat., Volume 34 (2006) no. 6, pp. 2593-2656 | MR | Zbl

[18] Vladimir Koltchinskii Oracle Inequalities in Empirical Risk Minimization and Sparse Recovery Problems. École d’Été de Probabilités de Saint-Flour, Lecture Notes in Mathematics, 2033, Springer, 2011

[19] Vladimir Koltchinskii; Karim Lounici Concentration inequalities and moment bounds for sample covariance operators, Bernoulli, Volume 23 (2017) no. 1, pp. 110-133 | MR | Zbl

[20] Tomer Koren; Kfir Levy Fast rates for exp-concave empirical risk minimization, Advances in Neural Information Processing Systems 28 (2015), pp. 1477-1485

[21] Guillaume Lecué; Shahar Mendelson Performance of empirical risk minimization in linear aggregation, Bernoulli, Volume 22 (2016) no. 3, pp. 1520-1534 | MR | Zbl

[22] Françoise Lust-Piquard; Gilles Pisier Non commutative Khintchine and Paley inequalities, Ark. Mat., Volume 29 (1991) no. 1, pp. 241-260 | DOI | Zbl

[23] Pascal Massart Some applications of concentration inequalities to statistics, Ann. Fac. Sci. Toulouse, Math., Volume 9 (2000) no. 2, pp. 245-303 | DOI | Numdam | MR | Zbl

[24] Shahar Mendelson On the performance of kernel classes, J. Mach. Learn. Res., Volume 4 (2003), pp. 759-771 | MR

[25] Jaouad Mourtada Exact minimax risk for linear least squares, and the lower tail of sample covariance matrices, Ann. Stat., Volume 50 (2022) no. 4, pp. 2157-2178 | MR

[26] Jaouad Mourtada; Stéphane Gaïffas An improper estimator with optimal excess risk in misspecified density estimation and logistic regression, J. Mach. Learn. Res., Volume 23 (2022) no. 31, pp. 1-49 | MR

[27] Roberto I. Oliveira Sums of random Hermitian matrices and an inequality by Rudelson, Electron. Commun. Probab., Volume 15 (2010), pp. 203-212 | MR | Zbl

[28] Roberto I. Oliveira The lower tail of random quadratic forms with applications to ordinary least squares, Probab. Theory Relat. Fields, Volume 166 (2016) no. 3, pp. 1175-1194 | DOI | MR | Zbl

[29] Lorenzo Rosasco; Silvia Villa Learning with incremental iterative regularization, Advances in Neural Information Processing Systems 28 (2015), pp. 1630-1638

[30] Mark Rudelson Random vectors in the isotropic position, J. Funct. Anal., Volume 164 (1999) no. 1, pp. 60-72 | DOI | MR | Zbl

[31] Steve Smale; Ding-Xuan Zhou Shannon sampling II: Connections to learning theory, Appl. Comput. Harmon. Anal., Volume 19 (2005) no. 3, pp. 285-302 | DOI | MR | Zbl

[32] Steve Smale; Ding-Xuan Zhou Learning theory estimates via integral operators and their approximations, Constr. Approx., Volume 26 (2007) no. 2, pp. 153-172 | DOI | MR | Zbl

[33] Ingo Steinwart; Andreas Christmann Support Vector Machines, Springer, 2008

[34] Ingo Steinwart; Don Hush; Clint Scovel Optimal rates for regularized least squares regression, Proc. 22nd Conference on Learning Theory (2009), pp. 79-93

[35] Andrey N. Tikhonov Solution of incorrectly formulated problems and the regularization method, Sov. Math., Dokl., Volume 4 (1963), pp. 1035-1038 | Zbl

[36] Joel A. Tropp User-friendly tail bounds for sums of random matrices, Found. Comput. Math., Volume 12 (2012) no. 4, pp. 389-434 | DOI | MR | Zbl

[37] Alexander Tsigler; Peter L. Bartlett Benign overfitting in ridge regression (2020) (https://arxiv.org/abs/2009.14286)

[38] Tomas Vaškevičius; Nikita Zhivotovskiy Suboptimality of constrained least squares and improvements via non-linear predictors (2021) (https://arxiv.org/abs/2009.09304)

[39] Roman Vershynin Introduction to the non-asymptotic analysis of random matrices (2012), pp. 210-268

[40] Ernesto De Vito; Lorenzo Rosasco; Andrea Caponnetto; Umberto De Giovannini; Francesca Odone Learning from examples as an inverse problem, J. Mach. Learn. Res., Volume 6 (2005) no. 5, pp. 883-904 | MR | Zbl

[41] Grace Wahba Spline Models for Observational Data, 59, Society for Industrial and Applied Mathematics, 1990 | DOI

[42] Yuan Yao; Lorenzo Rosasco; Andrea Caponnetto On early stopping in gradient descent learning, Constr. Approx., Volume 26 (2007) no. 2, pp. 289-315 | MR | Zbl

[43] Nikita Zhivotovskiy Dimension-free bounds for sums of independent matrices and simple tensors via the variational principle (2021) (https://arxiv.org/abs/2108.08198)

Cité par Sources :

Commentaires - Politique