Comptes Rendus
Entropic optimal transport is maximum-likelihood deconvolution
Comptes Rendus. Mathématique, Volume 356 (2018) no. 11-12, pp. 1228-1235.

We give a statistical interpretation of entropic optimal transport by showing that performing maximum-likelihood estimation for Gaussian deconvolution corresponds to calculating a projection with respect to the entropic optimal transport distance. This structural result gives theoretical support for the wide adoption of these tools in the machine learning community.

This note gives a statistical interpretation of entropic optimal transport: we show that the maximum-likelihood estimator for Gaussian deconvolution corresponds to the projection of the empirical distribution of the data with respect to the distance defined by entropic optimal transport. This structural result provides a theoretical justification for the wide adoption of these tools by the machine-learning community.
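The entropic optimal transport distance referred to in the abstract is typically computed with Sinkhorn's scaling algorithm. The sketch below is illustrative only (the function name, parameters, and tolerances are not from the paper): it computes the entropic optimal coupling between two discrete distributions `a` and `b` with cost matrix `C` and regularization strength `eps`.

```python
import numpy as np

def sinkhorn(a, b, C, eps, n_iter=500):
    """Entropic optimal transport between discrete distributions a and b.

    Returns the entropic optimal coupling P obtained by alternately
    rescaling the Gibbs kernel so that P has marginals a and b.
    """
    K = np.exp(-C / eps)              # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)               # enforce the first marginal
        v = b / (K.T @ u)             # enforce the second marginal
    return u[:, None] * K * v[None, :]  # coupling P = diag(u) K diag(v)
```

The transport value is then `np.sum(P * C)` (plus an entropy term, depending on the convention). In the setting of the abstract, projecting the empirical distribution of the data in this entropically regularized distance recovers the maximum-likelihood estimator for Gaussian deconvolution, with the regularization level tied to the noise scale.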

DOI: 10.1016/j.crma.2018.10.010

Philippe Rigollet 1; Jonathan Weed 1

1 Department of Mathematics, Massachusetts Institute of Technology, Cambridge, MA, USA

