Comptes Rendus
Optimal feedback control of dynamical systems via value-function approximation
Comptes Rendus. Mécanique, Volume 351 (2023) no. S1, pp. 535-571.

A self-learning approach for optimal feedback gains for finite-horizon nonlinear continuous-time control systems is proposed and analysed. It relies on parameter-dependent approximations to the optimal value function obtained from a family of universal approximators. The cost functional for the training of an approximate optimal feedback law incorporates two main features. First, it contains the average over the objective functional values of the parametrized feedback control for an ensemble of initial values. Second, it is adapted to exploit the relationship between the maximum principle and dynamic programming. Based on universal approximation properties, existence, convergence and first-order optimality conditions for optimal neural network feedback controllers are proved.
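The first feature of the training objective, the average closed-loop cost over an ensemble of initial values, can be illustrated with a toy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the scalar dynamics x' = a x + u, the running cost 0.5 (x^2 + beta u^2), the one-parameter quadratic ansatz V_theta(t, x) = 0.5 theta (T - t) x^2 standing in for a neural network, the explicit Euler discretization, and the grid search standing in for gradient-based training. The feedback is taken as u = -(1/beta) dV_theta/dx, the usual form for control-affine dynamics with quadratic control cost. The second feature (the link between the maximum principle and dynamic programming) is not modelled.

```python
import numpy as np

def ensemble_cost(theta, x0s, a=1.0, beta=0.1, T=1.0, n_steps=200):
    """Average closed-loop cost over an ensemble of initial values.

    Hypothetical toy problem: x' = a*x + u on [0, T] with running cost
    0.5*(x^2 + beta*u^2) and no terminal cost. The value function is
    approximated by V_theta(t, x) = 0.5*theta*(T - t)*x^2 (an assumed
    ansatz), and the feedback is u = -(1/beta) * dV_theta/dx.
    """
    dt = T / n_steps
    x = np.asarray(x0s, dtype=float).copy()
    cost = np.zeros_like(x)
    for k in range(n_steps):
        t = k * dt
        p = theta * (T - t)              # dV_theta/dx = p * x
        u = -(p / beta) * x              # feedback from the value-function gradient
        cost += 0.5 * (x**2 + beta * u**2) * dt
        x = x + dt * (a * x + u)         # explicit Euler step of the closed loop
    return float(cost.mean())            # ensemble average

# Ensemble of initial values and a coarse parameter grid (both assumed).
x0s = np.linspace(-1.0, 1.0, 9)
thetas = np.linspace(0.0, 2.0, 21)
costs = [ensemble_cost(th, x0s) for th in thetas]
best_theta = float(thetas[int(np.argmin(costs))])
best_cost = min(costs)
print(best_theta, best_cost)
```

Here theta = 0 corresponds to the uncontrolled system, so comparing `best_cost` with `ensemble_cost(0.0, x0s)` shows how much the parametrized feedback reduces the averaged objective on this toy problem.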

DOI: 10.5802/crmeca.199
Classification: 49J15, 49N35, 68Q32, 93B52, 93D15
Keywords: optimal feedback control, neural networks, Hamilton–Jacobi–Bellman equation, self-learning, reinforcement learning
Karl Kunisch 1, 2; Daniel Walter 3

1 University of Graz, Institute of Mathematics and Scientific Computing, Heinrichstr. 36, A-8010 Graz, Austria
2 Johann Radon Institute for Computational and Applied Mathematics (RICAM), Austrian Academy of Sciences, Altenberger Straße 69, 4040 Linz, Austria
3 Institut für Mathematik, Humboldt-Universität zu Berlin, Rudower Chaussee 25, 10117 Berlin, Germany
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{CRMECA_2023__351_S1_535_0,
     author = {Karl Kunisch and Daniel Walter},
     title = {Optimal feedback control of dynamical systems via value-function approximation},
     journal = {Comptes Rendus. M\'ecanique},
     pages = {535--571},
     publisher = {Acad\'emie des sciences, Paris},
     volume = {351},
     number = {S1},
     year = {2023},
     doi = {10.5802/crmeca.199},
     language = {en},
}
Karl Kunisch; Daniel Walter. Optimal feedback control of dynamical systems via value-function approximation. Comptes Rendus. Mécanique, Volume 351 (2023) no. S1, pp. 535-571. doi: 10.5802/crmeca.199.
