Optimal feedback control of dynamical systems via value-function approximation

A self-learning approach for computing optimal feedback gains for finite-horizon nonlinear continuous-time control systems is proposed and analysed. It relies on parameter-dependent approximations to the optimal value function obtained from a family of universal approximators. The cost functional for the training of an approximate optimal feedback law incorporates two main features. First, it contains the average of the objective functional values of the parametrized feedback control over an ensemble of initial values. Second, it is adapted to exploit the relationship between the maximum principle and dynamic programming. Based on universal approximation properties, existence, convergence and first-order optimality conditions for optimal neural network feedback controllers are proved.


Introduction
In this paper we focus on optimal feedback control for problems of the form

  inf_{y,u} J(y, u) := (1/2) ∫_0^T |Q_1(y(t) − y_d(t))|^2 + β |u(t)|^2 dt + (1/2) |Q_2(y(T) − y_d^T)|^2

  s.t. ẏ = f(y) + g(y)u, y(0) = y_0, and u ∈ L^2(0, T; R^m),   (P)

with nonlinear dynamics described by f : [0, T] × R^n → R^n. The system can be influenced by choosing a control input u which enters through a control operator g : R^n → R^{n×m}. We assess the performance of a given control by its objective functional value, which comprises the (weighted) distance between the associated state trajectory y and a given desired state y_d as well as the norm of the control for some cost parameter β > 0. The weighting matrices Q_i, for i = 1, 2, are assumed to be symmetric positive semi-definite. Searching for an optimal control u* in feedback form requires finding a function F* : [0, T] × R^n → R^m such that

  u*(t) = F*(t, y*(t)), for t ∈ (0, T).   (1)
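To make the objective in (P) concrete, here is a minimal sketch of how J can be evaluated for a sampled scalar trajectory by trapezoidal quadrature. The scalar setting, the function name, and the uniform grid are our own illustrative choices, not part of the paper.

```python
def tracking_cost(y, u, y_d, y_T_d, Q1, Q2, beta, T):
    """Trapezoidal approximation of the cost in (P) for a scalar state/control
    sampled on a uniform grid of len(y) points:
      J = 1/2 * int_0^T |Q1*(y - y_d)|^2 + beta*|u|^2 dt + 1/2 * |Q2*(y(T) - y_T_d)|^2
    """
    N = len(y) - 1
    dt = T / N
    # running-cost density at each grid point
    integrand = [Q1**2 * (y[k] - y_d[k])**2 + beta * u[k]**2 for k in range(N + 1)]
    # trapezoidal rule: full weights inside, half weights at the endpoints
    integral = dt * (sum(integrand) - 0.5 * (integrand[0] + integrand[-1]))
    terminal = 0.5 * (Q2 * (y[-1] - y_T_d))**2
    return 0.5 * integral + terminal
```

For instance, a trajectory that tracks y_d exactly with zero control has cost zero, while any control effort is penalized through β.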
Here (u*, y*) denotes an optimal control-trajectory pair associated to (P). Under appropriate conditions, see e.g. [1], the feedback mapping can be expressed as

  F*(t, y) = −(1/β) g^⊤(y) ∂_y V*(t, y),   (2)

where V* stands for the value function associated to (P), i.e. for (T_0, y_0) ∈ [0, T] × R^n:

  V*(T_0, y_0) = min_{y,u} J_{T_0}(y, u), subject to ẏ = f(y) + g(y)u, y(T_0) = y_0,

and

  J_{T_0}(y, u) = (1/2) ∫_{T_0}^T |Q_1(y(t) − y_d(t))|^2 + β |u(t)|^2 dt + (1/2) |Q_2(y(T) − y_d^T)|^2.

The value function V* satisfies a Hamilton-Jacobi-Bellman (HJB) equation, which is a time-dependent first order hyperbolic equation of spatial dimension n. Numerical realisations, therefore, are plagued by the curse of dimensionality. Indeed, a direct solution of the HJB equation already becomes computationally prohibitive for moderate dimensions n. Therefore, for practical realization, the interest in alternative techniques arises. In many situations of practical relevance, researchers have relied on linear approximations to the nonlinear dynamical system and have treated the resulting linear-quadratic problem by Riccati techniques. Much research has concentrated on validating this approach locally around a reference trajectory. Globally such a strategy may fail, see for instance [2,3].
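The Riccati approach mentioned above can be sketched in the scalar case: for ẏ = a y + u with running cost ½(q y² + β u²) and terminal cost ½ q_T y(T)², the value function is V(t, y) = ½ P(t) y² with P solving a backward Riccati ODE, and the feedback gain is −P(t)/β. The following sketch uses an explicit Euler discretization; the scalar setting and all names are ours.

```python
def riccati_gain(a, q, qT, beta, T, N=2000):
    """Integrate the scalar finite-horizon Riccati ODE backward in time:
       -P'(t) = 2*a*P(t) + q - P(t)**2 / beta,   P(T) = qT,
    with explicit Euler on the grid t_k = k*T/N. Returns the list P,
    so P[0] approximates P(0) and the feedback is u = -(P(t)/beta)*y."""
    dt = T / N
    P = [0.0] * (N + 1)
    P[N] = qT
    for k in range(N, 0, -1):          # march from t = T down to t = 0
        P[k - 1] = P[k] + dt * (2 * a * P[k] + q - P[k] ** 2 / beta)
    return P
```

For a = 0, q = 0, q_T = 1, β = 1 the exact solution is P(t) = 1/(1 + T − t), which the sketch reproduces up to O(Δt); for a = 0, q = q_T = β = 1 the constant P ≡ 1 is reproduced exactly.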
In this paper we follow an approach, possibly first proposed in [3], circumventing the construction of the value function on the basis of solving the HJB equation. Rather, the feedback mapping is constructed by an unsupervised self-learning technique. In practice, this requires the approximation of V* by a family of functions V_θ which are parametrized by a finite dimensional vector θ and satisfy a uniform approximation property. Possible families of universal approximators include, e.g., neural networks or piecewise polynomial approximations. Subsequently, in view of (1), we introduce the corresponding feedback law

  F_θ(t, y) = −(1/β) g^⊤(y) ∂_y V_θ(t, y), for (t, y) ∈ [0, T] × R^n,

as an approximation to F*. An "optimal" parametrized feedback law is then determined by a variant of the following self-learning, structure preserving, variational problem (3). In this problem, minimization with respect to u is replaced by minimization with respect to the parameters θ which characterize V_θ and F_θ. The cost functional of problem (3) consists of four parts: The first term represents the objective functional of (P) where the control u is replaced by the closed loop expression F_θ(y). The next two terms reflect the fact that V_θ is constructed as an approximation to the value function associated to (P) and exploit the well-known property that, under certain conditions, the gradient of the value function coincides with the solution of a suitable adjoint equation, see e.g. [1, p. 21]. The final term penalizes the norm of the structural parameters. We point out that V_θ and F_θ are learned along the orbit O = {y(t; y_0) : t ∈ (0, T)} within the state space R^n. To accommodate the case that one trajectory does not provide enough information, we propose to involve an ensemble of orbits departing from a set Y_0 of initial conditions, and to reformulate problem (3) accordingly. This will be done in Section 4 below.
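A minimal sketch of how the parametrized feedback acts in practice: given any differentiable surrogate V_θ (here a hypothetical quadratic in a scalar state), the control is obtained from its y-gradient as in the formula above and fed into an Euler discretization of the closed loop. All concrete choices below (the quadratic V_θ, the scalar dynamics, the finite-difference gradient) are ours, purely for illustration.

```python
def feedback_rollout(V, g, f, beta, y0, T, N=1000, h=1e-6):
    """Closed-loop Euler rollout of  y' = f(y) + g(y)*u  for a scalar state,
    with  u = F_theta(t, y) = -(1/beta) * g(y) * dV/dy(t, y),
    the y-gradient of the surrogate value function V(t, y) approximated
    by a central difference with step h."""
    dt = T / N
    y = y0
    traj = [y]
    for k in range(N):
        t = k * dt
        dV = (V(t, y + h) - V(t, y - h)) / (2 * h)   # approximate d/dy V_theta
        u = -(1.0 / beta) * g(y) * dV
        y = y + dt * (f(y) + g(y) * u)
        traj.append(y)
    return traj

# hypothetical quadratic surrogate V_theta(t, y) = 0.5 * theta * y**2 with theta = 2
traj = feedback_rollout(lambda t, y: 0.5 * 2.0 * y * y,
                        g=lambda y: 1.0, f=lambda y: 0.0,
                        beta=1.0, y0=1.0, T=5.0)
```

With these toy choices the induced feedback is u = −2y, so the closed-loop state decays toward zero over the horizon.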
While we focus on linear-quadratic objective functionals, the derived results can be readily generalized if suitable coercivity and differentiability properties are assumed for this functional. In our earlier work on learning a feedback function [3], we considered infinite horizon optimal control problems. In that case, the time-dependent HJB equation reduces to a stationary one. There we had not yet incorporated the structure preserving terms involving V_θ and ∂_y V_θ into the cost. Moreover, we directly constructed an approximation F_θ to the vector valued function F*, rather than approximating the scalar valued function V* and subsequently using (2). In the present paper we provide the theoretical foundations for the learning based technique that we propose to construct an approximation to the optimal feedback function for (P). Recently in [4], a variant of the approach of [3] was used for interesting numerical investigations to construct optimal feedback functions for finite horizon multi-agent optimal control problems.
Let us very briefly mention some of the vast literature on solving the HJB equations. Semi-Lagrangian schemes and finite difference methods have been deeply investigated to solve HJB equations directly, see e.g. [5][6][7]. Significant progress was made in solving high dimensional HJB equations by the use of policy iteration combined with tensor calculus techniques, [2,8,9]. The use of Hopf formulas was proposed in e.g. [10,11]. Interpolation techniques, utilizing ensembles of open loop solutions, have been analyzed in the works of [12,13], for example. Finally we mention that optimal feedback control is intimately related to reinforcement learning, see e.g. the monograph [14], and also the survey articles [15][16][17].
The manuscript is structured as follows. Some pertinent notation is gathered in Section 2. In Section 3 concepts of optimal feedback control, semi-global with respect to the initial condition y 0 , are gathered. Section 4 is devoted to describing the learning technique that we propose to approximate the optimal feedback function. In Section 5 the required assumptions on approximating subspaces are checked for a class of neural networks and a class of piecewise polynomials. Existence of solutions to the approximating learning problems is proved in Section 6. Their convergence is analyzed in Section 7. The case of learning from finitely many orbits is the focus of Section 8. Section 9 provides an example illustrating the numerical feasibility of the proposed method. We do not aim for sophistication in this respect. The Appendix 9.2 details the proofs of several necessary technical results.

Notation
For I := (0, T), with T > 0, we define W_T = { y ∈ L^2(I; R^n) : ẏ ∈ L^2(I; R^n) }, where the temporal derivative is understood in the distributional sense. We equip W_T with the norm induced by the inner product (y_1, y_2)_{W_T} = (ẏ_1, ẏ_2)_{L^2(I;R^n)} + (y_1, y_2)_{L^2(I;R^n)} for y_1, y_2 ∈ W_T, making it a Hilbert space. We recall that W_T embeds continuously into C(Ī; R^n). For a compact metric space X and a normed space Y we denote the space of continuous functions from X to Y by C(X; Y), which we endow with the norm ∥ϕ∥_{C(X;Y)} = max_{x∈X} ∥ϕ(x)∥_Y. By Y_0 we denote a compact set of initial conditions in R^n. When arising as an index, the space C(Y_0; W_T) will frequently be abbreviated by C. The space C^1(X; Y) of continuously differentiable functions is defined analogously. Open balls of radius ε in a Banach space X with center x will be denoted by B_ε(x). The space of bounded linear operators between Banach spaces X and Y, endowed with the canonical norm, is denoted by B(X, Y). We further abbreviate B(X) := B(X, X).
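For a trajectory sampled on a uniform grid, the W_T norm can be approximated by replacing ẏ with forward differences and the L² integrals with Riemann sums. The discretization choices below are our own and serve only to make the definition tangible.

```python
import math

def wt_norm(y, T):
    """Discrete analogue of the W_T norm for a scalar trajectory:
       ||y||_{W_T}^2 = ||y'||_{L2}^2 + ||y||_{L2}^2,
    with y sampled at len(y) uniform grid points on [0, T] and
    y' approximated by forward differences."""
    N = len(y) - 1
    dt = T / N
    ydot = [(y[k + 1] - y[k]) / dt for k in range(N)]
    l2_y = dt * sum(v * v for v in y[:-1])       # left-endpoint Riemann sum
    l2_ydot = dt * sum(v * v for v in ydot)
    return math.sqrt(l2_y + l2_ydot)
```

For y(t) = t on [0, 1] one has ∥y∥²_{L²} = 1/3 and ∥ẏ∥²_{L²} = 1, hence ∥y∥_{W_T} = √(4/3), which the sketch reproduces up to the grid error.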

Semi-global optimal feedback control
Consider the controlled nonlinear dynamical system of the form

  ẏ = f(y) + g(y)u in L^2(I; R^n), y(0) = y_0,   (4)

described by the Nemitsky operators, for a.e. t ∈ I, of f : I × R^n → R^n and g : I × R^n → R^{n×m}. The smoothness requirements on f and g will be detailed in Assumption 1 below. Our aim is to choose a control input u* ∈ L^2(I; R^m) which keeps the associated solution y* ∈ W_T close to a known reference trajectory y_d, while keeping the control effort small. This is formulated as the constrained minimization problem (P_{y_0}), which incorporates the weighted misfit between the trajectory y within the time horizon I = (0, T) and at the terminal time to desired states y_d ∈ L^2(I; R^n) and y_d^T ∈ R^n, as well as the norm of the control u. While this open loop optimal control problem captures well the objective formulated above, it comes with several disadvantages. First, its solution is a function of time only, and does not incorporate the current state y(t). This makes the open loop approach susceptible to perturbations in the dynamical system. Second, determining the control action for a new initial condition requires solving (P_{y_0}) from the start.
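The sensitivity of open-loop controls to perturbations can be illustrated on a scalar toy system: record the control produced by a stabilizing feedback along the nominal trajectory, then replay it from a perturbed initial condition. The system ẏ = y + u, the feedback u = −2y, and all constants are our own illustrative choices.

```python
def euler(rhs, y0, T, N):
    """Explicit Euler for the scalar ODE y' = rhs(t, y)."""
    dt = T / N
    y = y0
    out = [y]
    for k in range(N):
        y = y + dt * rhs(k * dt, y)
        out.append(y)
    return out

# nominal trajectory of y' = y + u under the feedback u = -2*y, recorded as open loop
T, N = 3.0, 3000
dt = T / N
nominal = euler(lambda t, y: y - 2 * y, 1.0, T, N)
u_open = [-2 * y for y in nominal]

# perturbed initial condition: replay the recorded control vs. re-apply the feedback
open_loop = euler(lambda t, y: y + u_open[min(int(round(t / dt)), N)], 1.5, T, N)
closed_loop = euler(lambda t, y: y - 2 * y, 1.5, T, N)
```

The replayed open-loop control lets the unstable mode of ẏ = y grow from the perturbation, while the feedback rollout still decays.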
The aforementioned limitations of open loop optimal controls motivate the study of semi-global optimal feedback control approaches to (P_{y_0}). More precisely, given a compact set Y_0 ⊂ R^n, we look for a feedback function F* : I × R^n → R^m which induces a Nemitsky operator F* : W_T → L^2(I; R^m), F*(y)(t) = F*(t, y(t)) for a.e. t ∈ I, such that for every y_0 ∈ Y_0 the closed loop system

  ẏ = f(y) + g(y)F*(y), y(0) = y_0,

admits a unique solution y*(y_0) ∈ W_T and (y*(y_0), F*(y*(y_0))) is a minimizing pair of (P_{y_0}). The determination of an optimal feedback function usually rests on the computation of the value function to (P_{y_0}), which is defined as

  V*(T_0, y_0) = min { J_{T_0}(y, u) : ẏ = f(y) + g(y)u, y(T_0) = y_0 },   (6)

where (T_0, y_0) ∈ Ī × R^n, and J_{T_0}(y, u) is defined as

  J_{T_0}(y, u) = (1/2) ∫_{T_0}^T |Q_1(y(t) − y_d(t))|^2 + β |u(t)|^2 dt + (1/2) |Q_2(y(T) − y_d^T)|^2.   (7)

By construction V* satisfies the final time boundary condition

  V*(T, y) = (1/2) |Q_2(y − y_d^T)|^2 for all y ∈ R^n.

If V* is continuously differentiable in a neighborhood of some (t, y_0) ∈ I × R^n, then it solves the instationary Hamilton-Jacobi-Bellman (HJB) equation

  −∂_t V*(t, y) = min_{u ∈ R^m} { (1/2) |Q_1(y − y_d(t))|^2 + (β/2) |u|^2 + ∂_y V*(t, y) · (f(y) + g(y)u) }   (8)

in the classical sense there, see e.g. [1,18]. Here ∂_t V* denotes the partial derivative of the value function with respect to t, and ∂_y V* is the gradient of V* with respect to the y-variable. An optimal control for (P_{y_0}) in feedback form is then given by u* = −(1/β) g(y*)^⊤ ∂_y V*(y*), where ∂_y V*(y*)(t) = ∂_y V*(t, y*(t)) for every t ∈ I, and y* = y*(y_0) ∈ W_T solves the closed loop system

  ẏ = f(y) − (1/β) g(y) g(y)^⊤ ∂_y V*(y), y(0) = y_0.

Thus (y*(y_0), −(1/β) g(y*(y_0))^⊤ ∂_y V*(y*(y_0))) ∈ arg min (P_{y_0}) and the function

  F*(t, y) = −(1/β) g^⊤(y) ∂_y V*(t, y)

is an optimal feedback law.
Realizing the optimal feedback in this way requires a solution to (8), which is a partial differential equation on R^n. This can be extremely challenging or even impossible depending on the dimension n and the computational facilities at hand. Similarly to our previous manuscript [3], we take a different approach by formulating a minimization problem over a suitable set of feedback functions, involving the closed loop system as a constraint. This relates to a learning problem within which the feedback functions are trained to achieve optimal stabilization. This makes the problem computationally amenable.
The procedure just described will be formalized in the following section. Here we first summarize the assumptions on the nonlinear dynamical system that we refer to throughout the paper.

Assumption 1.
(A.1) The functions f : I × R n → R n and g : I × R n → R n×m are twice continuously differentiable.
Their Jacobians and Hessians with respect to the second variable, denoted by D y f , D y y f , and D y g , D y y g , respectively, are Lipschitz continuous on compact sets, uniformly for t ∈ I .
(A.2) The value function V* is twice continuously differentiable on Ī × B_{2M}(0) with Lipschitz continuous gradient, where M > 0 is a fixed constant and ı denotes the embedding of W_T into C(Ī; R^n).
As a consequence of (A.1), the Nemitsky operators f, g are at least twice continuously differentiable with domains and ranges as defined in (5). Their derivatives, denoted by Df(y) ∈ B(W_T, L^2(I; R^n)) and Dg(y) ∈ B(W_T; B(L^2(I; R^m); L^2(I; R^n))), are the Nemitsky operators induced by D_y f and D_y g. Moreover f, Df, g, Dg are Lipschitz continuous and bounded on bounded subsets of L^∞(I; R^n), and thus in particular on Y_ad ⊂ W_T, where Y_ad is defined in (10). Finally, Df^⊤(y) ∈ B(W_T, L^2(I; R^n)) denotes the Nemitsky operator associated to D_y f^⊤.
When referring to Assumption 1 we mean (A.1)-(A.3). We emphasize that the constant M appearing in (A.2) and (A.3) is assumed to be the same. Note further that, as a consequence of (A.3), problem (P_{y_0}) admits a solution for each y_0 ∈ Y_0, with the optimal control given by u* = F*(y*(y_0)).
Here Dg is induced by D_y g, which is given by the third-order tensor with entries ∂_k g_{ij}, where g(y) = (g_{ij}) and "∂_k" denotes the partial derivative with respect to the k-th component of y.
The transpose Dg(y)^⊤, which will arise in the adjoint equation below, is induced by the tensor D_y g(t, ·)^⊤ = (D_y g(t, ·)_{kji}) ∈ R^{n×n×m}, with t ∈ I. In particular, we readily verify that Dg(·)^⊤ ∈ B(L^2(I; R^m); B(W_T; L^2(I; R^n))).
To end this section we collect structural information on the relation between the adjoint state, denoted by p below, the optimal value function V*, and the induced optimal feedback law F*.

Proposition 3. Let Assumption 1 hold. Then there exists a unique continuous mapping p* : Y_0 → W_T such that for each y_0 ∈ Y_0 the tuple (y, p) = (y*(y_0), p*(y_0)) satisfies

  (d/dt) y = f(y) + g(y)F*(y), y(0) = y_0,   (12)

  −(d/dt) p = Df(y)^⊤ p + (Dg(y)F*(y))^⊤ p + Q_1^⊤ Q_1 (y − y_d), p(T) = Q_2^⊤ Q_2 (y(T) − y_d^T),   (13)

  F*(y) = −(1/β) g(y)^⊤ p.   (14)

Moreover we have

  p*(y_0)(t) = ∂_y V*(t, y*(y_0)(t)) for t ∈ Ī.   (15)

Proof of Proposition 3. By (A.3) problem (P_{y_0}) admits a solution for each y_0 ∈ Y_0. Then (A.1)-(A.2) guarantee that (13), with y = y(y_0) ∈ W_T the state component of a solution to (P_{y_0}), admits a unique solution p in W_T which depends continuously on y ∈ W_T. Moreover (12)-(14) represent the first order necessary optimality condition for (P_{y_0}) with the optimal control u(t) = F*(y(t)).
Since y* : Y_0 → W_T is continuous, as mentioned in Remark 2, and the solution to (13) depends continuously on y ∈ W_T, the claimed continuity of p* : Y_0 → W_T follows. Equation (15) is a direct consequence of the dynamic programming principle and (A.3).
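The forward-backward structure of Proposition 3 can be checked numerically in the scalar linear-quadratic case, where ∂_y V*(t, y) = P(t) y with P solving a Riccati equation. The sketch below (a toy setup of our own, with a, q, q_T, β scalars) integrates the Riccati equation and the closed-loop state forward, then the adjoint backward, so that the relation p(t) = P(t) y(t) corresponding to (15) can be verified.

```python
def lq_state_adjoint(q, qT, beta, a, T, y0, N=4000):
    """Scalar LQ sanity check of p(t) = d/dy V*(t, y(t)) = P(t)*y(t):
    P from the backward Riccati ODE, y from the closed loop, p from the
    backward adjoint ODE (explicit Euler throughout)."""
    dt = T / N
    # Riccati backward: -P' = 2*a*P + q - P^2/beta, P(T) = qT
    P = [0.0] * (N + 1); P[N] = qT
    for k in range(N, 0, -1):
        P[k - 1] = P[k] + dt * (2 * a * P[k] + q - P[k] ** 2 / beta)
    # closed-loop state forward: y' = (a - P/beta) * y
    y = [0.0] * (N + 1); y[0] = y0
    for k in range(N):
        y[k + 1] = y[k] + dt * (a - P[k] / beta) * y[k]
    # adjoint backward: -p' = a*p + q*y, p(T) = qT * y(T)
    p = [0.0] * (N + 1); p[N] = qT * y[N]
    for k in range(N, 0, -1):
        p[k - 1] = p[k] + dt * (a * p[k] + q * y[k])
    return P, y, p
```

With a = 0 and q = q_T = β = 1 one has P ≡ 1, y(t) = e^{−t} and p(t) = e^{−t}, so p and P·y agree along the whole trajectory up to the discretization error.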

Examples
In this section we discuss two particular examples for the parametrized mappings V_θ^ε: deep residual networks and piecewise polynomial functions of sufficiently high degree.

Residual networks
To explain the approximation of the value function by residual neural networks, we first fix some notation. Let L_ε ∈ N, L_ε ≥ 2, as well as N_i^ε ∈ N, i = 1, …, L_ε − 1, be given. We set N_0^ε = n + 1 and N_{L_ε}^ε = 1. Furthermore, define the parameter space R_ε; it is uniquely determined by this architecture. A set of parameters θ ∈ R_ε given by θ = (W_{11}, W_{12}, b_1, …, W_{L_ε}) is called a neural network with L_ε layers. Moreover, let σ ∈ C^4(R) be given and assume that σ is not a polynomial. The associated function V_θ^ε is called the realization of θ with activation function σ. Here the application of σ is defined to act componentwise, i.e., given an index i ∈ {1, …, L_ε − 1} and x ∈ R^{N_i^ε}, we set σ(x) = (σ(x_1), …, σ(x_{N_i^ε})). By construction, V_θ^ε satisfies the terminal condition. Moreover, Assumption 4 is fulfilled as confirmed by the following result.
Proof. Let us set h(t, y) = V*(t, y) for (t, y) ∈ Ī × B_{2M}(0). Then h is twice continuously differentiable on Ī × B_{2M}(0) and h(T, y) = (1/2)|Q_2(y − y_d^T)|^2. A consequence of the universal approximation theorem is that for all ε > 0 there exists h_ε ∈ M_net satisfying the required approximation bounds, see [20]. Let us observe that h_ε can be expressed as a residual network. Indeed, the corresponding estimates then carry over to its realization, where all norms are taken over Ī × B_{2M}(0). This ends the proof.
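The realization of a parameter set θ as a residual network can be sketched in a few lines of pure Python. The architecture, the random initialization, and the way the terminal condition is enforced here (multiplying the network output by T − t and adding the terminal cost) are our own illustrative assumptions, not the paper's exact construction.

```python
import math, random

def make_resnet(widths, seed=0):
    """Tiny fully connected network (t, y) -> scalar with tanh activation and
    residual skip connections on layers of equal width.
    widths = [n+1, N_1, ..., 1], matching N_0 = n+1 and N_L = 1 above."""
    rng = random.Random(seed)
    layers = []
    for nin, nout in zip(widths, widths[1:]):
        W = [[rng.uniform(-0.5, 0.5) for _ in range(nin)] for _ in range(nout)]
        b = [rng.uniform(-0.5, 0.5) for _ in range(nout)]
        layers.append((W, b))
    def realize(t, y):
        x = [t] + list(y)
        for i, (W, b) in enumerate(layers):
            z = [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]
            if i < len(layers) - 1:
                z = [math.tanh(v) for v in z]       # smooth, non-polynomial sigma
                if len(z) == len(x):                # residual skip connection
                    z = [zv + xv for zv, xv in zip(z, x)]
            x = z
        return x[0]
    return realize

def V_theta(net, t, y, T, Q2, yTd):
    """Hypothetical parametrization enforcing the terminal condition
    V_theta(T, y) = 0.5*|Q2*(y - yTd)|^2 exactly (scalar Q2 for simplicity)."""
    terminal = 0.5 * sum((Q2 * (yi - di)) ** 2 for yi, di in zip(y, yTd))
    return (T - t) * net(t, y) + terminal
```

At t = T the network contribution vanishes, so the terminal condition holds exactly for every parameter value.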

Piecewise polynomials
Fix ε_0 > 0, and let ε ∈ (0, ε_0] be arbitrarily fixed. Throughout this subsection we assume (A.2), and in particular we shall make use of the global Lipschitz continuity of V* and its derivatives on the relevant compact set. Note that we do not highlight the dependence of (t_i, ȳ_0^i) and K_i on ε. For each i define the parametrized quadratic polynomial V_i^ε with coefficients in R × R^n × Sym(n), where Sym(n) denotes the space of real symmetric n×n matrices. Note that V_i^ε is infinitely differentiable in all of its arguments.
For each ε ∈ (0, ε_0] we define a special partition of unity {ϕ_i}, with μ̄ and m positive constants independent of i, (t, y) ∈ K̄, and ε ∈ (0, ε_0]. Finally we define the parameter set and introduce the family of parametrized functions V_θ^ε on R^{n+1}. Thus the final time condition in the HJB equation is fulfilled. Next we show that V_θ^ε satisfies the approximation property in Assumption 4 for this particular choice.

Proof. We already argued that V_θ^ε has the desired regularity. It remains to prove the required approximation capabilities. For abbreviation set V^ε as above, for some c̄ > 0 depending on the global Lipschitz constant of V on K̄, and independent of ε ∈ (0, ε_0] and i. Recall further the definition of the sets K_i for (t, y) ∈ K̄. From (24) we deduce that ∥V(t, y) − V_θ^ε(t, y)∥_{C(K̄)} ≤ 2c̄ε^3. For the gradient with respect to y we proceed similarly. Fixing (t, y) ∈ K̄ we estimate the two differences D_1 and D_2. By (28) with j = 1 the first terms in D_1 and D_2 can be estimated by c̄ε^2. Using (24) and (28) the second terms in D_1 and D_2 can be bounded by m μ̄ ε^2. Combining these estimates we arrive at a bound of order O(ε^2) for the gradients. In an analogous manner one can obtain a bound of the order O(ε) on the difference of the Hessians of V and V_θ^ε. This finishes the proof of Theorem 6.
□ In Appendix A it is shown how standard mollifiers can be used so that (24) is satisfied. This requires some extra attention due to the required bounds on the derivatives of ϕ i .
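A one-dimensional analogue of this construction can be sketched directly: local Taylor quadratics blended by a partition of unity of hat functions on a grid of spacing ε. For smooth V this reproduces the O(ε³) value accuracy used above; the 1D setting and all names are our own.

```python
import math

def pu_polynomial_approx(V, dV, d2V, eps, x):
    """Partition-of-unity blend of local Taylor quadratics on the grid
    x_i = i*eps:  sum_i phi_i(x) * p_i(x),  with hat functions
    phi_i(x) = max(0, 1 - |x - x_i|/eps), which sum to 1 between nodes."""
    i0 = int(math.floor(x / eps))
    total = 0.0
    for i in (i0, i0 + 1):                         # only two hats are active
        xi = i * eps
        phi = max(0.0, 1.0 - abs(x - xi) / eps)
        # degree-2 Taylor polynomial of V centered at the node x_i
        p = V(xi) + dV(xi) * (x - xi) + 0.5 * d2V(xi) * (x - xi) ** 2
        total += phi * p
    return total
```

Since the hats sum to one, the blended error is bounded by the worst local Taylor remainder, of size (1/6)·max|V‴|·ε³.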

Existence of minimizers to (P ε )
This section is devoted to proving the existence of minimizing triples to (P ε ). Throughout this section c will denote a generic constant independent of ε > 0 and y 0 ∈ Y 0 .

Existence of admissible points
Recall from Assumption 1 and Remark 2 that the optimal ensemble state y* ∈ C(Y_0; W_T) satisfies ∥y*∥_C ≤ M_{Y_0}. Accordingly we define the set of admissible states and admissible controls. We also recall the definition of Y_ad in (10).
To prove the existence of minimizers to (P_ε) we first argue that the admissible set is nonempty for ε small enough. For this purpose consider the family θ_ε ∈ R_ε, 0 < ε ≤ ε_0, from Assumption 4, as well as the associated closed loop system of state and adjoint equations (30), subject to the initial and terminal conditions (31). We first prove the following approximation result.

Theorem 7. Let Assumptions 1 and 4 hold. There exists a constant c such that for all ε > 0 small enough and for all y_0 ∈ Y_0 the system (30)-(31) admits unique solutions y_ε = y_ε(y_0) ∈ Y_ad and p_ε = p_ε(y_0) ∈ W_T. Furthermore, y_ε ∈ C^1(Y_0; W_T), p_ε ∈ C(Y_0; W_T), and F*(y*) ∈ C(Y_0; L^2(I; R^m)) hold, together with the corresponding a priori estimates. In particular, (y_ε, p_ε, θ_ε) ∈ N_ad^ε for all ε > 0 small enough. In order to prove this we require several auxiliary results.

Lemma 8. There exists a constant c such that for all ε small enough the following estimate holds.
Proof. According to the definition of F * and F ε θ ε we split Applying the integral mean value theorem yields Thus we can use Assumption 4 for every s ∈ [0, 1] and δy ∈ W ∞ and estimate Similarly we obtain Last recall that g is Lipschitz continuous and uniformly bounded on Y ad . Combining these facts yields the desired statement.

□
With the same arguments the following a priori estimate can be obtained. For the sake of brevity its proof is omitted.

Corollary 9.
There exists a constant c such that for all ε small enough the corresponding a priori estimate holds. Next we establish existence of a unique solution to (30) as well as a first approximation result.

Proposition 10. Let Assumptions 1 and 4 hold. Then for all ε > 0 small enough, the closed loop state equation in (30) admits a unique solution y_ε(y_0) ∈ Y_ad for every y_0 ∈ Y_0, and y_ε ∈ C^1(Y_0; W_T). In particular we have ∥y_ε − y*∥_C ≤ cε.

Proof. The proof is based on a fixed-point argument. Let y_0 ∈ Y_0 be arbitrary but fixed. Define the set M, and on M consider the mapping Z, where we use Corollary 9 and the definition of M. Hence ∥v∥_{L^2} ≤ cε. Here and below c denotes a generic constant which is independent of y_0 ∈ Y_0 and of all ε > 0 sufficiently small. We may invoke Proposition 30 and Corollary 31 from the Appendix to assert the existence of a unique solution z ∈ Y_ad to (32), provided ε > 0 is chosen small enough. From this we particularly conclude Z(M) ⊂ M for all y_0 ∈ Y_0 and ε > 0 small. It remains to prove that Z is a contraction. To this end let y_1, y_2 ∈ M be given. Applying Corollary 31 yields the first inequality in the contraction estimate, uniformly with respect to y_0 ∈ Y_0 and ε sufficiently small; the last inequality follows from Lemma 8. Choosing ε > 0 small enough we conclude that Z admits a unique fixed point y_ε = Z(y_ε) ∈ W_T on M. Clearly, the function y_ε(y_0) := y_ε satisfies (30) and y_ε ∈ M ⊂ Y_ad, and the estimate of Corollary 9 applies. Finally, according to Proposition 30 the solution y_ε(y_0) is unique and the mapping y_ε is at least of class C^1.

□
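The fixed-point argument above has a simple computational analogue: Picard iteration for the closed-loop integral equation y(t) = y_0 + ∫_0^t rhs(y(s)) ds. The sketch below (scalar state, grid discretization, a toy right-hand side of our choosing) iterates the integral map until successive updates stagnate.

```python
def picard_closed_loop(rhs, y0, T, N=200, iters=25):
    """Picard iteration y_{k+1}(t) = y0 + int_0^t rhs(y_k(s)) ds on a uniform
    grid, the fixed point being the (discretized) closed-loop trajectory.
    Returns the final iterate and the sup-distance of the last two iterates."""
    dt = T / N
    y = [y0] * (N + 1)                  # initial guess: constant trajectory
    diff = float("inf")
    for _ in range(iters):
        vals = [rhs(v) for v in y]
        integral = 0.0
        y_new = [y0]
        for k in range(N):
            integral += dt * vals[k]    # left rectangle rule for the integral
            y_new.append(y0 + integral)
        diff = max(abs(a - b) for a, b in zip(y, y_new))
        y = y_new
    return y, diff
```

For the closed-loop right-hand side rhs(y) = −2y from y_0 = 1 on [0, 1], the iterates converge rapidly to the Euler solution of ẏ = −2y, even though the integral map is only an eventual (iterated) contraction.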
Next we estimate the W 1,2 difference between y ε and y * .
Proposition 11. The mapping y_ε ∈ C^1(Y_0; W_T) from Proposition 10 satisfies ∥y_ε − y*∥_{C^1(Y_0;W_T)} ≤ cε.

Proof. By the previous proposition the estimate is already known with C^1(Y_0; W_T) replaced by C(Y_0; W_T). Now fix y_0 ∈ Y_0 and i ∈ {1, …, n}. By the inverse mapping theorem the partial derivatives of y* and y_ε at y_0 are given in terms of the linear continuous inverses of

  T*(y_0)δy = δ̇y − Df(y*(y_0)) δy − [Dg(y*(y_0)) δy] F*(y*(y_0)) − g(y*(y_0)) DF*(y*(y_0)) δy.

Here, e_i denotes the i-th canonical basis vector in R^n. Using Gronwall's inequality, we readily verify that these inverses are bounded uniformly for all δv ∈ L^2(I, R^n), δy_0 ∈ R^n, y_0 ∈ Y_0, and some C > 0 independent of y_0, δv, δy_0. Now we recall that y_ε(y_0), y*(y_0) ∈ Y_ad and that Df, Dg, g are Lipschitz continuous, and thus in particular bounded, on Y_ad, see Assumption (A.1). Together with the boundedness of {∥F*(y(y_0))∥_{L^2} : y_0 ∈ Y_0}, Corollary 9 and Proposition 10, we conclude the desired bound, where C > 0 is the constant from (33). Since all involved constants are independent of y_0 ∈ Y_0 we obtain the desired estimate ∥∂_i y_ε − ∂_i y*∥_C ≤ cε.

□
Next we address the solvability of the adjoint equation (20).

Proposition 12. There exists a constant c such that for all ε small enough there exists p_ε ∈ C(Y_0; W_T) satisfying (31) with ∥p_ε − p*∥_C ≤ cε.
Proof. Given y ∈ Y_ad, consider the linear ordinary differential equation defining the adjoint state. It admits a unique solution p = P(y) ∈ W_T which is bounded independently of y ∈ Y_ad. Moreover, the mapping P : W_T → W_T is continuous on Y_ad by virtue of the Gronwall lemma and Assumption 1. The existence of a mapping p_ε which satisfies (31) then follows by setting p_ε = P ∘ y_ε.
It remains to prove the estimate for the difference between p_ε satisfying (31) and p* satisfying (13). For this purpose we can use the same technique as in the proof of Proposition 11 and therefore we only give the main estimates. Recall that Df(·)^⊤ and Dg(·)^⊤ are Lipschitz continuous on Y_ad. The most involved term in the estimate analogous to (34) is then handled by a perturbation argument as in the proof of Proposition 11, where the Lipschitz continuity is used in the second inequality, and Proposition 11 and Corollary 9 are utilized in the final one. Since all involved constants are again independent of y_0 ∈ Y_0, this finishes the proof.
□ Summarizing all previous observations we arrive at the proof of Theorem 7.
Proof of Theorem 7. This follows directly by combining Proposition 10, Proposition 11, and Proposition 12.
As a last prerequisite for proving existence to (P ε ) we argue that the admissible set N ε ad is closed. The existence of at least one minimizing triple to (P ε ) then follows by variational arguments. From here on we always assume that N ε ad from (29) is nonempty, i.e. that ε is sufficiently small.
The proof builds upon the following two lemmas.
Proof. By assumption we have y_k ∈ y_ad, and hence ∥y_k(y_0)∥_{W_T} ≤ 2M_{Y_0} for all k ∈ N and y_0 ∈ Y_0, and y_k ∈ C^1(Y_0; W_T) for all k ∈ N, see Proposition 10. Let us fix an arbitrary y_0 ∈ Y_0 and set y_k := y_k(y_0) for abbreviation. Then there exists a subsequence, denoted by the same index, and y ∈ W_T such that y_k ⇀ y weakly in W_T. Since W_T embeds compactly into C(Ī; R^n) and continuously into L^p(I; R^n), 1 ≤ p ≤ +∞, we immediately obtain strong convergence in these spaces. Moreover, by Assumption 4, for every δ > 0 there exists K_δ ∈ N such that the corresponding approximation estimate holds for all k ≥ K_δ. Here M denotes the constant from (A.2). For all such k we get, utilizing (36), for a constant c independent of k, that the limit satisfies the closed loop equation (37). Since the solution to this equation is unique, every weak accumulation point of y_k satisfies (37) and we have y_k(y_0) → y in W_T for the whole sequence. We repeat this construction for all y_0 ∈ Y_0. This defines a function ȳ : Y_0 → W_T such that y_k(y_0) → ȳ(y_0) in W_T and such that (37) is satisfied with y = ȳ(y_0) for each y_0 ∈ Y_0. By Proposition 10 it is the unique solution to (35). Lebesgue's dominated convergence theorem for Bochner integrals [21, p. 45] implies that y_k → ȳ in L^1(Y_0; W_T), and by boundedness of {∥y_k∥_C}_{k=1}^∞ also in L^2(Y_0; W_T). By assumption y_k converges weakly in L^2(Y_0; W_T), thus the weak limit coincides with ȳ. Moreover ∥ȳ∥_C ≤ 2M_{Y_0} and hence ȳ ∈ y_ad. □ Next we consider the behavior of the adjoint states p_k.
Proof. From Lemma 14 recall that for the sequences y_k := y_k(y_0) ∈ Y_ad and y := y(y_0) we have convergence in W_T for each y_0 ∈ Y_0. Further, for each k ∈ N and y_0 ∈ Y_0, the element p_k := p_k(y_0) ∈ W_T satisfies the adjoint equation (39). Recall from Assumption 4 that ∂_y V_·^ε is uniformly continuous on compact sets. Thus for every δ > 0 a corresponding uniform bound holds. Applying Proposition 29 to the time-reversed equation (39) implies a bound on ∥p_k∥_{W_T} for some c > 0 independent of y_0 ∈ Y_0 and all sufficiently large k. Since ∥y_k∥_C ≤ 2M_{Y_0} we finally conclude ∥p_k∥_C ≤ C for some C > 0 independent of all sufficiently large k. We are now prepared to pass to the limit in (39). For this purpose we proceed as in the proof of Lemma 14 to show that every weak accumulation point p ∈ W_T of p_k is in fact a strong accumulation point and satisfies the differential equation in (38). Since the solution to this equation is unique, we get p_k → p in W_T for the whole sequence. Finally, utilizing ∥p_k∥_C ≤ C and Lebesgue's dominated convergence theorem, we conclude p = p(y_0) for all y_0 ∈ Y_0.

Existence of minimizers
Finally we prove the existence of at least one minimizing triplet to (P ε ).

Theorem 16. Let Assumptions 1 and 4 hold. Then for all ε > 0 small enough, problem (P ε) admits at least one minimizing triple.
Proof. According to Theorem 7, the admissible set N_ad^ε is nonempty for ε > 0 small enough. Fix such an ε > 0 and let (y_k, p_k, θ_k) ∈ N_ad^ε denote a minimizing sequence for J^ε, i.e. lim_{k→∞} J^ε(y_k, p_k, θ_k) = inf_{(y,p,θ) ∈ N_ad^ε} J^ε(y, p, θ).
Since y_k ∈ y_ad, the minimizing sequence is bounded. Thus it admits at least one subsequence, denoted by the same index, which converges weakly. As in the proof of Lemma 15 we verify that ∥y_k∥_C ≤ C and ∥p_k∥_C ≤ C for some C > 0 independent of k ∈ N. Consequently, by possibly taking another subsequence, we arrive at a weak limit triple. For the following estimates it will be convenient to recall the augmented functional J^ε, see (18), which arises in the running cost of (P ε) in compact form, where J_t was defined below (7). Now fix an arbitrary y_0 ∈ Y_0 and set y_k := y_k(y_0), p_k := p_k(y_0), y* := y*_ε(y_0), p := p*_ε(y_0).

From Lemma 14 and Lemma 15 we get
and, again, using the uniform continuity of V^ε as well as the uniform boundedness of V^ε_{θ_k}(y_k) and ∂_y V^ε_{θ_k}(y_k) in C(Y_0; L^2(I)) and C(Y_0; L^2(I; R^n)), respectively. Moreover we readily verify the corresponding pointwise bounds, for some c > 0 independent of y_0 ∈ Y_0, t ∈ (0, T), and k ∈ N. Summarizing the previous findings, using these expressions in J^ε as given in (40), and invoking the boundedness of ∥y_k∥_{L^2}, |y_k(0)|, ∥p_k∥_{L^2}, ∥F^ε_{θ_k}(y_k)∥_{L^2}, ∥V^ε_{θ_k}(y_k)∥_{L^2} independently of k ∈ N and y_0 ∈ Y_0, we finally pass to the limit in the cost functional by Lebesgue's dominated convergence theorem.

Convergence towards optimal controls
In Propositions 10 and 12 it was established that the ensemble triple (y*, F*(y*), p*) can be approximated by ensemble triples (y_ε, F^ε_{θ_ε}(y_ε), p_ε) at order O(ε). In this section, the convergence of solutions to (P_ε) as ε → 0 is addressed. We first consider the terms in the definition of J^ε, see (18). To obtain the desired asymptotic behavior, a smallness condition on the regularisation parameter γ_ε, in relation to the norm of the parameters θ_ε describing the approximation quality, is required.

Theorem 17. Let Assumptions 1 and 4 hold, the latter with θ_ε ∈ R_ε, and let (y*_ε, p*_ε, θ*_ε) denote an optimal triple to (P_ε) for all ε > 0 small enough. Assume additionally that γ_ε ∥θ_ε∥^2 → 0 as ε → 0.

Proof. Let y_ε, p_ε denote the ensembles of state and adjoint trajectories associated to θ_ε, see Theorem 7, for ε > 0 small enough. Then the first estimate holds for some C > 0 independent of ε. Here we have used V*(0, y_0) = J(y*(y_0), F*(y*(y_0))) for all y_0 ∈ Y_0, the embedding W_T ↪ C(Ī; R^n), as well as the a priori estimates of Proposition 10. Next we utilize p*(y_0) = ∂_y V*(y*(y_0)), y_0 ∈ Y_0, to estimate the adjoint-related terms, where the last inequality is deduced from Propositions 10 and 12. Proceeding analogously and using V*(t, y*(y_0)(t)) = J_t(y*(y_0), F*(y*(y_0))) for all y_0 ∈ Y_0, t ∈ I, we obtain the remaining bound, using Assumption 4 and again Proposition 10. Combining the previous estimates with the optimality of (y*_ε, p*_ε, θ*_ε) and the assumption on the asymptotic behavior of γ_ε, we deduce the claimed asymptotics. Recalling the definition of J^ε, this yields all claimed estimates and finishes the proof.

□
Next the convergence of the ensemble trajectories (y*_ε, p*_ε), the feedback controls F^ε_{θ*_ε}(y*_ε), as well as the approximate value function V^ε_{θ*_ε} are analyzed. For this purpose we make use of the additional regularity of ensemble solutions to the closed loop system, see Proposition 11, and introduce further constraints to (P_ε). Without changing the notation we henceforth restrict the set of admissible states, where M_{W^{1,2}} > 0 is a constant with ∥y*∥_{W^{1,2}} ≤ M_{W^{1,2}}, the function y* was introduced in (A.3), and W^{1,2} = {y ∈ L^2(Y_0; W_T) : ∂_i y ∈ L^2(Y_0; W_T), i ∈ {1, …, n}}, endowed with the natural norm. Next we note that (β/2) ∥F*(y*(y_0))∥^2_{L^2} ≤ J(y*(y_0), F*(y*(y_0))) = V*(0, y_0) for all y_0 ∈ Y_0. Thus, due to the continuity of the value function V*, see (A.2), there is M_U > 0 with ∥F*(y*)∥_{L^∞} ≤ M_U. Correspondingly we restrict the set of admissible controls. We point out that Theorem 16 remains valid despite the additional restriction of the set of admissible states and controls. Problem (P_ε) with Y_ad, U_ad replaced by these restricted sets will be denoted by ( P ε ).

Proposition 18. Let Assumptions 1 and 4 hold.
Then for all ε > 0 small enough, problem ( P ε ) admits at least one minimizing triple.
Hence the admissible set of ( P ε ) is not empty. The existence of a minimizing triple then follows by repeating the arguments of the proof of Theorem 16, noting that the admissible set {(y, p, θ) ∈ Y_ad × C(Y_0; W_T) × R_ε : (y, p, θ) satisfies (19)-(21), F^ε_θ(y) ∈ U_ad} is closed with respect to the weak topology on L^2(Y_0; W_T)^2 × R_ε.

□
Let us next address the convergence of the optimal ensemble states y*_ε, the adjoint states p*_ε, and the associated feedback controls F^ε_{θ*_ε}(y*_ε) as ε tends to 0.
It remains to address the strong convergence of p_k. For this purpose we show convergence of the right-hand sides of the corresponding adjoint equations. Fixing a test function ϕ ∈ L^2(Y_0; L^2(I; R^n)), we first note the weak convergence of the associated terms. Second, for L-a.e. y_0 ∈ Y_0 we estimate the remaining differences, for some C > 0 independent of k ∈ N and y_0. Here we made use of the boundedness of {y*_k}. Integrating both sides of the inequality with respect to L, utilizing the strong convergence of y*_k and F^{ε_k}_{θ*_k}(y*_k), and repeating this argument for the different terms appearing in the adjoint equation, we conclude that (ȳ, p̄, ū) := (ȳ(y_0), p̄(y_0), ū(y_0)) satisfies the limiting adjoint equation for L-a.e. y_0 ∈ Y_0. Applying Gronwall's inequality we deduce the corresponding estimate for L-a.e. y_0 ∈ Y_0 and C > 0 independent of y_0 and k. This yields p_k → p̄ strongly in L^2(Y_0; W_T). Since the weakly convergent subsequence was chosen arbitrarily in the beginning, this finishes the proof.
□
Remark 20. If g(y(t)) = B ∈ R^{n×m}, then the statement of the previous theorem also holds without constraints on the control (i.e. for U_ad = L²(Y_0; L²(I; R^m))). In this particular case, the uniform boundedness of F see Theorem 17. Moreover, the adjoint equation no longer depends on the control. Repeating the arguments of the last proof yields the subsequential convergence of (y such that (ȳ, p̄, ū) := (ȳ(y_0), p̄(y_0), ū(y_0)) satisfies the system of state and adjoint equations as well as (ȳ, ū) ∈ arg min (P_{y_0}) for L-a.e. y_0 ∈ Y_0. Then it only remains to argue the additional regularity ū ∈ L∞(Y_0; L²(I; R^m)). This is, however, a direct consequence of the first order necessary optimality condition ū = (−1/β) B^⊤ p̄ for (P_{y_0}), see Proposition 3.
We point out that the statement of Theorem 19 holds independently of the values of the penalty parameters γ_1, γ_2. If γ_1, γ_2 > 0, then we additionally obtain the following convergence results for the approximate value function V^ε_{θ*_k} and its derivative ∂_y V^ε_{θ*_k} along optimal state trajectories.

Proposition 21. Let the prerequisites of Theorem 17 hold and let
denote a sequence of minimizing triples as described in Theorem 19. Then we also have Proof. Due to the convergence of y*_k → ȳ in L²(Y_0; L²(I; R^n)) and F^{ε_k}_{θ*_k}(y*_k) → ū in L²(Y_0; L²(I; R^m)), we conclude that Together with follows similarly from the strong convergence of p_k.

Learning from a finite training set
We turn to analysing a discrete version of (P_ε). In this case we can proceed without the state-space constraint y ∈ Y_ad provided certain growth bounds on f and g are satisfied. The numerical realization of (P_ε) will always rely on such a discrete approximation. Henceforth we fix a finite ensemble of initial conditions {y^i_0 : i = 1, …, N} ⊂ Y_0. For positive weights ω_i, i = 1, …, N, and ε > 0 we consider subject to Throughout this section, Assumptions 1 and 4 are supposed to hold. Further, ε is supposed to be sufficiently small so that the set of admissible solutions for (P^N_ε) is nonempty, compare Theorem 7. It will be convenient to introduce y = col(y_1, …, y_N) and p = col(p_1, …, p_N), which replace the ensemble states and costates from the previous sections.
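The discrete training objective described above, i.e. a weighted average of closed-loop objective functional values over the finite ensemble of initial conditions, can be sketched as follows. All names (the solver, the explicit Euler discretization, the toy dynamics in the usage example) are illustrative choices of this sketch, not the paper's implementation.

```python
import numpy as np

def ensemble_objective(feedback, y0_batch, weights, f, g, Q1, yd, beta, T, steps=200):
    """Weighted average of the running cost along closed-loop trajectories
    for a finite ensemble of initial conditions (illustrative discretization:
    explicit Euler in time, left-endpoint quadrature for the cost)."""
    dt = T / steps
    total = 0.0
    for w, y0 in zip(weights, y0_batch):
        y = np.array(y0, dtype=float)
        J = 0.0
        for k in range(steps):
            t = k * dt
            u = feedback(t, y)                      # parametrized feedback law
            J += 0.5 * dt * (np.sum((Q1 @ (y - yd(t))) ** 2) + beta * np.sum(u ** 2))
            y = y + dt * (f(y) + g(y) @ u)          # explicit Euler step
        total += w * J
    return total
```

With uniform weights ω_i = 1/N this is exactly the averaged objective of the finite training set; minimizing it over the feedback parameters yields the learning problem in reduced form.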

Proposition 22.
Let ε > 0 be sufficiently small and let (y_k, p_k, θ_k) ∈ W_T^{2N} × R_ε denote an infimizing sequence for (P^N_ε). If max_i ∥y^k_i∥_{L∞(I;R^n)} ≤ M_∞ for some M_∞ > 0 independent of k ∈ N, then Problem (P^N_ε) admits at least one minimizer (y*, p*, θ*).
Proof. Since by assumption (y_k, p_k, θ_k) is an infimizing sequence for (P^N_ε) and since β > 0, we have for some C_N > 0 depending on N. Moreover, there holds (1 + C_N) using the uniform L∞- and L²-boundedness of y^k_i and F^ε_θ(y^k_i), respectively. Thus we also have ∥y^k_i∥_{W_T} ≤ C_N for all k ∈ N, for some C_N > 0 which depends on N but not on k and i. The proof can now be completed by the same steps as in the proof of Theorem 16.
□
Remark 23. The L∞-boundedness of the minimizing sequence y^k_i in Proposition 22 can be ensured by additional assumptions on the dynamics of the problem. These include:
• Add an additional state constraint ∥y_i∥_{L∞} ≤ M to (P^N_ε).
• Assume that there are a_1, a_2, a_3 > 0 such that |f(x)| ≤ a_1 + a_2 |x| + a_3 |x|², ∥g(x)∥ ≤ a_1 + a_2 |x| for all x ∈ R^n, and that Q_1 is positive definite. Then by (44) the family {y^k_i} is bounded in L²(I; R^n), uniformly w.r.t. i ∈ {1, …, N} and k ∈ N. Further, we can readily verify that Here we made use of the L²-boundedness of y^k_i and F^ε_{θ_k}(y^k_i), which follows from (44) in the proof of Proposition 22, and the assumption that Q_1 > 0. Consequently, y^k_i is uniformly bounded in W^{1,1}(I; R^n) and thus also in L∞(I; R^n).
• Assume that f(x) = Ax − h(x), where A ∈ R^{n×n} and h is monotone, i.e. (x, h(x))_{R^n} ≥ 0 for all x ∈ R^n. Moreover, assume that Q_1 is positive definite and that In this case, testing the equation satisfied by y_i with y_i and a Gronwall argument yields
The convergence result as ε → 0+ of Theorem 19 can be transferred to the finite training set setting as well.
Proposition 24. Let the regularisation parameters satisfy γ_ε ∥θ_ε∥²_{R_ε} = O(ε). Further, let ε_k > 0 be a positive null sequence such that for each k ∈ N there exists a solution (y_k, p_k, θ_k) as well as ẏ Proof. For every ε_k, with k sufficiently large, denote by θ_{ε_k} ∈ R_{ε_k} the corresponding parameters from Assumption 4, by y_{ε_k} the associated ensemble solution, see Theorem 7, and by p_{ε_k} the adjoint states. For abbreviation we set y^i_{ε_k} := y_{ε_k}(y^i_0) and p^i_{ε_k} := p_{ε_k}(y^i_0). Then, by optimality, we have As in the proof of Theorem 17 we see that the right-hand side of this inequality converges to Σ_{i=1}^N ω_i V*(0, y^i_0) as k → +∞. Thus it is bounded independently of k ∈ N. Similarly to Proposition 22 we then conclude the existence of C_N > 0, depending on N but not on k, such that Utilizing the state equation, this can be improved to a k-independent bound on the W_T-norm of y^i_k. By a Gronwall-type argument the same can be shown for the adjoint states p^i_k. Now fix an arbitrary index i ∈ {1, …, N}. Summarizing the previous observations, we get the uniform boundedness of (y^i_k, p^i_k, F^ε_{θ_k}(y^i_k)) in W_T² × L²(I; R^m) w.r.t. k, for each i = 1, …, N. Each of its weak accumulation points (ȳ_i, p̄_i, ū_i) ∈ W_T² × L²(I; R^m) satisfies (d/dt) ȳ_i = f(ȳ_i) + g(ȳ_i) ū_i, ȳ_i(0) = y^i_0.
From this we conclude that Since the second and third of the above inequalities also hold for each summand, we conclude The proof can now be finished with minor adaptations of the proof of Theorem 19.

□
A result analogous to that of Proposition 21 can also be obtained for Problem (P N ε ). For the sake of brevity we do not present the details.

The reduced objective functional
In order to compute a solution to (P N ε ) we will rely on gradient-based optimization methods. For this purpose we introduce a reduced objective functional by eliminating the state and adjoint equations in (P N ε ). Subsequently, we characterize the derivative of the reduced functional by means of adjoint techniques. To simplify the presentation we fix an arbitrary index i ∈ {1, . . . , N } in the following. Moreover, for abbreviation, we define the mapping Using this notation, the adjoint equation in (P N ε ) can be expressed compactly as First, we argue the existence of parameter-to-state operators for the adjoint and the state equation.
Given y_i := Y_i(θ) and p_i := P_i(θ), the Fréchet derivatives of Y_i and P_i at θ ∈ N_i(θ̄), in direction δθ ∈ R_ε, denoted by Proof. This is a direct consequence of the implicit function theorem applied to G, noting that the directional derivatives satisfy □ Now consider an admissible point (ȳ, p̄, θ̄) ∈ W_T^{2N} × R_ε for (P^N_ε). For every i = 1, …, N, let N_i(θ̄) and Y_i, P_i denote the corresponding neighbourhoods and operators from Lemma 25.
and set Summarizing the previous observations, we arrive at the claimed characterization.
□
Applying a gradient method to (P^N_ε) requires the computation of the gradient ∇J_N(θ) ∈ R_ε which satisfies Proof. For the sake of readability, we drop the subscript i in the following. By integration by parts and Lemma 25 we obtain Adding both equations finally yields □ We arrive at the following characterization of the gradient ∇J_N(θ).
Theorem 28. Let y_i, p_i, ζ_i, κ_i ∈ W_T and θ ∈ R_ε be defined as in Proposition 26 and Lemma 27. The gradient of J_N at θ is given by
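The adjoint-based gradient characterization can be sanity-checked numerically on a toy problem. The sketch below is not the paper's system: it uses scalar dynamics y' = −y + u with a one-parameter linear feedback u = θy, an explicit Euler discretization, and the exact discrete adjoint of that scheme; the gradient is then compared against a finite difference.

```python
import numpy as np

def reduced_cost_and_grad(theta, y0=1.0, beta=0.1, T=1.0, steps=100):
    """Reduced objective J(theta) for the toy closed-loop system
    y' = -y + theta*y, cost (1/2)∫ y^2 + beta*u^2 dt, discretized by
    explicit Euler; the gradient is computed by the discrete adjoint
    (backward sweep). Illustrative only."""
    dt = T / steps
    ys = np.empty(steps + 1)
    ys[0] = y0
    J = 0.0
    for k in range(steps):                       # forward sweep: state + cost
        y = ys[k]
        J += 0.5 * dt * (y ** 2 + beta * (theta * y) ** 2)
        ys[k + 1] = y * (1.0 + dt * (theta - 1.0))
    grad, p = 0.0, 0.0                           # p plays the role of the costate
    for k in reversed(range(steps)):             # backward (adjoint) sweep
        y = ys[k]
        grad += dt * beta * theta * y ** 2 + p * dt * y
        p = dt * y * (1.0 + beta * theta ** 2) + p * (1.0 + dt * (theta - 1.0))
    return J, grad
```

Because the backward sweep is the exact adjoint of the discrete forward map, the returned gradient matches a central finite difference of the reduced cost up to floating-point accuracy, which is the standard consistency check before plugging such gradients into a descent method.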
In order to fit this problem into the setting of the current manuscript, let {λ_i, φ_i} ⊂ R_+ × L²(Ω) denote the first n ∈ N normalized eigenpairs of the Dirichlet Laplacian on Ω. Approximating the state dynamics Y as well as the desired state by we end up with a problem of the form min over Y ∈ L²(I; R^{10}), u ∈ L²(I; R³) for i = 1, …, n, where the symmetric matrices A, M_i ∈ R^{n×n} are given by Note that by choosing a spectral technique to approximate the infinite-dimensional system (47) by a finite-dimensional one, we account for the fact that grid-based techniques would rapidly lead to systems whose dimension makes a direct treatment challenging or even impossible. Special techniques, such as tensor-train methods, would then become essential.
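The spectral reduction above can be sketched concretely. The excerpt does not specify the domain Ω, so this illustration assumes the unit interval (0, 1), where the Dirichlet Laplacian eigenpairs are known in closed form: λ_i = (iπ)² and φ_i(x) = √2 sin(iπx); the grid resolution and the helper names are likewise assumptions of the sketch.

```python
import numpy as np

def trap(fvals, x):
    """Trapezoidal rule, written out explicitly."""
    return float(np.dot(0.5 * (fvals[:-1] + fvals[1:]), np.diff(x)))

def dirichlet_spectral_basis(n, grid_pts=2001):
    """First n normalized eigenpairs of the Dirichlet Laplacian on (0, 1):
    lambda_i = (i*pi)^2, phi_i(x) = sqrt(2)*sin(i*pi*x)."""
    x = np.linspace(0.0, 1.0, grid_pts)
    lam = np.array([(i * np.pi) ** 2 for i in range(1, n + 1)])
    phi = np.array([np.sqrt(2.0) * np.sin(i * np.pi * x) for i in range(1, n + 1)])
    return x, lam, phi

def galerkin_coeffs(fvals, x, phi):
    """Spectral Galerkin coefficients c_i = (f, phi_i)_{L^2}."""
    return np.array([trap(fvals * p, x) for p in phi])
```

Projecting the dynamics and the desired state onto span{φ_1, …, φ_n} in this way yields the finite-dimensional system with coefficient matrices built from the λ_i and the basis functions.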

Learning & validation setup
In the following, we determine an approximate optimal feedback law for (48) by applying the learning approach detailed in Section 4. The parametrized model V^ε_θ for the value function is given by realizations of residual networks, as described in Section 5.1, with L_ε = 2 layers, arch(θ) = (11, 60, 1) and the activation function σ given by This yields a total of 1440 trainable parameters. We emphasize that the architecture as well as the activation function were chosen based on numerical testing. In particular, the present tests should not be mistaken for a quantitative survey but rather be seen as a proof of concept which highlights the potential of learned feedback laws for optimal control and puts a focus on the role played by the penalty parameters γ_1 and γ_2.
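A value-function model of this general shape can be sketched as follows. Note the caveats: the paper's activation formula is not shown in this excerpt (a softplus is used as a stand-in), and the residual block below is a generic choice whose parameter count does not reproduce the stated 1440; only the input dimension 11 (time plus 10 spectral states) and the scalar output follow the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_params(sizes=(11, 60, 1)):
    """Random parameters for a small value-function surrogate V_theta(t, y):
    one dense layer plus one residual (skip-connected) hidden block."""
    W1 = rng.normal(0, 0.1, (sizes[1], sizes[0]))
    b1 = np.zeros(sizes[1])
    W2 = rng.normal(0, 0.1, (sizes[1], sizes[1]))   # residual block weights
    b2 = np.zeros(sizes[1])
    w_out = rng.normal(0, 0.1, sizes[1])
    return W1, b1, W2, b2, w_out, 0.0

def sigma(z):
    # numerically stable softplus; stand-in for the paper's smooth activation
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(z, 0.0)

def value_model(params, t, y):
    """Evaluate V_theta at (t, y) with y in R^10 (11 network inputs)."""
    W1, b1, W2, b2, w_out, b_out = params
    x = np.concatenate(([t], y))        # time is treated as an extra input
    h = sigma(W1 @ x + b1)
    h = h + sigma(W2 @ h + b2)          # residual connection
    return w_out @ h + b_out
```

The smoothness of σ matters here: the feedback law is built from ∂_y V^ε_θ, so the model must be continuously differentiable in its state inputs.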
Given a fixed reference vector Ȳ_0, we randomly generate a set y_0 of 130 initial conditions by sampling uniformly from the closure of B_1(Ȳ_0). Subsequently, these are split into a training set y^t_0 of N = 30 initial conditions, which is used in the learning problem (P^N_ε) together with uniform weights ω_i = 1/N, and a validation set y^v_0 = y_0 \ y^t_0 which we later utilize to assess the performance of the obtained feedback law.
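This sampling and train/validation split can be sketched as below. The reference vector is a placeholder (the actual Ȳ_0 is not given at this point), and the normal-direction/scaled-radius construction is one standard way to sample uniformly from a ball.

```python
import numpy as np

def sample_ball(center, num, rng):
    """Sample `num` points uniformly from the closed unit ball around `center`:
    uniform directions (normalized Gaussians) times radii u^(1/d)."""
    d = center.size
    dirs = rng.normal(size=(num, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    radii = rng.uniform(size=num) ** (1.0 / d)      # uniform-in-volume radii
    return center + radii[:, None] * dirs

rng = np.random.default_rng(42)
center = np.zeros(10)                                # placeholder for Ybar_0
pool = sample_ball(center, 130, rng)                 # the set y_0
perm = rng.permutation(130)
train, val = pool[perm[:30]], pool[perm[30:]]        # y_0^t (N = 30) and y_0^v
```

The exponent 1/d on the radii is what makes the samples uniform in volume rather than clustered near the center.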
In order to obtain a candidate for the optimal network parameters θ*_ε, a Barzilai-Borwein method [23] is applied to the learning problem (P^N_ε), based on the reduced objective functional introduced in (46) as well as the characterization of its gradient in Theorem 28. For every Y_0 ∈ y^t_0, this approach entails the computation of the state Y := Y_θ(Y_0) and the adjoint state P := P_θ(Y_0), which satisfy as well as the costates K := K_θ(Y_0) and Z := Z_θ(Y_0) with equipped with the boundary conditions where Y, Y_T and P are defined in analogy to Proposition 26. Note that this system is not fully coupled, i.e. in practice, we first solve the nonlinear closed-loop equation using a Radau time-stepping scheme and then successively treat the adjoint and costate equations by an implicit Euler method. This can be done in parallel for the various initial conditions to achieve additional speed-up. Moreover, the adjoint state P and the costate K only need to be computed if γ_2 > 0. The gradient of the reduced objective functional J_N in (P^N_ε) at an admissible θ is then obtained as where the integration has to be understood componentwise and θ(Y_0) is as in Proposition 26. Once the network is determined, we compute the state Y_θ(Y_0) and the adjoint P_θ(Y_0) for every Y_0 ∈ y_0 from (49) and set U_θ(Y_0) := F^ε_θ(Y_θ(Y_0)). Subsequently, we determine a stationary point (Ȳ(Y_0), Ū(Y_0)) of (48), Y_0 ∈ y_0, by applying a Barzilai-Borwein gradient method to its control-reduced formulation. The associated adjoint state is denoted by P̄(Y_0). At this point, it should be stressed that both the open-loop and the feedback learning problems are nonconvex. As a consequence, we cannot ensure global optimality of the computed stationary points and, in particular, both methods might provide different results. For the present example, open-loop and learned feedback controls are comparable.
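The Barzilai-Borwein iteration used above is a plain gradient method whose step size is chosen from the last two iterates. A minimal sketch of the first BB step-size rule, α_k = (sᵀs)/(sᵀΔg), is given below; the safeguard and iteration count are choices of this sketch, not the paper's.

```python
import numpy as np

def barzilai_borwein(grad, theta0, iters=200, alpha0=1e-2):
    """Gradient descent with the (first) Barzilai-Borwein step size.
    `grad` maps parameters to the gradient of the reduced objective,
    e.g. theta -> grad J_N(theta) computed via the adjoint system."""
    theta = np.asarray(theta0, dtype=float)
    g = grad(theta)
    alpha = alpha0
    for _ in range(iters):
        theta_new = theta - alpha * g
        g_new = grad(theta_new)
        s, dg = theta_new - theta, g_new - g
        denom = s @ dg
        # BB1 step; fall back to alpha0 if the curvature estimate degenerates
        alpha = (s @ s) / denom if abs(denom) > 1e-14 else alpha0
        theta, g = theta_new, g_new
    return theta
```

The appeal of the method in this setting is that each iteration needs exactly one gradient evaluation, i.e. one sweep of state/adjoint/costate solves per ensemble member, and no line search.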
Moreover, for every Y_0 ∈ y_0, we have J(Ȳ(Y_0), Ū(Y_0)) ≥ J(Y_θ(Y_0), U_θ(Y_0)). In order to assess the performance of the open-loop and feedback controls, let Y^ad_0 ⊂ y_0 be either Y^ad_0 = y^t_0 or Y^ad_0 = y^v_0 and consider the relative difference between the averaged objective functional values: as well as the associated normalized mean squared error of J(Y_θ(·), U_θ(·)): The normalized mean squared errors of the state, Err_Y, of the adjoint, Err_P, and of the control, Err_U, are defined analogously. Moreover, to quantify the influence of the penalty parameters γ_1 and γ_2, we define as well as For Y^ad_0 = y^t_0, these terms correspond to the relative sizes of the additional penalty terms in (P^N_ε). Finally, we also want to compare V^ε_θ with the optimal value function V*. Of course, V* can neither be given analytically nor computed exactly. As a remedy, we recall that if V* is sufficiently regular and (Ȳ(Y_0), Ū(Y_0)) is a minimizing pair of (48) with adjoint state P̄(Y_0), then we have for all t ∈ I. As a consequence, setting as well as provides a suitable "distance" for the comparison of V* and V^ε_θ.
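The normalized mean squared errors used for the comparison can be sketched generically: the squared deviation between an ensemble of approximate quantities and their reference counterparts, normalized by the size of the reference. The exact normalization in the paper's Err_* terms is not reproduced here; this is one common convention.

```python
import numpy as np

def normalized_mse(approx, reference, weights=None):
    """Weighted squared error between ensemble quantities (e.g. feedback
    vs. open-loop objective values over the validation set), normalized
    by the weighted squared size of the reference values."""
    a = np.asarray(approx, dtype=float)
    r = np.asarray(reference, dtype=float)
    w = np.ones(len(r)) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * (a - r) ** 2) / np.sum(w * r ** 2))
```

Identical ensembles give an error of exactly zero, and the normalization makes the figures comparable across quantities of different scales (objective values, states, adjoints, controls).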

Validation results
As a concrete example, we set T = 2, β = 0.01, α = 0.25 and Y_d(t, x) = x²/10, i.e., we try to steer the system towards a parabola. Note that there is no control input u ∈ L²(I; R³) such that the corresponding solution Y of the PDE (47) satisfies Y(t) = Y_d. The parabolic bilinear control problem is approximated using n = 10 eigenfunctions. All computations were carried out in Matlab 2019 on a notebook with 32 GB RAM and an Intel® Core™ i7-10870H CPU @ 2.20 GHz. In order to compute an approximately optimal feedback law for this problem, we solve (P^N_ε) for various penalty parameter configurations γ_1, γ_2 ∈ {0, 0.1, 1}. The resulting normalized errors can be found in Table 1, for Y^ad_0 = y^t_0, and Table 2, for Y^ad_0 = y^v_0. Comparing their individual entries, we observe that there is (almost) no difference in performance between the training and the validation sets. This means that, while the utilized networks are rather simple and only comprise a small number of trainable parameters, the corresponding learned feedback controls generalize well to initial conditions which are not contained in the training set.
Indeed, on the one hand, all computed networks provide feedback controls which perform similarly to their open-loop counterparts. This is manifested in very small averaged errors for the objective functional, i.e. Err_J and Err_J, the states and adjoint states, Err_Y and Err_P, as well as the controls, Err_U. These start to (slowly) deteriorate as γ_1 and/or γ_2 grow. However, cf. the explanation in Section 4, this is to be expected: for γ_1 > 0 and/or γ_2 > 0, the learned feedback has to strike a balance between minimizing J(Y_θ(·), U_θ(·)) and keeping the penalty terms small, hence the slightly larger errors.
On the other hand, the picture looks different once we consider the errors associated to the approximation of the value function, i.e., Err_V and Err_∂V as well as d(V*, V^ε_θ) and d(∂V*, ∂V^ε_θ). Here γ_1 > 0 and/or γ_2 > 0 have a significant influence on d(V*, V^ε_θ) and d(∂V*, ∂V^ε_θ), while the other normalized mean squared errors remain relatively small. Moreover, we have Err_V ≈ d(V*, V^ε_θ) and Err_∂V ≈ d(∂V*, ∂V^ε_θ) on the training as well as on the validation set. Hence, large values of these terms are a reliable indicator for structural differences between V^ε_θ and V* and/or ∂_y V^ε_θ and ∂_y V*, respectively. Now, while γ_1 = γ_2 = 0 provides a very good approximation of the open-loop optimal control, it performs worst in terms of approximating the optimal value function and its derivative. This is related to two observations. First, in this case, the learning problem (P^N_ε) only depends on the derivative ∂_y V^ε_θ but not on the value function V^ε_θ itself. Since primitives are not unique, approximating V* by V^ε_θ is then unlikely. Second, due to the absence of V^ε_θ from the problem, some of the parameters of the model are not trainable. In fact, for γ_1 = γ_2 = 0, there holds ∂_{W_{12}} J_N(θ) = 0 for every admissible θ.
Once we increase γ_1 and γ_2, this is no longer the case. Hence, we observe a rapid decrease of d(V*, V^ε_θ) and d(∂V*, ∂V^ε_θ). Most remarkably, the improvement of both is, to some extent, already visible for γ_1 > 0 and γ_2 = 0. In this setting, applying the gradient method requires neither the computation of the adjoint state P nor of the costate K, which limits the cost of every gradient step to 2N = 60 ODE solves. By contrast, if γ_2 > 0 is increased while γ_1 = 0 is kept fixed, there is no improvement of d(V*, V^ε_θ). This further backs up the reasoning given for the case γ_1 = γ_2 = 0. Consequently, the computed results indicate that the best balance between finding an optimal control and approximating the value function is achieved by a careful choice of γ_1, γ_2 > 0. Moreover, they highlight two important points: First, the presented learning approach indeed allows the computation of semiglobal optimal feedback laws F^ε_θ for higher-dimensional problems and thus, to some extent, alleviates the curse of dimensionality. Second, incorporating additional terms into the learning problem which penalize the violation of the dynamic programming principle (15) allows the computation of a good approximation V^ε_θ of the optimal value function on the fly. As stated initially, the present example should be understood as a proof of concept and, following these first promising results, we believe that this approach to feedback learning deserves further investigation, both from the theoretical and the numerical side. For example, it would be interesting to explore systematic ways of choosing the penalty parameters γ_1, γ_2. However, this goes beyond the scope of the current paper and is left for future work.

as well as δy_k → δy in W_T, and thus δy = δy(y_0). By uniqueness of solutions to the above equation, δy(y^k_0) → δy(y_0) in W_T follows for the whole sequence, and therefore δy ∈ C(Y_0; W_T).

□
We use the following consequences of the previous proposition.

Corollary 31.
There exists an open neighborhood V 2 ⊂ V 1 ⊂ L 2 (I ; R n ) of 0 as well as c > 0 such that hold. Here M Y 0 denotes the constant from (A.3).
Proof. The first assertion follows from the continuous differentiability of v ↦ y_v(y_0) and the compactness of Y_0. To verify the second, we use that y*(y_0) = y^0(y_0) and estimate ∥y_v(y_0)∥_{W_T} ≤ ∥y*(y_0)∥_{W_T} + ∥y_v(y_0) − y^0(y_0)∥_{W_T}.
The claim now follows from the first inequality and (A.3).

Conflicts of interest
The authors have no conflict of interest to declare.