A disintegration of the Christoffel function

We show that the Christoffel function (CF) factorizes (or can be disintegrated) as the product of two Christoffel functions, one associated with the marginal and the other related to the conditional distribution, in the spirit of "the CF of the disintegration is the disintegration of the CFs". The proof uses an apparently overlooked property (interesting in its own right), which states that any sum-of-squares polynomial in the interior of the SOS cone is the reciprocal of the Christoffel function of some linear form (with a representing measure in the univariate case). The same is true for the convex cone of polynomials that are positive on a basic semi-algebraic set. This interpretation of the CF establishes another bridge between polynomial optimization and orthogonal polynomials.


Introduction
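It is well known that a probability measure $\mu$ on a Cartesian product $X\times Y\subset\mathbb{R}^n\times\mathbb{R}^p$ of Borel spaces disintegrates into $\hat\mu(dy\mid x)\,\varphi(dx)$, with its marginal $\varphi(dx)$ on $X$ and its conditional probability $\hat\mu(dy\mid x)$ on $Y$, given $x\in X$. That is:
$$\mu(A\times B)=\int_A\hat\mu(B\mid x)\,\varphi(dx),\qquad\forall A\in\mathcal{B}(X),\ B\in\mathcal{B}(Y).\qquad(1.1)$$
The goal of this note is to provide a similar disintegration (or factorization) for the family of its Christoffel functions $(x,y)\mapsto\Lambda^\mu_t(x,y)$, $t\in\mathbb{N}$.

Contribution. Our contribution is twofold.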
(i) Our main result states that $\Lambda^\mu_t$ disintegrates (or factorizes) into
$$\Lambda^\mu_t(x,y)=\Lambda^\varphi_t(x)\,\Lambda^{\nu_{x,t}}_t(y),\qquad\forall x\in X,\ y\in\mathbb{R},\qquad(1.2)$$
where $\Lambda^\varphi_t$ (resp. $\Lambda^{\nu_{x,t}}_t$) is the Christoffel function of the marginal $\varphi$ of $\mu$ on $X$ (resp. of some probability measure $\nu_{x,t}$ on $\mathbb{R}$, given $x\in X$). Moreover, for every fixed $x\in X$, one can compute explicitly the Hankel moment matrix of the measure $\nu_{x,t}$ by solving a single convex optimization problem over positive definite matrices, with $\log\det(\cdot)$ as objective function.
Research supported by the AI Interdisciplinary Institute ANITI, funded through the French program "Investing for the Future PI3A" under grant agreement number ANR-19-PI3A-0004. The author is also affiliated with the IPAL-CNRS laboratory, Singapore.
Notice how (1.2) mimics the disintegration (1.1). Indeed, as we should expect from the disintegration (1.2), it turns out that for each fixed $x\in X$, the family $(\Lambda^{\nu_{x,t}}_t)_{t\in\mathbb{N}}$ shares asymptotic properties of the Christoffel function $\Lambda^{\hat\mu(\cdot\mid x)}_t(y)$ of the conditional probability $\hat\mu(dy\mid x)$ on $Y$, given $x\in X$.
Actually, the same disintegration (1.2) holds if the conditioning is multivariate, i.e., on $y\in\mathbb{R}^p$ given $x\in\mathbb{R}^n$, with $p>1$. The only difference is that now $\nu_{x,t}$ is a linear functional on $\mathbb{R}[y]_{2t}$, not necessarily represented by a probability measure on $\mathbb{R}^p$.
(ii) Interestingly, the proof relies on a certain one-to-one mapping between the interiors of the convex cone of sum-of-squares polynomials and of its dual cone of moment matrices. In particular, and as a by-product, it implies the following simple but apparently unnoticed result: every sum-of-squares polynomial in the interior of the SOS cone is the reciprocal of the Christoffel function of some linear functional (guaranteed to have a representing measure in the univariate case).

Notation, definitions and preliminary results
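2.1. Notation and definitions. Let $\mathbb{R}[x]$ denote the ring of real polynomials in the variables $x=(x_1,\ldots,x_n)$, and let $\mathbb{R}[x]_t\subset\mathbb{R}[x]$ be its subset of polynomials of total degree at most $t$. Let $\mathbb{N}^n_t:=\{\alpha\in\mathbb{N}^n:\ |\alpha|\le t\}$ (where $|\alpha|=\sum_i\alpha_i$), and let $v_t(x)=(x^\alpha)_{\alpha\in\mathbb{N}^n_t}$ be the vector of monomials up to degree $t$. Let $\Sigma[x]_t\subset\mathbb{R}[x]_{2t}$ be the convex cone of polynomials of total degree at most $2t$ which are sums of squares (in short, SOS). For a real symmetric matrix $A=A^T$, the notation $A\succeq 0$ (resp. $A\succ 0$) stands for $A$ is positive semidefinite (p.s.d.) (resp. positive definite (p.d.)). The support of a Borel measure $\mu$ on $\mathbb{R}^n$ is the smallest closed set $A$ such that $\mu(\mathbb{R}^n\setminus A)=0$; such a set $A$ is unique.

Riesz functional. With a real sequence $\phi=(\phi_\alpha)_{\alpha\in\mathbb{N}^n}$ is associated the Riesz linear functional $L_\phi:\mathbb{R}[x]\to\mathbb{R}$ defined by:
$$f\ \Big(=\sum_\alpha f_\alpha\,x^\alpha\Big)\ \mapsto\ L_\phi(f):=\sum_\alpha f_\alpha\,\phi_\alpha,\qquad f\in\mathbb{R}[x].$$
A sequence $\phi=(\phi_\alpha)_\alpha$ has a representing measure if and only if there exists a Borel measure $\varphi$ on $\mathbb{R}^n$ such that $\int x^\alpha\,d\varphi=\phi_\alpha$ for all $\alpha\in\mathbb{N}^n$.

Moment matrix. With a real sequence $\phi=(\phi_\alpha)_{\alpha\in\mathbb{N}^n}$ is associated its moment matrix $M_t(\phi)$ of order (or degree) $t$. It is a real symmetric matrix with rows and columns indexed by $\mathbb{N}^n_t$, and with entries
$$M_t(\phi)(\alpha,\beta):=L_\phi(x^{\alpha+\beta})=\phi_{\alpha+\beta},\qquad\alpha,\beta\in\mathbb{N}^n_t.$$
Importantly, $M_t(\phi)$ depends only on the moments $\phi_\alpha$ with $|\alpha|\le 2t$. If $\phi$ has a representing measure $\varphi$, then we also write $M_t(\varphi)$, and necessarily $M_t(\varphi)$ is p.s.d. for all $t$, i.e., $M_t(\varphi)\succeq 0$ for all $t$.

Christoffel function. Let $\phi=(\phi_\alpha)_{\alpha\in\mathbb{N}^n}$ be such that $M_t(\phi)\succ 0$ for all $t$, and let $(P_\alpha)_{\alpha\in\mathbb{N}^n}\subset\mathbb{R}[x]$ be a family of polynomials, orthonormal with respect to $\phi$, i.e.,
$$L_\phi(P_\alpha\,P_\beta)=\delta_{\alpha=\beta},\qquad\forall\alpha,\beta\in\mathbb{N}^n.$$
Then the Christoffel function (CF) $\Lambda^\phi_t:\mathbb{R}^n\to\mathbb{R}_+$ associated with $\phi$ is defined by
$$\Lambda^\phi_t(x):=\Big(\sum_{\alpha\in\mathbb{N}^n_t}P_\alpha(x)^2\Big)^{-1},\qquad x\in\mathbb{R}^n,\qquad(2.1)$$
and, recalling that $M_t(\phi)$ is nonsingular, it turns out that
$$\Lambda^\phi_t(x)=\big(v_t(x)^T\,M_t(\phi)^{-1}\,v_t(x)\big)^{-1},\qquad x\in\mathbb{R}^n.\qquad(2.2)$$
An equivalent variational definition is also
$$\Lambda^\phi_t(\xi)=\min_{p\in\mathbb{R}[x]_t}\{\,L_\phi(p^2):\ p(\xi)=1\,\},\qquad\xi\in\mathbb{R}^n.$$
In [2] the authors describe a way to obtain a family of orthonormal polynomials w.r.t. $\phi$ from the moment matrices $M_t(\phi)\succ 0$ via simple determinant computations. We will use this construction with a special ordering of the monomials that index the rows and columns of $M_t(\phi)$.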
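The three characterizations above ((2.1), (2.2) and the variational formula) can be compared numerically. The following Python/NumPy sketch assumes a univariate discrete measure (equal weights on hypothetical random sample points) and uses a Cholesky factorization $M_t=LL^T$, so that $P=L^{-1}v_t$ is one orthonormal family; this Cholesky shortcut is a choice made here for brevity, not the determinant construction of [2]. All function names are illustrative.

```python
import numpy as np

def moment_matrix(points, weights, t):
    """M_t of the discrete measure sum_i w_i delta_{z_i}, in the basis (1, x, ..., x^t)."""
    V = np.vander(points, t + 1, increasing=True)   # row i is v_t(z_i)
    return V.T @ (weights[:, None] * V)

def cf_inverse_matrix(M, x):
    """Formula (2.2): 1 / (v_t(x)^T M_t^{-1} v_t(x))."""
    v = x ** np.arange(M.shape[0])
    return 1.0 / (v @ np.linalg.solve(M, v))

def cf_orthonormal(M, x):
    """Formula (2.1) with the orthonormal family P = L^{-1} v_t, where M_t = L L^T."""
    L = np.linalg.cholesky(M)
    P = np.linalg.solve(L, x ** np.arange(M.shape[0]))   # values P_0(x), ..., P_t(x)
    return 1.0 / np.sum(P ** 2)

def cf_variational(M, x):
    """Variational formula: minimize L(p^2) = c^T M c subject to p(x) = c^T v_t(x) = 1."""
    v = x ** np.arange(M.shape[0])
    c = np.linalg.solve(M, v)
    c = c / (c @ v)          # minimizer M^{-1} v / (v^T M^{-1} v)
    return c @ M @ c

rng = np.random.default_rng(0)
z = rng.uniform(-1.0, 1.0, size=200)        # hypothetical sample points
w = np.full(z.size, 1.0 / z.size)           # equal weights: a probability measure
M = moment_matrix(z, w, t=4)
vals = [f(M, 0.3) for f in (cf_inverse_matrix, cf_orthonormal, cf_variational)]
print(vals)
```

The three values should agree up to rounding, and they are at most 1 because the constant polynomial $p=1$ is feasible in the variational formula and the measure has unit mass.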
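If $\phi$ has a representing measure $\varphi$, we also write its CF as $\Lambda^\varphi_t$. The CF is usually defined for measures $\varphi$ on a compact set $\Omega$ rather than for linear functionals $\phi$ with $M_t(\phi)\succ 0$ for all $t$. In this case, one interesting and distinguishing feature of the CF is that, as $t$ increases, $\Lambda^\varphi_t(x)\downarrow 0$ exponentially fast for every $x$ outside the support of $\varphi$. In other words, $\Lambda^\varphi_t$ identifies the support of $\varphi$ when $t$ is sufficiently large, a nice property that can be exploited for outlier detection in some data analysis applications; see for instance [5,6]. In addition, at least in dimension $n=2$ or $n=3$, one may visualize this property even for small $t$ by plotting the resulting superlevel sets of $\Lambda^\varphi_t$.

2.2. A special ordering. Let $\mu$ be a Borel measure on $X\times Y\subset\mathbb{R}^n\times\mathbb{R}^p$, and let $M_t(\mu)$ be the moment matrix of $\mu$ with rows and columns indexed by the monomials $(x^\alpha y^\beta)_{(\alpha,\beta)\in\mathbb{N}^{n+p}_t}$, listed according to an ordering, denoted $\preceq$, defined as follows. First in the list we find all monomials $(x^\alpha)_{\alpha\in\mathbb{N}^n_t}$ (i.e., all monomials $x^\alpha y^\beta$ with $|\beta|=0$), listed e.g. according to the lexicographic ordering. Then we find all monomials $x^\alpha y^\beta$ with $|\beta|=1$, followed by the monomials $x^\alpha y^\beta$ with $|\beta|=2$, etc. Below is displayed $M_2(\mu)$ in the bivariate case $(n,p)=(1,1)$, with rows and columns indexed by $(1,x,x^2,y,xy,y^2)$ and with $\mu_{ij}:=\int x^iy^j\,d\mu$:
$$M_2(\mu)=\begin{pmatrix}
\mu_{00}&\mu_{10}&\mu_{20}&\mu_{01}&\mu_{11}&\mu_{02}\\
\mu_{10}&\mu_{20}&\mu_{30}&\mu_{11}&\mu_{21}&\mu_{12}\\
\mu_{20}&\mu_{30}&\mu_{40}&\mu_{21}&\mu_{31}&\mu_{22}\\
\mu_{01}&\mu_{11}&\mu_{21}&\mu_{02}&\mu_{12}&\mu_{03}\\
\mu_{11}&\mu_{21}&\mu_{31}&\mu_{12}&\mu_{22}&\mu_{13}\\
\mu_{02}&\mu_{12}&\mu_{22}&\mu_{03}&\mu_{13}&\mu_{04}
\end{pmatrix}.$$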
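To make the ordering $\preceq$ concrete, the sketch below lists the exponents for $n=p=1$ and builds $M_t(\mu)$ for a hypothetical test measure, the uniform (Lebesgue) measure on $[-1,1]^2$. It also checks that the leading block indexed by the pure $x$-monomials is the moment matrix of the $x$-marginal, which is the observation behind Lemma 2.1 below. The helper names are illustrative only.

```python
import numpy as np

def ordered_exponents(t):
    """Exponents (a, b) of x^a y^b with a + b <= t, ordered as in Section 2.2 (n = p = 1):
    first by the y-degree b, then by the x-degree a."""
    return [(a, b) for b in range(t + 1) for a in range(t + 1 - b)]

def moment_matrix_2d(moment, t):
    """M_t(mu) with entries moment(a1 + a2, b1 + b2), rows and columns in the above order."""
    E = ordered_exponents(t)
    return np.array([[moment(a1 + a2, b1 + b2) for (a2, b2) in E] for (a1, b1) in E])

def m1(k):
    """Moments of the Lebesgue measure on [-1, 1]."""
    return 2.0 / (k + 1) if k % 2 == 0 else 0.0

def lebesgue_square(a, b):
    """Moments of the (hypothetical test) Lebesgue measure on [-1, 1]^2."""
    return m1(a) * m1(b)

t = 2
print(ordered_exponents(t))   # [(0, 0), (1, 0), (2, 0), (0, 1), (1, 1), (0, 2)], i.e. 1, x, x^2, y, xy, y^2
M = moment_matrix_2d(lebesgue_square, t)
# The x-marginal of the Lebesgue measure on the square has density 2 on [-1, 1].
marginal = np.array([[2.0 * m1(i + j) for j in range(t + 1)] for i in range(t + 1)])
print(np.allclose(M[: t + 1, : t + 1], marginal))
```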
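Similarly, let $v_t(x,y)$ be the vector of monomials that form a basis of $\mathbb{R}[x,y]_t$, listed with the same ordering $\preceq$; for instance, with $(n,p)=(1,1)$ and $t=2$, $v_2(x,y)=(1,x,x^2,y,xy,y^2)$. Then by (2.2), the Christoffel function $\Lambda^\mu_t$ is given by
$$\Lambda^\mu_t(x,y)=\big(v_t(x,y)^T\,M_t(\mu)^{-1}\,v_t(x,y)\big)^{-1},\qquad(x,y)\in\mathbb{R}^n\times\mathbb{R}^p.\qquad(2.5)$$
We next see that, with the ordering $\preceq$ defined above, one may define a certain family of orthonormal polynomials $(P_{\alpha,\beta})\subset\mathbb{R}[x,y]_t$ by following the recipe described in [2], which we briefly summarize. To compute $P_{\alpha,\beta}\in\mathbb{R}[x,y]_t$, one proceeds in three steps:
• From $M_t(\mu)$ extract its submatrix $S(\alpha,\beta)$ with rows and columns indexed by $(\gamma,\eta)\preceq(\alpha,\beta)$.
• Delete the last row of $S(\alpha,\beta)$ and replace it with the monomials $(x^\gamma y^\eta)$, $(\gamma,\eta)\preceq(\alpha,\beta)$.
• Take the determinant of the resulting matrix and divide it by $\sqrt{\det S(\alpha,\beta)\,\det S^-(\alpha,\beta)}$, where $S^-(\alpha,\beta)$ is the submatrix of $M_t(\mu)$ indexed by the strict predecessors of $(\alpha,\beta)$ (with $\det S^-(0,0):=1$).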
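The three-step recipe can be sketched as follows for $n=p=1$, assuming a hypothetical discrete measure $\mu$ (equal weights on random points of $[-1,1]^2$). The normalization in the last step uses consecutive leading principal minors of $M_t(\mu)$. The test checks orthonormality, the identity $\sum P_{\alpha,\beta}^2=v_t^TM_t(\mu)^{-1}v_t$ behind (2.5), and that the first $t+1$ polynomials only involve $x$, as stated in Lemma 2.1. This is an illustrative sketch, not code from [2].

```python
import numpy as np

def ordered_exponents(t):
    # (a, b) for x^a y^b, ordered first by b (degree in y), then by a (n = p = 1)
    return [(a, b) for b in range(t + 1) for a in range(t + 1 - b)]

def orthonormal_by_determinants(M, v):
    """Evaluate P_0, ..., P_{d-1} at one point with the determinant recipe.

    M : (d, d) moment matrix indexed by the ordered monomials.
    v : (d,) values of the same monomials at the evaluation point.
    """
    d = M.shape[0]
    P = np.empty(d)
    det_prev = 1.0                          # determinant of the empty submatrix
    for k in range(d):
        S = M[: k + 1, : k + 1].copy()      # submatrix S(alpha, beta)
        det_k = np.linalg.det(S)
        S[k, :] = v[: k + 1]                # replace the last row by monomial values
        P[k] = np.linalg.det(S) / np.sqrt(det_prev * det_k)
        det_prev = det_k
    return P

# Hypothetical discrete measure: 100 random points in [-1, 1]^2 with equal weights.
rng = np.random.default_rng(1)
pts = rng.uniform(-1.0, 1.0, size=(100, 2))
w = np.full(len(pts), 1.0 / len(pts))
t = 2
E = ordered_exponents(t)
V = np.array([[x ** a * y ** b for (a, b) in E] for (x, y) in pts])   # rows v_t(x_i, y_i)
M = V.T @ (w[:, None] * V)                                            # M_t(mu)
P = np.array([orthonormal_by_determinants(M, row) for row in V])      # P[i, k] = P_k(x_i, y_i)
gram = P.T @ (w[:, None] * P)                                         # L_mu(P_k P_l)
print(np.allclose(gram, np.eye(len(E))))
```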
Lemma 2.1. With the ordering $\preceq$ and the above construction, the orthonormal polynomials $(P_{\alpha,0})_{\alpha\in\mathbb{N}^n_t}$ depend only on $x$ and are orthonormal w.r.t. the marginal $\varphi$ of $\mu$.
Proof. In the above construction, the orthonormal polynomials $(P_{\alpha,0})_{\alpha\in\mathbb{N}^n_t}$ are obtained from the submatrices $S(\alpha,0)$ of $M_t(\mu)$, $\alpha\in\mathbb{N}^n_t$, which are exactly the corresponding submatrices of $M_t(\varphi)$, since they are formed only with monomials $x^\alpha$ (as $|\beta|=0$). Hence the conclusion follows.
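(ii) In addition, if $p\in\mathrm{int}(\Sigma_t)$ is univariate, then $\phi$ has a representing measure $\varphi$, and so $p^{-1}$ is the Christoffel function $\Lambda^\varphi_t$ of some measure $\varphi$ on the real line.

Proof. The first part of the statement is a direct consequence of Nesterov [8, Theorem 17.3, p. 412], which states that the respective interiors of $\Sigma_t$ and of its dual $\Sigma^*_t$ are in one-to-one correspondence, and that $\phi\mapsto-\log\det M_t(\phi)$ is a $\tau$-self-concordant barrier function for the dual cone $\Sigma^*_t$, with $\tau=\binom{n+t}{t}$. Indeed, for $\phi\in\mathrm{int}(\Sigma^*_t)$, the coefficients of the polynomial $x\mapsto v_t(x)^TM_t(\phi)^{-1}v_t(x)$ are the entries of minus the gradient of this barrier, and by (2.2) this polynomial is $\Lambda^\phi_t(x)^{-1}$. The second statement follows from (2.2), together with the fact that, in the univariate case, every sequence $\phi$ with $M_t(\phi)\succ 0$ has a representing measure on $\mathbb{R}$ (truncated Hamburger moment problem).

Surprisingly, the fact that every SOS polynomial in the interior of $\Sigma_t$ (hence strictly positive, of degree at most $2t$) is the reciprocal of the Christoffel function $\Lambda^\phi_t$ of some linear functional $L_\phi$ on $\mathbb{R}[x]_{2t}$ with $M_t(\phi)\succ 0$ does not seem to have been noticed before, even though Nesterov's result [8, Theorem 17.3] is quite classical in convex conic optimization. In addition, observe that Lemma 2.3 is the degree-$t$ analogue of the well-known fact that the Gram matrix of every positive definite quadratic form is the inverse of the covariance matrix of some Gaussian measure (possibly after scaling). Finally, and said differently, the Christoffel functions $\Lambda^\phi_t$ associated with moment matrices $M_t(\phi)$ of size $\tau=\binom{n+t}{t}$ encode the central path¹ of the convex cone $\Sigma_t$ of $n$-variate SOS polynomials of degree at most $2t$.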
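For $t=1$ the Gaussian analogy can be made explicit. If $m\in\mathbb{R}^n$ and $\Sigma\succ 0$, the moment matrix of the Gaussian $N(m,\Sigma)$ is $M_1=\begin{pmatrix}1&m^T\\ m&\Sigma+mm^T\end{pmatrix}$, and block inversion (Schur complement $\Sigma$) gives
$$v_1(x)^TM_1^{-1}v_1(x)=1+(x-m)^T\Sigma^{-1}(x-m),$$
so the Gram matrix $M_1^{-1}$ of this positive quadratic polynomial is built from the inverse covariance. The NumPy sketch below checks the identity at random points; $m$ and $\Sigma$ are arbitrary hypothetical test values.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
m = rng.normal(size=n)                     # hypothetical mean
A = rng.normal(size=(n, n))
Sigma = A @ A.T + n * np.eye(n)            # hypothetical covariance (positive definite)

# Moment matrix M_1 of N(m, Sigma), indexed by v_1(x) = (1, x_1, ..., x_n).
M1 = np.block([[np.ones((1, 1)), m[None, :]],
               [m[:, None], Sigma + np.outer(m, m)]])

def inv_cf(x):
    """1 / Lambda_1(x) = v_1(x)^T M_1^{-1} v_1(x), as in (2.2)."""
    v = np.concatenate(([1.0], x))
    return v @ np.linalg.solve(M1, v)

def shifted_mahalanobis(x):
    """1 + (x - m)^T Sigma^{-1} (x - m)."""
    d = x - m
    return 1.0 + d @ np.linalg.solve(Sigma, d)

xs = rng.normal(size=(5, n))
print(all(np.isclose(inv_cf(x), shifted_mahalanobis(x)) for x in xs))
```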
¹ The central path plays an important role in the analysis of the computational complexity of interior-point methods for optimizing over such a cone.
Proof. The proof is in the same spirit and again relies on the one-to-one correspondence between the interior of the convex cone $K_t$ and that of its dual cone $K^*_t$.
Again, the Christoffel functions $\Lambda^{g_j\cdot\phi}_{t-s_j}$ associated with the (localizing) moment matrices $M_{t-s_j}(g_j\cdot\phi)$ encode the central path of the convex cone $K_t$ in (2.7). For a compact set $S$ (with an additional Archimedean assumption), the cone $K_t$ is very important in the Moment-SOS hierarchy for polynomial optimization [7]. It is used to replace the intractable positivity constraint "$p\ge 0$ on $S$" with the more restrictive constraint "$p\in K_t$" (and letting $t$ increase), because the latter, being semidefinite representable, is tractable.
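A minimal numerical sanity check of this barrier interpretation, assuming (as is standard in the Moment-SOS hierarchy) that $K_t$ in (2.7) is the truncated quadratic module $\{\sigma_0+\sum_j\sigma_jg_j:\ \sigma_j\in\Sigma[x]_{t-s_j}\}$ with $s_j=\lceil\deg g_j/2\rceil$. For the univariate example $S=[-1,1]=\{x:\ g(x)=1-x^2\ge 0\}$, minus the gradient of the barrier $\phi\mapsto-\log\det M_t(\phi)-\log\det M_{t-1}(g\cdot\phi)$ should be the coefficient vector of $\Lambda^\phi_t(x)^{-1}+g(x)\,\Lambda^{g\cdot\phi}_{t-1}(x)^{-1}$. The sketch compares both by central finite differences at the Lebesgue moments of $[-1,1]$, a hypothetical choice of $\phi$.

```python
import numpy as np

def hankel(phi, d):
    """M_d(phi): entries phi[i + j], 0 <= i, j <= d."""
    return np.array([[phi[i + j] for j in range(d + 1)] for i in range(d + 1)])

def localizing(phi, g, d):
    """M_d(g . phi): entries sum_k g[k] * phi[i + j + k] (g given by ascending coefficients)."""
    return np.array([[sum(gk * phi[i + j + k] for k, gk in enumerate(g))
                      for j in range(d + 1)] for i in range(d + 1)])

def gram_coeffs(Minv):
    """Ascending coefficients of x -> v_d(x)^T Minv v_d(x)."""
    d = Minv.shape[0] - 1
    c = np.zeros(2 * d + 1)
    for i in range(d + 1):
        for j in range(d + 1):
            c[i + j] += Minv[i, j]
    return c

def barrier(phi, g, t):
    """-log det M_t(phi) - log det M_{t-1}(g . phi), for deg g = 2 (so s = 1)."""
    return -(np.linalg.slogdet(hankel(phi, t))[1]
             + np.linalg.slogdet(localizing(phi, g, t - 1))[1])

def num_grad(f, z, h=1e-6):
    """Central finite-difference gradient of f at z."""
    grad = np.zeros_like(z)
    for k in range(z.size):
        e = np.zeros_like(z)
        e[k] = h
        grad[k] = (f(z + e) - f(z - e)) / (2.0 * h)
    return grad

t = 2
g = np.array([1.0, 0.0, -1.0])                          # g(x) = 1 - x^2, so S = [-1, 1]
phi = np.array([2.0, 0.0, 2.0 / 3.0, 0.0, 2.0 / 5.0])   # Lebesgue moments on [-1, 1] up to degree 2t
q = (gram_coeffs(np.linalg.inv(hankel(phi, t)))
     + np.convolve(g, gram_coeffs(np.linalg.inv(localizing(phi, g, t - 1)))))
grad = num_grad(lambda z: barrier(z, g, t), phi)
print(q, -grad)
```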

Main result
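Let $\mu$ be a Borel measure on a compact set $\Omega\subset X\times Y\subset\mathbb{R}^n\times\mathbb{R}$ which disintegrates into its marginal $\varphi$ on $X\subset\mathbb{R}^n$ and its conditional probability $\hat\mu(dy\mid x)$ on $Y_x\subset Y$ for every $x\in X$. Throughout the rest of the paper we assume that $\Omega$ has nonempty interior, so that $M_t(\mu)\succ 0$ for all $t\in\mathbb{N}$, where $M_t(\mu)$ is constructed as in Section 2.2.

Theorem 3.1. Let $\Lambda^\mu_t$ be as in (2.5), with $M_t(\mu)$ constructed as indicated just above (2.5). Then for every $x\in X$ and $t\in\mathbb{N}$, there exists a probability measure $\nu_{x,t}$ on $\mathbb{R}$ such that
$$\Lambda^\mu_t(x,y)=\Lambda^\varphi_t(x)\,\Lambda^{\nu_{x,t}}_t(y),\qquad\forall y\in\mathbb{R}.\qquad(3.1)$$

Proof. Let $t\in\mathbb{N}$ and $x\in\mathbb{R}^n$ be fixed. From (2.6) in Corollary 2.2 and as $p=1$,
$$\Lambda^\mu_t(x,y)^{-1}=\Lambda^\varphi_t(x)^{-1}\,p_t(y;x),\qquad p_t(y;x):=1+\Lambda^\varphi_t(x)\sum_{(\alpha,\beta)\in\mathbb{N}^{n+1}_t,\ \beta\ge 1}P_{\alpha,\beta}(x,y)^2.$$
Hence for each fixed $x\in\mathbb{R}^n$, $1\le p_t(\cdot\,;x)\in\mathbb{R}[y]_{2t}$ is a strictly positive univariate SOS. Therefore, by Lemma 2.3(ii), there exists a Borel measure $\nu_{x,t}$ on $\mathbb{R}$ such that $p_t(y;x)^{-1}=\Lambda^{\nu_{x,t}}_t(y)$, which yields (3.1).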
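The factorization (3.1) can be checked numerically for $n=p=1$. The sketch assumes a hypothetical empirical measure $\mu$ (300 random points in $[-1,1]^2$ with equal weights), reads $M_t(\varphi)$ off the leading block of $M_t(\mu)$ (Lemma 2.1), and computes $p_t(y;x)=\Lambda^\varphi_t(x)/\Lambda^\mu_t(x,y)$ from (2.2) and (2.5). By the proof above, $y\mapsto p_t(y;x)$ should be a polynomial of degree $2t$ bounded below by 1.

```python
import numpy as np

def ordered_exponents(t):
    # exponents (a, b) of x^a y^b ordered as in Section 2.2 (n = p = 1)
    return [(a, b) for b in range(t + 1) for a in range(t + 1 - b)]

rng = np.random.default_rng(3)
pts = rng.uniform(-1.0, 1.0, size=(300, 2))     # hypothetical sample defining mu
w = np.full(len(pts), 1.0 / len(pts))
t = 3
E = ordered_exponents(t)

def v(x, y):
    return np.array([x ** a * y ** b for (a, b) in E])

V = np.array([v(x, y) for (x, y) in pts])
M = V.T @ (w[:, None] * V)                       # M_t(mu), ordered
Mx = M[: t + 1, : t + 1]                         # M_t(phi) of the x-marginal (Lemma 2.1)

def cf_joint(x, y):
    """Lambda^mu_t(x, y), formula (2.5)."""
    u = v(x, y)
    return 1.0 / (u @ np.linalg.solve(M, u))

def cf_marginal(x):
    """Lambda^phi_t(x), formula (2.2)."""
    u = x ** np.arange(t + 1)
    return 1.0 / (u @ np.linalg.solve(Mx, u))

def p_t(y, x):
    """p_t(y; x) = Lambda^phi_t(x) / Lambda^mu_t(x, y) = 1 / Lambda^{nu_{x,t}}_t(y)."""
    return cf_marginal(x) / cf_joint(x, y)

x0 = 0.2
ys = np.linspace(-3.0, 3.0, 61)
vals = np.array([p_t(y, x0) for y in ys])
print(vals.min())
```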
When $x\in X$, notice how well (3.1) mimics the disintegration (1.1) of $\mu$ into its marginal $\varphi$ on $X$ and its conditional $\hat\mu(dy\mid x)$ on $Y_x$, given $x\in X$. However, it remains to relate the family of measures $(\nu_{x,t})_{t\in\mathbb{N}}$ on $\mathbb{R}$ with the conditional probability $\hat\mu(dy\mid x)$ on $Y_x$.
Computing the moment matrix of $\nu_{x,t}$. For an arbitrary but fixed $x\in\mathbb{R}^n$, obtaining the moment matrix of $\nu_{x,t}$ is relatively easy. Let $\mathcal{S}_t$ be the space of $(t+1)\times(t+1)$ real symmetric matrices.
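- First, compute the univariate SOS polynomial $y\mapsto p_t(y;x)$ defined in the proof of Theorem 3.1.
- Then, following [8, p. 412], solve the convex optimization problem
$$\min_{\nu\in\mathbb{R}^{2t+1}}\ \{\,L_\nu(p_t(\cdot\,;x))-\log\det M_t(\nu):\ M_t(\nu)\succ 0\,\},\qquad(3.2)$$
i.e., minimize over the positive definite Hankel matrices $M_t(\nu)\in\mathcal{S}_t$. Its unique optimal solution $\nu_{x,t}$ satisfies
$$p_t(y;x)=v_t(y)^T\,M_t(\nu_{x,t})^{-1}\,v_t(y)=\Lambda^{\nu_{x,t}}_t(y)^{-1},\qquad\forall y\in\mathbb{R}.\qquad(3.3)$$
The optimization problem (3.2) is convex and can be solved by off-the-shelf solvers such as CVX [1].

Multivariate conditional. If $p>1$ and $Y\subset\mathbb{R}^p$, then we still obtain the decomposition (3.1), with exactly the same proof as that of Theorem 3.1. The difference with (3.1) is that $\nu_{x,t}$ in (3.3) is a linear functional on $\mathbb{R}[y]_{2t}$ which is not guaranteed to have a representing measure on $\mathbb{R}^p$.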
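Assuming (3.2) has the log-det form written above, the following NumPy sketch solves it with a damped Newton method (step size $1/(1+\lambda)$, with $\lambda$ the Newton decrement, the classical choice for self-concordant functions) rather than with a modelling tool such as CVX. The polynomial $p(y)=1+y^2+y^4$ is a hypothetical stand-in for $p_t(\cdot\,;x)$, and the starting point is the moment vector of the Lebesgue measure on $[-1,1]$. The function `moments_from_sos` is a hypothetical helper, not part of any library.

```python
import numpy as np

def hankel(nu):
    """Hankel moment matrix M_t(nu) with entries nu[i + j], 0 <= i, j <= t."""
    t = (len(nu) - 1) // 2
    return np.array([[nu[i + j] for j in range(t + 1)] for i in range(t + 1)])

def hankel_basis(t):
    """Matrices B_k with (B_k)[i, j] = 1 iff i + j = k, so that M_t(nu) = sum_k nu[k] B_k."""
    idx = np.add.outer(np.arange(t + 1), np.arange(t + 1))
    return [(idx == k).astype(float) for k in range(2 * t + 1)]

def moments_from_sos(p, nu0, tol=1e-10, max_iter=200):
    """Minimize L_nu(p) - log det M_t(nu) by damped Newton steps.

    p   : coefficients (p_0, ..., p_{2t}) of a polynomial assumed to lie in int(Sigma_t).
    nu0 : starting moment vector with M_t(nu0) positive definite.
    """
    p = np.asarray(p, dtype=float)
    nu = np.asarray(nu0, dtype=float).copy()
    B = hankel_basis((len(p) - 1) // 2)
    for _ in range(max_iter):
        Minv = np.linalg.inv(hankel(nu))
        grad = p - np.array([np.sum(Minv * Bk) for Bk in B])           # p_k - <M^{-1}, B_k>
        W = [Minv @ Bk @ Minv for Bk in B]
        hess = np.array([[np.sum(Wk * Bl) for Bl in B] for Wk in W])   # <M^{-1} B_k M^{-1}, B_l>
        step = np.linalg.solve(hess, grad)
        lam = np.sqrt(max(grad @ step, 0.0))                            # Newton decrement
        if lam < tol:
            break
        nu = nu - step / (1.0 + lam)        # damped step stays inside {M_t(nu) > 0}
    return nu

p = [1.0, 0.0, 1.0, 0.0, 1.0]                      # p(y) = 1 + y^2 + y^4 (hypothetical p_t)
nu = moments_from_sos(p, [2.0, 0.0, 2.0 / 3.0, 0.0, 2.0 / 5.0])
Minv = np.linalg.inv(hankel(nu))
coeffs = [np.sum(Minv * Bk) for Bk in hankel_basis(2)]   # coefficients of v_t^T M_t^{-1} v_t
print(nu, np.allclose(coeffs, p))
```

For this particular $p$, symmetry under $y\mapsto-y$ and a short hand computation give the unique solution $\nu=(a,0,b,0,a)$ with $a=(11-\sqrt{13})/6$ and $b=(\sqrt{13}-2)/3$, which is a convenient check of the iteration.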
3.1. Discussion. Define the scalar $s_n(t):=\binom{n+t}{t}$ for all integers $t,n$. Under some conditions on the sets $\Omega$, $X$ and $Y_x$, and if $\mu$ has a density w.r.t. Lebesgue measure on $\Omega$ that also satisfies some conditions, one may indeed relate the family $(\nu_{x,t})_{t\in\mathbb{N}}$ with the conditional probability $\hat\mu(dy\mid x)$ on $Y$, given $x\in X$. Under such conditions one may interpret the limits of $s_{n+1}(t)\Lambda^\mu_t(x,y)$ and $s_n(t)\Lambda^\varphi_t(x)$, as $t$ increases, in terms of the density of $\mu$ and of an equilibrium measure intrinsically related to the respective supports $\Omega$ and $X$. For such conditions the interested reader is referred to [5,3] and the many references therein. For instance:

Corollary 3.2. Let $\Omega=X\times Y\subset\mathbb{R}^{n+1}$ be compact with $\Omega=\overline{\mathrm{int}(\Omega)}$ and $X=\overline{\mathrm{int}(X)}$, and assume that $\mu$ has a density $f$ w.r.t. Lebesgue measure on $\mathbb{R}^{n+1}$, bounded away from $0$ on $\Omega$.
If $x\in\mathrm{int}(X)$ but $(x,y)\notin\Omega$, then as $t$ increases, $\Lambda^{\nu_{x,t}}_t(y)\downarrow 0$ exponentially fast (as would the Christoffel function $\Lambda^{\hat\mu(\cdot\mid x)}_t(y)$ of the conditional probability $\hat\mu(dy\mid x)$).

Proof. By [6,5], as $(x,y)\notin\Omega$, $\Lambda^\mu_t(x,y)\downarrow 0$ exponentially fast as $t$ increases. On the other hand, as $x\in\mathrm{int}(X)$ and the density of $\varphi$ w.r.t. Lebesgue measure on $\mathbb{R}^n$ is bounded away from zero, $\Lambda^\varphi_t(x)^{-1}$ increases with $t$ no faster than $O(t^n)$. Therefore, by (3.1), $\Lambda^{\nu_{x,t}}_t(y)^{-1}$ has to grow exponentially fast with $t$. The same conclusion holds for $\hat\mu(dy\mid x)$; indeed, let $y$ be outside the support $Y_x$ of $\hat\mu(dy\mid x)$. The density of $\hat\mu(dy\mid x)$ w.r.t. Lebesgue measure, which reads $y\mapsto f(x,y)/\int_{Y_x}f(x,y')\,dy'$ on $Y_x$, is bounded away from zero. Therefore $\Lambda^{\hat\mu(\cdot\mid x)}_t(y)\downarrow 0$ exponentially fast as $t$ increases.
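Corollary 3.2 can be illustrated through (3.1), since $\Lambda^{\nu_{x,t}}_t(y)=\Lambda^\mu_t(x,y)/\Lambda^\varphi_t(x)$. The sketch assumes the hypothetical case where $\mu$ is the uniform measure on $\Omega=[-1,1]^2$ (so $X=Y=[-1,1]$), takes $x=0$, and compares $y=0.5$ (inside $\Omega$) with $y=2$ (outside $\Omega$). Exact moments are used, and $t$ is kept small because moment matrices in the monomial basis quickly become ill-conditioned.

```python
import numpy as np

def m1(k):
    """Moments of the Lebesgue measure on [-1, 1]."""
    return 2.0 / (k + 1) if k % 2 == 0 else 0.0

def ordered_exponents(t):
    # (a, b) for x^a y^b, ordered as in Section 2.2 (n = p = 1)
    return [(a, b) for b in range(t + 1) for a in range(t + 1 - b)]

def cf_nu(x, y, t):
    """Lambda^{nu_{x,t}}_t(y) = Lambda^mu_t(x, y) / Lambda^phi_t(x), from (3.1),
    for the (hypothetical) uniform measure mu on Omega = [-1, 1]^2."""
    E = ordered_exponents(t)
    M = np.array([[m1(a1 + a2) * m1(b1 + b2) for (a2, b2) in E] for (a1, b1) in E])
    u = np.array([x ** a * y ** b for (a, b) in E])
    ux = u[: t + 1]                                           # pure x-monomials come first
    inv_joint = u @ np.linalg.solve(M, u)                     # 1 / Lambda^mu_t(x, y)
    inv_marg = ux @ np.linalg.solve(M[: t + 1, : t + 1], ux)  # 1 / Lambda^phi_t(x)
    return inv_marg / inv_joint

for t in range(1, 7):
    print(t, cf_nu(0.0, 0.5, t), cf_nu(0.0, 2.0, t))
```

Inside $\Omega$ the values stay of moderate size, while outside $\Omega$ they collapse quickly with $t$, in line with the corollary.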
So Corollary 3.2 states that whenever $x\in\mathrm{int}(X)$ and $y\notin\mathrm{supp}(\hat\mu(dy\mid x))$, then asymptotically the growth rate of $\Lambda^{\nu_{x,t}}_t(y)^{-1}$ is exponential, as for the CF of the conditional probability $\hat\mu(dy\mid x)$. To obtain precise asymptotic results when $(x,y)\in\mathrm{int}(\Omega)$, additional conditions on $\mu$ are required. Below is such a typical result.

Lemma 3.3 (Kroó and Lubinsky [3]). Let $S\subset\mathbb{R}^n$ be compact and assume that there exists a measure $\psi_0$ supported on $S$ such that, uniformly on compact subsets of $\mathrm{int}(S)$, $\lim_{t\to\infty}s_n(t)\Lambda^{\psi_0}_t(x)=W_0(x)$, where $W_0$ is continuous and positive on $\mathrm{int}(S)$.
If a measure $\psi$ has a continuous and positive density $D$ w.r.t. $\psi_0$ on $\mathrm{int}(S)$, then, uniformly on compact subsets of $\mathrm{int}(S)$, $\lim_{t\to\infty}s_n(t)\Lambda^\psi_t(x)=D(x)\,W_0(x)$.

Given a compact set $\mathcal{X}$, let $C(\mathcal{X})$ denote the space of continuous functions on $\mathcal{X}$. In our context of $\mu$ on a compact set $\Omega\subset X\times\mathbb{R}$ with marginal $\varphi$ on $X$, we obtain the following result:

Theorem 3.4. Assume that there exists a measure $\mu_0$ on $\Omega$ with marginal $\varphi_0$ on $X$ and conditional $\hat\mu_0(dy\mid x)$ on $\mathbb{R}$, such that uniformly on compact subsets of $\Omega$ (resp. $X$): In addition, assume that the following Feller-type property holds: Let $\mu$ be a measure on $\Omega$ with a continuous and positive density $f$ w.r.t. $\mu_0$. Then, with $\nu_{x,t}$ being the measure on $\mathbb{R}$ in Theorem 3.1: where $g(x):=\int f(x,y)\,\hat\mu_0(dy\mid x)$.
Proof. Disintegrating $\mu_0$ yields $d\mu_0(x,y)=\hat\mu_0(dy\mid x)\,\varphi_0(dx)$. Therefore
$$\mu(dx,dy)=f(x,y)\,\hat\mu_0(dy\mid x)\,\varphi_0(dx)=\frac{f(x,y)}{g(x)}\,\hat\mu_0(dy\mid x)\;g(x)\,\varphi_0(dx),$$
and $\varphi(dx)=g(x)\,\varphi_0(dx)$. Moreover, observe that for every $x\in X$,
$$\int\frac{f(x,y)}{g(x)}\,\hat\mu_0(dy\mid x)=1.$$
That is, for every $x\in X$, $y\mapsto f(x,y)/g(x)$ is the density of $\hat\mu(dy\mid x)$ w.r.t. $\hat\mu_0(dy\mid x)$, and by the Feller-like property, $g$ is continuous and positive on $X$. Next, by our hypotheses and from Theorem 3.1, As expected from the disintegration (3.1), the limit of $t\,\Lambda^{\nu_{x,t}}_t(y)$ as $t$ increases is the density of the conditional $\hat\mu(dy\mid x)$ times a weight function intrinsic to the support $\Omega$ of $\mu$, which is typical of convergence results for Christoffel functions (whenever convergence takes place).
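As a univariate illustration of the type of limit in Lemma 3.3 ($n=1$, $s_1(t)=t+1$), take $\psi_0$ the Lebesgue measure on $S=[-1,1]$, for which the classical limit is $W_0(x)=\pi\sqrt{1-x^2}$. The sketch evaluates $(t+1)\Lambda_t(x)$ through (2.1) with the orthonormal Legendre polynomials $\sqrt{(2k+1)/2}\,P_k$, computed by their three-term recurrence to avoid ill-conditioned moment matrices. It illustrates the kind of limit involved, not Theorem 3.4 itself.

```python
import numpy as np

def scaled_cf_lebesgue(x, t):
    """(t + 1) * Lambda_t(x) for the Lebesgue measure on [-1, 1], via (2.1) with the
    orthonormal Legendre polynomials sqrt((2k + 1) / 2) P_k."""
    P_prev, P = 0.0, 1.0                    # P_{-1}, P_0
    total = 0.5 * P ** 2                    # k = 0 term: (2*0 + 1)/2 * P_0^2
    for k in range(t):
        # Bonnet recurrence: (k + 1) P_{k+1} = (2k + 1) x P_k - k P_{k-1}
        P_prev, P = P, ((2 * k + 1) * x * P - k * P_prev) / (k + 1)
        total += (2 * (k + 1) + 1) / 2.0 * P ** 2
    return (t + 1) / total

for t in (50, 200, 800):
    print(t, scaled_cf_lebesgue(0.0, t), scaled_cf_lebesgue(0.5, t))
print(np.pi, np.pi * np.sqrt(1.0 - 0.5 ** 2))    # limits W_0(0) and W_0(0.5)
```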

Conclusion
We have shown that, in a quite general setup, the Christoffel function disintegrates (or factorizes) and mimics the disintegration of its associated measure on $X\times Y$ into its marginal on $X$ and its conditional on $Y$, given $x\in X$. The result uses a straightforward (but novel) interpretation of a well-known intermediate result of convex optimization, which is of interest in its own right: every SOS polynomial in the interior of the SOS cone is the reciprocal of the Christoffel function associated with some linear functional (which always has a representing measure in the univariate case). A similar interpretation is valid for the cone of polynomials that are positive on a basic semi-algebraic set.
We think that a better understanding of the linear functional $\nu_{x,t}$ (which has a representing measure when $p=1$) is needed. In particular, further investigation beyond the scope of the present note could consider a more detailed (and non-asymptotic) comparison of $\nu_{x,t}$ with the conditional $\hat\mu(dy\mid x)$ when $x\in X$, as well as understanding its meaning when $p>1$, i.e., when it may not have a representing measure. For instance, we conjecture (but have been unable to prove) that $\nu_{x,t}$ does not depend on $t$ and has a representing measure.
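Positive polynomials and Christoffel functions. Recall that $p\in\Sigma[x]_t$ (i.e., $p$ is an SOS of degree at most $2t$) if and only if there exists a real symmetric matrix $Q\succeq 0$ such that $p(x)=v_t(x)^TQ\,v_t(x)$ for all $x\in\mathbb{R}^n$. Notice that, except when $t=1$, there are in general several possible choices for $Q$, which is called a Gram matrix of $p$. As we next see, one choice is particularly interesting. The dual cone $\Sigma^*_t$ of $\Sigma_t$ reads
$$\Sigma^*_t=\{\,\phi\in\mathbb{R}^{\mathbb{N}^n_{2t}}:\ M_t(\phi)\succeq 0\,\}.$$

Lemma 2.3. (i) Every SOS polynomial in the interior of $\Sigma_t$ is the reciprocal of the Christoffel function $\Lambda^\phi_t$ of some linear functional $L_\phi$, with $\phi\in\mathrm{int}(\Sigma^*_t)$. That is, $p\in\mathrm{int}(\Sigma_t)$ if and only if
$$p(x)=v_t(x)^T\,M_t(\phi)^{-1}\,v_t(x)=\Lambda^\phi_t(x)^{-1},\qquad\forall x\in\mathbb{R}^n,$$
for some (unique) $\phi\in\mathrm{int}(\Sigma^*_t)$.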
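To illustrate the non-uniqueness of Gram matrices and the particular choice singled out by Lemma 2.3(i), consider the hypothetical example $p(y)=1+y^2+y^4$ ($n=1$, $t=2$). Every matrix $Q_a$ below is a Gram matrix of $p$ in the basis $v_2(y)=(1,y,y^2)$. A hand computation (using the symmetry $y\mapsto-y$) gives $\phi=(a_0,0,b_0,0,a_0)$ with $a_0=(11-\sqrt{13})/6$ and $b_0=(\sqrt{13}-2)/3$, and the NumPy sketch checks that $M_2(\phi)^{-1}$ is precisely the Gram matrix $Q_{-b_0/a_0}$.

```python
import numpy as np

def gram(a):
    """Gram matrix Q_a of p(y) = 1 + y^2 + y^4 in the basis v_2(y) = (1, y, y^2); valid for every real a."""
    return np.array([[1.0, 0.0, a], [0.0, 1.0 - 2.0 * a, 0.0], [a, 0.0, 1.0]])

def poly_coeffs(Q):
    """Coefficients (c_0, ..., c_4) of y -> v_2(y)^T Q v_2(y)."""
    c = np.zeros(5)
    for i in range(3):
        for j in range(3):
            c[i + j] += Q[i, j]
    return c

a0 = (11.0 - np.sqrt(13.0)) / 6.0
b0 = (np.sqrt(13.0) - 2.0) / 3.0
M2 = np.array([[a0, 0.0, b0], [0.0, b0, 0.0], [b0, 0.0, a0]])   # Hankel M_2(phi), phi = (a0, 0, b0, 0, a0)

for a in (0.0, 0.3, -b0 / a0):
    Q = gram(a)
    print(a, poly_coeffs(Q), np.linalg.eigvalsh(Q).min() > 0)   # same polynomial, positive definite Q
print(np.allclose(np.linalg.inv(M2), gram(-b0 / a0)))
```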