Some remarks on the ergodic theorem for $U$-statistics

In this note, we investigate the convergence of a $U$-statistic of order two having stationary ergodic data. We will find sufficient conditions for the almost sure and $L^1$ convergence and present some counter-examples showing that the $U$-statistic itself might fail to converge: centering is needed as well as boundedness of $\sup_{j\geq 2}\mathbb{E}[|h(X_1,X_j)|]$.


1. Introduction
In this note, we investigate the validity of the $U$-statistics ergodic theorem, i.e. the almost sure convergence

$$\frac{1}{\binom{n}{2}}\sum_{1\leq i<j\leq n}h(X_i,X_j)\longrightarrow\iint h(x,y)\,dF(x)\,dF(y),\tag{1}$$

where $(X_i)_{i\geq 1}$ is a stationary ergodic process with marginal distribution $F$, and $h(x,y)$ is a symmetric kernel that is $F\times F$-integrable. Birkhoff's ergodic theorem establishes the analogous result for the time averages $\frac{1}{n}\sum_{i=1}^{n}f(X_i)$, while Hoeffding [6] established (1) for i.i.d. processes $(X_i)_{i\geq 1}$. These two classical results naturally lead to the conjecture that (1) should hold without further assumptions, i.e. for all stationary ergodic processes $(X_i)_{i\geq 1}$ and all kernels $h\in L^1(F\times F)$. Aaronson et al. [1] proved a partial result in this direction, namely that (1) holds for all bounded and $F\times F$-almost everywhere continuous kernels $h(x,y)$. At the same time, they presented counterexamples showing that (1) does not hold in full generality. One of their counterexamples is a bounded kernel whose set of discontinuities has positive $F\times F$-measure, while the other is an $F\times F$-almost everywhere continuous but unbounded kernel.
The $U$-statistic ergodic theorem has subsequently been addressed by various authors, e.g. Arcones [2] and Borovkova, Burton and Dehling [4]; see also the review paper by Borovkova, Burton and Dehling [5]. These papers provide sufficient conditions for (1) to hold as well as further counterexamples, both for stationary ergodic processes and under stronger mixing assumptions. Most of the positive results also address other forms of convergence in (1), such as convergence in probability and $L^1$-convergence. Arcones [2] proved the ergodic theorem for absolutely regular processes under some moment assumptions. Borovkova, Burton and Dehling [5] investigated convergence in probability in (1), with a special focus on the kernel $h(x,y)=\log(|x-y|)$, which arises in connection with the Takens estimator for the correlation dimension.
A common feature of all these examples is that they satisfy a modified version of the $U$-statistics ergodic theorem, namely

$$\frac{1}{\binom{n}{2}}\sum_{1\leq i<j\leq n}\bigl(h(X_i,X_j)-\mathbb{E}[h(X_i,X_j)]\bigr)\longrightarrow 0.\tag{2}$$

It might thus seem natural to conjecture that (2) holds without further assumptions. In this note, we present a counterexample that disproves this conjecture. In addition, we will give a short proof of the $U$-statistics ergodic theorem for bounded $F\times F$-almost everywhere continuous kernels, and give a new condition for $L^1$-convergence.
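For concreteness, the statistic on the left-hand side of (1) is the average of $h(X_i,X_j)$ over all $\binom{n}{2}$ pairs $i<j$. The following Python sketch computes it directly; the function name `u_statistic` and the sample values are illustrative assumptions and do not come from the note itself.

```python
from itertools import combinations
from math import comb


def u_statistic(xs, h):
    """Average of h(x_i, x_j) over all pairs i < j (U-statistic of order two)."""
    n = len(xs)
    if n < 2:
        raise ValueError("a U-statistic of order two needs at least two observations")
    # combinations(xs, 2) yields each unordered pair exactly once, in index order i < j
    return sum(h(x, y) for x, y in combinations(xs, 2)) / comb(n, 2)


# Hypothetical sample and kernel h(x, y) = |x - y| (symmetric)
sample = [0.0, 1.0, 3.0]
value = u_statistic(sample, lambda x, y: abs(x - y))  # pairs give 1, 3, 2
```

The direct double loop costs $O(n^2)$ kernel evaluations, which is fine for illustration but not for long trajectories.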

2. A short proof of the ergodic theorem for $U$-statistics
In this section, we present a short proof of the $U$-statistics ergodic theorem that was first established in Aaronson et al. [1]. For the special case when the process has values in $\mathbb{R}^k$, this proof is contained in Borovkova, Burton and Dehling [5]. Here, we give the proof for processes with values in an arbitrary separable metric space.
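Theorem 2.1. Let $(X_k)_{k\geq 0}$ be a stationary ergodic process with values in the separable metric space $S$ and marginal distribution $F$, and let $h\colon S\times S\to\mathbb{R}$ be a symmetric kernel that is bounded and $F\times F$-almost everywhere continuous. Then, as $n\to\infty$,

$$\frac{1}{\binom{n}{2}}\sum_{1\leq i<j\leq n}h(X_i,X_j)\longrightarrow\iint h(x,y)\,dF(x)\,dF(y)$$

almost surely.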
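Proof. We define the empirical distribution of the first $n$ random variables,

$$F_n=\frac{1}{n}\sum_{i=1}^{n}\delta_{X_i},$$

where $\delta_x$ denotes the Dirac measure at $x$. For any $L^1(F)$-function $f\colon S\to\mathbb{R}$, we obtain by Birkhoff's ergodic theorem

$$\int f(x)\,dF_n(x)=\frac{1}{n}\sum_{i=1}^{n}f(X_i)\longrightarrow\int f(x)\,dF(x)$$

almost surely. This convergence holds in particular for any bounded continuous function $f\in C_b(S)$. Since $S$ is separable, there exists a countable family of functions $f_i\in C_b(S)$, $i\geq 1$, that is convergence determining, i.e. such that convergence of the integrals $\int f_i(x)\,d\mu_n(x)\to\int f_i(x)\,d\mu(x)$ for all $i\geq 1$ implies weak convergence of the probability measures $\mu_n$ to $\mu$. Now, up to a set of measure 0, we get

$$\int f_i(x)\,dF_n(x)\longrightarrow\int f_i(x)\,dF(x)$$

for all $i\geq 1$, and thus $F_n\Rightarrow F$ weakly. This is in fact Varadarajan's argument [8] for the fact that the empirical distribution of i.i.d. data $X_1,\ldots,X_n$ converges weakly almost surely to the true distribution $F$. By Theorem 3.2 (page 21) of Billingsley [3], we obtain convergence of the empirical product measure,

$$F_n\times F_n\Rightarrow F\times F,$$

except on a set of measure 0. Thus, for any bounded $F\times F$-a.e. continuous function $h\colon S\times S\to\mathbb{R}$, we obtain by the portmanteau theorem

$$\frac{1}{n^2}\sum_{1\leq i,j\leq n}h(X_i,X_j)=\iint h(x,y)\,dF_n(x)\,dF_n(y)\longrightarrow\iint h(x,y)\,dF(x)\,dF(y)$$

almost surely. Since $h$ is symmetric and bounded, the left-hand side equals $\frac{n-1}{n}$ times the $U$-statistic plus a diagonal contribution of absolute value at most $\|h\|_\infty/n$, so the $U$-statistic has the same almost sure limit.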
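The integral of $h$ against the empirical product measure averages over all $n^2$ ordered pairs, diagonal included, whereas the $U$-statistic averages over pairs $i<j$ only. The Python sketch below checks the elementary identity linking the two for a symmetric kernel; the helper names and the small sample are hypothetical and serve only as an illustration.

```python
from itertools import combinations
from math import comb, isclose


def u_statistic(xs, h):
    """Average of h over pairs i < j."""
    return sum(h(x, y) for x, y in combinations(xs, 2)) / comb(len(xs), 2)


def v_statistic(xs, h):
    """Integral of h against F_n x F_n: average over all n^2 ordered pairs."""
    n = len(xs)
    return sum(h(x, y) for x in xs for y in xs) / n**2


def diagonal_term(xs, h):
    """Contribution of the pairs (i, i) to the V-statistic."""
    n = len(xs)
    return sum(h(x, x) for x in xs) / n**2


# Hypothetical data and a symmetric kernel
xs = [0.0, 1.0, 3.0]
h = lambda x, y: x * y + 1.0
n = len(xs)

lhs = v_statistic(xs, h)
# For symmetric h: V_n = (n - 1)/n * U_n + (1/n^2) * sum_i h(x_i, x_i)
rhs = (n - 1) / n * u_statistic(xs, h) + diagonal_term(xs, h)
```

For a bounded kernel the diagonal term is of order $1/n$ and the factor $(n-1)/n$ tends to 1, which is why both statistics share the same limit.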
3. Convergence in $L^1$ in the ergodic theorem for $U$-statistics
In this section, we present two sufficient conditions for the convergence in $L^1$ of a $U$-statistic to $\iint h(x,y)\,dF(x)\,dF(y)$, where $F$ denotes the distribution of $X_0$. The first sufficient condition imposes a restriction on the continuity points of the kernel combined with a uniform integrability assumption. The second sufficient condition imposes a restriction on the joint distribution of the vectors $(X_0,X_k)$, $k\geq 1$, but no other assumption is required for the kernel $h$.
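Theorem 3.1. Let $(X_i)_{i\geq 1}$ be a stationary ergodic sequence taking values in $\mathbb{R}^d$ and let $h\colon\mathbb{R}^d\times\mathbb{R}^d\to\mathbb{R}$ be a measurable function such that the family $\{h(X_1,X_j),\ j\geq 1\}$ is uniformly integrable. Let $F$ be the distribution of $X_1$. Assume that one of the following assumptions is satisfied:
(A.1) the kernel $h$ is $F\times F$-almost everywhere continuous;
(A.2) $\sup_{k\geq 1}\sup_{s,t\in\mathbb{R}^d}f_k(s,t)$ is finite, the random variable $X_0$ has a bounded density with respect to the Lebesgue measure on $\mathbb{R}^d$ and for each $k\geq 1$, the vector $(X_0,X_k)$ has a density $f_k$ with respect to the Lebesgue measure of $\mathbb{R}^d\times\mathbb{R}^d$.
Then

$$\lim_{n\to\infty}\mathbb{E}\left|\frac{1}{\binom{n}{2}}\sum_{1\leq i<j\leq n}h(X_i,X_j)-\iint h(x,y)\,dF(x)\,dF(y)\right|=0.\tag{3}$$

Proof. Let us prove Theorem 3.1 under assumption (A.1). By Theorem 1 in [4], we know that

$$\frac{1}{\binom{n}{2}}\sum_{1\leq i<j\leq n}h(X_i,X_j)\longrightarrow\iint h(x,y)\,dF(x)\,dF(y)$$

in probability. By stationarity, the uniform integrability of $\{h(X_1,X_j),\ j\geq 1\}$ carries over to the sequence of $U$-statistics, so the convergence also holds in $L^1$, which is (3).
We will prove Theorem 3.1 under assumption (A.2) in three steps. First we will show that (3) holds when $h$ is a product of indicator functions of Borel subsets of $\mathbb{R}^d$. Then we will show the result for a truncated version of the kernel by approximating it, uniformly with respect to $k$, by a linear combination of products of indicator functions. Finally we will conclude by uniform integrability.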
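First step: assume that $h(x,y)=\mathbf{1}_A(x)\mathbf{1}_B(y)$, where $A$ and $B$ are Borel subsets of $\mathbb{R}^d$. Observe that

$$\sum_{1\leq i<j\leq n}\mathbf{1}_A(X_i)\mathbf{1}_B(X_j)=\sum_{j=2}^{n}\mathbf{1}_B(X_j)\sum_{i=1}^{j-1}\mathbf{1}_A(X_i).$$

Therefore, the following decomposition takes place:

$$\frac{1}{\binom{n}{2}}\sum_{1\leq i<j\leq n}\mathbf{1}_A(X_i)\mathbf{1}_B(X_j)=\frac{1}{\binom{n}{2}}\sum_{j=2}^{n}(j-1)\,\mathbf{1}_B(X_j)\left(\frac{1}{j-1}\sum_{i=1}^{j-1}\mathbf{1}_A(X_i)-F(A)\right)+\frac{F(A)}{\binom{n}{2}}\sum_{j=2}^{n}(j-1)\,\mathbf{1}_B(X_j).\tag{8}$$

Observe that by the ergodic theorem and the Lebesgue dominated convergence theorem, the first term of the right hand side of (8) converges to 0 in $L^1$. Moreover, by the ergodic theorem and a summation by parts,

$$\frac{1}{\binom{n}{2}}\sum_{j=2}^{n}(j-1)\,\mathbf{1}_B(X_j)\longrightarrow F(B)$$

almost surely and in $L^1$, so that (3) holds for this kernel.

Second step. Let $R>0$ be fixed and define

$$h_R(x,y)=h(x,y)\,\mathbf{1}\{|h(x,y)|\leq R\}\,\mathbf{1}\{|x|\leq R\}\,\mathbf{1}\{|y|\leq R\},$$

which is integrable with respect to the Lebesgue measure on $\mathbb{R}^d\times\mathbb{R}^d$. By a standard result in measure theory, we know that for each positive $\varepsilon$, there exist an integer $N$, constants $c_1,\ldots,c_N$ and Borel sets $A_{\varepsilon,\ell}$, $B_{\varepsilon,\ell}$, $1\leq\ell\leq N$, such that

$$\iint\left|h_R(x,y)-\sum_{\ell=1}^{N}c_\ell\,\mathbf{1}_{A_{\varepsilon,\ell}}(x)\,\mathbf{1}_{B_{\varepsilon,\ell}}(y)\right|dx\,dy<\varepsilon.\tag{12}$$

Therefore, using stationarity and the fact that $(X_i,X_j)$ has a density $f_{j-i}$ which is bounded by a constant $M$ independent of $(i,j)$,

$$\mathbb{E}\left|h_R(X_i,X_j)-\sum_{\ell=1}^{N}c_\ell\,\mathbf{1}_{A_{\varepsilon,\ell}}(X_i)\,\mathbf{1}_{B_{\varepsilon,\ell}}(X_j)\right|\leq M\varepsilon.$$

Consequently,

$$\mathbb{E}\left|\frac{1}{\binom{n}{2}}\sum_{1\leq i<j\leq n}h_R(X_i,X_j)-\frac{1}{\binom{n}{2}}\sum_{1\leq i<j\leq n}\sum_{\ell=1}^{N}c_\ell\,\mathbf{1}_{A_{\varepsilon,\ell}}(X_i)\,\mathbf{1}_{B_{\varepsilon,\ell}}(X_j)\right|\leq M\varepsilon.$$

By the first step and the triangle inequality, we deduce that for each positive $\varepsilon$,

$$\limsup_{n\to\infty}\mathbb{E}\left|\frac{1}{\binom{n}{2}}\sum_{1\leq i<j\leq n}h_R(X_i,X_j)-\iint h_R(x,y)\,dF(x)\,dF(y)\right|\leq C\varepsilon,$$

where the constant $C$ depends only on $M$ and on the bound for the density of $X_0$; hence (3) holds with $h$ replaced by $h_R$.

Third step: by uniform integrability, for each positive $\varepsilon$, there exists $\delta$ such that for each event $A$ satisfying $\mathbb{P}(A)<\delta$,

$$\sup_{j\geq 1}\mathbb{E}\bigl[|h(X_1,X_j)|\,\mathbf{1}_A\bigr]<\varepsilon,$$

and it follows that, for $R$ large enough,

$$\sup_{j\geq 2}\mathbb{E}\bigl|h(X_1,X_j)-h_R(X_1,X_j)\bigr|\leq\varepsilon,$$

and we conclude by the second step. This ends the proof of Theorem 3.1.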

4. Examples of failure of the convergence of $U$-statistics
Example 4.1 given in [1] shows that there exist a stationary ergodic sequence $(X_i)_{i\geq 1}$ and a bounded measurable function $h$ for which the $U$-statistic $\frac{1}{\binom{n}{2}}\sum_{1\leq i<j\leq n}h(X_i,X_j)$ converges, but not to the integral of $h(x,y)$ with respect to the product of the law of $X_1$ with itself.
In a similar setting, we are able to formulate two examples, the first showing that the sequence of $U$-statistics $\frac{1}{\binom{n}{2}}\sum_{1\leq i<j\leq n}h(X_i,X_j)$, $n\geq 2$, may fail to converge in probability even if $|h(X_i,X_j)|$ is bounded by 1, and the second one showing that a centered $U$-statistic may also fail to converge in probability.
We consider the transformation $Tx=2x \bmod 1$ of the unit interval $[0,1)$ equipped with the Borel sigma field $\mathcal{B}$ and Lebesgue measure $\lambda$. We define $X_i(x)=T^ix$ for $i\geq 1$ and $x\in[0,1)$.
4.1. Example 1: non-convergence of the $U$-statistics.
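Proposition 4.1. There exist a strictly stationary ergodic sequence $(X_i)_{i\geq 1}$ and a bounded measurable symmetric function $h$ such that the sequence of $U$-statistics $\frac{1}{\binom{n}{2}}\sum_{1\leq i<j\leq n}h(X_i,X_j)$ does not converge in probability.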
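The mechanism behind this statement is that, almost surely, $h(X_i,X_j)$ depends only on the lag $j-i$: it equals 1 when $j-i$ belongs to a set $I$ of positive integers and 0 otherwise, so the $U$-statistic is a deterministic sequence. The Python sketch below illustrates how such a sequence can oscillate. It assumes that $I$ is a union of blocks of consecutive integers, and the blocks used here are a hypothetical choice with a fixed growth ratio, not the sequences $(N_\ell)$, $(N'_\ell)$ of the proof.

```python
from math import comb


def hypothetical_blocks(ratio, levels):
    """Blocks [N_l, N'_l] with N'_0 = 1, N_l = ratio * N'_{l-1}, N'_l = ratio * N_l.

    This growth rule is an assumption made for illustration only.
    """
    blocks, prev_end = [], 1
    for _ in range(levels):
        start = ratio * prev_end
        end = ratio * start
        blocks.append((start, end))
        prev_end = end
    return blocks


def lag_in_I(k, blocks):
    """True if the lag k lies in one of the blocks forming the set I."""
    return any(start <= k <= end for start, end in blocks)


def u_stat_from_lags(n, blocks):
    """U-statistic when h(X_i, X_j) = 1 if j - i is in I, else 0.

    Exactly n - k pairs 1 <= i < j <= n have lag j - i = k.
    """
    total = sum(n - k for k in range(1, n) if lag_in_I(k, blocks))
    return total / comb(n, 2)


blocks = hypothetical_blocks(20, 2)       # [(20, 400), (8000, 160000)]
low = u_stat_from_lags(8000, blocks)      # n at the start of a block
high = u_stat_from_lags(160000, blocks)   # n at the end of the same block
```

With this choice the value stays below 0.1 at $n=8000$ and exceeds 0.9 at $n=160000$; letting the ratio grow from block to block would push the two subsequences towards 0 and 1.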
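Proof. Let $(N_\ell)_{\ell\geq 1}$ and $(N'_\ell)_{\ell\geq 0}$ be sequences of positive integers such that $N'_0=1$ and, for $\ell\geq 1$, the growth conditions (19) hold; these give in particular $N_\ell<N'_\ell$ and $N'_{\ell-1}/N_\ell\to 0$. We define in (20) a set $I$ of positive integers and a set $G\subset[0,1)^2$ such that $(x,y)\in G$ if and only if $T^kx=y$ for some $k\in I$, and for $x,y\in[0,1)$ a symmetric kernel $h(x,y)$. Since for $i<j$ and $k\geq 1$, the equality $T^ix=T^{k+j}x$ can hold only for a countable set of $x$ (such $x$ are necessarily rational), we obtain for $1\leq i<j$ the identity $h(X_i,X_j)=\mathbf{1}_G(X_i,X_j)$ almost surely. Moreover, by definition, $(X_i,X_j)\in G$ if and only if $T^{i+k}x=T^jx$ for some $k\in I$. Almost surely, the latter identity holds if and only if $k=j-i$, and thus

$$h(X_i,X_j)=\mathbf{1}_I(j-i)\quad\text{almost surely.}$$

In particular, $|h(X_i,X_j)|\leq 1$. By (19) we have an estimate in which the second inequality follows from $N_u<N'_u$, and $N'_{\ell-1}/N_\ell$ goes to 0 by (19).

4.2. Example 2: non-convergence of a centered $U$-statistic.
Proposition 4.2. There exist a strictly stationary ergodic sequence $(X_i)_{i\geq 1}$ and a symmetric measurable function $h$ such that the centered $U$-statistic does not converge in probability.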

Note that in this example
converges in distribution to a centered non-degenerate Gaussian random variable.
By similar arguments as in the proof of Proposition 4.1, the following equality holds almost surely for each $1\leq i<j$: (30). In order to have a better understanding of $U^jf$, we introduce the intervals $I_{j,\ell}$, $1\leq\ell\leq 2^j$ (31).
Lemma 4.3. The sequence $\bigl(U^j(f-1/2)\bigr)_{j\geq 1}$ is a martingale difference sequence with respect to the filtration $(\mathcal{F}_j)_{j\geq 0}$, where $\mathcal{F}_j=\sigma\bigl(I_{j,\ell},\ 1\leq\ell\leq 2^j\bigr)$ and $\mathcal{F}_0=\{\emptyset,\Omega\}$.
Proof. We show by induction on $j\geq 1$ that . .