
Notes on the limiting distribution of incomplete U-statistics

The main results on the limiting distributions of incomplete U-statistics were developed in Blom (1976) [1] and Janson (1984) [2]; Lee (1990) [3] gives a summary. For my taste, however, the proofs in Janson (1984) are somewhat hard to read. Lee (1990) improves upon those but has some inaccuracies (the main one being that $\theta$ is missing), which we will address below. Nevertheless, his stated result seems correct.

Notations

Let $N=\binom{n}{k}$, let $h : \mathbb{R}^k \to \mathbb{R}$ be a symmetric measurable function, and let
\begin{align} U_n = N^{-1}\sum_{(n,k)}h(S), \tag{1} \end{align}
where the index $(n,k)$ denotes summation over all $N$ subsets of $k$ elements of a sample $\left(X_i\right)_{i=1}^n \sim P^n$, that is, $S=\left(X_{i_1},\ldots,X_{i_k}\right)$ with $1\le i_1<\cdots<i_k\le n$ (we hide the dependence on the summation's index). $U_n$ estimates the parameter $\theta(P) =: \theta = \mathbb{E} h\left(X_1,\ldots,X_k\right)$, that is, $\mathbb{E} U_n = \theta$.
Define, for $c=1,\ldots,k$, $h_c(x_1,\ldots,x_c) = \mathbb{E} h\left(x_1,\ldots,x_c,X_{c+1},\ldots,X_k\right)$ and $\sigma_c^2 = \mathrm{Var}\left(h_c(X_1,\ldots,X_c)\right)$, and set $\sigma_0^2=0$. The kernel $h$ is $d$-degenerate if $0=\sigma_1^2=\cdots =\sigma_d^2<\sigma_{d+1}^2$. We call a statistic of the form $U_n' = m^{-1}\sum_{S\in\mathcal D}h(S)$, where $\mathcal D$ is some (suitably chosen) collection of $m$ of the $N$ subsets summed over in (1), an incomplete U-statistic.
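To make the notion of degeneracy concrete, here is a small worked example. The kernel and the moment assumptions are my own illustrative choices, not taken from the references: take $k=2$, $h(x,y)=xy$, and assume $\mathbb{E}X_1=0$ and $\mathbb{E}X_1^2=\tau^2\in(0,\infty)$, so that $\theta=0$.

```latex
\begin{align*}
h_1(x)   &= \mathbb{E}\,h(x,X_2) = x\,\mathbb{E}X_2 = 0,
  & \sigma_1^2 &= \mathrm{Var}\left(h_1(X_1)\right) = 0, \\
h_2(x,y) &= xy,
  & \sigma_2^2 &= \mathrm{Var}\left(X_1X_2\right)
     = \mathbb{E}X_1^2\,\mathbb{E}X_2^2 - \left(\mathbb{E}X_1\,\mathbb{E}X_2\right)^2
     = \tau^4 > 0.
\end{align*}
```

Hence $0=\sigma_1^2<\sigma_2^2$, and this kernel is $1$-degenerate under the stated assumptions.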

In this post, we consider $m$ subsets drawn at random (with replacement) from the $N$ subsets indexed by $(n,k)$. Letting $(Z_S)_{S\in(n,k)}\sim\mathrm{Mult}(m;\frac1N,\ldots,\frac1N)$, where $Z_S$ counts how often the subset $S$ is drawn, we can write this incomplete U-statistic as
\begin{align} U_{n,m}' = m^{-1}\sum_{(n,k)}Z_Sh(S). \tag{2} \end{align}
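The two definitions can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the variance kernel $h(x,y)=(x-y)^2/2$ with $k=2$, standard normal data, and the helper names `kernel`, `complete_u`, and `incomplete_u` are all hypothetical choices, not anything from the references. Drawing $m$ subset indices with replacement and counting them is one way to realize the multinomial counts $Z_S$.

```python
import random
from collections import Counter
from itertools import combinations
from statistics import variance


def kernel(x, y):
    """Illustrative symmetric kernel with k = 2: h(x, y) = (x - y)^2 / 2."""
    return 0.5 * (x - y) ** 2


def complete_u(sample):
    """U_n as in (1): average of h over all N = C(n, 2) subsets."""
    values = [kernel(a, b) for a, b in combinations(sample, 2)]
    return sum(values) / len(values)


def incomplete_u(sample, m, rng):
    """U'_{n,m} as in (2); returns the statistic and the counts Z_S."""
    subsets = list(combinations(range(len(sample)), 2))
    # m uniform draws with replacement from the N subsets; counting the
    # draws gives Z ~ Mult(m; 1/N, ..., 1/N).
    counts = Counter(rng.choices(range(len(subsets)), k=m))
    total = sum(
        z * kernel(sample[subsets[s][0]], sample[subsets[s][1]])
        for s, z in counts.items()
    )
    return total / m, counts


rng = random.Random(0)  # fixed seed, only for reproducibility
x = [rng.gauss(0.0, 1.0) for _ in range(30)]
u_n = complete_u(x)
u_nm, z = incomplete_u(x, m=200, rng=rng)
print(u_n, u_nm)
```

For this particular kernel, $U_n$ coincides with the unbiased sample variance, which gives a convenient check of `complete_u`.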

Limit distribution of sampling with replacement

The following is Theorem 1(ii) (p. 200; Lee, 1990), which is a restatement of Corollary 1 (Janson, 1984).

Theorem 1. Let $U_n$ and $U_{n,m}'$ be as in (1) and (2), respectively, with $h$ of degeneracy $d$. Define $\alpha = \lim_{n,m\to\infty}n^{d+1}m^{-1}$, and assume all necessary variances exist. If $0<\alpha<\infty$, then $$m^{1/2}(U_{n,m}'-\theta) \to^d \alpha^{-1/2}X+\sigma_kY,$$ where $X$ has the distribution such that $n^{(d+1)/2}(U_n-\theta) \to^d X$, $Y\sim\mathcal N(0,1)$, and $X$ and $Y$ are independent. (With $\alpha$ defined as the limit of $n^{d+1}m^{-1}$, the factor in front of $X$ is $\alpha^{-1/2}$, in agreement with the last step of the proof.)
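A quick second-moment sanity check of the scaling can be done by hand. This is a heuristic sketch, not part of the cited proofs, and it additionally assumes that $n^{d+1}\mathrm{Var}(U_n)\to\mathrm{Var}(X)$. Conditionally on the sample, $U_{n,m}'$ is the average of $m$ i.i.d. uniform draws from the $N$ kernel values, so the law of total variance gives:

```latex
\begin{align*}
\mathrm{Var}\left(U_{n,m}'\right)
  &= \mathrm{Var}\left(\mathbb{E}\left[U_{n,m}' \mid X_1,\ldots,X_n\right]\right)
   + \mathbb{E}\,\mathrm{Var}\left(U_{n,m}' \mid X_1,\ldots,X_n\right) \\
  &= \mathrm{Var}(U_n)
   + \frac1m\,\mathbb{E}\left[N^{-1}\sum_{(n,k)}h(S)^2 - U_n^2\right] \\
  &= \frac{\sigma_k^2}{m} + \left(1-\frac1m\right)\mathrm{Var}(U_n), \\
m\,\mathrm{Var}\left(U_{n,m}'\right)
  &= \sigma_k^2 + \frac{m-1}{n^{d+1}}\,n^{d+1}\,\mathrm{Var}(U_n)
   \;\longrightarrow\; \sigma_k^2 + \alpha^{-1}\,\mathrm{Var}(X).
\end{align*}
```

So with $\alpha=\lim n^{d+1}m^{-1}$, the $X$-component enters the limit with weight $\alpha^{-1/2}$ and the sampling noise with weight $\sigma_k$.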

The following proof is as in Lee (1990) with minor corrections.

Proof. Recall that pointwise convergence of characteristic functions (c.f.s) to the c.f. of some random variable implies convergence in distribution ("$\to^d$") to that random variable.

Let $\phi_{n,m}$ be the c.f. of $m^{1/2}(U_{n,m}'-\theta)$ and $\phi$ the limiting c.f. of $n^{(d+1)/2}(U_n-\theta)$, that is, the c.f. of $X$. We will show that $\phi_{n,m}(t) \to \phi\left(\alpha^{-1/2}t\right)e^{-t^2\sigma_k^2/2}$ for $n,m\to\infty$; the first factor is the c.f. of $\alpha^{-1/2}X$ and the second factor is the c.f. of $\sigma_kY$. One has that
\begin{align*}
\phi_{n,m}(t) &= \mathbb{E}\exp\left(it\;m^{1/2}\left(U_{n,m}'-\theta\right)\right) \\
&\stackrel{(a)}{=} \mathbb{E}\exp\left(it\;m^{-1/2}\sum Z_S\left(h(S)-\theta\right)\right) \\
&= \mathbb{E}\,\mathbb{E}\left[\exp\left(it\;m^{-1/2}\sum Z_S\left(h(S)-\theta\right)\right)\Big|X_1,\ldots,X_n\right] \\
&\stackrel{(b)}{=} \mathbb{E}\Big\{\exp\left(it\;m^{1/2}(U_n-\theta)\right)\times \\
&\quad \times \mathbb{E}\left[\exp\left(it\;m^{-1/2}\left(\sum Z_S(h(S)-\theta)-m(U_n-\theta)\right)\right)\Big|X_1,\ldots,X_n\right]\Big\} \\
&\stackrel{(c)}{=} \mathbb{E}\Big\{\exp\left(it\;m^{1/2}(U_n-\theta)\right)\times \\
&\quad \times \mathbb{E}\left[\exp\left(it\;m^{-1/2}\sum \left(Z_S-\frac mN\right)\left(h(S)-\theta\right)\right)\Big|X_1,\ldots,X_n\right]\Big\}.
\end{align*}
The details are as follows. In (a), we use that $m^{-1}\sum Z_S=1$ and reorder. For (b), we add and subtract $m(U_n-\theta)$ in the exponent and pull the factor that depends only on $U_n$ out of the conditional expectation, which is possible because $U_n$ is a function of $X_1,\ldots,X_n$. (c) is by (1), since $m(U_n-\theta)=\frac mN\sum\left(h(S)-\theta\right)$.

By Lemma A (p. 201; Lee, 1990), the conditional distribution of $m^{-1/2}\sum\left(Z_S-\frac mN\right)\left(h(S)-\theta\right)$ given $X_1,\ldots,X_n$ converges to $\mathcal N(0,\sigma_k^2)$ for $m,n\to\infty$, so the inner conditional expectation converges to $e^{-t^2\sigma_k^2/2}$. As all terms are bounded by one in modulus, the inner expectation may be replaced by its limit. One has that
\begin{align*}
\lim_{n,m\to\infty}\phi_{n,m}(t) &= \lim_{n,m\to\infty}\mathbb{E}\exp\left(it\;m^{1/2}(U_n-\theta)\right)e^{-t^2\sigma_k^2/2} \\
&= \lim_{n,m\to\infty}\mathbb{E}\exp\left(it\;m^{1/2}n^{-(d+1)/2}\,n^{(d+1)/2}(U_n-\theta)\right)e^{-t^2\sigma_k^2/2} \\
&= \phi\left(\alpha^{-1/2}t\right)e^{-t^2\sigma_k^2/2},
\end{align*}
where the last equality uses the assumptions $m^{1/2}n^{-(d+1)/2}\to\alpha^{-1/2}$ and $n^{(d+1)/2}(U_n-\theta)\to^d X$. The limit is the c.f. of $\alpha^{-1/2}X+\sigma_kY$ with $X$ and $Y$ independent, concluding the proof.
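As a rough numerical plausibility check (not a substitute for the proof), one can simulate the non-degenerate case $d=0$. Everything specific here is my own assumption: the variance kernel $h(x,y)=(x-y)^2/2$, standard normal data (so $\theta=1$, $\sigma_1^2=1/2$, $\sigma_2^2=2$, and $X\sim\mathcal N(0,k^2\sigma_1^2)=\mathcal N(0,2)$), the choice $n=m$ (so $\alpha=1$), and the hypothetical helper names. Under these assumptions the limit $\alpha^{-1/2}X+\sigma_kY$ has variance $2/\alpha+2=4$.

```python
import random
from statistics import mean, variance


def kernel(x, y):
    """Illustrative symmetric kernel with k = 2: h(x, y) = (x - y)^2 / 2."""
    return 0.5 * (x - y) ** 2


def incomplete_u(sample, m, rng):
    """U'_{n,m}: average of h over m subsets drawn with replacement."""
    n = len(sample)
    total = 0.0
    for _ in range(m):
        # Two distinct indices; the kernel is symmetric, so this is a
        # uniform draw from the N = C(n, 2) subsets.
        i, j = rng.sample(range(n), 2)
        total += kernel(sample[i], sample[j])
    return total / m


def simulate(n, m, reps, seed=0):
    """Draws of m^{1/2} (U'_{n,m} - theta) for N(0, 1) data, theta = 1."""
    rng = random.Random(seed)
    theta = 1.0
    draws = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append(m ** 0.5 * (incomplete_u(x, m, rng) - theta))
    return draws


n, m = 50, 50
alpha = n / m                   # n^{d+1} / m with d = 0
limit_var = 2.0 / alpha + 2.0   # Var(alpha^{-1/2} X + sigma_k Y)
draws = simulate(n, m, reps=4000)
est_var = variance(draws)
print(mean(draws), est_var, limit_var)
```

The empirical variance should be close to `limit_var`, up to Monte Carlo error. Changing the ratio `n / m` changes the weight of the $X$-component accordingly.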

References


  1. Blom, G. (1976). Some properties of incomplete U-statistics. Biometrika, 63(3), 573-580. 

  2. Janson, S. (1984). The asymptotic distributions of incomplete U-statistics. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 66(4), 495-505. 

  3. Lee, A. J. (1990). U-statistics: Theory and practice. Statistics: Textbooks and Monographs, 110. Marcel Dekker, New York.

© Florian Kalinke. Built using Pelican. Theme by Giulio Fidente on github.