The main results on the limiting distributions of incomplete U-statistics were developed in Blom (1976) and Janson (1984); Lee (1990) gives a summary. However, for my taste, the proofs in Janson (1984) are somewhat hard to read. Lee (1990) improves upon those but has some inaccuracies (the main one being that $\theta$ is missing), which we will address below. Nevertheless, his stated result seems correct.
Notations
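Let $N = \binom{n}{k}$, let $h: \mathbb{R}^k \to \mathbb{R}$ be a symmetric measurable function, and
$$U_n = N^{-1} \sum_{(n,k)} h(S), \tag{1}$$
where the index $(n,k)$ denotes summation over all $N$ subsets of $k$ elements of a sample $(X_i)_{i=1}^n \sim P^n$, that is, $S = (X_{i_1}, \dots, X_{i_k})$ with $i_1 < \dots < i_k$ (we suppress the dependence of $S$ on the summation index). $U_n$ estimates the parameter $\theta(P) =: \theta = \mathbb{E}h(X_1, \dots, X_k)$, that is, $\mathbb{E}U_n = \theta$. Define, for $c = 1, \dots, k$, $h_c(x_1, \dots, x_c) = \mathbb{E}h(x_1, \dots, x_c, X_{c+1}, \dots, X_k)$, $\sigma_c^2 = \operatorname{Var}(h_c(X_1, \dots, X_c))$, and $\sigma_0^2 = 0$. The kernel $h$ is $d$-degenerate if $0 = \sigma_1^2 = \dots = \sigma_d^2 < \sigma_{d+1}^2$.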
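To make definition (1) concrete, here is a minimal sketch that averages a kernel over all $N$ subsets of size $k$. The function names and the choice of the variance kernel $h(x, y) = (x - y)^2/2$ (for which $\theta = \operatorname{Var}(X_1)$) are illustrative assumptions, not taken from the references.

```python
from itertools import combinations
from math import comb


def variance_kernel(x, y):
    # Symmetric kernel of order k = 2; its expectation is Var(X).
    return 0.5 * (x - y) ** 2


def complete_u_statistic(sample, kernel, k):
    # Average the kernel over all N = comb(n, k) subsets of size k, as in (1).
    n = len(sample)
    total = sum(kernel(*subset) for subset in combinations(sample, k))
    return total / comb(n, k)


data = [1.0, 2.0, 4.0, 7.0]
u_n = complete_u_statistic(data, variance_kernel, k=2)
print(u_n)  # coincides with the unbiased sample variance of data
```

For this kernel, $U_n$ is exactly the usual unbiased sample variance, which gives a quick way to check the implementation.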
We call a statistic of the form
$$U_n' = m^{-1} \sum_{S \in D} h(S),$$
where $D$ is some (suitably chosen) subset, of cardinality $m$, of the $N$ sets in (1), an incomplete U-statistic.
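In this post, we consider $m$ subsets drawn at random, with replacement, from the $N$ sets in (1). Letting $(Z_S)_S \sim \operatorname{Mult}(m; N^{-1}, \dots, N^{-1})$, indexed by the $N$ subsets $S$, we can write this incomplete U-statistic as
$$U_{n,m}' = m^{-1} \sum_{(n,k)} Z_S\, h(S). \tag{2}$$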
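The representation (2) can be sketched as follows: draw $m$ subsets uniformly with replacement, count how often each one appears (these counts play the role of $Z_S$), and form the weighted sum. The function names and the variance kernel are hypothetical, and listing all $N$ subsets explicitly is a simplification that is only viable for small $n$.

```python
import random
from collections import Counter
from itertools import combinations


def variance_kernel(x, y):
    # Illustrative symmetric kernel of order 2.
    return 0.5 * (x - y) ** 2


def incomplete_u_statistic(sample, kernel, k, m, rng):
    # All N index sets; feasible only for small n.
    subsets = list(combinations(range(len(sample)), k))
    # m uniform draws with replacement from the N subsets.
    draws = rng.choices(subsets, k=m)
    # counts[S] plays the role of Z_S; the counts sum to m.
    counts = Counter(draws)
    total = sum(z * kernel(*(sample[i] for i in s)) for s, z in counts.items())
    return total / m, counts


rng = random.Random(0)
data = [1.0, 2.0, 4.0, 7.0]
value, counts = incomplete_u_statistic(data, variance_kernel, k=2, m=10, rng=rng)
print(value, sum(counts.values()))
```

Subsets that are never drawn have $Z_S = 0$ and simply do not appear in `counts`, so the sum over drawn subsets equals the sum over all $N$ subsets in (2).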
Limit distribution of sampling with replacement
The following is Theorem 1(ii) (p. 200; Lee, 1990), which is a restatement of Corollary 1 (Janson, 1984).
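Theorem 1. Let $U_n$ and $U_{n,m}'$ be as in (1) and (2), respectively, with $h$ of degeneracy $d$. Define $\alpha = \lim_{n,m \to \infty} n^{d+1} m^{-1}$, and assume all necessary variances exist. If $0 < \alpha < \infty$, then
$$m^{1/2}(U_{n,m}' - \theta) \xrightarrow{d} \alpha^{-1/2} X + \sigma_k Y,$$
where $X$ has the distribution such that $n^{(d+1)/2}(U_n - \theta) \xrightarrow{d} X$, $Y \sim N(0,1)$, and $X$ and $Y$ are independent.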
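A small Monte Carlo experiment can make the theorem tangible in the non-degenerate case $d = 0$. The sketch below assumes standard normal data and the variance kernel $h(x, y) = (x - y)^2/2$ (both illustrative choices), for which $\theta = 1$, $\sigma_1^2 = 1/2$ and $\sigma_2^2 = 2$. It also uses the classical fact, not stated above, that for $d = 0$ the limit $X$ is $N(0, k^2\sigma_1^2)$. Choosing $m = n$ gives $\alpha = 1$, so the limit is $X + \sigma_k Y$ with variance $k^2\sigma_1^2 + \sigma_k^2 = 2 + 2 = 4$.

```python
import random
from statistics import variance


def variance_kernel(x, y):
    # Illustrative kernel of order 2 with theta = Var(X) = 1 for N(0, 1) data.
    return 0.5 * (x - y) ** 2


def incomplete_u(sample, m, rng):
    # m uniform draws (with replacement) of pairs of distinct indices.
    n = len(sample)
    total = 0.0
    for _ in range(m):
        i, j = rng.sample(range(n), 2)
        total += variance_kernel(sample[i], sample[j])
    return total / m


def simulate(n, m, reps, seed=0):
    # Returns reps replicates of m^{1/2} (U'_{n,m} - theta).
    rng = random.Random(seed)
    theta = 1.0
    out = []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        out.append(m ** 0.5 * (incomplete_u(sample, m, rng) - theta))
    return out


draws = simulate(n=50, m=50, reps=2000)
empirical_variance = variance(draws)
print(empirical_variance)  # expected to be near 4 under the assumptions above
```

The experiment checks only the variance of the limit, not its full distribution, and the agreement is approximate for finite $n$ and a finite number of replicates.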
The following proof is as in Lee (1990) with minor corrections.
Proof. Recall that pointwise convergence of characteristic functions (c.f.s) to the c.f. of a random variable implies convergence in distribution ("$\xrightarrow{d}$") to that random variable.
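Let $\phi_{n,m}$ be the c.f. of $m^{1/2}(U_{n,m}' - \theta)$ and $\phi$ the c.f. of $X$, that is, the limiting c.f. of $n^{(d+1)/2}(U_n - \theta)$. We will show that $\phi_{n,m}(t) \to \phi(\alpha^{-1/2} t)\, e^{-t^2 \sigma_k^2/2}$ for $n, m \to \infty$; the first factor is $\phi$ evaluated at $\alpha^{-1/2} t$, and the second is the c.f. of $\sigma_k Y$. One has that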
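$$\begin{aligned}
\phi_{n,m}(t) &= \mathbb{E}\exp\bigl(it\, m^{1/2}(U_{n,m}' - \theta)\bigr) \\
&\overset{(a)}{=} \mathbb{E}\exp\Bigl(it\, m^{-1/2} \sum Z_S (h(S) - \theta)\Bigr) \\
&= \mathbb{E}\,\mathbb{E}\Bigl[\exp\Bigl(it\, m^{-1/2} \sum Z_S (h(S) - \theta)\Bigr) \,\Big|\, X_1, \dots, X_n\Bigr] \\
&\overset{(b)}{=} \mathbb{E}\Bigl\{\exp\bigl(it\, m^{1/2}(U_n - \theta)\bigr) \times \mathbb{E}\Bigl[\exp\Bigl(it\, m^{-1/2}\Bigl(\sum Z_S (h(S) - \theta) - m(U_n - \theta)\Bigr)\Bigr) \,\Big|\, X_1, \dots, X_n\Bigr]\Bigr\} \\
&\overset{(c)}{=} \mathbb{E}\Bigl\{\exp\bigl(it\, m^{1/2}(U_n - \theta)\bigr) \times \mathbb{E}\Bigl[\exp\Bigl(it\, m^{-1/2} \sum \bigl(Z_S - \tfrac{m}{N}\bigr)(h(S) - \theta)\Bigr) \,\Big|\, X_1, \dots, X_n\Bigr]\Bigr\}.
\end{aligned}$$
The details are as follows. In (a), we use that $m^{-1} \sum Z_S = 1$ and reorder. For (b), we add and subtract $m(U_n - \theta)$ in the exponent and pull the factor $\exp\bigl(it\, m^{1/2}(U_n - \theta)\bigr)$, which depends on $X_1, \dots, X_n$ only, out of the conditional expectation. (c) is by (1), since $m(U_n - \theta) = (m/N) \sum (h(S) - \theta)$.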
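By Lemma A (p. 201; Lee, 1990), the conditional distribution of $m^{-1/2} \sum (Z_S - m/N)(h(S) - \theta)$ given $X_1, \dots, X_n$ converges to $N(0, \sigma_k^2)$ for $m, n \to \infty$, so the conditional expectation converges to the corresponding c.f., $e^{-t^2 \sigma_k^2/2}$. Since this limit is non-random and every factor is bounded by 1 in modulus, it can be taken outside the outer expectation in the limit. One has that
$$\begin{aligned}
\lim_{n,m \to \infty} \phi_{n,m}(t) &= \lim_{n,m \to \infty} \mathbb{E}\exp\bigl(it\, m^{1/2}(U_n - \theta)\bigr)\, e^{-t^2 \sigma_k^2/2} \\
&= \lim_{n,m \to \infty} \mathbb{E}\exp\bigl(it\, m^{1/2} n^{-(d+1)/2}\, n^{(d+1)/2}(U_n - \theta)\bigr)\, e^{-t^2 \sigma_k^2/2} \\
&= \phi(\alpha^{-1/2} t)\, e^{-t^2 \sigma_k^2/2},
\end{aligned}$$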
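where the last equality uses the assumptions $m^{1/2} n^{-(d+1)/2} \to \alpha^{-1/2}$ and $n^{(d+1)/2}(U_n - \theta) \xrightarrow{d} X$. Since a product of c.f.s is the c.f. of a sum of independent random variables, the right-hand side is the c.f. of the limit in Theorem 1, concluding the proof.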
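The Lemma A step can be sanity-checked numerically. Conditionally on the sample, $\sum Z_S h(S)$ is a sum of $m$ i.i.d. uniform draws from the $N$ kernel values, so the conditional variance of $m^{-1/2} \sum (Z_S - m/N)(h(S) - \theta)$ equals the population variance of those $N$ values, whatever $m$ is. The sketch below computes this quantity for the variance kernel $h(x, y) = (x - y)^2/2$ on standard normal data (both are illustrative assumptions, not from the references), where $\sigma_k^2 = \sigma_2^2 = 2$.

```python
import random
from itertools import combinations


def variance_kernel(x, y):
    # Illustrative kernel; for N(0, 1) data, Var(h(X1, X2)) = 2.
    return 0.5 * (x - y) ** 2


def conditional_variance(sample, kernel, k):
    # Population variance of the N kernel values h(S) for a fixed sample.
    values = [kernel(*subset) for subset in combinations(sample, k)]
    mean = sum(values) / len(values)  # this is U_n
    return sum((v - mean) ** 2 for v in values) / len(values)


rng = random.Random(0)
sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
cond_var = conditional_variance(sample, variance_kernel, k=2)
print(cond_var)  # expected to be near sigma_k^2 = 2 for large n
```

This only checks the second moment of the conditional distribution; the asymptotic normality itself is the content of Lemma A.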
References
Blom, G. (1976). Some properties of incomplete U-statistics. Biometrika, 63(3), 573-580.
Janson, S. (1984). The asymptotic distributions of incomplete U-statistics. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 66(4), 495-505.
Lee, A. J. (1990). U-statistics: Theory and practice. Statistics: Textbooks and Monographs, 110. Marcel Dekker, New York.