Let's tackle the simplest case, to try to provide the most intuition. Let $X_1, X_2, \ldots, X_n$ be iid samples from a discrete distribution with $k$ outcomes. Let $\pi_1, \ldots, \pi_k$ be the probabilities of each particular outcome. We are interested in the (asymptotic) distribution of the chi-squared statistic
$$
X^2 = \sum_{i=1}^k \frac{(S_i - n\pi_i)^2}{n\pi_i} \,.
$$
Here $S_i$ is the observed count of the $i$th outcome among the $n$ samples, and $n\pi_i$ is the expected count of the $i$th outcome.
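As a quick numerical illustration, the statistic can be computed directly from simulated counts. This is a minimal NumPy sketch; the values of $k$, $\pi$, and $n$ below are illustrative choices, not anything from the derivation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative example: k = 3 outcomes with probabilities pi.
pi = np.array([0.2, 0.3, 0.5])
n = 1000

# S[i] is the observed count of outcome i in n iid draws,
# so S is a single multinomial sample.
S = rng.multinomial(n, pi)

# Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
X2 = np.sum((S - n * pi) ** 2 / (n * pi))
print(X2)
```

Under the null, values of `X2` much larger than $k-1$ (the mean of the limiting distribution derived below) would be evidence against the hypothesized $\pi$.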
A suggestive heuristic
Define $U_i = (S_i - n\pi_i)/\sqrt{n\pi_i}$, so that $X^2 = \sum_i U_i^2 = \|U\|_2^2$, where $U = (U_1, \ldots, U_k)$.
Since $S_i$ is $\mathrm{Bin}(n, \pi_i)$, then by the central limit theorem,
$$
T_i = \frac{U_i}{\sqrt{1-\pi_i}} = \frac{S_i - n\pi_i}{\sqrt{n\pi_i(1-\pi_i)}} \xrightarrow{d} N(0,1) \,,
$$
hence, we also have that $U_i \xrightarrow{d} N(0, 1-\pi_i)$.
Now, if the $T_i$ were (asymptotically) independent (which they aren't), then we could argue that $\sum_i T_i^2$ was asymptotically $\chi^2_k$ distributed. But, note that since the counts $S_i$ sum to $n$, $T_k$ is a deterministic function of $(T_1, \ldots, T_{k-1})$, and so the $T_i$ variables can't possibly be independent.
Hence, we must take into account the covariance between them somehow. It turns out that the "correct" way to do this is to use the $U_i$ instead, and the covariance between the components of $U$ also changes the asymptotic distribution from what we might have thought was $\chi^2_k$ to what is, in fact, a $\chi^2_{k-1}$.
Some details on this follow.
A more rigorous treatment
It is not hard to check that, in fact,
$$
\mathrm{Cov}(U_i, U_j) = -\sqrt{\pi_i \pi_j} \,, \quad i \neq j \,,
$$
so the covariance matrix of $U$ is
$$
A = I - \sqrt{\pi}\, \sqrt{\pi}^T \,,
$$
where $\sqrt{\pi} = (\sqrt{\pi_1}, \ldots, \sqrt{\pi_k})$. Note that $A$ is symmetric and idempotent, i.e., $A = A^2 = A^T$. So, in particular, if $Z = (Z_1, \ldots, Z_k)$ has iid standard normal components, then $AZ \sim N(0, A)$. (NB: The multivariate normal distribution in this case is degenerate.)
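These algebraic facts about $A$ are easy to verify numerically. The sketch below (with an illustrative $\pi$ of my own choosing) checks that $A = I - \sqrt{\pi}\,\sqrt{\pi}^T$ is symmetric and idempotent, and that its eigenvalues are $0$ and $1$, with $1$ appearing $k-1$ times:

```python
import numpy as np

pi = np.array([0.2, 0.3, 0.5])  # illustrative probabilities summing to 1
s = np.sqrt(pi)                 # the vector sqrt(pi); note ||s||^2 = 1
A = np.eye(len(pi)) - np.outer(s, s)

# A is symmetric and idempotent: A = A^T = A^2.
assert np.allclose(A, A.T)
assert np.allclose(A, A @ A)

# Eigenvalues are 0 or 1; the eigenvalue 1 has multiplicity k - 1,
# so the eigenvalues sum (= trace = rank) to k - 1.
eigvals = np.linalg.eigvalsh(A)
print(eigvals)
```

The single zero eigenvalue corresponds to the eigenvector $\sqrt{\pi}$ itself, which is the direction "used up" by the constraint that the counts sum to $n$.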
Now, by the multivariate central limit theorem, the vector $U$ has an asymptotic multivariate normal distribution with mean $0$ and covariance $A$.
So, $U$ has the same asymptotic distribution as $AZ$; hence, by the continuous mapping theorem, the asymptotic distribution of $X^2 = U^T U$ is the same as the distribution of $Z^T A^T A Z = Z^T A Z$.
But, $A$ is symmetric and idempotent, so (a) it has orthogonal eigenvectors, (b) all of its eigenvalues are $0$ or $1$, and (c) the multiplicity of the eigenvalue $1$ is $\mathrm{rank}(A)$. This means that $A$ can be decomposed as $A = Q D Q^T$, where $Q$ is orthogonal and $D$ is a diagonal matrix with $\mathrm{rank}(A)$ ones on the diagonal and the remaining diagonal entries zero.
Thus, $Z^T A Z = (Q^T Z)^T D\, (Q^T Z)$, and since $Q$ is orthogonal, $Q^T Z$ again has iid standard normal components; so $Z^T A Z$ is a sum of $\mathrm{rank}(A)$ squared iid standard normals and must be $\chi^2_{k-1}$ distributed, since $A$ has rank $k-1$ in our case (for idempotent $A$, $\mathrm{rank}(A) = \mathrm{tr}(A) = k - \sum_i \pi_i = k - 1$).
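The limiting $\chi^2_{k-1}$ claim can be checked by simulation. Below is a rough Monte Carlo sketch (the choices of $\pi$, $n$, and the replication count are mine and purely illustrative): the simulated statistics should have mean and variance close to those of $\chi^2_{k-1}$, namely $k-1$ and $2(k-1)$.

```python
import numpy as np

rng = np.random.default_rng(1)

pi = np.array([0.2, 0.3, 0.5])  # illustrative; k = 3
k, n, reps = len(pi), 500, 20000

# Simulate the chi-squared statistic many times under the null:
# each row of S is one multinomial sample of counts.
S = rng.multinomial(n, pi, size=reps)                # reps x k counts
X2 = ((S - n * pi) ** 2 / (n * pi)).sum(axis=1)      # reps statistics

# Compare with chi^2_{k-1}: mean k - 1 = 2 and variance 2(k - 1) = 4.
print(X2.mean(), X2.var())
```

With $k = 3$, the printed sample mean and variance should land near $2$ and $4$, and a Q-Q plot of `X2` against $\chi^2_2$ quantiles (e.g., via `scipy.stats.chi2.ppf`) would hug the diagonal.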
Other connections
The chi-square statistic is also closely related to likelihood ratio
statistics. Indeed, it is a Rao score statistic and can be viewed as a
Taylor-series approximation of the likelihood ratio statistic.
References
This is my own development based on experience, but obviously influenced by classical texts. Good places to look to learn more are
- G. A. F. Seber and A. J. Lee (2003), Linear Regression Analysis, 2nd ed., Wiley.
- E. Lehmann and J. Romano (2005), Testing Statistical Hypotheses, 3rd ed., Springer. Section 14.3 in particular.
- D. R. Cox and D. V. Hinkley (1979), Theoretical Statistics, Chapman and Hall.