Why shouldn't the denominator of the covariance estimator be n-2 rather than n-1?

The denominator of the (unbiased) variance estimator is $n-1$ because there are $n$ observations and only one parameter is being estimated.

$\mathbb{V}\left(X\right)=\frac{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^{2}}{n-1}$

By the same token, I wonder why the denominator of the covariance shouldn't be $n-2$, given that two parameters are estimated?

$\mathbb{Cov}\left(X, Y\right)=\frac{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)\left(Y_{i}-\overline{Y}\right)}{n-1}$

— MYaseen208

If you did that, you would end up with two contradictory definitions of the variance: one would be the first formula, and the other would be the second formula applied with $Y=X$.

— whuber

The bi-/multivariate mean (expectation) is one parameter, not 2.

— ttnphns

@ttnphns That's not true: the bivariate mean is obviously two parameters, because two real numbers are needed to express it. (Indeed, it is a single vector parameter, but saying so merely masks the fact that it has two components.) This shows up explicitly in the degrees of freedom for pooled-variance tests, for instance, where $2$ rather than $1$ is subtracted. What is interesting about this question is how it reveals how vague, unrigorous, and potentially misleading the common "explanation" is that we subtract $1$ from $n$ because one parameter has been estimated.

— whuber

@whuber, you're right about that. If only $n$ (the independent observations) mattered, we wouldn't spend more df in multivariate tests than in univariate ones.

— ttnphns

@whuber: Perhaps I would say this shows that what counts as a "parameter" depends on the situation. In this case the variance is computed over $n$ observations, and so each observation, or the overall mean, can be regarded as one parameter, even if it is a multivariate mean, as ttnphns said. However, in other cases, for example when a test considers linear combinations of the dimensions, each dimension of each observation becomes a "parameter". You're right that this is a tricky issue.

— amoeba says Reinstate Monica

Covariances are variances.

Since, by the polarization identity,

$\text{Cov}(X,Y) = \text{Var}\left(\frac{X+Y}{2}\right) - \text{Var}\left(\frac{X-Y}{2}\right),$

the denominators must be the same.
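A minimal numerical check of that argument, using hypothetical helpers `sample_var` and `sample_cov` that both divide by $n-1$ (the data are random and purely illustrative). The polarization identity holds exactly for the sample versions only when the covariance uses the same denominator as the variance; switching the covariance to $n-2$ breaks it.

```python
import random


def sample_var(xs):
    """Unbiased sample variance: squared deviations divided by n - 1."""
    n = len(xs)
    m = sum(xs) / n
    return sum((x - m) ** 2 for x in xs) / (n - 1)


def sample_cov(xs, ys, denom=None):
    """Sample covariance; the denominator defaults to n - 1, like sample_var."""
    n = len(xs)
    denom = n - 1 if denom is None else denom
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / denom


random.seed(0)
xs = [random.gauss(0, 1) for _ in range(10)]
ys = [x + random.gauss(0, 1) for x in xs]

# Right-hand side of the polarization identity, built from sample variances.
half_sum = [(x + y) / 2 for x, y in zip(xs, ys)]
half_diff = [(x - y) / 2 for x, y in zip(xs, ys)]
polarized = sample_var(half_sum) - sample_var(half_diff)

print(abs(sample_cov(xs, ys) - polarized) < 1e-12)                     # True
print(abs(sample_cov(xs, ys, denom=len(xs) - 2) - polarized) < 1e-12)  # False
```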

— дзижчати

A special case should give you some intuition; consider the following:

$\hat{\mathbb{Cov}}\left(X, X\right)= \hat{\mathbb{V}}\left(X\right)$

You're happy that the latter is $\frac{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)^{2}}{n-1}$ thanks to Bessel's correction.

But replacing $Y$ with $X$ in $\hat{\mathbb{Cov}}\left(X, Y\right)$ for the former gives $\frac{\sum_{i=1}^{n}\left(X_{i}-\overline{X}\right)\left(X_{i}-\overline{X}\right)}{\text{mystery denominator}}$, so what do you think might best fill in the blank?
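A small sketch of this special case with made-up data and a hypothetical helper `cross_products_over`. Filling the blank with $n-1$ reproduces Python's `statistics.variance` (which divides by $n-1$), while $n-2$ does not.

```python
import math
import statistics


def cross_products_over(xs, ys, denom):
    """Sum of (x - mean_x) * (y - mean_y), divided by a caller-chosen denominator."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / denom


xs = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
n = len(xs)

bessel = statistics.variance(xs)               # stdlib sample variance, denominator n - 1
fill_n1 = cross_products_over(xs, xs, n - 1)   # "mystery denominator" = n - 1
fill_n2 = cross_products_over(xs, xs, n - 2)   # "mystery denominator" = n - 2

print(math.isclose(fill_n1, bessel))  # True
print(math.isclose(fill_n2, bessel))  # False
```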

— Silverfish

OK. But the OP might ask: "Why treat cov(X,X) and cov(X,Y) in the same line of logic? Why do you substitute X for Y in cov() so lightly? Maybe cov(X,Y) is a different situation?" You didn't forestall that, although the answer (highly upvoted) should have, in my opinion :-)

— ttnphns

Quick and dirty answer... Consider $\text{var}(X)$ first: if you had $n$ observations with known expected value $E(X) = 0$, you would use ${1\over n}\sum_{i=1}^n X_i^2$ to estimate the variance.

Since the expected value is unknown, you can transform your $n$ observations into $n-1$ observations with known expected value by taking $A_i = X_i - X_1$ for $i = 2, \dots,n$. You get a formula with $n-1$ in the denominator; however, the $A_i$ are not independent, and you have to take this into account; in the end you find the usual formula.

Now for the covariance you can use the same idea: if the expected value of $(X,Y)$ were $(0,0)$, you would have a ${1\over n}$ in the formula. By subtracting $(X_1,Y_1)$ from all the other observed values, you get $n-1$ observations with known expected value... and a ${1\over n-1}$ in the formula; once again, this introduces some dependence to take into account.
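One way to make "take this dependence into account" concrete (a reconstruction, not necessarily the route the answer has in mind): the differences $A_i = X_i - X_1$, $B_i = Y_i - Y_1$ satisfy $\text{Cov}(A_i, B_j) = \sigma_{XY}(\delta_{ij} + 1)$, i.e. their cross-covariance matrix is $\sigma_{XY}(I + J)$ with $J$ the $(n-1)\times(n-1)$ all-ones matrix, and $(I+J)^{-1} = I - J/n$. Weighting by that inverse gives an unbiased estimator that coincides with the usual $n-1$ formula, whereas naively averaging the $A_iB_i$ targets $2\sigma_{XY}$. All function names are hypothetical.

```python
import math


def differences_from_first(vs):
    """A_i = V_i - V_1 for i = 2, ..., n: mean zero whatever the unknown expectation."""
    return [v - vs[0] for v in vs[1:]]


def naive_estimate(xs, ys):
    """Treats the n - 1 differences as independent; its expectation is 2 * Cov(X, Y)."""
    a, b = differences_from_first(xs), differences_from_first(ys)
    return sum(ai * bi for ai, bi in zip(a, b)) / len(a)


def dependence_aware_estimate(xs, ys):
    """a' (I - J/n) b / (n - 1), where I - J/n inverts the (I + J) structure."""
    n = len(xs)
    a, b = differences_from_first(xs), differences_from_first(ys)
    quad = sum(ai * bi for ai, bi in zip(a, b)) - sum(a) * sum(b) / n
    return quad / (n - 1)


def usual_sample_cov(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


xs = [1.0, 2.0, 4.0, 7.0]
ys = [2.0, 1.0, 5.0, 6.0]
print(round(naive_estimate(xs, ys), 3))  # 10.667
print(math.isclose(dependence_aware_estimate(xs, ys), usual_sample_cov(xs, ys)))  # True
```

Passing `ys = xs` gives the variance version of the same argument.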

P.S. The clean way to do this is to choose an orthonormal basis of $\big\langle (1, \dots, 1)' \big\rangle^{\perp}$, that is, $n-1$ vectors $c_1, \dots, c_{n-1} \in \mathbb R^n$ such that

$\sum_j c_{ij}^2 = 1$ for all $i$,
$\sum_j c_{ij} = 0$ for all $i$,
$\sum_j c_{i_1j} c_{i_2j} = 0$ for all $i_1 \ne i_2$.

Then define the $n-1$ pairs $A_i = \sum_j c_{ij} X_j$, $B_i = \sum_j c_{ij} Y_j$. The $(A_i,B_i)$ are uncorrelated with one another (independent if the data are Gaussian), have expected value $(0,0)$, and have the same variance/covariance as the original variables.
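A sketch of this construction, assuming the Helmert contrasts as the orthonormal basis (one standard choice; the answer does not name one). Because the rows span the complement of $(1,\dots,1)'$, $\sum_i A_iB_i = \sum_j (X_j-\bar X)(Y_j-\bar Y)$, so averaging the $n-1$ products reproduces the usual estimator. The helper names are hypothetical.

```python
import math


def helmert_rows(n):
    """n - 1 Helmert contrasts: an orthonormal basis of the subspace orthogonal
    to (1, ..., 1) in R^n."""
    rows = []
    for k in range(1, n):
        s = math.sqrt(k * (k + 1))
        rows.append([1 / s] * k + [-k / s] + [0.0] * (n - k - 1))
    return rows


def cov_via_basis(xs, ys):
    """Average of A_i * B_i over the n - 1 transformed pairs (known mean (0, 0))."""
    c = helmert_rows(len(xs))
    a = [sum(cij * x for cij, x in zip(row, xs)) for row in c]
    b = [sum(cij * y for cij, y in zip(row, ys)) for row in c]
    return sum(ai * bi for ai, bi in zip(a, b)) / len(c)


def usual_sample_cov(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


xs = [1.0, 2.0, 4.0, 7.0, 3.0]
ys = [2.0, 1.0, 5.0, 6.0, 0.0]
print(math.isclose(cov_via_basis(xs, ys), usual_sample_cov(xs, ys)))  # True
```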

The whole point is that if you want to get rid of the unknown expectation, you drop one (and only one) observation. This works the same way in both cases.

— Elvis

Here is a proof that the $p$-variate sample covariance estimator with factor $\frac{1}{n-1}$ is an unbiased estimator of the covariance matrix:

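Let $x$ be a $p$-variate random vector with mean $\mu$ and covariance matrix

$\Sigma= E((x-\mu)(x-\mu)')$,

and let $x_1,\dots,x_n$ be $n$ independent draws of $x$ with sample mean $\bar{x}$. Define

$S = \frac{1}{n} \sum (x_i - \bar{x})(x_i - \bar{x})'$.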

To show: $E(S) = \frac{n-1}{n}\Sigma$

Proof: $S= \frac{1}{n}\sum x_ix_i' - \bar{x}\bar{x}'$

(1) $E(x_ix_i') = \Sigma + \mu\mu'$

(2) $E(\bar{x}\bar{x}') = \frac{1}{n} \Sigma+ \mu\mu'$

Therefore: $E(S) = \Sigma + \mu\mu' - (\frac{1}{n} \Sigma+ \mu\mu') = \frac{n-1}{n} \Sigma$

And so $S_u = \frac{n}{n-1}S$, with the final factor $\frac{1}{n-1}$, is unbiased. The off-diagonal elements of $S_u$ are your individual sample covariances.

Additional remarks:

The $n$ draws are independent. This is used in (2) to calculate the covariance of the sample mean.
Steps (1) and (2) use the fact that $\text{Cov}(x)= E[xx']-\mu\mu'$.
Step (2) uses the fact that $\text{Cov}(\bar{x})= \frac{1}{n}\Sigma$.
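A Monte Carlo sketch of the result for one off-diagonal entry, under illustrative assumptions (bivariate normal with means $(1,-2)$, $\text{Var}(X)=1$, $\text{Var}(Y)=2$, $\text{Cov}(X,Y)=0.5$, samples of size $n=5$). The average cross-product sum should be close to $(n-1)\,\text{Cov}(X,Y)$, so dividing by $n-1$ is on target, dividing by $n$ undershoots by the factor $\frac{n-1}{n}$, and the $n-2$ from the question overshoots. The function name is hypothetical.

```python
import math
import random


def average_cross_product_sum(n=5, reps=50_000, seed=1):
    """Monte Carlo mean of sum_i (x_i - xbar)(y_i - ybar) over `reps` samples of size n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        xs, ys = [], []
        for _ in range(n):
            z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
            # Cholesky construction: Var(X) = 1, Var(Y) = 0.25 + 1.75 = 2, Cov(X, Y) = 0.5
            xs.append(1.0 + z1)
            ys.append(-2.0 + 0.5 * z1 + math.sqrt(1.75) * z2)
        mx, my = sum(xs) / n, sum(ys) / n
        total += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return total / reps


n = 5
avg_sp = average_cross_product_sum(n)
estimates = {"n": avg_sp / n, "n-1": avg_sp / (n - 1), "n-2": avg_sp / (n - 2)}
print(estimates)  # the "n-1" entry should be close to the true covariance 0.5
```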

— statchrist

The difficulty being in step 2! :)

— Elvis

@Elvis It's messy. One needs to apply the rule $\text{Cov}(X+Y,Z)=\text{Cov}(X,Z) + \text{Cov}(Y,Z)$ and recognize that the different draws are independent. Then it's basically summing up the covariance $n$ times and scaling it down by $1/n^2$.
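The bookkeeping described in that comment, written out as a hedged sketch for one off-diagonal entry: by bilinearity $\text{Cov}(\bar{x}_{\cdot 1}, \bar{x}_{\cdot 2}) = \frac{1}{n^2}\sum_i\sum_j \text{Cov}(x_{i1}, x_{j2})$, and independence of the draws zeroes every term with $i \ne j$. The helper name and the value of $\sigma_{12}$ are illustrative.

```python
from fractions import Fraction


def cov_of_sample_means(sigma_12, n):
    """Cov(mean of first components, mean of second components) via bilinearity.

    Independent draws: Cov(x_i1, x_j2) = sigma_12 when i == j, else 0.
    """
    total = Fraction(0)
    for i in range(n):
        for j in range(n):
            total += sigma_12 if i == j else 0
    return total / n**2  # n equal terms, scaled down by 1 / n^2


print(cov_of_sample_means(Fraction(1, 2), 5))  # 1/10, i.e. sigma_12 / n
```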

— statchrist

I guess one way to build intuition for using $n-1$ and not $n-2$ is that, to calculate the covariance, we do not need to de-mean both $X$ and $Y$, but only one of the two, i.e.

$\sum (X_i-\bar{X})(Y_i - \bar{Y}) = \sum (X_i-\bar{X})Y_i = \sum (Y_i-\bar{Y})X_i$
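A quick exact check of the identity with made-up integer data, using `fractions.Fraction` to avoid rounding. As whuber's reply to this answer notes, the identity follows from deviations about the sample mean summing to zero; by itself it does not settle which denominator to use.

```python
from fractions import Fraction

xs = [Fraction(v) for v in (3, 1, 4, 1, 5)]
ys = [Fraction(v) for v in (9, 2, 6, 5, 3)]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n

both_centered = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
only_x_centered = sum((x - mx) * y for x, y in zip(xs, ys))
only_y_centered = sum(x * (y - my) for x, y in zip(xs, ys))

# Deviations from the sample mean sum to zero, so centering one factor is enough.
print(both_centered, only_x_centered, only_y_centered)  # 3 3 3
```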

— Uditg_ucla

Could you elaborate on how this bears on the question of what denominator to use? The algebraic relation in evidence derives from the fact that the residuals relative to the mean sum to zero, but otherwise is silent about which denominator is relevant.

— whuber

I came here because I had the same question as the OP. I think this answer gets at the nub of the point @whuber pointed out above: that the rule of thumb is that df ~= n - (parameters estimated) can be "vague, unrigorous, and potentially misleading." This points out the fact that though it looks like you need to estimate two parameters (xbar and ybar), you really only estimate one (xbar or ybar). Since the df should be the same in both cases, it must be the lower of the two. I think that is the intent here.

— mpettis

1) Start with $df=2n$.

2) The sample covariance is proportional to $\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})$. Lose two $df$, one from $\bar{X}$ and one from $\bar{Y}$, resulting in $df=2(n-1)$.

3) However, $\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y})$ only contains $n$ separate terms, one from each product. When two numbers are multiplied together, the independent information from each separate number disappears.

As a trite example, consider that

$24=1*24=2*12=3*8=4*6=6*4=8*3=12*2=24*1$,

and that does not include irrationals and fractions, e.g. $24=2\sqrt{6}*2\sqrt{6}$, so that when we multiply two number series together and examine their product, all we see are the $df=n-1$ from one number series, as we have lost half of the original information, that is, what those two numbers were before the pair-wise grouping into one number (i.e., multiplication) was performed.

In other words, without loss of generality we can write

$(X_i-\bar{X})(Y_i-\bar{Y})=z_i-\bar{z}$ for some $z_i$ and $\bar{z}$,

i.e., $z_i=X_iY_i-\bar{X}Y_i-X_i\bar{Y}$ and $\bar{z}=-\bar{X}\bar{Y}$. From the $z$'s, which then clearly have $df=n-1$, the covariance formula becomes

$\sum_{i=1}^n\frac{z_i-\bar{z}}{n-1}=\sum_{i=1}^n\frac{(X_i-\bar{X})(Y_i-\bar{Y})}{n-1}=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})(Y_i-\bar{Y}).$

Thus, the answer to the question is that the $df$ are halved by grouping.

— Carl

@whuber How on earth did I get the same thing posted twice and deleted once? What gives? Can we get rid of one of them? For future reference, is there any way to permanently delete such duplicates? I have a few hanging around and it's annoying.

— Carl

As far as I can tell, you reposted your answer from the duplicate to here. (Nobody else has the power to post answers in your name.) The system strongly discourages posting identical answers in multiple threads, so when I saw that, it convinced me these two threads are perfect duplicates and I "merged" them. This is a procedure that moves all comments and answers from the source thread to the target thread. I then deleted your duplicate post here in the target thread. It will remain permanently deleted, but will be visible to you as well as to people of sufficiently high reputation.

— whuber

@whuber I didn't know what happens in a merge, that a merge was taking place, or what many of the rules are, despite looking things up constantly. It takes time to learn; please be patient. BTW, would you consider taking stats.stackexchange.com/questions/251700/… off hold?

— Carl