Незрозуміло, якою інтуїцією може володіти читач цього питання щодо конвергенції будь-чого, не кажучи вже про випадкові змінні, тому я напишу так, ніби відповідь "дуже мало". Те , що могло б допомогти: а не думати «як може випадкова величина сходиться», запитаєте , як послідовність випадкових величин може сходитися. Іншими словами, це не просто одна змінна, а (нескінченно довгий!) Список змінних, і ті, які згодом у списку, все ближче і ближче до ... чогось. Можливо, одне число, можливо ціле розповсюдження. Щоб розвинути інтуїцію, нам потрібно розробити, що означає «ближче і ближче». Причина, що існує стільки режимів збіжності для випадкових змінних, полягає в тому, що існує кілька типів "
First let's recap convergence of sequences of real numbers. In RR we can use Euclidean distance |x−y||x−y| to measure how close xx is to yy. Consider xn=n+1n=1+1nxn=n+1n=1+1n. Then the sequence x1,x2,x3,…x1,x2,x3,… starts 2,32,43,54,65,…2,32,43,54,65,… and I claim that xnxn converges to 11. Clearly xnxn is getting closer to 11, but it's also true that xnxn is getting closer to 0.90.9. For instance, from the third term onwards, the terms in the sequence are a distance of 0.50.5 or less from 0.90.9. What matters is that they are getting arbitrarily close to 11, but not to 0.90.9. No terms in the sequence ever come within 0.050.05 of 0.90.9, let alone stay that close for subsequent terms. In contrast x20=1.05x20=1.05 so is 0.050.05 from 11, and all subsequent terms are within 0.050.05 of 11, as shown below.
I could be stricter and demand terms get and stay within 0.0010.001 of 11, and in this example I find this is true for the terms N=1000N=1000 and onwards. Moreover I could choose any fixed threshold of closeness ϵϵ, no matter how strict (except for ϵ=0ϵ=0, i.e. the term actually being 11), and eventually the condition |xn−x|<ϵ|xn−x|<ϵ will be satisfied for all terms beyond a certain term (symbolically: for n>Nn>N, where the value of NN depends on how strict an ϵϵ I chose). For more sophisticated examples, note that I'm not necessarily interested in the first time that the condition is met - the next term might not obey the condition, and that's fine, so long as I can find a term further along the sequence for which the condition is met and stays met for all later terms. I illustrate this for xn=1+sin(n)nxn=1+sin(n)n, which also converges to 11, with ϵ=0.05ϵ=0.05 shaded again.
Now consider X∼U(0,1)X∼U(0,1) and the sequence of random variables Xn=(1+1n)XXn=(1+1n)X. This is a sequence of RVs with X1=2XX1=2X, X2=32XX2=32X, X3=43XX3=43X and so on. In what senses can we say this is getting closer to XX itself?
Since XnXn and XX are distributions, not just single numbers, the condition |Xn−X|<ϵ|Xn−X|<ϵ is now an event: even for a fixed nn and ϵϵ this might or might not occur. Considering the probability of it being met gives rise to convergence in probability. For Xnp→XXn→pX we want the complementary probability P(|Xn−X|≥ϵ)P(|Xn−X|≥ϵ) - intuitively, the probability that XnXn is somewhat different (by at least ϵϵ) to XX - to become arbitrarily small, for sufficiently large nn. For a fixed ϵϵ this gives rise to a whole sequence of probabilities, P(|X1−X|≥ϵ)P(|X1−X|≥ϵ), P(|X2−X|≥ϵ)P(|X2−X|≥ϵ), P(|X3−X|≥ϵ)P(|X3−X|≥ϵ), …… and if this sequence of probabilities converges to zero (as happens in our example) then we say XnXn converges in probability to XX. Note that probability limits are often constants: for instance in regressions in econometrics, we see plim(ˆβ)=βplim(β^)=β as we increase the sample size nn. But here plim(Xn)=X∼U(0,1)plim(Xn)=X∼U(0,1). Effectively, convergence in probability means that it's unlikely that XnXn and XX will differ by much on a particular realisation - and I can make the probability of XnXn and XX being further than ϵϵ apart as small as I like, so long as I pick a sufficiently large nn.
A different sense in which XnXn becomes closer to XX is that their distributions look more and more alike. I can measure this by comparing their CDFs. In particular, pick some xx at which FX(x)=P(X≤x)FX(x)=P(X≤x) is continuous (in our example X∼U(0,1)X∼U(0,1) so its CDF is continuous everywhere and any xx will do) and evaluate the CDFs of the sequence of XnXns there. This produces another sequence of probabilities, P(X1≤x)P(X1≤x), P(X2≤x)P(X2≤x), P(X3≤x)P(X3≤x), …… and this sequence converges to P(X≤x)P(X≤x). The CDFs evaluated at xx for each of the XnXn become arbitrarily close to the CDF of XX evaluated at xx. If this result holds true regardless of which xx we picked, then XnXn converges to XX in distribution. It turns out this happens here, and we should not be surprised since convergence in probability to XX implies convergence in distribution to XX. Note that it can't be the case that XnXn converges in probability to a particular non-degenerate distribution, but converges in distribution to a constant. (Which was possibly the point of confusion in the original question? But note a clarification later.)
For a different example, let Yn∼U(1,n+1n)Yn∼U(1,n+1n). We now have a sequence of RVs, Y1∼U(1,2)Y1∼U(1,2), Y2∼U(1,32)Y2∼U(1,32), Y3∼U(1,43)Y3∼U(1,43), …… and it is clear that the probability distribution is degenerating to a spike at y=1y=1. Now consider the degenerate distribution Y=1Y=1, by which I mean P(Y=1)=1P(Y=1)=1. It is easy to see that for any ϵ>0ϵ>0, the sequence P(|Yn−Y|≥ϵ)P(|Yn−Y|≥ϵ) converges to zero so that YnYn converges to YY in probability. As a consequence, YnYn must also converge to YY in distribution, which we can confirm by considering the CDFs. Since the CDF FY(y)FY(y) of YY is discontinuous at y=1y=1 we need not consider the CDFs evaluated at that value, but for the CDFs evaluated at any other yy we can see that the sequence P(Y1≤y)P(Y1≤y), P(Y2≤y)P(Y2≤y), P(Y3≤y)P(Y3≤y), …… converges to P(Y≤y)P(Y≤y) which is zero for y<1y<1 and one for y>1y>1. This time, because the sequence of RVs converged in probability to a constant, it converged in distribution to a constant also.
Some final clarifications:
- Although convergence in probability implies convergence in distribution, the converse is false in general. Just because two variables have the same distribution, doesn't mean they have to be likely to be to close to each other. For a trivial example, take X∼Bernouilli(0.5)X∼Bernouilli(0.5) and Y=1−XY=1−X. Then XX and YY both have exactly the same distribution (a 50% chance each of being zero or one) and the sequence Xn=XXn=X i.e. the sequence going X,X,X,X,…X,X,X,X,… trivially converges in distribution to YY (the CDF at any position in the sequence is the same as the CDF of YY). But YY and XX are always one apart, so P(|Xn−Y|≥0.5)=1P(|Xn−Y|≥0.5)=1 so does not tend to zero, so XnXn does not converge to YY in probability. However, if there is convergence in distribution to a constant, then that implies convergence in probability to that constant (intuitively, further in the sequence it will become unlikely to be far from that constant).
- As my examples make clear, convergence in probability can be to a constant but doesn't have to be; convergence in distribution might also be to a constant. It isn't possible to converge in probability to a constant but converge in distribution to a particular non-degenerate distribution, or vice versa.
- Is it possible you've seen an example where, for instance, you were told a sequence XnXn converged another sequence YnYn? You may not have realised it was a sequence, but the give-away would be if it was a distribution that also depended on nn. It might be that both sequences converge to a constant (i.e. degenerate distribution). Your question suggests you're wondering how a particular sequence of RVs could converge both to a constant and to a distribution; I wonder if this is the scenario you're describing.
- My current explanation is not very "intuitive" - I was intending to make the intuition graphical, but haven't had time to add the graphs for the RVs yet.