Intuitive explanation for the density of a transformed variable?

Suppose $X$ is a random variable with pdf $f_X(x)$. Then the random variable $Y=X^2$ has pdf

$f_Y(y)=\begin{cases}\frac{1}{2\sqrt{y}}\left(f_X(\sqrt{y})+f_X(-\sqrt{y})\right) & y \ge 0 \\ 0 & y \lt 0\end{cases}$

I understand the calculus behind this. But I'm trying to think of a way to explain it to someone who doesn't know calculus. In particular, I'm trying to explain why the factor $\frac{1}{\sqrt{y}}$ appears out front. I'll take a shot at it:

Suppose $X$ has a Gaussian distribution. Almost all the weight of its pdf lies between the values, say, $-3$ and $3$. But this interval maps to $[0, 9]$ for $Y$. So the heavy weight in the pdf of $X$ has been spread over a wider range of values in the transformation to $Y$. Thus, for $f_Y(y)$ to be a true pdf, the extra weight has to be scaled down by the multiplicative factor $\frac{1}{\sqrt{y}}$.

How does that sound?

If anyone can provide a better explanation of their own, or a link to one in a document or textbook, I'd really appreciate it. I find this variable-transformation example in several introductory mathematical probability/statistics books, but I never find an intuitive explanation with it :(
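A quick Monte Carlo check of the formula above for the Gaussian case, as a minimal sketch: it assumes $X$ is standard normal, and the helper names `normal_pdf`, `f_Y`, and `empirical_density` are illustrative, not from any library. It compares the fraction of simulated $Y = X^2$ values falling in a few narrow bins, divided by the bin width, with the formula evaluated at each bin's midpoint.

```python
import math
import random


def normal_pdf(x):
    """Standard normal density, playing the role of f_X (an assumption)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def f_Y(y):
    """Density of Y = X**2 from the formula in the question.

    y = 0 is excluded: the factor 1/(2*sqrt(y)) blows up there, and a
    single point carries no probability anyway.
    """
    if y <= 0:
        return 0.0
    r = math.sqrt(y)
    return (normal_pdf(r) + normal_pdf(-r)) / (2.0 * r)


def empirical_density(samples, lo, hi):
    """Fraction of samples falling in [lo, hi), divided by the bin width."""
    count = sum(1 for s in samples if lo <= s < hi)
    return count / (len(samples) * (hi - lo))


random.seed(0)
ys = [random.gauss(0.0, 1.0) ** 2 for _ in range(200_000)]

for lo in (0.5, 1.0, 2.0, 4.0):
    hi = lo + 0.1
    mid = 0.5 * (lo + hi)
    print(f"y={mid:.2f}  empirical={empirical_density(ys, lo, hi):.4f}  "
          f"formula={f_Y(mid):.4f}")
```

The two columns should agree up to sampling noise, and both decrease with $y$, as the factor $1/(2\sqrt{y})$ suggests.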

random-variable pdf intuition

— lowndrul

I think your explanation is correct.

— highBandWidth

The explanation is correct but purely qualitative: the exact form of the multiplicative factor is still a mystery. The power $-1/2$ just appears magically. So at some level you need to do the same thing calculus does: find the rate of change of the square-root function.

— whuber
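For readers without calculus, that rate of change can be obtained with algebra alone. A minimal sketch for $y > 0$ and a small step $h > 0$: multiply the difference quotient of the square root by its conjugate, then let $h$ shrink.

$\frac{\sqrt{y+h}-\sqrt{y}}{h} = \frac{(\sqrt{y+h}-\sqrt{y})(\sqrt{y+h}+\sqrt{y})}{h\,(\sqrt{y+h}+\sqrt{y})} = \frac{1}{\sqrt{y+h}+\sqrt{y}} \longrightarrow \frac{1}{2\sqrt{y}} \quad \text{as } h \to 0.$

Read backwards, a $y$-interval of length $h$ corresponds to an $x$-interval of length about $h/(2\sqrt{y})$, which is where both the $\frac{1}{2}$ and the power $-1/2$ come from.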

Answers:

PDFs are heights, but they are used to represent probability by means of area. It therefore helps to express a PDF in a way that reminds us that area equals height times base.

To begin with, the height at any value $x$ is given by the PDF $f_X(x)$. The base is the infinitesimal segment $dx$, whence the distribution (that is, the probability measure as opposed to the distribution function) is really a differential form, or "probability element,"

$\operatorname{PE}_X(x) = f_X(x) \, dx.$

This, rather than the PDF, is the object you want to work with both conceptually and practically, because it explicitly includes all the elements needed to express a probability.

When we re-express $x$ in terms of $y = x^2$, the base segments $dx$ get stretched (or squeezed): by squaring both ends of the interval from $x$ to $x + dx$ we see that the base of the $y$ area must be an interval of length

$dy = (x + dx)^2 - x^2 = 2 x \, dx + (dx)^2.$

Because the product of two infinitesimals is negligible compared to the infinitesimals themselves, we conclude

$dy = 2 x \, dx, \text{ whence }dx = \frac{dy}{2x} = \frac{dy}{2\sqrt{y}}.$
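The claim that $(dx)^2$ is negligible can be checked numerically. The exact ratio $\big((x+dx)^2 - x^2\big)/dx$ equals $2x + dx$, so dropping $(dx)^2$ makes a relative error of $dx/(2x + dx)$, which vanishes as $dx$ shrinks. A minimal sketch at the arbitrary point $x = 0.4$; the function names are illustrative.

```python
def dy_exact(x, dx):
    """Exact change in y = x**2 when x moves to x + dx."""
    return (x + dx) ** 2 - x ** 2


def relative_error_of_dropping_dx2(x, dx):
    """Relative error made by replacing dy with the first-order term 2*x*dx."""
    exact = dy_exact(x, dx)
    return (exact - 2 * x * dx) / exact


x = 0.4  # illustrative point; any x > 0 behaves the same way
for dx in (0.1, 0.01, 0.001, 0.0001):
    print(f"dx={dx:<7} dy/dx={dy_exact(x, dx) / dx:.5f}  "
          f"relative error={relative_error_of_dropping_dx2(x, dx):.2e}")
```

Each tenfold reduction of $dx$ cuts the relative error roughly tenfold, while $dy/dx$ settles at $2x = 0.8$.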

With this established, the calculation is trivial, because we just plug in the new height and the new width:

$\operatorname{PE}_X(x) = f_X(x) \, dx = f_X(\sqrt{y}) \frac{dy}{2\sqrt{y}} = \operatorname{PE}_Y(y).$

Because the base, in terms of $y$, is $dy$, whatever multiplies it must be the height, which we can read directly off the middle term as

$\frac{1}{2\sqrt{y}}f_X(\sqrt{y}) = f_Y(y).$

This equation $\operatorname{PE}_X(x) = \operatorname{PE}_Y(y)$ is effectively a conservation of area (=probability) law.

[Figure: Two pdfs]

This graphic accurately shows narrow (almost infinitesimal) pieces of two PDFs related by $y=x^2$. Probabilities are represented by the shaded areas. Due to the squeezing of the interval $[0.32, 0.45]$ via squaring, the height of the red region ($y$, at the left) has to be proportionally expanded to match the area of the blue region ($x$, at the right).
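The conservation of area can be checked numerically for the interval in the figure. Assuming (the figure does not say) that $X$ is standard normal, the area under $f_X$ over $[0.32, 0.45]$ should equal the area under $f_X(\sqrt{y})/(2\sqrt{y})$ over the squared interval $[0.1024, 0.2025]$. This sketch uses a simple midpoint rule; `midpoint_area` and the other helpers are illustrative, and the exact normal probability comes from `math.erf`.

```python
import math


def f_X(x):
    """Standard normal density (an assumption; the figure's f_X is unspecified)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def f_Y_branch(y):
    """Contribution of the positive root: f_X(sqrt(y)) / (2 sqrt(y))."""
    r = math.sqrt(y)
    return f_X(r) / (2.0 * r)


def midpoint_area(f, a, b, n=10_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


a, b = 0.32, 0.45
area_x = midpoint_area(f_X, a, b)                 # blue region, measured in x
area_y = midpoint_area(f_Y_branch, a**2, b**2)    # red region, measured in y
exact = normal_cdf(b) - normal_cdf(a)
print(f"area in x: {area_x:.8f}  area in y: {area_y:.8f}  exact: {exact:.8f}")
```

All three numbers agree to many decimal places: the red region is shorter in base and taller in height, but carries the same probability.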

— whuber

I love infinitesimals. This is a wonderful explanation. Thinking in terms of the $2x$, which can be clearly seen to emerge from the derivative of the transform, is much more intuitive than thinking in terms of the $\sqrt{y}$. I think that's where my sticking point was.

— lowndrul

@whuber, I believe your first line should be $P(X \in (x, x + dx)) = f_{X}(x)dx$? Is that what you mean by $\text{pdf}_{X}(x)$? PS: I'm also curious about your thoughts on my answer (below).

— Carlos Cinelli

@Carlos It's a little more rigorous to express the idea in the way I did at the outset: the PDF is what you multiply the Lebesgue measure $\mathrm{d}x$ by in order to get the given probability measure.

— whuber

@whuber but if the pdf is what you multiply, then it is the term $f_{X}(x)$, not the product $f_{X}(x)dx$ as you wrote, right? It is not clear why you call the product $f_{X}(x)dx$ a pdf.

— Carlos Cinelli

@Carlos: thank you; now I see your point. I made some edits to address it.

— whuber

How about, if I manufacture objects that are always square and I know the distribution of the side lengths of the squares; what can I say about the distribution of the areas of the squares?

In particular, if I know the distribution of a random variable $X$, what can I say about $Y = X^{2}$? One thing that you can say is

$\eqalign{ F_{Y} (c) & = & P( Y \le c ) \\ & = & P( X^{2} \le c ) \\ & = & P ( - \sqrt{c} \le X \le \sqrt{c}) \\ & = & F_{X}( \sqrt{c} ) - F_{X}( - \sqrt{c} ). \\ }$

So a relationship is established between the CDF of $Y$ and the CDF of $X$; what is the relationship between their PDFs? We need calculus for that. Taking the derivatives of both sides gives you the result you wanted.
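To make that last step concrete: by the chain rule, $\frac{d}{dc}F_X(\pm\sqrt{c}) = \pm f_X(\pm\sqrt{c})\,\frac{1}{2\sqrt{c}}$, so differentiating $F_X(\sqrt{c}) - F_X(-\sqrt{c})$ gives exactly $\frac{1}{2\sqrt{c}}\left(f_X(\sqrt{c}) + f_X(-\sqrt{c})\right)$. For the manufactured squares, suppose (purely for illustration) that the side lengths are uniform on $[0, 1]$; then $F_Y(c) = \sqrt{c}$ and $f_Y(c) = \frac{1}{2\sqrt{c}}$ for $0 < c \le 1$. This sketch compares a simulated CDF of the areas with $\sqrt{c}$, and a finite-difference derivative of that CDF with $1/(2\sqrt{c})$; the factory and the helper names are hypothetical.

```python
import math
import random

random.seed(1)
# Hypothetical factory: side lengths uniform on [0, 1] (an illustrative assumption).
sides = [random.random() for _ in range(100_000)]
areas = [s * s for s in sides]


def empirical_cdf(values, c):
    """Fraction of values that are <= c."""
    return sum(1 for v in values if v <= c) / len(values)


def F_Y(c):
    """F_X(sqrt(c)) - F_X(-sqrt(c)) for X ~ Uniform(0, 1), valid for 0 <= c <= 1."""
    return math.sqrt(c)


def f_Y_numeric(c, h=1e-6):
    """Central finite difference of F_Y, standing in for the derivative."""
    return (F_Y(c + h) - F_Y(c - h)) / (2 * h)


for c in (0.04, 0.25, 0.64):
    print(f"c={c}: empirical CDF={empirical_cdf(areas, c):.4f}  "
          f"sqrt(c)={F_Y(c):.4f}  dF/dc={f_Y_numeric(c):.4f}  "
          f"1/(2 sqrt(c))={1 / (2 * math.sqrt(c)):.4f}")
```

Small areas are common here: a quarter of the squares have area at most $0.0625$, because every side shorter than $0.25$ lands there.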

— schenectady

(+1) Although this is not a full answer, it presents a good way to go about finding $f_Y$ and clearly shows why it is a sum of two pieces, one for each square root.

I don't get why pdf(x) = f(x)dx. What about pdf(x) dx = f(x), density = prob mass/interval... What am I getting wrong?

— Fernando

Imagine we have a population and $Y$ is a variable measured on each individual of that population. Then $P(Y \in (y, y + \Delta y))$ counts the proportion of individuals that have variable $Y$ in the range $(y, y + \Delta y)$. You can think of this as a "bin" of size $\Delta y$, and we are counting how many individuals are inside that bin.

Now let us re-express those individuals in terms of another variable, $X$. Given that we know that $Y$ and $X$ are related as $Y = X^2$, the event $Y\in (y, y + \Delta y)$, with $y = x^2$ and $y + \Delta y = (|x| + \Delta x)^2$, is the same as the event $X^2 \in (x^2, (|x| + \Delta x)^2)$, which is the same as the event $X \in (|x|, |x| + \Delta x)~ \text{or}~ X \in (- |x| -\Delta x, -|x| )$. Thus, the individuals that are in the bin $(y, y + \Delta y)$ are exactly those in one of the two bins $(|x|, |x| + \Delta x)$ and $(- |x| -\Delta x, -|x| )$. In other words, the $y$-bin must contain the same proportion of individuals as the two $x$-bins combined:

$\begin{align} P(Y \in (y, y + \Delta y)) &=P\left( X \in (|x|, |x| + \Delta x) \right) + P\left( X \in (- |x| -\Delta x, -|x| )\right) \end{align}$
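This "same individuals" argument can be checked by brute force: draw a hypothetical population, fix an $x$-bin $(|x|, |x| + \Delta x)$ and its mirror image, and count who lands in the corresponding $y$-bin $(x^2, (|x| + \Delta x)^2)$. The standard normal population, the seed, and the bin endpoints are all illustrative choices, not part of the answer.

```python
import random

random.seed(2)
population = [random.gauss(0.0, 1.0) for _ in range(200_000)]

x0, dx = 0.8, 0.05                     # illustrative bin (|x|, |x| + dx) with |x| = 0.8
y_lo, y_hi = x0 ** 2, (x0 + dx) ** 2   # the corresponding y-bin (y, y + dy)

# Count the same individuals two ways: by their Y = X**2 value, and by X itself.
in_y_bin = sum(1 for x in population if y_lo < x * x < y_hi)
in_right_x_bin = sum(1 for x in population if x0 < x < x0 + dx)
in_left_x_bin = sum(1 for x in population if -x0 - dx < x < -x0)

print(in_y_bin, in_right_x_bin + in_left_x_bin)
```

The two counts coincide because they are the same event; only the bin widths differ ($\Delta y \ne \Delta x$), which is what changes the density.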

Ok, now let's get to the density. First, we need to define what a probability density is. As the name suggests, it is the proportion of individuals per unit length. That is, we count the share of individuals in the bin and divide by the size of the bin. Since we have established that the proportions of individuals are the same here but the sizes of the bins have changed, we conclude that the densities will be different. But different by how much?

As we said, the probability density is the proportion of individuals in the bin divided by the size of the bin, so the density of $Y$ is given by $f_Y(y):=\frac{P(Y \in (y, y + \Delta y))}{\Delta y}$, in the limit as $\Delta y \to 0$. Analogously, the probability density of $X$ is given by $f_X(x):=\frac{P(X \in (x, x + \Delta x))}{\Delta x}$, in the limit as $\Delta x \to 0$.

From our previous result that the bins contain the same proportion of individuals, we then have

$\begin{align} f_Y(y):=\frac{P(Y \in (y, y + \Delta y))}{\Delta y} &= \frac{P\left( X \in (|x|, |x| + \Delta x) \right) + P\left( X \in (- |x| - \Delta x, -|x| )\right)}{\Delta y} \\ &= \frac{f_X(|x|)\Delta x + f_{X}(-|x|)\Delta x}{\Delta y}\\ &= \frac{\Delta x}{\Delta y} \left(f_X(|x|) + f_{X}(-|x|) \right)\\ &= \frac{\Delta x}{\Delta y} \left(f_X(\sqrt{y}) + f_{X}(-\sqrt{y}) \right) \end{align}$

That is, the density $f_X(\sqrt{y}) + f_{X}(-\sqrt{y})$ is rescaled by the factor $\frac{\Delta x}{\Delta y}$, which measures how much the bin is stretched or squeezed by the transformation. In our case, taking $x = \sqrt{y} \ge 0$, since $y = x^2$ we have $y + \Delta y = (x + \Delta x)^2 = x^2 + 2x \Delta x + (\Delta x)^2$. If $\Delta x$ is small enough we can ignore $(\Delta x)^2$, which implies $\Delta y = 2x \Delta x$ and $\frac{\Delta x}{\Delta y} = \frac{1}{2x} = \frac{1}{2 \sqrt{y}}$, and that is why the factor $\frac{1}{2 \sqrt{y}}$ shows up in the transformation.
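The limit in this last step can be watched numerically. Assuming, for illustration, that $X$ is standard normal (so bin probabilities can be computed exactly with `math.erf`), the bin ratio $P(Y \in (y, y + \Delta y))/\Delta y$, with $x = \sqrt{y}$ and $\Delta y = (x + \Delta x)^2 - x^2$, should approach $\frac{1}{2\sqrt{y}}\left(f_X(\sqrt{y}) + f_X(-\sqrt{y})\right)$ as $\Delta x$ shrinks. The point $y = 1$ and the function names are arbitrary choices for this sketch.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_pdf(x):
    """Standard normal density, playing the role of f_X (an assumption)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def bin_density(y, dx):
    """P(Y in (y, y + dy)) / dy for Y = X**2, using the two x-bins from the answer."""
    x = math.sqrt(y)
    dy = (x + dx) ** 2 - x ** 2                    # exact bin width, (dx)**2 kept
    p_right = normal_cdf(x + dx) - normal_cdf(x)   # X in (|x|, |x| + dx)
    p_left = normal_cdf(-x) - normal_cdf(-x - dx)  # X in (-|x| - dx, -|x|)
    return (p_right + p_left) / dy


def formula(y):
    """The limiting density (1 / (2 sqrt(y))) * (f_X(sqrt(y)) + f_X(-sqrt(y)))."""
    r = math.sqrt(y)
    return (normal_pdf(r) + normal_pdf(-r)) / (2.0 * r)


y = 1.0
for dx in (0.1, 0.01, 0.001):
    print(f"dx={dx}: bin density={bin_density(y, dx):.6f}  formula={formula(y):.6f}")
```

The gap shrinks roughly in proportion to $\Delta x$, consistent with the neglected $(\Delta x)^2$ term being of higher order.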

— Carlos Cinelli