Statistical inference under misspecification

The classical treatment of statistical inference relies on the assumption that a correctly specified statistical model is used. That is, the distribution $\mathbb{P}^*(Y)$ that generated the observed data $y$ is part of the statistical model $\mathcal{M}$ :

$\mathbb{P}^*(Y) \in \mathcal{M}=\{\mathbb{P}_\theta(Y) :\theta \in \Theta\}$

However, in most situations we cannot assume that this is really true. I wonder what happens to statistical inference procedures if we drop the correct-specification assumption.

I found some work by White (1982) on ML estimation under misspecification. It argues that the maximum likelihood estimator is a consistent estimator of the distribution

$\mathbb{P}_{\theta_1}=\arg \min_{\mathbb{P}_\theta \in \mathcal{M}} KL(\mathbb{P}^*,\mathbb{P}_\theta)$

that minimizes the KL divergence from the true distribution $\mathbb{P}^*$ over all distributions in the statistical model.

What happens to confidence set estimators? Let us recapitulate confidence set estimators. Let $\delta:\Omega_Y \rightarrow 2^\Theta$ be a set estimator, where $\Omega_Y$ is the sample space and $2^\Theta$ is the power set of the parameter space $\Theta$ . What we would like to know is the probability of the event that the sets produced by $\delta$ include the true distribution $\mathbb{P}^*$ , that is

$\mathbb{P}^*(\mathbb{P}^* \in \{\mathbb{P}_\theta : \theta \in \delta(Y)\}):=A.$

However, we of course do not know the true distribution $\mathbb{P}^*$ . The correct-specification assumption tells us that $\mathbb{P}^* \in \mathcal{M}$ . However, we still do not know which distribution of the model it is. But

$\inf_{\theta \in \Theta} \mathbb{P}_\theta(\theta \in \delta(Y)):=B$

is a lower bound for the probability $A$ . Equation $B$ is the classical definition of the confidence level of a confidence set estimator.

If we drop the correct-specification assumption, $B$ is no longer necessarily a lower bound for $A$ , the term we are actually interested in. Indeed, if we assume that the model is misspecified, which is arguably the case in most realistic situations, $A$ is 0, because the true distribution $\mathbb{P}^*$ is not contained in the statistical model $\mathcal{M}$ .

From another perspective, one could ask what $B$ relates to when the model is misspecified. This is a more specific question. Does $B$ still have a meaning if the model is misspecified? If not, why do we even bother with parametric statistics?

I guess White (1982) contains some results on these issues. Unfortunately, my lack of mathematical background keeps me from understanding much of what is written there.

— Julian Karls

I found this question and answer: stats.stackexchange.com/questions/149773/… . It is very similar. Reading those books would probably lead to an answer to this question. However, I still think that a summary from someone who has already done so would be very helpful.

— Julian Karls

It is a pity that this question has not attracted more interest. Julian's link has nice material, but I would be interested to hear more thoughts on this.

— Florian Hartig

Well, what is usually done is that the distribution of the test statistic is computed under the null hypothesis, assuming that the statistical model is correct. If the p-value is low enough, one concludes that either this is due to chance or the null is false. If the model is misspecified, then that is also a conclusion that can logically be drawn. The same holds for all other inferences: the fact that the model may be misspecified provides an alternative conclusion. This is how I think about it, based on reading the work of Spanos.

— Toby

In essence, all models are wrong. It helps to quantify the misspecification. For an image, misspecification is misregistration. For example, for counting errors (e.g., from radioactive decay), the error is Poisson distributed for a sufficient number of counts. In that case, the misregistration of a time series is the y-axis error of the square-root image, and the noise is in those same units. Example here.

— Carl

Answers:

Let $y_1, \ldots, y_n$ be the observed data, which is assumed to be a realization of a sequence of i.i.d. random variables $Y_1, \ldots, Y_n$ with common probability density function $p_e$ defined with respect to a sigma-finite measure $\nu$ . The density $p_e$ is called the Data Generating Process (DGP) density.

The researcher's probability model ${\cal M} \equiv \{ p(y ; \theta) : \theta \in \Theta \}$ is a collection of probability density functions indexed by a parameter vector $\theta$ . Assume that each density in ${\cal M}$ is defined with respect to a common sigma-finite measure $\nu$ (e.g., each density could be a probability mass function with the same sample space $S$ ).

It is important to keep the density $p_e$ , which actually generated the data, conceptually distinct from the probability model of the data. In classical statistical treatments, a careful separation of these concepts is either ignored, not made, or it is assumed from the very beginning that the probability model is correctly specified.

A correctly specified model ${\cal M}$ with respect to $p_e$ is defined as a model where $p_e \in {\cal M}$ $\nu$ -almost everywhere. When ${\cal M}$ is misspecified with respect to $p_e$ , this corresponds to the case where the probability model is not correctly specified.

If the probability model is correctly specified, then there exists a $\theta^*$ in the parameter space $\Theta$ such that $p_e(y) = p(y ; \theta^*)$ $\nu$ -almost everywhere. Such a parameter vector is called the "true parameter vector". If the probability model is misspecified, then the true parameter vector does not exist.

Within White's model misspecification framework, the goal is to find the parameter estimate $\hat{\theta}_n$ that minimizes the negative average log-likelihood $\hat{\ell}_n({\theta}) \equiv -(1/n) \sum_{i=1}^n \log p(y_i ; { \theta})$ over some compact parameter space $\Theta$ . It is assumed that a unique strict global minimizer, $\theta^*$ , of the expected value of $\hat{\ell}_n$ on $\Theta$ is located in the interior of $\Theta$ . In the lucky case where the probability model is correctly specified, $\theta^*$ may be interpreted as the "true parameter value".

In the special case where the probability model is correctly specified, $\hat{\theta}_n$ is the familiar maximum likelihood estimate. If we do not have absolute knowledge that the probability model is correctly specified, then $\hat{\theta}_n$ is called a quasi-maximum likelihood estimate and the goal is to estimate $\theta^*$ . If we get lucky and the probability model is correctly specified, then the quasi-maximum likelihood estimate reduces as a special case to the familiar maximum likelihood estimate and $\theta^*$ becomes the true parameter value.

Consistency within White's (1982) framework corresponds to convergence to $\theta^*$ without requiring that $\theta^*$ is necessarily the true parameter vector. Within White's framework, we would never estimate the probability of the event that the sets produced by $\delta$ include the true distribution $\mathbb{P}^*$ . Instead, we would always estimate the probability $P^{**}$ of the event that the sets produced by $\delta$ include the distribution specified by the density $p(y ; \theta^*)$ .
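To make the convergence to $\theta^*$ concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than part of White's paper: the data are drawn from a Gamma density (playing the role of $p_e$), while the researcher's model is the Exponential family with rate $\theta$, so the model is misspecified. For this model the expected negative log-likelihood is $-\log\theta + \theta\,E[Y]$, so the KL minimizer is $\theta^* = 1/E[Y]$ and the quasi-maximum likelihood estimate is one over the sample mean.

```python
import numpy as np

def neg_avg_loglik(theta, y):
    """Negative average log-likelihood of the Exponential(rate=theta) model."""
    return -(np.log(theta) - theta * np.mean(y))

def qmle_exponential(y):
    """Closed-form minimizer of neg_avg_loglik: one over the sample mean."""
    return 1.0 / np.mean(y)

rng = np.random.default_rng(0)

# Assumed DGP (hypothetical): Gamma(shape=2, scale=1.5), which is not a
# member of the Exponential model, so no "true" theta exists.
shape, scale = 2.0, 1.5
theta_star = 1.0 / (shape * scale)  # KL minimizer within the model: 1 / E[Y]

errors = {}
for n in (100, 10_000, 1_000_000):
    y = rng.gamma(shape, scale, size=n)
    theta_hat = qmle_exponential(y)
    errors[n] = abs(theta_hat - theta_star)
    print(f"n={n:>9,d}  theta_hat={theta_hat:.4f}  theta_star={theta_star:.4f}")
```

The estimate settles on $\theta^*$ as $n$ grows even though no Exponential density generated the data, which is the sense of consistency described above.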

Finally, a few comments about model misspecification. It is easy to find examples where a misspecified model is extremely useful and very predictive. For example, consider a nonlinear (or even a linear) regression model with a Gaussian residual error term whose variance is extremely small yet the actual residual error in the environment is not Gaussian.

It is also easy to find examples where a correctly specified model is not useful and not predictive. For example, consider a random walk model for predicting stock prices which predicts that tomorrow's closing price is a weighted sum of today's closing price and some Gaussian noise with an extremely large variance.

The purpose of the model misspecification framework is not to ensure model validity but rather to ensure reliability. That is, to ensure that the sampling error associated with your parameter estimates, confidence intervals, hypothesis tests, and so on is correctly estimated despite the presence of either a small or large amount of model misspecification. The quasi-maximum likelihood estimates are asymptotically normal centered at $\theta^*$ with a covariance matrix estimator which depends upon both the first and second derivatives of the negative log-likelihood function. In the special case where you get lucky and the model is correct, all of the formulas reduce to the familiar classical statistical framework where the goal is to estimate the "true" parameter values.
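For reference, the asymptotic result described in the previous paragraph can be written out as follows. The symbols $H$ and $J$ are names chosen here for illustration (they are not taken from the answer), and the statement holds under the regularity conditions of White (1982), which are not listed.

```latex
\begin{align*}
\sqrt{n}\,\bigl(\hat{\theta}_n - \theta^*\bigr)
  &\xrightarrow{d} \mathcal{N}\bigl(0,\; H^{-1} J H^{-1}\bigr), \\
H &= E_{p_e}\bigl[-\nabla_\theta^2 \log p(Y;\theta^*)\bigr], \\
J &= E_{p_e}\bigl[\nabla_\theta \log p(Y;\theta^*)\,
                  \nabla_\theta \log p(Y;\theta^*)^{\top}\bigr].
\end{align*}
```

Here $H$ is built from second derivatives and $J$ from first derivatives of the log-likelihood, with both expectations taken under the DGP density $p_e$. When the model is correctly specified, the information matrix equality gives $H = J$, and the covariance collapses to $H^{-1}$, the classical inverse Fisher information.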

— RMG

Within the model $\mathcal{M}$ indexed by $\Theta$ , White's approach leads to the probability measure $\mathbb{P}_{\theta_1}$ , which is the closest proxy for $\mathbb{P}^*$ in $\mathcal{M}$ . This method of looking at $\mathbb{P}_{\theta_1}$ can be extended to give interesting quantities relating to your question about the confidence sets.

Before getting to this, it is worth pointing out that the values $A$ and $B$ are mathematically well-defined in your analysis (i.e., they exist), and they still have a meaning; it is just not necessarily a very useful meaning. The value $A$ in your analysis is well-defined; it is the true probability that the inferred set of probability measures includes the true probability measure. You are correct that $\mathbb{P}^* \notin \mathcal{M}$ implies $A = 0$ , which means that this quantity is trivial in the case of misspecification. Following White's lead, it is perhaps more interesting to look at the quantity:

$A^* \equiv A^*(Y) \equiv \mathbb{P}^* (\mathbb{P}_{\theta_1} \in \{\mathbb{P}_\theta | \theta \in \delta(Y) \} ).$

Here we have replaced the inner occurrence of $\mathbb{P}^*$ with its closest proxy in the model $\mathcal{M}$ , so that the quantity is no longer rendered trivial when $\mathbb{P}^* \notin \mathcal{M}$ . Now we are asking for the true probability that the inferred set of probability measures includes the closest proxy for the true probability measure in the model. Misspecification of the model no longer trivialises this quantity, since we have $\mathbb{P}_{\theta_1} \in \mathcal{M}$ by construction.

White analyses misspecification by showing that the MLE is a consistent estimator of $\mathbb{P}_{\theta_1}$ . This is valuable because it tells you that even if there is misspecification, you still correctly estimate the closest proxy to the true probability measure in the model. A natural follow-up question concerning confidence sets is whether or not a particular inference method $\delta$ imposes any lower bound on the quantity $A^*$ or any convergence result in the limit as $n \rightarrow \infty$ . If you can establish a (positive) lower bound or a (positive) convergence result, this gives you some value in guaranteeing that even if there is misspecification, you still correctly estimate the closest proxy with some probability level. I would recommend that you explore those issues, following the kind of analysis done by White.
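As a rough way to explore this, $A^*$ can be estimated by Monte Carlo for a specific $\delta$. The sketch below is only an illustration under assumptions of my own choosing: the data are negative binomial with mean 3 (overdispersed counts), the model is Poisson with mean $\lambda$, and the closest proxy is therefore $\lambda = E[Y] = 3$. It compares two interval rules: the classical Wald interval that trusts the Poisson variance, and a robust interval of the kind White's covariance result suggests, which for this model reduces to using the sample variance. The function name `coverage` and all numeric settings are hypothetical.

```python
import numpy as np

def coverage(n=200, reps=5000, seed=1):
    """Monte Carlo estimate of A* for two interval rules under a Poisson model."""
    rng = np.random.default_rng(seed)

    # Assumed DGP (hypothetical): negative binomial with mean 3, variance 7.5.
    size, mean = 2.0, 3.0
    p = size / (size + mean)
    lam_star = mean  # KL minimizer within the Poisson model: E[Y]

    y = rng.negative_binomial(size, p, size=(reps, n))
    lam_hat = y.mean(axis=1)  # Poisson quasi-MLE in each replication

    z = 1.959963984540054  # 97.5% standard normal quantile
    se_naive = np.sqrt(lam_hat / n)                 # trusts Var(Y) = lambda
    se_robust = np.sqrt(y.var(axis=1, ddof=1) / n)  # uses the sample variance

    hit_naive = np.abs(lam_hat - lam_star) <= z * se_naive
    hit_robust = np.abs(lam_hat - lam_star) <= z * se_robust
    return hit_naive.mean(), hit_robust.mean()

cover_naive, cover_robust = coverage()
print(f"naive Wald interval covers the proxy:  {cover_naive:.3f}")
print(f"robust interval covers the proxy:      {cover_robust:.3f}")
```

In this setup the classical interval should fall well short of its nominal 95% level for the closest proxy, while the robust interval should land near it. This suggests that whether $A^*$ admits a useful lower bound depends on how $\delta$ is constructed, not only on the model.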

— Reinstate Monica