Since one can compute confidence intervals for p-values, and since the opposite of interval estimation is point estimation: Is the p-value a point estimate?
Answers:
Point estimates and confidence intervals are for parameters that describe a distribution, such as the mean or the standard deviation.
But unlike other sample statistics, such as the sample mean and the sample standard deviation, the p-value is not a useful estimator of any interesting parameter of the distribution. See the answer by @whuber for the technical details.
The p-value of a test statistic gives the probability of observing a deviation from the expected value of the test statistic at least as large as the one observed in the sample, computed under the assumption that the null hypothesis is true. If you had the entire distribution, it would either be consistent with the null hypothesis or it would not. This can be described with an indicator variable (again, see @whuber's answer).
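As a concrete illustration (my own sketch, not part of the original answer), here is how such a p-value could be computed for a one-sample z-test with known standard deviation; the data and parameter values are made up:

```python
# Illustrative sketch: two-sided p-value for a one-sample z-test,
# assuming i.i.d. normal data with a known sigma (hypothetical numbers).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=0.3, scale=1.0, size=50)   # made-up sample

mu0, sigma = 0.0, 1.0                          # null hypothesis H0: mu = 0
z = (x.mean() - mu0) / (sigma / np.sqrt(len(x)))

# P(|Z| >= |z|) under H0: probability of a deviation at least as large
# as the one observed, computed assuming the null hypothesis is true.
p_value = 2 * stats.norm.sf(abs(z))
print(z, p_value)
```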
But the p-value cannot be used as a useful estimator of that indicator variable, because it is not consistent: if the null hypothesis is true, the p-value does not converge to anything as the sample size increases. This is a rather roundabout way of saying that a statistical test can reject or fail to reject the null, but never confirm it.
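A quick simulation makes the non-consistency point tangible. The following sketch (my own; the sample sizes, effect size, and z-test are only illustrative) computes p-values over many replications: when the null is true they stay spread out over (0, 1) no matter how large the sample, while under an alternative they collapse toward 0.

```python
# Rough simulation of the consistency point above (not from the answer):
# z-test p-values across many replications, for a true and a false null.
import numpy as np
from scipy import stats

def z_test_pvalues(true_mu, n, reps=5_000, mu0=0.0, sigma=1.0, seed=2):
    rng = np.random.default_rng(seed)
    x = rng.normal(true_mu, sigma, size=(reps, n))
    z = (x.mean(axis=1) - mu0) / (sigma / np.sqrt(n))
    return 2 * stats.norm.sf(np.abs(z))

for n in (100, 10_000):
    p_null = z_test_pvalues(0.0, n)   # H0 true: stays ~Uniform(0, 1)
    p_alt = z_test_pvalues(0.2, n)    # H0 false: concentrates near 0
    print(n,
          np.quantile(p_null, [0.25, 0.5, 0.75]).round(2),  # does not shrink
          np.quantile(p_alt, 0.5))                           # shrinks toward 0
```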
Yes, it could be (and has been) argued that a p-value is a point estimate.
In order to identify whatever property of a distribution a p-value might estimate, we would have to assume it is asymptotically unbiased. But, asymptotically, the mean p-value for the null hypothesis is $1/2$ (ideally; for some tests it might be some other nonzero number) and for any other hypothesis it is $0$. Thus, the p-value could be considered an estimator of one-half the indicator function for the null hypothesis.
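For the ideal case of a continuous test statistic, the asymptotic values quoted above can be sketched as follows (my own summary of the standard argument, not a quotation):

```latex
\[
\begin{aligned}
&\text{Under } H_0\ (\text{continuous test statistic}):\quad
  p \sim \mathrm{Unif}(0,1) \;\Rightarrow\; \mathbb{E}[\,p \mid H_0\,] = \tfrac12,\\
&\text{Under a fixed alternative}:\quad
  p \xrightarrow{\;P\;} 0 \text{ as } n \to \infty \;\Rightarrow\; \mathbb{E}[\,p\,] \to 0,\\
&\text{so asymptotically}\quad
  \mathbb{E}[\,p\,] = \tfrac12\,\mathbf{1}\{H_0 \text{ is true}\}.
\end{aligned}
\]
```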
Admittedly it takes some creativity to view a p-value in this way. We could do a little better by viewing the estimator in question as the decision we make by means of the p-value: is the underlying distribution a member of the null hypothesis or of the alternate hypothesis? Let's call this set of possible decisions $D$. Jack Kiefer writes
We suppose that there is an experiment whose outcome the statistician can observe. This outcome is described by a random variable or random vector $X$ ... . The probability law of $X$ is unknown to the statistician, but it is known that the distribution function $F$ of $X$ is a member of a specified class $\Omega$ of distribution functions. ...
A statistical problem is said to be a problem of point estimation if $D$ is the collection of possible values of some real or vector-valued property of $F$ which depends on $F$ in a reasonably smooth way.
In this case, because $D$ is discrete, "reasonably smooth" is not a restriction at all. Kiefer's terminology reflects this by referring to statistical procedures with discrete decision spaces as "tests" instead of "point estimators."
Although it is interesting to explore the limits (and limitations) of such definitions, as this question invites us to do, perhaps we should not insist too strongly that a p-value is a point estimator, because this distinction between estimators and tests is both useful and conventional.
In a comment to this question, Christian Robert brought attention to a 1992 paper where he and co-authors took exactly this point of view and analyzed the admissibility of the p-value as an estimator of the indicator function. See the link in the references below. The paper begins,
Approaches to hypothesis testing have usually treated the problem of testing as one of decision-making rather than estimation. More precisely, a formal hypothesis test will result in a conclusion as to whether a hypothesis is true, and not provide a measure of evidence to associate with that conclusion. In this paper we consider hypothesis testing as an estimation problem within a decision-theoretic framework ... .
[Emphasis added.]
Jiunn Tzon Hwang, George Casella, Christian Robert, Martin T. Wells, and Roger H. Farrell, Estimation of Accuracy in Testing. Ann. Statist. Volume 20, Number 1 (1992), 490-509. Open access.
Jack Carl Kiefer, Introduction to Statistical Inference. Springer-Verlag, 1987.
p-values are not used for estimating any parameter of interest, but for hypothesis testing. For example, you could be interested in estimating the population mean $\mu$ based on the sample you have, or you could be interested in an interval estimate of it, but in a hypothesis-testing scenario you would rather compare the sample mean with the population mean to see if they differ. In fact, in a hypothesis-testing scenario you are not interested in their particular values, but rather in whether the p-value falls below some threshold (e.g. $\alpha = 0.05$). With p-values you are not that much interested in their point values; rather, you want to know whether your data provide enough evidence against the null hypothesis. In a hypothesis-testing scenario you would not compare different p-values to each other, but rather use each of them to make a separate decision about your hypotheses. You do not really want to know anything about the null hypothesis beyond whether you can reject it or not. This makes their values inseparable from the decision context, and so they differ from point estimates, because with point estimates we are interested in their values per se.
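To make the decision-context point concrete, here is a minimal sketch (my own; the data, threshold, and choice of a one-sample t-test are only illustrative) in which the p-value is used solely to compare against a pre-chosen $\alpha$, not as an estimate of anything:

```python
# Hypothetical sketch of the decision use described above: the p-value is
# only compared with a pre-chosen alpha; its exact value estimates nothing.
from scipy import stats

sample = [5.1, 4.8, 5.6, 5.0, 4.9, 5.3, 5.2, 4.7]   # made-up data
mu0, alpha = 5.0, 0.05                               # H0: mu = 5

t_stat, p_value = stats.ttest_1samp(sample, popmean=mu0)
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"p = {p_value:.3f} -> {decision}")
```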