My random thoughts on random events posted as a blog comment:

Suppose we have disjoint events \( (E_1,E_2,\dots,E_N)\) with corresponding probabilities \( (p_1,p_2,\dots,p_N),\) where \( p_n=\mbox{Prob}(E_n)\) and \(\sum_{n=1}^N p_n=1.\) If a particular event \(E_k\) occurs, what would make us think this was in some sense "unusual" or perhaps "suspicious"? It's not enough that \(p_k\) be small, since for large \(N\) even a uniform distribution on the \(E_n\) has \(p_k=1/N\) small. Nor is it enough that \(p_k\) be much less than \( \max_n p_n,\) since all the \(p_n\) could be equal except for a single event whose probability is many times larger yet still tiny. Nor is it enough that \(p_k\) be less than nearly all the other \(p_n,\) because all the \(p_n\) could be very nearly equal.

What does seem to work in the cases I can think of is to choose some factor \(R \gt 1\), calculate \(\sum\{p_n: p_n \gt R p_k\}\), and see if this is close to \(1.\) To work this into a hypothesis test, we could reject the null hypothesis \(H_0\) if $$\sum\{p_n: p_n\gt R p_k\} \gt 1-1/R,$$ though the expression on the right-hand side is rather arbitrary. With this setup, what value should \(R\) be? Let \(x_0\) be a sample we have collected, and consider the standard normal with density \(\phi\) and the \(p=.05\) rule, where \(\mbox{Prob}(|x_0|\gt 1.96)=0.05.\) For a continuous distribution the sum becomes the probability of the region where the density exceeds \(R\,\phi(x_0),\) which is \(|x|\lt c\) with \(\phi(c)=R\,\phi(x_0),\) so the test rejects when \(\mbox{Prob}(|x|\gt c)\lt 1/R.\) Putting the boundary of the rejection region at \(|x_0|=1.96\) gives \(R\approx 3.71,\) since \(3.71\cdot \phi(1.96) \approx \phi(1.1)\) and \(\mbox{Prob}(|x|\gt 1.1)\approx 1/3.71.\) If we wanted \(R=20,\) we would need to use a cutoff \( |x_0|\gt 3.135749,\) which corresponds to a very small standard \(p\)-value of \(0.001714.\)
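For the discrete case, the rejection rule can be sketched in a few lines of Python. This is only a minimal illustration under my own naming: `is_suspicious`, `probs`, `k`, and `R` are hypothetical names, and the function assumes `probs` is a list of probabilities summing to one.

```python
def is_suspicious(probs, k, R):
    """Return True if outcome k is rejected by the rule
    sum{p_n : p_n > R * p_k} > 1 - 1/R.

    probs : list of probabilities of the disjoint events (assumed to sum to 1)
    k     : index of the event that occurred
    R     : factor greater than 1
    """
    threshold = R * probs[k]
    # Total probability of the events that are more than R times as likely as E_k.
    mass_above = sum(p for p in probs if p > threshold)
    return mass_above > 1.0 - 1.0 / R


# Uniform distribution: p_k is small, but no event is R times more likely.
uniform = [1.0 / 1000] * 1000

# All equal except one event that is ten times larger yet still tiny.
q = 1.0 / 1009
one_big = [q] * 999 + [10 * q]

# Nearly equal probabilities, with p_k the smallest.
nearly_equal = [0.24, 0.25, 0.25, 0.26]

# One dominant event and two rare ones; a rare one occurs.
dominated = [0.98, 0.01, 0.01]

print(is_suspicious(uniform, 0, 3.71))
print(is_suspicious(one_big, 0, 3.71))
print(is_suspicious(nearly_equal, 0, 3.71))
print(is_suspicious(dominated, 1, 3.71))
```

The first three inputs mirror the counterexamples from the opening paragraph and are not flagged by the rule. Only the last one, where almost all of the probability sits on a much likelier event, is.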

Clearly, given any \(p\) cutoff, a.k.a. \(\alpha\), we can find a corresponding factor \(R,\) and vice versa. Since the \(p=.05\) rule is itself arbitrary, I don't see what difference the choice makes for the most common cases. Thus, \(p\)-value analysis seems generally OK to me in practice. My concern here is with its justification.
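For the standard normal, the correspondence between \(\alpha\) and \(R\) can be computed numerically. The sketch below uses only the Python standard library (`statistics.NormalDist`). The function names are my own, and it assumes the boundary condition used above, namely \(\phi(c)=R\,\phi(x_0)\) with \(\mbox{Prob}(|x|\gt c)=1/R,\) which rearranges to \(x_0^2=c^2+2\ln R.\)

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal, mean 0 and standard deviation 1


def two_sided_p(x0):
    """Standard two-sided p-value Prob(|x| > |x0|) for the standard normal."""
    return 2.0 * (1.0 - STD.cdf(abs(x0)))


def cutoff_from_R(R):
    """Cutoff |x0| at the boundary of the rejection region for factor R.

    c solves Prob(|x| > c) = 1/R, and phi(c) = R * phi(x0)
    gives x0**2 = c**2 + 2*ln(R).
    """
    c = STD.inv_cdf(1.0 - 0.5 / R)
    return math.sqrt(c * c + 2.0 * math.log(R))


def R_from_alpha(alpha, lo=1.0 + 1e-9, hi=1e6):
    """Factor R whose cutoff has two-sided p-value alpha, found by bisection.

    Relies on cutoff_from_R being increasing in R; the bracket [lo, hi]
    is an arbitrary choice that covers the usual range of alpha.
    """
    target = STD.inv_cdf(1.0 - alpha / 2.0)
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if cutoff_from_R(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(R_from_alpha(0.05))              # factor R matching the p = .05 rule
print(cutoff_from_R(20.0))             # cutoff |x0| for R = 20
print(two_sided_p(cutoff_from_R(20.0)))  # its standard two-sided p-value
```

Run as written, the three printed values should come out close to the \(3.71,\) \(3.135749,\) and \(0.001714\) quoted above.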
