Sunday, April 1, 2007

The Difference Between Statistically Significant and Not Statistically Significant is Not Statistically Significant

By Andrew Gelman and Hal Stern.

The difference between 25 ± 10 (statistically significantly different from zero) and 10 ± 10 (not statistically significantly different from zero) is 15 ± 14 -- not statistically significant. (For independent estimates, the standard error of the difference is sqrt(10^2 + 10^2) ≈ 14, so the difference is only about one standard error from zero.)
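To spell out the arithmetic, here is a quick sketch in Python; nothing in it goes beyond standard error propagation for independent estimates, and the helper names are mine:

    import math

    def two_sided_p(z):
        # Two-sided p-value for a standard-normal z statistic.
        return math.erfc(abs(z) / math.sqrt(2))

    # The two estimates from the example above: 25 +/- 10 and 10 +/- 10.
    est_a, se_a = 25.0, 10.0
    est_b, se_b = 10.0, 10.0

    print(two_sided_p(est_a / se_a))  # ~0.012 -- "significant"
    print(two_sided_p(est_b / se_b))  # ~0.317 -- "not significant"

    # The difference, assuming the two estimates are independent:
    diff = est_a - est_b                    # 15
    se_diff = math.sqrt(se_a**2 + se_b**2)  # ~14.1
    print(two_sided_p(diff / se_diff))      # ~0.289 -- not significant either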

A good example was the research linking homosexuality to the number of older siblings -- the effect of the number of older brothers was statistically significant, the effect of the number of older sisters was not, but the difference between the two effects was itself not statistically significant!

Gerd Gigerenzer has an article, "Mindless Statistics," criticizing the use of statistics in psychology. Psychologists follow the "null ritual" to publish papers and advance their careers, but don't take statistics seriously. Practitioners suffer from cognitive illusions that exaggerate what finding a statistically significant result actually tells them.

Finding an effect with p < 0.01 does not tell you the probability that the null hypothesis is true, nor the probability that the effect will be found again in future experiments. The p-value is the probability of data at least this extreme given the null hypothesis, not the other way around.
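Gigerenzer calls the second mistake the replication fallacy. A small simulation (my own sketch, not from the article) shows why p = 0.01 does not mean a 99% chance of replicating. Assume, generously, that the true effect is exactly what the first study observed, so each identical replication's z statistic is normal with mean 2.58 (the z that gives p ≈ 0.01) and standard deviation 1:

    import math
    import random

    def two_sided_p(z):
        # Two-sided p-value for a standard-normal z statistic.
        return math.erfc(abs(z) / math.sqrt(2))

    random.seed(0)
    true_z = 2.58  # assumed: the true effect equals the original study's estimate
    trials = 100_000

    # Fraction of identical replications that come out significant at the 0.05 level:
    replicated = sum(
        two_sided_p(random.gauss(true_z, 1.0)) < 0.05
        for _ in range(trials)
    )
    print(replicated / trials)  # ~0.73, nowhere near 0.99

Even under this optimistic assumption, only about 73% of replications reach significance; if the true effect is smaller than the (selected-for-significance) original estimate, the replication rate is lower still.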

3 comments:

Anonymous said...

I can easily imagine that psychologists pay too much attention to statistics because it advances their careers. But I'm a little confused about using statistical significance to debunk itself. It seems like circular reasoning? Or maybe I'm not thinking about this right.

SNOOZY said...

OK, so that wasn't clear.

The first part -- the "difference between significant and not blah blah" -- is not debunking statistical significance, but rather pointing out a potential misuse.

For example, the study on homosexuality and siblings found a significant effect for older brothers but not for older sisters. This piqued the authors' interest -- why brothers and not sisters? It turns out that the effect in both cases was positive, but only the brothers effect was significant. On the other hand, the difference between the brothers effect and the sisters effect was not significant.

In published reports, the sisters effect was downplayed because it was not significant (reporters said things like "only the number of older brothers is important"), even though the difference between the brothers effect and the sisters effect was not significant. The error is downplaying the sisters effect simply because it didn't pass the test of statistical significance.

Lev said...

that's really neat. also, i like the monkey.