Tue Sep 19, 2006
Via Jeremy Freese, a paper by Alan Gerber and Neil Malhotra called “Can political science literatures be believed? A study of publication bias in the APSR and the AJPS.” Here’s the main finding.
When you run a bog-standard regression, you typically want to know how much a change in some variable *x*—usually a number of such *x* variables—is associated with a change in *y*, some outcome variable of interest. When you run the regression, you get a coefficient for each *x* telling you how much a one-unit change in that *x* changes the value of *y* according to your data. But you also want to know whether that estimate is worth paying attention to. So you calculate a statistic—a p-value—that tells you, roughly, how likely you would be to see a coefficient at least that far from zero if there were in fact no relationship between that *x* and *y*. How unexpected is unexpected enough to be interesting? This is a matter of convention. There is an established threshold, and unless your p-value falls below it you are not typically entitled, by the agreed or inherited standards of your field, to say the result is "statistically significant." The subtleties of interpreting p-values need not detain us here. The point is that for good or ill there's a conventional threshold. Most often, the line you have to cross is a p-value that is < 0.05.
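To make this concrete, here is a minimal sketch in Python. The data are made up for illustration, and instead of the textbook t-test it uses a permutation test, which asks the same rough question: how often would a slope this large turn up if the pairing of *x* and *y* were pure chance?

```python
import random

random.seed(42)

# Fake data: y depends on x with a true slope of 0.5, plus noise.
n = 100
x = [float(i) for i in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0, 5) for xi in x]

def ols_slope(xs, ys):
    """Closed-form least-squares slope: cov(x, y) / var(x)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    var = sum((a - mx) ** 2 for a in xs)
    return cov / var

observed = ols_slope(x, y)

# Permutation test: shuffling y destroys any real x-y relationship,
# so the shuffled slopes show what "no effect" looks like. The p-value
# is the share of shuffles that produce a slope as extreme as ours.
trials = 2000
extreme = 0
y_perm = y[:]
for _ in range(trials):
    random.shuffle(y_perm)
    if abs(ols_slope(x, y_perm)) >= abs(observed):
        extreme += 1

p_value = extreme / trials
print(f"slope = {observed:.3f}, p = {p_value:.4f}")
```

With a real effect baked into the data, the estimated slope lands near 0.5 and essentially no shuffled dataset matches it, so the p-value comes out far below the 0.05 line.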
Now, if you write a paper describing negative results—a model where nothing is significant—then you may have a hard time getting it published. In the absence of some specific controversy, negative results are boring. For the same reason, though, if your results just barely cross the threshold of conventional significance, they may stand a disproportionately better chance of getting published than an otherwise quite similar paper where the results just failed to make the threshold. And this is what the graph above shows, for papers published in the American Political Science Review. It’s a histogram of p-values for coefficients in regressions reported in the journal. The dashed line is the conventional threshold for significance. The tall red bar to the right of the dashed line is the number of coefficients that just made it over the threshold, while the short red bar is the number of coefficients that just failed to do so. If there were no bias in the publication process, the shape of the histogram would approximate the right-hand side of a bell curve. The gap between the big and the small red bars is a consequence of two things: the unwillingness of journals to report negative results, and the efforts of authors to search for (and write up) results that cross the conventional threshold.
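A toy simulation makes the mechanism behind that gap visible. The specific numbers here—a 10% chance of a null result getting published, and so on—are illustrative assumptions of mine, not estimates from the Gerber and Malhotra paper: generate p-values for studies of an effect that is truly zero, let significant results always get published and insignificant ones only rarely, then compare the counts just below and just above the 0.05 line in the published record.

```python
import math
import random

random.seed(1)

def two_sided_p(z):
    """Two-sided p-value for a standard-normal test statistic."""
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# 100,000 studies of an effect that is truly zero: z ~ N(0, 1), so the
# p-values are uniform on (0, 1) before any selection happens.
p_values = [two_sided_p(random.gauss(0, 1)) for _ in range(100_000)]

# Selection: significant results always get published; null results
# make it out only 10% of the time (an assumed rate, for illustration).
published = [p for p in p_values if p < 0.05 or random.random() < 0.10]

just_below = sum(1 for p in published if 0.04 <= p < 0.05)
just_above = sum(1 for p in published if 0.05 <= p < 0.06)
print(f"just below 0.05: {just_below}, just above 0.05: {just_above}")
```

Before selection the two bands hold about the same number of studies; after it, the published record shows the same cliff at the threshold that the histogram of APSR coefficients does.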
Political Science—or social science in general—is not especially to blame here. There’s an ongoing controversy in clinical trials of drugs, where pharmaceutical companies are known to conduct a large number of separate trials for a new product, and then publish only the ones that yielded significant results. (An important fact about statistics is that almost everything has a distribution—or you can hold your nose and pretend it does—including the expected results from multiple clinical trials: if you try often enough you will get the result you want just by chance.) If there are enough published trials, techniques like meta-analysis can help estimate the number of “missing” trials—the ones that were done but not published, or just not done at all.
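The arithmetic behind "try often enough and you will get the result you want" is short. If each trial of a truly ineffective drug has a 5% chance of coming up significant, the chance that at least one of k independent trials does is 1 − 0.95^k. This is a textbook calculation, not a figure from any particular set of trials:

```python
# Chance that at least one of k independent trials of a null effect
# crosses p < 0.05 purely by luck: 1 - (1 - 0.05)**k.
for k in (1, 5, 10, 20):
    chance = 1 - 0.95 ** k
    print(f"{k:2d} trials: {chance:.0%} chance of a 'significant' result")
```

By twenty trials the odds of a spuriously significant result are close to two in three, which is why selective publication of a handful of "successful" trials is so misleading.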
This all reminds me of an old joke, a shaggy dog story about a man who gets thrown into a jail cell with a long-term occupant and then begins a series of attempts to escape, each by some different method. He fails every time, getting captured and thrown back in the cell. The older prisoner looks at him silently after each failure. Finally, after six or seven attempts, the man loses his patience with the old prisoner and says “Well, couldn’t you help me a little?” “Oh,” says the old guy, “I’ve tried all the ways you thought of—they don’t work.” “Well why the hell didn’t you tell me?!” shouts the man. “Who reports negative results?” says the old prisoner.