Thursday, October 24, 2013

A Comment Exchange With Scott Sumner

by Charlie Clarke

Scott Sumner and I had an exchange in the comments of this post on his blog. I hope I wasn't being a jerk about it, because I like Scott so much, but I vehemently disagree with a point he made at the end of the post:

Scott:
Here’s one (anonymous) criticism of this interpretation:
With regard to Medicaid, Chetty also paints a surprisingly incomplete picture of the Oregon Medicaid experiment. As you will recall, Chetty is correct in pointing out that expanding Medicaid seems to have increased usage of health care, decreased financial strain, improved mental health, and improved self-reported well-being, but he, quite surprisingly given the caliber of economist Chetty is, leaves out the less flattering (for supporters of the ACA) part of the study that found no statistically significant increase in objective measures of physical health for patients who received Medicaid.
At best, the Medicaid study was a mixed result for supporters of expanding the Medicaid program (which the ACA does quite dramatically). At worst, the study is a sad demonstration of how bad Medicaid (and perhaps insurance in general) is at improving objective physical health. Why Chetty presented this study as an unambiguous victory for the pro-Medicaid crowd is a mystery to me (although I suspect support of the ACA has something to do with it).
In my view the Oregon Medicaid study provides support for replacing Medicaid with a new program called “Mediplacebo.”  I think the improvements in mental health identified in the Oregon study were real, and were important. But surely they can be produced at much lower cost. I know that every time I’ve had “cancer,” I’ve felt much better after going to the doctor and being told that I don’t have cancer. Under my plan, consumers would receive the same care provided to the uninsured for things like traffic accidents. For those health problems where the uninsured would not normally receive coverage, health consumers would receive a placebo.
I am really disagreeing with anonymous here, but it is clear Scott agrees with him, and I'd rather argue with Scott.

Me:
The Oregon study showed that Medicaid improved most health care outcomes, but many of the improvements were not statistically significant. The tests, we know now, were underpowered. There just weren’t enough people with certain health conditions in the study to show an effect, even though the point estimates were quite reasonable. It’s just too much nuance for Chetty to even try to convey, and for no gain. “Anonymous’s” description is much more misleading. The worst kind of “cult of statistical significance.”
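To make “underpowered” concrete, here’s a quick simulation. The numbers are hypothetical (mine, not the Oregon study’s): a real five-point drop in the share of patients with elevated blood pressure, but only a few hundred study participants per arm who actually have the condition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical: Medicaid truly lowers the share of patients with elevated
# blood pressure from 16% to 11%, but only 300 people per arm have the
# condition -- numbers invented for illustration, not from the study.
n_per_arm = 300
p_control, p_treated = 0.16, 0.11

n_sims = 10_000
significant = 0
for _ in range(n_sims):
    control = rng.binomial(1, p_control, n_per_arm)
    treated = rng.binomial(1, p_treated, n_per_arm)
    diff = treated.mean() - control.mean()
    se = np.sqrt(control.var(ddof=1) / n_per_arm +
                 treated.var(ddof=1) / n_per_arm)
    if abs(diff / se) > 1.96:       # the usual 5% two-sided test
        significant += 1

print(f"Power: {significant / n_sims:.0%}")   # roughly 45% with these numbers
```

A real, economically meaningful effect comes back “statistically significant” less than half the time. That is what low power means: failing to find significance was likely even though the effect exists.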
Scott:
Charlie, I agree that there is way too much emphasis on statistical significance, but your comment is way off base. If there is no statistical significance you have NOTHING. It’s quite right that this doesn’t prove there was no effect (the study was too small); it doesn’t prove anything. But the bottom line is that the Oregon study found no statistically significant physical improvement in health. I think if you asked the average voter whether that finding was important, they’d give you a very different answer from Chetty. Feelings of “subjective well-being” are all well and good, but they most certainly are not the reason voters support Medicaid.

Me:
“If there is no statistical significance you have NOTHING.”
I agree you have learned nothing with any confidence, but we absolutely did not learn anything about Medicaid NOT improving health outcomes. The question is why you want Raj Chetty to report on everything a study doesn’t learn about. The Oregon Health study didn’t learn anything about exchange rates; should he also report that?
“But the bottom line is that the Oregon study found no statistically significant physical improvement in health. I think if you asked the average voter whether that finding was important, they’d give you a very different answer from Chetty.”
Only if they were confused!! It’s almost as if you think the study found that the confidence interval around the health outcomes shows the effects were bound to be small. That is emphatically not what the study showed. In many cases, the point estimates were large and economically significant, but the standard errors were huge. That’s not the same as learning the effects were in some small range around zero. Bottom line (get Bayesian): an informed reader would not have updated his views about the effectiveness of Medicaid on health outcomes; there just wasn’t enough power.
Please consider this: I’ve seen you analyze the stock market reactions to Fed announcements time and time again. Never once have I seen you note that the estimate was not statistically significant, even though you had one data point and your standard errors were infinite! I agree with that Scott Sumner, not this one.

Scott:
Charlie, Let me put it this way. If the Oregon study had found a statistically significant impact on physical health, I GUARANTEE liberals would not have said “please ignore this result, the sample size is too small.” Well, you can’t have it both ways. You can’t cherry-pick the results you like and ignore those you don’t like. They tested for those things, so they should report the findings, even if inconclusive.
Here’s another way of putting it. Suppose Chetty’s column had been in the NYT. I’ll bet if you asked NYT readers a day later what he had said, 99% would “remember” that he found Medicaid improved physical health, if they remembered anything at all.
My reports on stock market reactions are far more significant than you assume. If you looked at daily market reactions, the results would not be particularly significant. But if you look at the market reaction immediately after the data hits the market, the result becomes highly significant, as the average variability of asset prices approaches zero as the time frame approaches zero.
I’m not saying everything I report is statistically significant, but when it’s not I almost always caution that the results are merely suggestive. I also look for evidence that the timing of the market reaction was linked to the timing of the policy announcement. Feel free to show me blog posts where I reported more significance than was warranted. I try to be careful, but probably err on occasion.
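An aside before my reply: Scott’s shrinking-window logic is easy to make concrete. Under a random-walk assumption, volatility scales with the square root of the window length. A minimal sketch, with hypothetical numbers:

```python
import numpy as np

# Hypothetical: daily (390 trading minutes) return volatility of 1%.
# Under a random walk, vol scales with the square root of the window.
daily_vol = 0.01
for minutes in [390, 60, 5, 1]:
    window_vol = daily_vol * np.sqrt(minutes / 390)
    print(f"{minutes:>3}-minute window: vol ~ {window_vol:.3%}")
```

A one-percent move in the minutes right after an announcement stands far outside a ~0.05% one-minute noise band, which is Scott’s point.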
Me:
Scott you said:
“Let me put it this way. If the Oregon study had found a statistically significant impact on physical health, I GUARANTEE liberals would not have said ‘please ignore this result, the sample size is too small.’ Well, you can’t have it both ways.”
But that is exactly wrong. You can absolutely have it both ways. That’s what low power means. If a test has low power but still reaches statistical significance, it means that even though the effect is measured with lots of error, the estimate is so large that we can still be confident it’s different from zero. The sample size determines how wide the error bars are, but if, with wide error bars, the effect is still significantly different from zero, it’s completely reasonable to take that as strong evidence the effect is not zero.
Yet the other way doesn’t work. If the point estimate is economically meaningful but the confidence interval is very wide, then you can’t conclude much. Possibly the effect is very economically meaningful; possibly it’s zero or has the opposite sign. We just don’t know.
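Here is the asymmetry with two made-up estimates that share the same wide standard error:

```python
# Hypothetical: two effect estimates with the SAME wide standard error.
se = 2.8
for label, estimate in [("large estimate ", 9.0), ("modest estimate", 5.0)]:
    lo, hi = estimate - 1.96 * se, estimate + 1.96 * se
    print(f"{label}: {estimate}, 95% CI [{lo:.1f}, {hi:.1f}]")
# large estimate :  9.0, 95% CI [3.5, 14.5]  -> wide, but excludes zero
# modest estimate:  5.0, 95% CI [-0.5, 10.5] -> includes zero AND effects
#                                               twice as big
```

The first interval is wide but still excludes zero, so significance survives low power. The second is just as wide and straddles zero; it rules out almost nothing, which is very different from showing the effect is small.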
“But if you look at the market reaction immediately after the data hits the market, the result becomes highly significant, as the average variability of asset prices approaches zero as the time frame approaches zero.”
First, I don’t think I’ve ever seen you put a confidence interval in one of your posts. Have you ever?
Next, I’m willing to accept the implicit assumption you’ve made that the event being studied doesn’t change the variance (which is what allows you to estimate standard errors to begin with; otherwise you have infinite errors and only one observation).
Last, it’s true that the variance of returns goes to zero as the observation period gets smaller, but stock returns at high frequencies are highly non-Gaussian. They have high skew and fat tails. Traditional t-tests are misspecified. These problems are surmountable, but I’ve never seen you try to surmount them.
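To see why skew and fat tails matter, here’s a stylized check: data generated so the null of a zero mean is true by construction, but heavily skewed with fat tails (deliberately more extreme than real returns, to make the distortion obvious). The nominal 5% t-test rejects far more often than 5%:

```python
import numpy as np

rng = np.random.default_rng(0)

# Null is TRUE by construction: a very skewed, fat-tailed distribution
# shifted to have mean exactly zero. How often does a nominal 5% t-test
# reject anyway? (Stylized data, not actual market returns.)
n_obs = 20            # e.g., a short event window of returns
n_sims = 20_000
rejections = 0
for _ in range(n_sims):
    x = np.exp(2 * rng.standard_normal(n_obs)) - np.exp(2.0)
    t = x.mean() / (x.std(ddof=1) / np.sqrt(n_obs))
    if abs(t) > 2.093:            # two-sided 5% critical value, 19 df
        rejections += 1

print(f"Rejection rate under a true null: {rejections / n_sims:.1%}")
# Well above the nominal 5% -- the usual t-test is badly sized here.
```

A bootstrap of the event-window returns is one standard way around this; my point is only that it takes work the posts don’t do.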
Again, I’m not saying you are wrong. I’m saying you have a lot of valuable things to say even though you don’t compute confidence intervals.
“I’m not saying everything I report is statistically significant, but when it’s not I almost always caution that the results are merely suggestive.”
So Raj is actually being much more cautious than you. In a Scott Sumner world, he could have said, “additionally, the evidence suggests that Medicaid helped lower the percent of patients with high blood pressure and high cholesterol, as well as improve markers linked to the health of patients with diabetes, though these improvements did not reach the level of statistical significance.”
I think not saying anything is much more cautious and less misleading, but maybe this is somewhere we just disagree. To me, if you have a wide confidence interval that includes big effects, small effects, zero effects, and negative effects, just don’t report it to lay people, as it will probably confuse them. But you do indeed report such suggestive evidence all the time to your readers, and I appreciate the discussion.
I wonder how many people who read your post or Anonymous’s post would, when asked the next day, answer that the effects on physical health were “small” or “near zero,” rather than “possibly economically important but measured with a great deal of error.”
Scott:
Charlie, You said:
“But that is exactly wrong. You can absolutely have it both ways.”
If you’d think about what you said here, you’d realize it can’t be right. Otherwise you could go into study X with prior belief Z. Study X could shift Z in one direction but not the other. That’s clearly impossible. No study (going in) has the potential to make it more probable that Medicaid is effective while not having even the possibility of making it less probable. That violates the laws of statistics.
Me:
“If you’d think about what you said here, you’d realize it can’t be right. Otherwise you could go into study X with prior belief Z. Study X could shift Z in one direction but not the other. That’s clearly impossible.”
You are all over the place now. First you want to read a hypothesis test: “If there is no statistical significance you have NOTHING.” Now you want to get Bayesian, which is exactly where I started. An informed Bayesian reading of how Medicaid affects physical health outcomes wouldn’t have moved the priors very much, because the standard errors of this study were large. But what little they did move would NOT have been toward zero, because of the RESULTS of the study. The point estimates showed positive effects on physical health around the size expected by informed priors.
If you came into the study with uninformed priors, the point estimate (your best guess) would still be that Medicaid improves health outcomes for patients with diabetes, high blood pressure, and high cholesterol. It’s just that you would have a lot of uncertainty about those guesses (the distribution around your point estimate is wide and includes zero).
You are imagining some other study, with different results, that should move us toward the belief that Medicaid has zero effect on health outcomes, but that study doesn’t exist.
Consider the example you gave: statistical significance doesn’t compare how close Z is to X; it compares how close Z is to zero. In this study, Z was approximately equal to X, but Z and X together were close enough to zero that, given the low power of the study, we couldn’t reject that Z is equal to 0.
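Here’s what that looks like in the textbook normal-normal conjugate update. The numbers are hypothetical, chosen only to mimic the situation: an informed prior of roughly a three-point drop, and a noisy study estimating a four-point drop.

```python
# Conjugate normal-normal update: posterior precision adds, posterior
# mean is a precision-weighted average of prior mean and estimate.
def update(prior_mean, prior_sd, estimate, se):
    w = (1 / prior_sd**2) / (1 / prior_sd**2 + 1 / se**2)
    post_mean = w * prior_mean + (1 - w) * estimate
    post_sd = (1 / prior_sd**2 + 1 / se**2) ** -0.5
    return post_mean, post_sd

# Hypothetical numbers: prior says Medicaid cuts high blood pressure by
# ~3 points; the (noisy) study estimates a 4-point cut with a huge SE.
post_mean, post_sd = update(prior_mean=-3.0, prior_sd=2.0,
                            estimate=-4.0, se=6.0)
print(f"posterior: {post_mean:+.2f} (sd {post_sd:.2f})")   # -3.10 (sd 1.90)
```

The posterior barely moves, and what movement there is goes away from zero, not toward it, because the study’s point estimate was on the same side as the prior.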
Me again:
One more point, said a slightly different way, using hypothesis testing. The hypothesis test (to use your example) is “Is Z different from zero?” The answer is that there is no statistically significant evidence that it is. We could also ask, “Is Z different from X?” The answer is, again, that there is no statistically significant evidence that it is. Yet you want people to revise X toward zero. Why?
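With the same hypothetical numbers as in the sketch above, both tests fail together:

```python
# Hypothetical numbers from the sketch above: estimate Z, its standard
# error, and the prior expected effect X.
Z, se, X = -4.0, 6.0, -3.0
print(f"t for H0 'Z = 0': {(Z - 0) / se:+.2f}")   # -0.67, not significant
print(f"t for H0 'Z = X': {(Z - X) / se:+.2f}")   # -0.17, even less so
```

Neither null can be rejected, and if anything Z sits closer to X than to zero. Nothing here tells a reader to drag X toward zero.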

I imagine Scott is tired of the exchange, but I'll update if he responds. 

1 comment:

  1. I wrote a followup post on the Oregon study, if you are interested:

    http://theanonymouscommentator.blogspot.com/2013/10/more-on-oregon-medicaid-study.html
