Thursday, October 31, 2013

This Post is Secretly About Diction

By Robert H.

Leo Strauss argued that truly groundbreaking thinkers have had to disguise their work ever since Socrates got hemlocked by the idiot masses.  Just so, Strauss thought many important texts were written on two levels: (1) an exoteric meaning, a straightforward reading acceptable to prevailing sentiment, and (2) an esoteric meaning, a covert reading that advances arguments too troubling or unacceptable to be openly shared with mainstream audiences and local elites.  The esoteric meaning is revealed through the author sprinkling in symbols, deliberate contradictions that weaken his straightforward argument, and the like.

I bring it up because, in a Straussian critique of Straussian scholarship, Tyler Cowen has been writing tongue-in-cheek Straussian reviews of popular films.

I bring that up because The A.V. Club just published the first straight-faced Straussian review of a movie I've ever seen in the mainstream press.  Apparently Ender's Game, a pulpy science fiction movie about Space Harry Potter also being Space Horatio Hornblower (sorry, Honor), is a bad movie because it is actually a subtle attempt to advance intentionalist ethics, the very thing Charlie and I have been arguing about in a sports context.  The review is really an epic exercise in writing horses---; I highly recommend it.

Protip: The esoteric meaning of this post is that I like saying the word "Straussian."

Tuesday, October 29, 2013

Put him in a body bag!

By Robert H.

Charlie points out that I didn't address reputational issues in my last post.  It was a big oversight, so I'll do that here:

First, it is clearly true that violating norms in sports can be unethical.  If there is a norm rather than a rule against punching, you still shouldn't punch.  But that doesn't actually tell us anything about whether intentionally injuring a player is unethical -- the punch is unethical regardless of your intent.  We are in the same place as my last post, only now I am saying "it's ethical to try to injure other players within the rules *and norms*."

So, imagine a league has a norm against trying to injure other players with legal hits.  Point one: this is a bad norm!  Sports aren't a courtroom with a lot of time to dig into motives, they're fast affairs full of thousands of decisions and actions.  Norms should stick to policing obvious actions.  Stuff like "don't hit a boxer who wants to touch gloves at the start of a fight."  To the extent intent is looked at, it should be intent clearly discernible from the action.  An intentional late hit looks different from a guy who simply can't avoid the runner after the whistle is blown.  But when you go beyond "was that action intentional" and get to "what was the intent of that action," that is some hazy stuff.  Did Roy Williams horse-collar that guy because it is an effective tackle and he wanted to take him down?  Or because it is a dangerous tackle and he wanted to hurt him?  Maybe courtrooms could get to the bottom of that, but I'm not sure refs, players, and spectators can.  Again, better to police people intentionally going for the knees to injure players with a rule (or norm) against intentionally going for the knees than a rule (or norm) against intentionally injuring players.  It's just too hard to discern motive.

That said, a way out is to use reputation.  Maybe over the course of his career someone who tries to injure other players will injure more guys, people will notice, he will get a bad reputation and other players will retaliate.  My thoughts about this:

1. This rarely happens.  When I think of players with bad reputations it's normally because they routinely engaged in acts that clearly violated the rules and norms -- they cheat.  If a guy twists ankles in the pile a lot the ref may not notice, but the players do.

It's more rare for a guy to get a reputation simply because bad things seem to happen around him more than usual (ie, he has injured more people over his career).  The only thing I can think of is pitchers: if you "accidentally" bean a lot of batters, pretty soon people will decide it isn't an accident.

2. The norm is clearly a good *strategic* reason not to try to injure.  That said:

3.  When the norm is enforced with cheating, it's clearly not ethically binding.  "Don't try to hurt other players because if you do people will cheat against you in retaliation"?  That's a ridiculous norm, and using the threat of cheating to constrain your opponent's play is as bad as cheating itself.  At best, cheating can be a strategic decision -- we will give up the penalty yardage and cheat in order to get the outcome we want.  But while it might be good strategy, I have trouble seeing it as an ethically binding norm on the other guys.  "If we try to eat up the clock they will foul us, so we have an ethical norm to not eat up the clock" is crazy thinking.

4. At last, the hardest question: what to say when the norm is enforced with threats of licit retaliation.  Imagine the norm is enforced with a sort of mutually assured destruction: if you try to legally injure players, people will notice the greater number of injuries and try to injure you right back.  Has that created an ethically binding norm?  This puts tension on some of my claims (it seems more workable than other options, it removes any "I have to do it because I don't want the other guy to get an advantage" rationale, and it multiplies the harm of intentionally injuring, since it will push the whole sport in a more violent direction).  I am going to tentatively say that I don't think so.  Remember point two of my last post: the outcome of blowing up the norm is that it will lead to injuries society neither abhors nor thinks people should not be able to consent to.  This isn't really mutually assured destruction, where breaking the norm means nuclear annihilation.  It's a sport, and breaking the agreement means some marginally greater number of injuries everyone has already consented to risk.  They knew these types of injuries happen and they knew a breakdown in this unofficial norm could happen when they took the field, marginally increasing the incidence of this type of injury.

To me, the consent issue clinches it.  It could be strategically very dumb to start legally injuring a greater number of players than usual, but I don't think it is unethical.  If it's within the rules, if everyone has consented to that possibility, then whether to do it or not is as much a strategic question as whether to tackle high or low.

And again, in practice I think this norm, enforced this way, is rare.  People don't dislike Meriweather because he has violated a norm against intentionally injuring people; they dislike him because he repeatedly employs dangerous, illegal hits.  Bad reputations are normally about repeated cheating, not dark motives revealed by statistically aberrant outcomes over years of play.

Well, that was rambly.

Sweep the Leg, Johnny

By Robert H.

Today Brandon Meriweather furthered his scheme to become American football's first hockey thug, vowing to tackle more players low in an attempt to injure their ACLs.  This means it's time for me to endorse one of my least popular opinions: it is moral for players in contact sports to intentionally injure each other, provided they don't break the rules (ie, targeting a runner's knee instead of his thigh in the hopes of badly hurting him is moral).  Here's why (throughout I assume it's legal to tackle someone by the knees, but if I missed a rule change and it is not then I don't think that undermines the basic logic):

1. Hitting people in contact sports is legal because athletes impliedly consent to a certain level of violence when they play the sport.  You can argue about what kind of illegal hits players have consented to, but it seems pretty clear that they have consented to anything legal.  You can't be shocked and outraged when the other guys play by the rules.

While the above deals with the law, I think this legal reasoning has moral weight: if someone consents to be hit then (a) you aren't just using them as a means, you are engaging in a fair contest, and (b) the consensual relationship is very likely utility maximizing.

2. These aren't the kinds of harms we want to forbid even if people consent.  You can fairly say "no one can consent to be murdered," but "no one can consent to a hit that tears a ligament or breaks a bone"?  Those hits happen all the time in contact sports, even when players aren't trying to injure each other.  If I can't consent to someone hitting my knee because ACL tears are so bad, then *hitting knees* is unethical, not intentional injuries.  You should get just as mad at players who do it without the intent to injure, and should promptly make the tackle illegal (see 4).

3.  It is not mere sadism: injuring a player takes them out of the game and weakens their team.  It isn't just viciousness for the sake of viciousness, it's an attempt to win the athletic contest within the rules of the contest.

4. "Don't intentionally injure" is a norm that breeds bad rules.  Again, if intentionally targeting kneecaps is terrible I want that to be illegal.  I don't want to rely on some impossible to enforce catch-all rule that says it is illegal to intentionally hit kneecaps with the intent to injure.

5. It doesn't change my calculus that injuries can cost pros millions.  The greater possibility of financial loss is counterbalanced by the greater possibility of financial gain for players and teams who help win games by causing injuries.  Meanwhile, pros are guaranteed good medical care, which to me makes the injured *less* sympathetic.

6. The risks are semi-reciprocal.  If the other team legally can injure you and possibly is trying to (and it often isn't clear whether they are), a decent regard for self-defense and for the fairness of the contest demands that you be able to do the same.

7. To the extent it is sadistic, I am not sure that is wrong or unnatural.  I think most contact sports athletes enjoy hitting and hurting people in the context of an athletic event, despite the possibility of seriously injuring them.  If you change "despite" to "because of," I don't get why you are suddenly a moral monster.  Both positions are sadistic, it's a matter of degrees.

Caveats:

A. I am pretty sympathetic to the view that American football and many combat sports are unethically violent full stop, for reasons hinted at in point 2.  Again, if violence in sports causes too many injuries it makes more sense to forbid the violence than to permit the violence but forbid the intent to cause injuries.

B.  I acknowledge that it might be *more* ethical to refrain from intentionally causing injuries.

C. I acknowledge that, ethics aside, intentionally causing injuries might make people squeamish and be something they don't want to do.  I, for example, never had a problem trying to knock people out and possibly concuss them, but aiming for the planted knee of a helpless runner always struck me as something I wanted no part of.  Other people will fall other places on this spectrum.  My point isn't that intentionally injuring other players should sit right with you, it's that you shouldn't judge other people for doing it.  Eating spiders doesn't sit right with me, but it's ethical.

Edit: More here.


Statistical Significance vs. Oomph

My co-blogger and I have recently been blogging about an old fallacy in economics.  It is incredibly common for researchers to confound the size or importance of their result with the precision with which they measure it.  For many years now, Stephen Ziliak and Deirdre McCloskey have been standard bearers on this issue, and their work has culminated in a book, The Cult of Statistical Significance.  They have gone so far as to advocate against doing hypothesis testing at all: why not just report the confidence interval?  After all, all of the information is contained in the confidence interval anyway.  Rejecting a null hypothesis is just saying that some number isn't in the confidence interval.  In social science and medicine, it is especially bad, because regardless of theory or outside evidence, we almost always choose zero to be our null hypothesis.  Even if eight studies have come before showing some statistically significant effect, we set the null naively at zero.  If the ninth study shows no significant results, it is often interpreted as countervailing evidence to the studies that came before, when in fact the estimates aren't statistically significantly different from the earlier studies.  They are just measured with more error, so that zero ends up in the confidence interval.

Imagine you read a study on the effects of totally revamping the tax code, perhaps a massive simplification plus no taxes on capital accumulation.  The study finds that such a move on average will raise the growth rate 1% a year.  A one percent change in the growth rate is huge!  Over a hundred years, a country growing that much faster ends up roughly 2.7 times richer.  That's the difference between the U.S. and a country like Mexico or Croatia.  Yet, you read further and see that the effect is measured with considerable error.  The effect is somewhere between -1% and 3%.  The effect is "not statistically significant."  The headline of the study is often, "Large Changes to the Tax Code Show No Effect On Growth."

Then another study crosses your desk and finds that, for other dramatic changes to the tax code, growth will rise .01% each year.  But the effect is very precisely measured: the 95% confidence interval is between .005% and .015%.  This study is titled, "Large Changes to the Tax Code Significantly Increase Economic Growth."  Yet "significance" has taken on this strange meaning.  If there is any cost associated with these massive changes, it will quickly eat into any policy relevance.  The changes in study one are where the action is.  The potential is huge.  The study should make us much more interested in those changes, rather than much less.
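To make the contrast concrete, here is a minimal sketch in Python (with invented numbers matching the two hypothetical studies above, and assuming the reported intervals are standard 95% ones) that recovers each study's standard error and p-value, and then computes the compounding oomph of one extra point of growth:

```python
from scipy.stats import norm

def summarize(name, estimate, ci_low, ci_high):
    # Back out the standard error from a 95% confidence interval,
    # then compute the z-statistic and two-sided p-value.
    se = (ci_high - ci_low) / (2 * 1.96)
    z = estimate / se
    p = 2 * norm.sf(abs(z))
    print(f"{name}: estimate={estimate}, se={se:.4f}, z={z:.2f}, p={p:.4f}")

# Study 1: huge point estimate, huge error -- "not significant."
summarize("Study 1 (pct pts of growth)", 1.0, -1.0, 3.0)
# Study 2: tiny point estimate, tiny error -- "significant."
summarize("Study 2 (pct pts of growth)", 0.01, 0.005, 0.015)

# The oomph at stake: one extra point of growth, compounded a century.
print(1.01 ** 100)  # ~2.7, i.e. roughly 2.7 times richer
```

Study 1 comes out "insignificant" (z under 1) and study 2 "significant" (z near 4), even though study 1 is the one with a hundred times the potential payoff.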

Ziliak and McCloskey go through many examples in published articles pointing out this phenomenon.  Often example one is a drug that is "shown not to work," and example two is a costly procedure or a drug whose statistically significant benefits are quickly eroded by the economically significant side effects.  "We found that this drug lowered the weight of subjects by five pounds, a number statistically significantly different from zero.  We also found that the rate of heart attacks tripled, though this was not statistically significant."

We have a term "economic significance" that tries to capture this idea.  It is quite an awkward term in this context.  I just used it to describe a costly side effect.  The authors introduce the term OOmpf, which is itself a bit clumsy.  The mere fact that we don't have a term of art across all Sciences to describe the importance of a point estimate a part from the statistical significance is quite telling.

I must say I have renewed appreciation for Ziliak and McCloskey's mission in recent days.  Thomas Schelling sums up the feeling of exasperation well on the book's cover: “McCloskey and Ziliak have been pushing this very elementary, very correct, very important argument through several articles over several years and for reasons I cannot fathom it is still resisted. If it takes a book to get it across, I hope this book will do it. It ought to.”

Monday, October 28, 2013

Tag Team (Back Again)

By Robert H.

Having followed and enjoyed Charlie's recent discussions about the Oregon Medicaid study, I think Anon Ymous is suffering from a basic misunderstanding.  To clarify: "more statistically significant" does not mean "better policy outcome."

Imagine getting on Medicaid automatically turned your hair green.  The Oregon Medicaid study would have found that 1. there was extremely robust evidence that getting on Medicaid turned people's hair green, 2. there was statistically significant evidence that getting on Medicaid improved mental health for some folks, and 3. there was inconclusive evidence about how Medicaid affected physical health.  For example, Charlie interprets the study to provide evidence that Medicaid could lower the number of people with high blood pressure by 7% or raise it 5%.

So let's say the actual numbers in the population were (and these are very made up): 1. turns hair green 100% of the time; 2. reduces the number of depressed people, who make up a third of the population, by 20 percent; 3. reduces the number of people with high blood pressure, who make up a fifth of the population, by 7 percent.  Which of these, from a policy perspective, is the best and most desirable result of Medicaid?

It's impossible to tell from that information, because "statistically significant" or "big effect on the population" does not equal "good."  For example, in any sample of that population the most statistically significant result is going to be that Medicaid turns your hair green.  But it isn't actually very useful or pleasurable to have green hair.  Green hair is the most statistically robust but the least useful correlation.  Just so, studies in that world would probably find that Medicaid has more robust and observable effects when it came to reducing depression than high blood pressure.  But that doesn't mean reducing depression is the best result from a policy perspective: it could be the case that a relatively small reduction in high blood pressure has net better effects than a larger reduction in depression.  Or not.  Figuring out the answer is an empirical and philosophical question (how do you weigh extending lives versus improving lives?).  You've got to dig down in the trenches and do tough, expert-level cost-benefit analysis to reach a conclusion.
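As a toy illustration (a quick simulation using the made-up numbers above, not anything drawn from the actual study), the green-hair effect would swamp everything else in statistical significance while telling us nothing about which effect matters for policy:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 5000  # people per study arm, made up

def two_prop_p(rate_control, rate_treated):
    # Simulate one arm each, then run a two-proportion z-test.
    c = rng.binomial(n, rate_control) / n
    t = rng.binomial(n, rate_treated) / n
    pooled = (c + t) / 2
    se = np.sqrt(2 * pooled * (1 - pooled) / n)
    return 2 * norm.sf(abs(t - c) / se)

print(two_prop_p(0.0, 1.0))         # green hair: p is essentially zero
print(two_prop_p(1/3, 0.8 / 3))     # depression down 20%: clearly significant
print(two_prop_p(0.2, 0.2 * 0.93))  # blood pressure down 7%: borderline
```

The p-value ranking (green hair, then depression, then blood pressure) says nothing about which effect a policymaker should care about most.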

That's why "medicaid has no health effects" would be such a good result for people who want to fund it less.  If you are spending money to get *nothing* then that is clearly a waste.  But if you are spending money to get 7 or 3 or 4 or 5 percent reductions in diseases with high morbidity, suddenly we've got to have a tough cost/benfit argument.

The only other thing I'd add is that even if the physical health effects of medicaid are less important in and of themselves, that doesn't mean they are separable from the mental health effects.  Once you acknowledge there are real health benefits to medicaid coverage, it becomes possible that those real benefits are causing the improved mental health.  Maybe I will put myself through a whole lot for a pill that can reduce my chance of dying young by 1 percent, and maybe just handing me that pill will improve my mental health a whole lot versus a situation where I have to blow my life's savings to get it.   If a lot of people thought like that, the mental health effects of getting the pill would apply to more people than the health effects, would be more robust in any survey, but would still be totally inseparable from the health effects.  Just giving me a placebo wouldn't work: I don't want a pill to make my worries go away, I want that pill that has that result.

A Power Calculation Example

I gave an example in this post of an experiment with low power.  I thought it would be interesting to do a calculation to actually show how large a sample one would expect to need to show a significant effect.

Suppose you have a magic pill that stops you from dying.  You give it to a group of undergraduates and give a placebo to a control group, then measure the effect on mortality after one year.  How large a sample do you need to find a statistically significant result at the 95% confidence level?

So let's get specific.  Sometimes you'll run the experiment and find no effect.  The best you can hope to do is have a large enough sample to find the effect a good proportion of the time.  It's typical to hope a test is powered well enough to find an effect 80% of the time.  We know the drug works (we're assuming that); what we hope to do is design a study large enough that the results will show the drug works better than a placebo 80% of the time.

So how large should the study be?  Well, first you need to know the baseline mortality for undergraduates.  I'm getting the number from this study, which shows 22.404 per 100,000 undergrads die over one year.  Thankfully, the number is small (.000224): 99.978% of undergraduates won't die over the course of the year.

Using this power calculator, we can see that we'd need 87,000 undergraduates in our study.  And this is just to show any effect different from zero.

This calculation is with a drug that ends mortality.  That's pretty unrealistic.  Suppose the drug just decreases the mortality rate 10%.  That'd be a wonderful thing.  Now we expect a proportion of .000224 deaths in the control group and .0002016 in the treatment group.  Now we need 13.5 million undergraduates for the study just to show a statistically significant effect 80% of the time.  That's pretty remarkable.
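For anyone who wants to reproduce this, here's a rough sketch of the same calculation using the standard normal-approximation sample-size formula for comparing two proportions (my numbers land in the same ballpark as the post's calculator; exact figures differ a bit depending on which approximation and continuity corrections a given calculator uses):

```python
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    # Per-group sample size for a two-sided two-proportion z-test.
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2

p = 22.404 / 100_000  # baseline one-year undergrad mortality, from above

# A pill that prevents all deaths: tens of thousands of subjects.
print(2 * n_per_group(p, 0.0))      # ~70,000 total
# A pill that cuts mortality only 10%: millions of subjects.
print(2 * n_per_group(p, 0.9 * p))  # ~13 million total
```

The required sample size blows up as the difference between the two proportions shrinks, which is exactly why a real but modest mortality effect is so hard to detect in a low-mortality population.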

In the real world, we do mortality studies over many years and in populations with higher mortality than your typical undergraduate.  Those tests will have more power.  But notice, the effect isn't small in importance.  Ending all death among undergraduates or lowering their mortality rate 10% is, I think, really significant.  I wonder how many drugs can claim that.  The effect is a small number, though, and that makes it difficult to detect.

Saturday, October 26, 2013

It Is Not Significant When Your Low Power Test Is Not Significant

In Anon Ymous's latest response, it seems he feels he's being misunderstood.  Yet, he keeps making statements that demonstrate exactly the view I'm arguing against.

"The Oregon study did not find evidence of statistically significant improvements in physical health. That is a pretty significant result."

No, this is exactly what I've criticized.  If you run a test that has low power, it is NOT significant if you find no effect.  Suppose you have a magic pill that stops you from dying; you give it to 10 undergrads, use 10 as a control group, and measure the effect after one year.  You find no statistically significant effect.  Is that finding significant?  NO!!!  Even if the pill prevents 100% of all deaths, you'd have to give it to thousands of undergrads to get statistical significance, because undergrads just don't die very often.

"Obviously, it is not as bad as positive evidence that Medicaid does not improve physical health. Still, if we had the critics and supporters of Medicaid in its current form predict the results on physical health in advance, I think that it is fairly clear that the critics would have been closer."

That the results are far from what supporters of Medicaid would predict is far from clear.  In fact, it is exactly what we are arguing over.  The whole point of my post was to show that the study was consistent with "bold claims about Medicaid" as well as with the view that Medicaid doesn't improve health.  The whole argument is over whether the results were consistent with "bold claims about Medicaid."  Let's try to define "bold claims."  I don't know who particularly Anon Ymous is arguing with, but I can put bounds on what kinds of "bold claims" this is consistent with.

First, what are the bold claims such that you would still be closer to the study's estimate than someone whose prior is an effect on health of zero?  Let's take the percent of people with high blood pressure and the percent of people with high glycated hemoglobin, which is a key diagnostic for diabetes.  I use these because their significance is pretty interpretable even for non-medical folks like myself.

For high blood pressure, there is a fall from 16.3% of people to 14.97% of people.  That's an 8% reduction in the incidence of high blood pressure.  It's more consistent with a "bold claim" of up to a 16% reduction in high blood pressure than it is with a zero percent reduction.

For diabetes, there is a fall from 5.1% to 4.17%.  That's an 18% reduction in the incidence of diabetes.  That's more consistent with a "bold claim" of up to a 36% reduction in the incidence of diabetes than with no reduction in the incidence of diabetes.
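The symmetry here is just distance from the point estimate; a few lines of arithmetic (a sketch using the percentages above) make it concrete:

```python
# Point estimates from the study, expressed as relative reductions.
bp_point = (16.3 - 14.97) / 16.3  # ~8% reduction in high blood pressure
dm_point = (5.1 - 4.17) / 5.1     # ~18% reduction in diabetes incidence

# A "bold claim" beats a zero-effect prior whenever it sits closer to
# the point estimate than zero does, i.e. anywhere below twice the estimate.
for name, point in [("blood pressure", bp_point), ("diabetes", dm_point)]:
    print(f"{name}: point estimate {point:.0%}; any claimed reduction "
          f"up to {2 * point:.0%} is closer to it than zero is")
```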

I don't see any indication that the study was less consistent with bold claims than critical claims, and I can't imagine why Anon Ymous would say otherwise.

Yet, it goes much further.  If you are a liberal with "bold claims" about Medicaid, we can also ask how bold of a claim you can have before you can no longer say, "The Oregon Health Study does not provide evidence against my bold claim" (to use the phrasing Anon Ymous prefers--I'd put statistically significant in front of evidence, but apparently Anon Ymous is very into classical statistics).

"This study provides no evidence against my belief that Medicaid lowers the incidence of high blood pressure by 40%."

"This study provides no evidence against my belief that Medicaid lowers the incidence of diabetes by 87%."

Those are ridiculously bold claims, but they are also perfectly consistent with the results of this study.

So the real question is, why are the opponents of Medicaid so focused on this portion of the study, where we basically learn nothing about the effects of Medicaid on physical health?

Here is Anon Ymous:
"The defenders of Medicaid seem to think that the burden of proof is shared on this question. It is not. It is squarely on the shoulders of those making these bold claims about Medicaid, as they are the ones defending the status quo of a program that commands a lot of resources at least partially based on very bold claims about what good the program supposedly does. Claims that, thus far, are quite scant on evidence." 
If you think that Medicaid improves access to health care, and access to health care is likely to improve health, there is still no evidence that you're wrong.  But even though that is a pretty common-sense prior, Anon Ymous doesn't think you should be able to have it.  As best I can tell, he's really upset that Raj Chetty didn't report that his prior was not rejected by the evidence, even though virtually every other prior was also not rejected by the evidence.

Raj Chetty's piece is on the ability of economics to settle disputes.  Raj reported on all the ways the Oregon Health Study settled disputes.  He omitted the portion of the study that didn't settle any disputes.  Perhaps he could have included a sentence such as, "The effects of Medicaid on physical health were anything from large positive effects to large negative effects."  I think omission is better than including the statement "the study found no statistically significant effect on health," because, as is clear from this discussion, people have a very hard time understanding the meaning of that phrase.

A "Bayesian" Analysis of the Oregon Health Study

by Charlie Clarke

(A follow up to this post, more here)

Ok, a real Bayesian analysis very carefully formulates priors (best-guess hypotheses) and then updates those priors based on the results of a study.  I'm not going to do that, as it requires lots of field-specific knowledge and is a fair bit of work after that.  But one of my points of contention with Scott Sumner and Anon Ymous is that, taken as a whole, the OHS was bad for people who didn't think Medicaid improved health.  The news isn't particularly bad, since the study had low power, but it's pretty clear which way the sign points.  It's the people who went into the study thinking that Medicaid had no effect on measures of physical health who should be revising up their priors a little.

The reason, as I alluded to last post, is that with no other information, the point estimates of the study are the best guess for the effect of Medicaid on some measure.  In general, the point estimates point in the direction of Medicaid improving outcomes.  These improvements are quite small relative to the errors with which they are measured, but that does not mean they are small in absolute terms.  To make the criticism Scott and Anon Ymous want to make, they really need to argue the whole confidence interval around an estimate is economically insignificant.  If they can show that, then they can argue this study conclusively supports their view.

So how do the point estimates of the actual study look?  I'll classify them below by their answer to this question: "Does Medicaid improve measures of physical health?"  If the point estimate has a sign in the direction of better health, I'll classify it yes.

Physical Health Measures:

Yes

Systolic blood pressure is lower
Diastolic blood pressure is lower
Percent of people with elevated blood pressure is lower

Percent of people with high cholesterol is lower
HDL is higher
Percent of people with low HDL is lower

Percent of people with high Glycated Hemoglobin (diabetes marker) is lower

Framingham measure for risk of heart attack is lower

No

Total Cholesterol is up

Glycated Hemoglobin is up

Framingham measure for high risk individuals is higher


Those are the results.  To me, it looks like Scott and Anon Ymous should be epsilon less confident in their current positions, not asking us to join them.  Two of the No answers aren't necessarily bad.  Since the percent of people with low HDL is falling, it might be fine that total cholesterol is rising.  And glycated hemoglobin is a marker for diabetes when it's high, and Medicaid lowers the percent of people with high levels; is a change within the safe range a predictor of anything?

Realize, none of the results are statistically significant.  All of the effects are measured with large error.  But in a perfect world, we'd all carry around really rational and well-thought-out prior beliefs and update those beliefs whenever we are confronted with new evidence.
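For the curious, here is what that updating looks like in the simplest textbook case, a normal prior combined with a normal likelihood (the numbers here are invented for illustration, not taken from the study):

```python
def posterior(prior_mean, prior_sd, estimate, se):
    # Normal-normal Bayesian update: a precision-weighted average.
    w_prior = 1 / prior_sd**2
    w_data = 1 / se**2
    mean = (w_prior * prior_mean + w_data * estimate) / (w_prior + w_data)
    sd = (w_prior + w_data) ** -0.5
    return mean, sd

# A skeptic's prior centered on zero, confronted with a noisy study
# whose point estimate favors Medicaid: the posterior moves a little
# toward the estimate -- up, not down.
print(posterior(prior_mean=0.0, prior_sd=1.0, estimate=1.33, se=3.0))
```

A noisy study barely moves the posterior, but whatever movement there is goes toward the point estimate, which is the whole argument of this post.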

As far as the politics of Medicaid, this certainly doesn't mean that supporters are immune from criticism.  In a more perfect world, Scott and Anon Ymous would be arguing that the effects estimated by this study are too small to be worth the money, or much smaller than liberal so-and-so would have thought going in.  If even the high end of the estimates is not worth the cost, then that is a big feather in the cap of Medicaid opponents.

Alas, the commentary we get is on the order of: Study proves Medicaid doesn't make people healthier.  And then we get attacks on Raj Chetty for not parroting that false interpretation.



What is Statistical Power?

by Charlie Clarke

More here and here.

Statistical power is the ability to reject false hypotheses.  Intuitively, it means you are measuring the effects of an experiment with small errors.

In a follow up to the post that started a long argument between Scott Sumner and me, Anon Ymous demonstrates exactly what you shouldn't conclude from the Oregon Health Study: "To put it simply, the Oregon study showed that Medicaid does a good job of protecting the poor from crushing medical expenses, but it doesn't make them healthier or save lives."

The correct statement is, "the Oregon study showed that Medicaid does a good job of protecting the poor from crushing medical expenses, but we don't know if it makes them healthier or saves lives."

If you were arguing with someone about whether Medicaid makes people less likely to face financial hardship or improves self-reported well-being, then that argument has been settled.  It does.  You can start arguing about how much, but the effect is not zero.

If you were arguing about whether people on Medicaid have better physical outcomes, then keep arguing, because that has not been settled.  The results in the Oregon Health Study are consistent with a wide range of positions, and thus are unlikely to persuade you or your opponent.

Let's take as an example the study's findings on blood pressure.

1)  Medicaid lowered average systolic blood pressure by .52 mmHg

2)  Medicaid decreased the percent of people with high blood pressure by 1.33 percentage points

Both of the effects are measured with very large errors.  The effect of Medicaid on blood pressure is likely somewhere between -2.97 mmHg and 1.93 mmHg.  Medicaid either lowers the percent of people with high blood pressure by as much as 7% or raises it as much as 5%.
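You can back out the implied standard error from a reported interval like that; here's a quick sketch (using the systolic blood pressure numbers above and assuming a standard 95% interval):

```python
point = -0.52                  # point estimate, mmHg
ci_low, ci_high = -2.97, 1.93  # reported 95% interval, mmHg

se = (ci_high - ci_low) / (2 * 1.96)  # ~1.25 mmHg
z = point / se                        # ~-0.42, nowhere near +/-1.96

print(f"standard error ~{se:.2f} mmHg, z ~{z:.2f}")
# "Not significant" just means the interval straddles zero; the data
# are no better at ruling out a meaningful drop than at ruling out zero.
```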

That is, there is a good chance Medicaid lowers blood pressure a lot or raises it a lot.  We don't know.  If you were arguing with someone, as long as their position wasn't that Medicaid raises the percent of people with high blood pressure by more than 5% or lowers it by more than 7%, you won't be able to settle the argument.

The temptation is to do what Anon Ymous did and conclude that since the effects are not statistically different from zero, the effect is zero.  But you can see from the confidence interval that the effect isn't significantly different from -5% either.  The error is so large that it probably didn't reject your prior.

What's the best guess?  The best guess with no other information is always the point estimates.  So with no priors, your best guess is that Medicaid lowers the percent of people with high blood pressure by 1.33%.  That's a far cry from what Scott Sumner and Anon Ymous have been saying.

Thursday, October 24, 2013

A Comment Exchange With Scott Sumner

by Charlie Clarke

Scott Sumner and I had an exchange in the comments of his blog in this post.  I hope I wasn't being a jerk about it, because I like Scott so much, but I vehemently disagree with a point he made at the end of the post:

Scott:
Here’s one (anonymous) criticism of this interpretation:
With regards to Medicaid, Chetty also paints a surprisingly incomplete picture of the Oregon Medicaid experiment. As you will recall, Chetty is correct in pointing out that expanding Medicaid seems to have increased usage of health care, decreased financial strain, improved mental health, and improved self reported well being, but he, quite surprisingly given the caliber economist Chetty is, leaves out the less flattering (for supporters of the ACA) part of the study that found no statistically significant increase in objective measures of physical health for patients who received Medicaid.
At best, the Medicaid study was a mixed result for supporters of expanding the Medicaid program (which the ACA does quite dramatically). At worst, the study is a sad demonstration of how bad Medicaid (and perhaps insurance in general) is at improving objective physical health. Why Chetty presented this study as an unambiguous victory for the pro Medicaid crowd is a mystery to me (although I suspect support of ACA has something to do with it)?
In my view the Oregon Medicaid study provides support for replacing Medicaid with a new program called “Mediplacebo.”  I think the improvements in mental health identified in the Oregon study were real, and were important. But surely they can be produced at much lower cost. I know that every time I’ve had “cancer,” I’ve felt much better after going to the doctor and being told that I don’t have cancer. Under my plan, consumers would receive the same care provided to the uninsured for things like traffic accidents. For those health problems where the uninsured would not normally receive coverage, health consumers would receive a placebo.
I am really disagreeing with anonymous, but it is clear Scott agrees, and I'd rather argue with Scott.

Me:
The Oregon study showed that Medicaid improved most health care outcomes, but that many of the improvements were not statistically significant. The tests, we know now, were underpowered. There just weren’t enough people with certain health conditions in the study to show an effect even though the point estimates were quite reasonable. It’s just too much nuance for Chetty to even try to convey and for no gain. “Anonymous’s” description is much more misleading. The worst kind of “cult of statistical significance.”
Scott:
Charlie, I agree that there is way too much emphasis on statistical significance, but your comment is way off base. If there is no statistical significance you have NOTHING. It’s quite right that it doesn’t prove there was no effect (the study was too small), it doesn’t prove anything. But the bottom line is that the Oregon study found no statistically significant physical improvement in health. I think if you asked the average voter whether that finding was important, they’d give you a very different answer from Chetty. Feelings of “subjective well being” are all well and good, but they most certainly are not the reason voters support Medicaid.

Me:
“If there is no statistical significance you have NOTHING.”
I agree you have learned nothing with any confidence, but we absolutely did not learn anything about medicaid NOT improving health outcomes. The question is why you want Raj Chetty to report on everything a study doesn’t learn about. The Oregon Health study didn’t learn anything about exchange rates; should he also report that?
“But the bottom line is that the Oregon study found no statistically significant physical improvement in health. I think if you asked the average voter whether that finding was important, they’d give you a very different answer from Chetty.”
Only if they were confused!! It’s almost as if you think the study found that the confidence interval around the health outcomes shows the effects were bound to be small. That is emphatically not what the study showed. In many cases, the point estimates were large and economically significant, but the standard errors were huge. That’s not the same as learning the effects were in some small range around zero. Bottom line, (get Bayesian) an informed reader would not have updated his views about the effectiveness of medicaid on health outcomes, there just wasn’t enough power.
Please consider this: I’ve seen you analyze the stock market reactions to Fed announcements time and time again. Never once have I seen you note that the estimate was not statistically significant, even though you had one data point, your standard errors were infinite! I agree with that Scott Sumner, not this one.

Scott:
Charlie, Let me put it this way. If the Oregon study had found a statistically significant impact on physical health, I GUARANTEE liberals would not have said “please ignore this result, the sample size is too small.” Well you can’t have it both ways. You can’t cherry pick the results you like and ignore those you don’t like. They tested for those things, so they should report the findings, even if inconclusive.
Here’s another way of putting it. Suppose Chetty’s column had been in the NYT. I’ll bet if you asked NYT readers a day later what he had said, 99% would “remember” that he found Medicaid improved physical health, if they remembered anything at all.
My reports on stock market reactions are far more significant than you assume. If you looked at daily market reactions the results would not be particularly significant. But if you look at the market reaction immediately after the data hits the market, the result becomes highly significant, as the average variability of asset prices approaches zero as the time frame approaches zero.
I’m not saying everything I report is statistically significant, but when it’s not I almost always caution that the results are merely suggestive. I also look for evidence that the timing of the market reaction was linked to the timing of the policy announcement. Feel free to show me blog posts where I reported more significance than was warranted. I try to be careful, but probably err on occasion.
Me:
Scott you said:
“Let me put it this way. If the Oregon study had found a statistically significant impact on physical health, I GUARANTEE liberals would not have said ‘please ignore this result, the sample size is too small.’ Well you can’t have it both ways.”
But that is exactly wrong. You can absolutely have it both ways. That’s what low power means. If a test has low power but still has statistical significance, it means that even though the effect is measured with lots of error, the estimate is so large we can still be confident it’s different from zero. The sample size is going to determine how wide the error bars are, but if with wide error bars the effect is still significantly different from zero, it’s completely reasonable to take that as strong evidence the effect is not zero.
Yet, the other way doesn’t work. If the effect is economically meaningful, but the confidence interval is very large, then you can’t conclude much. Possibly the effect is very economically meaningful, possibly it’s zero or has the opposite sign. We just don’t know.
“But if you look at the market reaction immediately after the data hits the market, the result becomes highly significant, as the average variability of asset prices approaches zero as the time frame approaches zero.”
First, I don’t think I’ve ever seen you try to put a confidence interval on your posts. Have you ever?
Next, I’m willing to accept the implicit assumption you’ve made that the event being studied doesn’t change the variance (which allows you to estimate standard errors to begin with–otherwise you have infinite errors and only 1 observation).
Last, it’s true that the variance of returns goes to zero as the observation period gets smaller, but stock returns at high frequencies are highly non-gaussian. They have high skew and fat tails. Traditional t-tests are misspecified. These problems are surmountable, but I’ve never seen you try to surmount them.
Again, I’m not saying you are wrong. I’m saying you have a lot of valuable things to say even though you don’t compute confidence intervals.
“I’m not saying everything I report is statistically significant, but when it’s not I almost always caution that the results are merely suggestive.”
So Raj is actually being much more cautious than you. In a Scott Sumner world, he could have said, “additionally, the evidence suggests that Medicaid helped lower the percent of patients with high blood pressure and high cholesterol, as well as improve markers linked to the health of patients with diabetes, though these improvements did not reach the level of statistical significance.”
I think not saying anything is much more cautious and less misleading, but maybe this is somewhere we just have disagreement. To me, if you have a wide confidence interval that includes big effects, small effects, zero effects and negative effects, just don’t report it to lay people as it will probably confuse them. But you do indeed report such suggestive evidence all the time to your readers, and I appreciate the discussion.
I wonder how many people who read your post or Anonymous’s post would, when asked the next day, answer that the effects on physical health were “small” or “near zero,” rather than “possibly economically important but measured with a great deal of error.”
Scott:
Charlie, You said;
“But that is exactly wrong. You can absolutely have it both ways.”
If you’d think about what you said here, you realize it can’t be right. Otherwise you could go into study X with Z prior belief. Study X could shift Z in one direction but not the other. That’s clearly impossible. No study (going in) has the potential to make it more probable that Medicaid is effective, but does not have even the possibility of making it less probable. That violates the laws of statistics.
Me:
“If you’d think about what you said here, you realize it can’t be right. Otherwise you could go into study X with Z prior belief. Study X could shift Z in one direction but not the other. That’s clearly impossible.”
You are all over the place now. First you want to read a hypothesis test, “If there is no statistical significance you have NOTHING.” Now you want to get Bayesian, which is exactly where I started. An informed Bayesian interpretation of how Medicaid affects physical health outcomes wouldn’t have moved the priors very much, because the standard errors of this study were large. But the little bit they did move would NOT have moved the estimated effect of Medicaid on physical health towards zero, because of the RESULTS of the study. The point estimates showed positive effects on physical health around the size expected by informed priors.
If you came in to the study with uninformed priors, the point estimate (your best guess) would still be that medicaid improves health outcomes for patients with diabetes, high blood pressure, and high cholesterol. It’s just that you would have a lot of uncertainty about those guesses (the distribution around your point estimate is wide–and includes zero).
You are imagining some other study with different results that should move us towards the belief that medicaid has zero effect on health outcomes, but that study doesn’t exist.
Consider the example you gave, statistical significance doesn’t compare how close Z is to X, it compares how close Z is to zero. In this study, Z was approximately equal to X, but Z and X, together, were close enough to zero that given the low power of the study, we couldn’t reject that Z is equal to 0.
Me again:
 One more point said a slightly different way using hypothesis testing. The hypothesis test (to use your example) is “Is Z different from zero?” The answer is there is not statistically significant evidence that it is. We could also ask, “Is Z different than X?” The answer is “there is not statistically significant evidence that it is.” Yet, you want people to revise X toward zero, why?

I imagine Scott is tired of the exchange, but I'll update if he responds. 

Thursday, October 17, 2013

May God Bless and Keep Bad Arguments.... Far Away From Us!

By Robert H.

Law stud Eugene Volokh has a completely incoherent critique of how a study investigating rates of sexual violence defined "rape."  This is what I can figure out from his blog post: he correctly notes that the study uses a broader definition of rape than what is supported in either American law or traditional usage (they included sex obtained via emotional coercion).  He also thinks the study shouldn't use that definition.  What is incoherent, to me, is how he got from the first sentence to the second one.  Surely he isn't dumb enough to argue that legal or traditional concepts of rape exclusively define how to use the word?

Rape as both a concept and as an English word predates rape as an American crime by centuries (14th century English adoption, from the Latin rapio, rapere, meaning to seize), so it would be weird if American legal statutes now got to dominate the word's usage.  What's more, our definition of rape has expanded before and will again, both culturally and legally.  For example, violent spousal rapes used to not be considered rape.  Most people are glad that they now are.

So that puts a big burden on most of us: we cannot oppose any expansion of the definition of rape with arguments that would also have applied, decades ago, against those who expanded "rape" to cover husbands assaulting wives.  These bad arguments include:

1. But that's not how rape is defined legally!
2. But I do that!  Are you calling me a rapist?
3. But lots of people do that, are you trying to make them all rapists?
4. Most people don't use the word that way, so while I agree with you that those actions are wrong, it is imprecise to call them rape.
5. That isn't how the word is traditionally used.
6. That isn't how the word is traditionally understood.

Instead, try "The actions you want to call rape should not be a crime for reason x (bad consequences, unjust, whatever), or should be a different crime from rape because rape's dominant character is y, and the actions you describe lack y.  Rape's dominant character should be Y because of z."

So maybe Volokh could have said, "Society abhors physical violence and the threat of physical violence; that is why society abhors rape, and therefore 'rape' should only cover situations involving sex coerced by violence or the threat of violence.  Society is cool with people pressuring each other emotionally, though, so sex coerced that way isn't rape."  Then the other side can respond.

No one can respond to "no because TRADITION!  Tradition!"



Edit: I am having trouble getting the video to embed, so yo: http://youtu.be/gRdfX7ut8gw


Tuesday, October 15, 2013

Murphy Attacks the EMH With Evidence of the EMH's Success

By Charlie Clarke

Robert Murphy linked to some articles attacking the Efficient Markets Hypothesis.  Ironically, with the little extra hindsight we have today, we can see that both articles really point to an EMH success.

The articles rely heavily on the claim that Mark Thornton or Austrian Business Cycle Theory beat the market by predicting the housing crash:
Fama says he doesn't see how any of the investors could have predicted the sudden collapse in housing prices. But what if they were familiar with Austrian business-cycle theory, and had read Mark Thornton's 2004 prediction that the boom in housing was too good to be true?
Fama would presumably say that Thornton got lucky, and that his general macro forecasting (using Austrian theory) would "beat the market" half the time and be beaten by the market the other half.
That seems like a reasonable statement.  If Mark Thornton has lots of insight into when the market will be wrong, then he can make lots of money off this insight.  Yet, since the housing price crash, Mark Thornton has been predicting high inflation.  That's no surprise, as it comes right out of Austrian Business Cycle Theory; the same prediction has also taken down famed bubble predictor Peter Schiff, whose many hyperinflation forecasts have been wrong.  Robert Murphy himself made an inflation bet he has since lost.

In hindsight, there is a little more evidence that the Austrians were just lucky.  I should note that these are just two predictions over 10 years, so we are very far from having any data to actually test a theory.  If this is the rate at which Austrian business cycle theory makes predictions, it may take many hundreds of years to actually perform a real test of the predictions.  This highlights some of the challenges of defending the EMH.  At any given time, there are numerous predictions that some asset's price will rise or fall.  The market proves people right (and wrong) all the time.  And since at any given time there are people both long and short in an asset, any time an asset has a large change in price there are some people making boatloads of money.

Monday, October 14, 2013

Lots of Federal Debt is Held by the Federal Government

By Charlie Clarke

I am not sure it's generally appreciated that the debt ceiling is on gross debt and includes debt that the Federal Government itself holds.  Thus, the Treasury issues a bond (an IOU) and then also buys the bond to hold in a government account.  This counts as outstanding debt, but it isn't the more meaningful "debt held by the public."  The debt ceiling constrains the somewhat meaningless gross debt.  Of the $16.8 trillion in total debt, $12 trillion is owed to the public and $4.8 trillion is held by the U.S. government.  Below I have a picture of where those government Treasuries end up.






Tuesday, October 8, 2013

Libertarian Foot Fetish

By Robert H.



Ilya Somin has a post up arguing for the relative advantages of foot voting vs. ballot box voting.  The argument is drawn from his latest book, so see that for more.  I have two bones to pick with it.

1. Somin is missing, I think, the major critique of foot voting.  He says "I cover several standard objections to foot voting, the problem of moving costs, the danger of 'races to the bottom,' and the likelihood that political decentralization might harm unpopular racial and ethnic minorities."  But he misses the biggest problem with foot voting: it's hard to know what the hell people are voting for.


If a polity elects a politician, it's pretty clear that that happened because of what kind of politician she is.  Special interests or ignorant voters or wise crowds or whoever sat down and said, "this gal's going to do a better job than the other guy," or at the very least "this gal belongs to the political team I belong to, and voting for her flatters my team pride."  Whatever the reasoning, the outcome is mainly a reflection of the selectorate's political calculations, political prejudices, political signaling, political whatever.


But if I move to California, who the hell knows why I did it?  Maybe there's a gold rush.  Maybe I love the ocean.  Maybe I figure it's my last, best chance to be cast in a star trek film (these are presented from least to most likely).  But one thing is for dang sure, we can't know it's related to political anything. 


You can see this in demographics: our most populous states are New York (generally liberal government), Texas (generally conservative government), and California (generally incompetent government).  If good political policies determined foot voting, why are people voting for totally different political policies?  Or maybe your affinity for local politics, not political outcomes, determines foot voting.  In that case, why do huge minorities of republicans live in NYC or democrats live in Texas?  Or maybe Texas is attracting tons of migrants because of its good policies, and New York and California only retain large populations because some sort of population stickiness makes people stay in place.  Ok, but then won't population stickiness undermine the whole "we will get good policy when people vote with their feet" thing?  People won't vote with their feet, they'll just hang around where they live now.  Or maybe this only happens because the federal government is so much more important than state government that it fudges the outcome.  Ok, then, why haven't most Greeks moved to Germany?  


So all this adds up to a big problem.  If people move for lots of non-political reasons, relying on people moving as a way to determine political outcomes has problems.  The best policies in the world could fail to attract migrants for exogenous reasons, and the worst (ok, maybe not the worst) could take advantage of, say, a resource boom to attract lots of folks.  


***


2. Here and elsewhere, I think Somin has been too down on voting voting.  Yes, voters are ignorant.  Yes, they are rationally ignorant, since their vote will normally never matter.  Yes, the heuristics voters could be using ("reward incumbents when things are good" being a classical one) don't really get you to great government.


No, that does not mean democracy is useless.  High praise time: despite those and other political economy constraints, democracy is pretty good at keeping politicians from passing policies that are disastrous for most people.  For example, I'm pretty confident that a party running on an "enslave most of the voters" platform would not win.  Under a rationally ignorant voter model, if the Enslave Everyone party ever comes close to winning then the relative gains from paying attention and voting would go up and lots of voters would stop being ignorant.  Under a median voter model, the median voter would not want to be enslaved.  I could keep going through different models, but come on.  People will rarely vote themselves into slavery. 


That may seem like a really low bar, but it is not!  Human government has just been incredibly freaking awful over the centuries, ranging from the genocidal to the overly bellicose to the (and this is most common) widely extractive.  Governments *have* enslaved most of their people.  At the very least, almost all governments have made women second class citizens.  Any system of government that can avoid that stuff and routinely churn out merely bad policy is doing great.  A few special interests shaving off some GDP and GDP growth?  Possible national debt crisis in 20 years?  Weird predilection for subsidizing the old rather than the poor?  Yes please!  So much better than being a serf!


Point being, the least bad governments have gotten there through representative democracy, and there are theoretical reasons for functioning democracies to give us not-terrible policies, despite voter ignorance. Foot voting, on the other hand, has not been used as often to reach good political outcomes, and doubts about the ability of good policy to attract migrants seriously undermine the theoretical case for relying on foot voting.  


***


As a final point, I just want to be clear, if it is not obvious, that relying on voting and relying on foot voting are in tension.  For example, let's say you make American states autonomous power centers that can set their own policy because you want to create a world where people can move between them and foot vote.  Well, by necessity you just stripped folks of their ability to vote vote for a range of federal policies, namely those that would cut into state power.


Saturday, October 5, 2013

The Market Doesn't Know This Isn't Twitter Stock

Twitter's IPO filing is now public, and the suggested ticker symbol is TWTR.  This announcement has created a rally in an unrelated, near-bankrupt retailer named Tweeter Electronics, whose stock is traded over the counter under the ticker TWTRQ.  The stock rose from two cents to over ten cents before trading was halted at five cents.  Sometimes the market is dumb...



Friday, October 4, 2013

Your Plan to Resolve a Debt Ceiling Crisis is Arguably Wrong

By Robert H.

I'm seeing people way too confident in their view of what the "correct" constitutional resolution to a breach of the debt ceiling would be, so I want to emphasize how hard the problem is.

In normal times:

1. The president arbitrarily deciding not to fund a program, or to fund it less than congress wants, is a massive usurpation of congress's powers to legislate and to spend money to promote the general welfare.  What's more, in normal times the president not giving a welfare recipient money because he wants to use it for a different purpose would be a due process violation.  Same with not paying out a contract.  Not paying out money to a creditor or pensioner would be a fourteenth amendment section 4 violation.  All totes unconstitutional.

In short, the Prez simply deciding he is going to arbitrarily choose how to spend America's funds is crazy unconstitutional and a crazy infringement on congress's power.

2. In normal times, the president deciding, without congressional approval, to raise some funds would be crazy unconstitutional.  Congress has the power to tax or borrow; the President doesn't.  This is, like, separation of powers 101 stuff, you guys.

***

Ok, so all those things are clearly true.  But people don't seem to recognize that they are both true?  Matt Yglesias emphasizes point 1 and then says "so clearly the president should just breach the debt ceiling."  Laurence Tribe emphasizes point 2 and then says "so clearly the president prioritizing payments is the lesser of two evils."

But no, guys, neither of these are clearly a lesser evil.  They are both terrible.  Which one of these terrible and unconstitutional things might be, due to circumstances, constitutional in this case is a hard question.

That said, these are what I think the two correct answers might be, from most likely to least likely, but I am still very unsure about either:

1.  This is clearly a failure of the constitution and there is no "correct" way out.  So long as he limits his unconstitutional looking actions to narrowly solving the crisis, either by prioritizing payments or by borrowing more, how the President resolves two contradictory and equally grave constitutional obligations is a political question and the courts shouldn't review it.

2. Both arbitrarily choosing what to fund and borrowing money to cover what congress wants to spend seize congressional power in a (normally) hideously unconstitutional way, but arbitrarily prioritizing payments has the added problem of violating the due process rights of the people owed those payments, since the amount of due process you are owed when you are deprived of a legal benefit is normally greater than "the president doesn't prioritize your program and really wants to spend the money elsewhere."  So, given the extra constitutional problems prioritizing payments brings, go with borrowing more.

***

A simple "he has to do it because of the fourteenth amendment says you have to honor debt" argument, by the by, is likely wrong.  The fourteenth amendment makes a distinction between debts and obligations, and only says that the validity of the former cannot be questioned.  In the fourteenth amendment "debt" is probably stuff like money the government owes to creditors, money owed to pensioners, and payment for services rendered.  I'm pretty sure you could pay all of that out of the revenue we collect.