Tuesday, October 28, 2008

The PUMA Question

In the comments on my post A Funny Thing Happened In The Voting Booth, commenter Dr. Nobel Dynamite made the following point.

"Not to mention the PUMA factor." [that was me - ed.]

You need to turn off the propaganda for a while, my friend. Just because Neil Cavuto trots out a cigarette hag that claims to speak for disgruntled Hillary supporters doesn't make PUMAs anything more than wishful thinking from Fox.
No Bell,

In science/engineering we look at unusual results and outliers for undiscovered phenomena.

It is more than possible that I may have discovered some interesting effects.

BTW, the reports on the PUMA effect started in the high 30s, dipped to the low 20s, and then started rising again to the mid 30s (that would be the % of Hillary voters going for McCain). Since the last reports of that rise there has been no news of the PUMA effect. Did it just disappear? Or was it a case of "did not fit the narrative"?

A defection rate as a % of the total vote of 3.75% (roughly 20% [defection rate] of 1/2 [Hillary voters] of 37% [Dems in the electorate]) can be overcome. It will be offset by a 13% or so R defection rate (about 5% of the vote). If the defection rate is 40% of Hillary voters that is a 7.5% loss. Killer.
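
For anyone who wants to check my arithmetic, here is a minimal Python sketch of the calculation above. The function name and parameter names are mine, made up for illustration; the inputs (half of Dems being Hillary voters, Dems at 37% of the electorate) are the rough figures from the paragraph above, not measured values.

```python
def share_of_total_vote(defection_rate,
                        hillary_share_of_dems=0.5,
                        dem_share_of_electorate=0.37):
    """Fraction of the whole electorate lost to defection.

    All three inputs are rough assumptions taken from the post.
    """
    return defection_rate * hillary_share_of_dems * dem_share_of_electorate


low = share_of_total_vote(0.20)   # 20% of Hillary voters defect
high = share_of_total_vote(0.40)  # 40% of Hillary voters defect

print(f"20% defection: {low:.2%} of the total vote")
print(f"40% defection: {high:.2%} of the total vote")
```

This gives about 3.7% and 7.4%, which I rounded to 3.75% and 7.5%. The R offset is left out of the sketch because I did not state the Republican share of the electorate.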

The question is: is that defection rate being measured accurately? Since the announcements of the PUMA factor have stopped, I'd have to say no. If it had dropped it would have been announced. If it rose above 40% it would be buried.

How about what DJ Drummond has to say on the subject of polls?
Gallup has noted the strength of early voting this year. The most significant points from that article are these: early voting is stronger than expected this year, and so far Republicans have been just as eager to vote early as Democrats. The third point is the most important signal of all. Says Gallup: "Early voting ranges from 14% of voters 55 and older (in aggregated data from Friday through Wednesday) to 5% of those under age 35. Plus, another 22% of voters aged 55 and up say they plan to vote early, meaning that by Election Day, over a third of voters in this older age group may already have cast their ballots."

The last two statements are very good news for McCain and bad news for Obama. This is because it demonstrates that enthusiasm to actually vote by Republicans is equal to enthusiasm to vote by Democrats. This runs directly against claims made in polling up to now, demonstrating that participation in polls is not directly related to voting this year. Second, the higher participation by senior voters and weaker participation by younger voters is directly in line with historical norms, again running against the poll expectations that this year would see a wave of young people voting but seniors staying at home. Gallup's own data proves this is not happening as they predicted, and the polls are therefore invalid in those respects, in addition to obvious flaws in the party weighting. The reasonable expectation from these facts would be for Gallup to back down and correct its weighting to match the observed behavior. As of yet, Gallup has not taken that step.
Then we have this wonderful explanation of polling by Charlie Colorado at Just One Minute.
What we're hoping for the polls to tell us is how people will vote in the future. In order to figure that out, we start by asking some number of people how they would vote today.

Obviously, we don't and can't know how people would really vote (Obama could be caught with a dead girl and a live boy and Fox News with a camera.) But it's everyone's best guess, and they have a chance to answer "O" or "McP" or "undecided".

Now, if we could ask every single person who will be voting this question, we'd get a fairly precise number --- not exact, but pretty close. Asking 130 million people their opinion is pretty intractable, so they ask a much smaller number. There are mathematical reasons to let us make an estimate of the amount of error we get by just asking that smaller number, and that's where this "margin of error" comes from. The way it works is basically like this: say we have 130 million red and blue marbles, in proportions of 51 percent red and 49 percent blue. Since they're well mixed, we can be confident that most of the time, if we scoop out a bucket full of 1000 marbles and count the colors, there will be something close to 510 red and 490 blue. It's extremely unlikely --- although possible --- that we'd scoop out 1000 blue marbles. It's also very unlikely that every time we scoop up marbles, we'll get exactly 510/490. But let's say we try it 100 times. Roughly 95 times out of 100, we'll get a count between 495/505 and 525/475.

That's exactly what the "margin of error" is: we know, mathematically, that 95 times out of 100, our random scoop will deliver a number plus or minus 1.5 percent (or, total, 3 percent) of the "real" value we'd get if we counted all the marbles.

The problem is that when we talk about a real poll, our "marbles" aren't perfectly mixed. If we were to, say, call the first thousand people in the Cambridge Mass phone book, that wouldn't represent the country as a whole very well. So instead, polling companies call a lot of people, carefully selected, and try to work backwards to what a "perfectly mixed" sample would have been like.

Now, say we were talking about the marbles example again. We know, because they were our marbles to start with, that exactly 51 percent of them were red, 49 percent blue. So when we scoop out 1000 of them, we have an "ideal sample" in mind. A little algebra lets us then compute what the perfect sample would have looked like, and it is going to come up 51/49 every time.

But now let's say we don't know what the real number is; we just think we have roughly 51 percent red when we start. Now we scoop out 1000 marbles and apply the same adjustment; we think it's 51/49, and we scoop them out, checking each scoop. If they're really 51/49, the numbers we get should cluster around 51/49. If not, then we can compute what the "real" proportion is.

But now, what if we start with the wrong assumption that they're really 55 percent red, 45 percent blue? When we compute our adjusted values, we're going to "slant" what we think the real value is toward the red ones. We may compute a guess that it's really 53/47.

And that's where the polls are right now. Each one starts with an assumption, or model, of the real electorate. That assumption causes the values to slant one direction or another; how good that initial guess is will determine how good the eventual result is when all the marbles are finally counted.

A lot of the polls have fairly radical assumptions, like that people identify themselves as 40 percent D, 25 percent R, 35 percent independent. Those polls also show Obama with big leads. Other polls have closer assumptions, and get smaller ranges. That's why I said above that the way to read the polls is really "IF the mix is really like this THEN the election results would be roughly so".
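
The marble example and the "IF the mix is really like this" point can both be checked with a few lines of Python. This is only a rough sketch: it assumes a simple random sample and the usual normal approximation, the function names are mine, and the within-party support numbers and the second party mix are made up purely for illustration. Only the 51/49 marbles, the scoop of 1000, and the 40/25/35 mix come from the quote.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Normal-approximation half-width for a sampled proportion p with n draws."""
    return z * math.sqrt(p * (1.0 - p) / n)


def weighted_topline(support_by_party, party_mix):
    """Overall support implied by within-party support and an assumed party mix."""
    return sum(support_by_party[party] * party_mix[party] for party in party_mix)


# Marble example from the quote: 51 percent red, scoops of 1000.
moe_95 = margin_of_error(0.51, 1000)          # 95% half-width (z = 1.96)
one_se = margin_of_error(0.51, 1000, z=1.0)   # one standard error

# Hypothetical within-party support for one candidate (made-up numbers).
support = {"D": 0.88, "R": 0.10, "I": 0.50}

# The 40/25/35 mix is the one mentioned in the quote; the second mix is a
# hypothetical "closer" assumption chosen only for contrast.
wide_d_mix = {"D": 0.40, "R": 0.25, "I": 0.35}
closer_mix = {"D": 0.37, "R": 0.37, "I": 0.26}

wide_d = weighted_topline(support, wide_d_mix)
closer = weighted_topline(support, closer_mix)

print(f"95% margin of error: +/- {moe_95:.1%}")
print(f"one standard error:  +/- {one_se:.1%}")
print(f"topline with 40/25/35 mix: {wide_d:.1%}")
print(f"topline with 37/37/26 mix: {closer:.1%}")
```

On these assumptions the 95% band for a scoop of 1000 is about plus or minus 3.1 points, so the 495 to 525 range in the quote looks closer to one standard error (about 1.6 points) than to "95 times out of 100". The second half shows the slant: the very same within-party answers give about 55.2% under the 40/25/35 mix and about 49.3% under the even-party mix.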
There is an interesting addition to this question from Iowahawk, who shows his math.
Works pretty well if you're interested in hypothetical colored balls in hypothetical giant urns, or growth of plants in a controlled experiment, or defects in a batch of factory products. It may even work well if you're interested in blind cola taste tests. But what if the thing you are studying doesn't quite fit the balls & urns template?

What if 40% of the balls have personally chosen to live in an urn that you legally can't stick your hand into?

What if 50% of the balls who live in the legal urn explicitly refuse to let you select them?

What if the balls inside the urn are constantly interacting and talking and arguing with each other, and can decide to change their color on a whim?

What if you have to rely on the balls to report their own color, and some unknown number are probably lying to you?

What if you've been hired to count balls by a company who has endorsed blue as their favorite color?

What if you have outsourced the urn-ball counting to part-time temp balls, most of whom happen to be blue?

What if the balls inside the urn are listening to you counting out there, and it affects whether they want to be counted, and/or which color they want to be?

If one or more of the above statements are true, then the formula for margin of error simplifies to

Margin of Error = Who the hell knows

Because, in this case, so-called scientific "sampling error" is meaningless, because it is utterly overwhelmed by non-sampling error. Under these circumstances "margin of error" is a numeric fiction masquerading as a pseudo-scientific fact, and if a poll reports it -- even if collected "scientifically" -- the pollster is guilty of aggravated bullshit in the first degree.

The moral of this midterm for all would-be pollsters: if you are really interested in how many of us red and blue balls there are in this great big urn, sit back and relax until Tuesday, and let us show our true colors.

Until then, fondle your own balls.
That can be fun. More fun is when you have the right kind of help. So today I want to ask for your help.

Don't give it to him. Make him steal it.

As to the fondling balls question: I'm looking for volunteers of the female persuasion. Urn fondling in return. Then maybe a cigarette afterwards.

Cross Posted at Classical Values
