The truth about polls in 2020
The polls predicted a Biden landslide and a Democratic takeover of the Senate. Neither happened.
Let’s look at why polls may be having problems with prediction right now.
Where the polls were way off
According to estimates by fivethirtyeight.com, an aggregation of polls indicated that Biden would get 51.8% of the final vote vs. 43.4% for Trump, a lead of 8.4 percentage points. As I write this, Biden has 50.4% of the vote and Trump, 47.8%, a spread of only 2.6 points. (This will shift as more mail votes come in, but I’d be surprised if Biden’s lead ended up being more than 3.5 points, which is way shy of 8.4.)
In states, fivethirtyeight.com’s analysis of polling indicated that Biden had a 2.5-point lead in Florida, a state that he lost by 3.5 points. It predicted Biden blowouts of 6 to 8 points in Wisconsin, Michigan, and Nevada, states that will end up nearly tied.
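To make the size of the misses concrete, here is a small Python sketch that turns the figures quoted above into margins and polling misses. The numbers are the ones in the text (national figures as counted at the time of writing), and the helper names `margin` and `polling_miss` are illustrative, not taken from any polling tool.

```python
# All values are percentage points, as quoted in the text above.

def margin(dem_share: float, rep_share: float) -> float:
    """Democratic lead in points (negative means a Republican lead)."""
    return round(dem_share - rep_share, 1)

def polling_miss(predicted_margin: float, actual_margin: float) -> float:
    """How many points the polling average overstated the Democratic margin."""
    return round(predicted_margin - actual_margin, 1)

national_predicted = margin(51.8, 43.4)  # poll average: Biden +8.4
national_actual = margin(50.4, 47.8)     # count at time of writing: Biden +2.6
florida_predicted = 2.5                  # poll average: Biden +2.5
florida_actual = -3.5                    # Biden lost Florida by 3.5

print("National miss:", polling_miss(national_predicted, national_actual))
print("Florida miss:", polling_miss(florida_predicted, florida_actual))
```

Rounding to one decimal keeps floating-point noise out of the printed results, which come to a miss of 5.8 points nationally and 6.0 points in Florida.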
In the Senate, the final fivethirtyeight.com estimate gave Democrats an average of 52 seats, a majority, while right now the most likely outcome is a 48- or 49-seat minority, pending what happens in runoffs in Georgia. In Maine, Democrat Sara Gideon had a 59% chance to beat Republican Susan Collins; Gideon lost. In North Carolina, Democrat Cal Cunningham had a 68% chance to beat Republican Thom Tillis; he has probably lost as well. Fivethirtyeight gave Republicans a 25% chance of retaining a Senate majority, but a Republican Senate majority is now almost certain in the next Congress.
Before I go on, you might ask, who am I to analyze polls? While at Forrester Research, I originated a survey business called Technographics. Technographics was based on surveys of tens of thousands of people on their opinions and behaviors regarding technology trends. We analyzed the data, published it, and sold access to it. So I spent a lot of time immersed in survey data, and I know where it works and where it has problems.
Consider the polls in 2020
Think about what happens when a pollster sets out to poll who someone will vote for, either nationally or in a state.
First off, they need to reach people. They might reach people willing to answer the phone, people who click on an ad on the Internet, or people willing to join a panel of consumers/voters who take surveys, often for tiny financial incentives.
What you end up with is a sample. Is the sample representative of the voting population?
There is an obvious and impossible-to-remove source of bias: nonresponse bias. You can’t reach people who don’t take surveys. Are the people who answer polls different from those who don’t? The only way to answer that is to see how the election comes out. And this year, the election came out differently from a lot of the polls. It appeared that Biden voters — and people voting for Democrats in the Senate — were more willing to take polls than Trump voters.
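A minimal sketch of how nonresponse bias distorts a poll, assuming (hypothetically) an electorate split exactly 50/50 in which one candidate’s supporters complete 6% of contact attempts and the other’s complete 5%. The response rates are invented for illustration; nobody knows the real ones, which is exactly the problem described above. The function name `poll_share` is also just for illustration.

```python
def poll_share(true_share: float, response_rate_a: float, response_rate_b: float) -> float:
    """Expected share of completed interviews that come from supporters of A.

    true_share: A's actual share of voters (0 to 1).
    response_rate_a / response_rate_b: chance that a contacted supporter of
    A or B completes the poll (hypothetical values).
    """
    answered_a = true_share * response_rate_a        # A supporters who respond
    answered_b = (1 - true_share) * response_rate_b  # B supporters who respond
    return answered_a / (answered_a + answered_b)

# Hypothetical tied race: equal response rates give an accurate poll...
print(poll_share(0.50, 0.05, 0.05))
# ...but a one-point gap in response rates makes A look like a clear leader.
print(round(poll_share(0.50, 0.06, 0.05), 3))
```

In the second case the tied race shows up as roughly 54.5% to 45.5%, a lead of about nine points. Because the difference tracks vote choice itself, weighting by demographics will not remove it unless those demographics happen to explain who answers.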
There are other forms of bias that are more easily identified. For example, in 2016, pollsters apparently didn’t survey enough white people without a college education, and therefore underrepresented them in their final count. Because such people were more likely to vote for Trump than for Hillary Clinton, the polls indicated that Clinton was going to do better than she did.
Unlike nonresponse bias, biases that come from a sample that’s proportionally different from the population can be fixed. Not enough men in your sample? Weight the men’s votes higher. Not enough people without college educations? Weight the ones without degrees more. Not enough Hispanics? Weight the Hispanics you do have more.
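The fix described above is usually called post-stratification: each respondent gets a weight equal to their group’s share of the population divided by that group’s share of the sample. This sketch weights on a single hypothetical variable (college degree or not) with made-up responses; real pollsters typically weight on several variables at once, often by raking. The function names are illustrative.

```python
from collections import Counter

def poststratify(respondents, population_shares):
    """Weight each respondent by (group's population share) / (group's sample share)."""
    counts = Counter(r["group"] for r in respondents)
    n = len(respondents)
    return [population_shares[r["group"]] / (counts[r["group"]] / n)
            for r in respondents]

def vote_share(respondents, weights, candidate):
    """Weighted fraction of respondents who chose `candidate`."""
    chosen = sum(w for r, w in zip(respondents, weights) if r["vote"] == candidate)
    return chosen / sum(weights)

# Hypothetical sample of 10: graduates are 80% of the sample but 60% of voters.
sample = (
    [{"group": "degree", "vote": "A"}] * 5
    + [{"group": "degree", "vote": "B"}] * 3
    + [{"group": "no_degree", "vote": "B"}] * 2
)
population = {"degree": 0.6, "no_degree": 0.4}

unweighted = [1.0] * len(sample)
weights = poststratify(sample, population)
print(round(vote_share(sample, unweighted, "A"), 3))
print(round(vote_share(sample, weights, "A"), 3))
```

Unweighted, candidate A gets 50%. Weighted, A drops to 37.5%, because each of the two under-represented respondents without degrees (both B voters) now counts double, while each graduate counts as about three-quarters of a person.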
Of course, weighting has its own problems. If you have one Hispanic woman and the demographics say you are supposed to have three, her opinions carry the weight of three people. If she’s unusual, your results will be skewed along with her. In general, the larger the sample, the more accurate the prediction, but if you undersample key groups and weight to correct, you inflate the influence of a few respondents in a way that undermines the benefit of the larger sample.
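One common way to put a number on the cost described above is Kish’s effective sample size, (Σw)² / Σw², which estimates how many unweighted respondents a weighted sample is worth. The formula is a standard survey-statistics approximation rather than something from this article, and the weights below are hypothetical.

```python
def effective_sample_size(weights):
    """Kish's approximation: (sum of weights)^2 / (sum of squared weights)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# Hypothetical 1,000-person poll. With no weighting, every respondent counts once.
unweighted = [1.0] * 1000
# Weighting half the sample down to 0.5 and half up to 1.5 to fix its proportions.
weighted = [0.5] * 500 + [1.5] * 500

print(effective_sample_size(unweighted))
print(effective_sample_size(weighted))
```

The weighted poll still has 1,000 respondents, but it carries roughly the statistical power of 800, so its real margin of error is wider than the headline sample size suggests.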
Finally, there’s a more pervasive problem: lying. You have people who say they will vote and don’t get around to it. You have people who say they are voting for Biden and actually are voting for Trump, just to screw with the pollster. And there are people who change their minds, telling the pollster they will vote for Biden and then voting for Trump at the last moment.
There are ways to try to catch liars — like asking the same question in different ways to see if you get the same answer — but if someone is going to lie, you’re going to have an inaccurate poll.
I believe that there wasn’t too much lying in the surveys we did at Forrester, because people don’t tend to lie about things like whether they have broadband or whether they shop online. But when it comes to politics, there’s a lot more reason to lie.
Polls are useful tools for politicians
If you’re running for office, you may want to know which of your positions are resonating, and which messages you might share are most likely to connect with your core group of voters.
You might also like to know which demographic groups are inclined to vote for or against you, and why. If you find out you have a weakness with suburban women, you can change your messages and positions to address that problem.
In these situations, an imperfect tool is fine. You know there are flaws in the poll data, but even so, the data may be helpful to you as you plan and execute your campaign.
But as a predictive tool, polls are a lot less useful than they once were. The polarized nature of elections in America makes trust a problem — and that includes trust of pollsters. Perhaps fewer people believe that participating in polls is a useful activity. Those people won’t answer — which will create a bias.
Even if polls were more accurate, they only predict what will happen. They don’t tell you, the voter, who deserves your vote. They mostly create a spectacle for media to analyze while waiting for the election to actually happen.
All of the statistical analysts, pollsters, and survey workers I have interacted with have operated with the greatest possible degree of integrity. I respect the work they do, and I was happy to work with them when I was working with survey data.
I just wonder if we’ve lost sight of what matters while immersed in a sea of data about the subset of people willing to fill out an online survey or talk to somebody on the phone.
Any concern over post-election polling that is used to predict the winners in order to have something to say on TV/radio?
Outstanding votes (those still to be counted) far exceeded the margins between the candidates, yet they predicted/projected winners. Talking heads explained that they did that using the pre-election polling numbers. So it seems reasonable that if pre was broken, post is broken too.
I hadn’t thought about mass lying. As opposed to “shy” voters, this has an altogether different implication. We already know that culturally those that tend to support Trump are predisposed to be suspicious of polls – largely based on what they are told by Trump and “conservative” media.
Thing is, how smart a strategy is that – encouraging your “team” to disengage or disrupt polls?
They need to have some way to read the tea leaves too – anecdotal information, such as rally intensity, is hardly a good indicator of where you stand overall.
Given all of these issues, and the impact they have (real or potential) on voters’ willingness to vote or even who they end up voting for, during early voting and on the day of the election… wouldn’t it be wise to outlaw the publication of polls prior to the end of election day?
Yes, Aaron, I like your suggestion.
I’m sorta enamored of this free speech thing. So, no.
You can put restrictions on how polls gather data (phone calls, internet), but that would impair candidates’ ability to respond to what their constituents want.
I am confident any restriction on publishing polls would be thrown out on First Amendment grounds.