The Perils of Polling: Can Polling Really Predict Midterm Election Results? (Part I)
It's true: predicting midterm election results for 2018 has been front page news for more than a year.
Countless articles have been written on why the bulk of the 2016 presidential polls were so wrong.
Some of the reasons given include under-coverage of important voting blocs, nonresponse errors (individuals chosen for a sample could not be contacted or did not cooperate), response bias, poor wording of questions, and the like.
It should be noted, indeed emphasized, that the Trafalgar Group, The Los Angeles Times/USC poll, and the IBD/TIPP poll all performed exceedingly well.
However, a slew of polls were wrong. Why?
In nearly every battleground state, hundreds of public and private polls did not account for "a tremendous turnout of rural America and a lower turnout of Millennials and African Americans."
Most of the 2016 post-election analysis reveals improper weighting of under-represented demographics—whether by ethnic breakdowns, location, or political affiliation.
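To make the idea of weighting concrete, here is a minimal sketch of one common adjustment, post-stratification, in which each group is reweighted so that its share of the sample matches its assumed share of actual voters. The group names, counts, and turnout shares are all invented for illustration; they are not taken from any 2016 poll.

```python
# Hypothetical numbers throughout; none of these figures come from a real poll.

def unweighted_estimate(sample):
    """Plain share of supporters among all respondents."""
    total = sum(n for n, _ in sample.values())
    return sum(supporters for _, supporters in sample.values()) / total


def poststratified_estimate(sample, population_shares):
    """Reweight each group so its sample share matches its assumed
    population share, then return the weighted share of supporters.

    sample: {group: (respondents, supporters)}
    population_shares: {group: assumed share of actual voters}
    """
    total = sum(n for n, _ in sample.values())
    weighted_supporters = 0.0
    weighted_respondents = 0.0
    for group, (n, supporters) in sample.items():
        sample_share = n / total
        weight = population_shares[group] / sample_share
        weighted_supporters += weight * supporters
        weighted_respondents += weight * n
    return weighted_supporters / weighted_respondents


# Rural voters are 20% of this made-up sample but an assumed 35% of turnout.
sample = {"rural": (200, 130), "urban": (800, 336)}
turnout_shares = {"rural": 0.35, "urban": 0.65}

raw = unweighted_estimate(sample)
weighted = poststratified_estimate(sample, turnout_shares)
print(f"unweighted: {raw:.3f}  weighted: {weighted:.3f}")
```

In this made-up case the candidate trails in the raw numbers and is roughly even after weighting. The adjustment is only as good as the assumed turnout shares: if a pollster guesses those wrong, the weights carry the error straight into the result.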
Critical: It has also become apparent, as reported in a number of news articles, that many Americans withdrew from survey participation and viewed pollsters as political pawns.
In short, it's impossible to accurately capture people who will fib to pollsters or refuse to talk to them altogether.
This recently revised article was originally published years ago (prior to the 2016 election).
Our pre-election skepticism of predictions of a Clinton victory in the 2016 presidential election polls was based primarily on a long-ago example: the 1936 Roosevelt/Landon election (still used in statistical courses on sampling survey methodology to teach the concept of non-sampling error).
Perhaps in the future, the Clinton/Trump election will replace the Roosevelt/Landon election example to dramatically illustrate the perils of polling.
In essence, getting a representative sample of registered voters has become harder and harder because cell phone numbers are not usually publicly listed.
Non-response bias was also a major factor in the 2016 election. People who were more likely to vote for Trump did not respond in sufficient numbers to phone solicitations from survey takers.
After reading this article, you will be in a better position to understand what really happened with respect to the 2016 presidential polls and extrapolate lessons learned to the 2018 midterm elections.
Further, we expect (in the very near future) a flood of media stories which will rightfully question all kinds of polls, including presidential approval rating polls, immigration polls, and the like.
Simply put, this article illustrates the dangers of defining the wrong statistical universe and the dangers of non-response bias.
We also strongly suggest you read Part II of this article.
It will serve you well and, hopefully, prevent you from making colossal blunders when conducting or interpreting surveys and polls related to customer and market analysis.
More than 100 years ago H.G. Wells said, “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.”
Polling of public opinion has become an industry of its own, one of our more thriving growth industries.
Today we ask people what they think about everything—the 2016 presidential election, President Obama's strategy for combating ISIS in the Middle East, the ever-growing budget deficit and its implications for the future, the entitlement crisis, the homeland threat of radical groups, Black Lives Matter, and the welfare mess to name a few.
We poll people regardless of whether they have given the matter serious thought (or know anything about the subject).
We ask workers what they find wrong with their jobs, with perhaps no better result than to activate their natural human instinct for dissatisfaction.
We ask CEOs and CFOs what they see for the immediate economic future, knowing that they cannot know––and when there's been little time for sober thought.
Daily headlines quote the results of polls. But—and this is a big "but"—the public's “opinion” on almost any issue will be a function of three factors: the questions asked, the responses, and the analysis.
Many books and articles have been devoted to the subject of misuses and abuses of polling and sample surveys. But in today's fast-paced media world, the science of correctly collecting, summarizing, analyzing, and using data seems bothersome. Ignorance is bliss!
Lawmakers seem more interested in poll results than in how they were obtained. Until recently, it did not occur to them that the results of polls and how they are designed, implemented, and analyzed are inseparable.
A Neglected Truth About Opinion Polls
Neil Postman, in a marvelous book entitled Technopoly, exposed an inherent weakness in interpreting survey results:
Pollsters ask questions that will elicit yes or no answers. Polling ignores what people know about the subjects they are queried on.
In a culture that is not obsessed with measuring and ranking things, this omission would probably be regarded as bizarre.
But let us imagine what we would think of opinion polls if the questions came in pairs, indicating what people 'believe' and what they 'know' about the subject.
If I may make up some figures, let us suppose we read the following:
The latest poll indicates that 72 percent of the American public believes we should withdraw economic aid from Nicaragua…
Of those who expressed this opinion: 28 percent thought Nicaragua was in Central Asia, 18 percent thought it was an island near New Zealand and 27.4 percent believed that ‘Africans should help themselves,’ obviously confusing Nicaragua with Nigeria.
Moreover, of those polled, 61.8 percent did not know that Americans give economic aid to Nicaragua, and 23 percent did not know what ‘economic aid’ means.
Postman sadly concluded, "Were pollsters inclined to provide such information, the prestige and power of polling would be considerably reduced."
Polls Show Support for Embattled Public Sector Workers – A Suspicious Finding
On February 28, 2011, a New York Times/CBS poll reported a majority of people opposed efforts to weaken collective bargaining rights and didn’t approve of cutting the pay or benefits of public employees to reduce state budget deficits.
Similarly, a 2011 national Gallup survey suggested that a majority of Americans opposed measures like the one proposed in Wisconsin that restricted collective bargaining rights for public employees, a result nearly identical to that of the New York Times/CBS News poll.
It would be interesting to apply Postman's comments relating to what those sampled knew about the issues involved. Readers can draw their own conclusions.
We should say, however, that one sharp commentator on the New York Times website summed it up this way: “Americans prefer to have their taxes raised rather than decrease benefits to public employees… that would certainly be news if it were true.”
How to Guarantee Wrong Conclusions from Polls or Surveys
Let's get right to it. Classically trained statisticians call it the statistical universe or population—it consists of all things from which conclusions are to be drawn.
For example, in a study of price fluctuations of heating oil in New York City from 2012 to 2015, the statistical universe would include every price change which occurred during the specified time interval.
If the scope of the study were expanded to cover a larger territory or a longer period of time, the statistical universe would be correspondingly enlarged.
Obviously the term statistical universe is an elastic one and varies in its precise connotation in every statistical undertaking.
In general, the statistical universe may be defined as a totality embracing every item which might have been brought under observation had a complete enumeration been effected.
The Need for Sampling
Lack of time and money renders it impossible to make a complete survey of most statistical universes. Thankfully, it's not necessary to survey the entire statistical universe.
Why? Because hard-working, brilliant statisticians discovered how to get the same information from carefully selected, relatively small samples.
Making the correct inferences on small-size samples taken from large or sometimes infinite statistical universes is the subject matter of basic and advanced statistics.
If the sampling process is properly carried out, an analysis of the samples makes it possible to infer information about the statistical universe within the limits of unavoidable chance errors of sampling––the so-called margin of error.
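As a rough illustration of what that margin of error looks like in practice, the sketch below uses the standard normal-approximation formula for a sample proportion, z * sqrt(p(1 - p) / n). It assumes a simple random sample from a correctly defined statistical universe and a 95 percent confidence level (z = 1.96); the function name and the sample sizes are chosen here purely for illustration.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p with sample
    size n, assuming simple random sampling (z=1.96 gives about 95%)."""
    return z * math.sqrt(p * (1 - p) / n)


# p = 0.5 is the worst case: it gives the widest margin for a given n.
for n in (1_000, 50_000):
    points = 100 * margin_of_error(0.5, n)
    print(f"n = {n:>6}: about +/- {points:.2f} percentage points")
```

A sample of 1,000 gives a margin of about 3 percentage points. Note what the formula does not contain: any term for who was left out of the universe or who declined to answer. It measures chance error only.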
Unfortunately, many courses in basic statistics fail to emphasize one critical point: If the statistical universe is improperly defined, the powerful techniques of inferential statistics (making inferences on the basis of samples) are of little or no value.
Making inferences from an improperly defined statistical universe is a form of non-sampling error. Statistical techniques designed to measure sampling error are valueless if the group conducting the poll or survey has committed the biggest statistical error of them all: non-sampling error.
An Example of a Poorly Defined Statistical Universe
Undoubtedly, the most widely publicized illustration of a poorly defined statistical universe is the one concerning The Literary Digest's error in predicting the winner of the presidential election of 1936. (Indeed, this was the example most often cited by W. Edwards Deming, a statistical sampling guru and quality management pioneer.)
During the 1936 election campaign between Democrat Franklin D. Roosevelt and Republican Alfred M. Landon, The Literary Digest magazine sent mock ballots to a large list of people whose names appeared in telephone directories and automobile registration records. (Their lists also included their own magazine subscribers and country club members.)
Over 10 million mock ballots were sent out; 2.4 million ballots were returned.
On the basis of the returned ballots, The Digest predicted Landon would win by a comfortable margin––indeed, a landslide.
As it turned out, however, Roosevelt received 61 percent of the votes cast, a proportion representing one of the largest majorities in American presidential history.
How Could They Be So Wrong?
Polls only represent the people who are in a statistical universe and who respond to them. Despite the sample's huge size, this election became a textbook case of a biased sample: All the sample's component groups were heavily Republican.
Let's get more specific. There were two important reasons for the erroneous prediction: (1) an incorrectly defined statistical universe and (2) non-response bias.
Everyone with telephones and automobiles in 1936 was part of a higher economic group than those people without these two luxuries. There was a bias inherent in the statistical universe.
A large percentage of the voting population would not show up in telephone directories, automobile registrations, and club memberships.
The statistical universe was improperly defined––it tended to be biased in favor of higher income groups. Higher income groups tended to be Republican.
In the 1936 election there was a strong relationship between income and party preference. Lower income groups tended to vote Democratic.
Bias in the Sample Selection Process
Classically trained statisticians define a bias as a persistent error in one direction. What does this mean?
No matter who you sampled from The Literary Digest's statistical universe, there was a high probability you’d select a relatively affluent person.
To repeat: The statistical universe selected was slanted toward middle and upper-class voters and excluded most lower-income voters. And, in reality, there were a great many low-income voters in 1936.
Nine million people were unemployed in 1936.
“With regard to economic status, The Literary Digest poll was far from being a representative cross-section of the population. Then as now, voters are generally known to vote with their pocketbooks.”
It should be mentioned––indeed, emphasized––that George Gallup was able to predict a victory for Roosevelt using a much smaller sample of about 50,000 people.
His statistical universe consisted of a representative cross-section of the population.
The Literary Digest poll sample size was 2.4 million people. This illustrates that you cannot compensate for a poorly defined statistical universe by increasing the sample size. That just compounds the mistakes.
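A small simulation can make this point tangible. The sketch below is not a reconstruction of the 1936 polls: the population shares and support rates are invented, and the only thing that differs between the two polls is how heavily the sampling frame is tilted toward affluent voters.

```python
import random

# All numbers below are invented for illustration; they are not 1936 data.
POP_AFFLUENT_SHARE = 0.30     # assumed share of affluent voters overall
SUPPORT_AFFLUENT = 0.35       # assumed support for candidate A among them
SUPPORT_LOWER_INCOME = 0.72   # assumed support among everyone else
TRUE_SUPPORT = (POP_AFFLUENT_SHARE * SUPPORT_AFFLUENT
                + (1 - POP_AFFLUENT_SHARE) * SUPPORT_LOWER_INCOME)


def run_poll(n, affluent_share_in_frame, rng):
    """Poll n people drawn from a frame with the given affluent share and
    return the fraction who say they support candidate A."""
    supporters = 0
    for _ in range(n):
        is_affluent = rng.random() < affluent_share_in_frame
        p = SUPPORT_AFFLUENT if is_affluent else SUPPORT_LOWER_INCOME
        if rng.random() < p:
            supporters += 1
    return supporters / n


rng = random.Random(1936)  # fixed seed so the run is repeatable

# Huge sample from a frame that is 85% affluent (phones, cars, clubs).
huge_biased = run_poll(200_000, 0.85, rng)
# Small sample from a frame that mirrors the whole population.
small_representative = run_poll(2_000, POP_AFFLUENT_SHARE, rng)

print(f"true support:               {TRUE_SUPPORT:.3f}")
print(f"200,000 from biased frame:  {huge_biased:.3f}")
print(f"2,000 from fair frame:      {small_representative:.3f}")
```

Under these assumptions the large poll settles, very precisely, on the wrong number, while the small one lands near the truth. Adding respondents to the biased poll only narrows the spread around that wrong number.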
The second problem with The Literary Digest poll: Out of the 10 million people whose names were on the original mailing list, only 2.4 million responded to the survey.
It was then a fact that individuals of higher educational and higher economic status were more likely to respond to mail questionnaires than those of lower economic and educational status.
Therefore, the non-response group—7.6 million people—likely contained a high percentage of the lower economic status group. The 2.4 million people who responded to the questionnaire tended to be from a higher educational and economic status group.
A case study involving the Roosevelt/Landon poll from the University of Pennsylvania's Wharton School describes the situation as follows:
“When the response rate is low (as it was in this case, 24 percent), a survey is said to suffer from non-response bias. This is a special type of selection bias where reluctant and non-responsive people are excluded from the sample.”
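The arithmetic behind non-response bias can be sketched in a few lines. The 24 percent figure comes from the counts quoted above (2.4 million returned out of 10 million mailed). Everything else in the sketch, including the two groups, their response rates, and their candidate support, is a hypothetical chosen only so that the overall response rate also works out to 24 percent.

```python
# Response rate from the figures quoted in the article.
digest_response_rate = 2_400_000 / 10_000_000  # 0.24

# Hypothetical mailing list: (share of list, response rate, support for FDR).
groups = {
    "higher_status": (0.5, 0.36, 0.40),
    "lower_status": (0.5, 0.12, 0.70),
}


def response_rate(groups):
    """Overall fraction of the mailing list that returns a ballot."""
    return sum(share * rate for share, rate, _ in groups.values())


def support_on_full_list(groups):
    """Support if every person on the list had answered."""
    return sum(share * support for share, _, support in groups.values())


def support_among_respondents(groups):
    """Support measured using only the people who actually answered."""
    responding_supporters = sum(
        share * rate * support for share, rate, support in groups.values()
    )
    return responding_supporters / response_rate(groups)


print(f"response rate:            {response_rate(groups):.2f}")
print(f"support on full list:     {support_on_full_list(groups):.3f}")
print(f"support among responders: {support_among_respondents(groups):.3f}")
```

With these invented numbers, the candidate has majority support on the full list but falls below 50 percent among the people who mailed a ballot back. No sampling happened at all in this calculation, so the gap is entirely the product of who chose to respond.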
A Quick Look at the 2016 Presidential Election
The 2016 presidential polls may be less reliable than ever because, in an era in which few people answer landline calls, pollsters are having to make do with response rates of 9 percent or less.
That could make them increasingly likely to under-represent key voting blocs. Non-response bias could distort the results.
Summary and Conclusions
It's a major error to take a sample from a statistical universe that differs considerably from the “ideal” statistical universe you want to draw valid conclusions from.
Considerable time must be spent in defining the ideal statistical universe and every attempt must be made to draw a representative sample from that universe.
Always be on the lookout for a poorly defined statistical universe. Fancy calculations relating to sampling errors, interval estimates, and highfalutin statistical tests are meaningless if the statistical universe from which conclusions are drawn is incorrect.
Finally, beware of non-response bias. As the Wharton School case study said, “People who respond to surveys are different from people who don't, not only in the obvious way (their attitude toward surveys) but also in more subtle and significant ways.”