A few weeks ago, Gallup released the results of a poll about USAians' views on evolution vs. creationism. The results aren't good (but they've never been very good). I'm taking this opportunity to explore the meaning of confidence levels in polls — and, in the process, to tease out some actual information from the polls, which turns out to be slightly good.
The Gallup Data
The Gallup poll title “In U.S., 46% Hold Creationist View of Human Origins” (June 1, 2012) is a good summary of why the results aren't good. But it's worse than that; a further 32% (of USAians of age 18 years or more) hold the view[1] that “Human beings have developed over millions of years from less advanced forms of life, but God guided this process”. This means that 78% of USAians believe that humans are the result of some superstitious bullshit.
There are some fairly obvious refinements of the data, e.g., “Two-thirds of Americans who attend religious services weekly choose the creationist alternative” and “Majority of Republicans Are Creationists” (see the Gallup link above). Gallup also published the data (again, see the Gallup link) for the results of some similar surveys since 1982, and plotted a graph over time. Here is my reproduction of their graph:
(Actually, this isn't quite a reproduction of their graph, because they didn't include the “undecided” data.) These data have a margin of error of 4 percentage points at the 95% confidence level.
The question asked by Gallup was:
Which of the following statements comes closest to your views on the origin and development of human beings —
- (EG+) Human beings have developed over millions of years from less advanced forms of life, but God guided this process;
- (EG−) Human beings have developed over millions of years from less advanced forms of life, but God had no part in this process;
- (C) God created human beings pretty much in their present form at one time within the last 10,000 years or so?
They had two versions of the survey, one with the responses as listed, and the other with the responses in the opposite order[2]; I've introduced the labels EG+ (Evolution, god-positive), EG− (Evolution, god-negative), C (Creationist), and U (Undecided, or Other/No Opinion).
People have been talking about the little uptick at the end, and what it might mean; others have pointed out that the correct response (EG−) seems to be increasing over time. We can construct linear regression models, and then plot what Gallup should have plotted:
Before we say any more about this, we should talk about where the error bars came from and say a few things about confidence levels.
A Crash Course on Confidence Levels, By Example
The Gallup data are based on approximately 1000 people (their web post says 1012, their “details” document says 1024; I'm just going to call it 1000). This is where the margin of error comes from. But what does “4% at 95% confidence” mean?
There's a big population out there — all the USAians of age 18 or more years — and each of these people can be put into one of four categories (EG+, EG−, C, U). The goal of this survey is to estimate the number of people in each of those four categories; or, equivalently (assuming you know how many are in the population!), the percentage of people in each of the categories. What the surveyors do is this: find some manageable number of people, called the ‘sample’; ask them the question; calculate the percentages of the sample in each of the four categories; and then say that those percentages are estimates of the population percentages. The ‘problem’ with this is that the sample might not be truly representative of the population. There are (at least) three reasons for this:
- First, there might be some direct bias in the sample. For example, if they chose their sample by standing outside a church and grabbing people as they came out, they would quite obviously be biasing their sample towards god-believers; if they chose their sample by standing outside a scientific conference on evolutionary biology, they would be biasing their sample towards evidence-accepters. What they actually did was call random phone numbers (400 cell phones and 600 land lines); this biases their sample towards people who answer the telephone (and, therefore, away from people who have other things to do), but it's probably the best they can reasonably do.
- Second, even if the sample is randomly chosen from the population, there might be some bias in the survey results. For example, if they ask ‘do you agree with everyone else who says that your god had an influence on you’, they'll be biasing the question towards god-believing, whereas if they ask ‘are you stupid enough to believe some superstitious bullshit’, they'll be biasing the question towards evidence-acceptance. This is a blatant example, but there are more subtle ways to introduce similar biases. (See the footnote for another.) One thing they do to try to avoid this bias is to put the question in the middle of a whole bunch of unrelated questions, so that you don't know what they're actually interested in. (Of course, this is tricky, and can be abused, because of the possibility of priming.)
- Third, even if you could completely remove any trace of bias from the sampling and from the survey, there's the simple fact that a random sample is random, and therefore brings along things like probability distributions and uncertainty.
So, let's assume that there's no bias, and see what happens as a result of this randomness. Let's also assume that the estimates obtained from the sample are equal to the true population percentages: EG+ = 32%, EG− = 15%, C = 46%, U = 7%.
The survey, assuming no bias, is equivalent to the following experiment. Start with four empty buckets, labelled "EG+", "EG−", "C", and "U". Pick up a rock (or something similar); put it into one of the four buckets, with[3] probability 32% that it goes in EG+, 15% in EG−, 46% in C, and 7% in U. Do this a thousand times, so you've distributed 1000 rocks among four buckets. (Probabilists call this "balls in urns", not "rocks in buckets", but it's easier to find rocks and buckets.) Or write some computer code to simulate this; I did this, and got 307 in EG+, 132 in EG−, 488 in C, and 73 in U. Then I did it again; I did it 100 times and computed the average and standard deviation; and then I computed the results theoretically. Here are the results:
Table 1: An Example Multinomial Distribution
|                            | EG+   | EG−   | C     | U    |
|----------------------------|-------|-------|-------|------|
| Probability                | 0.32  | 0.15  | 0.46  | 0.07 |
| First trial (n=1000)       | 307   | 132   | 488   | 73   |
| Second trial (n=1000)      | 278   | 156   | 486   | 80   |
| 100 trial average (n=1000) | 321.6 | 151.6 | 458.3 | 68.5 |
| expected value (n=1000)    | 320.0 | 150.0 | 460.0 | 70.0 |
| 100 trial StDev (n=1000)   | 15.6  | 10.7  | 16.6  | 8.0  |
| theoretical StDev (n=1000) | 15.3  | 10.4  | 15.7  | 7.3  |
(The expected value is np; the standard deviation is √(np(1−p)).)
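For anyone who wants to reproduce Table 1, here's a minimal sketch of the rocks-in-buckets simulation in Python (NumPy's multinomial sampler plays the role of the rocks; the exact counts will of course differ from run to run):

```python
import numpy as np

rng = np.random.default_rng()
p = np.array([0.32, 0.15, 0.46, 0.07])   # EG+, EG-, C, U
n = 1000                                  # rocks (respondents) per trial

# One trial: distribute 1000 rocks among the four buckets.
one_trial = rng.multinomial(n, p)
print("single trial:", one_trial)

# Repeat 100 times; compare the empirical mean and standard deviation
# with the theoretical values n*p and sqrt(n*p*(1-p)).
trials = rng.multinomial(n, p, size=100)
print("100-trial average:", trials.mean(axis=0))
print("100-trial StDev:  ", trials.std(axis=0, ddof=1))
print("expected value:   ", n * p)
print("theoretical StDev:", np.sqrt(n * p * (1 - p)))
```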
The point, here, is that a single sample of 1000 people will deviate from the population percentages by some amount, and this amount can be estimated once you've estimated those percentages. Let me convert the (theoretical) counts with n=1000 (from Table 1 above) into percentage points: EG+ = (32±1.53)%, EG− = (15±1.04)%, C = (46±1.57)%, U = (7±0.73)%. It's those uncertainties (±1.53, ±1.04, ±1.57, ±0.73) that are the key to understanding the confidence levels. I'm expressing the uncertainties by stating the standard deviation; for a Gaussian random variable, approximately 68.3% of the population are within one standard deviation of the average, and approximately 95.5% are within two standard deviations. So, a 95% confidence level is approximately two standard deviations. That's pretty much it, except for a bunch of stuff having to do with the fact that the four percentages are not independent, but are (anti)correlated (if one increases, the others have to decrease, because their sum is 100).
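As a sanity check on the ‘two standard deviations ≈ 95%’ rule of thumb, here's the Gaussian coverage computed from the error function (a sketch; nothing here is specific to the Gallup data):

```python
import math

def coverage(k):
    """Probability that a Gaussian random variable lies within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

print(f"within 1 standard deviation:  {coverage(1):.1%}")   # ~68.3%
print(f"within 2 standard deviations: {coverage(2):.1%}")   # ~95.4%, the usual '95% confidence'
```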
Analysis of the Gallup Data
Let's take another look at the data from 2010 and 2012, in particular, the uptick in the C value. It's a difference of 6 percentage points. The standard deviation is 1.57. A relevant question is this: what is the probability that the absolute value of the difference of two standard normal variables is greater than 6/1.57 = 3.82? The answer is 0.7% (because the difference of two standard normals has a standard deviation of √2, and 99.3% of a standard normal lies within 2.7 standard deviations of the mean). That is, there's a 0.7% chance that a change of at least 6 percentage points is random, and a 99.3% chance that it's not. From this, I think we can say that the increase in the C value, from 40% in 2010 to 46% in 2012, is real with 99.3% confidence[4].
In comparison, EG+ changed from 38% to 32%, a drop of 3.93 standard deviations (99.5% confidence); EG− changed from 16% to 15%, a drop of 0.96 standard deviations (50% confidence); and U changed from 6% to 7%, an increase of 1.36 standard deviations (66% confidence). The changes in EG− and in U are not significant; the significant change is the increase in C and the corresponding decrease in EG+. (I think the most reasonable explanation is that 6% changed their mind from EG+ to C; however, the survey doesn't track individuals, so we can't conclude that from this survey.)
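Here's a sketch of that 2010-versus-2012 comparison, using only the standard library's error function; the percentages are the ones quoted above, and the standard deviations are the theoretical ones from Table 1, converted to percentage points:

```python
from math import erfc, sqrt

# 2010 -> 2012 percentages, and the single-survey StDevs in percentage points.
changes = {"EG+": (38, 32, 1.53), "EG-": (16, 15, 1.04), "C": (40, 46, 1.57), "U": (6, 7, 0.73)}

for label, (y2010, y2012, sd) in changes.items():
    z = abs(y2012 - y2010) / sd          # change, in single-survey standard deviations
    z_diff = z / sqrt(2)                 # the difference of two surveys has StDev sd*sqrt(2)
    p_random = erfc(z_diff / sqrt(2))    # two-sided tail probability of a standard normal
    print(f"{label}: {y2010}% -> {y2012}%  ({z:.2f} StDevs; "
          f"confidence the change is real ≈ {1 - p_random:.1%})")
```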
What about the longer-term increase in EG−? We can run another experiment. Assume that there is no change over time[5], do the 1000-sample experiment 11 times using the average probabilities, and calculate the linear regression coefficients. Repeat this a whole bunch of times, and measure the average and standard deviations of the coefficients. The constant coefficient isn't particularly relevant; it's only the slope that we care about. Here are the results (1000 trials):
Table 2: Linear Regression Coefficients
|                   | dEG+/dt | dEG−/dt | dC/dt  | dU/dt  |
|-------------------|---------|---------|--------|--------|
| Expected          | 0.000   | 0.000   | 0.000  | 0.000  |
| Expected StDev    | 0.059   | 0.039   | 0.061  | 0.028  |
| Observed          | −0.082  | 0.232   | −0.056 | −0.107 |
| Observed (StDevs) | −1.4    | 5.9     | −0.9   | −3.9   |
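A sketch of that null-hypothesis experiment follows; note that the survey dates and the average probabilities below are placeholders I've filled in for illustration (the real analysis uses Gallup's 11 survey dates and the averages of the published percentages), so the output will only roughly match Table 2:

```python
import numpy as np

rng = np.random.default_rng()

# Placeholder survey years and average category probabilities (EG+, EG-, C, U).
years = np.array([1982.0, 1993, 1997, 1999, 2001, 2004, 2006, 2007, 2008, 2010, 2012])
p_avg = np.array([0.37, 0.11, 0.45, 0.07])
n, n_trials = 1000, 1000

slopes = np.empty((n_trials, 4))
for t in range(n_trials):
    # Simulate one 1000-person survey per year under the "no change over time" hypothesis...
    counts = rng.multinomial(n, p_avg, size=len(years))
    percentages = 100.0 * counts / n
    # ...and fit a straight line to each category's percentages over time, keeping the slope.
    slopes[t] = np.polyfit(years, percentages, 1)[0]

print("expected slope:      ", slopes.mean(axis=0))   # should be ~0 under the null hypothesis
print("expected slope StDev:", slopes.std(axis=0))    # compare the observed slopes against these
```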
The observed slope of EG− is 5.9 standard deviations above zero. That's a confidence level[6] of 99.9999996% (or a doubt level of 4×10⁻⁹) that the slope is non-zero. The percentage of people who hold the view of EG−, the evidence-based view that evolution happened without any supernatural influences, is, beyond any reasonable doubt, increasing at 0.232% per year. That's good news. (It would be better if it were faster.)
It may seem odd that U (undecided) is decreasing significantly. This effect is dominated by the data from 1982. If we repeat the preceding analysis, but discarding all the data from 1982, we find that EG− is increasing at 0.321% per year, which is 4.9 standard deviations above zero (a mere 99.9999% confidence level, or a doubt level of 1×10⁻⁶), and U is decreasing at less than one standard deviation, so is effectively constant; EG+ and C are decreasing at 0.125% and 0.176% per year, or 1.3 and 1.9 standard deviations below zero, respectively (confidence 81% and 94%, respectively, that they're non-zero). (An alternative hypothesis is that U bottomed out in 2005 and is now increasing again, but I won't bother analyzing that.)
Upon looking at the graph in Figure 2, one might observe that there is some variation around the straight lines, and one might ask whether the change in C and EG+ from 2010 to 2012 is truly larger than expected; after all, C in 2010 was below the line and C in 2012 was above, so couldn't these just be random deviations? One way to address this is to simply compute how many standard deviations each data point lies from the straight line, and how significant each of these deviations is. Here are the results:
There are 44 data points. 36 of them, or 82%, deviate from the trend line by less than the 90%-confidence threshold; 39, or 89%, by less than the 95% threshold; and 41, or 93%, by less than the 99% threshold. This is sort of borderline significant; in other words, the linear model together with the expected uncertainty explains almost everything. (If you throw away the data from 1982, the significance is even smaller.) In the 2012 data, the most significant deviation from the linear trend is that EG+ is smaller than expected.
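Here is a sketch of how such a deviation count can be done; the arrays in the example call are made-up stand-ins (I'm not reproducing all 44 observed values, fitted values, and standard deviations here):

```python
import numpy as np
from math import erf, sqrt

def deviation_summary(observed, fitted, sigma, levels=(0.90, 0.95, 0.99)):
    """For each confidence level, count how many points deviate from the fitted
    line by less than that level's two-sided Gaussian threshold."""
    z = np.abs(observed - fitted) / sigma
    # Two-sided probability that a standard normal deviates by less than z:
    within = np.array([erf(zi / sqrt(2)) for zi in z])
    for level in levels:
        k = int(np.sum(within < level))
        print(f"{k} of {len(z)} points ({k / len(z):.0%}) are inside the {level:.0%} band")

# Dummy example, just to show the call; the real inputs are the 44 survey
# percentages, the regression-line values, and the per-point standard deviations.
deviation_summary(
    observed=np.array([46.0, 32.0, 15.0, 7.0]),
    fitted=np.array([44.1, 34.6, 14.2, 7.1]),
    sigma=np.array([1.57, 1.53, 1.04, 0.73]),
)
```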
What seems a more likely explanation of the recent data is the following. First, in between spring 2008 and autumn 2010, there was a decrease in C, along with an increase in both EG+ and EG−; most likely, a bunch of people changed from C to EG+, and roughly half that number of people changed from EG+ to EG−. Next, between autumn 2010 and spring 2012, some, but not all, of the latter group reverted to EG+, and a bunch of EG+ regressed to C. All of this happened on top of the linear trend.
Conclusions
There is one slightly ambiguous finding and one unambiguous finding:
- Between 2010 December and 2012 May, there was a change in the flavour of superstition of the USA adult population: a 6-percentage-point move from ‘everything evolved but god made me special’ to ‘god made me special and that's all that matters’. (This is but one of many stupidities that are prominent in the USA that have increased during the same time period. I think it's worth emphasizing that this latter superstition includes the silly belief that humans have only been around for 10k years.)
- The percentage of the USA adult population that accepts evidence and prefers rational thought to fairy-tales is slowly increasing.
This is good news, but we have a long way to go.
Notes
- ^There are so many things wrong with this statement. The first, of course, is the implication that there is a god. Next is the “millions of years” — it's been a lot longer than that! (We, as a species, diverged from our closest living relatives a mere five million years ago; the complete development took almost four billion years!) Third is the anthropocentrism inherent in the use of the phrase “less advanced forms of life”. (On the one hand, the phrase erroneously implies that humans are somehow more “advanced” than other species; on the other hand, the phrase is vacuously tautological, if you define “advancement” as the forward progression of time.)
- ^Changing the order of responses is, I assume, to avoid a possible bias introduced by ‘That was complicated, I'm just going to say the first one you said’ thought patterns. If there is such a bias, it could be removed from the survey by randomly permuting the responses, such that each was equally likely to be listed first. If I'm interpreting their published data correctly (they say “ROTATE 1-3/3-1”), they didn't permute randomly, they merely reversed the order, which, it seems to me, would introduce a bias against response number 2, i.e., EG−, because it never shows up first.
- ^How, one might ask, can one choose one of these four buckets with the specified probabilities? Easy: first, let's assume you have some way of generating a random number, uniformly distributed in the interval [0,1]. Call this number R. Then divide the interval [0,1] into four sub-intervals of lengths 0.32, 0.15, 0.46, and 0.07; in other words, look at the intervals [0.00,0.32], [0.32,0.47], [0.47,0.93], and [0.93,1.00]; call those intervals EG+, EG−, C, and U. Then find which of those four intervals your number R is in, and put the rock in the corresponding bucket. (There is a probability of 0 that R will be on the boundary between intervals, but if this happens, flip a coin.) The ‘random numbers’ that computers generate are almost always pseudo-random, not truly random, but they're good enough for our purposes here. If you don't have access to a computer, you can use a coin, as follows. Let Heads represent 0 and Tails represent 1, start flipping, and write down the resulting sequence of binary digits after the point. For example, the sequence of coin flips THHTHTT represents 0.1001011 in base two, or 75/128, approximately 0.586. But it's more subtle than that; you don't know what the next digit is going to be (or any of the following ones); the number R you're constructing is somewhere between 0.10010110000... and 0.10010111111..., i.e., between approximately 0.586 and 0.59375. After N flips, you've determined some interval of length 2^(−N). Keep going until this interval lies entirely within one of the four target intervals EG+, EG−, C, and U, at which point you will have randomly chosen one of those four with the desired probabilities. In this particular case, the sequence of intervals is [0,1] (before any flips), [0.5,1] (after the first flip), and then [0.5,0.75] — and you can stop here, because the interval [0.5,0.75] is already a sub-interval of [0.47,0.93]. Exercises: (1) prove that R, constructed by this coin-flipping method, is uniformly distributed in [0,1] (assuming an unbiased coin); (2) modify the procedure so that you can use it with a six-sided die (in case you don't have a coin).
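Here's a sketch of both selection methods in Python: the uniform-random-number version, and the coin-flip version that narrows the interval one binary digit at a time (the function names are mine):

```python
import random

PROBS = [("EG+", 0.32), ("EG-", 0.15), ("C", 0.46), ("U", 0.07)]

def pick_bucket_uniform():
    """Divide [0,1] into sub-intervals of lengths 0.32, 0.15, 0.46, 0.07 and
    see which one a uniform random number R falls into."""
    r = random.random()
    upper = 0.0
    for label, p in PROBS:
        upper += p
        if r < upper:
            return label
    return PROBS[-1][0]   # guard against rounding at the very top of [0,1]

def pick_bucket_coin_flips():
    """Flip a fair coin (Heads=0, Tails=1) to build R one binary digit at a time,
    stopping as soon as the interval of values still consistent with the flips
    lies entirely inside one bucket's sub-interval."""
    # Bucket boundaries: [0, 0.32], [0.32, 0.47], [0.47, 0.93], [0.93, 1].
    bounds = []
    upper = 0.0
    for label, p in PROBS:
        lower, upper = upper, upper + p
        bounds.append((label, lower, upper))
    lo, hi = 0.0, 1.0
    while True:
        for label, lower, b_upper in bounds:
            if lower <= lo and hi <= b_upper:
                return label           # the whole interval fits inside one bucket
        mid = (lo + hi) / 2
        if random.random() < 0.5:      # "heads": next binary digit is 0
            hi = mid
        else:                          # "tails": next binary digit is 1
            lo = mid

# Simulate 1000 rocks and count how many land in each bucket.
counts = {label: 0 for label, _ in PROBS}
for _ in range(1000):
    counts[pick_bucket_coin_flips()] += 1
print(counts)
```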
- ^I should probably point out that this analysis is wrong, because (as I mentioned earlier) the changes are correlated, not independent; their sum must be 0. The question ‘Is the change in C significant?’ is somewhat ill-posed (or, at the very least, extremely subtle). A more appropriate question is ‘Is the change in the data point [EG+,EG−,C,U] significantly different from the zero vector?’. The Mahalanobis length of this change is approximately 4.6; this is the three-dimensional version of saying the change is 4.6 standard deviations. Then you need to divide by the square root of two (because you're testing the difference of two multivariate normal random variables), so the change is 3.25 standard deviations in three dimensions. (It's three dimensions because of the constraint that the sum is zero.) The probability that a three-dimensional multivariate standard normal variable has length more than 3.25 is 1.4%, so, with 98.6% confidence, the change is non-zero. You could then ask the follow-up question ‘given a non-zero change, is it significantly in the direction of increasing C?’, but I won't bother with that; it's pretty clear that the biggest change is from EG+ to C. (Having said all that, I should probably point out that even this analysis isn't quite right; these things are only approximately distributed like multivariate normals. To do it right, you need to start from the multinomial random variable. Even then, though, the question of whether the direction of the change is significant is subtle, if not ill-posed.)
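A sketch of that Mahalanobis computation (NumPy's pseudo-inverse handles the singular multinomial covariance; I've used the 2012 proportions for the covariance matrix, a choice the footnote doesn't specify, which is why the result is only approximately 4.6):

```python
import numpy as np

n = 1000
p = np.array([0.32, 0.15, 0.46, 0.07])            # proportions used for the covariance (2012 values)
change = np.array([-0.06, -0.01, 0.06, 0.01])     # 2010 -> 2012 change in [EG+, EG-, C, U]

# Covariance of the estimated proportions from one n-person survey (multinomial);
# it's singular because the four proportions always sum to 1.
cov = (np.diag(p) - np.outer(p, p)) / n

# Mahalanobis length of the change vector, via the pseudo-inverse.
length = np.sqrt(change @ np.linalg.pinv(cov) @ change)
print(f"Mahalanobis length: {length:.2f}")                                      # ~4.6
print(f"scaled for the difference of two surveys: {length / np.sqrt(2):.2f}")   # ~3.25
```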
- ^This is the so-called ‘null hypothesis’. I don't particularly like that terminology (it's so judgemental!).
- ^I didn't incorporate the uncertainty in the estimated regression coefficients in this analysis. These uncertainties are around 10% of the expected standard deviations shown in Table 2, so those standard deviations should be multiplied by 1.01. This changes the level of doubt from 3.72×10⁻⁹ to 4.49×10⁻⁹.
This is good. I have always found it beyond my ability to understand when people seriously think the world is 10000 years old and blow off a century of geology and palaeontology. I can kind of understand the EG+ view, it is a way to not have to confront reality, but still find it ridiculous in this day and age. For Franklin and Jefferson, ok, it fit with the science of their time, but today, no. All this stuff is on a par with thinking aliens come from millions of light years away to stick a probe up your ass. Which I wouldn't care about either, if we weren't making critical decisions on the religious equivalent of probed assholes.