On Saturday, I posted a poll asking readers to simply pick a number between 1 and 20. I promised I'd explain what this is all about, so here goes.

The poll was inspired by this post on Pharyngula, which in turn was inspired by this article on Cosmic Variance. The idea is that 17 will always be the most common answer when people are asked to choose a number between 1 and 20. But neither Cosmic Variance nor Pharyngula offered a reasonable means of testing this proposition. That's where our poll came in. This morning, I took a look at our data, and with 347 responses, I can confirm that 17 is significantly more popular than any other number. Take a look at the chart:

As you can see, the number 17 was picked much more often than any other -- almost 18 percent of the time, compared to the 5 percent you would expect if each of the 20 numbers were equally likely.

But even random numbers aren't perfectly distributed -- if you roll a die 6 times, you most likely won't get one of each number. Perhaps in a truly random sample, we'd see a similar distribution. So I had my computer generate 347 random numbers in the same range and plotted them in light blue on the chart. In the computer's sample, 19 was the most common number, but it came up just 8 percent of the time. Humans picked the number 17 significantly more often than the computer picked 19.
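For readers who want to reproduce the computer baseline, here is a minimal sketch in Python. The post does not say which program generated the random numbers, so the function name `simulate_picks` and the seed are assumptions for illustration only. A different seed will most likely crown a different "favorite" number.

```python
import random
from collections import Counter

N_PICKS = 347  # number of poll responses reported in the post


def simulate_picks(n_picks=N_PICKS, low=1, high=20, seed=None):
    """Draw n_picks uniform random integers in [low, high] and tally them."""
    rng = random.Random(seed)  # seed is an assumption, used only for repeatability
    return Counter(rng.randint(low, high) for _ in range(n_picks))


counts = simulate_picks(seed=2007)
number, count = counts.most_common(1)[0]
print(f"Most common: {number}, picked {count / N_PICKS:.1%} of the time")
```

Running it with several seeds gives a feel for how far the most common number in a uniform sample typically strays from 5 percent.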

Are there any other patterns in numbers humans "randomly" choose? Take a look at this chart:

Humans picked odd numbers significantly more often than the computer did. But how much of that effect is due simply to the larger "17 effect"? Consider this chart with the 17 data removed:

Now there is no significant difference between the values picked by humans and those picked by the computer, and both results are no different from the theoretical "random" distribution of numbers.

What about prime numbers? Commenter Fletcher suggested that prime numbers seem more random, so they are more likely to be chosen. Here's a chart of those results:

Yes, we do pick prime numbers more often than computers! A similar analysis, removing "17" from the results, diminished but did not eliminate the effect.

Clearly humans aren't very good random number generators. We predictably select some numbers more than others. If I were to repeat this experiment with a naive audience, I'd very likely find "17" to be the most popular random number, but if I repeated it with the computer, a completely different number would most likely emerge as the preferred number.

## Comments

You can do a chi-squared test with your data.

It looks like the number 7 is also popular with people. I bet 7 is popular because of our culture. I also bet that people who pick 17 first think of 7 then think that is two obvious so they pick 17.

Posted by: Reed A. Cartwright | February 5, 2007 01:41 PM

Don't you psychologists ever put error bars on your "charts"?

Posted by: CCP | February 5, 2007 02:17 PM

"Too obvious" that is.

Posted by: Reed A. Cartwright | February 5, 2007 02:26 PM

Yes, psychologists put error bars on their "charts." However, this blog is intended for a general audience, and error bars can be problematic. Are we talking about confidence intervals, standard errors, or what? Lay readers do not understand the difference, and the end result is that they make incorrect assumptions based on an incomplete understanding of the underlying statistics.

Take a look at this post for a good explanation of the difficulties in presenting error bars.

Posted by: Dave Munger | February 5, 2007 02:32 PM

Reed is onto something, I think. It would be interesting to see the results if subjects were first primed with a reading sample that involved a heavy use of odd numbers, including 17.

I think humans tend to construe "random" in a non-mathematical way. The interpretation is closer to "uncommon."

Posted by: Mr. Person | February 5, 2007 02:38 PM

Significant how? .05 level? With a binomial test, I presume?

Perhaps this can be seen as a version of the typicality effect. If I ask you to name any bird, you're more likely to say "robin" than "penguin". This effect is explained rather cleverly by connectionist models of memory (e.g., McClelland, gah, lost the reference). In these models, the effect appears because the network favours prototypical nodes - i.e., nodes which are connected to a lot of other related categories, which are presumably also activated by the request to name a number. The most typical member of the category will activate most strongly, and will thus determine the output.

Of course, this only gives an account of the hows, not the whys. I would think that 3, 7, etc would be more typical numbers than 17.

Posted by: phineasgage | February 5, 2007 02:44 PM

As someone who choseeht number 17 perhaps I can offer my personal insight - I was looking to select the most unlikely number (sorry!). In this sense I wonder if this is actually a measure of perceived least common number rather than random selection - there is a body of research about the ability to actually generate random numbers being very difficult for human subjects (methinks Sallice, T).

Posted by: Austin | February 5, 2007 02:46 PM

In Italy, 17 is considered unlucky, like 13 is in the US, with Friday the 17th being particularly unlucky...

Posted by: Deirdré Straughan | February 5, 2007 02:48 PM

The only problem with this is that it's highly likely many of those who took the survey had read the Pharyngula and/or Cosmic Daily post, compromising the results.

After the Language Log post on 17 back in December (I think that's when it was), I became so curious that I started asking participants for a random number between 1 and 50 at the beginning of every experiment. I'll have data sometime between now and the end of the semester.

Posted by: Chris | February 5, 2007 02:53 PM

My stats knowledge is baby-simple. I first use an unpaired t-test, and if that doesn't give good results, I move on to a paired test (p<.05). That's pretty close to the limit for me.

Posted by: Dave Munger | February 5, 2007 02:53 PM

....that should have been "chose the" not "choseeth" - monitoring failure

Posted by: Austin | February 5, 2007 02:54 PM

I think the answer is bisection. Our minds first go to (20+0)/2=10 but then we think, oh that's too obvious. So we next choose (20+10)/2=15. Nope, still too obvious. Then we go (20+15)/2=17.5, round it to 17 and the loop ends, because we are getting too close to 20 to be comfortable.

Nonsense, probably, but that is what comes to mind...

Posted by: Michael | February 5, 2007 03:34 PM
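Michael's bisection idea above can be written out as a tiny loop. This is only a sketch of the heuristic as he describes it, not a tested model of how people choose; the function name `bisect_toward_upper` and the choice to stop after three steps are assumptions.

```python
def bisect_toward_upper(low, high, steps):
    """Repeatedly take the midpoint of [low, high], then move low up to it."""
    guesses = []
    for _ in range(steps):
        mid = (low + high) / 2
        guesses.append(mid)
        low = mid  # the previous guess felt "too obvious", so look above it
    return guesses


guesses = bisect_toward_upper(0, 20, steps=3)
final_pick = int(guesses[-1])  # drop the fraction, as in the comment
print(guesses, final_pick)
```

The three midpoints are 10, 15 and 17.5, and dropping the fraction gives 17.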

I wonder if the fairly uniform distribution of n*7 modulo 10 (7, 4, 1, 8, 5, 2, 9, 6, 3, 0) has anything to do with the popularity of 7 and 17. I'd bet (x*10)+7 is preferred for most "random" ranges (more so for primes -- and just under 1 in 4 primes less than 100 is 7 modulo 10.)

Posted by: Craig Pennington | February 5, 2007 03:43 PM
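The two arithmetic claims in Craig's comment above are easy to check. The sketch below lists the last digits of the first ten multiples of 7 and counts the primes below 100 that end in 7; the helper names are illustrative assumptions.

```python
def multiples_mod(step, modulus, count):
    """Return (n * step) % modulus for n = 1..count."""
    return [(n * step) % modulus for n in range(1, count + 1)]


def primes_below(limit):
    """Sieve of Eratosthenes: all primes strictly below limit."""
    sieve = [True] * limit
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            for j in range(i * i, limit, i):
                sieve[j] = False
    return [i for i, is_prime in enumerate(sieve) if is_prime]


residues = multiples_mod(7, 10, 10)
primes = primes_below(100)
ending_in_7 = [p for p in primes if p % 10 == 7]
print(residues)
print(len(ending_in_7), "of", len(primes), "primes below 100 end in 7")
```

Note that the last-digit sequence visits every digit once for any multiplier that shares no factor with 10 (1, 3, 7 or 9), so this property is not unique to 7.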

Here is a nice web site about numbers. Scroll down to see number 17.

If you follow the link on that page you can see that 37 is also claimed to be a "psychologically random number".

Posted by: Ahmet Kutsi Nircan | February 5, 2007 04:00 PM

Nice swooping pattern to the graph too.

Ascending (sorta) into 7, then descending to 10; ascending back up to 13, then descending to 15/16, etc.

Posted by: Mr. Person | February 5, 2007 04:03 PM

People do not understand the word "random". I suspect that, similarly to Austin's comment, we want to pick an unusual number, perhaps to offset the expected "typical" numbers that others will pick. Notice also that 4, 10, 15, 16, and 20 are underrepresented in your graph. [Me, I selected either 1 or 20 thinking that they would be underselected. Maybe I don't understand random either.]

Perhaps trying a pair of tests, one asking for a random number and the other for the least likely number to be picked could show that many people have this common confusion of the word "random".

Posted by: Steve Maguire | February 5, 2007 04:15 PM

I wonder what would be the result of an equivalent experiment carried out on some of our primate cousins? Experiments do suggest that rhesus monkeys, at least, do possess the capacity for "spontaneous number representation."

Posted by: lawrence | February 5, 2007 04:32 PM

The famously irritable Phineas Gage said:

"If I ask you to name any bird, you're more likely to say "robin" than "penguin"."

I'd like to see a poll of this. I, for one, would be more likely to say "penguin", probably because of its ubiquity in pop culture. I mean, no one ever heard of an exploding robin on the telly.

Posted by: onymous | February 5, 2007 04:37 PM

This also works with Battleship. Inexperienced players tend to think that certain positions are less "random" than others. They thus tend to shy away from putting their ships on edge or center positions. They also tend to shy away from putting shots onto those positions. When starting to learn to play battleship, this is always one of the first things you have to unlearn.

Posted by: boojieboy | February 5, 2007 05:03 PM

nice post. just a little note on the statistics: doing only one simulated data set is a little, well, worthless. the goal in a simulation is to estimate the actual distribution of a quantity (in your case, the number of '17's) under the null hypothesis, so you'd need to do a lot of them. Luckily, you already know the distribution you're looking for! It's a binomial with parameters n=347 and p=1/20. So can you reject the null hypothesis at p=0.05 based on your data? I leave this as an exercise for the reader.

Posted by: p-ter | February 5, 2007 06:25 PM
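The exercise p-ter sets above can be sketched as an exact one-sided binomial test. The post gives 347 responses and says 17 was chosen "almost 18 percent" of the time but never states the raw count, so `observed = 62` below is an assumption (roughly 18 percent of 347), and the function name `binomial_tail` is illustrative.

```python
from math import comb


def binomial_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n = 347        # responses reported in the post
p = 1 / 20     # null hypothesis: each of the 20 numbers is equally likely
observed = 62  # ASSUMPTION: about 18% of 347; the post gives no exact count

p_value = binomial_tail(observed, n, p)
print(f"P(at least {observed} picks of one given number) = {p_value:.3g}")
```

Under these assumptions the tail probability is far below 0.05. Strictly, 17 was named in advance by the earlier blog posts, which is what justifies testing that one number instead of correcting for all 20.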

Try it in base 16 and see what happens:

1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,10(16),11(17),12(18),13(19),14(20),15(21),16(22),17(23),18(24),19(25),1A(26),1B(27),1C(28),1D(29),1E(30),1F(31),20(32)

I'm not sure how best to present that to a subject, but I would want to try to find ways to separate how much a subject is responding to the numerical concept vs. the numerical symbol. I would think the associations to each are different.

Posted by: DavidD | February 5, 2007 08:04 PM
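For anyone who wants to build DavidD's base-16 answer sheet above without typing it by hand, here is a small sketch. The function name `hex_labels` is an assumption, and the output format simply mirrors his list.

```python
def hex_labels(start=1, stop=32):
    """Pair each integer in [start, stop] with its base-16 symbol."""
    return [(format(n, "X"), n) for n in range(start, stop + 1)]


labels = hex_labels()
# Show the decimal value in parentheses once the symbol has two digits
line = ",".join(sym if n < 16 else f"{sym}({n})" for sym, n in labels)
print(line)
```

In this notation the symbol "11" stands for the quantity seventeen, which is exactly the concept-versus-symbol split he wants to separate.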

I definitely think people think "unusual" when they see "random". I think that must have happened a bit for me, because I chose 19, which is also a prime. I also like the "7...no, 17" explanation because I do think 7 is many people's favorite number, so I'm not surprised it was chosen a lot, too. I do think people consider 17 to be more "random", though.

Posted by: Katherine Moore | February 5, 2007 08:23 PM

I attended a math camp (cult?) where the running joke was the awesomeness of the number 17 (and yellow pigs). The camp director, David Kelly spent a lot of time finding interesting properties of the number 17. An often recounted story is that in a competition between supporters of the number 23 and 17, Kelly and Co were able to name two interesting properties for each interesting property of 23.

Check out the math camp's website: http://www.hcssim.org/

Posted by: Ian Wang | February 5, 2007 08:40 PM

My calculator couldn't handle the concept of 347 trials (too large numbers), but even if you reduce the number of trials to 50 the choice of 17 this many times has a p-value of 0.0005, i.e. statistically significant. And so if there were even more trials, that would only make it less likely, and more significant.

There's a real effect here, both intuitively and statistically.

Posted by: Bob Larsen | February 5, 2007 10:42 PM