The essence of a sampling argument is the "sample." Usually, populations
are so large that we cannot reasonably test the **state** of
every member of that population. For instance, if we wanted to know what
proportion of Scotsmen get tipsy (slightly drunk) on Hogmanay,
we cannot possibly hire enough observers to follow every Scotsman
around on the evening of December 31st. (Especially if we count female
Scots as "Scotsmen." Oh, let's just call them "Scots.") So we're scre... I
mean, so we have to fall back on looking at a much *smaller*
number of Scots and *extrapolating* the results to *all*
haggis-eatin', kilt-wearin' caber-tossers. (This is perhaps an unfair
characterization of the Scots. Very few of them actually toss cabers.) So
let's just hire people to follow around a randomly selected group of one
million Scots next Hogmanay and to report on whether or not they get
tipsy. Say that 75% of these randomly selected Scots get tipsy on
Hogmanay. We could then make the following argument.

Exactly 75% of our sample got tipsy this Hogmanay, therefore 75% of all haggis-eaters got tipsy this Hogmanay.

Here's how the terminology of generalization matches up with this
argument.

This is how a generalization works, if it works at all. A sample is
taken, and it is argued that the state of the sample *must* be
the same as the state of the population. If the state of the sample cannot
reasonably be explained without assuming that the population has the same
state, the argument is good. If we *can* reasonably explain the
state of the sample *without* assuming that the population has the
same state, the argument is no good, lousy, bogus, wack, heinous.... I'll
stop now.
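The extrapolation step in the Scots example can be sketched with a quick simulation. The population size and sample size below are invented for illustration; the point is just that a random sample's proportion tracks the population's proportion.

```python
import random

def estimate_proportion(population, sample_size, rng):
    """Estimate the fraction of True values in `population`
    from a simple random sample taken without replacement."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size

# Hypothetical population of 1,000,000 Scots, 75% of whom get tipsy.
rng = random.Random(42)
population = [True] * 750_000 + [False] * 250_000

estimate = estimate_proportion(population, 10_000, rng)
print(f"estimated tipsy fraction: {estimate:.3f}")  # very close to 0.750
```

Because the sample is taken randomly from a well-mixed population, the state of the sample would be very hard to explain unless the population had (nearly) the same state.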

For another example, imagine that two people, call them "Jeeves" and
"Wooster," are trying to figure out the overall composition of the
following population. Imagine also that neither of them can see the
population the way *you* can. (*You* can see that this
population is extremely well mixed. In fact, there are only two deviations
from perfect mixing. They appear in the top left and bottom right corners
of the field. By some strange coincidence, that's where Jeeves and Wooster
take their samples from.) They know that it's composed of 2,600 colored
dots, but that's about it. Neither of them has any idea of how the dots
are distributed, or anything else besides the fact that it's made up of
dots. And of course, neither of them knows that the population is made up
of 650 red dots (25%), 650 blue dots (25%), and 1,300 green dots
(50%). Now Jeeves takes a sample from the top left corner of the
population (red line) while Wooster takes a sample from the bottom right
corner (blue line). Each of them then makes a claim about the composition
of the population based on their samples.

Jeeves's sample is 50% green, 25% red and 25% blue. So he claims that the population is 50% green, 25% red and 25% blue.

Wooster's sample is 25% green, 25% red and 50% blue. So he claims that the population is 25% green, 25% red and 50% blue.

That's quite a big difference. Who's closer to being right and why?

The reason Jeeves's argument is better than Wooster's is that
Jeeves's sample is big enough to swallow the imperfection in the
mixing of the population (which means that his sample is **representative**
of the population), while Wooster's sample is so small that the
imperfection crosses the sample border and distorts the result (which means
that *his* sample *isn't* representative of the population).
Are these samples too small? Well, that depends on what we know about the *structure*
of the population.
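The size-versus-imperfection point can be reproduced with a toy reconstruction of the dot field. The clump size and the two sample sizes below are invented (they stand in for the corner imperfections in the figure), but the mechanism is the same: a clump distorts a small sample much more than a large one.

```python
import random
from collections import Counter

rng = random.Random(0)

# Toy field: 2,600 dots, 25% red, 25% blue, 50% green, well mixed
# except for a clump of 40 blue dots at the start (the "imperfect corner").
field = ["red"] * 650 + ["blue"] * 650 + ["green"] * 1300
rng.shuffle(field)
field[:40] = ["blue"] * 40

def composition(sample):
    counts = Counter(sample)
    return {c: counts[c] / len(sample) for c in ("red", "blue", "green")}

wooster = composition(field[:80])   # half of this sample is the clump
jeeves = composition(field[:800])   # the clump is only 5% of this sample

print("small sample:", wooster)
print("large sample:", jeeves)
```

The small sample comes out wildly over-blue, while the large sample swallows the clump and lands near the true 25% blue.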

We saw above that it's *possible* to have a sample that's way too
small to accurately represent the population it's taken from. *However*,
it is sometimes the case that a population is **structured**
in such a way that even a small sample can be perfectly representative, *if*
it's taken the right way. A population is not always arranged as a chaotic
mixture of individuals. Some populations are arranged in such a manner
that we can take a very small sample with absolute confidence that the
result will perfectly represent the composition of the population. For
instance, consider the population of dots shown below. Imagine that we
know that the population is structured in the way shown, but we don't know
the colors of any of the rows. Now imagine we take the very, very, very,
very small sample of exactly four dots comprising the first dot in each of
the first four rows, as shown in the top left corner of the image. That's
a sample of four out of four thousand. That's one per thousand, which
means one tenth of one percent, or 0.001. Is that too small?

Our sample comes out 50 percent red, 25 percent blue and 25 percent
green. **Given that we know the structure of the population**,
what are the chances that the population is 50 percent red, 25 percent
blue and 25 percent green?

Therefore, the following argument is very __bad__. (Technically, it
commits what we call a **red herring** fallacy.)

It hasn't been proved that the dots in the picture above are 50% red, 25% blue and 25% green because the sample upon which that generalization is based is only 0.001 of the population, which is waaaaaaaay too small a sample.

The **key fact** here - the thing that makes this argument
*bad* - is that the population is completely structured in
alternating homogeneous rows of red, green, red and blue dots. It is this
highly organized structure that allows a minuscule sample of just four
dots to perfectly represent the composition of the whole population.
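This can be checked directly. The row and column counts below are assumptions chosen to match the description (homogeneous rows cycling red, green, red, blue), but given that structure, the four-dot sample is guaranteed to match the population exactly.

```python
from collections import Counter

# Assumed structured population: 40 homogeneous rows of 100 dots each,
# with row colors cycling red, green, red, blue (as in the figure).
pattern = ["red", "green", "red", "blue"]
rows = [[pattern[i % 4]] * 100 for i in range(40)]

population = [dot for row in rows for dot in row]
sample = [rows[i][0] for i in range(4)]  # first dot of each of the first 4 rows

def fractions(dots):
    counts = Counter(dots)
    return {c: counts[c] / len(dots) for c in ("red", "green", "blue")}

print("sample:    ", fractions(sample))
print("population:", fractions(population))  # identical: 50% red, 25% green, 25% blue
```

A sample of 4 out of 4,000 (0.1%) is perfectly representative here, because the structure, not the sample size, does the work.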

How big a sample do you need to tell the composition of this population? Will four dots do? It will if each of those four is the

There are two lessons here. The first is that even an

Another way to get an accurate result with a very small sample is if a population is

Try to find a four-square group, or a contiguous line of four dots that

Now imagine blindly picking dots from

Now imagine you work for a petroleum company. You check the composition of oil products so the company can decide how each tanker load will be processed. Your company's tankers contain a pumping system that circulates the oil between all the tanker's oil-carrying compartments. All the oil is moved, and turbulence from the pumping process mixes the oil products so thoroughly that every centiliter in that tanker is absolutely identical to every other centiliter in that tanker. Given that a liter is one hundred centiliters, would one liter be a big enough sample to test the composition of the oil mixture in a tanker holding a billion liters of oil products?
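The tanker example turns on a general fact about random sampling from well-mixed populations: the accuracy of a sample proportion depends on the absolute sample size, not on the fraction of the population sampled. A quick check with the textbook standard-error formula (including the finite-population correction) shows the population size barely matters; the proportion and sample size below are invented for illustration.

```python
import math

def stderr(p, n, N):
    """Standard error of a sample proportion: a simple random sample
    of n units from a population of N units with true proportion p."""
    fpc = math.sqrt((N - n) / (N - 1))  # finite-population correction
    return math.sqrt(p * (1 - p) / n) * fpc

# Accuracy barely changes as the population grows from ten thousand
# units to a billion, as long as the sample stays at n = 100.
p, n = 0.3, 100
for N in (10_000, 1_000_000, 10**9):
    print(f"N = {N:>13,}: standard error = {stderr(p, n, N):.4f}")
```

And in the limiting case the tanker describes, where mixing is perfect and every centiliter is identical, even a single centiliter would be a perfectly representative sample.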

The point here is that small sample size, by itself, does not make a generalization bad. What matters is whether the population is mixed well enough for a sample of that size to be representative.

If 1% can be an adequate sample, 50% can be inadequate. Imagine that Noah was an educational administrator who had to rely on state grants for his funding. God issues a grant that will allow Noah to collect two of every animal, but Noah's immediate superiors insist he spend half of God's money on computers. Thinking outside the box, Noah adapts to the situation by only including

When we're worrying about sample size for a perfectly random sampling method, it is sometimes useful to talk about variables and their values. Consider Noah's Ark, but this time without any middle management between God and Noah. The animals march on board two-by-two, one couple of each kind. In this situation, sex and species can both be considered variables, each with its own characteristic range of values. Sex is a variable with only two values, and thus a sampling argument concerning the sexes of the animals would only need a fairly small random sample. Species, on the other hand, is a variable with thousands of possible values. Given that Noah's Ark contains only two of each kind of animal, a sampling argument concerning the distribution of species on the Ark would need a sample size of considerably more than fifty percent, if it was based on a truly random sample. (A non-random sample could do it accurately at only fifty percent, if the sample was chosen in the right way.)
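The sex-versus-species contrast can be simulated. The species count and sample sizes below are hypothetical (1,000 species rather than "thousands", to keep the sketch small), but the asymmetry is the same: a variable with two values is pinned down by a tiny random sample, while a variable with a thousand values, each held by only two individuals, defeats even a 50% random sample.

```python
import random

rng = random.Random(1)

# Hypothetical ark: 1,000 species, one male and one female of each.
animals = [(species, sex) for species in range(1000) for sex in ("M", "F")]

# Sex has only two values: a 5% random sample already estimates it well.
small_sample = rng.sample(animals, 100)
male_fraction = sum(1 for _, sex in small_sample if sex == "M") / 100

# Species has 1,000 values, two individuals each: a 50% random sample
# still misses many species entirely.
half_sample = rng.sample(animals, 1000)
species_seen = len({species for species, _ in half_sample}) / 1000

print(f"male fraction in 5% sample: {male_fraction:.2f}")
print(f"species seen in 50% sample: {species_seen:.0%}")
```

Roughly a quarter of the species never show up in the half-ark sample, even though the sex ratio estimate from a far smaller sample is already close to 50%.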

How big is big enough? Firstly, the issue of whether a particular sample is big enough cannot be settled by looking at the sample alone; it depends on facts about the population.

The minimum necessary sample size depends on the *number*
of different relevant properties individuals can have, and on the *degree*
of mixing in the population. In the well-mixed population above, the
number of different relevant properties is four, because there are four
colors, and the population is perfectly mixed. If the number of
different properties was larger, or if the population was less well
mixed, minimum necessary sample size would be larger.

So, there are two different things to think about when you consider the proportional size of a sample to its population:

- How large is the sample *compared to the number of different relevant properties* individuals in the population can have?
- How is the population *structured*? Is it evenly or unevenly mixed? Are its members arranged in some way so that the given sample is sure to be *representative* of the given population?

Is the sample too old? Not if the sample was absolutely isolated! Remember, there's nothing in Earth's air that inertium can react with, so the sample can't change over time. Isolation prevents sample gases from escaping and gases from later atmospheres from getting in, so it can't change that way either. So, in this case, a three thousand year old sample is enough for a good generalization, provided that all the

Here's a real-life example. A little more than 4 billion years ago, the solar system was nothing but a widely spread out mass of gas and dust particles which was slowly but surely organizing itself into bigger and bigger clumps, many of which banged into each other to make larger clumps. Our Earth was one of those clumps. While the Earth was first forming, it was hot and mostly molten, so the heavier materials gravitated to the center of the clump and the lighter materials were forced up to the surface. The heaviest materials became the Earth's core. Just before the Earth finished forming, a really big clump smashed into it hard enough to kick some of that core material up to the surface on the other side of the Earth. Four billion years later, scientists found some of that material, figured out what it was, and used it to figure out the exact chemical composition of the Earth's core. Think about it. Not only is the few pounds of material they used a tiny, tiny sample relative to the total size of the Earth's core, that sample is 4 billion years old. However, the Earth's core has been subjected to enormous heat, pressure and mixing by convection, so it's extremely well mixed. Furthermore, there's no known substance that could turn into nickel-iron over any timescale, so we have good reason to think that the composition of that core has not changed in 4 billion years, and that the composition of the pieces of core material that the scientists used hadn't changed either. So in this case, a sample that's about as old as a sample can get on this planet turns out not to be too old!

Of course, saying "That generalization's no good because the sample's 4,000,000,000 years old!" is a red herring, because in this case we have no reason to think that either the population or the sample has changed since the sample was taken.

And, conversely, having a very very recent sample does not guarantee a logically compelling argument. Some populations change very rapidly. Think about trying to do a generalization about present computer use, or present cell phone use, or home recording equipment, based on data from 1950.

There are two things to think about when you consider the age of a sample:

- How quickly and in what ways do the things in this population and sample change over time? What forces bring about these changes, and how quickly or slowly do they operate?
- Given the known rate of change in this kind of
thing, was the sample taken recently enough that we have a reasonable
guarantee that the sample is still representative of the population?

People sometimes say that all samples have to be taken randomly, or they're no good. This isn't exactly true. There are circumstances where the population structure will make it possible for a small, nonrandom sample to be perfectly representative, provided that it is taken in the right way.

A generalization can only work if it uses a sampling method that is completely independent of the feature cited in its conclusion.

Some people call this sensitivity "bias." I don't like that terminology. For one thing, "bias" has more than one meaning, and not all its meanings have anything to do with the accuracy of generalizations. And a sampling method can be nonrandom without being sensitive to the feature being tested.

Let's go back to assessing the tipsiness of Hogmanaying Scots. Say we happen to know the names and addresses of three significant groups of Scots. We know the names and addresses of all Scottish accountants, all Scottish teetotalers, and all Scots who have been convicted of drunk driving at least three times. Say we examine every member of each group to see whether he or she got tipsy last Hogmanay. And say we got the following results.

1. 67% of all Scottish accountants got tipsy last Hogmanay.

2. 0% of all Scottish teetotalers got tipsy last Hogmanay.

3. 99% of all Scots who have been convicted of drunk driving at least three times got tipsy last Hogmanay.

These results can't all be representative of the whole Scottish population. At best, only one is right. So which of these figures is more reliable? The answer is, whichever one is least sensitive to the feature being tested. What do accountants have to do with tipsiness? Nothing that I can think of! But teetotalers are people who habitually abstain from alcoholic beverages. (Strange, but true.) So of course none of them got tipsy on Hogmanay. Are all Scots teetotalers? I don't think so! So that sample is definitely dependent on the feature being tested. On the other hand, habitual drunk drivers can be expected to drink more than regular Scots, so that sample is dependent too. (Notice that one of these dependencies pushes the result down while the other pushes it up.)

Key Fact 1. Being an accountant has nothing to do with getting tipsy. (Makes the sample *independent* of the feature.)

Key Fact 2. Teetotallers don't drink, and this is a question of drinking behavior. (Makes the sample *dependent* on the feature.)

Key Fact 3. Drunk drivers can be expected to be heavy drinkers, and this is a question of drinking behavior. (Makes the sample *dependent* on the feature.)
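The contrast between dependent and independent sampling methods can be simulated. Every rate below is invented for illustration: tipsiness is made to depend only on drinking habit, while being an accountant is made unrelated to either.

```python
import random

rng = random.Random(7)

# Invented population: tipsiness depends only on drinking habit;
# profession is an independent trait.
population = []
for _ in range(100_000):
    drinker = rng.random() < 0.8            # 80% drink at least sometimes
    tipsy = drinker and rng.random() < 0.9  # drinkers usually get tipsy
    accountant = rng.random() < 0.05        # unrelated to drinking
    population.append({"drinker": drinker, "tipsy": tipsy,
                       "accountant": accountant})

def tipsy_rate(group):
    return sum(p["tipsy"] for p in group) / len(group)

overall = tipsy_rate(population)
accountants = tipsy_rate([p for p in population if p["accountant"]])
teetotalers = tipsy_rate([p for p in population if not p["drinker"]])

print(f"whole population: {overall:.2f}")
print(f"accountants:      {accountants:.2f}")  # tracks the population rate
print(f"teetotalers:      {teetotalers:.2f}")  # 0.00: method depends on feature
```

The accountant sample, though completely nonrandom, lands right on the population rate, because the sampling method has nothing in common with the feature; the teetotaler sample is hopeless no matter how large it is.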

It can be very difficult to tell whether or not a sampling method is independent of the feature being tested. Consider the following three sampling methods.

1. Testing American reactions to the war in Iraq by mailing questionnaires to the membership of the American Pacifists Association.

2. Testing the distribution of blood types across the United States by taking blood samples from members of the Mayflower Society, a group which restricts its membership to people who have at least one direct ancestor that came over to America on the Mayflower.

3. Assessing the bodily proportions of 18th-century Americans by measuring antique clothes preserved by historical societies.

Obviously, the first sampling method is no good because (key fact) we would be taking our sample from a group that is already self-selected to be against any war. The second sampling method is also dependent because (key fact) the Mayflower passengers came from a very small region in Europe whereas the vast majority of other immigrants to the United States came from other regions and continents, and (other key fact) blood type is very highly correlated with ancestry. Finally, there is the (key) fact that until recently, good quality clothing (the kind that is likely to be preserved) tended to be reused as long as it could be made to fit new people. Larger clothing was easier to alter than smaller clothing, so it tended to be reused until it wore out. Smaller clothes tended to be put away in the hope that someone would come along who could use them, so smaller clothing is much more likely to have been preserved than larger clothing. Therefore, the third sampling method is also dependent.

There are two different things to think about when you consider the sampling method used by a particular sampling argument:

- What *particular* fraction of the population was picked out by this method? What are all the characteristics of this fraction?

- How are the characteristics of this *fraction*, the fraction that was picked out to be the sample, related to the *feature* of the sampling argument? Does the chosen fraction have something *in common* with the feature cited in the conclusion, so that the sampling method is fatally *dependent* on that feature? Or does that fraction have *nothing* in common with the feature, so that the sample is safely *independent* of the feature?

The concepts of randomness and independence are very difficult for some
people. There are those who think that nonrandomness is a kind of magic
bullet that automatically kills an argument. Seeing that some sample was
not taken randomly, they stop thinking and write things like "the sample
was not taken randomly, so the argument is no good." Don't do this. Don't
think that nonrandomness *magically* kills arguments. When you see
that a sample is nonrandom, start thinking about the relationship between
the sampling method and the feature cited in the conclusion. If they have
nothing in common, the method is independent of the feature, and the
nonrandomness does not matter.

Fallacies are specific things that can go wrong with arguments. I like to think of them as bad arguments that some people commonly mistake for logically compelling arguments. Here I will talk about those fallacies I think most relevant to sampling arguments. Some of them will also be important in other contexts, while others will only be important when we specifically discuss sampling. I will discuss six specific fallacies that I think are relevant here. They are Inadequate Sample, Obsolete Sample, Dependent Sample, Anecdotal Evidence, Begging the Question and Red Herring. The first four are fallacies specific to sampling arguments; the last two are general fallacies that we will see again and again as we go on.

**Inadequate sample** occurs when the population
clearly has **not** been shown to be so evenly mixed that a sample of this size can
be reasonably assumed to properly represent the population. (Remember that
1% can sometimes be big enough while 90% (or more) can sometimes be too
small.)

Imagine that 143 countries are represented on the moon. In that case, a ten-student sample will miss at least 133 of those countries. This means that a sample needs to be at least 143 students to have a hope of being adequate, and we would probably want about 300 to have anything like a reasonable sample. (Key fact: there are about 140 different countries.)

In 1843, 35% of all American families owned at least one buggy whip. That means that there's a 35% chance that there's a buggy whip in your house.

Considering that Americans almost completely stopped driving horse-drawn buggies once automobiles became widely available, information from when buggies were widely used is not going to represent present transportation related realities. (Key facts: Buggy whips are only needed by people who drive buggies, which are drawn by horses, and almost nobody uses horse-drawn transport nowadays.)

Did you know that they recently held a school assembly where they publicly interviewed 20 randomly chosen graduates of the school's Substance Control and Abuse Rejection Enterprise program, and 100% of those SCARE graduates reported that they've never used drugs?

Considering that drugs are illegal, and that a student who publicly admits to having tried drugs is going to be in a lot of trouble, it wouldn't be surprising if some or all of those students were lying. (Key fact: people tend to give answers that please the questioners, especially if the questioners have power over them.)

This too counts as a counter argument. If my analysis turns out to be bad, then it's a bad counter argument. But it's still a counter argument, whether it's good or not.

Handgun Control, Inc. faked statistics on gun violence. That proves all gun-control activists are liars.

That Mensa member tried to murder the people next door with thallium, and wrote snotty articles about it in the Mensa newsletter. That proves that all smart people are evil.

Of course America was as deeply involved in witch burning as Europe was. Didn't you hear about the Salem Witch Trials?

It's true that Aylin gives a very

By the way, did you notice that Keagan's argument relied on a claimed

I was once caught in a dispute with a person who obviously considered himself intelligent, educated and reasonable. At one point, this person made a sweeping and controversial generalization based on no evidence whatsoever. I asked him if he could back this up. His response was "let me validate with examples," by which he meant he was going to give me a series of cases each of which would be an instance of the very generalization he was supposed to be proving.

If you look carefully at Samantha's arguments, you can see two distinct logical errors. The first is, of course, that she is "supporting" her claim with a set of anecdotes that she chooses herself rather than with properly collected and interpreted independent evidence. The second is that she's not offering any support that doesn't already assume the truth of her claim; in choosing her examples, she is begging the question.

The fallacy of **red herring** is committed when an arguer brings up something that is salient but not relevant to the question at hand.

Apart from the fact that Red Herring is a *very* common fallacy,
I mention it here because people often attack sampling arguments on the
basis of sample age, sample size or sampling bias when these issues are
completely irrelevant to the strength of the argument. Therefore, an
arguer commits **red herring** if:

1. His criticism of a generalization is based on sample age when we have
*no* reason to think that either the population or the sample has
changed since the sample was taken.

2. His criticism of a generalization is based on sample size when we
have *no* reason to think that this particular sample is too small
for this particular population.

3. His criticism of a generalization is based on a bias in the sampling
method when we have *no* reason to think that *this*
particular bias has anything to do with the feature we're testing for.

Understanding Red Herring depends on understanding the concepts of "**salience**"
and "**relevance**."

1. In your own words, what is a "sampling argument?"

2. What kind of statements are sampling arguments generally used to support?

3. In your own words, what are "generalizations?"

4. What do sampling arguments always start with?

Suppose that someone argues that 78% of all Irish people are Catholics because he has surveyed Irish hurling players, who make up 9% of the Irish people, and found out that 78% of Irish hurling players are Catholics.

5. What is the premise in this argument?

6. What is the conclusion in this argument?

7. What is the population in this example?

8. What is the sample in this example?

9. What is the feature in this example?

10. What is the fact (or evidence) in this example?

11. If we

12. If we

13. Does the fact that a sample is very small automatically make a generalization bad?

14. Does the fact that a sample is very old automatically make a generalization bad?

15. Does the fact that a sample is nonrandom automatically make a generalization bad?

16. In your own words, what is "population structure?"

17. In your own words, what is the first thing you think about when you consider sample size?

18. In your own words, what is the second thing you think about when you consider sample size?

19. In your own words, what is the first thing you think about when you consider sample age?

20. In your own words, what is the second thing you think about when you consider sample age?

21. In your own words, what is the first thing to think about when you consider the randomness or nonrandomness of a sample?

22. In your own words, what is the second thing to think about when you consider the randomness or nonrandomness of a sample?

23. In your own words, what is a "key fact?"

24. In your own words, define "inadequate sample."

25. In your own words, define "obsolete sample."

26. In your own words, define "dependent sample."

27. In your own words, define "anecdotal evidence."

28. In your own words, define "red herring."

29. What is "salience?"

30. What is "relevance?"

31. In your own words, define "begging the question."

32. In your own words, define "validation by examples."

33. How is "validation by examples" related to the other fallacies in this chapter?

34. What are the three different ways an arguer can commit red herring in criticizing sampling arguments?

2. Sampling arguments are generally used to support generalizations.

3. "Generalizations" are general statements that make claims about all, most, or some proportion of the members of a population.

4. Sampling arguments always start with a sample taken from some larger population.

5. The premise is "78% of Irish hurling players are Catholics."

6. The conclusion is "78% of all Irish people are Catholics."

7. The population is Irish people.

8. The sample is Irish hurling players.

9. The feature is the proportion of Catholics.

10. The fact is "78% of Irish hurling players are Catholics." (This time it's the same as the premise.)

11. If we

12. If we

13. A sample being small does not automatically make a generalization bad; whether it's big enough depends on the structure of the population.

14. A sample being old does not automatically make a generalization bad; it depends on how quickly the population and sample change.

15. A sample being nonrandom does not automatically make a generalization bad; it depends on whether the sampling method is independent of the feature.

16. "Population structure" is the way different parts of a population are arranged relative to each other.

17. How large is the sample compared to the number of different relevant properties individuals in the population can have?

18. How is the population structured? Is it evenly or unevenly mixed?

19. How quickly and in what ways do the things in this population and sample change over time?

20. Given this rate of change, do we have a reasonable guarantee that the sample is still representative of the population?

21. What are all the characteristics of the particular fraction of the population picked out by this method?

22. How are the characteristics of this fraction related to the feature of the sampling argument?

23. A "key fact" is a fact about the sample, the population, or the sampling method that determines whether or not the sampling argument works.

24. "Inadequate sample" is when a population has not been shown to be mixed evenly enough for a sample of that size to represent it.

25. "Obsolete sample" is when a population (or sample) may have changed since the sample was taken, so that the sample no longer represents the population.

26. "Dependent sample" is when the sampling method is somehow related to the feature.

27. "Anecdotal evidence" is when people talk about a few selected cases rather than giving a properly conducted sample.

28. "Red herring" is when someone talks about something that is salient but not relevant.

29. "Salience" is when something sticks out and grabs our attention, whether or not it actually matters.

30. "Relevance" is when something actually matters, even if it doesn't immediately seem important.

31. "Begging the question" is when someone assumes the very thing he is supposed to be proving.

32. "Validation by examples" is when someone gives a set of anecdotes and just assumes that they establish his generalization.

33. "Validation by examples" is a combination of begging the question and anecdotal evidence.

34. An arguer can commit red herring by saying a sample is too old when it isn't, saying a sample is too small when it isn't, or saying a sample is too nonrandom when that doesn't matter.

Copyright © 2013 by Martin C. Young