Sampling Methods and Bias with Surveys: Crash Course Statistics #10

Hi, I’m Adriene Hill and welcome back to Crash Course Statistics. In our last episode we talked about how we use experiments to imitate having two parallel universes to test things. But sometimes you can’t do certain experiments without becoming an all-powerful and evil dictator, and since it’s statistically unlikely that any of you are evil dictators, today we’ll explore the methods you can use instead.

Like we mentioned at the beginning of the series, you’re not always able to answer the questions you really want to answer using statistics. For example, it would be great to experimentally test whether getting married increases your lifespan, but you can’t randomly assign some people to be married and force another group to be single. Not only would that be difficult to enforce, it would also be pretty unethical, though I suppose you being evil takes care of that particular concern. Similarly, we can’t assign someone to be a twin, or a Democrat, or a smoker. But that doesn’t mean we should just give up and stop trying to find out more about these topics. Not at all. Instead we just need a different method to collect data. Enter Non-Experimental methods.

[INTRO]
One of the most common non-experimental methods is the survey. From user experience surveys on websites, to political polls, to health questionnaires at the doctor’s office, you’ve probably taken hundreds of surveys in your lifetime. There are two things that can make or break a survey: the questions, and who the researcher gives the questions to.

The goal of a survey is to get specific information. Say you’re walking your dog in a local park, and someone approaches you and asks you to take a survey on local businesses in your town. When you look at the questions, you notice that none of them are about local businesses; instead you find yourself answering questions about your politics and religious beliefs. Unless the surveyor was lying to you about their purposes, this is not a very good survey… It’s also not a very good lie.
A survey should measure what it claims to measure. It might seem obvious that having only unrelated questions on your survey is problematic, but there are even more subtle ways a question can be biased. Let’s take a look at a few questions from a health survey you might take at a doctor’s office.

The first question asks you how often you exercise: never, less than 30 minutes a week, or 30 minutes a day. So what do you answer if you exercise for half an hour twice a week? Or if you’re on the swim team and exercise for at least an hour a day? And does walking count as exercise? Multiple-choice questions that don’t offer all possible options and/or an “Other” option can cause respondents to either skip the question or feel forced to choose an answer that isn’t accurate. Claims made using these questions aren’t as strong as they could be if people were offered a full range of choices.
The next question asks you to answer yes or no: “I don’t smoke because I know it’s damaging to my health.” This is a leading question, since the wording leads you toward the, quote, “desired” answer. The effect is especially strong when a question deals with sensitive issues like smoking, politics, or religion. People answering the questions want to be seen in a positive light, and so they tend to give the answer they think is “appropriate”. While having people fill surveys out anonymously by themselves can help, it can sometimes be the case that respondents don’t want to admit things, even to themselves, that are socially undesirable.

In general terms, good survey questions are worded in a neutral way, such as asking “how often do you exercise” or “describe your smoking habits”, instead of using wording or options that push survey takers in a certain direction. And while your doctor wouldn’t… or shouldn’t… do this… sometimes groups purposely use biased questions in their surveys to get the results that they want. Apparently, back in 1972, Virginia Slims conducted a poll asking respondents if they would agree with the statement: “There won’t be a woman President of the United States for a long time and that’s probably just as well.” Not a well-written question.

Biased questions can be more subtle… and can lead to skewed reports of very serious things like sexual assault or mental health conditions. It’s important to always look for biased questions in surveys, especially when the people giving the survey stand to benefit from a certain response.
Even when researchers have created a non-biased survey, they still need to get it into the right hands. Ideally, a survey should go to a random sample of the population that they’re interested in. Usually this means using a random number generator to pick who gets the survey. We do Simple Random Sampling so that there’s no pattern or system for selecting respondents, and each respondent has an equal chance of being selected. For example, telephone surveys often use Random Digit Dialing, which selects 7 random digits and dials them. When someone picks up, they’re asked to take a survey.
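To make those two ideas concrete, here’s a minimal Python sketch. The population list and the seven-digit number format are invented for illustration; real Random Digit Dialing systems also have to handle area codes, invalid exchanges, and do-not-call rules.

```python
import random

# Simple Random Sampling: every member of the (hypothetical) population
# has an equal chance of selection, with no pattern or system.
population = [f"person_{i}" for i in range(10_000)]
respondents = random.sample(population, k=500)  # 500 people, no repeats

# Random Digit Dialing: generate a random 7-digit local number to call.
def random_digit_dial():
    return "".join(str(random.randint(0, 9)) for _ in range(7))

print(random_digit_dial())  # e.g. "4921307"
```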
But here’s where we hit our first issue. If people aren’t forced to respond to the survey, we might experience something called Non-Response Bias, in which the people who are most likely to complete a survey are systematically different from those who don’t. For example, people with non-traditional working schedules, like retirees, stay-at-home parents, or people who work from home, might be more likely to answer a middle-of-the-day phone survey. This is a huge problem if those groups are different from the population as a whole. If your survey was on health insurance plans or political opinions, it’s likely that these three groups would have different opinions from the population, but they’d represent the majority of survey responses, which means your data won’t represent the total population very well.

This is also related to Voluntary Response Bias, in which people who choose to respond to voluntary surveys they see on Facebook… or Twitter… are people who, again, are different from the broad population. This is especially true with things like customer service surveys. People who respond tend to have either very positive or very negative opinions. See the comment section below. The majority of customers with an average experience tend not to respond, because the service wasn’t noteworthy. Wait. Does that mean I’m not noteworthy?

Another source of bias is just plain underrepresentation. If a group of interest is a minority in the population, random sampling paired with response biases might mean that that minority isn’t represented at all in the sample. Let’s say there’s a city where 5% of the population is single mothers; it’s entirely possible that the sample will contain no single moms.
To overcome these issues, we have a couple of options. We could weight people’s responses so that they match the population, like counting the few single mothers who do respond multiple times so that they count for 5% of the total sample. But this can be problematic for the same reasons that response bias is problematic. If the few single mothers who respond don’t represent all single mothers, our data is still biased. In a 2016 LA Times/USC political tracking poll, a 19-year-old black man was one of 3,000 panelists who were interviewed week after week about the upcoming presidential election. Because he was a member of more than one group that was underrepresented in this poll, his response was weighted 30 times more than that of the average respondent. According to the New York Times, his responses alone boosted his candidate’s margins by an entire percentage point.
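Here’s a small sketch of how that kind of weighting works, using made-up numbers for the single-mothers example; nothing here reproduces the actual LA Times/USC methodology.

```python
# Single mothers are 5% of the city, but only 5 of our 1,000
# (hypothetical) respondents. Weight them up to their population share.
population_share = 0.05
sample_share = 5 / 1000                      # 0.5% of the sample
weight = population_share / sample_share     # each response counts 10x
print(weight)                                # 10.0

# A weighted estimate of some opinion, with invented group averages:
support_single_mothers = 0.80
support_everyone_else = 0.40
estimate = (population_share * support_single_mothers
            + (1 - population_share) * support_everyone_else)
print(estimate)                              # 0.42
```

Notice how much the handful of upweighted respondents can move the estimate; that’s why one heavily weighted panelist was able to shift a poll’s margins by a full percentage point.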
Stratified Random Sampling is another option. It splits the population into groups of interest and randomly selects people from each of the strata, so that each group in the overall sample is represented appropriately. Researchers have used stratified sampling to study differences in the way same-sex and different-sex couples parent their kids. They randomly select people from the same-sex parenting group and… randomly select people from a different-sex group of parents, to make sure that both are well represented in the sample.
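A minimal Python sketch of stratified sampling, with hypothetical group labels and sizes:

```python
import random

# Split the population into strata, then sample within each one.
strata = {
    "same_sex_parents":      [f"ss_parent_{i}" for i in range(500)],
    "different_sex_parents": [f"ds_parent_{i}" for i in range(9500)],
}

# Taking 50 from EACH stratum guarantees the smaller group shows up;
# a simple random sample of 100 from the whole population might miss it.
sample = {name: random.sample(members, k=50)
          for name, members in strata.items()}
```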
Another issue is that getting surveys to people can be expensive. If a cereal company wants to see how families react to their new cereal, it would be costly to send some cereal to a random sample of all families in the country. Instead they use Cluster Sampling, which creates clusters (not Honey Nut Clusters) that are naturally occurring, like schools or cities, and randomly selects a few clusters to survey, instead of randomly selecting individuals. For this to work, clusters cannot be systematically different from the population as a whole, and they should represent all groups about equally.
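A minimal sketch of cluster sampling, assuming made-up schools as the naturally occurring clusters:

```python
import random

# 200 hypothetical schools, each a cluster of 30 families.
clusters = {f"school_{i}": [f"family_{i}_{j}" for j in range(30)]
            for i in range(200)}

# Randomly pick whole clusters, then survey everyone inside them.
chosen_schools = random.sample(list(clusters), k=5)
respondents = [family for school in chosen_schools
               for family in clusters[school]]
print(len(respondents))  # 150 families reached via only 5 sites
```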
Issues can also arise when the population being surveyed is very small or difficult to reach, like children with rare genetic disorders, or people addicted to certain drugs. In this case, surveyors may choose not to use randomness at all, and instead use Snowball Sampling. That’s when current respondents are asked to help recruit people they know from the population of interest… since people tend to know others in their communities and can help researchers get more responses.
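Here’s a toy sketch of how snowball recruitment proceeds, over an invented referral network:

```python
# Start from one known member of a hard-to-reach population; each wave
# of respondents refers people they know. `contacts` is made up.
contacts = {
    "ana": ["ben", "cal"], "ben": ["dia"], "cal": ["dia", "eve"],
    "dia": [], "eve": ["fay"], "fay": [],
}

recruited, wave = {"ana"}, ["ana"]            # seed respondent
while wave:
    referrals = {r for person in wave for r in contacts[person]}
    wave = list(referrals - recruited)        # only genuinely new people
    recruited |= referrals

print(recruited)  # all six people in the network get reached
```

Because recruitment follows social ties rather than random selection, the resulting sample isn’t random; it inherits whatever structure the referral network has.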
And note that these sampling techniques can be, and are, used in experiments as well as surveys.

There are also non-experimental data collection methods like a Census. A Census is a survey that samples an ENTIRE population. The United States conducts a Census every 10 years, with the next one scheduled for 2020. It attempts to collect data from every. single. resident. of the United States (even undocumented residents and homeless residents). As you can imagine, this is hard, and it is not without error. In medieval Europe, William I of England conducted a census in order to properly tax the people he had conquered. In fact, a lot of rulers tended to use censuses to know just how much money they should be demanding. Until the widespread availability of computers, US census data took almost 10 years to collect and analyze, meaning that the data from the last census wasn’t even available until right before the next census. The length of time it took to complete the census is part of the reason we even have computers… check out our CompSci series for more on that.
So why collect census data, instead of just sampling the population? In the US, the Census could cost more than 15 billion dollars in 2020. There are a lot of reasons. The Constitution says we have to, but also, the census provides the truest measure of the population we can get. It minimizes sampling error. It also functions as a benchmark for future studies. And a census can give researchers really specific information about small groups of the population: information that might be hard to gather with regular sampling methods.

Doing statistics on Census data is different, because most statistical inference aims to take a small sample and use it to make guesses about the population. But with a census we already have data from the entire population; we don’t need to guess whether there are differences, we can just see them. Analysis of Census data is usually more concerned with whether the differences we see are large enough to make a difference in everyday life, rather than guessing IF there is a relationship.

The census, as we said, can take years to complete, and entire countries to fund. That doesn’t discount the value of sampling.
But we should be cautious… Badly worded polls, fake polls, and biased polls are common. So are the results of those polls. The statistics-friendly website FiveThirtyEight put together a great list of advice on how not to fall for a fake poll. Among its advice: ask yourself if it seems professional. Check to see who conducted the poll, and whether you trust them. See how the poll was conducted. Check out the questions they asked… and who they asked. If it seems fishy, it probably is fishy.

That said, well-done surveys are essential. They allow us to get information without all the trouble of doing an experiment, and since they’re comparatively easy, they’re popular ways for businesses, countries, and even YouTube channels to collect information. In fact, Crash Course Statistics has its own survey! The link is in the description. And it takes way less time than the Nerdfighteria one. I promise. Thanks for watching. I’ll see you next time.

100 thoughts on “Sampling Methods and Bias with Surveys: Crash Course Statistics #10”

  1. Hey all, fill out the official Crash Course Statistics Survey: https://bit.ly/2J1zimn And if you don't want to, help us fight non-response bias by filling it out anyway! We'll be analyzing this data in future episodes. (All individual data will be kept anonymous.) – brandon

  2. I was asked to take part in a survey where they compared a political candidate to a clown. Definitely a leading poll.

  3. Nicely done video. Data collection is NOT in the best interest of the common citizen. Uphold the Constitution and keep America free.

  5. This is the video that more people need to see. No more dumb, biased Muricens who listen to fake news and a fake president. XD

  6. Snowball sampling reminded me of a joke!
    What's the difference between a Snowman and a Snowwoman?
    -Snowballs!

  8. Regarding 538…. I want them to do a story "Why can't 538 turn a profit?" ESPN is supposed to be trying to shop them.

  9. I just want to say that, without fail, each time I click on one of these statistics videos I get that Google ad stating that teams wearing blue are more likely to win. In fact, Google has several versions of this ad and each states a statistics based fact that seems like it was misinterpreted from a data set… right before a video about interpreting data.
    It's too meta for me.

  10. My old employer used to require employees fill out employee satisfaction surveys every year, and the questions were always super leading; in some cases the questions didn't even give an option to show dissent.

  11. It's like that "4 out of 5 dentists recommend [brand]" – The original question was "Which do you recommend, [brand] or not [brushing, flossing, using mouthwash, etc]?"

  12. You've mentioned random numbers a few times. Will one of the episodes cover 'random numbers are hard to get'?

  13. Hello Crashcourse, you are amazing! Awesome and mega cool. You are doing a very good thing!

    Have you thought about adding a few control questions, tasks and case studies? Granted, this is more work but this would make you even "pioneerier" 🙂 Maybe the community will help?

  14. There's another problem with "I don't smoke because I know it can be damaging to my health" – I would answer "no" to that, not because I smoke, but because I have other reasons for not smoking (I don't like the smell, and I know it's an expensive habit that's hard to give up)

  15. Thank you, a very good description of all possible limitations 😊 I know that this is a course in statistics, but it would also be interesting to see something on qualitative methods and how the two can be combined

  16. This video was remarkable in the fact that i have made a remark with regard to it. But, you know. Not too bad. Not like "woah!" But nice

  17. Also, if this is not too much to ask, can you put sources and references in the description? A textbook section or even some useful links are good.

  18. Voluntary response bias? Does this mean the only people who watch these videos are the ones that like these topics? Kinda discouraging.

  19. Adriene: “…and since it’s statistically unlikely that any of you are evil dictators…”
    Me: “aw… dang.” 🙁

  20. Can we get a Crash Course Evolution? One where the most common misconceptions about evolution (e.g., that it's completely random) are addressed?

  21. So what you're saying is that they're usually biased no matter what, so why do this in the first place if it's going to be biased?

  22. 538 are hacks and were proven during the 2016 election to be skewing their data both during the primaries and the election. "Does it look professional and is it from someone you trust?" are phrases designed to get you to hand over your reasoning capability to others for reasons not related to the issue at hand.
    The question is never the believability or professionalism of the person or outlet providing you the survey results; it is always the methodology and integrity of the survey itself.

  23. Surveys are data collection tools; they are not inherently experimental or non-experimental. In fact the most common way to collect data in an RCT is with surveys XD. As a rule of thumb, experimental methods are those where your treatment and control groups are randomly selected, quasi-experimental are those where they are not randomly selected, and non-experimental are those where you don't have a comparison group. Surveys can be used in all of them. Still, this series is a great idea, keep up the good work!

  24. You know I absolutely love that you have started a crash course on statistics and I watch every video over and over again, but where is the next???? If not anything else, at least mention the date because if you don't send videos every week people will lose interest

  25. A good method I use while pondering is to do analysis on subject matters that are unknown to many or most of the population. Give it a go if you like; you'll be surprised at how a lot of major functions in our society got their jump from even the most trivial of things that we ignore or deem common.

  26. The raccoon question was tough to answer realistically. I live in Australia; I've never seen a raccoon in my life and my likelihood of encountering one is virtually non-existent. That being said, I am Australian, and defeating armies of cute dangerous things is pretty commonplace here, so it's hard to judge.

  27. …you could do what astronomers do and look at marital vs lifespan universally, (In other-words 'think of the universe as your larger experiment and collect your results')—but then again you're talking greco-bacchanalia/roman-saturnalia-wedding-marriage designed for gladiators-and-not-so-gladiators to do battle and die-away at empirically decreed ages…

  28. Okay, hold it. Do you guys have a time machine to know the cost of the U.S. census in 2020??? If so, please send me the blueprints!

  29. I'm not sure how I feel about saying it is possible to write "non-biased questions". Humans seem to be biased whether they admit it or not, and they are writing the survey, no? For example, "describe your smoking habits" has an implicit cultural bias that smoking can be harmful, even if people assume the risk anyway. But it is certainly a better question than other possible ones. The best way I've seen is to ask more or less the same question different ways.

  30. Ugh, I had to ask a bunch of kids a survey question that was written by someone else. It was "you never take things that don't belong to you" and they had to rate how true it was, and it was the only question in the list written with a negative, so I think people were likely to answer it backwards. There were a lot of other problems with the survey, and I am annoyed that we weren't allowed to modify the survey for future participants as we noticed problems. The terribly-written questions were just not getting any helpful information.

  31. the pizza question was like WAY inaccurate cuz 1) I don't exactly count my pizza and 2) there are quarter slices and there are 1/8 slices of pizza. Which one?

  32. Thank you for the awesome crash course on stats! Going back to grad school and haven't taken stats before, yet a two-hundred-level stats class is part of the curriculum! You are saving me!!! Keep up the great work!

  33. Adriene Hill is really reinforcing some of the general attitudes (be wary of fake polls) encouraged by Crash Course Media Literacy.

  34. I did a few surveys where I just outright trolled, and TYT reminded me of one: after asking all this other information about Canada, it had "enemy" as an option for Canada's status. I gave all positive answers, but when "enemy" was there I was like, come on, if you put "enemy" I am obviously going to click it.

    It was great, I love trolling on surveys. It made Cenk laugh (he didn't know he read mine off, but it was great, and I forgot about doing it until a bit after that video, but in the comment section I did tell him that if I took that survey I would put "enemy", because it's too hard not to).

  35. Dear CrashCourse,
    I was sad to discover that there was not a CrashCourse Calculus series. Please make a Crash Course series! It would be beneficial to many Calculus students and mathematicians. Please make one!

  36. Adriene, You are not only noteworthy but a witty, funny and a very good presenter as well. I am in love with you after watching all these tutorials. I am gonna keep supporting this channel on patreon. Thank You!

  37. The survey asks how many raccoons could you fight off before they overtook you. But it didn't give an option for all answers (I learned that from this episode!). I wouldn't fight them off. They are my little buddies.

  38. Awesome job showing a man as a stay-at-home parent! Way to crush gender norms. Men should not be made to feel any worse than a woman for not being, or never having been, the breadwinner.

  39. Hi! I have asked repeatedly to be taken off the list!!!
    Now I get 3 or more a day. I SCREENSHOT them; I'm making a list, and so far I have 12 names! Just started a week ago!!
    I'm Disabled and have a young SON and can't afford to NOT answer the phone!!!
    I'd like to know who you shared my information with, and take me off your list????!!!???? Thank you 😊
    The phone company says if I do this and I have repeated calls they can help!

  40. Share the 3rd party Companies you share information with???
    China , India and or Africa !!! Why is this ?? Since your helping the Company?? Don’t they buy $$ your services??

  41. I think 3rd party sharing is the Biggest Problem!!! My research concludes this because my phone can be traced to everything I’m involved with!!! Because I’m a MOTHER and Disabled !!! I’m tired of the phone calls and say it everyday to the person that just lied to me when i ask them , whom I’m speaking with . Ive had 3 Karen’s this week !!!

  42. Hey I use this series in 2019 before my masters stats class to get an idea of what is going on. Thanks for the effort to make it all easy!

  43. Just as a thought, the census is actually not that good, in spite of the cost.

    Lots of humans, lots of human error, etc…

    The recent concerns about people being intimidated by questions on the census being used to punish them (responder bias) and issues like the Jedi Knight census phenomenon (intentional faulty data) are problems inherent to the design.

    This course may praise the census, but that’s not a universally held opinion, and parts of the census are conducted by sampling to much controversy.

  44. Your work explaining stats in a very simple format, with excellent examples understandable to everyone, is very much appreciated. Many times, for clarity, I revisit your channel. Keep up the good work, Ms Hill

  45. Kindly add something like this to the description…
    Types of bias in a survey: 1. Non-response bias 2. Voluntary response bias 3. Under-representation
