This is How Easy It Is to Lie With Statistics

00:18:54
https://www.youtube.com/watch?v=bVG2OQp6jEQ

Summary

TLDR: The video examines how statistics shape marketing, criminal justice, and public health. It recounts how Target's algorithm predicted customer pregnancies from shopping habits, and it covers famous court cases, notably those of Janet and Malcolm Collins and of Sally Clark, in which statistical evidence misled juries and led to convictions. It also discusses common pitfalls in interpreting statistics, including the prosecutor's fallacy, relative versus absolute risk, correlation versus causation, Simpson's paradox, and misleading data representation in media and advertising. Ultimately, it shows how hard it is to use statistics responsibly.

Takeaways

  • 📊 Statistics in marketing can boost sales significantly.
  • 👶 Target's algorithm predicted pregnancy based on shopping habits.
  • ⚖️ Misuse of statistics can lead to wrongful convictions.
  • 👩‍⚖️ Sally Clark's case exemplifies the dangers of statistical misuse.
  • 📈 Misrepresentation of data can mislead public perception.
  • 🔍 Correlation does not always mean causation.
  • ⚖️ The prosecutor's fallacy can distort the judicial process.
  • 🔣 Ethical implications arise in the use of statistics.
  • 📉 Statistics can change perceptions about health risks.
  • 📌 Data representation matters in media and advertising.

Timeline

  • 00:00:00 - 00:05:00

    Around 2002, Target asked statistician Andrew Pole to develop an algorithm that identifies pregnant customers from their shopping patterns. He found common behaviors such as increased purchases of lotion and vitamins, which let Target estimate due dates and send coupons at the right times, even when customers had not told the company they were pregnant. About a year later, a father complained that Target was mailing baby-product coupons to his high-school daughter, only to learn afterward that she was in fact pregnant: the algorithm had worked it out before he did.

  • 00:05:00 - 00:10:00

    The power of statistics in legal cases is illustrated with the case of Janet Collins and her husband. They were wrongfully convicted of robbery based on statistical evidence suggesting a low probability of finding an innocent couple matching witness descriptions. This approach misrepresented the true probabilities and led to a guilty verdict. Similarly, in the case of Sally Clark, the reliance on statistical probabilities of infant deaths being due to SIDS led to her conviction for murder, despite significant evidence undermining that assumption.

  • 00:10:00 - 00:18:54

    Statistics can mislead even when the numbers are true. A 2007 Colgate advertisement in the UK claimed that '80% of dentists recommend Colgate' but omitted that the surveyed dentists could name more than one brand. A 1995 UK warning said a birth control pill raised the risk of blood clots by 100% without conveying how rare the event was (about 1 in 7,000 women rising to about 2 in 7,000). Examples of correlation versus causation show how easily data can be misinterpreted, such as assuming that violent TV makes children violent when violent children may simply watch more violent TV. The segment also covers Simpson's paradox in 1970s Berkeley graduate admissions and bar charts whose baselines do not start at zero. In each case, the missing context distorts the truth.
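
The birth control example comes down to simple arithmetic, so a short sketch may help. It uses only the figures quoted in the video (about 1 in 7,000 versus about 2 in 7,000); the function name `describe_change` and its return keys are made up for this illustration.

```python
from fractions import Fraction


def describe_change(before: Fraction, after: Fraction) -> dict:
    """Return the absolute and relative change between two risks, in percent."""
    absolute = after - before               # change in the probability itself
    relative = (after - before) / before    # change as a share of the starting risk
    return {
        "absolute_pct_points": float(absolute * 100),
        "relative_pct": float(relative * 100),
    }


# Figures quoted in the video: about 1 in 7,000 women on the older pill
# developed a blood clot, versus about 2 in 7,000 on the newer pill.
pill = describe_change(Fraction(1, 7000), Fraction(2, 7000))
print(f"relative increase: {pill['relative_pct']:.0f}%")
print(f"absolute increase: {pill['absolute_pct_points']:.3f} percentage points")
```

Both numbers describe the same change: the relative increase is 100%, while the absolute increase is roughly 0.014 percentage points. Which one a headline reports decides how alarming it sounds.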

Video Q&A

  • How did Target identify pregnant customers?

    Target used statistical algorithms to analyze shopping patterns and uncover common behaviors indicative of pregnancy.

  • What is the significance of the case involving Sally Clark?

    Sally Clark was wrongfully convicted based on misleading statistics regarding the probability of two children dying from SIDS.

  • What is the prosecutor's fallacy?

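    It is the mistaken assumption that the probability of A given B equals the probability of B given A. In court, it means treating the small probability that an innocent person would match the evidence as if it were the probability that a person who matches the evidence is innocent.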

  • How can statistics be misrepresented in ads?

    Statistics can be selectively presented or framed in a way that misleads the audience, such as showing percentages without context.

  • What does 'correlation does not imply causation' mean?

    It means that just because two variables are correlated does not mean one causes the other.
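
The prosecutor's fallacy can be made concrete with a small calculation. The sketch below takes the 1 in 12 million figure from the video and applies Bayes' rule under two loudly stated assumptions that are not in the video: exactly one guilty couple exists and always matches the description, and every couple is equally likely to be the guilty one before the evidence is considered. The population sizes are hypothetical; the first is chosen only to reproduce the video's "ten matching couples" scenario.

```python
from fractions import Fraction


def p_innocent_given_match(p_match_given_innocent: Fraction,
                           innocent_couples: int) -> Fraction:
    """P(innocent | match) under the assumptions stated above.

    Assumes one guilty couple who matches with certainty and a uniform
    prior over all couples (illustrative assumptions, not from the video).
    """
    expected_innocent_matches = p_match_given_innocent * innocent_couples
    # Bayes' rule: innocent matches / (innocent matches + the one guilty match)
    return expected_innocent_matches / (expected_innocent_matches + 1)


p_match = Fraction(1, 12_000_000)  # P(match | innocent), as quoted in the video

# Hypothetical population giving nine innocent matching couples plus the guilty one.
print(p_innocent_given_match(p_match, 108_000_000))  # 9/10

# Hypothetical population of one million innocent couples.
print(p_innocent_given_match(p_match, 1_000_000))    # 1/13
```

Under these assumptions, even with only a million couples the chance that a matching couple is innocent is about 1 in 13, nowhere near 1 in 12 million.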

Subtitles
  • 00:00:00
    Back around 2002, Target came to a statistician with a question, the answer to which
  • 00:00:04
    could potentially make the company millions of dollars. They asked, "using only computers
  • 00:00:09
    can you determine which customers are pregnant even if they don't want us to know?" and
  • 00:00:15
    From then on statistician Andrew Pole was in search of an algorithm to do just that
  • 00:00:20
    What he did was analyze the shopping patterns of expectant mothers and notice some common behaviors like an increase in lotion purchases
  • 00:00:28
    Loading up on vitamins and more stuff that I know nothing about and he used this information
  • 00:00:33
    To not only determine which customers were likely pregnant
  • 00:00:36
    But what their expected due date was and after developing his mathematical model the statistician had a list of hundreds of thousands of women
  • 00:00:44
    who were likely pregnant along with their expected due date and what trimester they were in and
  • 00:00:49
    From then on Target could send coupons at just the right time over the next several months and even after the baby was born
  • 00:00:56
    Now, although Target was cautious about following secrecy laws, it still might turn women away
  • 00:01:01
    if all of a sudden they started getting coupons like cribs and diapers and other related items when they didn't in fact
  • 00:01:06
    Tell the company that they were pregnant
  • 00:01:08
    So what Target did was just sprinkle these items in along with some other unrelated products when coupons arrived so it would seem more natural
  • 00:01:16
    But about a year after creating this algorithm something happened though, and this is where it gets interesting
  • 00:01:21
    One day a man walked into a Minnesota Target demanding to see a manager
  • 00:01:25
    He was very angry
  • 00:01:26
    and apparently what had been going on was Target was sending coupons for things like diapers and
  • 00:01:32
    Cribs and other related items to this guy's high school daughter and he was very upset about this
  • 00:01:37
    He was saying things like are you guys trying to encourage her to get pregnant?
  • 00:01:40
    And the manager didn't really know what was going on
  • 00:01:43
    He of course apologized and a few days later the manager called the dad back to apologize again
  • 00:01:49
    But this time the dad wasn't so much angry but a little more embarrassed
  • 00:01:53
    I think you guys know where this is going. On the phone the dad said I in fact owe you an apology
  • 00:01:59
    There's been some things going on around here that I haven't been fully aware of and in fact
  • 00:02:03
    My daughter is pregnant and she's due in August
  • 00:02:06
    So yes this statistical algorithm figured out that this girl was pregnant before her dad even knew about it
  • 00:02:12
    That right there is the power of statistics and we're just getting started. In
  • 00:02:17
    1964 an elderly woman was walking home from grocery shopping when she was all of a sudden pushed to the ground and had her purse stolen
  • 00:02:24
    Now she was able to get a glimpse of the thief and saw a blonde woman in a ponytail who then fled the scene
  • 00:02:30
    Then there was also a man nearby who heard the screaming
  • 00:02:33
    And saw the woman run into a yellow car that was driven by a black man who had a beard and a mustache
  • 00:02:39
    And yes
  • 00:02:40
    This is all needed for the story by the way. A few days after the incident police ended up catching
  • 00:02:45
    Janet Collins and her husband Malcolm who matched all the descriptions given by the witnesses
  • 00:02:50
    They were then charged with the crime and put in front of a jury
  • 00:02:53
    now since most of the evidence that could be provided for this was just from the victim and the man who saw the event and
  • 00:02:59
    what they both witnessed they brought in a mathematician as well to help prove the guilt of this couple. This mathematician calculated the
  • 00:03:05
    Probability of just randomly selecting a couple that was innocent
  • 00:03:09
    But also happened to share all these characteristics that were observed by the witnesses. Based on data
  • 00:03:14
    The mathematician came up with these numbers and assuming independent events
  • 00:03:18
    We can multiply them all together to find the joint probability that they all happened to apply to an innocent couple
  • 00:03:24
    Turns out there was less than a 1 in 12 million chance that this random couple who just happen to fit all those descriptions
  • 00:03:32
    Was innocent, so the jury returned a guilty verdict
  • 00:03:35
    This is actually a very famous case in terms of using statistics in the courtroom. Another quick example
  • 00:03:41
    is that of Sally Clark who was found guilty of murdering her two infant sons back in the 90s. Her first son died suddenly in
  • 00:03:49
    1996 due to unknown causes so it was assumed it was a case of SIDS, or sudden infant death syndrome
  • 00:03:55
    But about a year later she gave birth to her second son
  • 00:03:58
    Who was then found dead 8 weeks after his birth again of unknown causes
  • 00:04:03
    So after this happened and it was reported, the police ended up arresting her and her husband on suspicion of murder
  • 00:04:09
    During the trial a pediatrician professor
  • 00:04:11
    Testified that the chance of two infants dying due to SIDS at around the same time relative to their birth
  • 00:04:17
    Was about 1 in 73 million and again one in 73 million is way beyond a reasonable doubt
  • 00:04:24
    so it was more likely this was an event of shaking or smothering or whatever and
  • 00:04:28
    Sally Clark was found guilty and sentenced to life in prison
  • 00:04:31
    So you can see statistics has a lot of power in our world whether it be advertising
  • 00:04:37
    Criminal cases and so on but what's also really powerful and way easier to do is lie
  • 00:04:43
    mislead and misinform
  • 00:04:44
    Using statistics and you don't even have to use wrong data to do this
  • 00:04:48
    I mean, I've already done that multiple times in this video. I'm going to talk about that soon
  • 00:04:53
    so yes this next part for all you people who comment on videos before watching the entire thing because there is more I'm going to
  • 00:05:00
    Say but let's start off light though. In
  • 00:05:02
    2007 in the UK an ad was released for Colgate that claimed the classic "80% of dentists recommend Colgate"
  • 00:05:09
    It wasn't long before the Advertising Standards Authority of the UK ordered that they abandon this claim because although it was true
  • 00:05:16
    They knew people would not really understand what it meant
  • 00:05:19
    The study that was done allowed dentists to answer with more than one toothpaste
  • 00:05:23
    So like dentist one might say I recommend Colgate, Crest, Oral-B
  • 00:05:28
    Dentist two might say Colgate, Crest, or Sensodyne and similar for dentists three, four, and five
  • 00:05:34
    In this scenario 80% of dentists do recommend Colgate. That is true.
  • 00:05:39
    But 100% of dentists recommend Crest in this hypothetical and 80% recommend Oral-B as well
  • 00:05:45
    All of these numbers are factual and you can make an advertisement with any of these claims
  • 00:05:49
    But again, we know people would not understand what they really meant
  • 00:05:53
    now for this next part I'm going to ask you guys a question. If
  • 00:05:56
    Let's say hypothetically the high school dropout rates of a certain country go from 5% one year to 10%,
  • 00:06:03
    Is that a 5% increase or 100% increase?
  • 00:06:07
    Because if you're at 5% and you add five you get to 10% obviously
  • 00:06:11
    But if you're making let's say $5 an hour and you get a 100% raise you'll be at $10 an hour
  • 00:06:18
    So which one of these is it and I'm sure many of you are saying that seems like a pointless question
  • 00:06:22
    Yes
  • 00:06:23
    You do add five to get to 10, but the physical amount of people who are dropping out would be increasing by 100%
  • 00:06:30
    Well in the spirit of this video
  • 00:06:31
    Let's ask something else. Which one of these paints a more accurate picture
  • 00:06:35
    Like if one of these was posted in the New York Times or on Forbes or whatever
  • 00:06:39
    Which one tells the people more about what's going on?
  • 00:06:42
    And I'm actually curious what you guys have to say about that because I think we're gonna hear different answers from different people
  • 00:06:47
    But for this next part, I think we're all going to agree
  • 00:06:49
    What if hypothetically the dropout rates are one in a million people and then the next year they go to two in a million people
  • 00:06:56
    So that's .0001% to .0002%, a difference of again .0001
  • 00:07:03
    But that's also a 100% increase in the physical amount of people dropping out
  • 00:07:08
    So which one of these two headlines do you think paints a better picture?
  • 00:07:12
    Well again, we might hear different answers
  • 00:07:14
    But I think we can agree the 100% makes it seem like a worse problem than it is
  • 00:07:19
    Like if five people in the whole nation are dropping out and the next year ten people do I
  • 00:07:24
    Wouldn't necessarily call that an epidemic just yet
  • 00:07:26
    Now using numbers like this in the misleading way is actually not hypothetical because it happened a few decades ago in the UK
  • 00:07:32
    But not with college dropout rates, but rather a birth control pill. In
  • 00:07:37
    1995 the UK Committee on safety of medicines issued a warning that a certain type of birth control pill
  • 00:07:42
    Increased the risk of life-threatening blood clots by 100%
  • 00:07:46
    What that actually meant was that with the older second-generation pill about 1 in 7,000 women developed a blood clot
  • 00:07:52
    Whereas with the new pill about 2 in 7,000 women developed a blood clot
  • 00:07:56
    So yes, the physical amount of women receiving a blood clot did go up by a hundred percent. That is true
  • 00:08:01
    But if we dig just a little deeper we see that with the older pill it was about .014%
  • 00:08:07
    Whereas with the new pill it was about .028%, which hardly seems worthy of a breaking news alert
  • 00:08:13
    But articles were posted about this misleading statistic and as a result naturally tens to hundreds of thousands of women stopped taking this birth control
  • 00:08:21
    pill
  • 00:08:21
    fast forward one year and that scare was blamed for
  • 00:08:25
    13,000 unwanted pregnancies many of which were teenage pregnancies... a lot of teenage pregnancy stories in this video... moving on
  • 00:08:32
    Do you guys actually know head lice is good for your health?
  • 00:08:36
    Seems pretty stupid
  • 00:08:37
    Right, but people actually thought this at one point and that brings us to the part of this video titled correlation or causation
  • 00:08:44
    or both
  • 00:08:45
    Remember, it's usually very easy to determine that two things are correlated from a statistical test but causation is a completely different thing
  • 00:08:52
    That isn't so easy to spot. Yet
  • 00:08:54
    People are very quick to assume that A causes B just because A is correlated to B
  • 00:08:59
    Sometimes the logic can be stupidly obvious like fast-moving wind turbines are positively correlated to fast wind. As one goes up
  • 00:09:06
    The other goes up, but does that mean that fast-moving wind turbines cause fast wind?
  • 00:09:11
    Well, obviously not it's the other way around
  • 00:09:14
    But in many cases it isn't this obvious like what if I said that kids who watch more violent TV shows are more violent themselves
  • 00:09:21
    Does this mean that those shows cause kids to be more violent?
  • 00:09:24
    I mean that could be possible and definitely would be an immediate thought for many people
  • 00:09:28
    But what if kids who are more violent just happen to watch more violent TV shows that also seems perfectly reasonable
  • 00:09:34
    So we can't just jump to conclusions too early
  • 00:09:37
    Even though that's what many people would probably do. Or in the Middle Ages Europeans
  • 00:09:41
    saw that people who had head lice were normally healthy, whereas people who were sick rarely ever had head lice. As a result
  • 00:09:47
    They assumed that lice would cause people to be more healthy when in reality head lice are very sensitive to body temperature
  • 00:09:53
    So if people had a fever or anything like that the head lice would find another host
  • 00:09:58
    Then on the subject we have the third cause fallacy where two correlated events
  • 00:10:03
    Actually, don't cause each other at all, but it's rather a third thing causing both. For example ice cream sales
  • 00:10:08
    Do not cause an increase in heat strokes nor the other way around
  • 00:10:11
    Even though they are correlated. Hot weather is instead the cause of both of them. Or for the past several decades
  • 00:10:18
    Atmospheric CO2 has increased along with obesity levels. So does one cause the other?
  • 00:10:22
    Well, no: richer populations tend to eat more and also produce more CO2
  • 00:10:28
    And sometimes it can just be really unclear what's causing what.
  • 00:10:31
    Like a while back they found that students who smoke cigarettes get lower grades and that could mean smoking causes lower grades
  • 00:10:38
    Or maybe it means that getting bad grades causes smoking... Maybe the added stress that comes along with lower grades
  • 00:10:45
    Increases the chance that a student will pick up that first cigarette. That also seems like a reasonable explanation
  • 00:10:51
    Or it could be that a variety of third factors are actually responsible for both
  • 00:10:56
    So even when looking at a statistical test with accurate numbers, it's pretty crazy how far you can still be from the truth.
  • 00:11:03
    Next we have a story that will probably have people making assumptions really early. In the 1970s
  • 00:11:08
    Someone noticed that Berkeley was accepting 44% of male applicants to their graduate school
  • 00:11:13
    But only 35% of female applicants. Now right there half the Internet's like well say no more...
  • 00:12:04
    That was an actual unedited clip of everyone on the Internet
  • 00:12:09
    Now these numbers that we saw are true
  • 00:12:12
    But very misleading. When you look at how male and female applicants applied to each program within the Graduate School, the assumed bias
  • 00:12:18
    Not only goes away but kind of flips. Look closely here in this row
  • 00:12:23
    You see there was a high acceptance rate for the program
  • 00:12:26
    In fact women had a much higher acceptance rate, but still overall it was high for everyone
  • 00:12:31
    however way more men applied to this one
  • 00:12:34
    Whereas more women applied to these programs down here with much lower acceptance rates
  • 00:12:39
    So since a higher percentage of women were applying to these programs with higher rejection rates
  • 00:12:43
    The overall acceptance of women would be lower guaranteed even though they are in fact slightly favored across a couple of departments
  • 00:12:50
    so either of these headlines could be published with the necessary stats to back them up and
  • 00:12:56
    All you gotta do is pick which one you want to use
  • 00:12:59
    toss
  • 00:12:59
    the other
  • 00:13:00
    Throw that into an article put it in bold right on the top, put the cleverly selected statistics down below it to back it up,
  • 00:13:07
    And you've got yourself a story
  • 00:13:10
    This here was an example of Simpson's paradox
  • 00:13:13
    Where looking at data as a whole tells a totally different story than grouping the data appropriately which I'm sure many of
  • 00:13:19
    You know, but I had to include it here
  • 00:13:21
    You guys remember the story of the blonde woman who robbed the elderly lady
  • 00:13:25
    Well, like I said, this is a famous case but not for the use of statistics
  • 00:13:30
    But rather the misuse of statistics in the courtroom. This was a classic example of the prosecutor's fallacy
  • 00:13:37
    now this fallacy comes up when people assume that the probability of A given B is the same as the probability of B given A
  • 00:13:44
    Which I'm sure many of you know is not usually true from this equation.
  • 00:13:48
    Like if I said behind this curtain is an animal with four legs, that's the given. What is the chance that it's a dog?
  • 00:13:56
    Well, you probably do some thinking like well, it could be a dog, it could be a cat, it could be a cheetah
  • 00:14:01
    It could be a lot of other things and if you had to come up with a number you might say one in a hundred
  • 00:14:05
    One in a thousand or whatever, but if instead I said behind this curtain is a dog
  • 00:14:10
    That's the given, what's the chance that it has four legs?
  • 00:14:14
    Well, that's almost a guarantee because most dogs have four legs
  • 00:14:18
    So you see switch the given and the question at hand and the probability can change by a lot
  • 00:14:24
    So now let's look at what I said earlier
  • 00:14:27
    Turns out there was less than a 1 in 12 million chance that this random couple who just happened to fit all those descriptions
  • 00:14:34
    Was innocent, so the jury returned a guilty verdict.
  • 00:14:38
    This here was wrong
  • 00:14:39
    The stats actually showed us that given an innocent couple the odds that they fit the descriptions was one in 12 million
  • 00:14:47
    But then I said what the jury had also assumed: that if you switch the given and the question at hand
  • 00:14:53
    The probability stays the same which we just saw can be very wrong
  • 00:14:58
    This left side should make sense like if I just grabbed a random couple out of a mall
  • 00:15:03
    That's the given there was a very small chance all of these would apply to them
  • 00:15:07
    But this is the false assumption that is the prosecutor's fallacy
  • 00:15:12
    We're told or given
  • 00:15:14
    Hey
  • 00:15:14
    Here's a couple that fits all those descriptions. If maybe ten people in the entire city fit all of those, then given a random one
  • 00:15:21
    There's a 1 out of 10 chance that they're guilty or a 9 in 10 chance of being innocent. Not one in 12 million.
  • 00:15:29
    And remember Sally Clark who was found guilty of murdering her two children
  • 00:15:33
    This is also a famous case of the misuse of statistics
  • 00:15:36
    It turns out bacterial tests had actually been withheld that would reveal more specific information than a simple multiplication of two probabilities
  • 00:15:43
    Which didn't tell the full story at all.
  • 00:15:46
    Like it assumed that the two events were independent of each other when genetic or environmental factors could have definitely been at play
  • 00:15:53
    Like I said, sally was found guilty and sentenced to life in prison
  • 00:15:56
    But she only served three years when the convictions were finally overturned in early 2003. Up until then though
  • 00:16:03
    Sally Clark was widely criticized in the press as a child murderer
  • 00:16:06
    And she was never able to recover mentally from the false accusation. A few years after her release
  • 00:16:11
    She developed psychiatric problems and died in her home from alcohol poisoning in 2007
  • 00:16:17
    I'm gonna repeat that for everyone who didn't follow it. A woman lost two of her children due to natural causes
  • 00:16:24
    was accused of murdering them, was put on trial and found guilty due to a misuse of
  • 00:16:29
    Statistics, spent 3 years in prison, and even after her release was not able to recover mentally and died
  • 00:16:36
    Just about four years later
  • 00:16:37
    If you guys don't find that story
  • 00:16:39
    Insane then I don't know what to tell you
  • 00:16:42
    And actually the result of this case prompted the Attorney General to order a review of hundreds of other similar cases
  • 00:16:48
    Now because I don't want that to be where we end this video
  • 00:16:50
    Let's look at one more classic misuse of Statistics
  • 00:16:54
    This one has to do with how data is represented and it often involves bar graphs that don't have zero as their baseline
  • 00:17:01
    For example Fox News once showed a chart detailing what would happen if the Bush tax cuts expired.
  • 00:17:07
    Do you guys see a problem?
  • 00:17:09
    Yeah
  • 00:17:09
    It starts at 34 percent at the bottom, making a not even 15 percent increase look like a few hundred percent. In
  • 00:17:16
    reality the chart should look like this
  • 00:17:19
    Or take the Terri Schiavo case that occurred about two decades ago
  • 00:17:23
    Which involved a debate of whether a feeding tube should be removed from a woman in an irreversible vegetative state
  • 00:17:29
    During that time CNN posted this graph detailing which political parties agreed with the courts
  • 00:17:34
    It seems like Democrats supported the decision significantly more but because the baseline is not zero
  • 00:17:40
    It appears way different than it should, this is what the actual graph would look like
  • 00:17:45
    Or in 2015 the White House published a tweet about the increase in students receiving high school diplomas with an extremely misleading graphic
  • 00:17:53
    They made around a ten percent increase look like nearly 200%. Or in music
  • 00:17:59
    There was a chart released showing views between top artists that made Drake look like he was ahead by a large margin
  • 00:18:04
    When in fact it was about a five percent lead
  • 00:18:08
    Now I'm guessing the comments on this video will be rather interesting
  • 00:18:11
    But remember these were cherry-picked events and it's not like everything I said paints the full picture either
  • 00:18:16
    I just find it interesting that these numbers can change the way we think about a person
  • 00:18:20
    They can peek into some of the most intimate moments of our lives based on our grocery list
  • 00:18:24
    They can make very trivial events seem very serious and vice versa
  • 00:18:27
    You don't even need to use wrong numbers for this
  • 00:18:30
    But hopefully this shows just how not cut and dry math and statistics can be in the real world outside of a school setting
  • 00:18:37
    Especially and with that I'm gonna end that video there if you guys enjoyed be sure to LIKE and subscribe
  • 00:18:41
    Don't forget to follow me on Twitter and join the Facebook group for updates on everything. Hit the bell if you're not being notified
  • 00:18:47
    Share comments and all those other YouTube boards and I will see you guys in the next video
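
The Berkeley admissions segment in the transcript above describes Simpson's paradox without giving per-program numbers, so a toy example may help. The counts below are hypothetical, chosen only for illustration, and are not the real Berkeley figures; the names `applications`, `rate`, and `overall_rate` are likewise made up for this sketch.

```python
# Hypothetical (applied, admitted) counts chosen for illustration only;
# these are NOT the real Berkeley figures.
applications = {
    "Program A (easy)": {"men": (800, 480), "women": (100, 70)},
    "Program B (hard)": {"men": (200, 40), "women": (900, 225)},
}


def rate(applied: int, admitted: int) -> float:
    """Acceptance rate for one group in one program."""
    return admitted / applied


def overall_rate(group: str) -> float:
    """Acceptance rate for a group pooled across all programs."""
    applied = sum(program[group][0] for program in applications.values())
    admitted = sum(program[group][1] for program in applications.values())
    return rate(applied, admitted)


# Per-program rates: women are ahead in both programs.
for name, groups in applications.items():
    print(f"{name}: men {rate(*groups['men']):.1%}, "
          f"women {rate(*groups['women']):.1%}")

# Pooled rates: men are ahead, because most women applied to the hard program.
print(f"Overall: men {overall_rate('men'):.1%}, "
      f"women {overall_rate('women'):.1%}")
```

With these made-up counts, women have the higher acceptance rate in each program (70% vs 60% and 25% vs 20%), yet the pooled rate is 52% for men and 29.5% for women, because most women applied to the program that rejects most applicants.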
Tags
  • statistics
  • Target
  • advertising
  • criminal justice
  • correlation
  • causation
  • misleading data
  • public health
  • Sally Clark
  • prosecutor's fallacy