00:00:00
Back around 2002, Target came to a statistician with a question whose answer
00:00:04
could potentially make the company millions of dollars. They asked, "Using only computers,
00:00:09
can you determine which customers are pregnant, even if they don't want us to know?" And
00:00:15
from then on, statistician Andrew Pole was in search of an algorithm to do just that.
00:00:20
What he did was analyze the shopping patterns of expectant mothers, and he noticed some common behaviors, like an increase in lotion purchases,
00:00:28
loading up on vitamins, and more stuff that I know nothing about. And he used this information
00:00:33
to not only determine which customers were likely pregnant,
00:00:36
but what their expected due date was. And after developing his mathematical model, the statistician had a list of hundreds of thousands of women
00:00:44
who were likely pregnant, along with their expected due dates and what trimester they were in. And
00:00:49
from then on, Target could send coupons at just the right time over the next several months, and even after the baby was born.
00:00:56
Now, although Target was cautious about following privacy laws, it still might turn women away
00:01:01
if all of a sudden they started getting coupons for things like cribs and diapers and other related items when they didn't, in fact,
00:01:06
tell the company that they were pregnant.
00:01:08
So what Target did was just sprinkle these items in along with some other unrelated products when the coupons arrived, so it would seem more natural.
00:01:16
But about a year after creating this algorithm, something happened, and this is where it gets interesting.
00:01:21
One day, a man walked into a Minnesota Target demanding to see a manager.
00:01:25
He was very angry,
00:01:26
and apparently what had been going on was Target was sending coupons for things like diapers and
00:01:32
cribs and other related items to this guy's high school daughter, and he was very upset about this.
00:01:37
He was saying things like, "Are you guys trying to encourage her to get pregnant?"
00:01:40
And the manager didn't really know what was going on.
00:01:43
He of course apologized, and a few days later the manager called the dad back to apologize again.
00:01:49
But this time the dad wasn't so much angry as a little embarrassed.
00:01:53
I think you guys know where this is going. On the phone, the dad said, "I, in fact, owe you an apology.
00:01:59
There have been some things going on around here that I haven't been fully aware of, and in fact,
00:02:03
my daughter is pregnant, and she's due in August."
00:02:06
So yes, this statistical algorithm figured out that this girl was pregnant before her dad even knew about it.
00:02:12
That right there is the power of statistics, and we're just getting started. In
00:02:17
1964, an elderly woman was walking home from grocery shopping when she was suddenly pushed to the ground and had her purse stolen.
00:02:24
Now, she was able to get a glimpse of the thief and saw a blonde woman with a ponytail, who then fled the scene.
00:02:30
Then there was also a man nearby who heard the screaming
00:02:33
and saw the woman run into a yellow car driven by a Black man who had a beard and a mustache.
00:02:39
And yes,
00:02:40
this is all needed for the story, by the way. A few days after the incident, police ended up catching
00:02:45
Janet Collins and her husband Malcolm, who matched all the descriptions given by the witnesses.
00:02:50
They were then charged with the crime and put in front of a jury.
00:02:53
Now, since most of the evidence that could be provided was just from the victim and the man who saw the event, and
00:02:59
what they both witnessed, they brought in a mathematician as well to help prove the guilt of this couple. This mathematician calculated the
00:03:05
probability of just randomly selecting a couple that was innocent
00:03:09
but also happened to share all these characteristics observed by the witnesses. Based on data,
00:03:14
the mathematician came up with these numbers, and assuming independent events,
00:03:18
we can multiply them all together to find the joint probability that they all happened to apply to an innocent couple.
00:03:24
It turns out there was less than a 1 in 12 million chance that a random couple who just happened to fit all those descriptions
00:03:32
was innocent, so the jury returned a guilty verdict.
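The multiplication described here can be sketched in a few lines of Python. The individual trait probabilities below are the figures commonly reported from the actual Collins trial, but treat them as illustrative; the key point is that multiplying them is only valid if the traits are independent, which they clearly aren't (a bearded man is quite likely to also have a mustache, for instance).

```python
# Trait probabilities commonly attributed to the Collins trial
# (textbook reconstruction, not figures stated in this video).
characteristics = {
    "yellow car": 1 / 10,
    "man with mustache": 1 / 4,
    "woman with ponytail": 1 / 10,
    "blonde woman": 1 / 3,
    "Black man with beard": 1 / 10,
    "interracial couple in a car": 1 / 1000,
}

# Assuming (dubiously) that the traits are independent, the joint
# probability is just the product of the individual probabilities.
joint = 1.0
for p in characteristics.values():
    joint *= p

print(f"Joint probability: 1 in {round(1 / joint):,}")  # 1 in 12,000,000
```

Dropping the independence assumption would require the conditional probabilities of each trait given the others, which the prosecution never had.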
00:03:35
This is actually a very famous case in terms of using statistics in the courtroom. Another quick example
00:03:41
is that of Sally Clark, who was found guilty of murdering her two infant sons back in the 90s. Her first son died suddenly in
00:03:49
1996 of unknown causes, so it was assumed to be a case of SIDS, or sudden infant death syndrome.
00:03:55
But about a year later she gave birth to her second son,
00:03:58
who was then found dead eight weeks after his birth, again of unknown causes.
00:04:03
So after this happened and was reported, the police ended up arresting her and her husband on suspicion of murder.
00:04:09
During the trial, a professor of pediatrics
00:04:11
testified that the chance of two infants dying of SIDS at around the same time relative to their births
00:04:17
was about 1 in 73 million, and again, 1 in 73 million is way beyond a reasonable doubt,
00:04:24
so it was more likely this was a case of shaking or smothering or whatever, and
00:04:28
Sally Clark was found guilty and sentenced to life in prison.
00:04:31
So you can see statistics has a lot of power in our world, whether it be advertising,
00:04:37
criminal cases, and so on. But what's also really powerful, and way easier to do, is lie,
00:04:43
mislead, and misinform
00:04:44
using statistics, and you don't even have to use wrong data to do this.
00:04:48
I mean, I've already done that multiple times in this video. I'm going to talk about that soon.
00:04:53
So yes, this next part is for all you people who comment on videos before watching the entire thing, because there is more I'm going to
00:05:00
say. But let's start off light. In
00:05:02
2007, in the UK, an ad was released for Colgate that claimed the classic "80% of dentists recommend Colgate."
00:05:09
It wasn't long before the Advertising Standards Authority of the UK ordered them to abandon this claim, because although it was true,
00:05:16
they knew people would not really understand what it meant.
00:05:19
The study that was done allowed dentists to answer with more than one toothpaste.
00:05:23
So, like, dentist one might say, "I recommend Colgate, Crest, or Oral-B."
00:05:28
Dentist two might say Colgate, Crest, or Sensodyne, and similarly for dentists three, four, and five.
00:05:34
In this scenario, 80% of dentists do recommend Colgate. That is true.
00:05:39
But 100% of dentists recommend Crest in this hypothetical, and 80% recommend Oral-B as well.
00:05:45
All of these numbers are factual, and you could make an advertisement with any of these claims.
00:05:49
But again, we know people would not understand what they really meant.
00:05:53
Now, for this next part, I'm going to ask you guys a question.
00:05:56
Let's say, hypothetically, the high school dropout rate of a certain country goes from 5% one year to 10% the next.
00:06:03
Is that a 5% increase or a 100% increase?
00:06:07
Because if you're at 5% and you add five, you get to 10%, obviously.
00:06:11
But if you're making, let's say, $5 an hour and you get a 100% raise, you'll be at $10 an hour.
00:06:18
So which one of these is it? And I'm sure many of you are saying that seems like a pointless question.
00:06:22
Yes,
00:06:23
you do add five to get to 10, but the physical number of people dropping out would be increasing by 100%.
00:06:30
Well, in the spirit of this video,
00:06:31
let's ask something else: which one of these paints a more accurate picture?
00:06:35
Like, if one of these were posted in The New York Times or on Forbes or whatever,
00:06:39
which one tells people more about what's going on?
00:06:42
And I'm actually curious what you guys have to say about that, because I think we're going to hear different answers from different people.
00:06:47
But for this next part, I think we're all going to agree.
00:06:49
What if, hypothetically, the dropout rate is one in a million people, and then the next year it goes to two in a million people?
00:06:56
So that's 0.0001% to 0.0002%, a difference of, again, 0.0001 percentage points.
00:07:03
But that's also a 100% increase in the physical number of people dropping out.
00:07:08
So which one of these two headlines do you think paints a better picture?
00:07:12
Well again, we might hear different answers,
00:07:14
but I think we can agree the 100% makes it seem like a worse problem than it is.
00:07:19
Like, if five people in the whole nation are dropping out, and the next year ten people do, I
00:07:24
wouldn't necessarily call that an epidemic just yet.
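The ambiguity here is between percentage points (an absolute difference between two rates) and percent change (a relative difference). A minimal sketch of the distinction (the function names are mine, not from the video):

```python
def pct_point_change(old, new):
    """Absolute change: the raw difference between two rates,
    measured in percentage points."""
    return new - old

def pct_change(old, new):
    """Relative change: the difference as a percent of the
    starting value."""
    return (new - old) / old * 100

# Dropout rate going from 5% to 10%:
print(pct_point_change(5, 10))  # 5 percentage points
print(pct_change(5, 10))        # 100.0 percent increase

# One-in-a-million to two-in-a-million:
print(pct_point_change(0.0001, 0.0002))  # a hair of a percentage point
print(pct_change(0.0001, 0.0002))        # still roughly a 100% increase
```

Both numbers are correct; which one a headline picks determines whether the change sounds trivial or alarming.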
00:07:26
Now, using numbers in this misleading way is actually not hypothetical, because it happened a few decades ago in the UK,
00:07:32
not with dropout rates, but rather with a birth control pill. In
00:07:37
1995, the UK Committee on Safety of Medicines issued a warning that a certain type of birth control pill
00:07:42
increased the risk of life-threatening blood clots by 100%.
00:07:46
What that actually meant was that with the older, second-generation pill, about 1 in 7,000 women developed a blood clot,
00:07:52
whereas with the new pill, about 2 in 7,000 women developed a blood clot.
00:07:56
So yes, the physical number of women developing a blood clot did go up by a hundred percent. That is true.
00:08:01
But if we dig just a little deeper, we see that with the older pill the risk was about 0.014%,
00:08:07
whereas with the new pill it was about 0.028%, which hardly seems worthy of a breaking news alert.
00:08:13
But articles were posted about this misleading statistic, and as a result, naturally, tens to hundreds of thousands of women stopped taking this birth control
00:08:21
pill.
00:08:21
Fast forward one year, and that scare was blamed for
00:08:25
13,000 unwanted pregnancies, many of which were teenage pregnancies... a lot of teenage pregnancy stories in this video... moving on.
00:08:32
Did you guys actually know head lice are good for your health?
00:08:36
Seems pretty stupid,
00:08:37
right? But people actually thought this at one point, and that brings us to the part of this video titled "correlation or causation
00:08:44
or both."
00:08:45
Remember, it's usually very easy to determine that two things are correlated from a statistical test, but causation is a completely different thing
00:08:52
that isn't so easy to spot. Yet
00:08:54
people are very quick to assume that A causes B just because A is correlated with B.
00:08:59
Sometimes the logic can be stupidly obvious, like fast-moving wind turbines are positively correlated with fast wind. As one goes up,
00:09:06
the other goes up. But does that mean that fast-moving wind turbines cause fast wind?
00:09:11
Well, obviously not; it's the other way around.
00:09:14
But in many cases it isn't this obvious. Like, what if I said that kids who watch more violent TV shows are more violent themselves?
00:09:21
Does this mean that those shows cause kids to be more violent?
00:09:24
I mean, that could be possible, and it would definitely be an immediate thought for many people.
00:09:28
But what if kids who are more violent just happen to watch more violent TV shows? That also seems perfectly reasonable.
00:09:34
So we can't just jump to conclusions too early,
00:09:37
even though that's what many people would probably do. Or, in the Middle Ages, Europeans
00:09:41
saw that people who had head lice were normally healthy, whereas people who were sick rarely ever had head lice. As a result,
00:09:47
they assumed that lice caused people to be more healthy, when in reality head lice are very sensitive to body temperature.
00:09:53
So if people had a fever or anything like that, the head lice would find another host.
00:09:58
Then, on that subject, we have the third-cause fallacy, where two correlated events
00:10:03
actually don't cause each other at all, but rather a third thing causes both. For example, ice cream sales
00:10:08
do not cause an increase in heat strokes, nor the other way around,
00:10:11
even though they are correlated. Hot weather is instead the cause of both of them. Or, for the past several decades,
00:10:18
atmospheric CO2 has increased along with obesity levels. So does one cause the other?
00:10:22
Well, no. Richer populations tend to eat more and also produce more CO2.
00:10:28
And sometimes it can just be really unclear what's causing what.
00:10:31
Like, a while back they found that students who smoke cigarettes get lower grades, and that could mean smoking causes lower grades.
00:10:38
Or maybe it means that getting bad grades causes smoking... Maybe the added stress that comes along with lower grades
00:10:45
increases the chance that a student will pick up that first cigarette. That also seems like a reasonable explanation.
00:10:51
Or it could be that a variety of third factors is actually responsible for both.
00:10:56
So even when looking at a statistical test with accurate numbers, it's pretty crazy how far you can still be from the truth.
00:11:03
Next, we have a story that will probably have people making assumptions really early. In the 1970s,
00:11:08
someone noticed that Berkeley was accepting 44% of male applicants to its graduate school,
00:11:13
but only 35% of female applicants. Now, right there, half the Internet's like, "Well, say no more..."
00:11:31
But only 35% of female applicants
00:12:04
That was an actual unedited clip of everyone on the Internet.
00:12:09
Now, these numbers that we saw are true,
00:12:12
but very misleading. When you look at how male and female applicants applied to each program within the graduate school, the assumed bias
00:12:18
not only goes away but kind of flips. Look closely here in this row:
00:12:23
you see there was a high acceptance rate for this program.
00:12:26
In fact, women had a much higher acceptance rate, but still, overall it was high for everyone.
00:12:31
However, way more men applied to this one,
00:12:34
whereas more women applied to these programs down here with much lower acceptance rates.
00:12:39
So since a higher percentage of women were applying to these programs with higher rejection rates,
00:12:43
the overall acceptance rate for women was guaranteed to be lower, even though they were in fact slightly favored across a couple of departments.
00:12:50
So either of these headlines could be published with the necessary stats to back them up, and
00:12:56
all you've got to do is pick which one you want to use,
00:12:59
toss
00:12:59
the other,
00:13:00
throw that into an article, put it in bold right at the top, put the cleverly selected statistics down below it to back it up,
00:13:07
and you've got yourself a story.
00:13:10
This here was an example of Simpson's paradox,
00:13:13
where looking at the data as a whole tells a totally different story than grouping the data appropriately, which I'm sure many of
00:13:19
you know, but I had to include it here.
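A minimal sketch of how this reversal happens, using made-up two-department numbers (the real Berkeley data had six departments, but the mechanism is the same):

```python
# (applicants, admitted) per group; numbers invented for illustration,
# not the real Berkeley figures.
data = {
    "Dept A (lenient)":   {"men": (800, 500), "women": (100, 70)},
    "Dept B (selective)": {"men": (200, 20),  "women": (900, 100)},
}

def rate(applied, admitted):
    """Acceptance rate as a fraction."""
    return admitted / applied

# Within EACH department, women are admitted at a higher rate...
for dept, groups in data.items():
    print(dept,
          f"men {rate(*groups['men']):.1%},",
          f"women {rate(*groups['women']):.1%}")

# ...but in aggregate, men come out ahead, because most women applied
# to the selective department that rejects almost everyone.
men_applied = sum(g["men"][0] for g in data.values())
men_admitted = sum(g["men"][1] for g in data.values())
women_applied = sum(g["women"][0] for g in data.values())
women_admitted = sum(g["women"][1] for g in data.values())
print(f"overall: men {men_admitted / men_applied:.1%}, "
      f"women {women_admitted / women_applied:.1%}")
```

With these invented numbers, women lead in both departments yet trail badly overall, purely because of where each group applied.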
00:13:21
You guys remember the story of the blonde woman who robbed the elderly lady?
00:13:25
Well, like I said, this is a famous case, but not for the use of statistics;
00:13:30
rather, for the misuse of statistics in the courtroom. This was a classic example of the prosecutor's fallacy.
00:13:37
Now, this fallacy comes up when people assume that the probability of A given B is the same as the probability of B given A,
00:13:44
which, as I'm sure many of you know from this equation, is not usually true.
00:13:48
Like, if I said behind this curtain is an animal with four legs (that's the given), what is the chance that it's a dog?
00:13:56
Well, you'd probably do some thinking, like, well, it could be a dog, it could be a cat, it could be a cheetah,
00:14:01
it could be a lot of other things, and if you had to come up with a number you might say one in a hundred,
00:14:05
one in a thousand, or whatever. But if instead I said behind this curtain is a dog
00:14:10
(that's the given), what's the chance that it has four legs?
00:14:14
Well, that's almost a guarantee, because most dogs have four legs.
00:14:18
So you see, switch the given and the question at hand, and the probability can change by a lot.
00:14:24
So now let's look at what I said earlier:
00:14:27
"It turns out there was less than a 1 in 12 million chance that a random couple who just happened to fit all those descriptions
00:14:34
was innocent, so the jury returned a guilty verdict."
00:14:38
This here was wrong.
00:14:39
The stats actually showed that, given an innocent couple, the odds that they fit the descriptions were 1 in 12 million.
00:14:47
But then I said, as the jury had also assumed, that if you switch the given and the question at hand,
00:14:53
the probability stays the same, which we just saw can be very wrong.
00:14:58
This left side should make sense: like, if I just grabbed a random couple out of a mall
00:15:03
(that's the given), there would be a very small chance all of these would apply to them.
00:15:07
But this is the false assumption that is the prosecutor's fallacy.
00:15:12
We're told, or given:
00:15:14
hey,
00:15:14
here's a couple that fits all those descriptions. If maybe ten couples in the entire city fit all of those, then given a random one,
00:15:21
there's a 1 in 10 chance that they're guilty, or a 9 in 10 chance of being innocent. Not 1 in 12 million.
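The counting argument can be written out directly. The "ten matching couples in the city" figure is the hypothetical used here, not a real census number:

```python
# What the trial statistic actually measured: P(match | innocent).
p_match_given_innocent = 1 / 12_000_000

# What the jury actually needed: P(guilty | match). If ten couples in
# the city match all the descriptions and exactly one of them is
# guilty, then for any one matching couple picked at random:
matching_couples = 10
guilty_couples = 1

p_guilty_given_match = guilty_couples / matching_couples
p_innocent_given_match = 1 - p_guilty_given_match

print(p_guilty_given_match)    # 0.1 -- only 1 in 10
print(p_innocent_given_match)  # 0.9 -- nowhere near 1 in 12 million
```

Swapping the conditional, P(innocent | match) for P(match | innocent), is exactly the step the prosecution's argument smuggled in.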
00:15:29
And remember Sally Clark, who was found guilty of murdering her two children?
00:15:33
This is also a famous case of the misuse of statistics.
00:15:36
It turns out bacterial test results had actually been withheld that would have revealed more specific information than a simple multiplication of two probabilities,
00:15:43
which didn't tell the full story at all.
00:15:46
Like, it assumed that the two events were independent of each other, when genetic or environmental factors could definitely have been at play.
00:15:53
Like I said, Sally was found guilty and sentenced to life in prison,
00:15:56
but she served only three years before the convictions were finally overturned in early 2003. Up until then, though,
00:16:03
Sally Clark was widely criticized in the press as a child murderer,
00:16:06
and she was never able to recover mentally from the false accusation. A few years after her release,
00:16:11
she developed psychiatric problems and died in her home from alcohol poisoning in 2007.
00:16:17
I'm going to repeat that for everyone who didn't follow. A woman lost two of her children to natural causes,
00:16:24
was accused of murdering them, was put on trial and found guilty due to a misuse of
00:16:29
statistics, spent three years in prison, and even after her release was not able to recover mentally, and died
00:16:36
just about four years later.
00:16:37
If you guys don't find that story
00:16:39
insane, then I don't know what to tell you.
00:16:42
And actually, the result of this case prompted the Attorney General to order a review of hundreds of other similar cases.
00:16:48
Now, because I don't want that to be where we end this video,
00:16:50
let's look at one more classic misuse of statistics.
00:16:54
This one has to do with how data is represented, and it often involves bar graphs that don't have zero as their baseline.
00:17:01
For example, Fox News once showed a chart detailing the numbers of what would happen if the Bush tax cuts expired.
00:17:07
Do you guys see a problem?
00:17:09
Yeah,
00:17:09
it starts at 34 percent at the bottom, making a not-even-15-percent increase look like a few hundred percent. In
00:17:16
reality, the chart should look like this.
00:17:19
Or take the Terri Schiavo case from about two decades ago,
00:17:23
which involved a debate over whether a feeding tube should be removed from a woman in an irreversible vegetative state.
00:17:29
During that time, CNN posted this graph detailing which political parties agreed with the courts.
00:17:34
It seems like Democrats supported the decision significantly more, but because the baseline is not zero,
00:17:40
it appears way different than it should. This is what the actual graph would look like.
00:17:45
Or, in 2015, the White House published a tweet about the increase in students receiving high school diplomas with an extremely misleading graphic.
00:17:53
They made around a ten percent increase look like nearly 200%. Or, in music,
00:17:59
there was a chart released showing views between top artists that made Drake look like he was ahead by a large margin,
00:18:04
when in fact it was about a five percent lead.
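One way to see how much a truncated axis exaggerates is to compare the true ratio of the values with the ratio of the bar heights as drawn. The 35 / 39.6 / 34 figures below match how the tax-rate chart is usually reproduced, but treat them as illustrative:

```python
def visual_ratio(values, baseline):
    """How many times taller the largest bar LOOKS compared to the
    smallest when the y-axis starts at `baseline` instead of 0."""
    lo, hi = min(values), max(values)
    return (hi - baseline) / (lo - baseline)

# Top tax rate now vs. if the cuts expired, with the axis cut at 34.
now, later = 35.0, 39.6

print(later / now)                       # true ratio: ~1.13 (a ~13% increase)
print(visual_ratio([now, later], 34.0))  # drawn ratio: ~5.6x as tall
```

The closer the baseline creeps to the smallest value, the larger the drawn ratio gets, with the true ratio unchanged; starting the axis at zero makes the two ratios identical.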
00:18:08
Now, I'm guessing the comments on this video will be rather interesting,
00:18:11
but remember, these were cherry-picked events, and it's not like everything I said paints the full picture either.
00:18:16
I just find it interesting that these numbers can change the way we think about a person.
00:18:20
They can peek into some of the most intimate moments of our lives based on our grocery lists.
00:18:24
They can make very trivial events seem very serious, and vice versa.
00:18:27
You don't even need to use wrong numbers for this.
00:18:30
But hopefully this shows just how not cut-and-dried math and statistics can be in the real world, outside of a school setting
00:18:37
especially. And with that, I'm going to end the video there. If you guys enjoyed it, be sure to like and subscribe.
00:18:41
Don't forget to follow me on Twitter and join the Facebook group for updates on everything. Hit the bell if you're not being notified.
00:18:47
Share, comment, and all that other YouTube stuff, and I will see you guys in the next video.