00:00:06
computer programs can calculate our
00:00:09
intelligence our blood pressure and our
00:00:12
heart
00:00:13
[Music]
00:00:15
rate but can computers calculate our
00:00:18
future
00:00:19
[Music]
00:00:29
[Music]
00:00:30
the digital Revolution opens up new
00:00:33
possibilities almost everything we now
00:00:35
do is recorded and stored but do our
00:00:38
digital Footprints really represent who
00:00:40
we are and if they do can they be used
00:00:43
to tell our
00:00:44
[Music]
00:00:51
future we're in the midst of a
00:00:53
revolution whether we like it or
00:00:57
not it's called Big Data
00:01:04
in 2013 alone we produced more data than
00:01:07
in the entire history of mankind almost
00:01:09
4 and a half billion
00:01:15
terabytes since then we've been
00:01:16
producing a further 2 and a half million
00:01:18
terabytes every
00:01:21
day with our smartphones we constantly
00:01:24
generate data about ourselves and our
00:01:28
environment the sensors and cameras in our
00:01:30
smartphones vehicles and computers
00:01:33
record where we are and what we
00:01:37
do this data explosion changes the way
00:01:40
we live our
00:01:44
lives text photographs sounds even odors
00:01:48
everything can now be translated into
00:01:51
numbers so have our whole lives become
00:01:57
computable our two directors Pina and
00:01:59
Jacob want to find out scientists search
00:02:03
for patterns in the data we produce to
00:02:05
calculate the future from the past this
00:02:08
is called Predictive
00:02:11
Analytics can they tell our personal
00:02:13
future by looking at our
00:02:15
[Music]
00:02:22
data Pina and Jacob meet two
00:02:24
Specialists for Predictive Analytics
00:02:29
[Music]
00:02:30
computer experts from the German
00:02:32
Fraunhofer Institute and Bonn University have
00:02:34
set up a special experiment for
00:02:37
[Music]
00:02:41
us for two months Pina and Jacob would
00:02:44
allow these total strangers full access
00:02:47
to their smartphone
00:02:51
data a special smartphone app transmits
00:02:54
all their data to Georg Fuchs and Alexander
00:02:58
Markowetz these scientists know
00:03:00
nothing about Pina and Jacob yet they
00:03:02
hope to construct an accurate model of
00:03:04
their lives and their
00:03:07
[Music]
00:03:10
behavior we want to find patterns like
00:03:13
when and where are you who do you call
00:03:17
regularly when do you send a text
00:03:19
message to whom we can see what apps you
00:03:22
use and when we'll look at regular
00:03:24
patterns to describe your typical
00:03:26
behavior and then predict your future
00:03:28
behavior from it
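A minimal sketch of the idea described here: count how often something happened in each time slot, then treat the most frequent past behavior as the prediction. The log format, the contact names, and the helper names `learn_habits` and `predict_contact` are invented for illustration and are not the scientists' actual method.

```python
from collections import Counter, defaultdict

# Hypothetical call log: (weekday, hour of day, contact). Invented data.
CALL_LOG = [
    ("Mon", 8, "office"), ("Mon", 8, "office"), ("Mon", 20, "mum"),
    ("Tue", 8, "office"), ("Sun", 18, "mum"), ("Sun", 18, "mum"),
    ("Sun", 18, "friend"),
]


def learn_habits(log):
    """Count, for every (weekday, hour) slot, how often each contact was called."""
    counts = defaultdict(Counter)
    for weekday, hour, contact in log:
        counts[(weekday, hour)][contact] += 1
    return counts


def predict_contact(habits, weekday, hour):
    """Return (most likely contact, share of past calls), or None for an unseen slot."""
    seen = habits.get((weekday, hour))
    if not seen:
        return None
    contact, n = seen.most_common(1)[0]
    return contact, n / sum(seen.values())


habits = learn_habits(CALL_LOG)
sunday_evening = predict_contact(habits, "Sun", 18)
```

With so few observations the estimate is fragile, which matches the later remark that people who travel a lot or rarely use their phones are harder to predict.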
00:03:34
we look for these regular patterns
00:03:36
that's a real challenge for us because
00:03:38
you travel so much we really want to see
00:03:40
what we can find
00:03:46
out I think it would be cool to discover
00:03:48
habits that even you yourselves don't
00:03:50
know about some you may be proud of
00:03:53
others less so I'm not talking about big
00:03:55
things but stuff that makes you think
00:03:57
hey if I'd stopped doing that I might
00:03:59
become a better
00:04:04
person Pina and Jacob have mixed
00:04:06
feelings about this experiment from now
00:04:09
on complete strangers have intimate
00:04:11
access to their
00:04:15
lives apart from their smartphones they
00:04:17
will use a Google Glass this wearable
00:04:21
computer will constantly record what
00:04:23
they
00:04:24
see if such smart glasses become part of
00:04:27
our everyday lives the amount of data we
00:04:30
produce would increase even
00:04:36
[Music]
00:04:37
further for this project Jacob will
00:04:40
travel to the US while Pina will stay in
00:04:45
Europe their data will be analyzed by
00:04:48
so-called
00:04:49
algorithms algorithms are procedures to
00:04:52
solve a specific
00:04:56
problem a simple example of an algorithm
00:04:59
is a cooking recipe for example for
00:05:02
making a
00:05:04
burger by exactly following the
00:05:06
instructions you always get the same
00:05:08
Burger in the
00:05:10
end for more complex tasks there are
00:05:13
intelligent algorithms that are able to
00:05:16
learn they can automatically detect new
00:05:19
parameters for example that on Saturdays
00:05:22
fewer burgers are sold but more fries
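The burger example can be sketched in a few lines, assuming a hypothetical sales log. The numbers and the function name `learn_weekday_averages` are made up for illustration. The "learning" here is only averaging past sales per weekday, which is much simpler than the intelligent algorithms the film refers to, but it shows how a Saturday effect can come out of the data without anyone programming it in.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sales log: (weekday, burgers sold, fries sold). Invented numbers.
SALES = [
    ("Fri", 120, 80), ("Sat", 90, 130), ("Sun", 110, 85),
    ("Fri", 125, 78), ("Sat", 85, 140), ("Sun", 115, 90),
]


def learn_weekday_averages(sales):
    """Average units sold per product for each weekday that appears in the log."""
    by_day = defaultdict(lambda: {"burgers": [], "fries": []})
    for day, burgers, fries in sales:
        by_day[day]["burgers"].append(burgers)
        by_day[day]["fries"].append(fries)
    return {
        day: {product: mean(values) for product, values in products.items()}
        for day, products in by_day.items()
    }


averages = learn_weekday_averages(SALES)
# The learned Saturday averages serve as the forecast for next Saturday.
forecast_saturday = averages["Sat"]
```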
00:05:29
while Pina remains in Europe Jacob
00:05:32
travels to the United States the first
00:05:34
stop on his tour is
00:05:38
California Silicon Valley near San
00:05:41
Francisco is the birthplace of
00:05:43
Predictive
00:05:47
Analytics here even the police use it to
00:05:50
fight crimes before they're committed
00:05:58
[Music]
00:05:59
since July 2011 the Santa Cruz Police
00:06:02
Force uses a computer program called
00:06:04
PredPol which stands for predictive
00:06:07
policing it's able to predict the time
00:06:09
and the location of a future
00:06:12
[Music]
00:06:15
crime and just so you guys know uh the
00:06:18
big thing flaring up for us right now is
00:06:20
the Harvey West area we're going to try
00:06:22
to toss some overtime at it uh for
00:06:24
nighttime hours as soon as the Sun
00:06:26
starts to go down there they are flooding
00:06:28
the area so it's reflected here in
00:06:31
fact if you you take a look up at the
00:06:32
PredPol map we're showing burgs for
00:06:34
the Harvey West area so there you
00:06:37
go in California the police are under
00:06:40
pressure the state is Cash strapped
00:06:43
public spending has been reduced
00:06:46
break-ins car thefts and robberies are
00:06:50
increasing the solution here in Santa
00:06:52
Cruz let the computer decide which
00:06:55
neighborhoods to Patrol
00:06:57
[Music]
00:07:02
the program was developed here at Santa
00:07:04
Clara University South of San
00:07:07
[Music]
00:07:10
Francisco computer specialist George
00:07:12
Mohler is one of the pioneers of
00:07:14
predictive
00:07:16
policing the algorithm was always better
00:07:19
it was always two to three times more
00:07:21
accurate than the human Analyst at
00:07:22
predicting where crime is going to
00:07:24
happen
00:07:27
[Music]
00:07:33
the data that the algorithms use is past
00:07:36
crime data so we take the past 5 to 10
00:07:38
years of crime reports from a police
00:07:40
database and we pull it in we look at
00:07:43
the locations the times crime types
00:07:46
whether a gun was used uh we pull all
00:07:49
that information in um and then we run
00:07:52
these algorithms each day and get a new
00:07:54
set of predictions for tomorrow for
00:07:56
officers uh in the field to use to
00:07:59
determine where to patrol the police's
00:08:01
resources must be used effectively Santa
00:08:04
Cruz Charlie 1113 I'll be out with
00:08:07
184 every officer focuses on three or
00:08:10
four critical
00:08:11
areas can you send two people that
00:08:15
please PredPol predicts that the risk
00:08:18
of a crime being committed is
00:08:19
particularly high in certain hotspots at
00:08:22
a certain time pay our our hot spot map
00:08:26
for today see if we can pull that up
00:08:28
here real quick
00:08:32
one of the things that I can do here
00:08:34
with with predpol is I can actually
00:08:36
point on one of these boxes and it will
00:08:38
show me what that area looks like we're
00:08:41
definitely going to get out into the
00:08:42
Harvey West area up here and then we're
00:08:45
also going to get down towards the beach
00:08:47
and look at some of look look at some of
00:08:49
our vulnerable areas here down along our
00:08:56
beach the algorithm searches for
00:08:58
specific patterns in existing crime data
00:09:01
to predict the crime
00:09:05
[Applause]
00:09:06
hotspots there's been a lot of research
00:09:08
that shows that certain types of crime
00:09:11
are contagious they spread like a virus
00:09:13
so in the case of gang violence what
00:09:16
you'll have is one gang will attack
00:09:18
another gang and that second gang will
00:09:20
retaliate a few days later so you'll see
00:09:23
clusters or series of gang crimes that
00:09:26
are contagious because the once you have
00:09:29
that initial event it increases the
00:09:31
likelihood for more violent events Santa
00:09:35
Cruz Charley 1113 I'll be out with
00:09:37
criminal gangs commit most of the crimes
00:09:39
in Santa Cruz particularly violent
00:09:42
crimes and Drug
00:09:47
offenses gotta okay in one of PredPol's
00:09:52
hotspots our Patrol actually finds a
00:09:54
known gang
00:09:55
member they keep a close eye on him
00:09:59
[Music]
00:10:02
SC especially well seasoned experienced
00:10:05
officers they're telling me there's no
00:10:07
way the S can predict crime in this
00:10:09
neighborhood better than I can I have
00:10:11
all this experience and so it you know
00:10:14
there was a lot of
00:10:15
skepticism but Steve Clark knows the
00:10:18
statistics prove him
00:10:20
right if she was going to contact that
00:10:22
guy he's he's been known to be a little
00:10:24
dangerous so we were going to stick
00:10:25
around for that
00:10:29
in
00:10:30
2013 for the first five months of the
00:10:33
year our crime statistics had
00:10:36
increased 42% for auto thefts so we were
00:10:39
42% up and I took a team I sent them out
00:10:42
there using PredPol and I says you guys
00:10:45
need to go work this impact Auto thefts
00:10:47
in these areas and we did that we ended
00:10:50
the year reduced by 15% so we went from
00:10:55
being up 42% to minus 15% a huge
00:10:59
swing in
00:11:02
crime a safer City thanks to computers
00:11:05
controlling the
00:11:10
criminals preventing crimes before
00:11:12
they're actually committed sounds like a
00:11:14
good
00:11:15
idea no wonder that other researchers
00:11:17
are looking into it too for instance at
00:11:20
MIT in Cambridge
00:11:24
Massachusetts mathematician Cynthia
00:11:26
Rudin too wants to predict crimes using
00:11:29
algorithms and Big
00:11:33
Data however her approach is slightly
00:11:35
different from the one employed by the
00:11:37
researchers in
00:11:41
California so Series Finder detects
00:11:44
patterns so if you can see a pattern you
00:11:47
know something about where it's going in
00:11:50
the future so for for instance if you
00:11:53
know that a particular criminal has an
00:11:56
affinity for a particular location and a
00:11:58
particular time
00:11:59
you can send someone there to
00:12:01
potentially do something about that
00:12:03
person continuing the
00:12:05
[Music]
00:12:06
pattern searching for behavioral
00:12:09
patterns and thereby predicting possible
00:12:11
crimes is not a new idea criminal
00:12:14
analysts have been trying to do this for
00:12:16
decades but the software has a crucial
00:12:20
Advantage the good thing about a
00:12:22
computer program is it never gets tired
00:12:24
it just can run in the background you
00:12:26
know and and just pop up things that a
00:12:28
human can look at man
00:12:30
right and they had a lot of things in
00:12:32
common the Cambridge Police Department
00:12:34
plans to use pattern finder soon for
00:12:36
their daily work you know it would sort
00:12:37
of run in the background and you
00:12:39
wouldn't know about it but unlike the
00:12:41
system in Santa Cruz the program doesn't
00:12:44
yet Run in real time it has yet to prove
00:12:47
its predictive power in
00:12:50
[Music]
00:12:56
reality when we start to understand
00:12:58
crime patterns better and better will it
00:13:00
then be possible to identify a
00:13:02
perpetrator even before he's committed a
00:13:05
crime like in the movie Minority
00:13:09
Report what we're predicting is that
00:13:11
there's a human behind several crimes at
00:13:14
once and that human is going to commit a
00:13:16
crime again this isn't Minority Report
00:13:18
you're not going to say that um you know
00:13:21
someone who's never committed a crime is
00:13:23
going to commit a crime in this house
00:13:24
tomorrow that's not what it's about it's
00:13:26
about finding a pattern that already
00:13:28
exists and assuming it's going to
00:13:29
continue on into the
00:13:31
future Jacob is quite impressed by what
00:13:34
he has seen and heard nevertheless he
00:13:37
just like pina has an uneasy feeling
00:13:39
about computers planning Police
00:13:42
Operations it would normally
00:13:45
work Evgeny Morozov shares these
00:13:50
concerns Pina meets the renowned
00:13:52
internet critic in Berlin
00:13:54
[Music]
00:14:02
the underlying philosophy behind most of
00:14:05
the systems that rely on Predictive
00:14:08
Analytics and some kind of you know
00:14:10
mechanism for eliminating the problem
00:14:14
before it happens whether it's in health
00:14:17
or whether it's in crime or whether it's
00:14:18
in any other social domain I mean the
00:14:21
underlying assumption there is that the
00:14:23
current setup is
00:14:25
perfect right so I mean I don't think
00:14:28
that the current setup is perfect in any
00:14:30
of those domains and I don't accept
00:14:32
philosophically that that would ever be
00:14:33
the
00:14:35
case algorithms can only find patterns
00:14:38
and compute accurate forecasts if
00:14:40
they're fed with large amounts of
00:14:41
reliable data just like in our
00:14:48
experiment Jacob and Pina's first sets
00:14:51
of data are being sent to Georg Fuchs and
00:14:54
Alexander
00:14:55
Markowetz although the information is
00:14:57
still raw and chaotic Jacob and Pina
00:14:59
have to get used to the idea that their
00:15:01
behavior is far less individual than
00:15:04
they would like to
00:15:07
[Music]
00:15:13
think the mobile phone behavior of each
00:15:15
user is different but within our
00:15:18
Behavior patterns we're pretty
00:15:19
predictable we're not this
00:15:21
Homo economicus making great smart
00:15:24
decisions Free Will explains only a
00:15:26
small part of our Behavior the rest is
00:15:29
made up of habits that occur in patterns
00:15:32
we can predict all
00:15:34
that and this is computer science's
00:15:37
great insult to
00:15:42
humanity Jacob is unique and so is Pina
00:15:46
but in what they like or do they
00:15:48
resemble thousands of
00:15:52
others therefore retailers use
00:15:54
Predictive Analytics to forecast
00:15:56
consumer Behavior
00:16:00
because our seemingly unique biographies
00:16:02
are not so unique when compared to
00:16:04
others certain patterns appear again and
00:16:08
again the algorithms of retail companies
00:16:11
search for exactly these patterns in the
00:16:14
data sets we produce on a daily basis
00:16:17
for example which products do women Buy
00:16:19
in the third month of
00:16:23
pregnancy once the algorithm has learned
00:16:25
what the pregnancy pattern looks like it
00:16:27
starts searching for it in the
00:16:29
customer's data in the end the algorithm
00:16:32
can predict pregnancy with an accuracy
00:16:34
of more than
00:16:35
95% and send out brochures for nappies
00:16:38
and baby
00:16:42
food big data is only possible because
00:16:45
of the digitization of society we have
00:16:48
so much data because lots of our actions
00:16:50
take place in the digital domain and are
00:16:52
automatically
00:16:55
recorded that's true for our purchasing
00:16:57
habits our travel habits our social
00:16:59
habits and all of our
00:17:04
communication and you can examine this
00:17:06
huge part of our lives for
00:17:13
free of all data guzzlers no one is
00:17:16
watching us more closely than
00:17:18
Google Jacob is on his way to the
00:17:20
internet giant's headquarters in Mountain
00:17:22
View
00:17:26
California Google was the first company
00:17:28
to systematically collect and analyze
00:17:30
internet
00:17:32
data since its Creation in 1998 the
00:17:36
company has saved all search
00:17:39
queries Google's algorithms and methods
00:17:41
of analysis are considered the
00:17:47
best for months Jacob has been trying to
00:17:49
get an interview but with little
00:17:53
success Google knows a lot more about us
00:17:56
than we know about them
00:18:01
we wanted to know more about Google flu
00:18:03
Trends one of the applications for
00:18:05
Predictive
00:18:06
Analytics Google counts the frequency of
00:18:09
specific Search terms and similar to a
00:18:11
weather forecast makes predictions about
00:18:13
where and when a flu epidemic will
00:18:16
occur however scientists criticize that
00:18:19
the company connects advertising to
00:18:21
search queries and thus distorts the
00:18:24
[Music]
00:18:27
data Pina is traveling to
00:18:30
Zurich here researchers use a completely
00:18:32
different approach to predicting
00:18:40
epidemics Dirk Helbing heads the
00:18:42
research group of the Swiss Federal
00:18:44
Institute of
00:18:46
Technology spre around the world in this
00:18:49
he also wants to fight future epidemics
00:18:51
before they happen by making predictions
00:18:54
about where and when exactly the next
00:18:56
outbreak will occur
00:19:05
infections start in one city and then
00:19:07
move to the
00:19:09
next this allows us to stock up our
00:19:11
medical supplies in these places and
00:19:14
therefore successfully combat the spread
00:19:16
of these
00:19:20
diseases in the past infected people
00:19:22
moved slowly from place to place this
00:19:25
resulted in uniform circular and
00:19:28
wave-like propagation patterns of epidemics
00:19:30
like the plague but today things are
00:19:35
different but if you look at modern
00:19:37
disease propagation patterns and they
00:19:39
look pretty
00:19:40
chaotic suddenly people are sick in
00:19:42
America then in Europe then in Asia and
00:19:46
it's quite a
00:19:49
mess the reason is that today diseases
00:19:52
are primarily spread by air
00:19:54
passengers the result is a Global
00:19:57
Network of cities that are closely
00:19:59
linked by air traffic Dirk Helbing calls
00:20:02
this new approach to measuring distance
00:20:04
effective
00:20:05
distance for example the distance
00:20:08
between big cities such as Frankfurt and
00:20:10
New York is effectively smaller than the
00:20:12
distance between New York and Rural
00:20:17
Pennsylvania Dirk Helbing's algorithm
00:20:19
analyzes the propagation pattern by
00:20:22
looking at individual
00:20:23
airports suddenly the apparent chaos
00:20:27
turns into something surprisingly
00:20:31
regular at some airports the propagation
00:20:34
pattern looks nearly
00:20:42
circular and this tells us the most
00:20:44
likely origin of the
00:20:47
disease once you have this information
00:20:50
you can also make
00:20:52
predictions so if you want to protect
00:20:54
yourself from flu in New York you might
00:20:57
have to watch Frankfurt more
00:20:58
closely than
00:21:03
Pennsylvania Jacob is on his way to San
00:21:05
Francisco to find out more about the
00:21:07
latest health
00:21:13
Trends I have a Misfit Shine a Fitbit
00:21:17
Flex a Nike FuelBand a Jawbone UP a
00:21:20
Basis watch two more Shines a Pebble
00:21:24
another Shine on my necklace Basis four
00:21:26
more Shines a Meta Watch Fitbit Ultra
00:21:29
a Fitbug a zami a Fitbit Zip a Fitbit
00:21:33
One a Striiv Play a Withings Pulse four
00:21:36
more Shines and I have an Android and an
00:21:40
iOS
00:21:41
device Rachel Kalmar not only uses these
00:21:44
devices she helps to develop them too
00:21:47
one of the things I'm interested in is
00:21:49
early detection of disease so for
00:21:53
instance how far back do you have to go
00:21:56
in time to be able to predict say onset
00:21:59
of a neurodegenerative disorder and
00:22:02
can you find traces of that in people's
00:22:05
activity patterns and being able to take
00:22:08
these kinds of uh sets of data and once
00:22:12
we have a better understanding of how
00:22:14
they relate to health disease to
00:22:16
behavior then we're also going to be
00:22:18
able to build better predictive models
00:22:20
that are going to lead to uh better
00:22:24
early diagnosis of preventable diseases
00:22:31
she also tries to find ways of
00:22:32
preventing
00:22:38
depression one device that I think could
00:22:41
be really cool would be something like a
00:22:42
mood ring that measures how much
00:22:44
exposure to different light you have and
00:22:47
then it could automatically adjust the
00:22:48
lighting in your house to compensate
00:22:51
then this could increase the quality of
00:22:53
life for myself as well as other
00:22:56
people using self-tracking to improve
00:22:59
your health sounds great Pina puts a few
00:23:01
of these devices to the
00:23:03
test and realizes logging all the data
00:23:07
takes a lot of
00:23:09
time
00:23:12
oh up to now self-trackers haven't been
00:23:15
taken seriously but what if recording
00:23:17
vital values becomes a prerequisite for
00:23:20
health
00:23:22
insurance the idea is that if you
00:23:25
monitor yourself enough or often enough
00:23:28
you won't even need to go to the doctor
00:23:30
right so the idea then if you push that
00:23:33
idea to its ultimate conclusions is that
00:23:35
if you do not monitor yourself enough
00:23:38
often enough or you know with enough
00:23:40
gadgets uh there is something wrong with
00:23:43
you as a
00:23:46
citizen until now self-trackers only log
00:23:49
physical data such as running distance
00:23:51
heart rate or sleep
00:23:54
patterns but a new research project also
00:23:57
targets the psyche
00:24:00
now a new smartphone app is supposedly
00:24:03
able to predict depression this
00:24:07
keep there are several parameters that
00:24:10
change during depression first the
00:24:12
communication pattern then the movement
00:24:15
pattern and the third is perhaps the
00:24:17
most
00:24:18
interesting the tone of voice changes
00:24:21
during
00:24:23
depression a very highly modulated voice
00:24:25
Melody changes to a very quiet
00:24:28
uniform unmodulated
00:24:36
Melody the depression app which is still
00:24:39
a prototype will soon be able to record
00:24:41
communication Behavior voice Melody and
00:24:43
the movement pattern of a
00:24:47
patient it's an early warning system not
00:24:50
only for patients and
00:24:57
doctors healthy people who have
00:24:59
absolutely no psychiatric disorders have
00:25:02
a natural mechanism to deal with stress
00:25:05
but if they're exposed to too much
00:25:06
stress it's quite possible that their
00:25:08
behavior could
00:25:12
change Pina will test the app for two
00:25:16
months by analyzing her communication
00:25:18
and movement patterns Thomas Schläpfer wants
00:25:21
to predict how big her risk is for
00:25:23
developing a depression
00:25:30
I must warn you this will reveal a great
00:25:32
deal about your behavior and how you
00:25:34
deal with
00:25:39
stress it might show that you handle
00:25:41
stress not quite as well as you think
00:25:43
and we will see
00:25:46
this and we might predict that you're
00:25:49
likely to develop a depressive disorder
00:25:51
like a burnout syndrome or even
00:25:54
full-blown depression
00:26:00
none of the previous applications of
00:26:01
Predictive Analytics have interfered so
00:26:04
deeply with our personal lives how
00:26:06
reliable are the app's predictions how
00:26:09
is Pina going to deal with the outcome
00:26:11
and who else apart from the doctor will
00:26:13
see her
00:26:16
data virtuality poses new questions like
00:26:20
what information should I be allowed to
00:26:22
collect from you who's going to have
00:26:24
access to it and why my computer model
00:26:27
of you is 95% accurate is that good
00:26:30
enough and can I share this with
00:26:33
everyone these are fundamental issues
00:26:37
what makes a person a person and what
00:26:40
should I be allowed to do with big data
00:26:42
and what
00:26:50
not in the meantime our prediction
00:26:52
experiment
00:26:54
continues by now Georg Fuchs and Alexander
00:26:57
Markowetz have collected a significant
00:26:59
amount of
00:27:02
[Music]
00:27:04
data the scientists analyze the quality
00:27:07
of the data and start looking for
00:27:09
patterns that will allow them to make
00:27:10
predictions about Jacob's and Pina's
00:27:15
[Music]
00:27:18
behavior but especially Jacob's Restless
00:27:21
lifestyle makes life hard for the
00:27:23
experts
00:27:29
this is a yeah
00:27:34
super it's not only scientists who are
00:27:37
excited about the possibility of telling
00:27:38
the future with computer
00:27:40
models Predictive Analytics has become
00:27:43
big
00:27:46
business in San Francisco Jacob meets
00:27:49
one of the stars of the programming
00:27:53
scene Anthony Goldbloom founder of the
00:27:56
company Kaggle
00:27:59
no one I think is going to mount an
00:28:00
argument today that quality of life is
00:28:03
not better because of you know Factory
00:28:05
processes and Automation and um I think
00:28:08
in 50 to 100 years time people will
00:28:11
be saying the same things about
00:28:12
predictive modeling and big data I like
00:28:14
data I think um the thing that I really
00:28:16
love about data is that it doesn't lie
00:28:18
it's very objective so you know when you
00:28:21
when you ask for somebody's opinion you
00:28:23
get a you get a you get something back
00:28:26
that's subjective when you ask when you
00:28:28
you look at the data and you answer a
00:28:29
question with data you're getting a much
00:28:31
clearer much much much more real much
00:28:35
more tangible answer
00:28:36
back there are lots of ambitious young
00:28:39
people in San Francisco trying to get
00:28:41
rich with Predictive
00:28:43
Analytics Anthony Goldbloom is certainly
00:28:46
one of the most
00:28:48
successful we solve problems that I
00:28:51
wouldn't have thought were
00:28:54
um were humanly possible to solve using
00:28:57
machine learning so things like
00:29:00
grading High School essays using
00:29:01
algorithms um predicting which drugs are
00:29:04
going to be good drugs using algorithms
00:29:06
image detection machine learning with
00:29:08
audio with financial markets for
00:29:10
instance starting to do a a lot of work
00:29:11
in the oil and gas industry and just
00:29:13
things that I never would have thought
00:29:14
were possible that the Kaggle community has
00:29:16
been able to solve Kaggle is an online
00:29:19
platform that's used by companies like
00:29:21
Google Microsoft and NASA they invite
00:29:25
programmers from around the world to
00:29:26
write the best algorithm for a
00:29:28
particular
00:29:31
problem a lot of these statisticians and
00:29:33
data scientists are exceptionally
00:29:35
brilliant uh but no one ever you know no
00:29:38
one had ever discovered them before and
00:29:39
so Kaggle has given um people who are
00:29:42
previously undiscovered a chance to
00:29:44
really become
00:29:45
Superstars nearly 200,000 number
00:29:48
crunchers from around the world have
00:29:50
already participated in Kaggle
00:29:51
competitions from students to elite
00:29:54
professors Pina meets one of those
00:29:57
computer whiz kids in
00:29:59
[Music]
00:30:01
Hamburg Josef Feigl ranks among the top
00:30:04
10 of Kaggle programmers
00:30:07
[Music]
00:30:12
worldwide towards the end it's always
00:30:14
stressful mainly because then everybody
00:30:16
posts their best Solutions and you have
00:30:18
to keep up so that's a bit hectic but
00:30:20
otherwise it's okay it's a nice hobby
00:30:23
hobby
00:30:32
I'd never worked with real data during
00:30:33
my studies and for me that's the thrill
00:30:36
what's different in real
00:30:39
life during my first competition I
00:30:42
learned much more than during my entire
00:30:50
studies for example if a pharmaceutical
00:30:53
company wants to predict what kinds of
00:30:54
people are at risk of Contracting
00:30:56
diabetes they start a competition on
00:30:58
Kaggle they provide programmers with
00:31:01
Anonymous raw data from patients who
00:31:03
already have
00:31:07
diabetes you have to think in advance
00:31:08
about what data you want to use for your
00:31:10
algorithm and then feed that into the
00:31:12
model this takes up about 80% of the
00:31:15
time and that's the hard part if you
00:31:17
simply feed raw unprocessed data into an
00:31:20
algorithm then it'll work but it won't
00:31:22
be very good at recognizing
00:31:26
patterns after the algorithm has
00:31:28
learned what the typical pattern for a
00:31:29
patient looks like it can then search
00:31:32
other people's data for the same pattern
00:31:35
then it can calculate How likely it is
00:31:37
that these people develop
00:31:40
[Music]
00:31:42
diabetes Josef Feigl sends his
00:31:44
calculations to Kaggle and immediately
00:31:46
gets feedback on how well his algorithm
00:31:52
performs in the end an algorithm is only
00:31:55
able to calculate probabilities and even
00:31:57
if an algorithm is 97% accurate its
00:32:00
results can be wrong so you can't
00:32:03
eliminate Randomness but you can get
00:32:05
increasingly
00:32:08
accurate this is one fundamental
00:32:11
limitation of Predictive
00:32:12
Analytics even the best algorithm can be
00:32:15
wrong it's not a digital crystal
00:32:20
[Music]
00:32:24
ball we're quite good at predicting
00:32:26
things that happen on a regular basis
00:32:29
but we will never be able to predict
00:32:31
singular events we won't be able to
00:32:34
predict the next 9/11 or a suicide or
00:32:37
other extraordinary
00:32:43
[Music]
00:32:48
events this means that Minority Report
00:32:51
will remain Hollywood fiction for the
00:32:52
time being people won't be arrested
00:32:55
before they've committed a crime just
00:32:57
because an algorithm has calculated they
00:32:59
will actually do it at some point in the
00:33:05
future there's always a big difference
00:33:07
between probability and
00:33:09
reality in individual cases statistics
00:33:12
are useless the probability of getting a
00:33:15
particular form of cancer may be
00:33:22
0.1 but if I get this cancer statistics
00:33:25
won't be much help
00:33:30
it's the same with crime there's a
00:33:32
certain probability that a certain
00:33:34
individual will commit a crime in the
00:33:36
future but whether he really will commit
00:33:38
a crime is impossible to
00:33:42
predict I think there's there's
00:33:44
fundamentally Randomness in the world uh
00:33:48
so no matter how much data you collect
00:33:50
if there's a source of
00:33:52
Randomness then uh you won't be able to
00:33:55
predict with 100% accuracy
00:33:59
however even if there isn't 100%
00:34:01
guarantee that someone will commit a
00:34:03
crime in the future The crucial question
00:34:05
is how will Society deal with a 95%
00:34:13
chance is it okay to detain a person
00:34:16
even if it means that we catch somebody
00:34:18
totally
00:34:19
innocent in the end it's a question of
00:34:22
ethics
00:34:28
I could just go ahead and arrest people
00:34:30
as a
00:34:31
precaution statistically that would mean
00:34:33
that one in 100 is locked up for no
00:34:34
reason at
00:34:36
all in the US people are more pragmatic
00:34:40
and might say okay this is acceptable
00:34:43
collateral
00:34:46
damage but in Europe or at least Germany
00:34:49
they would say oh no if only one of
00:34:51
these people is innocent we simply can't
00:34:54
do this
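The arithmetic behind the "one in 100" remark can be written out as a small sketch. It assumes that `hit_rate` means the share of flagged people who really would have gone on to offend. That reading, and the function name `expected_wrongful_detentions`, are assumptions made for illustration and are not terms used in the film.

```python
def expected_wrongful_detentions(detained, hit_rate):
    """Expected number of detained people who would never have offended.

    detained: how many people are detained as a precaution.
    hit_rate: assumed share of them (0..1) who really would have offended.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return detained * (1.0 - hit_rate)


# 99% correct: about one person in 100 is locked up for no reason.
at_99_percent = expected_wrongful_detentions(100, 0.99)
# 95% correct, the figure raised earlier: about five people in 100.
at_95_percent = expected_wrongful_detentions(100, 0.95)
```

Under this reading, even a forecast that sounds very reliable leaves a predictable number of innocent people affected once it is applied to many individuals.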
00:34:59
[Music]
00:35:02
back to our experiment now the experts
00:35:04
start their analysis for two months
00:35:07
they've collected Jacob's and Pina's
00:35:11
data the algorithm Now searches for
00:35:13
recurring events and tries to calculate
00:35:16
a
00:35:17
[Music]
00:35:19
prediction and gets
00:35:22
the business trips
00:35:28
Over The Irregular lifestyle of our two
00:35:31
filmmakers poses a particular challenge
00:35:34
office workers are more
00:35:37
[Music]
00:35:39
predictable another problem is that Jacob
00:35:41
and Pina use their smartphones far less
00:35:44
than teenagers for
00:35:47
[Music]
00:35:50
example this is
00:35:52
a Pina just uses her phone very little
00:35:56
very good for her but it's much easier
00:35:57
to predict a 17-year
00:36:00
[Music]
00:36:03
old totally cryptic this whole thing I
00:36:07
don't understand what's going on it
00:36:08
needs to be cleaned
00:36:13
[Music]
00:36:15
up Jacob is still on the road in the US
00:36:19
the most interesting work on Predictive
00:36:20
Analytics is done here in the
00:36:26
states he wants to talk to a programmer
00:36:28
who develops algorithms that can predict
00:36:30
the outcome of sporting events Paul
00:36:33
Bessire from Cincinnati Ohio wants to do
00:36:35
the interview via
00:36:38
Skype I predict over 10,000 games every
00:36:41
year I am right about when we're when
00:36:45
when you're factoring in the gambling
00:36:46
element of it I'm right maybe 5,600 times
00:36:49
a year that means I'm wrong 4,400 times
00:36:51
a year but it's also very easy to figure
00:36:53
out if I'm better or worse than somebody
00:36:55
else because of the quick turnaround
00:36:57
immediate not just satisfaction but
00:37:00
understanding of whether or not there
00:37:01
that our analysis our prediction was
00:37:04
successful and every time we get a new
00:37:05
piece of data which happens every single
00:37:07
day with almost all these Sports we can
00:37:09
add it to what we're doing and improve
00:37:10
the model going
00:37:14
forward Paul Bessire makes predictions
00:37:16
for hockey games the algorithm searches
00:37:19
historical data for recurring patterns
00:37:22
how do individual players react in
00:37:24
certain
00:37:25
situations which strategy does the
00:37:27
manager
00:37:32
choose hockey is actually the most
00:37:34
difficult sport because it's hard to
00:37:35
really understand what a play means
00:37:37
right the actual impact of individual
00:37:38
players on a given play and
00:37:41
understanding just what a play is is
00:37:43
very difficult within hockey because it
00:37:44
moves at such a fast rate and because
00:37:46
the puck never officially starts from
00:37:48
one team and goes to the other it's
00:37:50
basically going back and forth Paul Bessire's
00:37:52
prediction machine anticipates a game by
00:37:55
running 50,000 simulations on all
00:37:57
conceivable variants it does this in
00:38:00
real time even during a game Sports
00:38:03
punters are especially interested in
00:38:05
this technology Paul Bessire's website already
00:38:08
has more than 10,000 paying
00:38:11
subscribers a success rate of 56%
00:38:14
doesn't sound much it's far away from a
00:38:17
safe bet but for now it's the most
00:38:19
reliable forecast for sporting
00:38:23
events there is a concept that heart and
00:38:27
will and some of these other kind of
00:38:30
glorified traits that that some athletes
00:38:33
are considered to have in some
00:38:34
circumstances versus others is impactful
00:38:36
and maybe it is maybe it exists but
00:38:38
that's already in that person it's very
00:38:41
difficult for somebody to immediately
00:38:42
and and quickly have a change of heart
00:38:44
or a change of skill or a change of
00:38:46
talent I've worked and looked at uh at
00:38:49
machines which you would assume would be
00:38:50
far more predictable and I still feel
00:38:52
people are the Ultimate Machine in terms
00:38:55
of being able to understand what they
00:38:56
are likely to do when there is a certain
00:38:58
objective at hand on the
00:39:01
field humans as the ultimate machines
00:39:04
here in Boston we find a company that
00:39:06
claims it can predict not only the fate
00:39:08
of individuals or teams but of whole
00:39:15
Nations the Boston Globe calls it the
00:39:18
Nostradamus of the digital age its
00:39:21
investors include Google and the CIA
00:39:26
[Music]
00:39:29
I'm not sure Nostradamus had any
00:39:30
methodology to his workings you know we
00:39:32
don't know how he did it uh so we
00:39:34
believe that we are a bit more
00:39:37
scientific Swedish-born Staffan Truvé is
00:39:40
one of the founders of Recorded
00:39:43
Future he claims to be able to predict
00:39:46
riots Wars and revolutions based on
00:39:48
information that's floating around the
00:39:50
internet closer you can see this
00:39:53
area more
00:39:55
detail Recorded Future allegedly
00:39:58
predicted the overthrow of Egyptian
00:40:00
president Morsi in July
00:40:02
2013 well at least they knew that
00:40:04
trouble was
00:40:07
brewing what happened last year in in
00:40:09
June when Morsi was thrown out as
00:40:11
President we had we saw four or five
00:40:14
days beforehand that there was something
00:40:16
big going to happen you know we couldn't
00:40:18
know exactly what would happen you know
00:40:19
and and it could have backlashed you
00:40:21
know Morsi could have done something
00:40:22
dramatic to stay in power but at
00:40:25
least we saw very clear in our system
00:40:26
that you know there were very dark
00:40:29
clouds in the sky a little bit into the
00:40:31
future every day Recorded Future scours
00:40:34
the internet for millions of documents
00:40:36
in seven languages texts videos and
00:40:39
audio files are searched for specific
00:40:41
keywords the resulting prognoses are
00:40:43
sold to all those who don't like
00:40:45
surprises commercial companies
00:40:47
governments and intelligence
00:40:49
agencies
00:40:52
explosion newspap were on the
00:40:57
[Music]
00:41:01
in this case uh if you were to do an
00:41:03
aggression from Russia on the Ukraine
00:41:05
you would probably do something about
00:41:06
the natural gas supply which is
00:41:08
something which you need to to do
00:41:09
beforehand so reports on something you
00:41:13
know happening to the natural gas
00:41:14
supply to the Ukraine would be a
00:41:15
possible
00:41:16
indicator uh there's also been numerous
00:41:19
reports over the years of this
00:41:20
motorcycle gang called the Night Wolves
00:41:23
which have some kind of tie to the
00:41:25
Russian government and they were
00:41:27
actually seen in Crimea before the
00:41:29
conflict escalated so there were
00:41:32
definitely signals there you know and a
00:41:34
good analyst would probably be knowing
00:41:35
how to look for exactly these
00:41:38
signals data analysts now do the work of
00:41:41
spies and agents but what's their
00:41:44
motivation world peace or world
00:41:50
domination they're a bunch of companies
00:41:53
trying to essentially get away with
00:41:56
making as much money as they can the
00:41:59
state is using them to pursue its own
00:42:01
objectives and that's the reality right
00:42:04
you can talk about the internet Big Data
00:42:06
algorithms digitization deeply
00:42:09
alienating effects of all of this great
00:42:11
for me just it's not going to explain
00:42:13
99% of what's Happening which revolves
00:42:16
around those two simple factors A there are
00:42:18
companies B you have the state actively
00:42:21
encouraging them to expand because it
00:42:23
suits its own agendas whether it's
00:42:25
fighting Terror promoting innovation or
00:42:27
you name it a future without disasters
00:42:30
Wars and epidemics because we can
00:42:33
eliminate problems before they arise all
00:42:36
thanks to computer
00:42:38
algorithms this will remain a dream even
00:42:41
today the problem is not a lack of
00:42:43
analysis but a lack of will to
00:42:48
act back to the fate of the
00:42:51
individual Pina once again meets up with
00:42:53
Dr
00:42:54
Scher what has the depression app found
00:42:57
out about her mental
00:43:01
state this shows you the frequency of
00:43:03
your phone
00:43:04
calls on Mondays you make significantly
00:43:07
more calls than during the rest of the
00:43:12
week on Sundays you send virtually no
00:43:15
text
00:43:16
messages but on Wednesdays Fridays and
00:43:19
Saturdays you send lots of
00:43:23
messages if we look at your mood over
00:43:25
the week you can see that on Sunday you
00:43:27
feel
00:43:29
great and during the 8 weeks we
00:43:31
monitored you you consistently felt low
00:43:33
on
00:43:35
Tuesdays but that doesn't mean that you
00:43:37
suffer from
00:43:42
depression
00:43:44
interesting I didn't know that Tuesday
00:43:47
is my black
00:43:49
day so I make lots of calls early in the
00:43:52
week and then I feel
00:43:54
bad you know exactly
00:44:00
we can also say that you're good at
00:44:01
managing stress because your mobile
00:44:03
phone Behavior doesn't change depending
00:44:05
on how you feel or how stressed you
00:44:09
[Music]
00:44:11
are this means that you can handle
00:44:13
stress
00:44:14
well and that in turn means that the
00:44:17
probability of you getting a stress
00:44:19
related disease is rather small
00:44:26
[Music]
00:44:28
in this case Pina's quite happy to trust
00:44:31
the algorithm's
00:44:34
predictions the trips are coming to an
00:44:36
end Jacob is on his way back home in a
00:44:40
few days the experts will present them
00:44:42
with the results of the
00:44:50
[Music]
00:44:53
experiment for 2 months they've been
00:44:55
collecting Pina's and Jacob's
00:44:59
data the analysis has taken Georg Fuchs
00:45:02
and Alexander Markowetz another 4
00:45:05
[Music]
00:45:10
weeks well Jacob you lead a very
00:45:12
interesting
00:45:14
life in Jacob's case the algorithm
00:45:16
failed his lifestyle is simply too
00:45:22
unpredictable the problem was that we
00:45:24
couldn't find any patterns during the
00:45:26
time we collected your
00:45:28
data prediction relies on regular patterns
00:45:31
without them you simply can't generate
00:45:32
meaningful
00:45:33
statistics this means that we have to
00:45:35
completely abandon the forecast this
00:45:38
isn't a crystal
00:45:40
[Music]
00:45:43
ball this is math this is statistics which
00:45:47
means we need to have patterns that are
00:45:48
present in the
00:45:51
data Jacob did a lot of traveling so in
00:45:54
terms of geography we couldn't really
00:45:56
determine the pattern
00:45:59
but that's fine Jacob is what we call an
00:46:02
outlier in computer
00:46:04
science these are people whose behavior
00:46:06
is so unique that they have no
00:46:08
similarities with anybody else they
00:46:10
simply stand
00:46:14
out in Pina's data however the algorithm
00:46:17
was able to find
00:46:19
[Music]
00:46:23
patterns you weren't an easy candidate
00:46:26
either because you have a completely
00:46:27
different lifestyle from a person who
00:46:29
works at a bank or keeps regular office
00:46:32
hours you also don't use your phone as
00:46:35
much as many other people but what's
00:46:37
really amazing is that we do find solid
00:46:39
rhythms and fairly fixed patterns in
00:46:41
your weekly
00:46:44
routine at 8 you crawl out of bed at
00:46:48
9:30 you arrive at your office you're
00:46:50
not a vegetarian because you regularly
00:46:52
go to a kebab shop you don't own a car
00:46:55
you work from home alone
00:46:58
you don't have
00:46:59
kids you go to sleep late usually at
00:47:02
half
00:47:06
past midnight you have a partner who lives
00:47:08
with you because your data doesn't show
00:47:10
the typical pattern of you sleeping in a
00:47:12
second apartment half of the
00:47:20
week Georg Fuchs a total stranger now knows
00:47:24
more about Pina than some of her closest
00:47:26
friends
00:47:29
this amazes me I wouldn't have thought
00:47:32
that I lead such a regular
00:47:34
life I wasn't aware that I have so many
00:47:37
set
00:47:40
habits the computer can even calculate a
00:47:42
precise forecast for one particular day
00:47:45
in Pina's
00:47:49
life the prediction and the entries in
00:47:51
Pina's diary are an exact match
00:47:59
by looking at our data computer analysts
00:48:02
can learn as much about us as our
00:48:03
closest friends and this is just the
00:48:09
beginning for me Silicon Valley is a
00:48:12
cult right which operates in its own
00:48:14
language which has its own uh Gods and
00:48:19
which has its own Theology and values
00:48:21
and that
00:48:23
cult you know was a celebration of
00:48:25
disruption
00:48:28
has now more or less invaded all the
00:48:31
other domains from education to health to
00:48:33
security to crime prevention you name
00:48:39
it many of us Embrace this cult or at
00:48:43
least don't see it as something
00:48:49
dangerous but we need to be careful not
00:48:51
to trust statistics too much
00:48:58
algorithms will never be able to predict
00:49:00
with 100% certainty whether a child will
00:49:02
be successful in school or whether
00:49:05
someone will commit a crime in the
00:49:09
future but in the end the question is
00:49:12
not how accurate the algorithms are the
00:49:16
crucial question is how willing we are
00:49:18
to trust them and base our decisions on
00:49:21
their results
00:49:30
the threat isn't an Orwell
00:49:32
scenario it's not so much the Spy trying
00:49:34
to read my thoughts trying to find out
00:49:36
how subversive I
00:49:38
am the threat is more like Huxley's
00:49:40
Brave New World or Kafka's
00:49:46
The Trial in these scenarios I'm
00:49:48
controlled simply based on probability I'm
00:49:51
told you are not allowed to enroll in a
00:49:53
good school because you're not likely to
00:49:55
succeed and when I question the system
00:49:58
then I'm told stop doing this or you
00:50:00
make yourself even more
00:50:05
suspicious but whether we like it or not
00:50:08
these developments can't be stopped as
00:50:11
so often it's up to us what we make of
00:50:16
it we're in the process of completely
00:50:19
rebuilding Society this is a radical
00:50:21
change for better or for worse it would
00:50:24
be naive to think that we can stop this
00:50:26
the question is how can we help to shape
00:50:29
this Vision so that the result is more
00:50:31
humane
00:50:37
[Music]
00:50:53
[Music]
00:50:57
oh
00:50:58
[Music]