[00:00:01] Okay, now we're going to talk about the supervised learning problem and set down a little bit of notation. We'll have an outcome measurement Y, which goes by various names: dependent variable, response, or target. And then we'll have a vector of p predictor measurements, usually called X, which go by the names inputs, regressors, covariates, features, or independent variables.
[00:00:24] We distinguish two cases. One is the regression problem, where Y is quantitative, such as price or blood pressure. In the classification problem, Y takes values in a finite, unordered set, such as survived or died, the digit classes 0 through 9, or the cancer class of a tissue sample.
[00:00:43] Now, we have training data: pairs (x1, y1), (x2, y2), up to (xn, yn). So again, x1 is a vector of p measurements, and y1 is usually a single response variable. These pairs are examples, or instances, of the measurements.
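As a minimal sketch of this setup in R (the course's language), using simulated data and illustrative variable names of our own choosing:

```r
# A minimal sketch of the supervised learning setup (simulated data;
# the names n, p, X, y_quant, y_class are illustrative, not from the lecture).
set.seed(1)
n <- 100; p <- 3                        # n training cases, p predictor measurements
X <- matrix(rnorm(n * p), n, p)         # each row x_i is a vector of p measurements
colnames(X) <- c("x1", "x2", "x3")

# Regression: y is quantitative (think price or blood pressure)
y_quant <- 2 * X[, 1] - X[, 2] + rnorm(n)

# Classification: y takes values in a finite, unordered set
y_class <- factor(ifelse(X[, 1] + rnorm(n) > 0, "survived", "died"))
```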
[00:01:00] The objectives of supervised learning are as follows. On the basis of the training data we would like to: accurately predict unseen test cases; understand which inputs affect the outcome, and how; and assess the quality of our predictions and inferences.
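Continuing the simulated data above, here is a hedged illustration of those objectives: fit on the training cases, predict held-out "unseen" cases, and score the predictions (this split-and-score recipe is our illustration, not a specific method from the lecture):

```r
# Fit on training data, then assess predictions on held-out cases.
dat   <- data.frame(X, y = y_quant)
train <- sample(n, 70)                            # 70 cases for training
fit   <- lm(y ~ x1 + x2 + x3, data = dat[train, ])
pred  <- predict(fit, newdata = dat[-train, ])    # predict the unseen test cases
mean((dat$y[-train] - pred)^2)                    # test mean squared error
summary(fit)$coefficients                         # which inputs affect the outcome, and how
```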
[00:01:18] By way of philosophy: as you take this course, we want not just to give you a laundry list of methods. We want you to understand the ideas behind the various techniques, so you know where and when to use them. Because in your own work, you're going to have problems that we've never seen before, that you've never seen before, and you want to be able to judge which methods are likely to work well and which ones are not.
[00:01:40] Not only is prediction accuracy important; it's also important to try simple methods first, in order to grasp the more sophisticated ones. We're going to spend quite a bit of time on linear models: linear regression and linear logistic regression. These are simple methods, but they're very effective.
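For example, logistic regression on the simulated classification response from earlier; a minimal sketch using R's standard glm with family = binomial:

```r
# Logistic regression: a simple but very effective classifier.
fit_log <- glm(y_class ~ x1 + x2 + x3,
               data = data.frame(X, y_class), family = binomial)
probs <- predict(fit_log, type = "response")      # fitted P(y_class = "survived")
table(predicted = probs > 0.5, actual = y_class)  # simple confusion table
```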
[00:01:55] It's also important to understand how well a method is doing. It's easy to apply an algorithm; nowadays you can just run software. But it's difficult, and also very important, to figure out how well the method is actually working, so you can tell your boss or your collaborator: when you apply this method we've developed, this is how well you're likely to do tomorrow. In some cases you won't do well enough to actually use the method, and you'll have to improve your algorithm, or maybe collect better data.
[00:02:20] The other thing we want to convey through the course, and hopefully through the examples, is that this is a really exciting area of research. Statistics in general is a very hot area, and statistical learning and machine learning are of more and more importance. And it's really exciting that the area has not gelled in any way, in the sense that there are a lot of good methods out there, but also a lot of challenging problems that aren't solved. Especially in recent years, with the onset of big data, people have coined the term data science, and statistical learning, as Trevor mentioned, is a fundamental ingredient in this new area of data science.
[00:02:56] You might be wondering where the term supervised learning comes from. It's actually a very clever term, and I'd like to take credit for it, but I can't: it was developed by someone in, I think, the machine learning area. There is a supervisor; you can think of a teacher in a kindergarten trying to teach a child to discriminate between, say, what a house is and what a bike is. So the teacher might show the child, maybe Johnny: "Johnny, here are some examples of what a house looks like," maybe in Lego blocks, "and here are some examples of what a bike looks like." He tells Johnny this and shows him examples of each of the classes, and the child then learns: "Ah, I see, a house has got sort of square edges, and a bike has got some more rounded edges," et cetera. That's supervised learning, because he's been given examples of labeled training observations; he's been supervised.
[00:03:45] And as Trevor just sketched out on the previous slide, the Y there is given, and the child tries to learn to classify the two objects based on the features, the X's.
[00:03:57] Now, unsupervised learning is another topic of this course, and the way Trevor grew up. (I see, that's the problem!) Okay, so in unsupervised learning, back in the kindergarten, the child was not given examples of what a house and a bike were. He just sees lots of things on the ground: maybe some houses, some bikes, some other things.
[00:04:22] So this data is unlabeled: there's no Y. (Oh, that's pretty sharp, Rob.) Okay, so the problem for the child now, since it's unsupervised, is to try to organize in his own mind the common patterns of what he sees. He may look at the objects and say: these three things are probably houses. He doesn't know they're called houses, but they're similar to each other because they have common features. These other objects, maybe bikes or other things, are similar to each other because he sees some commonality.
[00:04:50] And that brings in the idea of trying to group observations by similarity of features, which is going to be a major topic of this course: unsupervised learning. More formally, there is no outcome variable measured, just a set of predictors, and the objective is fuzzier. It's not to predict Y, because there is no Y; rather, it's to learn how the data is organized, and to find which features are important for that organization. We'll talk about clustering and principal components, which are important techniques for unsupervised learning.
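As a hedged taste of both techniques on made-up, unlabeled data (kmeans and prcomp are base R's standard implementations; the two-cloud data here is our own toy construction):

```r
# Unsupervised learning: no y at all, just the x's.
set.seed(2)
Z <- rbind(matrix(rnorm(50 * 2), 50, 2),            # one cloud of points
           matrix(rnorm(50 * 2, mean = 3), 50, 2))  # and a shifted second cloud

km <- kmeans(Z, centers = 2)    # clustering: group observations by similarity
table(km$cluster)               # how many observations fell in each group

pc <- prcomp(Z, scale. = TRUE)  # principal components: main directions of variation
summary(pc)                     # proportion of variance explained per component
```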
[00:05:23] One of the other challenges is that it's hard to know how well you're doing. There's no gold standard; there's no Y. So when you've done a clustering analysis, you don't really know how well you've done. That's one of the challenges. But nonetheless, it's an extremely important area.
[00:05:40] One reason is that unsupervised learning can be an important preprocessor for supervised learning. It's often useful to try to organize your features, or choose features, based on the X's themselves, and then use those processed or chosen features as inputs into supervised learning.
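A minimal sketch of that preprocessing idea, reusing the simulated X and y_quant from earlier (this principal-components-then-regression recipe is our own illustration):

```r
# Derive features from the x's alone, then feed them to a supervised model.
pcs    <- prcomp(X, scale. = TRUE)$x[, 1:2]  # first two principal component scores
fit_pc <- lm(y_quant ~ pcs)                  # regress the outcome on derived features
summary(fit_pc)
```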
00:06:00
a lot easier it's a lot more common to
00:06:02
collect data which is unlabeled right
00:06:04
because on the web for example if you
00:06:06
look at movie reviews you can you can
00:06:08
a computer algorithm can just scan the
00:06:10
web and grab reviews
00:06:12
figuring out whether review on the other
00:06:13
hand is positive or negative often takes
00:06:15
human intervention so it's much harder
00:06:17
and costly to label data much easier
00:06:20
just to collect unsupervised unlabeled
00:06:22
data
[00:06:24] The last example we're going to show you is a wonderful one: the Netflix prize. Netflix is a movie rental company in the US; now you can get the movies online, but they used to be DVDs that were mailed out. Netflix set up a competition to try to improve on their recommender system. They created a data set with 400,000 Netflix customers and 18,000 movies, and each of these customers had rated, on average, around 200 movies. So each customer had seen only about one percent of the movies.
[00:07:06] You can think of this as a very big matrix which is very sparsely populated with ratings between one and five. The goal, as in all recommender systems, is to predict what the customers would think of the other movies, based on what they have rated so far.
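A toy version of that structure, with entirely made-up numbers, just to fix ideas: a customers-by-movies matrix that is mostly missing, plus the root mean squared error of a naive predictor that could serve as a baseline:

```r
# Toy customers-by-movies ratings matrix (made-up data; NA = not yet rated).
R <- matrix(NA, nrow = 5, ncol = 6)
R[cbind(c(1, 1, 2, 3, 4, 5, 5),                  # (customer, movie) pairs...
        c(2, 5, 1, 3, 6, 2, 4))] <- c(4, 1, 5, 3, 2, 4, 5)  # ...and their ratings

observed <- !is.na(R)
baseline <- mean(R, na.rm = TRUE)            # naive prediction: the overall mean rating
sqrt(mean((R[observed] - baseline)^2))       # root mean squared error of the baseline
```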
[00:07:21] So Netflix set up a competition where they offered a one-million-dollar prize for the first team that could improve on their rating system by 10 percent, by a certain measure. The design of the competition was very clever; I don't know whether it was by luck or not, but the root mean squared error of the original algorithm was about 0.953, on a scale of, again, one to five.
[00:07:45] When they announced the competition and put the data on the web, it took the community about a month or so to produce an algorithm that improved upon that. But it then took about another three years for someone to actually win the competition.
[00:07:59] So it's a great example. Here's the leaderboard at the time the competition ended. It was eventually won by a team called BellKor's Pragmatic Chaos, but a very close second was The Ensemble. In fact, they had the same score up to four decimal places, and the final winner was determined by who submitted the final predictions first.
[00:08:24] So this was a wonderful competition, but what was especially wonderful was the amount of research that it generated. Thousands, even tens of thousands, of teams all over the world entered this competition over the period of three years, and a whole lot of new techniques were invented in the process.
00:08:40
a lot of the winning techniques ended up
00:08:43
using a form of
00:08:45
principal components in the presence of
00:08:46
missing data how come our names not on
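One simple version of that idea is iterative low-rank imputation: fill in the missing entries, compute a low-rank SVD, refill from it, and repeat. The sketch below is our own minimal illustration of the general approach, applied to the toy ratings matrix above, not the winning teams' actual algorithms:

```r
# Rank-1 iterative imputation (illustrative only, not the winners' method).
Rhat <- R
Rhat[!observed] <- baseline                 # start by filling gaps with the mean
for (iter in 1:50) {
  s <- svd(Rhat)                            # principal components of the current fill-in
  low_rank <- s$d[1] * s$u[, 1, drop = FALSE] %*% t(s$v[, 1, drop = FALSE])
  Rhat[!observed] <- low_rank[!observed]    # update only the missing entries
}
round(Rhat, 2)                              # completed matrix of predicted ratings
```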
[00:08:48] How come our names are not on that list, Trevor? Where's our team? That's a good point, Rob; the page isn't long enough. I think if we went down a few hundred places, you might find us.
[00:08:58] Actually, seriously, we did try: when the competition started, we spent about three or four months, with a graduate student, trying to win it. One of the problems was computation. The data was so big, and our computers were not fast enough, that just trying things out took too long. We realized the graduate student was probably not going to succeed, and was probably going to waste three years of his graduate program, which is not a good idea for his career. So we basically abandoned ship early on.
[00:09:27] I mentioned at the beginning the field of machine learning, which actually led to the statistical learning area we're talking about in this course. Machine learning itself arose as a subfield of artificial intelligence, especially with the advent of neural networks in the 80s.
[00:09:46] So it's natural to wonder: what's the relationship between statistical learning and machine learning? First of all, the question is hard to answer; we ask it often, and there's a lot of overlap. Machine learning tends to work at larger scales; they tend to work on bigger problems, although the gap is closing, because fast computers are now becoming much cheaper. Machine learning worries more about pure prediction, and how well things predict. Statistical learning also worries about prediction, but tries as well to come up with models and methods that can be interpreted by scientists and others, and to assess how well the method is doing; we worry more about precision and uncertainty. But again, the distinctions have become more and more blurred, and there's a lot of cross-fertilization between the fields.
[00:10:29] Machine learning clearly has the upper hand in marketing: they tend to get much bigger grants, and their conferences are in much nicer places. But we're trying to change that, starting with this course.
[00:10:41] So, here's the course text: An Introduction to Statistical Learning. We're very excited; this is a new book by two of our past graduate students, Gareth James and Daniela Witten, along with Rob and myself. The book just came out in August 2013, and this course will cover it in its entirety.
[00:11:02] At the end of each chapter of the book, there are examples run through in the R computing language, and we do sessions on R. So when you take this course, you'll actually learn to use R as well. R is a wonderful environment: it's free, and it's a really nice way of doing data analysis.
[00:11:25] You'll see there's a second book there, which is our more advanced textbook, The Elements of Statistical Learning. It's been around for a while, and it will serve as a reference book for this course, for people who want to understand some of the techniques in more detail. Now, the nice thing is that not only is this course free, but these books are free as well. The Elements of Statistical Learning has been free for some time, with the PDF available on our websites, and this new book is going to be free at the beginning of January, when the course begins.
[00:11:57] That's by agreement with the publishers. If you want to buy the book, that's okay too; it's nice having the hard copy. But if you want it, the PDF is available. We hope you enjoy the rest of the class.