Statistical Learning: 1.2 Examples and Framework

00:12:13
https://www.youtube.com/watch?v=B9s8rpdNxU0

Overview

TL;DR: The video provides an overview of supervised and unsupervised learning, key concepts in statistical learning and machine learning. It defines outcome measurements (y) and predictor measurements (x), explaining the differences between regression and classification problems. The objectives of supervised learning include predicting unseen cases, understanding input effects, and assessing prediction quality. The importance of grasping the underlying ideas behind methods is emphasized. Unsupervised learning is introduced as a method to find patterns in unlabeled data. The Netflix Prize competition is highlighted as a practical example of applying these concepts, showcasing the excitement and ongoing research in the field.

Key Takeaways

  • 📊 Supervised learning uses labeled data to predict outcomes.
  • 🔍 Unsupervised learning finds patterns in unlabeled data.
  • 💡 Regression predicts continuous outcomes; classification predicts categories.
  • 🏆 The Netflix Prize improved recommendation systems through competition.
  • 📚 Understanding methods is crucial for effective application.
  • 🖥️ R is a key tool for data analysis in this course.
  • 🔄 Unsupervised learning can preprocess data for supervised learning.
  • 📈 Statistical learning focuses on model interpretation and uncertainty.
  • 🌐 Machine learning often deals with larger datasets and pure prediction.
  • 🎓 Course materials, including textbooks, are available for free.

Timeline

  • 00:00:00 - 00:05:00

    The discussion begins with an introduction to supervised learning, defining key terms such as outcome measurement (y) and predictor measurements (x). It distinguishes between regression and classification problems, emphasizing the goal of accurately predicting unseen test cases and understanding the influence of inputs on outcomes. The importance of grasping the underlying ideas of various techniques is highlighted, as well as the necessity of assessing the quality of predictions and the potential need for algorithm improvement or better data collection.

  • 00:05:00 - 00:12:13

    The concept of unsupervised learning is introduced, contrasting it with supervised learning. In unsupervised learning, there is no outcome variable, and the objective is to understand how data is organized and identify important features. Techniques like clustering and principal components are discussed, along with the challenges of evaluating performance without a gold standard. The Netflix Prize example illustrates the practical application of these concepts, showcasing the competition's impact on research and the development of new techniques in the field.

Video Q&A

  • What is supervised learning?

    Supervised learning is a type of machine learning where the model is trained on labeled data, meaning the outcome variable is known.

  • What is unsupervised learning?

    Unsupervised learning involves training a model on data without labeled outcomes, focusing on finding patterns or groupings in the data.

  • What are some examples of supervised learning problems?

    Examples include regression problems (predicting continuous outcomes like price) and classification problems (predicting categorical outcomes like survival status).

  • What is the Netflix Prize?

    The Netflix Prize was a competition to improve Netflix's movie recommendation system, offering a $1 million prize for a 10% improvement over their existing algorithm.

  • What is the difference between statistical learning and machine learning?

    Statistical learning focuses on model interpretation and uncertainty, while machine learning emphasizes prediction accuracy and often deals with larger datasets.

  • What tools will be used in this course?

    The course will use R, a free software environment for statistical computing and graphics.

  • Are the course materials free?

    Yes, both the course and the associated textbooks will be available for free.

  • What is the importance of understanding the methods in machine learning?

    Understanding the methods helps in applying them effectively to new problems and assessing their performance.

  • What are some techniques covered in unsupervised learning?

    Techniques include clustering and principal components analysis.

  • Why is unsupervised learning important?

    It helps in organizing data and can serve as a preprocessing step for supervised learning.

Transcript (en)

  • 00:00:01
    Okay, now we're going to talk about the supervised learning problem and set down a little bit of notation. We'll have an outcome measurement y, which goes by various names: dependent variable, response, or target. And we'll have a vector of p predictor measurements, usually called x, which go by the names inputs, regressors, covariates, features, or independent variables. We distinguish two cases. In the regression problem, y is quantitative, such as price or blood pressure. In the classification problem, y takes values in a finite, unordered set, such as survived or died, the digit classes zero to nine, or the cancer class of a tissue sample. Now, we have training data pairs (x1, y1), (x2, y2), up to (xn, yn). So again, x1 is a vector of p measurements, y1 is usually a single response variable, and these are examples or instances of these measurements.

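To make the two cases concrete, here is a minimal R sketch (R being the course's tool) that fits a regression model to a quantitative outcome and a classification model to a categorical one. The simulated data and variable names are illustrative, not from the lecture.

```r
# Simulated training data: n observations, p = 2 predictor measurements.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)

# Regression problem: y is quantitative (think price or blood pressure).
y_quant <- 3 + 2 * x1 - x2 + rnorm(n)
fit_reg <- lm(y_quant ~ x1 + x2)

# Classification problem: y takes values in a finite, unordered set.
y_class <- factor(ifelse(x1 + x2 + rnorm(n) > 0, "survived", "died"))
fit_clf <- glm(y_class ~ x1 + x2, family = binomial)

summary(fit_reg)$coefficients  # estimated effect of each input on y
```
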
  • 00:01:00
    The objectives of supervised learning are as follows. On the basis of the training data, we would like to: accurately predict unseen test cases; understand which inputs affect the outcome, and how; and assess the quality of our predictions and inferences.

  • 00:01:18
    By way of philosophy, as you take this course: we want not just to give you a laundry list of methods. We want you to understand the ideas behind the various techniques, so you know where and when to use them, because in your own work you're going to have problems that we've never seen before, and you want to be able to judge which methods are likely to work well and which are not. Not only is prediction accuracy important; it's also important to try simple methods first, in order to grasp the more sophisticated ones. We're going to spend quite a bit of time on linear models, linear regression and linear logistic regression. These are simple methods, but they're very effective. It's also important to understand how well a method is doing. It's easy to apply an algorithm; nowadays you can just run software. But it's difficult, and also very important, to figure out how well the method is actually working, so you can tell your boss or your collaborator: when you apply this method we've developed, this is how well you're likely to do tomorrow. In some cases you won't do well enough to actually use the method, and you'll have to improve your algorithm or maybe collect better data.

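The standard way to answer "how well is the method actually working?" is to hold out test cases the model never saw during fitting. A minimal sketch of that idea in R, with simulated data and illustrative names:

```r
# Simulated data for a simple regression problem.
set.seed(2)
n <- 200
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)

# Hold out half the observations as unseen test cases.
train <- sample(n, n / 2)
fit   <- lm(y ~ x, subset = train)

# Assess prediction quality on the held-out cases only.
pred <- predict(fit, newdata = data.frame(x = x[-train]))
sqrt(mean((y[-train] - pred)^2))  # root mean squared test error
```
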
  • 00:02:20
    Another thing we want to convey through the course, and hopefully through the examples, is that this is a really exciting area of research. Statistics in general is a very hot area, and statistical learning and machine learning are of more and more importance. It's really exciting that the area hasn't gelled in any way, in the sense that there are a lot of good methods out there but also a lot of challenging problems that aren't solved, especially in recent years, Rob, with the onset of big data and the coining of the term data science. Right, and statistical learning, as Trevor mentioned, is a fundamental ingredient in this new area of data science.

  • 00:02:56
    So you might be wondering where the term supervised learning comes from. It's actually a very clever term, and I'd like to take credit for it, but I can't; it was developed by someone, I think, in the machine learning area. The idea is that of a supervisor: think of a kindergarten teacher trying to teach a child to discriminate between, say, what a house is and what a bike is. He might show the child, maybe Johnny: "Johnny, here are some examples of what a house looks like", maybe in Lego blocks, "and here are some examples of what a bike looks like." He tells Johnny this and shows him examples of each of the classes, and then the child learns: ah, I see, a house has got sort of square edges, and a bike has got more rounded edges, and so on. That's supervised learning, because he's been given examples of labeled training observations; he's been supervised. And as Trevor just sketched out on the previous slide, the y there is given, and the child tries to learn to classify the two objects based on the features, the x's.

  • 00:03:57
    Now, unsupervised learning is another topic of this course. In unsupervised learning, back in the kindergarten (now Trevor's in kindergarten), the child, Trevor, was not given examples of what a house and a bike were. He just sees lots of things on the ground: maybe some houses, some bikes, some other things. So this data is unlabeled; there's no y. The problem for the child, since it's unsupervised, is to try to organize in his own mind the common patterns of what he sees. He may look at the objects and say: these three things are probably houses; he doesn't know they're called houses, but they're similar to each other because they have common features. These other objects, maybe they're bikes or other things; they're similar to each other because he sees some commonality. And that brings up the idea of trying to group observations by similarity of features, which is going to be a major topic of this course: unsupervised learning.

  • 00:05:00
    More formally, there's no outcome variable measured, just a set of predictors, and the objective is fuzzier. It's not to predict y, because there is no y; it's rather to learn how the data is organized and to find which features are important for that organization. We'll talk about clustering and principal components, which are important techniques for unsupervised learning. One of the other challenges is that it's hard to know how well you're doing. There's no gold standard, there's no y; so when you've done a clustering analysis, you don't really know how well you've done. That's one of the challenges, but nonetheless it's an extremely important area. One reason is that unsupervised learning is an important preprocessor for supervised learning: it's often useful to organize your features, or choose features, based on the x's themselves, and then use those processed or chosen features as input into supervised learning. The last point is that it's a lot easier, a lot more common, to collect unlabeled data. On the web, for example, a computer algorithm can just scan the web and grab movie reviews; figuring out whether a review is positive or negative, on the other hand, often takes human intervention. So it's much harder and more costly to label data, and much easier just to collect unlabeled data.

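As a minimal sketch of the two techniques just mentioned, here are clustering and principal components run on unlabeled simulated data in R; note that no y appears anywhere.

```r
# Unlabeled data: 100 observations on 2 features, with two hidden groups
# (the group identities are never shown to the algorithms below).
set.seed(3)
x <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
           matrix(rnorm(100, mean = 3), ncol = 2))

# Clustering: group observations by similarity of their features.
km <- kmeans(x, centers = 2, nstart = 20)
table(km$cluster)

# Principal components: find directions that organize the data.
pc <- prcomp(x, scale. = TRUE)
summary(pc)  # proportion of variance explained by each component
```
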
  • 00:06:24
    The last example we're going to show you is a wonderful one: the Netflix Prize. Netflix is a movie rental company in the US; now you can get the movies online, but they used to be DVDs that were mailed out. Netflix set up a competition to try and improve on their recommender system. They created a data set with 400,000 Netflix customers and 18,000 movies, and each of these customers had rated, on average, around 200 movies, so each customer had seen only about one percent of the movies. So you can think of this as a very big matrix which is very sparsely populated with ratings between one and five, and the goal, as in all recommender systems, is to predict what the customers would think of the other movies based on what they've rated so far.

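To picture that setup, here is a toy R sketch of a sparsely populated ratings matrix, at a tiny scale rather than 400,000 x 18,000. The baseline predictor (each movie's mean rating) is only an illustration, not Netflix's actual algorithm.

```r
set.seed(4)
n_users  <- 8
n_movies <- 10
R <- matrix(NA, n_users, n_movies)  # NA = movie not rated by that user

# Each customer rates only a few movies, on a 1-5 scale.
for (u in 1:n_users) {
  seen <- sample(n_movies, 3)
  R[u, seen] <- sample(1:5, 3, replace = TRUE)
}

# Naive baseline: predict each missing rating by the movie's mean rating.
movie_means <- colMeans(R, na.rm = TRUE)
movie_means[is.nan(movie_means)] <- mean(R, na.rm = TRUE)  # unrated movies
pred <- matrix(movie_means, n_users, n_movies, byrow = TRUE)
pred[!is.na(R)] <- R[!is.na(R)]  # keep the observed ratings as-is
```
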
  • 00:07:21
    So Netflix set up a competition where they offered a one-million-dollar prize for the first team that could improve on their rating system by 10 percent, by some measure. The design of the competition was very clever; I don't know if it was by luck or not, but the root mean squared error of the original algorithm was about 0.953, on a scale of, again, one to five. When they announced the competition and put the data on the web, it took the community about a month or so to come up with an algorithm that improved upon that, but it then took about another three years for someone to actually win the competition.

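For reference, root mean squared error here means the square root of the average squared difference between predicted and actual ratings. A quick worked illustration in R, taking the quoted ~0.953 figure and the 10 percent target as given:

```r
# RMSE of some hypothetical predictions against actual 1-5 ratings.
actual <- c(4, 3, 5, 2, 4)
pred   <- c(3.8, 3.4, 4.1, 2.6, 4.4)
sqrt(mean((pred - actual)^2))

# The prize target: beat the original algorithm's RMSE by 10 percent.
0.953 * (1 - 0.10)  # roughly 0.858
```
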
  • 00:07:59
    It's a great example. Here's the leaderboard at the time the competition ended. It was eventually won by a team called BellKor's Pragmatic Chaos, but a very close second was The Ensemble; in fact, they had the same score up to four decimal points, and the final winner was determined by who submitted the final predictions first.

  • 00:08:24
    So this was a wonderful competition, but what was especially wonderful was the amount of research that it generated. Tens of thousands of teams all over the world entered this competition over the period of three years, and a whole lot of new techniques were invented in the process. A lot of the winning techniques ended up using a form of principal components in the presence of missing data.

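A rough sketch in R of that "principal components with missing data" idea: fill in the missing entries, take a low-rank SVD of the filled-in matrix, use the reconstruction as new fill-ins, and repeat. This is a generic hard-impute style iteration, a simplified stand-in for what the winning teams actually did.

```r
set.seed(5)
# Toy ratings matrix with missing entries (NA), on a 1-5 scale.
R   <- matrix(sample(c(1:5, NA), 60, replace = TRUE), 6, 10)
obs <- !is.na(R)

X <- R
X[!obs] <- mean(R, na.rm = TRUE)  # start from the global mean rating
k <- 2                            # number of components to keep

for (iter in 1:50) {
  s  <- svd(X)
  # Rank-k reconstruction of the current filled-in matrix.
  Xk <- s$u[, 1:k] %*% diag(s$d[1:k]) %*% t(s$v[, 1:k])
  X[!obs] <- Xk[!obs]             # update only the missing entries
}
X[!obs]  # imputed (predicted) ratings for the unobserved cells
```
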
  • 00:08:48
    How come our names are not on that list, Trevor? Where's our team? That's a good point, Rob; the page isn't long enough. I think if we went down a few hundred places, you might find us. Actually, seriously, we did try: when the competition started, we spent about three or four months, with a graduate student, trying to win it. One of the problems was computation; the data was so big, and our computers were not fast enough, that just trying things out took too long. We realized that the graduate student was probably not going to succeed, and was probably going to waste three years of his graduate program, which is not a good idea for his career, so we basically abandoned ship early on.

  • 00:09:27
    I mentioned at the beginning the field of machine learning, which actually led to the statistical learning area we're talking about in this course. Machine learning itself arose as a subfield of artificial intelligence, especially with the advent of neural networks in the 80s. So it's natural to wonder: what's the relationship between statistical learning and machine learning? First of all, the question is hard to answer; we ask it often. There's a lot of overlap. Machine learning tends to work at larger scales, on bigger problems, although the gap is closing because fast computers are becoming much cheaper. Machine learning worries more about pure prediction and how well things predict. Statistical learning also worries about prediction, but it also tries to come up with models and methods that can be interpreted by scientists and others, and, in judging how well a method is doing, we worry more about precision and uncertainty. But again, the distinctions have become more and more blurred, and there's a lot of cross-fertilization between the fields. Machine learning clearly has the upper hand in marketing: they tend to get much bigger grants, and their conferences are in much nicer places. But we're trying to change that, starting with this course.

  • 00:10:41
    So here's the course text, An Introduction to Statistical Learning. We're very excited; this is a new book by two of our past graduate students, Gareth James and Daniela Witten, along with Rob and myself. The book just came out in August 2013, and this course will cover it in its entirety. At the end of each chapter there are examples run through in the R computing language, and we do sessions on R, so when you take this course you'll actually learn to use R as well. R is a wonderful environment: it's free, and it's a really nice way of doing data analysis. You'll see there's a second book there, our more advanced textbook, The Elements of Statistical Learning, which has been around for a while; it will serve as a reference book for this course, for people who want to understand some of the techniques in more detail. Now, the nice thing is that not only is this course free, but these books are free as well. The Elements of Statistical Learning has been free for a while, with the PDF available on our websites, and this new book is going to be free at the beginning of January, when the course begins, by agreement with the publishers. But if you want to buy the book, that's okay too; it's nice having the hard copy. We hope you enjoy the rest of the class.

Tags
  • supervised learning
  • unsupervised learning
  • regression
  • classification
  • Netflix Prize
  • data science
  • machine learning
  • statistical learning
  • R programming
  • clustering