Class 10: Goodness of Fit: Saturated model, Covariate patterns, Deviance, Hosmer-Lemeshow statistic.

00:57:03
https://www.youtube.com/watch?v=uF34C5bMqhs

Summary

TLDR: The video covers how to assess the fit of statistical models, specifically in logistic regression. It explains why it matters how well a model describes the observed data, introducing the concept of the saturated model, which fits the data perfectly but is unusable in practice. Deviance statistics are discussed as a way to quantify how far a model departs from that ideal model. The video concludes with the Hosmer-Lemeshow test, a robust method for assessing fit that is particularly useful when the number of covariate patterns is large.

Takeaways

  • 📊 Goodness of fit measures how well a model describes the data.
  • 📉 The saturated model predicts perfectly but is impractical.
  • 🧮 The deviance compares the model of interest to a reference model.
  • 🆚 The Hosmer-Lemeshow test is a more user-friendly way to assess fit.
  • 🔍 Understanding covariate patterns is crucial for the analysis.
  • ⚖️ Deviance statistics measure departure from the ideal model.

Timeline

  • 00:00:00 - 00:05:00

    This segment introduces the concept of goodness of fit. Goodness of fit is about how well a model describes the available data; a model can produce statistically significant results without actually predicting the outcome well.

  • 00:05:00 - 00:10:00

    The lecturer stresses the importance of understanding the relationship between the model and the observed data, in both linear and logistic regression. Ideally the predicted values would coincide exactly with the observed values, which is rarely achieved in practice.

  • 00:10:00 - 00:15:00

    He defines the terms "good fit" and "lack of fit" and explains how they relate to the null hypothesis (good fit). Adding more predictors to a model generally improves the fit, but it can also lead to unusable models.

  • 00:15:00 - 00:20:00

    The lecturer introduces two approaches to assessing goodness of fit: deviance-based approaches and the Hosmer-Lemeshow test. Some approaches are ideal but impractical, while others are usable and give meaningful results.

  • 00:20:00 - 00:25:00

    The saturated model is described as a model that can perfectly predict the outcome for every data point, but it is impractical because it requires as many parameters as there are data points.

  • 00:25:00 - 00:30:00

    For the deviance, two reference models come into play: the saturated model and the fully parameterized model. The saturated model, though perfect, is not feasible for practical use.

  • 00:30:00 - 00:35:00

    The lecturer shows that some notation needs to be formalized in order to talk about model output. He illustrates this with tables and quantities such as Y (observed) and P (predicted).

  • 00:35:00 - 00:40:00

    The saturated model is presented as able to generate predictions that fit the data exactly, but such models are unwieldy. In contrast, the fully parameterized model predicts the average outcome within each combination of covariates and is more useful despite its complexity.

  • 00:40:00 - 00:45:00

    The fully parameterized model does not predict the exact 0/1 values, but it does adequately predict the average outcomes. This model is then used as the comparison for the model being evaluated.

  • 00:45:00 - 00:50:00

    One must check how many covariate patterns exist in a data set. If the number of patterns is too large, the fully parameterized model becomes nearly identical to the saturated model, which creates interpretability problems.

  • 00:50:00 - 00:57:03

    The segment concludes with an introduction to the Hosmer-Lemeshow test, which assesses how well a model predicts the outcomes by dividing the data into groups suited to evaluating fit. This test is useful in settings where the fully parameterized model would not be.

Video Q&A

  • What is the Hosmer-Lemeshow test?

    It is a test that evaluates how well the model predicts the observed outcomes by dividing the data into deciles of predicted risk and comparing the sums of the predictions to the sums of the observed outcomes (a rough code sketch follows this list).

  • Why is the saturated model not practical?

    Because it requires as many parameters as there are observations, which makes it far too complex and uninterpretable.

  • How is the deviance used to assess a model's fit?

    The deviance measures the gap between the likelihood of the model of interest and the likelihood of a reference model, usually the saturated model.

  • What is a fully parameterized model?

    A model that includes all the covariates and all of their possible interaction terms, but does not necessarily predict the 0/1 outcomes exactly.

  • What are the drawbacks of using the deviance with continuous data?

    If the number of covariate patterns is too large, the fully parameterized model resembles the saturated model, making the deviance uninformative.

  • How do covariate patterns affect goodness-of-fit tests?

    Increasing the number of continuous covariates can lead to a large number of distinct covariate patterns in the data, complicating the interpretation of the fit statistics.
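
The Q&A above describes the Hosmer-Lemeshow idea only in words. As a rough illustration of my own (not the lecture's code), a minimal sketch in Python might look like this, assuming you already have observed 0/1 outcomes y and model-predicted probabilities p_hat:

    import numpy as np
    import pandas as pd
    from scipy import stats

    def hosmer_lemeshow(y, p_hat, n_groups=10):
        """Rough Hosmer-Lemeshow statistic: bin subjects into groups (deciles)
        of predicted risk, then compare observed vs. expected event counts."""
        df = pd.DataFrame({"y": y, "p": p_hat})
        # qcut splits at quantiles of the predicted probabilities (deciles by default)
        df["group"] = pd.qcut(df["p"], q=n_groups, duplicates="drop")
        stat = 0.0
        for _, g in df.groupby("group", observed=True):
            n_g = len(g)
            observed = g["y"].sum()      # observed number of events in the group
            expected = g["p"].sum()      # sum of predicted probabilities
            pi_bar = expected / n_g
            stat += (observed - expected) ** 2 / (n_g * pi_bar * (1 - pi_bar))
        dof = df["group"].nunique() - 2  # conventional df: number of groups minus 2
        p_value = stats.chi2.sf(stat, dof)
        return stat, dof, p_value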

Subtitles (en)
  • 00:00:00
    Okay, so we're going to jump into goodness-of-fit, and this one gets a little esoteric before it gets practical. I feel like I always give these lectures where we go out on a limb and then come back in. The reason we do that is that with this topic we're going to show what the ideal scenario would be, and it turns out the ideal thing we want to do is impossible; the stuff we actually use comes later. Because we can't do the ideal thing, we still have to talk about the ideal thing, and that's the esoteric part. But we'll get there together; it will be okay.
  • 00:00:40
    So, goodness-of-fit: what are we talking about? It's really about how good the model is at describing the data that we have. The basic idea is that you could have a model with statistically significant results that looks great: nice odds ratios, tight confidence intervals, everything we've covered in class so far and all the metrics we've described make it look like a good model, and yet it doesn't actually describe the data very well. You can get significant results and the model still stinks at actually predicting the outcome, or predicting the probabilities of the outcome. Think of simple linear regression: you can have a scatterplot with some points on it, you draw a line, and the beta for that line is significant, a statistically significant nonzero slope, and the line cuts through your data. But your data might be a diffuse cloud that isn't very tight to the line, so the fit just isn't very good. The line goes through the data, you have a number for the slope, the p-value is tiny, and yet the line may not describe the variation in your data very well. You can get the same phenomenon with logistic regression: you fit the model, everything looks great, except it may not be the best model and may not describe the data very well.
  • 00:02:12
    So what we mean by fit, formally, is that for every observation, in the ideal world, we would really like the distance between the observed outcome y_i and the estimated outcome y_i-hat from the model to be zero. Those are the observed value and the predicted value, and we'd like them, on average, to be close. In general we want our model to predict things that are close to the observed data.
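    In symbols, just restating that, the ideal would be y_i - \hat{y}_i \approx 0 for every observation i, where y_i is the observed 0/1 outcome and \hat{y}_i is the model's prediction.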
  • 00:02:50
    The terms we often use in this line of work are "lack of fit" and "good fit". Lack of fit is evidence of bad fit: those distances tend to be big, meaning the model is predicting something that's way off from reality, the observed y's, and that typically means we're rejecting some null hypothesis. When we reject the null hypothesis, we say there is lack of fit; the null hypothesis is that there is good fit. This is the totally weird way we describe things in statistics: the null hypothesis is typically that the fit is good, or that there is a lack of evidence of bad fit. Remember all the tongue-twisting we do with null hypotheses, the "lack of evidence that this isn't true" phrasing; that's the idea here. The null hypothesis is typically that we don't have evidence the model is bad, and then we call it good fit. I'm spelling this out because it sounds like a pile of double negatives, but this is exactly how it's defined, so I want to put you up front with it: the null hypothesis is good fit, and the alternative is that there is a lack of fit. The other idea here is that when we add more predictors to the model we usually get a better fit; the more terms you add, the more the fit to the observed data goes up. At some point you've added too many predictors and your model is meaningless because you've put 500 things in it. This idea is used all over the place in linear regression too: you're trying to narrow the distance between what the model predicts for the outcome versus the observed data.
  • 00:04:36
    So we're going to talk about three approaches to dealing with goodness of fit today. The first two use this idea of the deviance, and we'll talk about what we mean by the deviance, but within that there are two subtypes. One deviance is based on something called the saturated model; that is the ideal, and it isn't actually useful or usable in practice. So we'll also talk about something that's almost as good, called the events/trials deviance, which is based on something called the fully parameterized model. That's the one we can actually use, whereas the saturated-model version is the one we would like to use but can't. Those are the deviance-based approaches. Then there's a totally different thing called the Hosmer-Lemeshow test; it's getting at the same idea of goodness of fit but with a completely different approach, and it's what we'll talk about at the end. In the literature, the Hosmer-Lemeshow test is the one you see the most. So there are two groups of methods here. The saturated-model approach is the ideal and we can't do it; for the last two, the ones you can do, we'll talk about which is better to use when. Hosmer-Lemeshow is the more universal test, though it has a few limits, and we'll talk about those; the events/trials deviance also has limits, and we'll talk about where you can't use it, which turns out to happen a lot.
  • 00:06:12
    To get to all of that, we're going to have to talk about some notation, because these quantities all look sort of similar; they all look like Y and P. I'm going to illustrate the notation and then differentiate the quantities so we can work through it, because they're really important to understanding the goodness-of-fit idea and how all the methods work. So here are four quantities, and here's a really simple two-by-two table: exposure on the columns, disease on the rows, with some data in it. And here are a few lines of data; pretend these are people, with their disease status, their exposure status, and a column that's supposed to say p-hat, which we'll get to. So this is the lines-of-data view and this is the two-by-two-table view.
  • 00:07:15
    So, Y_i: this is the easy one. These are the observed values of the outcome in your data. In linear regression these were continuous; here they're essentially your 0/1 outcomes. The Y_i's are the D variables, the 1, 0, 1 in the data lines, and in the two-by-two table the Y's are summarized as counts: here we have six diseased among the exposed, and so forth. Those are the observed values in the data. Then there's this quantity, p-bar with a subscript x_i (the subscript didn't come out on the slide): these are the average values of Y, of being diseased or not diseased, in your data at different values of your covariates. What does that mean? In the exposed column, six out of ten, or 0.6, have disease. That's the risk, or the probability of disease given that your exposure is 1, and 0.4 is the probability of disease given that you're unexposed. We call this quantity many things: risk, probability, and so forth; here we'll just say, in a really generic way, that it's the observed probability at these values of X. We have one X variable, exposure, so at exposure 1 this is the observed probability, and there it is for exposure 0. It turns out this is also equal to the average value of Y: there are six ones and four zeros, divide by 10 and you get 0.6; the mean of all the zeros and ones in that column is 0.6, the same thing. So it's the observed average value in your data; the proportion is the mean of the zeros and ones. That's p-bar sub x_i.
  • 00:09:48
    So, Y_i-hat: these are predicted values of Y from some model. This is notation we've seen before. I don't have Y_i-hats on this slide because we haven't actually fit a model yet. You fit some model with some covariates to your data, and you make the model predict values of the outcome at different values of X. Just like we have predicted odds ratios, you can use a logistic regression model to get predicted probabilities; that's what the Y_i-hats are: predictions that come from a model. Different models with different covariates are going to give you different predictions: fit a model with two covariates and it gives certain predictions for people; put in another three covariates and you get different predictions. The other quantities are fixed by your data set; the Y's are just what your data are. If you have 200 people in your study, you have 200 Y_i's, but you can get many, many Y-hats, depending on all the different models you fit to those 200 data points. So this quantity changes for every model you have.
  • 00:11:17
    Then this other quantity, p-hat sub x_i, is essentially the estimate of the probability of Y from your model, given some covariates. This is actually very similar to what I just said about Y_i-hat. We used the p-hat_{x_i} notation for logistic regression, and these are truly the predicted probabilities from a logistic regression model. I used that same language to describe the Y_i-hats, which are a very closely related concept, except that p-hat_{x_i} from a logistic regression model is always a probability: you fit a logistic regression model to your 0/1 data and it does not spit out zeros and ones, it spits out probabilities. It spits out 0.6 and 0.4, say, if you fit this model to these data, because the logistic regression model predicts the probability of the outcome, not the outcomes themselves. That's one of the features of the logistic regression model: your data are zeros and ones, but the model spits out probabilities that, on average, describe what happens in your population. That's a little different from Y_i-hat, which is from a model that might, in theory, spit out zeros and ones directly; our usual logistic regression model doesn't do that, but it's a theoretical quantity we might want to talk about. All that's to say: we have the observed data, the observed average values of the data, the predicted zeros and ones, and the predicted probabilities. These are paired ideas, and we're going to use these four quantities to talk about goodness of fit; the Y_i-hat one is a little more theoretical. Any questions? I know it's a bit of a tongue twister and the definitions all overlap, so we can come back to it. You can shout out or raise your hand; I'm approachable, I promise.
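    To make the observed quantities concrete, here is a small illustration of my own (not the lecture's code) using the toy table from the slides: ten exposed people with six diseased, ten unexposed with four diseased. The model-based quantities (Y_i-hat and p-hat) come from fitting a model, which is shown in a later sketch.

    import pandas as pd

    # Toy data matching the 2x2 table: 10 exposed (6 diseased), 10 unexposed (4 diseased)
    df = pd.DataFrame({
        "exposed": [1] * 10 + [0] * 10,
        "disease": [1] * 6 + [0] * 4 + [1] * 4 + [0] * 6,
    })

    # y_i: the observed 0/1 outcomes themselves
    print(df["disease"].tolist())

    # p-bar_{x_i}: the observed average of Y at each value of X
    # (the proportion diseased: 0.6 among the exposed, 0.4 among the unexposed)
    print(df.groupby("exposed")["disease"].mean())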
  • 00:13:41
    Okay, so there's this thing called the saturated model. A saturated model is a model that is able to generate 0/1 values; it's a type of logistic regression model that can actually generate 0/1 values, which is a weird thing, because we just said that a logistic regression model only spits out probabilities. I'll show you what it looks like. It is able to perfectly predict the outcome data. What that means is that if you have a data set with, say, 200 Y_i's, 200 observations, it produces predictions such that the difference between the prediction and the observed value is zero; it's a perfectly predicting model. And what this model looks like is a model that has as many terms in it as you have data points: 200 observations, 200 coefficients in your model. It's an impractical sort of model. That's what this slide is saying: you have an intercept, and if you have n data points you have dummy variables indexed j = 1 to n - 1, so n - 1 dummy variables plus an intercept, or n terms in total. If you fit as many terms in your model as you have people in your data set, your model will exactly predict the outcomes. It will beautifully, exactly describe the data, but it's a sort of worthless model: you've put in as many terms as points, so it basically fits a term for every single person in the data set, and that term represents that person's value. So this is a really weird thing: this model is going to spit out predicted probabilities that are actually zeros and ones, or very close to zero and one. Normally predicted probabilities are not exactly zero or one, they're somewhere in between, but because you've put this many terms in the model, it exactly predicts every data point. This is different from something we're going to talk about later, the thing we'll actually use, which we'll call the fully parameterized model; that one predicts the average probability of being a case. There, we're talking about a model that predicts 0.6 and 0.4; here, we're talking about a model that predicts ones and zeros. That's all this means.
  • 00:16:26
    So this is a really, really weird model; it looks something like this. Here's an example of a saturated model. Here's some data, that table with the six and four and so forth, and we'll even pretend there are strata in the data: a data set with 40 observations, two two-by-two tables of 20 people each, in two strata, C = 1 and C = 0. The specifics of the tables don't matter at this point; basically you've got 40 people. You could fit a model with essentially 40 terms in it, and you get something like this; it's really ugly. For this data set it predicts probabilities that are essentially 1 and 0. This is our saturated model; they're not exactly 1 and exactly 0, but basically. And what is it doing? The p-hat column here was generated using the OUTPUT statement from the model: it's predicting that the probability of disease for person 1 is basically 1, and that matches their disease status; down here it's predicting probabilities of disease of zero, and that is those people's disease status. So if you compute the difference between disease and p-hat, the predicted probability, you get zero. The way I fit it was: I said CLASS ID, so every person has an ID variable and the model makes dummy variables for every ID, and then MODEL disease = ID, so you get a different term in the model for every single person. Don't do this at home; it's not something you should do, it's not a practical model. But that's what it does: it exactly spits out the probabilities of disease as essentially zeros and ones, the Y-hat values. We're going to talk about the deviance statistics in a moment, but for now note that these differences are practically zero; not exactly zero, but mostly zero. That's an important piece: the deviance is basically zero.
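    The lecture fits this in SAS (roughly "class id; model disease = id;" plus an OUTPUT statement for the predicted probabilities). A per-person-dummy fit tends to trip perfect-separation checks in a lot of software, so as a stand-in here is a small sketch of my own showing what that fit converges to: the saturated model's fitted probability for person i is just that person's observed outcome.

    import numpy as np

    # Toy 0/1 outcomes for 40 people (a stand-in for the lecture's 40-observation data set)
    rng = np.random.default_rng(0)
    y = rng.integers(0, 2, size=40)

    # With one term per person, the maximum-likelihood fitted probabilities
    # converge to the observed outcomes themselves.
    p_hat_saturated = y.astype(float)

    # The differences between observed and predicted are (essentially) all zero.
    print(np.abs(y - p_hat_saturated).max())   # 0.0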
  • 00:19:05
    So, in summary, the saturated model is this hypothetical thing that exactly describes the data; it nails the data exactly, but it's useless because it has as many terms as it has people. We're obviously talking about it for a reason, though, and you'll find out what that is: it's the ideal model that describes the data well, the one we want to compare to. This slide just says, in more words and formulas, what we saw on the previous slide: the saturated model has as many terms, as many dummy variables, as you have subjects. For example, pretend dummy variable 2 equals 1, representing ID 2, the second observation; the model then reduces to just logit P(Y) = omega_2, that one term, and that means the probability of disease for subject 2 is 1 / (1 + e^(-omega_2)), with omega_2 the coefficient. That's always true in any logistic regression model; this is just the probability. And because it's a saturated model, it's actually spitting out the zeros and ones themselves, so the probability is going to be 0 or 1. If you plug this into the likelihood formula for the logistic regression model, the likelihood expression for a model where the predicted probabilities equal the Y's themselves turns out to be 1. So the likelihood of the saturated model is 1; it has a perfect likelihood, always equal to 1. This is going to be important for the deviance statistics. It's not that interesting yet: we have a model with as many terms as data points, and it perfectly describes the data, but it's crazy because you would never do this.
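    In symbols, restating the slide's point: when the fitted probabilities equal the observed outcomes, every factor of the likelihood is one,

        L_{sat} = \prod_{i=1}^{n} \hat{y}_i^{\,y_i} (1 - \hat{y}_i)^{\,1 - y_i} = \prod_{i=1}^{n} y_i^{\,y_i} (1 - y_i)^{\,1 - y_i} = 1

    (using the convention 0^0 = 1), so -2 \log L_{sat} = 0.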
  • 00:21:12
    So the saturated model is building toward this idea of the deviance. The saturated model is the best-fitting model in the world; it exactly connects the dots of your data, and it literally has as many terms as data points. But as we've said, it's not a good model to use; it comes at a cost. It's not usable when you have 200 observations and 200 terms in the model, and the terms aren't interpretable: they mean "person 22", not history of alcoholism or whatever you're modeling; they just mean a person, a data point. That's not useful. What it does serve as, though, is a baseline for comparing other models to. We don't want to use the saturated model, but what if we can compare how well our model fits the data to how well the saturated model fits the data? We might be interested in it for comparison, and that's the idea of the deviance: our model, which is not the saturated model and has, say, only five terms, is it close enough to the saturated model? If it's close enough, we might say, hey, that's a pretty good fitting model; it fits about as well as the saturated model does. That's the idea of the deviance statistics: they measure how much our model deviates from the best model. It's very similar to the likelihood ratio test: the likelihood ratio test says, I've got a model with two interaction terms and one with one interaction term, let's see whether one model is different from the other. That's the idea with the deviance tests too, except that in the deviance tests we're always comparing to the best model, the one that fits the data really well.
  • 00:23:03
    Formally, that's this thing here, the deviance statistic. It's very similar to the likelihood ratio statistic, and there's a slide at the end of the PowerPoint showing the relationship between the two. The idea is that the deviance is a quantity you can statistically test, and it contains a ratio of likelihoods, so it's very much like the likelihood ratio test. The numerator is the likelihood of the model you've got, whatever it is; you have some model you think is great, but you're not sure, which is why you're doing this test. So you have some candidate model, and in the denominator you have the best-fitting model, which might be the saturated model. The smaller this quantity is, the closer your model is to the best-fitting model. So that's the deviance statistic. Our candidate model is the one with our covariates of interest, but what we choose for the comparison model is the key piece, and that's the difference between the first two deviance tests I mentioned a few slides ago.
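    Written out (with my own labels for the two likelihoods), the deviance statistic described here is

        D = -2 \ln \left( \frac{L_{\text{candidate model}}}{L_{\text{comparison model}}} \right)

    where the comparison model is the best-fitting reference (the saturated model, or, as discussed next, the fully parameterized model); the smaller D is, the closer the candidate's fit is to that reference.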
  • 00:24:21
    The idea here is that there are two choices for the likelihood you might plug into the denominator. One is called the subject-specific deviance, or the subject-specific likelihood, and the other is called the events/trials-based one. The subject-specific one is essentially the likelihood of the saturated model, the model we were just talking about: you put in the denominator the likelihood of the model with as many terms as observations, because that is literally the best model, spitting out zeros and ones that exactly match your observed data. The other option is something called the events/trials-based likelihood, which comes from the fully parameterized model; we'll talk about what we mean by that, but it's basically every covariate and all of its interaction terms, and I'll explain why we use it. Now, the likelihood of the saturated model, we said, is always equal to one; it's always one because it's the saturated model, which by definition fits the data exactly. It's a little circular, but that's what saturated means: it fits the data perfectly, so its likelihood always equals one. For the fully parameterized model, the likelihood is usually far less than one. The problem is that we cannot actually use the test statistic with the subject-specific, saturated-model likelihood in the denominator: even though we would love to use it, we can't, and there are a bunch of reasons.
  • 00:26:10
    Here is the equation we just showed on the previous slide: negative two times the log of the likelihood of the model we care about over the likelihood from the saturated model. That denominator is always equal to one, so this is just negative two times the log of the likelihood of our model, and there's the expression for that likelihood. The two bullet points summarize what we just said: this quantity, which is just the -2 log-likelihood of our model, is what we call the deviance statistic for any given model, and we call this one the subject-specific deviance; it's basically the deviance owing to a saturated model. And it's very uninteresting, because if the denominator is always equal to one, we're not learning anything new; it's not telling us anything. If the denominator is always one under the saturated model, then the deviance statistic is just the -2 log-likelihood of your given model, and that by itself is uninformative. That's the heart of the problem: we're not making an actual comparison if the denominator is always equal to one. In theory we would love to compare our model to some ideal model, but that comparison isn't possible because the saturated model always has the same likelihood; mathematically we're left always dividing by one. So all we have are the predicted probabilities from our model; we're not comparing to the observed values in the data, the Y_i's, and we're not comparing to the saturated model. All this is is the likelihood of the model we have, but the whole point of goodness of fit is comparing how we're doing to some standard: comparing our model's predictions to the Y_i's, or comparing to the saturated model. We have to compare to something, otherwise we don't know how well we fit. So this quantity is cool, except that the denominator is always equal to one, and the thing we wanted to do isn't actually possible. This is really unexpected: we thought the saturated model would be a great reference, but it isn't, because its likelihood is always equal to 1. So this deviance statistic is not useful for goodness-of-fit testing.
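    You can see this numerically: with ungrouped 0/1 data, a binomial GLM's deviance (which is defined against the saturated model) is literally just -2 times that model's own log-likelihood. A small check of my own in Python with statsmodels:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    # Toy data: 10 exposed (6 diseased) and 10 unexposed (4 diseased)
    df = pd.DataFrame({
        "exposed": [1] * 10 + [0] * 10,
        "disease": [1] * 6 + [0] * 4 + [1] * 4 + [0] * 6,
    })

    fit = smf.glm("disease ~ exposed", data=df, family=sm.families.Binomial()).fit()

    # The saturated model's log-likelihood is 0 (its likelihood is 1), so the
    # deviance reduces to -2 * log-likelihood of our own model: no real comparison.
    print(fit.deviance, -2 * fit.llf, np.isclose(fit.deviance, -2 * fit.llf))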
  • 00:29:00
    So the ideal isn't possible, and we're going to settle for the thing next to the ideal, which is a different deviance statistic that uses a different denominator and that we think is the next best thing. That's why I showed you the thing we'd theoretically like to do but that never works out; now we'll talk about the thing that does work out. We're going to use something called the fully parameterized model. The saturated model we talked about was that first thing: it spits zeros and ones directly out of the logistic regression model, which is really weird because logistic regression models normally aren't meant to spit out zeros and ones, but it does, perfectly, and it reduces the observed-minus-predicted differences to zero. But we said it's a really impractical model, and its likelihood is always equal to one.
  • 00:29:58
    parameterized model and the fully
  • 00:30:01
    parametrized model is essentially the
  • 00:30:05
    most interaction filled model you can
  • 00:30:08
    make out of your covariance okay so
  • 00:30:10
    let's say you have X 1 X 2 or you know
  • 00:30:13
    exposure 1 C 2 whatever you want to call
  • 00:30:15
    it you know be using our other notation
  • 00:30:18
    so you take all of the things all the
  • 00:30:21
    predictors in your model all the basic
  • 00:30:23
    sort of ingredients the main predictors
  • 00:30:25
    in your model all the covariance and you
  • 00:30:27
    make all possible product terms so if
  • 00:30:30
    you've got you know five five terms in
  • 00:30:33
    your model you know it's you know age
  • 00:30:35
    sex whatever you know alcohol use and
  • 00:30:38
    everyone whatever it is you put those in
  • 00:30:40
    the model then you make take all the
  • 00:30:42
    two-way products all the three ways up
  • 00:30:44
    to the five way interaction term this is
  • 00:30:47
    also not a practical model not a good
  • 00:30:50
    idea to have five one interaction terms
  • 00:30:52
    just because what does it mean if it's
  • 00:30:54
    significant I don't know
  • 00:30:56
    and probably you know it just won't be
  • 00:31:00
    significant actually
  • 00:31:01
    but anyway the idea is that it's a it's
  • 00:31:03
    the most interaction filled model that
  • 00:31:06
    you can make so what I mean by that is
  • 00:31:08
    let's say you have just two predictors a
  • 00:31:11
    and C your fully parameterized model is
  • 00:31:15
    alpha plus beta times exposure gamma
  • 00:31:17
    times C and in Delta times e times C
  • 00:31:19
    this is literally the most complicated
  • 00:31:22
    model you can make with the data you've
  • 00:31:24
    got okay this is not the saturated model
  • 00:31:27
    right the saturated model would be as
  • 00:31:28
    many terms on the model as you have
  • 00:31:30
    covariance okay that would be about
  • 00:31:33
    coverts data lines of data right so if
  • 00:31:36
    you have 100 people in your data set
  • 00:31:37
    you'd have 100 terms in your model this
  • 00:31:39
    is the most complicated model you can
  • 00:31:41
    make with the X variables that you have
  • 00:31:43
    okay and again if you have only a binary
  • 00:31:45
    predictor this is then your fully
  • 00:31:47
    parameterized models really boring it's
  • 00:31:48
    just a term for exposure that that's all
  • 00:31:51
    you got so this is not necessarily a
  • 00:31:55
    saturated model right it could be if you
  • 00:31:57
    have a you can sort of have a very tiny
  • 00:31:59
    data set and it could sort of work out
  • 00:32:01
    but what's cool is the fully
  • 00:32:04
    parameterized model completely could
  • 00:32:07
    always predict the average probability
  • 00:32:10
    for each combinations of X's okay so
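    As a formula (exactly what was just said in words), with a binary exposure E and one covariate C, the fully parameterized model is

        \text{logit } P(Y = 1 \mid E, C) = \alpha + \beta E + \gamma C + \delta (E \times C).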
  • 00:32:13
    What I mean by that: jumping back up to the earlier slide, the fully parameterized model doesn't spit out the zeros and ones exactly, but for every pattern of covariates it perfectly predicts the average in that group, the average probability in that group. For example, if you had a model with just exposure in it and you fit it to this data, your model would spit out 0.6 and 0.4 for the exposed and the unexposed. So your model exactly predicts the average experience of each group, not the zeros and ones themselves, but the average of the outcome in that population. That's what the fully parameterized model does, and we're going to illustrate it with SAS in a few slides. So it doesn't predict zeros and ones; it predicts the average, exactly, for the covariate groups in the data set, because you're fitting as many terms as you have possible products of your variables. It will always predict the average experience for all combinations of X's, but not the zeros and ones.
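    A quick sketch of that claim (my own illustration, not the course's SAS example): fit the exposure-only model to the toy 2x2 data, and the predicted probabilities come back as the observed group proportions.

    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "exposed": [1] * 10 + [0] * 10,
        "disease": [1] * 6 + [0] * 4 + [1] * 4 + [0] * 6,
    })

    # With a single binary predictor, "intercept + exposure" is the fully
    # parameterized model for these data.
    fp = smf.glm("disease ~ exposed", data=df, family=sm.families.Binomial()).fit()
    df["p_hat"] = fp.predict(df)

    print(df.groupby("exposed")["p_hat"].mean())     # ~0.4 (unexposed), ~0.6 (exposed)
    print(df.groupby("exposed")["disease"].mean())   # the observed group averages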
  • 00:33:41
    One thing that's important to think about is this idea of covariate patterns, which are all the combinations of values of your X variables that can occur. This is what the fully parameterized model predicts well: it predicts the average for every pattern of covariates. SAS calls these "unique profiles" in its output. Covariate patterns: imagine you've got a data set with just one exposure, E, and it only takes the values zero and one. Then there are only two possible covariate patterns: exposure equals 1 and exposure equals 0. The people in your data set only come in two types of X; across all their X's they can only be exposed or unexposed. If you've got two binary covariates, exposure and confounder C, then there are four possible types of people in your data in terms of the covariates: exposed with confounder 0, unexposed with confounder 1, both ones, both zeros. You can have zillions of people in your data set, but if this is all you measured on them, these are the only four types of people you get to observe in terms of their covariates. They might have different Y's, and since Y equals 1 or 0 you could have eight total types of people, but in terms of the covariates there are only four patterns.
  • 00:35:28
    So the fully parameterized model, if you have a model with E and C and their product, is going to exactly predict the average Y value for all four of those groups: for every pattern of covariates, it exactly predicts the average experience. Now, if you have data with continuous variables, you can have more than just a fixed number of groups; the number of covariate patterns will depend on how many values of the continuous variable you have. For example, if you have a single X variable, age, your covariate patterns are the distinct values of age that occur: with integer years of age, the number of distinct years you observe is the number of covariate patterns. If you have a continuous variable and discrete variables together, you can have way more covariate patterns, maybe four times the number of ages in your data set, or whatever it works out to. All that's to say, this gets more complicated with continuous variables: you get a proliferation of combinations of X's, of these covariate patterns. This will be very important in showing the limitations of the deviance statistic we're about to discuss when you have a continuous covariate.
  • 00:37:01
    terminology covariant patterns for the
  • 00:37:03
    deviance test we're also gonna use it on
  • 00:37:05
    Thursday for ROC curves so we're going
  • 00:37:11
    to use the letter G to describe the
  • 00:37:14
    total number of possible covariant
  • 00:37:16
    patterns so G for this predictor to one
  • 00:37:19
    predictor model is two there's only two
  • 00:37:22
    possible covariant patterns here G is
  • 00:37:24
    four for four covariate patterns here it's
  • 00:37:29
    large it's as many levels of age that
  • 00:37:31
    you have and then here if you had a
  • 00:37:33
    binary variable then it might be 2 times
  • 00:37:36
    G from here where you might
  • 00:37:40
    have twice as many levels as just a
  • 00:37:42
    model with just age if you have a
  • 00:37:43
    binary exposure and age you might be
  • 00:37:45
    doubling the number of covariate
  • 00:37:47
    patterns all this is to say is that G
  • 00:37:50
    is just representing the number of
  • 00:37:52
    discrete types of people in your data
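    As a concrete aside (not from the lecture itself), here is a minimal Python sketch of counting covariate patterns; the DataFrame and column names are made up for illustration:

```python
import pandas as pd

# Hypothetical data: two binary covariates plus one continuous covariate.
df = pd.DataFrame({
    "exposure": [1, 1, 0, 0, 1, 0],
    "c":        [0, 1, 0, 1, 1, 0],
    "age":      [34, 34, 51, 27, 27, 51],
})

# G = number of distinct covariate patterns = number of unique rows of the X's.
g_binary_only = df[["exposure", "c"]].drop_duplicates().shape[0]       # at most 4
g_with_age = df[["exposure", "c", "age"]].drop_duplicates().shape[0]   # can approach n

print(g_binary_only, g_with_age, len(df))
```

    With only binary X's, G stays small; add a continuous X like age and G climbs toward n, which is the situation discussed next.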
  • 00:37:56
    when you have no continuous data we have
  • 00:38:01
    no continuous X variables this is
  • 00:38:02
    usually small usually significantly
  • 00:38:04
    smaller than n the number of people in
  • 00:38:06
    your data set when you have continuous
  • 00:38:09
    data you usually have a number of
  • 00:38:11
    covariate patterns that's almost as much
  • 00:38:13
    as your n alright so if you have
  • 00:38:15
    basically no two people in your data set
  • 00:38:17
    are the same if you have a bunch of
  • 00:38:19
    different continuous variables it's
  • 00:38:20
    unlikely that you're gonna have too many
  • 00:38:22
    people at the exact same intersection of
  • 00:38:25
    covariates right that's just the more
  • 00:38:27
    variables you have the more continuous
  • 00:38:29
    variables you have people get more and
  • 00:38:31
    more distinct unique profiles and
  • 00:38:35
    so forth so G is then usually closer to
  • 00:38:37
    N and so why are we talking about
  • 00:38:41
    this this relates to the fully
  • 00:38:42
    parameterized model as we as I suggested
  • 00:38:44
    earlier so
  • 00:38:50
    given that you have some basic
  • 00:38:52
    predictors x1 to xP
  • 00:38:55
    in a data set you can make a fully
  • 00:38:58
    parametrize model and it always has as
  • 00:39:03
    many terms in it as you have unique
  • 00:39:07
    profiles okay so before I
  • 00:39:10
    talked about oh you have a single
  • 00:39:12
    exposure
  • 00:39:16
    you only measured one exposure so the
  • 00:39:19
    most complicated
  • 00:39:22
    model you can make would have just
  • 00:39:23
    an intercept and exposure well that's also
  • 00:39:26
    that has two terms in it that's actually
  • 00:39:29
    the number of unique covariate patterns
  • 00:39:31
    you only can have we just said on the
  • 00:39:33
    earlier slide that G equals two
  • 00:39:35
    similarly the fully parameterized model
  • 00:39:38
    when you have two 0 1 X variables is
  • 00:39:41
    just this this was the fully
  • 00:39:43
    parameterized model its intercept
  • 00:39:46
    exposure C and the interaction four
  • 00:39:48
    terms turns out that's also equal to the
  • 00:39:51
    number of unique covariate patterns so
  • 00:39:54
    it turns out in general
  • 00:39:56
    the number of terms in the highest-
  • 00:39:57
    order interaction model that you can
  • 00:39:59
    make also happens to be the number of
  • 00:40:01
    distinct covariate patterns so here if
  • 00:40:04
    you have two binaries
  • 00:40:06
    this is gonna be four there's four
  • 00:40:08
    types of people and
  • 00:40:10
    your interaction
  • 00:40:11
    model is gonna have four terms in it
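    To see that claim concretely (a sketch on simulated data, not the lecture's example), the fully parameterized model for two binary covariates can be fit with statsmodels and its fitted probabilities compared with the observed average of Y in each of the four covariate patterns:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical 0/1 data with two binary covariates (numbers made up for illustration).
rng = np.random.default_rng(0)
df = pd.DataFrame({"e": rng.integers(0, 2, 200), "c": rng.integers(0, 2, 200)})
df["y"] = rng.binomial(1, 0.2 + 0.3 * df["e"] + 0.2 * df["c"])

# Fully parameterized model: intercept, E, C, and E*C -> four terms, G = 4 patterns.
fit = smf.logit("y ~ e * c", data=df).fit(disp=0)

# Its fitted probabilities match the observed proportion of y = 1 within each
# covariate pattern (up to numerical precision): one free parameter per pattern.
df["p_hat"] = fit.predict(df)
print(df.groupby(["e", "c"])[["y", "p_hat"]].mean())
```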
  • 00:40:13
    this is just an observation yes
  • 00:40:17
    question the question is is it
  • 00:40:28
    possible to have the number of
  • 00:40:29
    covariate patterns exceed the sample
  • 00:40:32
    size no you can never
  • 00:40:33
    have more intersections of them in your
  • 00:40:37
    data than you have data points
  • 00:40:40
    you may have some unobserved
  • 00:40:42
    intersections right of your X
  • 00:40:45
    variables you may just not
  • 00:40:46
    literally have somebody who is you
  • 00:40:50
    know a 99 year old Hispanic from Alaska
  • 00:40:54
    with a mohawk you know or just you know
  • 00:40:58
    whatever it
  • 00:41:00
    is some very rare intersection that may
  • 00:41:02
    not occur in your data set but the
  • 00:41:04
    number of observed patterns is always
  • 00:41:05
    less than or equal to the sample size so
  • 00:41:10
    so this is just an observation that
  • 00:41:12
    it turns out that the highest-
  • 00:41:13
    order interaction model
  • 00:41:15
    always has as many terms as the number
  • 00:41:17
    of covariate patterns and what happens
  • 00:41:22
    here is we're gonna fit
  • 00:41:24
    the model
  • 00:41:27
    that is
  • 00:41:29
    our fully parameterized model and this
  • 00:41:33
    model as we said essentially
  • 00:41:37
    doesn't predict the zero one outcomes
  • 00:41:39
    perfectly the fully parameterized model
  • 00:41:41
    always predicts the observed average
  • 00:41:44
    probabilities in the data perfectly and
  • 00:41:46
    that's actually pretty good
  • 00:41:48
    that's actually almost as good as
  • 00:41:51
    predicting the probabilities itself if
  • 00:41:52
    you can on average predict the
  • 00:41:55
    probability that's pretty good if you could
  • 00:41:58
    predict the observed zero one you would
  • 00:42:00
    like to predict the observed zero ones
  • 00:42:02
    themselves but instead the fully
  • 00:42:04
    parameterized model predicts the
  • 00:42:06
    probabilities themselves and that's still
  • 00:42:08
    really
  • 00:42:08
    good so we're going to sort of
  • 00:42:11
    walk through this and make a test
  • 00:42:13
    statistic out of it and talk about it so
  • 00:42:16
    here's here's the data that we showed
  • 00:42:17
    before it's got exposure 1 0 C 1 0
  • 00:42:23
    40 observations if I fit this model
  • 00:42:28
    exposure C and then E times C that's
  • 00:42:32
    four terms with the intercept
  • 00:42:36
    in it we get something that looks
  • 00:42:39
    like this okay we're going to get you
  • 00:42:42
    can get some odds ratios you can get
  • 00:42:45
    some predicted probabilities so here you
  • 00:42:49
    get them here this is just from the
  • 00:42:51
    2x2 table even these are the odds ratios
  • 00:42:53
    here's the predicted probabilities
  • 00:42:55
    in the columns so here this is 0.6
  • 00:42:57
    point four what we saw earlier in this
  • 00:43:00
    table it's point three and point seven
  • 00:43:03
    anyhow SAS is going to spit out some
  • 00:43:06
    stuff of interest this thing 51 point
  • 00:43:10
    three five five is the negative two log
  • 00:43:12
    likelihood of your model you can use
  • 00:43:14
    this for the likelihood ratio test you
  • 00:43:16
    can also use it to compute the deviance
  • 00:43:18
    from the saturated model because this
  • 00:43:19
    divided by one is that deviance
  • 00:43:22
    statistic but we said it's not very
  • 00:43:23
    useful then SAS can spit out this
  • 00:43:27
    table that says deviance zero zero zero
  • 00:43:30
    zero it's not very useful okay what is
  • 00:43:36
    this about give me a moment we'll talk
  • 00:43:38
    about it
  • 00:43:39
    so what's interesting about this model
  • 00:43:42
    it's not a saturated model right it
  • 00:43:45
    does not have
  • 00:43:46
    forty terms in it this is 40
  • 00:43:48
    observations doesn't have 40 terms it
  • 00:43:50
    has G equals four there's only four
  • 00:43:52
    covariate patterns to it all right as we
  • 00:43:55
    talked about there's only four levels of
  • 00:43:57
    C and E okay so this thing is what we're
  • 00:44:01
    gonna call the fully
  • 00:44:02
    parameterized model okay
  • 00:44:05
    it has as many parameters as all possible
  • 00:44:07
    terms that it could have given the
  • 00:44:10
    covariates that we have and that we
  • 00:44:12
    observed okay here's another model that
  • 00:44:17
    we fit this model is just E and C no
  • 00:44:21
    interaction term
  • 00:44:22
    okay suddenly there's information here
  • 00:44:26
    three point six nine six one this is the
  • 00:44:31
    deviance statistic computed for our
  • 00:44:34
    model using the negative two log
  • 00:44:36
    likelihood for our model in the numerator
  • 00:44:37
    and in the denominator is
  • 00:44:41
    the likelihood statistic from the fully
  • 00:44:43
    parameterized model okay it's comparing
  • 00:44:46
    this model that does not have an
  • 00:44:49
    interaction term to the model that does
  • 00:44:52
    have all possible
  • 00:44:54
    interaction terms turns out in this case
  • 00:44:56
    it's sort of a plausible model this may
  • 00:44:58
    might actually be interesting to us if
  • 00:45:00
    you have ten covariates this is going to
  • 00:45:03
    be a model that has a 10-way interaction
  • 00:45:05
    9-way interaction terms and all sorts of
  • 00:45:07
    crazy stuff and you'd rather not fit
  • 00:45:09
    that yourself
  • 00:45:11
    so what SAS is gonna do is it'll compare
  • 00:45:13
    your model to the most interaction heavy
  • 00:45:16
    model possible and it's going to give
  • 00:45:18
    you this thing called the deviance we
  • 00:45:20
    call it the deviance ET
  • 00:45:22
    because it has to do with events-trials
  • 00:45:23
    data which we could go into but
  • 00:45:27
    I think we may have
  • 00:45:29
    a slide on it at the end it's sort of
  • 00:45:31
    supplementary but this is a this we call
  • 00:45:33
    this the deviance ET and this is the
  • 00:45:35
    deviance statistic that compares your
  • 00:45:37
    model to the fully parametrized model
  • 00:45:39
    not the saturated model because the
  • 00:45:41
    saturated model always has a denominator
  • 00:45:43
    of 1 for the that deviance statistic and
  • 00:45:47
    this thing is essentially a likelihood
  • 00:45:51
    ratio statistic it's comparing your
  • 00:45:53
    model specifically
  • 00:45:55
    to the likelihood of the fully
  • 00:45:57
    parameterized model so it's a very very
  • 00:45:59
    specific likelihood ratio your model to
  • 00:46:02
    a very not to any model but to the fully
  • 00:46:04
    parametrize one and this is because the
  • 00:46:06
    fully parameterized model always always
  • 00:46:10
    always predicts the observed
  • 00:46:12
    probabilities in your data set so the
  • 00:46:16
    fully parameterized model is always
  • 00:46:17
    predicting these point sixes and point
  • 00:46:19
    fours and so forth your given model may
  • 00:46:21
    not and it's comparing your model to the
  • 00:46:25
    fully parameterized model
  • 00:46:27
    so this sort of makes sense that if you
  • 00:46:29
    zoom back out for a moment
  • 00:46:30
    we said in one way that the saturated
  • 00:46:33
    model is sort of as good as it gets if
  • 00:46:35
    you've put
  • 00:46:36
    a hundred terms in your model that sort
  • 00:46:38
    of fits it and you have a hundred data
  • 00:46:39
    points that fits your data exactly
  • 00:46:41
    another way to think of the best model
  • 00:46:44
    is the model that has as many terms in
  • 00:46:46
    it as you measured things in
  • 00:46:49
    your data set if I measured 10 things in
  • 00:46:50
    my questionnaire the best model the
  • 00:46:54
    most descriptive model you could fit
  • 00:46:55
    would be the one that put all 10
  • 00:46:57
    questions in your questionnaire and
  • 00:46:58
    all their interaction terms and
  • 00:47:00
    every single term that you
  • 00:47:03
    could put in your model you threw it in
  • 00:47:05
    there it's an ugly model but
  • 00:47:07
    that's sort of the best model that you
  • 00:47:09
    could have come up with from the stuff
  • 00:47:10
    you measured that makes that sort of the
  • 00:47:12
    intuitive explanation right you measure
  • 00:47:14
    10 things in your study you put all of
  • 00:47:15
    them in the model and all their
  • 00:47:16
    combinations and that's your best model
  • 00:47:19
    for describing the data it's a totally
  • 00:47:21
    ugly and useless model because it has a
  • 00:47:23
    10-way interaction term and in nine ways
  • 00:47:25
    and eight ways and all of that and it's
  • 00:47:27
    actually a terrible model but it's a
  • 00:47:29
    very good model in terms of predicting
  • 00:47:31
    the most in your data set right it's it
  • 00:47:34
    predicts using what you've gathered the
  • 00:47:37
    outcome as best as you can from what you
  • 00:47:40
    measured and what we're doing here is
  • 00:47:42
    peeling back and saying well I got oh
  • 00:47:44
    hey I got a model that has only two
  • 00:47:45
    terms in it can I please compare my
  • 00:47:48
    model with two terms to the
  • 00:47:50
    ten term 10-way interaction sort of
  • 00:47:53
    goliath model and are those two
  • 00:47:55
    meaningfully different and that's what
  • 00:47:57
    this deviance statistic is so we're
  • 00:48:00
    comparing to sort of a different ideal
  • 00:48:01
    which is the most full model that you
  • 00:48:04
    can make given the stuff that you
  • 00:48:06
    measured okay and this is
  • 00:48:08
    what it formally is this
  • 00:48:13
    deviance statistic is a chi-square
  • 00:48:15
    statistic and the number the degrees of
  • 00:48:18
    freedom is the number of terms dropped
  • 00:48:21
    comparing your model of
  • 00:48:23
    interest to the fully parametrized model
  • 00:48:25
    so in our case it was just one over here
  • 00:48:29
    because we have a model with
  • 00:48:30
    E and C versus E and C and then the
  • 00:48:33
    interaction term so we dropped one term
  • 00:48:36
    and therefore it has a degree of freedom
  • 00:48:38
    of 1 if you had 10 terms in your model
  • 00:48:42
    then this fully parametrized model has a
  • 00:48:45
    ton of terms in it it's got the 10-way
  • 00:48:46
    interaction it's got
  • 00:48:48
    all the nine ways all the eight ways all
  • 00:48:51
    the seven way interactions down to all
  • 00:48:52
    of the ten single main effects
  • 00:48:54
    it's a huge model with a ton of terms in it
  • 00:48:56
    you'd rather not calculate that yourself
  • 00:48:58
    so SAS actually spits out the degrees of
  • 00:49:00
    freedom right over here okay so the
  • 00:49:04
    number of terms dropped from your model and
  • 00:49:06
    this is essentially
  • 00:49:09
    a likelihood ratio test but it's a very
  • 00:49:10
    specific test of your model compared to
  • 00:49:13
    the model that has the most terms in it
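    A minimal sketch of that comparison in Python (simulated data, not the lecture's SAS output): fit the model of interest and the fully parameterized model, take the difference in -2 log-likelihoods, and refer it to a chi-square whose degrees of freedom equal the number of terms dropped:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Hypothetical data with two binary covariates (made up for illustration).
rng = np.random.default_rng(1)
df = pd.DataFrame({"e": rng.integers(0, 2, 40), "c": rng.integers(0, 2, 40)})
df["y"] = rng.binomial(1, 0.3 + 0.3 * df["e"] + 0.1 * df["c"])

reduced = smf.logit("y ~ e + c", data=df).fit(disp=0)   # model of interest
full = smf.logit("y ~ e * c", data=df).fit(disp=0)      # fully parameterized model

# Deviance relative to the fully parameterized model = difference in -2 log-likelihoods.
deviance = -2 * (reduced.llf - full.llf)
dof = int(full.df_model - reduced.df_model)             # number of terms dropped (here 1)
p_value = stats.chi2.sf(deviance, dof)

print(deviance, dof, p_value)  # deviance/dof near 1 suggests the fit is acceptable
```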
  • 00:49:15
    so you can sort of eyeball it if
  • 00:49:18
    the ratio of the deviance over
  • 00:49:20
    degrees of freedom is about one
  • 00:49:22
    that actually means you have
  • 00:49:25
    okay fit that the null
  • 00:49:27
    hypothesis of good fit stands and sort
  • 00:49:31
    of a non-significant result so
  • 00:49:33
    you can sort of look at this
  • 00:49:34
    ratio and eyeball it so there's a big
  • 00:49:38
    caveat here to this model here we go
  • 00:49:42
    involves dogs
  • 00:49:43
    I like Google image search so here we
  • 00:49:46
    are we have dogs this only
  • 00:49:50
    works if you have a reasonable number of
  • 00:49:54
    covariate patterns so if you've got the
  • 00:49:58
    number of covariate patterns
  • 00:50:00
    that's very large you have a continuous
  • 00:50:03
    variable we talked about you have age
  • 00:50:05
    and you now have as many
  • 00:50:06
    unique patterns of
  • 00:50:08
    covariates as
  • 00:50:10
    you have people in your data set
  • 00:50:11
    essentially then what happens is your
  • 00:50:14
    fully parameterized model looks like the
  • 00:50:17
    saturated model right if you have as
  • 00:50:20
    many covariate
  • 00:50:21
    patterns as people it sort of means that
  • 00:50:23
    the most parameterised model has as many
  • 00:50:26
    terms as people that sounded
  • 00:50:28
    really circular but as you get more and
  • 00:50:29
    more covariate patterns you start
  • 00:50:31
    fitting what essentially becomes a
  • 00:50:33
    saturated model and we said that doesn't
  • 00:50:34
    work so what happens is this test only
  • 00:50:37
    works if the number G of covariate
  • 00:50:40
    patterns is pretty small where
  • 00:50:42
    you don't have lots of unique
  • 00:50:44
    intersections of continuous variables
  • 00:50:46
    and don't have as many covariate patterns
  • 00:50:48
    as people okay so this only works
  • 00:50:52
    if you have essentially a handful or not
  • 00:50:56
    too many sort of categorical zero one or
  • 00:51:00
    zero one two three sort of X variables
  • 00:51:02
    if you would basically have
  • 00:51:05
    as many covariate patterns as people
  • 00:51:08
    you can't do this because essentially you
  • 00:51:10
    wind up getting the saturated version of this
  • 00:51:12
    test that doesn't work which is the one
  • 00:51:14
    that's based on the saturated model okay
  • 00:51:16
    so that's that's a problem when we have
  • 00:51:21
    continuous data so
  • 00:51:23
    when this doesn't work
  • 00:51:24
    we don't use this test statistic based
  • 00:51:26
    on the essentially a likelihood ratio
  • 00:51:28
    test versus the fully parameterized
  • 00:51:30
    model we do something called the Hosmer
  • 00:51:32
    Lemeshow test which we're gonna start
  • 00:51:34
    or we are not gonna finish so we're just
  • 00:51:37
    gonna pick it up at the beginning
  • 00:51:38
    of next time but the Hosmer-Lemeshow
  • 00:51:41
    test I'll just say what it does and give
  • 00:51:43
    you a sense of it it's actually way
  • 00:51:46
    more user-friendly it's gonna make so
  • 00:51:48
    much sense it's gonna make
  • 00:51:50
    a lot more sense than the last thing we
  • 00:51:52
    just talked about which will make more
  • 00:51:54
    sense when you review it and feel
  • 00:51:57
    free to ask questions of me
  • 00:52:01
    okay so the Hosmer-Lemeshow test also
  • 00:52:04
    measures how close we are to the average
  • 00:52:07
    predicted probabilities in our data set
  • 00:52:10
    so it's good to say hey does your model
  • 00:52:12
    predict the average experience of people
  • 00:52:14
    at these covariates well not the exact
  • 00:52:16
    zero ones of people but the
  • 00:52:18
    average experiences of people with
  • 00:52:19
    given covariate patterns okay
  • 00:52:22
    it has nothing to do with deviance so
  • 00:52:24
    this is good because it's going to not
  • 00:52:26
    be limited in the ways that our
  • 00:52:27
    deviance tests were and it's really
  • 00:52:30
    great when you have many many many
  • 00:52:32
    covariate patterns so if you have
  • 00:52:34
    all these unique people in your dataset
  • 00:52:35
    it's gonna be great when you
  • 00:52:40
    have that many covariate patterns and it
  • 00:52:42
    actually thrives when you have a lot so
  • 00:52:44
    it actually is it's not a very powerful
  • 00:52:46
    test when you have very few covariate
  • 00:52:48
    patterns if you have just two variables
  • 00:52:51
    that are zero one and there's four
  • 00:52:52
    covariate patterns the test
  • 00:52:54
    like never shows significance it works
  • 00:52:57
    very well where the other test leaves
  • 00:52:58
    off and I'll just I'm just going to
  • 00:53:01
    throw throw this on the screen here very
  • 00:53:05
    very quickly and then we'll pick it up
  • 00:53:07
    next time the example that we're going
  • 00:53:08
    to use is sort of an old
  • 00:53:13
    or classic example used in many
  • 00:53:15
    pedagogical examples from Evans County
  • 00:53:18
    it's a coronary heart disease study and
  • 00:53:20
    here's some model here and
  • 00:53:22
    it doesn't matter what model we just want
  • 00:53:24
    to know does this model
  • 00:53:26
    fit the data well and here it's got a
  • 00:53:28
    bunch of variables it's got age and we
  • 00:53:31
    expect that because let's say
  • 00:53:33
    age is measured you know as it's
  • 00:53:35
    continuous it's going to have a lot of
  • 00:53:37
    covariate patterns and maybe as many
  • 00:53:38
    unique people unique patterns of
  • 00:53:41
    covariates as you have people and we're
  • 00:53:43
    not going to want to use that deviance
  • 00:53:44
    statistic we're going to want to use the
  • 00:53:46
    Hosmer-Lemeshow test and basically what
  • 00:53:49
    it does is it makes a table that looks
  • 00:53:52
    something like this
  • 00:53:53
    it divides your data into groups
  • 00:53:56
    into what it calls
  • 00:53:58
    deciles of risk and we'll talk about it
  • 00:54:01
    we'll talk about what that means but
  • 00:54:04
    basically it takes your model and
  • 00:54:05
    predicts for everybody
  • 00:54:07
    their personal probability of disease
  • 00:54:09
    it's like hey you're 18
  • 00:54:11
    you know this is
  • 00:54:15
    your cholesterol level and this
  • 00:54:17
    is your covariate I predict you know
  • 00:54:19
    through the machine I predict your
  • 00:54:21
    probability of heart disease is point
  • 00:54:22
    two okay
  • 00:54:23
    I only observed a zero one right I'm
  • 00:54:26
    that person in the data set I only had a
  • 00:54:29
    zero one but the model says hey your
  • 00:54:31
    probability is point two that's how the
  • 00:54:32
    logistic regression works right you have
  • 00:54:34
    a zero one it predicts the point two but
  • 00:54:37
    what it does is it summarizes the entire
  • 00:54:39
    data set here in this data set there's
  • 00:54:41
    you know basically 10 times 61 so about
  • 00:54:44
    609 people in this data set
  • 00:54:47
    essentially this
  • 00:54:49
    represents the entire data set and it
  • 00:54:51
    looks at how well the sum of those
  • 00:54:54
    predictions compared to the sum of the
  • 00:54:56
    observed events so what this means is
  • 00:54:58
    for example there were 61
  • 00:55:01
    people in this lowest decile and of
  • 00:55:04
    the 61 people only two of them
  • 00:55:06
    had heart disease and
  • 00:55:09
    then I took the logistic
  • 00:55:11
    regression model and for
  • 00:55:15
    each of the 61 people I predicted their
  • 00:55:18
    personal probability of disease out of
  • 00:55:20
    the regression model and summed
  • 00:55:21
    those predictions up and they're gonna
  • 00:55:23
    be tiny because these are low risk
  • 00:55:24
    people only two of
  • 00:55:26
    them had heart disease
  • 00:55:27
    anyway and the sum of all of those 61
  • 00:55:30
    predictions was 0.94 okay so that
  • 00:55:33
    is the sum of everyone's prediction and
  • 00:55:35
    you do this for every decile and you
  • 00:55:38
    say hey was the sum of all those
  • 00:55:39
    predictions different from the observed
  • 00:55:41
    so here I saw two and I predicted 0.94
  • 00:55:45
    in this group of 61 people here this is
  • 00:55:47
    the high-risk group and of the 60 people
  • 00:55:51
    29 had heart disease in reality and the
  • 00:55:53
    sum of the 60 predictions for this
  • 00:55:56
    people was almost 29 so you might say
  • 00:55:59
    hey that's pretty good
  • 00:56:00
    on average when I split up the data into
  • 00:56:03
    this pretty 10 row table via the total
  • 00:56:07
    of the observed events was very similar
  • 00:56:09
    the total of what the model predicted
  • 00:56:12
    for these very same people okay this
  • 00:56:14
    is the Hosmer-Lemeshow test it's gonna
  • 00:56:16
    look at the difference between these two
  • 00:56:18
    and if your model is sort of nailing it
  • 00:56:20
    every time or on average you know hey
  • 00:56:23
    look the model said you know the model
  • 00:56:26
    says 29 and I observe 29 that's pretty
  • 00:56:28
    good here I'm only off by one on average
  • 00:56:31
    you have two versus 0.94 one versus
  • 00:56:34
    1.96 those are sort
  • 00:56:37
    of small deviations maybe and so all
  • 00:56:40
    this is doing is just a simple
  • 00:56:41
    chi-square test of this table that's
  • 00:56:43
    just hey on average how does the model
  • 00:56:45
    do relative to reality
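    Here is a rough Python sketch of that decile-of-risk table and chi-square (a simplified illustration on simulated predictions, not the exact SAS implementation; the function name is made up):

```python
import numpy as np
import pandas as pd
from scipy import stats

def hosmer_lemeshow(y, p_hat, n_groups=10):
    """Bin subjects into deciles of predicted risk, then compare observed and
    expected event counts (and non-event counts) within each decile."""
    df = pd.DataFrame({"y": y, "p": p_hat})
    df["decile"] = pd.qcut(df["p"], n_groups, labels=False, duplicates="drop")
    grouped = df.groupby("decile").agg(n=("y", "size"),
                                       observed=("y", "sum"),
                                       expected=("p", "sum"))
    o1, e1 = grouped["observed"], grouped["expected"]        # events
    o0, e0 = grouped["n"] - o1, grouped["n"] - e1            # non-events
    chi2 = (((o1 - e1) ** 2 / e1) + ((o0 - e0) ** 2 / e0)).sum()
    dof = len(grouped) - 2                                   # conventional df = groups - 2
    return grouped, chi2, stats.chi2.sf(chi2, dof)

# Hypothetical usage with made-up predicted probabilities:
rng = np.random.default_rng(2)
p_hat = rng.uniform(0.01, 0.6, 609)
y = rng.binomial(1, p_hat)
table, chi2, p_value = hosmer_lemeshow(y, p_hat)
print(table, chi2, p_value)
```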
  • 00:56:48
    this is so much more user-friendly than the deviance
  • 00:56:49
    statistics and this is the basis of the
  • 00:56:51
    Hosmer-Lemeshow test which we'll talk about
  • 00:56:53
    on Thursday so thanks
  • 00:56:56
Tags
  • model fit
  • saturated model
  • deviance statistics
  • logistic regression
  • Hosmer-Lemeshow test
  • covariates
  • data analysis
  • likelihood
  • interactions
  • statistical models