Class 10: Goodness of Fit: Saturated model, Covariate patterns, Deviance, Hosmer-Lemeshow statistic.
Summary
TLDR: The video covers how to evaluate the fit of statistical models, particularly in logistic regression. It explains why it matters to understand how well a model describes the observed data, introducing the concept of the saturated model, which fits the data perfectly but is unusable in practice. Deviance statistics are discussed as a way to quantify how far a model departs from this ideal model. The video concludes with the Hosmer-Lemeshow test, a robust method for assessing fit that is particularly suited to settings where the number of covariate patterns is large.
Key takeaways
- 📊 Goodness of fit measures how well a model describes the data.
- 📉 The saturated model predicts perfectly but is unusable in practice.
- 🧮 The deviance compares the model of interest to a reference model.
- 🆚 The Hosmer-Lemeshow test is a more practical way to assess fit.
- 🔍 Understanding covariate patterns is crucial to the analysis.
- ⚖️ Deviance statistics measure the departure from the ideal model.
Timeline
- 00:00:00 - 00:05:00
This segment introduces the concept of goodness of fit. Goodness of fit is about how well a model describes the available data: a model can produce statistically significant results without actually predicting the outcome well.
- 00:05:00 - 00:10:00
The lecturer stresses the importance of understanding the relationship between the model and the observed data, in both linear and logistic regression. Ideally the predicted values would coincide exactly with the observed values, which is rarely achieved in practice.
- 00:10:00 - 00:15:00
He defines the terms "good fit" and "lack of fit" and explains how they relate to the null hypothesis (good fit). Adding more predictors to a model generally improves the fit, but it can also lead to unusable models.
- 00:15:00 - 00:20:00
The lecturer introduces two families of approaches for assessing goodness of fit: deviance-based statistics and the Hosmer-Lemeshow test. Some approaches are ideal but impractical, while others are usable and give meaningful results.
- 00:20:00 - 00:25:00
The saturated model is described as a model that can perfectly predict the outcome for every data point, but it is impractical because it requires as many parameters as there are data points.
- 00:25:00 - 00:30:00
For the deviance there are two reference models to consider: the saturated model and the fully parameterized model. The saturated model, although it fits perfectly, is not usable in practice.
- 00:30:00 - 00:35:00
The lecturer shows that some notation needs to be formalized in order to talk about model output. He illustrates this with tables and quantities such as Y (observed) and P (predicted).
- 00:35:00 - 00:40:00
The saturated model is presented as being able to generate predictions that fit the data exactly, but such models are unwieldy. The fully parameterized model, in contrast, predicts the mean outcome for each combination of covariates and is more useful despite its complexity.
- 00:40:00 - 00:45:00
Although the fully parameterized model does not predict the 0/1 values exactly, it does predict the average outcomes adequately. This model is then compared with the model being evaluated.
- 00:45:00 - 00:50:00
The lecturer explains that one must count the number of covariate patterns present in a data set. If there are too many patterns, the fully parameterized model becomes nearly identical to the saturated model, which makes the deviance hard to interpret.
- 00:50:00 - 00:57:03
He concludes this segment with an introduction to the Hosmer-Lemeshow test, which evaluates how well a model predicts the outcomes by dividing the data into groups suited to assessing fit. This test is useful where the fully parameterized model is not.
Video Q&A
What is the Hosmer-Lemeshow test?
A test that evaluates how well the model predicts the observed outcomes by dividing the data into deciles of predicted probability and comparing the sums of the predictions with the sums of the observed outcomes in each group.
Why is the saturated model impractical?
Because it requires as many parameters as there are observations, which makes it overly complex and uninterpretable.
How is the deviance used to assess model fit?
The deviance measures the gap between the likelihood of the model of interest and the likelihood of a reference model, typically the saturated model.
What is a fully parameterized model?
A model that includes all the covariates and all their possible interaction terms, but does not necessarily predict the 0/1 outcomes exactly.
What are the drawbacks of using the deviance with continuous data?
If the number of covariate patterns is too large, the fully parameterized model comes to resemble the saturated model, making the deviance uninformative.
How do covariate patterns affect goodness-of-fit tests?
Adding continuous covariates can create a large number of distinct patterns in the data, which complicates the interpretation of the fit statistics.
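The grouping-and-comparing computation described in the first answer can be sketched as follows. This is a minimal illustration, not code from the lecture; it assumes NumPy and SciPy are available and uses the usual decile grouping, chi-square form, and G − 2 degrees of freedom.

```python
import numpy as np
from scipy import stats

def hosmer_lemeshow(y, p, groups=10):
    """Group subjects by deciles of predicted probability, then compare
    observed and expected event counts in each group with a chi-square."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    order = np.argsort(p)
    chi2 = 0.0
    for idx in np.array_split(order, groups):   # ~equal-sized groups by predicted risk
        n_g = len(idx)
        observed = y[idx].sum()                 # observed events in the group
        expected = p[idx].sum()                 # expected events = sum of predicted probs
        p_bar = expected / n_g                  # average predicted probability in the group
        chi2 += (observed - expected) ** 2 / (n_g * p_bar * (1 - p_bar))
    df = groups - 2
    return chi2, stats.chi2.sf(chi2, df)        # large p-value -> no evidence of lack of fit

# Usage: hosmer_lemeshow(observed_0_1_outcomes, model_predicted_probabilities)
```

A real implementation would also guard against groups whose average predicted probability is exactly 0 or 1, which would make the denominator zero.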
- 00:00:00okay so we're gonna jump into
- 00:00:02goodness-of-fit and this one gets a
- 00:00:07little it's gonna get a little esoteric
- 00:00:09before it gets practical I feel like I
- 00:00:12always get these lectures where we go
- 00:00:13out into a limb then we come back in and
- 00:00:15the reason that we do that is because
- 00:00:18with this topic we're gonna sort of show
- 00:00:20what the ideal scenario would be and it
- 00:00:23turns out the ideal thing that we want
- 00:00:24to do is impossible and the stuff that
- 00:00:26we want to do comes later because
- 00:00:29we can't do the ideal thing so we have
- 00:00:31to talk about the ideal thing and that's
- 00:00:33the esoteric part but we'll get there
- 00:00:36together it will be okay so so
- 00:00:40goodness-of-fit so what are we talking
- 00:00:41about when we mean or were when we talk
- 00:00:45about goodness-of-fit okay it's really
- 00:00:46about how good is the model in terms of
- 00:00:49describing the data that we have okay
- 00:00:52and it's this basic idea that you could
- 00:00:54have a model that has statistically
- 00:00:56significant things looks great
- 00:00:57cool cool odds ratios tight at
- 00:00:59confidence intervals and things that you
- 00:01:01know based so far and everything we've
- 00:01:04said in the class and all the metrics
- 00:01:05we've described looks like a good model
- 00:01:07but it actually doesn't really describe
- 00:01:10the data very well so you can get
- 00:01:11significant things but the model just
- 00:01:13sort of stinks in terms of actually
- 00:01:17predicting the outcome okay and in
- 00:01:21predicting the probabilities of the
- 00:01:23outcome so this could be you know think
- 00:01:26of simple linear regression where you
- 00:01:29can have a significant you can have a
- 00:01:31scatterplot with something you know some
- 00:01:33points on it and you draw a line and the
- 00:01:35beta for the line is significant
- 00:01:37okay it's a statistically significant
- 00:01:40nonzero slope to that line and it go
- 00:01:42cuts through your data but your data
- 00:01:44might be a very sort of cloud like not
- 00:01:47very tight to the line right and it may
- 00:01:48just sort of not fit super well and the
- 00:01:51line fits through the data and you have
- 00:01:52a number for the slope and the p-values
- 00:01:54tiny and it could but it actually may
- 00:01:56not describe all the variation in your
- 00:01:59data very well so you can get the same
- 00:02:01sort of phenomenon with with logistic
- 00:02:05regression you fit the model stuff looks
- 00:02:06great except it just may not be the
- 00:02:09best model may not describe the data
- 00:02:11very well and
- 00:02:12so what we're talking about here is when
- 00:02:17we mean by fit formally is that for
- 00:02:19every observation in the ideal world we
- 00:02:23would really like the the distance
- 00:02:26between the observed data points
- 00:02:28the observed outcomes Y i and the
- 00:02:30estimated outcomes Y i hat
- 00:02:35this is from the model we'd like them to
- 00:02:37be 0 we want the predict this is the
- 00:02:38predicted value and the observed value
- 00:02:40and we like them to sort on average be
- 00:02:43sort of close alright we want in general
- 00:02:45our model to predict things that are
- 00:02:47close to the observed the observed data
- 00:02:50okay so the the terms that we are often
- 00:02:55using in this line of work or a lack of
- 00:02:58fit and good fit
- 00:03:00okay so lack of fit is evidence of bad
- 00:03:02fit okay that that this this these discs
- 00:03:06these distances tend to be big okay that
- 00:03:09means there's bad fit that the model is
- 00:03:10predicting something that's way off from
- 00:03:12reality the observed wise okay and that
- 00:03:15typically is that we're rejecting some
- 00:03:18null hypothesis and when we
- 00:03:21reject the null hypothesis we say
- 00:03:22there's lack of fit the null hypothesis
- 00:03:24is that there's good fit
- 00:03:26this is the totally weird
- 00:03:28way as in statistics that we sort of
- 00:03:29describe things the null hypothesis
- 00:03:32typically is that the fit is good or
- 00:03:34that there's lack of evidence of bad fit
- 00:03:36remember all this sort of
- 00:03:36tongue-twisting that we do for the null
- 00:03:38hypothesis that there's like lack of
- 00:03:40evidence that this isn't true and so
- 00:03:41that's sort of what the idea is for the
- 00:03:44null hypothesis typically is that the
- 00:03:45fit is that we don't have evidence that
- 00:03:46the model is bad and then we call it
- 00:03:48good fit this is I'm writing this all
- 00:03:51out this seems like all these double
- 00:03:52negatives but this is exactly how its
- 00:03:54defined so I'm just trying to just
- 00:03:57trying to put it up front with you the
- 00:03:58null hypothesis is good
- 00:04:00fit and the alternative is that
- 00:04:02there's a lack of fit and the idea here
- 00:04:04is typically when we add more predictors
- 00:04:07into the model we usually get a better
- 00:04:09fit right the more terms you add
- 00:04:11to the model the more the fit to the
- 00:04:13observed data goes up at some point you
- 00:04:15add too many predictors and your model
- 00:04:17is sort of meaningless because you added
- 00:04:18500 things to it so this idea this this
- 00:04:22this idea is used all over
- 00:04:24the place in linear regression
- 00:04:26right that you're trying to sort of
- 00:04:27narrow the distance between what the
- 00:04:29model predicts for the outcome versus
- 00:04:32the observed data okay so so we're going
- 00:04:36to talk about three three approaches to
- 00:04:39dealing with with goodness of fit today
- 00:04:41and this first one the first two are
- 00:04:44using this idea of the deviance and
- 00:04:46we'll talk about what we mean by the
- 00:04:47deviance but within that there's sort of
- 00:04:49two there's sort of two subtypes there's
- 00:04:55deviance is based on something called
- 00:04:56the saturated model and we're going to
- 00:04:58talk about this is the thing that's the
- 00:05:00ideal that we it's not actually useful
- 00:05:02or useable in in life in practice and so
- 00:05:06we're going to talk about something
- 00:05:07which is almost as good it's called it's
- 00:05:10gonna be called the events/trials
- 00:05:11deviance and it's based on something
- 00:05:12called the fully parameterized model and
- 00:05:14we'll talk about all that so this is the
- 00:05:16thing that is what we can use and this
- 00:05:18is the thing that we would like to use
- 00:05:20but we can't okay so this is there's
- 00:05:22things that use these are the deviance
- 00:05:23based approaches then there's this
- 00:05:25totally other thing called the Hosmer-
- 00:05:26Lemeshow test okay and it's based it's
- 00:05:29trying to get it the same idea of
- 00:05:30goodness of fit but is using a totally
- 00:05:32different approach okay and the Hosmer-
- 00:05:34Lemeshow test is what we'll talk about at
- 00:05:35the end and then actually in in the
- 00:05:37literature this is the thing you see the
- 00:05:39most is the Hosmer-Lemeshow test so
- 00:05:42sort of two groups of what we'll talk
- 00:05:45about okay here we go this is exactly
- 00:05:48what I just said okay so this is the
- 00:05:50ideal we can't do it and we're gonna
- 00:05:55talk about for these last two the ones
- 00:05:56you can do which ones are better to use
- 00:06:00when the Hosmer-Lemeshow is a more
- 00:06:02universal test it has a few limits so
- 00:06:04we'll talk about its limits
- 00:06:05this one has some limits and so what
- 00:06:08we're going to talk about the events/
- 00:06:09trials deviance and where you can't use it which
- 00:06:10it turns out happens a lot
- 00:06:12okay so we're so to get to all of that
- 00:06:16we're gonna have to talk about some
- 00:06:17notation okay and they're you know
- 00:06:21because they all look sort of similar so
- 00:06:23we got a I'm gonna sort of illustrate
- 00:06:25the notation so you're not like oh my
- 00:06:27god they all look like Y and P and then
- 00:06:29we sort of differentiate them to be able
- 00:06:32to work through it because these
- 00:06:33quantities are really important to
- 00:06:34understanding the goodness of fit sort
- 00:06:36of idea and how they're all the methods
- 00:06:38are working okay
- 00:06:40so here's four quantities and here's
- 00:06:43here's some stuff here's a two-by-two
- 00:06:44table okay really really simple two by
- 00:06:48two table here's exposed on the columns
- 00:06:49disease on the rows here's some just
- 00:06:53some data
- 00:06:54okay here's three lines of data real
- 00:06:56pretend these are people here's their
- 00:06:58here's the disease status here is the
- 00:07:00exposed status and we'll talk about this
- 00:07:02is supposed to say P hat okay so this is
- 00:07:04we'll talk about that but this is
- 00:07:05supposed to be the lines of lines of
- 00:07:07data view and this is the sort of
- 00:07:13two-by-two table view okay so why I this
- 00:07:15is the easy one these are the observed
- 00:07:18values of the outcome in your data okay
- 00:07:22the Y i's so in linear regression these
- 00:07:24were continuous here they're essentially
- 00:07:27your zero one outcomes in your you know
- 00:07:32your zero one outcomes in your data okay
- 00:07:35so here the Y i's are the D
- 00:07:38variables right and so your one zero one
- 00:07:42on the bottom and the Y is here are
- 00:07:44summarized as counts right so here we
- 00:07:46have six disease exposed and so forth
- 00:07:49okay so these are the observed values
- 00:07:51and the data this thing called P hat and
- 00:07:55this is supposed to be a subscript
- 00:07:56it didn't come out so P subscript X
- 00:07:59i this is the average value of Y of
- 00:08:05being diseased or not diseased
- 00:08:08in your data at different values of your
- 00:08:10covariates so what does that mean so
- 00:08:13here in this column of exposed six out
- 00:08:17of ten or point six have disease okay so
- 00:08:21this this is like the risk or the
- 00:08:23probability of disease or what everyone
- 00:08:25will call the probability called let's
- 00:08:27call it the probability of disease given
- 00:08:28your exposed is one your one for
- 00:08:31exposure 0.4 is the probability of
- 00:08:34disease given your unexposed this is
- 00:08:36this we call this quantity many many
- 00:08:38things risk probability and so forth
- 00:08:40this is so here we're going to
- 00:08:43just say it in a really generic way it's
- 00:08:46the
- 00:08:47observed it's basically the observed
- 00:08:50probabilities at these
- 00:08:54values of X so at right here's the X
- 00:08:57variable here is we'll have one X
- 00:08:59variable its exposure so an exposure one
- 00:09:03this is the observed probability and
- 00:09:05here it is it for exposure zero okay it
- 00:09:08turns out what you know this is also
- 00:09:10equal to the average value of y right
- 00:09:14because there's only there's there's six
- 00:09:19times one and four times zero and divide
- 00:09:22it by 10 and you get point six right
- 00:09:24this is sort of it right that this is
- 00:09:26the mean the mean of all the zeros and
- 00:09:28ones for this column is 0.6 it's the
- 00:09:31same thing so it says the observed
- 00:09:32average values in your data then the
- 00:09:35proportion is the mean of the zeros and
- 00:09:37the ones okay so that's that's P sub X I
- 00:09:44here okay cool so that is these this
- 00:09:48little one here okay so why I hat our
- 00:09:53predicted values of Y from some model
- 00:09:56okay this is again this had on the last
- 00:10:00slide this is notation we've seen before
- 00:10:02this this might be you have there we go
- 00:10:07okay well does she does not I don't have
- 00:10:10why I hats on here because we haven't
- 00:10:11actually fit a model this is you you fit
- 00:10:15some model with some covariates in your
- 00:10:16data to your data and you make your
- 00:10:20model predict values of of the outcome
- 00:10:23at give at different values of X so just
- 00:10:25like we have predicted odds ratios you
- 00:10:28can use a logistic regression model to
- 00:10:29have predicted probabilities that's what
- 00:10:32the why I hats mean these are
- 00:10:33predictions that come from a model
- 00:10:35different models with different covariates
- 00:10:37are going to give you different
- 00:10:37predictions right that's sort of that's
- 00:10:40that's the idea here is that you fit a
- 00:10:43model with these two covariates it
- 00:10:44gives you certain predictions
- 00:10:46for people and you put in another three
- 00:10:47covariates and you get different
- 00:10:48predictions these other things are sort
- 00:10:51of fixed in your data set the Y's are
- 00:10:53just what your data you have 200 people
- 00:10:55in your study you have 200 Y eyes you
- 00:10:59can get many many Y hats
- 00:11:00is depending on all the different models
- 00:11:02you fit to your 200 data points in your
- 00:11:05data set so this thing changes for every
- 00:11:08model you have and these okay so these
- 00:11:17then this other quantity P hats P hat X
- 00:11:20I are essentially estimates of the of
- 00:11:25the probability of Y from your model
- 00:11:28given some covariates so this is
- 00:11:29actually very very similar to the thing
- 00:11:31I said over here this is this this
- 00:11:35quantity up here this I okay sorry the
- 00:11:39bottom one the P P hat X i we used this
- 00:11:41notation for logistic regression here it
- 00:11:43is over here
- 00:11:44okay this these are this is truly the
- 00:11:46predicted probabilities from a logistic
- 00:11:48regression model I sort of use that
- 00:11:50language to describe Y hat some sub sub
- 00:11:53i's which is a very very related concept
- 00:11:55except that these P this P hat X I from
- 00:12:01a logistic regression model is always a
- 00:12:02probability right you fit a
- 00:12:05logistic regression model to your zero
- 00:12:06one data and it does not spit out zeros
- 00:12:09and ones it spits out probabilities okay
- 00:12:11it spits out 0.6 0.6 0.4 let's say if
- 00:12:15you fit this model to your data because
- 00:12:19these are the logistic regression model
- 00:12:22predicts the probability of the outcomes
- 00:12:24not the outcomes themselves right that's
- 00:12:26sort of one of our the features of
- 00:12:28logistic regression model it your your
- 00:12:30data are 0 and ones but you have a model
- 00:12:32it spits out probabilities that sort of
- 00:12:34on average describes what happens in
- 00:12:35your population that's a little
- 00:12:37different from y hat sub I which is
- 00:12:39actually a model that might in theory
- 00:12:41spit out zeros and ones directly our
- 00:12:43usual logistic regression model
- 00:12:45doesn't do that but this is a
- 00:12:47theoretical quantity that we might wish
- 00:12:49to talk about it's a little bit
- 00:12:51different this is miss described this a
- 00:12:54little bit more like this one all that's
- 00:12:57to say is observed data the observed
- 00:13:00averaged value of the data and this is
- 00:13:04predicted zero ones and this is
- 00:13:06predicted probabilities so these are
- 00:13:08sort of paired ideas and we're going to
- 00:13:10use these four ideas to talk about
- 00:13:12goodness of fit so
- 00:13:15again so this this one is a little more
- 00:13:18theoretical the Y hat sub eyes and we're
- 00:13:20going to just sort of dive in with all
- 00:13:21this any questions I know it's a little
- 00:13:24tongue twister E and they're all a
- 00:13:26little they all have overlaps in their
- 00:13:28definition so we can come back to it but
- 00:13:30does anyone have any questions okay
- 00:13:35you can shout out you can raise your
- 00:13:36hand I'm approachable okay I promise
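For reference, the four quantities just described can be written compactly (the bar/hat notation here is a reconstruction of what the slides are describing):
- $y_i$: the observed 0/1 outcome for subject $i$ in the data.
- $\bar{p}_{x}$: the observed proportion of events among subjects sharing covariate value(s) $x$, i.e. $\bar{p}_{x} = \frac{1}{n_x}\sum_{i:\,x_i = x} y_i$ (e.g. $6/10 = 0.6$ in the exposed column).
- $\hat{y}_i$: a model's predicted 0/1 value for subject $i$.
- $\hat{p}_{x_i}$: a model's predicted probability of the outcome given covariates $x_i$, which is what a logistic regression model actually returns.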
- 00:13:41okay so there's this thing called the
- 00:13:44saturated model okay a saturated model
- 00:13:47is a model that is able to generate 0 1
- 00:13:54values 0 1 values it's a type of
- 00:13:57logistic regression model that actually
- 00:13:58can generate 0 1 values B it's a weird
- 00:14:01thing because we normally just said that
- 00:14:02the logistic regression model only spits
- 00:14:04out probabilities but well Walsh I'll
- 00:14:06show you what this looks like it is able
- 00:14:09to perfectly predict the outcome data ok
- 00:14:12and so what that means is for you if you
- 00:14:17have a data set with say 200 why isn't
- 00:14:19it 200 observations it is able to
- 00:14:22produce predictions such that the
- 00:14:24difference between the predictions and
- 00:14:26the observed values is 0 this is a
- 00:14:27perfectly predicting model it
- 00:14:30produces estimates and this
- 00:14:31difference is 0 okay and
- 00:14:34what this model looks like is it is a
- 00:14:38model that has as many terms in it as
- 00:14:41you have data points so you have 200
- 00:14:44observations you're going to have 200
- 00:14:46coefficients in your model okay it's a
- 00:14:49sort of impractical model okay it's so
- 00:14:52here this is that's what this is saying
- 00:14:53you have an intercept and if you have n
- 00:14:55data points you have 1 to n minus 1 I
- 00:15:00said a little subscript this is a should
- 00:15:03be little J equals 1 over here you
- 00:15:06basically have n minus 1 code dummy
- 00:15:08variables and then an intercept or n
- 00:15:10total terms and this is a model that
- 00:15:13exactly is able to be if you fit as many
- 00:15:16terms in your model as you have people
- 00:15:17in your data set your model will exactly
- 00:15:20predict the outcomes okay well it will
- 00:15:24be beautiful
- 00:15:24you know exactly describe the data but
- 00:15:27it's a sort of
- 00:15:27worthless model you've put as many terms
- 00:15:30and as points and it basically fits a
- 00:15:31term for every single person in the data
- 00:15:34set and that term represents that
- 00:15:36person's value okay so this is a really
- 00:15:38weird thing the this model is gonna spit
- 00:15:41out predicted probabilities that are
- 00:15:43actually zero ones or very close to zero
- 00:15:46and one
- 00:15:46normally predicted probabilities are not
- 00:15:48exactly zero one it's sort of somewhere
- 00:15:50between but because you put this many
- 00:15:52terms of the model it's exactly
- 00:15:53predicting every point of data okay this
- 00:15:58is different than a model that this is
- 00:16:02different than something that we're
- 00:16:03gonna talk about later actually I mean
- 00:16:04this is gonna be the thing that we use
- 00:16:06in the fully pers we're gonna call the
- 00:16:09fully parameterized model but basically
- 00:16:12this is different than something that
- 00:16:13produces predicts the average
- 00:16:15probability of being a case this other
- 00:16:16thing that we talked about the so we're
- 00:16:19talking about a model that predicts 0.6
- 00:16:21and 0.4 we're talking about a model that
- 00:16:23predicts ones and zeroes okay that's all
- 00:16:26this means so this is a really really
- 00:16:29weird model it looks something like
- 00:16:30here's here's an example of a saturated
- 00:16:32model okay on this here's some data
- 00:16:34there's that table the six-four and so
- 00:16:37forth
- 00:16:37we're gonna put that it's we're gonna
- 00:16:39pretend there's some there's even some
- 00:16:41strata in the data let's say so here's
- 00:16:44here's a data set with 40 observations
- 00:16:46okay we have two two by two tables each
- 00:16:48has 20 people two strata C equals
- 00:16:51one C equals zero the specifics of the
- 00:16:54tables don't matter at this point but
- 00:16:56basically you've got 40 people okay you
- 00:16:58fit you could fit a data you could fit a
- 00:17:00model with essentially 40 terms in it
- 00:17:02and you get something like this it's
- 00:17:05really ugly okay you get here's our
- 00:17:10here's our data set it predicts
- 00:17:12probabilities that are essentially 1 and
- 00:17:150 okay this is our saturated model it's
- 00:17:17not exactly 1 and exactly 0 its but it's
- 00:17:20basically one in 0 okay and what is it
- 00:17:24doing
- 00:17:25it's C it says P hat this is again using
- 00:17:27the output I'm using the output from the
- 00:17:30model to actually just manually - I mean
- 00:17:32I'm using the output statement from the
- 00:17:33model to generate these it's predicting
- 00:17:35that the predicted probability of
- 00:17:37disease for person 1 is basically 1 and
- 00:17:40that match
- 00:17:40their disease status down for these
- 00:17:42people here it's predicting their
- 00:17:44probabilities of disease or zero and
- 00:17:46that is these people's disease status so
- 00:17:49if you compute the difference between
- 00:17:51disease and p-hat the predicted
- 00:17:55probabilities you get zero
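As a quick numeric check of the two facts being used here (made-up data, not the lecture's SAS output): the saturated model's fitted probabilities equal the observed 0/1 outcomes, so every residual is zero, and the Bernoulli log-likelihood is 0, i.e. the likelihood is exactly 1 — a point the lecture returns to shortly.

```python
import numpy as np

y = np.array([1, 1, 1, 0, 0, 1, 0, 1, 0, 0])   # made-up observed 0/1 outcomes
p_hat = y.astype(float)                          # saturated model: p_hat_i = y_i

print(np.abs(y - p_hat).max())                   # 0.0 -> the differences are all zero

# Bernoulli log-likelihood, arranged so log(0) never occurs
log_lik = np.sum(np.log(np.where(y == 1, p_hat, 1.0 - p_hat)))
print(log_lik, np.exp(log_lik))                  # 0.0 and 1.0 -> likelihood is exactly 1
```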
- 00:17:57this model exact basically is always
- 00:18:00spits out the probability of disease
- 00:18:02more or less it's not exactly zero okay
- 00:18:06so this model is a saturated model the
- 00:18:09way I sort of fit it was I said I said
- 00:18:10class ID so every person is it has an ID
- 00:18:13variable and so it's making dummy
- 00:18:15variables for every ID and then they say
- 00:18:17model disease equals ID so you get a
- 00:18:20different term in the model for every
- 00:18:23single person don't do this at home this
- 00:18:25is not something you should do you know
- 00:18:27this is a not practical sort of model
- 00:18:31but anyway that's what it does it just
- 00:18:34exactly spits out it exactly spits out
- 00:18:37the probabilities of disease and as
- 00:18:41zeros and ones essentially the why these
- 00:18:44Y hat values okay and what we're going
- 00:18:47to talk about some of these deviance
- 00:18:49statistics in a moment okay but let's
- 00:18:52just say that's the fact is these are
- 00:18:54practically zero they're not exactly
- 00:18:56zero they're mostly zero okay that's an
- 00:18:59important piece of the deviance is
- 00:19:00basically zero alright so in summary
- 00:19:05this is a model to the saturated model
- 00:19:07is this hypothetical thing that exactly
- 00:19:09describes the data it matches exactly the
- 00:19:12data but it's useless because it has as
- 00:19:14many terms on it as it has people it's
- 00:19:16totally useless but these are you but
- 00:19:20we're obviously we're talking about it
- 00:19:21for a reason so you'll find out what
- 00:19:23that is it's this ideal model that
- 00:19:27describes the data well but we want to
- 00:19:28compare to okay so this really says in
- 00:19:33words or in more words and formulas what
- 00:19:35we just saw on the previous slide that
- 00:19:37the saturated model has as many terms in
- 00:19:41dummy variables as you have subjects now
- 00:19:44basically what happens is here's for
- 00:19:46example it's pretending that dummy
- 00:19:48variable two is one so let's say this is
- 00:19:50let's pretend this represents ID 2 the
- 00:19:52second observation
- 00:19:54this model reduces to just the logit of
- 00:19:56p of y equals w-2 or a net w little
- 00:20:01Omega 2 and and that's this term over
- 00:20:04here and that means that the probability
- 00:20:07of disease for subject 2 is 1 over 1
- 00:20:11plus e to the minus this coefficient and
- 00:20:14this will always predict this is that
- 00:20:18this is the probability of disease for
- 00:20:21person to this is always true in any
- 00:20:24logistic regression model this is just
- 00:20:25the probability this is just the
- 00:20:27probability and because it's a saturated
- 00:20:29model it actually is spitting out 0 ones
- 00:20:31themselves because it's a saturated
- 00:20:33model so the probability is going to be
- 00:20:340 1 ok if if you put this into the
- 00:20:38likelihood formula for the logistic
- 00:20:41regression model the likelihood
- 00:20:43expression for such models where the
- 00:20:45probability equals the Y's themselves
- 00:20:48turns out to be 1 the low K so the
- 00:20:50likelihood of this model the saturated
- 00:20:52model is 1 it's the most it has a
- 00:20:54perfect likelihood or something of some
- 00:20:56sort you know it's it's likely it is
- 00:20:58always equal to 1 this is going to be
- 00:21:00important for the deviance statistics ok
- 00:21:02so this is it's not that interesting yet
- 00:21:04ok we have a model that's got as many
- 00:21:05terms as as data points and it perfectly
- 00:21:09describes the model but it's crazy
- 00:21:10because you would never do this
- 00:21:12so the saturated model so that but it's
- 00:21:14as building to this idea of the deviance
- 00:21:16okay so the saturated model is the best
- 00:21:19fitting model in the world it exactly is
- 00:21:22connecting the dots of your data and is
- 00:21:24literally it's got as many terms as data
- 00:21:26points okay but as you've said this is a
- 00:21:29this is not a good model to use right it
- 00:21:32comes at a cost
- 00:21:32this is not a useable model when you
- 00:21:34have 200 observations and 200 terms in
- 00:21:37the model and they're not interpretable
- 00:21:38they mean person 22 they don't mean you
- 00:21:41know history of alcoholism or whatever
- 00:21:44the thing that you're modeling is they
- 00:21:46just mean person or data point so that's
- 00:21:49not useful okay and but what this sort
- 00:21:54of serves is like a baseline for
- 00:21:56comparing other models to okay what so
- 00:22:00we don't want to use the saturated model
- 00:22:02but what if we can compare our model and
- 00:22:04how well it fits the data to how well
- 00:22:07the saturated
- 00:22:07model fits the data so we might be
- 00:22:10interested in it for comparison and
- 00:22:12that's this idea of the deviance okay
- 00:22:14and is our model which is not the
- 00:22:18saturated model but they only have five
- 00:22:19terms that is it close enough to the
- 00:22:21saturated model and if it's close enough
- 00:22:23we might say hey that's a pretty good
- 00:22:25fitting model it's sort of like it's
- 00:22:27pretty close to how the saturated model
- 00:22:28fits and that's the idea of the deviance
- 00:22:31statistics it measures how much
- 00:22:33our model deviates from the best model
- 00:22:37okay and it's very similar to the
- 00:22:39likelihood ratio test because the
- 00:22:41likelihood ratio test is like hey I got
- 00:22:43a model that's got two interaction terms
- 00:22:45and one interaction term let's see if
- 00:22:47one changes relative to the other
- 00:22:50all right that was the likelihood ratio
- 00:22:51test like let's see whether one model is
- 00:22:53different for another model and that's
- 00:22:55sort of the idea with the deviance tests
- 00:22:57except that we're always comparing in
- 00:22:59the deviance tests to the the best
- 00:23:01models that fits the data really well
- 00:23:03okay and that's formally this thing here
- 00:23:06this is the deviance statistic it's very
- 00:23:07very similar to the likelihood ratio
- 00:23:09statistic and which I have in its slide
- 00:23:12at the end of the of the PowerPoint we
- 00:23:14have another slide to showing the
- 00:23:16relationship between the two but the
- 00:23:18idea of the deviance is it says it's a
- 00:23:20it's a quantity that you can
- 00:23:22statistically test and it has a ratio in
- 00:23:25it that is the ratio of likelihood so
- 00:23:27it's very similar to like the oratio
- 00:23:28test okay and the numerator is the
- 00:23:31likelihood of the model that you've got
- 00:23:32whatever it is somewhat you got some
- 00:23:35model you think it's great but you're
- 00:23:36not sure it's great which is why you're
- 00:23:37doing this test okay so you have some
- 00:23:39candidate model and an in denominator
- 00:23:41you have the best the best fitting model
- 00:23:44okay which might be the saturated model
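In symbols, the statistic being described is

$$D = -2\,\ln\!\left(\frac{\hat{L}_{\text{candidate}}}{\hat{L}_{\text{reference}}}\right) = -2\left(\ln \hat{L}_{\text{candidate}} - \ln \hat{L}_{\text{reference}}\right),$$

where the reference likelihood in the denominator comes from the best-fitting comparison model.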
- 00:23:47okay and the idea that the smaller that
- 00:23:50this that this quantity is the better
- 00:23:53fitting your model is to the best the do
- 00:23:57your model is close to the best fitting
- 00:23:59model okay so this thing called the
- 00:24:01deviance statistic and the thing is what
- 00:24:05so this is our candidate model that
- 00:24:07we've got our covariates of interest
- 00:24:09in but what do we choose for the
- 00:24:11comparison model is sort of the key
- 00:24:13piece and this is the difference between
- 00:24:15the first two deviance tests that I
- 00:24:17talked about on the I don't know some
- 00:24:19slides ago and
- 00:24:21the idea here is this thing called this
- 00:24:22there's two choices for the likelihood
- 00:24:25that you might plug into the denominator
- 00:24:26one is called the subject specific
- 00:24:30deviance or the deviant this or the
- 00:24:33subject specific likelihood and then
- 00:24:36once called the events trials based one
- 00:24:38the subject specific one is essentially
- 00:24:41the this is the likelihood of this in
- 00:24:43the saturated model okay so this is the
- 00:24:46model we were just talking about you put
- 00:24:48it in the denominator the likelihood of
- 00:24:49the model where you have as many terms
- 00:24:51as you've got observations because that
- 00:24:54is literally the best model it spits out
- 00:24:57zeros and ones exactly that match your
- 00:24:59observed data okay the other option is
- 00:25:02something called the events/trials based
- 00:25:04likelihood or basically it's a model
- 00:25:06from a full fully parameter it's the
- 00:25:08fully parameterized model and we'll talk
- 00:25:10about what we mean by that but it's
- 00:25:11basically got
- 00:25:16every covariate and all of its
- 00:25:18interaction terms and I'll
- 00:25:20explain why we're gonna use that so it's
- 00:25:22a different thing and we're gonna we're
- 00:25:25gonna talk a bit about what this is this
- 00:25:26likelihood but this is the likelihood of
- 00:25:28the saturated model this likelihood we
- 00:25:32said was always equal to one okay it's
- 00:25:35always equal to one because it is the
- 00:25:36saturated model which by definition fits
- 00:25:38the data exactly it's little circular
- 00:25:40okay but that's what it's saturated fits
- 00:25:42together perfectly always equals to one
- 00:25:44so for this one the likelihood is
- 00:25:47usually far less than one so the problem
- 00:25:51is we cannot actually use this just the
- 00:25:54test statistic with this likelihood of
- 00:25:56the subject specific or the saturated
- 00:25:58model beats likelihood in the
- 00:25:59denominator so even though we would love
- 00:26:01to use it we can't and the reason is
- 00:26:04there's a bunch of reasons here so here
- 00:26:10is our equation that we just showed on
- 00:26:12them on the previous slides well here's
- 00:26:13negative two times the log of the
- 00:26:16likelihood of our sort of the model we
- 00:26:18care about over the the likelihood from
- 00:26:21the saturated model this thing is old
- 00:26:23this denominator is always equal to one
- 00:26:25okay
- 00:26:26and so it's just negative two times the
- 00:26:28log of the likelihood of our model okay
- 00:26:32and this is this is the expression for
- 00:26:33that
- 00:26:34that likelihood from our model that we
- 00:26:37care about so this sorry these two
- 00:26:42bullet points just summarize that what
- 00:26:46we just said
- 00:26:48this thing is just the deviance this
- 00:26:50quantity here which is just the negative
- 00:26:52two log likelihood of our model is
- 00:26:54what we call the deviance
- 00:26:57statistic for any given model it's
- 00:27:00the negative two log likelihood and we
- 00:27:03call this thing the deviance
- 00:27:04subject-specific but it's basically the
- 00:27:07it's basically the deviance owing to a
- 00:27:10saturated model and this is very
- 00:27:14uninteresting because if the denominator
- 00:27:16is always equal to one
- 00:27:18we're not really giving or not that's no
- 00:27:19new and that's not telling us anything
- 00:27:21so if this denominator is always equal to one
- 00:27:23as it is for a saturated model then the
- 00:27:25deviance statistic is just the negative
- 00:27:27two log likelihood of your given model
- 00:27:29and that by itself is very
- 00:27:30uninteresting and very uninformative so
- 00:27:33there's some part of the problems we're
- 00:27:36not making an actual comparison if the
- 00:27:39denominator is always equal to one I
- 00:27:40said in theory we would love to compare
- 00:27:42our model to some ideal model but that
- 00:27:45comparison is not possible because the
- 00:27:46saturated model always has the same
- 00:27:48likelihood and so it's really it's it's
- 00:27:53something we like to do but
- 00:27:54mathematically we're left by always
- 00:27:56dividing by one and so we're all we have
- 00:28:00is our predicted probabilities from our
- 00:28:02model and we're not comparing to the
- 00:28:04observed values in the data the why is
- 00:28:06that we're not comparing to the
- 00:28:08saturated model all this is doing is
- 00:28:10just this is just the likelihood of the
- 00:28:12model we have but the whole point of
- 00:28:14goodness of fit is that we're comparing
- 00:28:16how we're doing to some to some standard
- 00:28:19we're comparing our models predictions
- 00:28:21to why eyes or we're comparing it to the
- 00:28:24saturated model we've got a compared to
- 00:28:25something otherwise we don't know how
- 00:28:26well we fit if we can't compare our
- 00:28:29model to something so all that's to say
- 00:28:32is that this quantity is cool except
- 00:28:34that this denominator is always equal to
- 00:28:36one and so the thing we wanted to do
- 00:28:38isn't actually possible this is really
- 00:28:41unexpected because we thought the
- 00:28:45saturated model would be really good
- 00:28:47as a comparison but it's not because
- 00:28:50its likelihood is
- 00:28:52always equal to 1 so this deviance
- 00:28:55statistic isn't very good and it's not
- 00:28:57useful for goodness of fit testing ok
- 00:29:00and so the ideal isn't possible so we're
- 00:29:03gonna settle for the thing that's next
- 00:29:04to the ideal which is a different
- 00:29:07deviance statistic that uses a different
- 00:29:09denominator and that we think is
- 00:29:11actually the next best thing so that's
- 00:29:14why we're gonna we're going to talk
- 00:29:15about so how to show you the thing
- 00:29:16that's like theoretically we want to do
- 00:29:18but never works out and then we're gonna
- 00:29:20talk about the thing that does work out
- 00:29:22okay and we're gonna use something
- 00:29:25called the fully parametrize model so
- 00:29:28the saturated model that we talked about
- 00:29:30was that that first thing it perfectly
- 00:29:32it's it spits out zeros and ones
- 00:29:34directly out of the model of the
- 00:29:35logistic regression model which is
- 00:29:37really weird because normally
- 00:29:38logistic regression models are
- 00:29:40not meant to spit out zeros and ones but
- 00:29:42it does is it does perfectly spits out
- 00:29:45zeros and ones here and reduces this
- 00:29:47difference always to zero okay and
- 00:29:50like we said it's a really impractical
- 00:29:51model and its likelihood is always equal
- 00:29:53to one so we're gonna instead compare
- 00:29:57our model to something called the fully
- 00:29:58parameterized model and the fully
- 00:30:01parametrized model is essentially the
- 00:30:05most interaction filled model you can
- 00:30:08make out of your covariance okay so
- 00:30:10let's say you have X 1 X 2 or you know
- 00:30:13exposure 1 C 2 whatever you want to call
- 00:30:15it you know be using our other notation
- 00:30:18so you take all of the things all the
- 00:30:21predictors in your model all the basic
- 00:30:23sort of ingredients the main predictors
- 00:30:25in your model all the covariance and you
- 00:30:27make all possible product terms so if
- 00:30:30you've got you know five five terms in
- 00:30:33your model you know it's you know age
- 00:30:35sex whatever you know alcohol use and
- 00:30:38everyone whatever it is you put those in
- 00:30:40the model then you make take all the
- 00:30:42two-way products all the three ways up
- 00:30:44to the five way interaction term this is
- 00:30:47also not a practical model not a good
- 00:30:50idea to have five one interaction terms
- 00:30:52just because what does it mean if it's
- 00:30:54significant I don't know
- 00:30:56and probably you know it just won't be
- 00:31:00significant actually
- 00:31:01but anyway the idea is that it's a it's
- 00:31:03the most interaction filled model that
- 00:31:06you can make so what I mean by that is
- 00:31:08let's say you have just two predictors a
- 00:31:11and C your fully parameterized model is
- 00:31:15alpha plus beta times exposure plus gamma
- 00:31:17times C plus delta times e times C
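Written out, the model just recited is

$$\operatorname{logit} P(Y = 1 \mid E, C) = \alpha + \beta E + \gamma C + \delta\,(E \times C).$$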
- 00:31:19this is literally the most complicated
- 00:31:22model you can make with the data you've
- 00:31:24got okay this is not the saturated model
- 00:31:27right the saturated model would be as
- 00:31:28many terms in the model as you have
- 00:31:30observations okay that would be as many
- 00:31:33terms as lines of data right so if
- 00:31:36you have 100 people in your data set
- 00:31:37you'd have 100 terms in your model this
- 00:31:39is the most complicated model you can
- 00:31:41make with the X variables that you have
- 00:31:43okay and again if you have only a binary
- 00:31:45predictor this is then your fully
- 00:31:47parameterized models really boring it's
- 00:31:48just a term for exposure that that's all
- 00:31:51you got so this is not necessarily a
- 00:31:55saturated model right it could be if you
- 00:31:57have a you can sort of have a very tiny
- 00:31:59data set and it could sort of work out
- 00:32:01but what's cool is the fully
- 00:32:04parameterized model completely could
- 00:32:07always predict the average probability
- 00:32:10for each combinations of X's okay so
- 00:32:13what I mean by that is I'm going to jump
- 00:32:16back up here and I think will main even
- 00:32:18illustrate it down below just to trigger
- 00:32:22okay back up on this slide here your
- 00:32:26model the fully parameterized model
- 00:32:28doesn't spit out the zeros and ones
- 00:32:30exactly but for every pattern of
- 00:32:33covariates it perfectly predicts the
- 00:32:35average in that group the average
- 00:32:36probability in that group so for example
- 00:32:40if you had a just a model with exposure
- 00:32:42in it and you fit it to this data your
- 00:32:44model would spit out 0.6 and point four
- 00:32:49for your exposed unexposed so here we go
- 00:32:52over here so it's actually your model
- 00:32:54exactly predicts the average experience
- 00:32:56of that group not the zeros and ones
- 00:32:58themselves but it predicts the average
- 00:33:01and the population of the outcome so
- 00:33:02that's the difference between the that's
- 00:33:05what the fully parameterized model does
- 00:33:07and we're going to illustrate this with
- 00:33:08SAS in a few slides but I wanted to show
- 00:33:12it what I mean by that so it doesn't
- 00:33:14zeroes and ones it predicts the average
- 00:33:16exactly in the data set that the co
- 00:33:19various okay because that you're fitting
- 00:33:20as many terms as you have possible
- 00:33:24products of your of your variables and
- 00:33:27this will always predict the average
- 00:33:30experience of air for all combinations
- 00:33:32of X's okay but not the zeros and ones
- 00:33:36average so okay so what I mean by that
- 00:33:41is here so if you have so sorry okay
- 00:33:48there we go
- 00:33:49so one thing that's sort of important to
- 00:33:52think about is how is this idea of
- 00:33:54covariant patterns which is all the
- 00:33:57combinations of values of your X
- 00:33:59variables that you can have okay and
- 00:34:01this this is what this best is what the
- 00:34:03fully parameterized model is going to
- 00:34:05predict well it's gonna predict the
- 00:34:06average for every pattern of covariance
- 00:34:08SAS calls this thing unique profiles
- 00:34:11okay in the output but co-vary patterns
- 00:34:14are imagine if here if you've got a data
- 00:34:19set with just one exposure e okay and it
- 00:34:22has your only takes on the values zero
- 00:34:24and ones there's only two possible
- 00:34:26covariant patterns x is zero let me
- 00:34:29start X sorry X or exposure is equal to
- 00:34:321 and exposure is equal to zero the
- 00:34:34people the people in your data set only
- 00:34:37take on two types of X like all across
- 00:34:39all their X's they can only be exposed
- 00:34:41or unexposed if you've got four if
- 00:34:44you've got two covariance to binary
- 00:34:46covariates exposure and confounder C
- 00:34:49then there's four possible types of
- 00:34:52people you can have in your data
- 00:34:54in terms of the covariance right the
- 00:34:56people that are exposed but not but a
- 00:34:58zero for the confounder unexposed one
- 00:35:00for confounder and so forth both ones
- 00:35:02both zeros so you can have zillions of
- 00:35:05people in your data set and if this is
- 00:35:07all you measured on them this is all
- 00:35:09these are the only four types of people
- 00:35:11you really get to observe in terms of
- 00:35:12their expose their covariance their
- 00:35:14exposures they might have different Y's
- 00:35:15and the Y value there's because you have
- 00:35:18y equals 1 or 0 you can have 8 I suppose
- 00:35:21total types of people but in terms of
- 00:35:24the covariance there's only four
- 00:35:25patterns to the covariance okay
- 00:35:28so the fully parametrized model if you
- 00:35:33have a model it's a and C and its
- 00:35:35product is going to exactly predict the
- 00:35:38average y-value for all four of these
- 00:35:40groups okay so that's what it for every
- 00:35:43pattern of covariates it's going to
- 00:35:44exactly predict the average experience
- 00:35:46so if you have data with continuous
- 00:35:49variables now you can have now you can
- 00:35:52have more than just sort of a fixed
- 00:35:54number of groups right the number of
- 00:35:58coab area patterns is going to depend on
- 00:36:01how many values of the continuous
- 00:36:03variable that you have so for example if
- 00:36:06you have just a single X variable in
- 00:36:08your data set age your covariant
- 00:36:10patterns is the number of distinct
- 00:36:12values of age that you have right if you
- 00:36:14have integer years of age and the number
- 00:36:17of you know the number of years you
- 00:36:19observe as the number of covary patterns
- 00:36:21if you have if you have a continuous
- 00:36:22variable and discrete variables then you
- 00:36:25could have way more covariant patterns
- 00:36:27right you have we might have four times
- 00:36:29the number of ages in your data set or
- 00:36:31whatever it is so all that's to say that
- 00:36:35this gets a little more complicated with
- 00:36:37continuous variables that you don't have
- 00:36:39as many you have many a proliferation of
- 00:36:42the number of combinations of X's that
- 00:36:44you have or these Cove area patterns so
- 00:36:47this is going to be very important in
- 00:36:50showing the limitations of this of the
- 00:36:52deviance statistic that we're going to
- 00:36:54talk about this or when you have a
- 00:36:55continuous covariant so this is just
- 00:37:00okay so this we're going to use this
- 00:37:01terminology covariant patterns for the
- 00:37:03deviance test we're also gonna use it on
- 00:37:05Thursday for ROC curves so we're going
- 00:37:11to use the letter G to describe the
- 00:37:14total number of possible covariant
- 00:37:16patterns so G for this predictor to one
- 00:37:19predictor model is two there's only two
- 00:37:22possible covariant patterns here G is
- 00:37:24for for a covariant patterns here it's
- 00:37:29large it's as many levels of age that
- 00:37:31you have and then here if you had a
- 00:37:33binary variable then it might be 2 times
- 00:37:36G where from here great we're two of my
- 00:37:40have twice as many levels as just a
- 00:37:42model with just a Jeanette if you have a
- 00:37:43binary exposure and age you might be
- 00:37:45doubling the number of Cove area
- 00:37:47Patrick's all this is to say is that G
- 00:37:50is just representing the number of
- 00:37:52discrete types of people in your data
- 00:37:56when you have no continuous data we have
- 00:38:01no continuous X variables this is
- 00:38:02usually small usually significantly
- 00:38:04smaller than n the number of people in
- 00:38:06your data set when you have continuous
- 00:38:09data you usually have a number of
- 00:38:11covariant patterns that's almost as much
- 00:38:13as your n alright so if you have
- 00:38:15basically no two people in your data set
- 00:38:17are the same if you have a bunch of
- 00:38:19different continuous variables it's
- 00:38:20unlikely that you're gonna have too many
- 00:38:22people at the exact same intersection of
- 00:38:25covariance right that's just the more
- 00:38:27variables you have the more continuous
- 00:38:29variables you have people get more and
- 00:38:31more distinct unique unique profiles and
- 00:38:35so forth so that then usually closer to
- 00:38:37N and so we're why are we talking about
- 00:38:41this this relates to the fully
- 00:38:42parameterized model as we as I suggested
- 00:38:44earlier so the so given that you have
- 00:38:50given that you have some basic
- 00:38:52predictors you can make a fully x1 to XP
- 00:38:55and data set you can make a fully
- 00:38:58parametrize model and it always has as
- 00:39:03many terms in it as you have unique
- 00:39:07profiles ok this is so before me I
- 00:39:10talked about oh you have a single you
- 00:39:12have a single you have a single exposure
- 00:39:16you only measured one exposure so the
- 00:39:19most complicated the most complicated
- 00:39:22model you can make would have just
- 00:39:23intercept an exposure well that's also
- 00:39:26that has two terms in it that's actually
- 00:39:29the number of unique covariant patterns
- 00:39:31you only can have we just said on the
- 00:39:33earlier slide that G equals two
- 00:39:35similarly the fully parameterized model
- 00:39:38when you have two 0 1 X variables is
- 00:39:41just this this was the fully
- 00:39:43parameterized model its intercept
- 00:39:46exposure see in the interaction for
- 00:39:48terms turns out that's also equal to the
- 00:39:51number of unique covariant patterns so
- 00:39:54it turns out in general the pattern
- 00:39:56this pattern is the mode the most high
- 00:39:57order interaction model that you can
- 00:39:59make also happens to be the number of
- 00:40:01distinct covariant patterns so here if
- 00:40:04you have four if you have two binaries
- 00:40:06this is gonna be four for left four
- 00:40:08types of people that's gonna your to
- 00:40:10your models get it your interaction
- 00:40:11model is gonna have four terms in it
- 00:40:13this is just an observation you can yes
- 00:40:17question no is it a question is it
- 00:40:28possible to have the number of Kouji the
- 00:40:29covariant patterns exceed the sample
- 00:40:32size so no as you can it you can never
- 00:40:33have more intersections of them in your
- 00:40:37data than you have data of these pieces
- 00:40:40you may have some unobserved
- 00:40:42intersections right of of your X
- 00:40:45variables if you may not just not
- 00:40:46literally not have somebody who is you
- 00:40:50know a 99 year old Hispanic from Alaska
- 00:40:54with a mohawk you know or just you know
- 00:40:58ever to some very you know whatever it
- 00:41:00is some very rare intersection that may
- 00:41:02not occur in your data set but the
- 00:41:04number of observed patterns is always
- 00:41:05less than or equal to the sample size so
- 00:41:10so this is just an observation that you
- 00:41:12always it turns out that the highest
- 00:41:13rotor interaction models as many as
- 00:41:15always has as many terms as the number
- 00:41:17of covariant patterns and what happens
- 00:41:22in here is we're gonna fit model we're
- 00:41:24gonna fit the we're gonna fit the model
- 00:41:27with the two we're gonna fit a model
- 00:41:29that is our fully parameterized and this
- 00:41:33model as we said is essentially it's not
- 00:41:37doesn't predict the zero one outcomes
- 00:41:39perfectly the fully parameterized model
- 00:41:41always predicts the observed average
- 00:41:44probabilities and the data perfectly and
- 00:41:46that's actually pretty good
- 00:41:48that's actually almost as good as
- 00:41:51predicting the probabilities itself if
- 00:41:52you can on average predict the probably
- 00:41:55the probability is Bhakthi if you could
- 00:41:58predict the observe zero one you would
- 00:42:00like to predict the observed zero ones
- 00:42:02themselves but instead the fully
- 00:42:04parameterized model predicts the
- 00:42:06probabilities themselves and that still
- 00:42:08really
- 00:42:08be good so all we're going to sort of
- 00:42:11walk through this and make a test
- 00:42:13statistic out of it and talk about it so
- 00:42:16here's here's the data that we showed
- 00:42:17before just it's got exposure 1 0 c1 0
- 00:42:2340 observations if I fit this model
- 00:42:28exposure C and then e times C that's
- 00:42:32this four terms it's going to intercept
- 00:42:36in it we get something that that looks
- 00:42:39like this okay we're going to get you
- 00:42:42can get some odds ratios you can get
- 00:42:45some predicted probabilities so here you
- 00:42:49get or you here this is just from the
- 00:42:512x2 table even these are the odds ratios
- 00:42:53here's that the predicted probabilities
- 00:42:55and the columns so here this is 0.6
- 00:42:57point for what we saw earlier in this
- 00:43:00table it's point three and point seven
- 00:43:03anyhow SAS is going to spit out some
- 00:43:06stuff of interest this thing 51 point
- 00:43:10three five five is the negative two log
- 00:43:12likelihood of your model you can use
- 00:43:14this for the likelihood ratio test you
- 00:43:16can also use it to compute the deviance
- 00:43:18from the saturated model because this
- 00:43:19divided by one is that deviance
- 00:43:22statistic but we said it's not very
- 00:43:23useful then SAS if you can spit out this
- 00:43:27table that says deviance zero zero zero
- 00:43:30zero it's not very useful okay what is
- 00:43:36this about give me a moment we'll talk
- 00:43:38about it
- 00:43:39 So what is interesting about this model? It is not a saturated model, right? It does not have forty terms in it: this is 40 observations, and the model does not have 40 terms. It has G equals 4; there are only four covariate patterns, because as we talked about there are only four combinations of the levels of C and E. So this is what we are going to call the fully parameterized model: it has as many terms as it possibly could, given the covariates that we have and observed. Okay, here is another model that we fit.
- 00:44:17 This model is just E and C, with no interaction term. Suddenly there is information here: 3.6961. This is the deviance statistic for our model, computed with the likelihood of our model in the numerator and, in the denominator, the likelihood of the fully parameterized model. It is comparing this model, which has no interaction term, to the model that has all possible interaction terms. In this case that happens to be a plausible comparison, and it might actually be interesting to us. But if you have ten covariates, the fully parameterized model has a 10-way interaction, 9-way interaction terms, and all sorts of crazy stuff, and you would rather not fit that yourself. So what SAS does is compare your model to the most interaction-heavy model possible and give you this thing called the deviance. We call it the deviance ET because it has to do with events/trials data, which we could go into; I think we may have a supplementary slide on it at the end. The deviance ET is the deviance statistic that compares your model to the fully parameterized model, not to the saturated model, since the deviance against the saturated model always has a denominator of 1. This thing is essentially a likelihood ratio statistic: it compares the likelihood of your model specifically to the likelihood of the fully parameterized model. Not to just any model, but to the fully parameterized one, and that is because the fully parameterized model always predicts the observed probabilities in your data set; it is always predicting those 0.6s and 0.4s and so forth, while your given model may not. So it is comparing your model to the fully parameterized model.
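Here is a minimal sketch of that comparison, again in Python with statsmodels on the same hypothetical data as before: the deviance ET is computed as the likelihood ratio statistic between the model of interest and the fully parameterized model.

```python
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

# Same hypothetical 40-observation data set as in the previous sketch.
rows = []
for e, c, n_events in [(1, 0, 6), (0, 0, 4), (1, 1, 7), (0, 1, 3)]:
    rows += [{"y": 1, "e": e, "c": c}] * n_events
    rows += [{"y": 0, "e": e, "c": c}] * (10 - n_events)
df = pd.DataFrame(rows)

reduced = smf.logit("y ~ e + c", data=df).fit(disp=False)   # model of interest
full = smf.logit("y ~ e * c", data=df).fit(disp=False)      # fully parameterized model

# Deviance relative to the fully parameterized model: a likelihood ratio
# statistic, referred to a chi-square with df = number of terms dropped.
deviance = 2 * (full.llf - reduced.llf)
dof = len(full.params) - len(reduced.params)   # here 1: the E*C term
p_value = stats.chi2.sf(deviance, dof)
print(f"deviance = {deviance:.4f}, df = {dof}, p = {p_value:.3f}")
```

A small deviance relative to its degrees of freedom is consistent with the null hypothesis of good fit.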
- 00:46:27 This makes sense if you zoom back out for a moment. We said that, in one sense, the saturated model is as good as it gets: if you put a hundred terms in your model and you have a hundred data points, it fits the data exactly. Another way to think about the best model is the model that has as many terms as the things you measured. If I measured 10 things in my questionnaire, the most descriptive model you could fit would be the one that puts all 10 questions in, plus all of their interaction terms, every single term you could possibly include. It is an ugly model, but it is sort of the best model you could have come up with from the stuff you measured. That is the intuitive explanation: you measure 10 things in your study, you put all of them and all of their combinations into the model, and that is your best model for describing the data. It is a totally ugly and useless model, because it has a 10-way interaction term, and 9-way and 8-way terms and all of that, and in practice it is a terrible model, but it is a very good model in terms of predicting the most in your data set: it predicts the outcome as well as you can from what you measured. What we are doing here is peeling back and saying: hey, I have a model with only two terms in it; can I please compare my two-term model to the ten-term, 10-way-interaction goliath model, and ask whether the two are meaningfully different? That is what this deviance statistic is. So we are comparing to a different ideal, namely the fullest model you can make given the stuff that you measured.
- 00:48:08 Here is what it formally is. This deviance statistic is a chi-square statistic, and the degrees of freedom is the number of terms dropped in going from the fully parameterized model to your model of interest. In our case that was just one, because we have the model with E and C versus the model with E, C, and the interaction term; dropping the interaction means dropping one term, and therefore the deviance has 1 degree of freedom.
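Written out, and consistent with the description above (this formula is not shown on the slides), the statistic is

```latex
D = -2\left[\log L(\text{model of interest}) - \log L(\text{fully parameterized model})\right]
  \;\sim\; \chi^2_{\,p_{\text{full}} - p_{\text{model}}}
```

where the degrees of freedom, p_full minus p_model, is the number of terms dropped; in the E and C example above that difference is 1.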
- 00:48:38 If you had 10 terms in your model, the fully parameterized model has a ton of terms in it: the 10-way interaction, all the 9-way, 8-way, and 7-way interactions, down to the 10 single main effects. It is a huge model with a ton of terms, and you would rather not count those yourself, so SAS actually spits out the degrees of freedom right over here, the number of terms dropped from your model. So this is essentially a likelihood ratio test, but a very specific one: your model against the model that has the most terms in it. And you can sort of eyeball it: if the ratio of the deviance to its degrees of freedom is about one, that means you have an okay fit, the null hypothesis of good fit stands, and you have a non-significant result. So you can look at that ratio and eyeball it.
- 00:49:38 There is a big caveat here, and it involves dogs; I like Google image search, so here we are, we have dogs. This only works if you have a reasonable number of covariate patterns. If the number of covariate patterns is very large, say you have a continuous variable such as age, as we talked about, then you have essentially as many unique covariate patterns as you have people in your data set. What happens then is that your fully parameterized model looks like the saturated model: if you have as many covariate patterns as people, the most parameterized model has as many terms as people. That may have sounded circular, but as you get more and more covariate patterns you start fitting what essentially becomes a saturated model, and we said that does not work. So this test only works if the number of covariate patterns G is fairly small, meaning you do not have lots of unique intersections of continuous variables giving you as many covariate patterns as people. It only works if you have essentially a handful of categorical 0/1 or 0/1/2/3-type X variables. If you would basically have as many covariate patterns as people, you cannot do this, because you essentially wind up with the test based on the saturated model, which does not work.
- 00:51:16 So that is a problem when we have continuous data. When this does not work, we do not use this test statistic, which is essentially a likelihood ratio test against the fully parameterized model; we do something called the Hosmer-Lemeshow test, which we are going to start but not finish, so we will pick it up at the beginning of next time. I will just say what the Hosmer-Lemeshow test does and give you a sense of it. It is actually way more user-friendly; it is going to make a lot more sense than the last thing we just talked about, which itself will make more sense when you review it, and feel free to ask me questions.
- 00:52:01 Okay, so the Hosmer-Lemeshow test also measures how close we are to the average predicted probabilities in our data set. It asks: does your model predict the average experience of people at these covariate values well? Not the exact zero-ones of individual people, but the average experience of people with a given covariate pattern. It has nothing to do with deviance, which is good, because it will not be limited in the ways that our deviance tests were, and it is really great when you have many, many covariate patterns. If you have all these unique people in your data set, it is going to work well; it actually thrives when you have a lot of patterns. It is not a very powerful test when you have very few covariate patterns: if you have just two 0/1 variables and four covariate patterns, the test basically never shows significance. It works very well exactly where the other test leaves off.
- 00:52:58 I am just going to throw this on the screen very quickly and then we will pick it up next time. The example we are going to use is an old, classic example used in many pedagogical settings, from Evans County; it is a coronary heart disease study. Here is some model; it does not really matter which model, we just want to know whether this model fits the data well. It has a bunch of variables, including age, and because age is measured as a continuous variable it is going to have a lot of covariate patterns, maybe as many unique covariate patterns as there are people. So we are not going to want to use that deviance statistic; we are going to want to use the Hosmer-Lemeshow test. Basically, what it does is make a table that looks something like this.
- 00:53:53 It divides your data into groups, into deciles of risk, and we will talk about what that means. Basically, it takes your model and predicts, for everybody, their personal probability of disease. It is like: hey, you are 18, this is your cholesterol level, these are your covariates; I run you through the machine and I predict that your probability of heart disease is 0.2. I only observed a zero or a one for that person in the data set, but the model says your probability is 0.2. That is how logistic regression works: you have a zero-one outcome and it predicts the 0.2.
- 00:54:37 What it does is summarize the entire data set. In this data set there are roughly 10 times 61 people, so 609 people; this table represents the entire data set, and it looks at how well the sums of those predictions compare to the sums of the observed events. So, for example, there were 61 people in this lowest decile, and of those 61 people only two had heart disease. I then took the logistic regression model and, for each of the 61 people, predicted their personal probability of disease from the regression model, and summed those predictions; they are going to be tiny because these are low-risk people, only two of them had heart disease. The sum of all 61 predictions was 0.94. That is the sum of everyone's predictions, and you do this for every decile and ask: was the sum of those predictions different from the observed count? Here I observed 2 and predicted 0.94 in this group of 61 people. Over here is the high-risk group: of the 60 people, 29 had heart disease in reality, and the sum of the 60 predictions for these people was almost 29. So you might say, hey, that is pretty good.
- 00:56:00 On average, when I split the data up into this ten-row table, the total of the observed events was very similar to the total of what the model predicted for those very same people. This is the Hosmer-Lemeshow test: it looks at the difference between these two quantities, and if your model is more or less nailing it every time, or on average, you know, the model says 29 and I observe 29, that is pretty good; here I am only off by about one; you have 2 versus 0.94, 1 versus 1.96, and those are sort of small deviations, maybe. So all this is doing is a simple chi-square test of this table: on average, how does the model do relative to reality?
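For readers who want to see the mechanics, here is a rough sketch of the deciles-of-risk computation in Python. The grouping into ten roughly equal-sized groups and the chi-square with G minus 2 degrees of freedom follow the usual Hosmer-Lemeshow convention rather than anything derived in this lecture so far, and the simulated data below merely stand in for the Evans County data, which are not reproduced here.

```python
import numpy as np
import pandas as pd
from scipy import stats

def hosmer_lemeshow(y, p_hat, n_groups=10):
    """Group subjects into 'deciles of risk' by predicted probability and
    compare observed event counts with the sums of predicted probabilities."""
    d = pd.DataFrame({"y": y, "p": p_hat})
    d["group"] = pd.qcut(d["p"], q=n_groups, labels=False, duplicates="drop")
    g = d["group"].nunique()

    grouped = d.groupby("group")
    observed = grouped["y"].sum()    # observed events per decile
    expected = grouped["p"].sum()    # sum of predicted probabilities per decile
    n_k = grouped.size()             # people per decile

    # Standard Hosmer-Lemeshow chi-square, referred to g - 2 degrees of freedom.
    chi2 = (((observed - expected) ** 2) / (expected * (1 - expected / n_k))).sum()
    return chi2, g - 2, stats.chi2.sf(chi2, g - 2)

# Hypothetical usage with simulated data; in practice p_hat would come from
# your fitted logistic model, e.g. model.predict(df).
rng = np.random.default_rng(0)
age = rng.uniform(40, 76, size=609)
p_hat = 1 / (1 + np.exp(-(-6 + 0.07 * age)))
y = rng.binomial(1, p_hat)
print(hosmer_lemeshow(y, p_hat))
```

Each row of the ten-row table is one group: the observed count is the sum of the zero-one outcomes in that group, and the expected count is the sum of the model's predicted probabilities for the same people.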
- 00:56:48 This is so much more user-friendly than the deviance statistics, and it is the basis of the Hosmer-Lemeshow test, which we will talk about on Thursday. So, thanks.
- model fit
- saturated model
- deviance statistics
- logistic regression
- Hosmer-Lemeshow test
- covariates
- data analysis
- likelihood
- interactions
- statistical models