00:00:00
if you're trying to take the data from
00:00:02
the front end of a website there's a
00:00:04
good chance that you're going to be
00:00:05
doing it wrong and you're not going to
00:00:06
get what you need
00:00:08
modern websites are made up of a
00:00:10
front-end and a back-end system and it's
00:00:12
the back-end that has all the
00:00:14
information all the data on it that we
00:00:16
want so why would we make a request to
00:00:18
the front end when it's the back end
00:00:20
that's actually got the data well to
00:00:22
work this out and understand we need to
00:00:23
talk a little bit about how a modern
00:00:25
website works including using CORS
00:00:27
which is cross origin resource
00:00:30
sharing so the front-end website that we
00:00:32
load up in our browser is pretty much
00:00:34
always javascript whichever framework is
00:00:37
most popular at the time probably and
00:00:39
what that does is you go to the page and
00:00:41
it will use something like ajax with
00:00:43
axios or something like that that will
00:00:44
make a request to an endpoint on the
00:00:46
back end of the website
00:00:49
which will be completely separate that
00:00:51
will then send that data to the front
00:00:53
end so then it will be displayed and
00:00:55
rendered properly for us the end
00:00:57
user so what we want to do is we want to
00:01:00
be able to go straight to the back end
00:01:01
and get the data but you see it's not
00:01:03
going to allow us to do that unless we
00:01:06
pretend that we are coming through the
00:01:07
front end of the site through
00:01:09
CORS which is generally going to
00:01:11
involve a cookie so what i'm going to do
00:01:13
is i'm going to walk you through an
00:01:15
example that i've done here i'll just
00:01:16
show you the code now
00:01:18
and i'm going to tell you about why i've
00:01:20
made some of these decisions what they
00:01:23
mean and also how you can take a cookie
00:01:26
from loading up a headless chrome using
00:01:29
something like playwright playwright in
00:01:31
this case and then we can send it to
00:01:33
requests so we can actually get a new
00:01:36
cookie every time that we want to do
00:01:38
this because cookies do expire before we
00:01:41
get to that today's video is sponsored
00:01:43
by skillshare skillshare is an online
00:01:46
learning community with thousands of
00:01:47
classes ready to help you explore your
00:01:49
creativity and inspire you if you have a
00:01:51
specific skill you're trying to learn or
00:01:54
maybe you're like me and you like to
00:01:55
utilize the breadth and depth of classes
00:01:58
to help you with the other parts of
00:02:00
personal growth to support your side
00:02:02
projects this week i've been watching
00:02:04
creativity unleashed discover hone and
00:02:07
share your voice online by nathaniel
00:02:09
drew nathaniel is a youtuber who i am
00:02:11
very familiar with having followed
00:02:13
online for several years now and i was
00:02:15
very excited to take his class i believe
00:02:17
there's great value to be had in
00:02:19
watching and learning from someone who's
00:02:21
out there creating and making stuff
00:02:23
every day and this was exactly that so
00:02:25
the first 1000 people to use the link in
00:02:28
the description below or my code john
00:02:30
watson rooney will get one month free
00:02:33
access to skillshare so once again click
00:02:35
that link in the description below or my
00:02:37
code john watson rooney and thank you to
00:02:39
skillshare for sponsoring this episode
00:02:41
so let's move over to the actual website
00:02:44
which i've got here and you'll see that
00:02:46
when you load this up for the first time
00:02:47
especially in private browsing it tells
00:02:50
you you need to accept cookies and this
00:02:52
is very common and this is exactly what
00:02:54
we need to do so i'm going to hit accept
00:02:56
all it's going to load up the page and
00:02:58
it's going to have all the information
00:02:59
on now you'll see that here here's the
00:03:01
list and it's all done in a nice fancy
00:03:03
way so you click on it and it loads up
00:03:05
more stuff etc etc we're all familiar
00:03:07
with how these websites work what i'm
00:03:09
going to do is we're going to go to the
00:03:10
inspect element tool and go to the
00:03:12
network tab try and make this a bit
00:03:14
bigger hit reload and we're going to see
00:03:17
that the front end is making requests to
00:03:19
the back end for the actual information
00:03:22
there's quite a few here but what i'm
00:03:24
going to show you is the page data let's
00:03:27
move this out of the way here
00:03:30
move it
00:03:31
so you can see that in this one we have
00:03:34
these specific headers that we are
00:03:37
requesting with our request headers and
00:03:39
the response headers these ones up here
00:03:41
and we can see that the actual response
00:03:43
even though in this case it has been
00:03:45
truncated and i'll come back to that
00:03:47
actually has the information from the
00:03:50
website that we are after
00:03:52
so what we want to do is we want to just
00:03:54
make this request ourselves
00:03:56
but it's not that simple because we need
00:03:59
to obey the rules of CORS cross
00:04:01
origin resource sharing so we need to
00:04:04
have a cookie so we can actually
00:04:06
mimic this and be a part of this now in
00:04:08
my previous videos if you've watched any
00:04:10
of those i've said just copy this copy
00:04:13
it as curl and we'll use postman or
00:04:15
insomnia and that's great and that works
00:04:18
but when you actually get to the point
00:04:19
where you need a new cookie you have to
00:04:21
make a new request
00:04:23
what i did is i did copy as curl and i
00:04:26
opened up insomnia which i've got here
00:04:28
and what i've done is i've just been
00:04:30
through the header section this is the
00:04:32
request and i've unticked all of the
00:04:34
ones that i don't think that we need
00:04:36
except for the cookie and when i run
00:04:38
this it will take a second because
00:04:41
as i said this response is quite big on
00:04:43
the opposite side which is just hidden
00:04:45
by my head let's move that out of the
00:04:47
way you'll see that we get this neat
00:04:50
json data with all of the information
00:04:53
that we could possibly want now this is
00:04:54
the information that the back end has
00:04:56
sent to the front end part of the
00:04:58
website which has rendered it all nice and
00:05:00
neat
00:05:01
in here to show us this
00:05:04
and you can actually click through and
00:05:06
every time you click on a person's name
00:05:08
it makes a new request and this is its
00:05:10
own endpoint but we're still using the
00:05:12
same cookie so if you wanted to do that
00:05:14
you could actually expand on this and
00:05:16
get the information from each one of
00:05:17
these as well so let's go back to our
00:05:21
insomnia or postman or whatever you're
00:05:23
using if i untick the cookie and tick
00:05:26
everything else
00:05:27
for example so that
00:05:30
we don't send the cookie so you can see
00:05:32
here's our CORS and everything like
00:05:33
that it's basically all of the
00:05:35
information that's being sent over if we
00:05:37
send this we get this blank page and
00:05:40
that is basically the response
00:05:43
there'll be some javascript in here
00:05:44
which insomnia is not loading up telling
00:05:46
us that we need to have a cookie or need
00:05:48
to accept the cookie or something
00:05:49
similar okay so let's unselect all of
00:05:51
these again
00:05:54
and tick the cookie back on
00:05:58
and then run this now
00:06:01
we're gonna get all the information back
00:06:04
so this is the main header that's the
00:06:05
most important one this is what's
00:06:07
identifying us what i like to do from
00:06:09
here is to use my
00:06:12
api tool to actually generate some code
00:06:15
for me you can see here because i've
00:06:17
only got the cookie header
00:06:21
selected that's the one that's come back
00:06:23
out and this is the one that we need so
00:06:25
as i said before
00:06:27
we could just use this code here exactly
00:06:29
and paste it into vs code or whatever
00:06:31
and this would give us that json data
00:06:34
but as soon as this cookie expires and
00:06:35
that's different for different websites
00:06:38
this will no longer work so we need to
00:06:40
make it more repeatable and that's where
00:06:41
we're going to use playwright to load a
00:06:43
browser up
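a minimal sketch of the kind of code an api tool generates when only the cookie header is ticked. the url and the cookie string below are placeholders (assumptions, not values from the video); the real ones come from the network tab via copy as curl. as soon as the cookie expires this stops working.

```python
# Placeholder values (assumptions): replace with the endpoint and cookie
# copied from the browser's network tab.
PAGE_DATA_URL = "https://example.com/api/page-data"
COOKIE_STRING = "consent=accepted; session=abc123"


def build_headers(cookie_string):
    """Return the only header this request sends: the cookie."""
    return {"cookie": cookie_string}


def fetch_page_data(url, cookie_string):
    """GET the back-end endpoint directly and return the parsed JSON."""
    import requests  # third-party: pip install requests

    response = requests.get(url, headers=build_headers(cookie_string), timeout=30)
    response.raise_for_status()  # raise on 4xx/5xx status codes
    return response.json()  # raises if the body is not JSON (e.g. a consent page)


# Usage (needs network access and a valid cookie):
# data = fetch_page_data(PAGE_DATA_URL, COOKIE_STRING)
```

the cookie is hard-coded here, which is exactly the weakness described above.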
00:06:45
so if we go back to our code you'll see
00:06:48
here that i'm using playwright to load
00:06:51
up my chromium browser and i'm asking
00:06:53
for the context because the context is
00:06:55
where the
00:06:56
cookie information is so if we come back
00:06:59
to one of my working files so this is
00:07:01
just the playwright part let's move this
00:07:03
over here
00:07:04
and i print out the cookie context from
00:07:07
playwright
00:07:08
you'll see that it loaded the browser up
00:07:10
and that's because we needed to do that
00:07:12
and i've got this in headless is true
00:07:14
it's false at the moment so i could see
00:07:16
what's going on but you'll see that we
00:07:18
get this dictionary back with all the
00:07:19
cookies with all the headers rather and
00:07:21
this is the one that we were interested
00:07:23
in
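a rough sketch of the playwright step just described, using the sync api: load the page in chromium, then read the cookies from the browser context. the consent-button selector is hypothetical, and looking a cookie up by name (rather than by its position in the list, as in the video) is a swapped-in choice, because the position can change between runs.

```python
def get_cookies(url, headless=True):
    """Open the page in Chromium and return the browser context's cookies."""
    from playwright.sync_api import sync_playwright  # pip install playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=headless)
        context = browser.new_context()
        page = context.new_page()
        page.goto(url)
        # Hypothetical selector: some sites only set the cookie after the
        # consent banner is accepted.
        # page.click("text=Accept all")
        cookies = context.cookies()  # list of dicts: "name", "value", "domain", ...
        browser.close()
    return cookies


def find_cookie(cookies, name):
    """Return the value of the cookie called `name`, or None if it is absent."""
    for cookie in cookies:
        if cookie["name"] == name:
            return cookie["value"]
    return None
```

setting headless to false while developing lets you watch the browser load, as in the video.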
00:07:24
and this should be very similar to the
00:07:26
one i was passing off into requests so
00:07:29
we want to take this out and then move
00:07:31
it into requests but why i wanted to do
00:07:34
that was because of the actual size of
00:07:38
the json response that i was getting so
00:07:40
if you're trying to do this on a
00:07:41
different site and the actual response
00:07:44
that you're after for json is not that
00:07:47
big you could just stop right here and
00:07:49
then get the response.json
00:07:52
but because the actual json file that
00:07:54
we're getting back from this website has
00:07:56
so much information you can see it's
00:07:58
super long
00:07:59
it was too big and it was causing my
00:08:02
playwright to fail
00:08:04
but that led me on to pushing the cookie
00:08:06
into requests which i think is quite
00:08:08
valuable
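for a smaller response, the stop-right-here option could look something like this sketch: wait inside playwright for the back-end response and read its json directly. the "page-data" url marker is an assumption for illustration, not the real endpoint name.

```python
def is_page_data(url, marker="page-data"):
    """Predicate for the back-end response we want (marker is hypothetical)."""
    return marker in url


def get_json_with_playwright(url):
    """Load the page and return the JSON body of the matching back-end response."""
    from playwright.sync_api import sync_playwright  # pip install playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Wait for the first response whose URL matches while the page loads.
        with page.expect_response(lambda r: is_page_data(r.url)) as response_info:
            page.goto(url)
        data = response_info.value.json()
        browser.close()
    return data
```

with a very large response this is the step that may fail, which is the reason for handing the cookie to requests instead.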
00:08:09
so we can go back to it here and we can
00:08:11
see then
00:08:13
i'm taking the cookie for requests and
00:08:15
the cookie context
00:08:17
number three which was the third
00:08:19
index of the list we're grabbing the
00:08:21
value and taking the code from what
00:08:24
insomnia had generated we can see
00:08:28
that the cookie is in this format here
00:08:30
and this is specific to requests on how
00:08:32
it's going to be sent over they're just
00:08:34
formatted slightly differently so all i
00:08:36
did was copy this
00:08:38
into here and then used an f string
00:08:42
to add in the actual
00:08:45
cookie part with all the information
00:08:47
that i was getting back from
00:08:49
playwright and that means that we can
00:08:51
then use the same cookie and we could
00:08:53
have a session in here if we were going
00:08:55
to
00:08:56
make the other requests like i
00:08:59
showed you down here these ones with
00:09:01
all the extra specific information
00:09:04
we would use a requests session to use
00:09:07
the cookie the same cookie over and over
00:09:09
again
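a sketch of the hand-off just described: take one cookie dict from playwright's list, format it with an f-string into the name=value form that requests sends, and attach it to a session so every later request reuses it. the function names and the example urls are assumptions for illustration.

```python
def cookie_header(cookie):
    """Format one Playwright cookie dict as a `name=value` Cookie header string."""
    return f"{cookie['name']}={cookie['value']}"


def make_session(cookie):
    """Create a requests session that sends the cookie on every request."""
    import requests  # third-party: pip install requests

    session = requests.Session()
    session.headers.update({"cookie": cookie_header(cookie)})
    return session


# Usage (hypothetical URLs; `cookies` is the list from Playwright's context.cookies()):
# session = make_session(cookies[3])
# listing = session.get("https://example.com/api/page-data", timeout=30).json()
# detail = session.get("https://example.com/api/person/1", timeout=30).json()
```

when the cookie expires, run playwright again for a fresh one and build a new session.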
00:09:10
from here it was just a case of then
00:09:13
printing out the json and i've
00:09:15
specifically indexed it down here
00:09:18
this is actually all the information so
00:09:20
what i liked about this was using
00:09:22
playwright to do one thing grab me the
00:09:24
cookie and then pass it off onto
00:09:26
requests to then
00:09:29
use it so we could actually make that
00:09:31
request so if we didn't have the cookie
00:09:33
to send through with requests our
00:09:35
request would fail like i showed
00:09:37
you when we were doing it in insomnia so
00:09:39
i'm going to put this code in the
00:09:40
description down below for you to have a
00:09:42
look at and have a play with what i was
00:09:45
trying to show you here is that if
00:09:47
you're trying to get data from a website
00:09:49
and you're trying to grab it
00:09:51
from the front end and it's a modern
00:09:52
website you really want to try to put
00:09:55
your efforts into grabbing it from the
00:09:57
back end directly
00:10:00
using the cookie that you can grab this
00:10:02
way or from the actual request you made
00:10:04
in your browser initially if that works
00:10:07
for you
00:10:08
if you've enjoyed this video i think
00:10:10
you're going to like this one here which
00:10:11
goes into this method in a slightly
00:10:13
different way but more in-depth coding
00:10:16
it out so that might be more useful to
00:10:18
some of you