00:00:00
[Music]
00:00:10
in this first section on http
00:00:12
we're going to start off with just a
00:00:14
very quick overview of the web and http
00:00:17
then we're going to dive down into the
00:00:18
two types of http connections
00:00:21
persistent and non-persistent
00:00:23
connections and
00:00:24
look at the major messages used by http
00:00:27
the request message and the response
00:00:29
message and then finally we'll wrap up
00:00:31
this first
00:00:32
section by taking a look at what are
00:00:34
known as cookies a way for
00:00:36
state to persist at a server between one
00:00:39
client connection
00:00:40
and another client connection now this
00:00:43
is just the first
00:00:44
of two parts that we're going to cover
00:00:46
on http
00:00:47
but we've got a lot to cover here so
00:00:49
let's get started
00:00:51
well let's begin our discussion of the
00:00:53
web and http with just a very quick
00:00:55
review to set the stage
00:00:57
remember that a web page consists of a
00:01:00
base
00:01:00
html file as well as a set of referenced
00:01:04
objects and each of these objects can be
00:01:06
stored on different web servers
00:01:08
an object can be an html file jpeg image
00:01:11
a java applet
00:01:12
audio file or many other things and the
00:01:15
web page itself as well as the referenced
00:01:17
objects are each addressable by what's
00:01:19
known as a url
00:01:20
there's a host name associated with it
00:01:22
and then on that host there's a path
00:01:25
name piece to it
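To make those two pieces of a url concrete, here's a small python sketch that splits one apart. The url itself is a made-up example, not one from the slides.

```python
from urllib.parse import urlsplit

# hypothetical url, used only for illustration
url = "http://www.example.edu/dept/pic.jpg"

parts = urlsplit(url)
host_name = parts.hostname   # the host where the object lives
path_name = parts.path       # the object's path name on that host

print("host name:", host_name)
print("path name:", path_name)
```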
00:01:27
well let's get started with our study of
00:01:29
http and the very first thing to know
00:01:32
is that http adopts the client server
00:01:34
model that we just studied in section
00:01:36
2.1
00:01:38
a client might be a web browser like
00:01:41
firefox or safari or
00:01:42
edge or it might actually be embedded in
00:01:44
a device and you might not even see it
00:01:46
a server might be a traditional web
00:01:49
server that
00:01:50
only serves web pages or a more general
00:01:52
purpose server
00:01:53
like gaia.cs.umass.edu which provides a
00:01:56
number of services
00:01:59
in this animation here we see on the top
00:02:01
pc running the firefox browser
00:02:04
making an http request to a server and
00:02:07
getting a response
00:02:08
back at the bottom we see an iphone
00:02:10
running the safari browser also making an
00:02:12
http request
00:02:14
and getting an http response back both
00:02:16
of those browsers are speaking the http
00:02:19
protocol with the apache web server
00:02:23
http uses the transport services
00:02:25
provided by the tcp
00:02:27
protocol and here's how an http
00:02:29
transaction works
00:02:31
the http client your web browser for
00:02:33
example
00:02:34
opens a tcp connection to a web server
00:02:37
using port 80. we'll learn about port
00:02:39
numbers in just a second
00:02:40
one or more http messages are
00:02:43
then exchanged between the client and
00:02:45
the server
00:02:46
and then the tcp connection is closed
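Those three steps (open a tcp connection, exchange http messages, close the connection) can be sketched in python. This is only an illustration under some assumptions: the server is a toy one running on the local machine, the handler name `HelloHandler` and the page contents are made up, and the port is picked by the operating system rather than being port 80.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    """Toy server side: answers every get with a tiny html page."""
    def do_GET(self):
        body = b"<html>hello</html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# port 0 asks the os for any free port; a real web server would use port 80
server = HTTPServer(("127.0.0.1", 0), HelloHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

# step 1: the client opens a tcp connection to the server
sock = socket.create_connection(("127.0.0.1", port))

# step 2: http messages are exchanged over that connection
sock.sendall(b"GET /index.html HTTP/1.0\r\nHost: localhost\r\n\r\n")
reply = b""
while True:
    chunk = sock.recv(4096)
    if not chunk:        # the server has closed its side
        break
    reply += chunk

# step 3: the tcp connection is closed
sock.close()
server.shutdown()
server.server_close()

status_line = reply.split(b"\r\n", 1)[0].decode()
print(status_line)
```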
00:02:49
http is a stateless protocol
00:02:52
this means that the server maintains no
00:02:54
internal state about the ongoing request
00:02:56
there's a single request for an object
00:02:58
and a single reply that's it
00:03:00
there's no worrying about something like
00:03:02
well this transaction has five steps
00:03:04
we're now in step three
00:03:06
and if i fail i'm going to need to roll
00:03:08
back my state to what it was
00:03:10
before this five step transaction
00:03:11
started none of that with http
00:03:13
it's stateless and you might ask why
00:03:15
well because of its simplicity
00:03:18
as we'll see protocols that maintain
00:03:20
state will need to deal with clean up
00:03:21
problems when a multi-step transaction
00:03:23
fails
00:03:24
returning to the initial state and
00:03:26
resolving inconsistent state
00:03:29
there are two types of http connections
00:03:32
what are called persistent connections
00:03:34
and non-persistent connections and it's
00:03:36
important to remember
00:03:38
these http connections between a browser
00:03:41
and a server are different than the tcp
00:03:44
connection which is provided at the
00:03:46
transport layer underneath
00:03:48
in non-persistent http a tcp connection
00:03:51
is opened
00:03:52
at most one object is then sent over the
00:03:54
tcp connection
00:03:56
and the tcp connection is closed
00:03:58
downloading multiple
00:04:00
objects is going to require multiple tcp
00:04:02
connections to be established
00:04:04
and we'll see that it takes a round-trip
00:04:06
time first to open the tcp connection
00:04:08
and then another rtt
00:04:10
to make the request and receive the
00:04:11
response
00:04:13
in persistent http a tcp connection is
00:04:17
again opened to the server
00:04:18
but this time multiple objects can be
00:04:21
transferred
00:04:22
serially over the single tcp connection
00:04:24
between the client
00:04:26
and the server once those multiple
00:04:28
objects have been requested and returned
00:04:30
the tcp connection can be closed
00:04:32
persistent http corresponds to http
00:04:36
version 1.1 which is probably the most
00:04:39
common form of http in use today
00:04:43
here's an example of non-persistent http
00:04:46
in operation
00:04:47
let's assume that the user in step 1a
00:04:49
here enters a url
00:04:53
www.someschool.edu
00:04:54
and asks for a web page that contains
00:04:56
text as well as references to
00:04:58
10 jpeg images in that base html file
00:05:02
so in step 1a we see the client
00:05:04
initiating the tcp connection to the
00:05:06
http server
00:05:08
at port 80 at www.someschool.edu
00:05:12
in step 1b the http server at the host
00:05:16
which has been waiting for a tcp
00:05:17
connection at port 80 accepts this tcp
00:05:20
connection
00:05:21
and notifies the client and notice in
00:05:23
steps 1a and 1b
00:05:25
no http requests have actually flowed
00:05:28
yet
00:05:28
this happens in step two in step two the
00:05:31
http client
00:05:32
sends the http request message into the
00:05:35
tcp connection that's just been
00:05:37
established
00:05:38
this http message indicates that the
00:05:41
client wants to receive
00:05:43
this base html file in step 3 the
00:05:46
server's now received this http request
00:05:49
message
00:05:50
and forms the response message that
00:05:52
contains the requested object
00:05:54
and sends this message back to the http
00:05:57
client
00:05:58
after sending the response message in
00:05:59
step 4 the http server closes the tcp
00:06:02
connection
00:06:03
in step 5 the http client receives the
00:06:06
response message containing the html
00:06:08
file
00:06:09
displays the html file parses it finds
00:06:12
the 10 referenced
00:06:13
jpeg objects and in step 6 now
00:06:16
it's going to have to repeat the
00:06:18
preceding five steps
00:06:20
for each of the referenced jpeg objects
00:06:24
now we can take a look at the response
00:06:26
time for non-persistent http
00:06:28
we can define the response time as the
00:06:30
amount of time from when a user
00:06:32
first enters a url into a browser
00:06:35
until that base html file is received
00:06:38
and displayed
00:06:39
so let's define the rtt the round trip
00:06:42
time as the amount of time
00:06:44
needed for a very small packet to travel
00:06:46
from the client
00:06:47
to the server and back and so we can see
00:06:50
that the non-persistent
00:06:52
http response time per object has the
00:06:55
following components
00:06:56
well one rtt is needed to initiate the
00:06:59
tcp connection
00:07:01
and another rtt a second rtt is needed
00:07:04
for the http request to be transmitted
00:07:06
and received
00:07:07
and for the first bytes of the http
00:07:10
response
00:07:10
to be returned and then finally there's
00:07:13
the amount of time
00:07:14
needed for the server to actually
00:07:16
transmit the file
00:07:17
into its internet connection
00:07:21
and so overall the non-persistent http
00:07:24
response time
00:07:26
is two rtt plus whatever amount of time
00:07:28
is needed to transmit the file
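As a rough sketch of that sum, here's the per-object response time written out in python. The function name and the sample numbers (a 50 millisecond rtt, a 1 megabit file, a 10 megabit per second link) are assumptions made up for illustration, not values from the slides.

```python
def non_persistent_response_time(rtt_s, file_bits, link_bps):
    """Per-object response time: two rtts plus the file transmission time."""
    connect_s = rtt_s                   # one rtt to open the tcp connection
    request_s = rtt_s                   # one rtt for the request and first response bytes
    transmit_s = file_bits / link_bps   # time for the server to push the file onto its link
    return connect_s + request_s + transmit_s

# assumed sample values, chosen only to keep the arithmetic easy to follow
total_s = non_persistent_response_time(rtt_s=0.05, file_bits=1_000_000, link_bps=10_000_000)
print(f"{total_s:.3f} seconds")
```

With these assumed numbers the two rtts contribute 0.1 seconds and the transmission another 0.1 seconds, so the delay is split evenly; for small objects on fast links the two rtts would dominate.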
00:07:32
so we've seen that two rtts are needed
00:07:34
to
00:07:35
fetch a web object now multiple objects
00:07:38
can often be retrieved in parallel but
00:07:40
still
00:07:41
two rtts is well two rtts and you know
00:07:44
we really want to get our information
00:07:46
as fast as possible so it's possible
00:07:49
using a simple technique to actually
00:07:51
cut this latency from two rtts to one
00:07:55
rtt
00:07:56
and that's to use a technique known as
00:07:57
persistent connections
00:07:59
and this is the way that most web
00:08:01
servers are now operating
00:08:03
in persistent http http 1.1
00:08:06
the server leaves the connection open
00:08:08
after sending the response
00:08:11
subsequent http messages between the
00:08:13
same client and server can then be sent
00:08:16
over this open connection without having
00:08:18
to wait that
00:08:19
rtt to establish a new tcp connection
00:08:23
when a client has a new request to send
00:08:25
it sends it as soon as it encounters
00:08:27
a referenced object and so we can see
00:08:30
here how persistent http cuts the
00:08:32
response time
00:08:33
in half to one rtt
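To see the saving add up over a whole page, here's a small python sketch that counts round trips for a base html file plus 10 referenced images. It's a simplification under stated assumptions: objects are fetched one after another, there are no parallel connections, and file transmission time is ignored. The function names are made up.

```python
def non_persistent_rtts(num_objects):
    """Every object needs a new tcp connection (1 rtt) plus request/response (1 rtt)."""
    return 2 * num_objects

def persistent_rtts(num_objects):
    """One rtt opens the single tcp connection, then one rtt per object."""
    return 1 + num_objects

objects = 1 + 10   # base html file plus 10 referenced jpeg images

print("non-persistent:", non_persistent_rtts(objects), "rtts")
print("persistent:    ", persistent_rtts(objects), "rtts")
```

Under these assumptions the count drops from 22 round trips to 12, since only the very first object pays for connection setup.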
00:08:37
now that we've looked at the two styles
00:08:39
of http connections we can dive down
00:08:41
into the details
00:08:42
of the http messages themselves and
00:08:46
remember
00:08:46
way back in section 1.1 we said that a
00:08:49
protocol defines the format and the
00:08:52
order
00:08:53
of messages sent and received among
00:08:55
network entities
00:08:56
and the actions taken when these
00:08:58
messages are sent and received
00:09:00
let's take a look at the http protocol
00:09:02
and let's take a look at its messages
00:09:05
there are two types of http messages
00:09:08
request messages and response messages
00:09:10
let's take a look at a request message
00:09:12
first a request message
00:09:14
starts with a single request line with
00:09:16
that line beginning with a method name
00:09:18
for example
00:09:19
get as shown here a url in this case the
00:09:23
name of the html file being requested
00:09:25
the version of http in this case 1.1
00:09:29
and a new line that is a carriage return
00:09:31
line feed
00:09:32
the single request line is then followed
00:09:34
by a number of header lines that provide
00:09:36
additional information
00:09:38
for example the name of the host where
00:09:40
the request is being made
00:09:41
the type of browser making a request in
00:09:43
this case firefox
00:09:45
the types of objects that can be
00:09:47
accepted and preferred language in this
00:09:48
case u.s
00:09:49
english and the fact that this
00:09:51
connection should be kept alive
00:09:53
the request message ends with an empty
00:09:55
line so as you can see
00:09:56
very human readable
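Here's a sketch in python that assembles a request message like the one just described: a request line, a few header lines, and the empty line at the end. The host name, file name, and header values are placeholders chosen for illustration, and a real browser's user-agent string is much longer.

```python
CRLF = "\r\n"   # carriage return, line feed

# request line: method, url, http version
request_line = "GET /index.html HTTP/1.1"

# header lines (placeholder values, not copied from the slide)
header_lines = [
    "Host: www.example.edu",
    "User-Agent: Firefox",
    "Accept: text/html",
    "Accept-Language: en-us",
    "Connection: keep-alive",
]

# every line ends with CRLF, and one extra CRLF forms the empty line that ends the message
request = CRLF.join([request_line, *header_lines]) + CRLF + CRLF

method, url, version = request_line.split(" ")
print(request)
```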
00:09:59
here's the general format for an http
00:10:02
request message
00:10:03
we see the request line followed by
00:10:05
header lines http protocol specification
00:10:08
rfc 7230 is 85 pages long
00:10:12
and has all the details about the
00:10:14
methods header field names and values
00:10:16
fortunately for us as networking
00:10:18
students we don't need to know
00:10:20
all of those details but if we were
00:10:22
implementing an http client or server
00:10:25
we'd have to know
00:10:26
every detail when we looked at the get
00:10:28
message in the previous slide there was
00:10:29
no entity body
00:10:31
that's needed by some request messages
00:10:33
like post which we'll see in a second
00:10:35
that need to send additional information
00:10:38
not in the header fields to the server
00:10:41
here are the four types of http request
00:10:43
messages we've seen the get method
00:10:45
already the post method is used to
00:10:47
upload completed form data to the server
00:10:50
and the put method can be used to upload
00:10:52
a new object to the server
00:10:54
with a given url possibly replacing an
00:10:57
existing object
00:10:58
and last the head method asks for a
00:11:00
response that's
00:11:01
identical to that of a get request but
00:11:04
without the response body
00:11:06
this could be used for example to
00:11:07
determine the size of an object that
00:11:09
would be retrieved
00:11:10
but without actually retrieving that
00:11:12
object
00:11:14
now let's quickly take a look at the
00:11:15
http response message
00:11:18
the response message begins with a
00:11:20
status line and the first thing on the
00:11:21
status line
00:11:22
is the version number of the http
00:11:25
protocol being used
00:11:26
in this example here 1.1 following the
00:11:29
version number are the two most
00:11:31
important pieces of information
00:11:33
in the response message a status code
00:11:36
and a short message
00:11:37
in this case the status code shown is
00:11:39
200 which means that everything went
00:11:42
okay and a short status phrase in this
00:11:44
case
00:11:45
the word ok following the status line
00:11:49
there are response header lines
00:11:51
as in the case of the request message
00:11:54
that provide additional information
00:11:56
for example shown here you can see
00:11:58
the date and time the response was sent
00:12:01
the type of server is also shown in this
00:12:03
case an apache server version
00:12:05
2.4.6 the last modified field shows the
00:12:09
time when the document was last modified
00:12:12
the content length field shows how long
00:12:14
the document is
00:12:15
and the content type field indicates the
00:12:17
type of object that's being returned
00:12:19
in this case an html document and
00:12:22
finally the body of the object
00:12:24
being returned in this case the html
00:12:26
document itself
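Because the response is plain text, pulling it apart takes only a few lines. Here's a python sketch that splits a response into its status line, header lines, and body. The sample message is made up for illustration (the dates and the tiny html body are placeholders), though it follows the layout just described.

```python
# made-up sample response; dates and body are placeholders
raw = (
    "HTTP/1.1 200 OK\r\n"
    "Date: Tue, 08 Sep 2020 00:53:20 GMT\r\n"
    "Server: Apache/2.4.6\r\n"
    "Last-Modified: Tue, 01 Sep 2020 10:00:00 GMT\r\n"
    "Content-Length: 18\r\n"
    "Content-Type: text/html\r\n"
    "\r\n"
    "<html>hello</html>"
)

# the empty line separates the status line and headers from the body
head, _, body = raw.partition("\r\n\r\n")
status_line, *header_lines = head.split("\r\n")

# status line: http version, status code, status phrase
version, code, phrase = status_line.split(" ", 2)

# each header line is "name: value"
headers = dict(line.split(": ", 1) for line in header_lines)

print(version, code, phrase)
print(headers["Content-Type"], headers["Content-Length"])
print(body)
```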
00:12:28
and here are just a few of the http
00:12:30
response status codes and phrases
00:12:33
we've already seen the 200 okay
00:12:36
indicating that a request has succeeded
00:12:38
you've probably seen a 404 not found
00:12:41
when a requested document
00:12:43
was not found on the server all of the
00:12:45
response status codes and phrases are
00:12:47
contained
00:12:48
in the rfc document so take a look there
00:12:51
if you're
00:12:52
really interested in learning about all
00:12:54
of the status codes and response phrases
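If you'd rather not dig through the rfc, python's standard library carries the same code-to-phrase table. This short sketch looks up the two codes mentioned above; it's just a convenience lookup, not part of the lecture material.

```python
from http import HTTPStatus

# look up the status phrase for the codes mentioned in the lecture
for code in (200, 404):
    status = HTTPStatus(code)
    print(code, status.phrase)

ok_phrase = HTTPStatus(200).phrase
not_found_phrase = HTTPStatus(404).phrase
```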
00:12:57
so that wraps up our discussion of http
00:12:59
messages
00:13:00
it's pretty simple as promised and
00:13:02
there's only two types of messages
00:13:04
and as also promised it's pretty human
00:13:06
digestible we could look at those http
00:13:08
messages
00:13:09
and pretty much understand them let's
00:13:11
wrap up our initial study of http
00:13:14
by coming back to this question of
00:13:16
statelessness we said that http
00:13:19
is a stateless protocol as it turns out
00:13:22
it's possible for a web server to
00:13:24
actually maintain some user state
00:13:26
to remember what a user's done in the
00:13:29
past
00:13:30
and how that might influence what a user
00:13:32
wants to do in a current session
00:13:34
by using a technique known as cookies
00:13:37
let's take a look
00:13:39
websites will use cookies to maintain
00:13:41
information about a user
00:13:43
more specifically a user's browser in
00:13:46
between transactions
00:13:47
and there are four components to using
00:13:49
cookies first
00:13:50
a server at some point is going to send
00:13:53
a cookie
00:13:53
to a client the cookie is just a number
00:13:56
and it's contained in a cookie header
00:13:58
line
00:13:58
of an http response message being sent
00:14:01
to a client
00:14:02
then later when the client next makes a
00:14:04
request to that server it will
00:14:06
send along that cookie value to the
00:14:08
server in a cookie header line
00:14:11
now the server is going to remember all
00:14:13
of the requests it's received and the
00:14:15
responses it's sent
00:14:17
associated with that cookie value so
00:14:19
it'll have a history
00:14:20
of the interactions with that user let's
00:14:23
take a look at an example
00:14:24
in this
00:14:27
example a client on the left here is
00:14:30
going to make a couple of http requests
00:14:33
to an amazon server for example on the
00:14:35
right
00:14:36
which has a back-end database where it's
00:14:38
going to store cookie-related
00:14:40
information
00:14:41
the client also has cookie information
00:14:43
from other websites it's visited
00:14:45
in this case a cookie from ebay for
00:14:47
example but it doesn't have a cookie
00:14:49
from amazon
00:14:50
yet the client makes a request as usual
00:14:54
to the amazon web server initially
00:14:57
without a cookie
00:14:58
line when the amazon web server gets the
00:15:00
http request
00:15:02
it creates a cookie stores the cookie and
00:15:04
the transaction
00:15:05
information in its database and sends an
00:15:08
http response to the client
00:15:10
including a cookie value
00:15:13
now here in the second request by the
00:15:15
client the client includes its cookie
00:15:17
value to amazon
00:15:19
allowing the amazon server to take a
00:15:21
cookie specific action
00:15:22
perhaps taking the first http request
00:15:25
now into account
00:15:26
for example maybe the first transaction
00:15:28
was a request to look at one item of
00:15:30
merchandise and the second http request
00:15:33
wanted to look at a second item of
00:15:35
merchandise
00:15:36
with cookies the second reply could be
00:15:39
crafted to offer the client a deal on
00:15:41
buying
00:15:41
both of those items of merchandise
00:15:43
together
00:15:44
even though the second request only
00:15:46
wanted to look at one particular
00:15:48
piece of merchandise the client comes
00:15:50
back a week later and again provides a
00:15:52
cookie
00:15:53
the server can again take a cookie
00:15:55
specific action
00:15:56
for example it could say hey i missed
00:15:58
you during the past week you sure you
00:16:00
don't want to buy those
00:16:01
items that you looked at during the past
00:16:03
week
00:16:04
so in this example we can see how
00:16:07
cookies can be used
00:16:08
to store state about a user at a website
00:16:11
in between http interactions
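The exchange in this example can be sketched in python. This is a toy model under stated assumptions: the class name `CookieServer`, the cookie name `id`, the starting value 1678, and the item paths are all made up, and the "back-end database" is just a dictionary. In real http the server hands out the value in a set-cookie header line and the client returns it in a cookie header line.

```python
import itertools
from http.cookies import SimpleCookie

class CookieServer:
    """Toy server that keeps a per-cookie history of requested paths."""
    def __init__(self):
        self._next_id = itertools.count(1678)   # arbitrary starting value
        self.history = {}                       # cookie value -> list of requested paths

    def handle(self, path, cookie_header=None):
        """Process one request and return any extra response header lines."""
        cookie_id = None
        if cookie_header:
            jar = SimpleCookie()
            jar.load(cookie_header)
            if "id" in jar:
                cookie_id = jar["id"].value

        response_headers = []
        if cookie_id not in self.history:
            # first visit (or unknown cookie): create and store a new cookie value
            cookie_id = str(next(self._next_id))
            self.history[cookie_id] = []
            out = SimpleCookie()
            out["id"] = cookie_id
            response_headers.append(out.output())   # a "Set-Cookie: ..." header line

        self.history[cookie_id].append(path)
        return response_headers

server = CookieServer()

# first request: the client has no cookie for this site yet
first_headers = server.handle("/item/1")
client_cookie = first_headers[0].split(": ", 1)[1]   # the client stores the name=value pair

# second request: the client sends its cookie value along
second_headers = server.handle("/item/2", cookie_header=client_cookie)

print(first_headers)
print(server.history)
```

Because the second request carries the cookie, the server files it under the same history entry as the first, which is exactly the state it would need to offer a deal on both items.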
00:16:15
given that cookies can be used to store
00:16:17
user state over multiple transactions
00:16:19
they've got lots of uses remembering
00:16:22
that you've authenticated yourself to a
00:16:24
site before
00:16:25
for remembering the contents of your
00:16:26
shopping cart or for making
00:16:28
recommendations
00:16:29
based on past behavior you may have also
00:16:32
read about the many privacy concerns
00:16:34
about cookies
00:16:35
cookies can allow websites to learn a
00:16:37
lot about you
00:16:38
there are what are called third-party
00:16:40
cookies that can be put into your
00:16:41
browser
00:16:42
and allow websites to establish a common
00:16:44
identity across
00:16:45
multiple websites and you may have also
00:16:47
heard about the eu's general data
00:16:49
protection regulation gdpr
00:16:52
under gdpr cookies that aren't strictly
00:16:54
necessary for the operation of a website
00:16:57
can only be activated after you've given
00:17:00
your explicit consent
00:17:02
for their use and for the collection of
00:17:04
personal data
00:17:05
you may have noticed recently that at a
00:17:07
lot of sites before you're able to use
00:17:09
that site you have to agree
00:17:11
to a certain cookie policy
00:17:15
well that wraps up the first part of our
00:17:17
study of the web
00:17:18
and http and we've learned a lot already
00:17:21
after a quick introduction we dove
00:17:24
down into the different types
00:17:25
of http connections we studied the
00:17:28
request and the response messages
00:17:30
and we also took a look at cookies a way
00:17:32
to remember some state
00:17:34
on the server side between user
00:17:36
interactions with that server
00:17:38
so coming up next in the second part
00:17:40
we're going to take another deeper dive
00:17:42
into some additional aspects of
00:17:44
improving
00:17:45
http performance we're going to look at
00:17:47
web caches
00:17:48
we're going to look at the conditional
00:17:50
get request and then finally
00:17:52
we're going to look at http version 2
00:17:54
the most recent version of http
00:17:57
and take a quick peek at http version 3
00:18:00
which is coming within the next year or so
00:18:04
[Music]