00:00:03
Hi, I’m Carrie Anne, and welcome to CrashCourse
Computer Science.
00:00:05
Over the past two episodes, we’ve delved
into the wires, signals, switches, packets,
00:00:10
routers and protocols that make up the internet.
00:00:12
Today we’re going to move up yet another
level of abstraction and talk about the World
00:00:16
Wide Web.This is not the same thing as the
Internet, even though people often use the
00:00:20
two terms interchangeably in everyday language.
00:00:21
The World Wide Web runs on top of the internet,
in the same way that Skype, Minecraft or Instagram do.
00:00:27
The Internet is the underlying plumbing that
conveys the data for all these different applications.
00:00:31
And The World Wide Web is the biggest of them
all – a huge distributed application running
00:00:35
on millions of servers worldwide, accessed
using a special program called a web browser.
00:00:40
We’re going to learn about that, and much
more, in today’s episode.
00:00:43
INTRO
00:00:53
The fundamental building block of the World
Wide Web – or web for short – is a single
00:00:57
page.
00:00:58
This is a document, containing content, which
can include links to other pages.
00:01:01
These are called hyperlinks.
00:01:03
You all know what these look like: text or
images that you can click, and they jump you
00:01:06
to another page.
00:01:08
These hyperlinks form a huge web of interconnected
information, which is where the whole thing
00:01:12
gets its name.
00:01:13
This seems like such an obvious idea.
00:01:15
But before hyperlinks were implemented, every
time you wanted to switch to another piece
00:01:18
of information on a computer, you had to rummage
through the file system to find it, or type
00:01:22
it into a search box.
00:01:24
With hyperlinks, you can easily flow
from one related topic to another.
00:01:28
The value of hyperlinked information was conceptualized by Vannevar Bush way back in 1945.
00:01:33
He published an article describing a hypothetical
machine called a Memex, which we discussed
00:01:37
in Episode 24.
00:01:39
Bush described it as "associative indexing
... whereby any item may be caused at will
00:01:44
to select another immediately and automatically."
00:01:47
He elaborated: "The process of tying two things
together is the important thing...thereafter,
00:01:52
at any time, when one of those items is in
view, the other [item] can be instantly recalled
00:01:57
merely by tapping a button."
00:01:59
In 1945, computers didn’t even have screens,
so this idea was way ahead of its time!
00:02:04
Text containing hyperlinks is so powerful,
it got an equally awesome name: hypertext!
00:02:09
Web pages are the most common type of hypertext
document today.
00:02:12
They’re retrieved and rendered by web browsers which we'll get to in a few minutes.
00:02:15
In order for pages to link to one another,
each hypertext page needs a unique address.
00:02:20
On the web, this is specified by a Uniform
Resource Locator, or URL for short.
00:02:25
An example web page URL is thecrashcourse.com/courses.
00:02:29
Like we discussed last episode, when you request
a site, the first thing your computer does
00:02:33
is a DNS lookup.
00:02:34
This takes a domain name as input – like
“the crash course dot com” – and replies
00:02:38
back with the corresponding computer’s IP
address.
00:02:40
Now, armed with the IP address of the computer
you want, your web browser opens a TCP connection
00:02:45
to a computer that’s running a special piece
of software called a web server.
00:02:49
The standard port number for web servers is
port 80.
00:02:52
At this point, all your computer has done
is connect to the web server at the address
00:02:55
thecrashcourse.com
00:02:57
The next step is to ask that web server for
the “courses” hypertext page.
00:03:01
To do this, it uses the aptly named Hypertext
Transfer Protocol, or HTTP.
00:03:05
The very first documented version of this
spec, HTTP 0.9, created in 1991, only had
00:03:11
one command – “GET”.
00:03:13
Fortunately, that’s pretty much all you
need.
00:03:15
Because we’re trying to get the “courses”
page, we send the server the following command
00:03:19
– GET /courses.
00:03:21
This command is sent as raw ASCII text to
the web server, which then replies back with
00:03:25
the web page hypertext we requested.
00:03:27
This is interpreted by your computer's web
browser and rendered to your screen.
00:03:31
If the user follows a link to another page,
the computer just issues another GET request.
00:03:35
And this goes on and on as you surf around
the website.
00:03:38
In later versions, HTTP added status codes,
which prefixed any hypertext that was sent
00:03:43
following a GET request.
00:03:45
For example, status code 200 means OK – I’ve
got the page and here it is!
00:03:49
Status codes in the four hundreds are for
client errors.
00:03:51
Like, if a user asks the web server for a
page that doesn’t exist, that’s the dreaded
00:03:56
404 error!
00:03:57
Web page hypertext is stored and sent as plain
old text, for example, encoded in ASCII or
00:04:01
UTF-16, which we talked about in Episodes
4 and 20.
00:04:05
Because plain text files don’t have a way
to specify what’s a link and what’s not,
00:04:09
it was necessary to develop a way to “mark
up” a text file with hypertext elements.
00:04:13
For this, the Hypertext Markup Language was
developed.
00:04:16
The very first version of HTML version 0.a,
created in 1990, provided 18 HTML commands
00:04:22
to markup pages.
00:04:23
That’s it!
00:04:24
Let’s build a webpage with these!
00:04:25
First, let’s give our web page a big heading.
00:04:28
To do this, we type in the letters “H 1”,
which indicates the start of a first level
00:04:32
heading, and we surround that in angle brackets.
00:04:35
This is one example of an HTML tag.
00:04:38
Then, we enter whatever heading text we want.
00:04:40
We don’t want the whole page to be a heading.
00:04:42
So, we need to “close” the “h1” tag
like so, with a little slash in the front.
00:04:45
Now lets add some content.
00:04:47
Visitors may not know what Klingons are, so
let’s make that word a hyperlink to the
00:04:51
Klingon Language Institute for more information.
00:04:53
We do this with an “A” tag, inside of
which we include an attribute that specifies
00:04:57
a hyperlink reference.
00:04:58
That’s the page to jump to if the link is
clicked.
00:05:00
And finally, we need to close the A tag.
00:05:03
Now lets add a second level heading, which
uses an “h2” tag.
00:05:06
HTML also provides tags to create lists.
00:05:09
We start this by adding the tag for an ordered
list.
00:05:12
Then we can add as many items as we want,
surrounded in “L i” tags, which stands
00:05:16
for list item.
00:05:17
People may not know what a bat'leth is, so
let’s make that a hyperlink too.
00:05:21
Lastly, for good form, we need to close the
ordered list tag.
00:05:24
And we’re done – that’s a very simple
web page!
00:05:27
If you save this text into notepad or textedit,
and name it something like “test.html”,
00:05:31
you should be able to open it by dragging
it into your computer’s web browser.
00:05:35
Of course, today’s web pages are a tad more
sophisticated.
00:05:38
The newest version of HTML, version 5, has
over a hundred different tags – for things
00:05:42
like images, tables, forms and buttons.
00:05:44
And there are other technologies we’re not
going to discuss, like Cascading Style Sheets
00:05:48
or CSS and JavaScript, which can be embedded
into HTML pages and do even fancier things.
00:05:54
That brings us back to web browsers.
00:05:56
This is the application on your computer that
lets you talk with all these web servers.
00:06:00
Browsers not only request pages and media,
but also render the content that’s being
00:06:03
returned.
00:06:04
The first web browser, and web server, was
written by (now Sir) Tim Berners-Lee over
00:06:09
the course of two months in 1990.
00:06:10
At the time, he was working at CERN in Switzerland.
00:06:13
To pull this feat off, he simultaneously created
several of the fundamental web standards we
00:06:18
discussed today: URLs, HTML and HTTP.
00:06:21
Not bad for two months work!
00:06:23
Although to be fair, he’d been researching
hypertext systems for over a decade.
00:06:27
After initially circulating his software amongst
colleagues at CERN, it was released to the
00:06:30
public in 1991.
00:06:32
The World Wide Web was born.
00:06:34
Importantly, the web was an open standard,
making it possible for anyone to develop new
00:06:38
web servers and browsers.
00:06:39
This allowed a team at the University of Illinois
at Urbana-Champaign to create the Mosaic web
00:06:43
browser in 1993.
00:06:45
It was the first browser that allowed graphics
to be embedded alongside text; previous browsers
00:06:50
displayed graphics in separate windows.
00:06:52
It also introduced new features like bookmarks,
and had a friendly GUI interface, which made
00:06:56
it popular.
00:06:57
Even though it looks pretty crusty, it’s
recognizable as the web we know today!
00:07:01
By the end of the 1990s, there were many web
browsers in use, like Netscape Navigator,
00:07:05
Internet Explorer, Opera, OmniWeb and Mozilla.
00:07:08
Many web servers were also developed, like
Apache and Microsoft’s Internet Information
00:07:11
Services (IIS).
00:07:13
New websites popped up daily, and web mainstays
like Amazon and eBay were founded in the mid-1990s.
00:07:18
A golden era!
00:07:19
The web was flourishing and people increasingly
needed ways to find things.
00:07:23
If you knew the web address of where you wanted
to go – like ebay.com – you could just
00:07:27
type it into the browser.
00:07:28
But what if you didn’t know where to go?
00:07:30
Like, you only knew that you wanted pictures
of cute cats.
00:07:33
Right now!
00:07:34
Where do you go?
00:07:35
At first, people maintained web pages which
served as directories hyperlinking to other
00:07:39
websites.
00:07:40
Most famous among these was "Jerry and David's
guide to the World Wide Web", renamed Yahoo
00:07:44
in 1994.
00:07:45
As the web grew, these human-edited directories
started to get unwieldy, and so search engines
00:07:50
were developed.
00:07:51
Let’s go to the thought bubble!
00:07:52
The earliest web search engine that operated
like the ones we use today, was JumpStation,
00:07:57
created by Jonathon Fletcher in 1993 at the
University of Stirling.
00:08:01
This consisted of three pieces of software
that worked together.
00:08:04
The first was a web crawler, software that
followed all the links it could find on the
00:08:07
web; anytime it followed a link to a page
that had new links, it would add those to
00:08:11
its list.
00:08:12
The second component was an ever enlarging
index, recording what text terms appeared
00:08:16
on what pages the crawler had visited.
00:08:18
The final piece was a search algorithm that
consulted the index; for example, if I typed
00:08:22
the word “cat” into JumpStation, every
webpage where the word “cat” appeared
00:08:26
would come up in a list.
00:08:28
Early search engines used very simple metrics
to rank order their search results, most often
00:08:32
just the number of times a search term appeared
on a page.
00:08:35
This worked okay, until people started gaming
the system, like by writing “cat” hundreds
00:08:40
of times on their web pages just to steer
traffic their way.
00:08:43
Google’s rise to fame was in large part
due to a clever algorithm that sidestepped
00:08:47
this issue.
00:08:48
Instead of trusting the content on a web page,
they looked at how other websites linked to
00:08:52
that page.
00:08:53
If it was a spam page with the word cat over
and over again, no site would link to it.
00:08:57
But if the webpage was an authority on cats,
then other sites would likely link to it.
00:09:01
So the number of what are called “backlinks”,
especially from reputable sites, was often
00:09:05
a good sign of quality.
00:09:07
This started as a research project called
BackRub at Stanford University in 1996, before
00:09:12
being spun out, two years later, into the
Google we know today.
00:09:15
Thanks thought bubble!
00:09:16
Finally, I want to take a second to talk about
a term you’ve probably heard a lot recently,
00:09:20
“Net Neutrality”.
00:09:21
Now that you’ve built an understanding of
packets, internet routing, and the World Wide
00:09:25
Web, you know enough to understand the essence
– at least the technical essence – of
00:09:29
this big debate.
00:09:30
In short, network neutrality is the principle
that all packets on the internet should be
00:09:34
treated equally.
00:09:35
It doesn’t matter if the packets are my
email or you streaming this video, they should
00:09:38
all chug along at the same speed and priority.
00:09:41
But many companies would prefer that their
data arrive to you preferentially.
00:09:45
Take for example, Comcast, a large ISP that
also owns many TV channels, like NBC and The
00:09:50
Weather Channel, which are streamed online.
00:09:52
Not to pick on Comcast, but in the absence
of Net Neutrality rules, they could for example say that
00:09:57
they want their content to be delivered silky
smooth, with high priority…
00:10:01
But other streaming videos are going to get
throttled, that is, intentionally given less
00:10:04
bandwidth and lower priority. Again I just want to reiterate here this is just conjecture.
00:10:09
At a high level, Net Neutrality advocates
argue that giving internet providers this
00:10:13
ability to essentially set up tolls on the
internet – to provide premium packet delivery
00:10:17
– plants the seeds for an exploitative business
model.
00:10:20
ISPs could be gatekeepers to content, with
strong incentives to not play nice with competitors.
00:10:25
Also, if big companies like Netflix and Google
can pay to get special treatment, small companies,
00:10:30
like start-ups, will be at a disadvantage,
stifling innovation.
00:10:34
On the other hand, there are good technical
reasons why you might want different types
00:10:37
of data to flow at different speeds.
00:10:39
That skype call needs high priority, but it’s
not a big deal if an email comes in a few
00:10:43
seconds late.
00:10:44
Net-neutrality opponents also argue that market
forces and competition would discourage bad
00:10:49
behavior, because customers would leave ISPs
that are throttling sites they like.
00:10:53
This debate will rage on for a while yet,
and as we always encourage on Crash Course,
00:10:57
you should go out and learn more because the
implications of Net Neutrality are complex
00:11:01
and wide-reaching.
00:11:02
I’ll see you next week.