00:00:00
If you’ve had some exposure to relational
databases, you’ve probably come across the term
00:00:05
“normalization”. But what is normalization?
Why do we do it? How do we do it? And what
00:00:12
bad things can happen if we don’t do it?
In this video, we’re going to explore database
00:00:18
normalization from a practical perspective.
We’ll keep the jargon to a minimum,
00:00:23
and we’ll use lots of examples as we go. By the
end of it, you’ll understand the so-called normal
00:00:29
forms from First Normal Form all the way up to
Fifth Normal Form – and you’ll have a clear sense
00:00:35
of what we gain by doing normalization,
and what we lose by failing to do it.
00:00:41
This is Decomplexify, bringing a welcome
dose of simplicity to complex topics.
00:00:48
Data: it’s everywhere. And some of it is wrong.
00:00:54
By and large, even a good database
design can’t protect against bad data.
00:00:59
But there are some cases of bad data that a good
database design can protect against. These are
00:01:05
cases where the data is telling us something
that logically cannot possibly be true:
00:01:12
One customer with two dates of birth is
logically impossible. It’s what we might
00:01:17
call a failure of data integrity. The data can’t
be trusted because it disagrees with itself.
00:01:24
When data disagrees with itself, that’s
more than just a problem of bad data.
00:01:29
It’s a problem of bad database design.
00:01:32
Specifically, it’s what happens when a
database design isn’t properly normalized.
00:01:38
So what does normalization mean?
When you normalize a database table,
00:01:43
you structure it in such a way that
can’t express redundant information.
00:01:48
So, for example, in a normalized table, you
wouldn’t be able to give Customer 1001 two dates
00:01:54
of birth even if you wanted to. Very broadly, the
table can only express one version of the truth.
00:02:03
Normalized database tables are not only
protected from contradictory data, they’re also:
00:02:08
easier to understand
easier to enhance and extend
00:02:14
protected from insertion
anomalies, update anomalies,
00:02:18
and deletion anomalies (more on these later)
00:02:23
How do we determine whether a table isn’t
normalized enough – in other words, how do
00:02:27
we determine if there’s a danger that redundant
data could creep into the table? Well, it turns
00:02:34
out that there are sets of criteria we can use to
assess the level of danger. These sets of criteria
00:02:40
have names like “first normal form”, “second
normal form”, “third normal form”, and so on.
00:02:47
Think of these normal forms by analogy to safety
assessments. We might imagine an engineer doing a
00:02:54
very basic safety assessment on a bridge. Let’s
say the bridge passes the basic assessment,
00:03:00
which means it achieves “Safety Level
1: Safe for Pedestrian Traffic”.
00:03:05
That gives us some comfort, but suppose we want to
know if cars can safely drive across the bridge?
00:03:11
To answer that question, we need the engineer to
perform an even stricter assessment of the bridge.
00:03:16
Let’s imagine that the engineer goes ahead and
does this stricter assessment, and again the
00:03:21
bridge passes, achieving “Safety Level 2: Safe
for Cars”. If even this doesn’t satisfy us,
00:03:27
we might ask the engineer to assess the bridge
for “Safety Level 3: Safe for Trucks.” And so on.
00:03:35
The normal forms of database
theory work the same way.
00:03:39
If we discover that a table meets the
requirements of first normal form,
00:03:43
that’s a bare minimum safety guarantee. If
we further discover that the table meets
00:03:48
the requirements of second normal form, that’s
an even greater safety guarantee. And so on.
00:03:55
So let’s begin at the beginning,
with First Normal Form.
00:03:59
Suppose you and I are both
confronted by this question:
00:04:02
“Who were the members of the Beatles?”
You might answer “John, Paul, George, and Ringo”.
00:04:08
I might answer “Paul, John, Ringo, and George”.
Of course, my answer and your answer are
00:04:14
equivalent, despite having the
names in a different order.
00:04:18
When it comes to relational databases, the same
principle applies. Let’s record the names of the
00:04:23
Beatles in a table, and then let’s ask the
database to return those names back to us.
00:04:30
The results will get returned to us in an
arbitrary order. For example, they might
00:04:35
get returned like this.
Or like this.
00:04:41
Or in any other order. There is no “right” order.
Are there ever situations where there’s a right
00:04:48
order? Suppose we write down the members
of the Beatles from tallest to shortest,
00:04:53
like this. We title our list “Members Of
The Beatles From Tallest To Shortest”.
00:05:02
In this list, it’s not just the names that
convey meaning. The order of the names conveys
00:05:06
meaning too. Paul is the tallest, John is the
second-tallest, and so on. Lists like this are
00:05:12
totally comprehensible to us – but they’re not
normalized. Remember, there’s no such thing as row
00:05:19
order within a relational database table. So here
we have our first violation of First Normal Form.
00:05:25
When we use row order to convey information,
we’re violating First Normal Form.
00:05:31
The solution is very simple. Be explicit – if
we want to capture height information, we should
00:05:37
devote a separate column to it – like this.
Or even better, like this.
00:05:45
So far, we’ve seen one way in
which a design can fail to achieve
00:05:49
First Normal Form. But there are others.
A second way of violating First Normal Form
00:05:54
involves mixing data types. Suppose our
Beatle_Height dataset looked like this.
00:06:02
If you’re accustomed to spreadsheets, you’ll be
aware that they typically won’t stop you from
00:06:06
having more than one datatype within a single
column – for example, they won’t stop you from
00:06:11
storing both numbers and strings in a column. But
in a relational database, you’re not allowed to be
00:06:17
cagey or ambiguous about a column’s data type.
The values that go in the Height_In_Cm column
00:06:23
can’t be a mix of integers and strings. Once you
define Height_In_Cm as being an integer column,
00:06:29
then every value that goes into that column
will be an integer – no strings, no timestamps,
00:06:35
no data types of any kind other than
integers. So: mixing datatypes within a column
00:06:41
is a violation of First Normal Form, and in fact
the database platform won’t even let you do it.
00:06:48
A third way of violating First Normal Form is by
designing a table without a primary key. A primary
00:06:55
key is a column, or combination of columns,
that uniquely identifies a row in the table.
00:07:01
For example, in the table Beatle_Height,
our intention is that each row should tell
00:07:06
us about one particular Beatle, so we ought to
designate “Beatle” as the primary key of the
00:07:11
Beatle_Height table. The database platform will
need to know about our choice of primary key,
00:07:18
so we’ll want to get the primary key into
the database by doing something like this.
00:07:24
With the primary key in place, the
database platform will prevent multiple
00:07:27
rows for the same Beatle from ever
being inserted. That’s a good thing,
00:07:32
because multiple rows for the same Beatle would
be nonsensical, and perhaps contradictory.
00:07:38
Obviously, a Beatle can’t have
two different heights at once.
00:07:42
Every table we design should have a primary key.
If it doesn’t, it’s not in First Normal Form.
00:07:49
The last way of failing to achieve
First Normal Form involves the notion of
00:07:52
“repeating groups”. Suppose we’re designing
a database for an online multiplayer game.
00:07:58
At a given time, each player has a number
of items of different types, like arrows,
00:08:02
shields, and copper coins. We might
represent the situation like this.
00:08:09
A player’s inventory is what we call a “repeating
group”. Each inventory contains potentially many
00:08:15
different types of items: arrows, shields,
copper coins, and so on; and in fact there
00:08:21
may be hundreds of different types of items
that a player might have in their inventory.
00:08:26
We could design a database table that
represents the Inventory as a string of text:
00:08:32
But this is a terrible design because
there’s no easy way of querying it.
00:08:36
For example, if we want to know which players
currently have more than 10 copper coins,
00:08:41
then having the inventory data
lumped together in a text string
00:08:45
will make it very impractical to write
a query that gives us the answer.
00:08:50
We might be tempted to
represent the data like this.
00:08:54
This lets us record up to 4 items per inventory.
But given that a player can have an inventory
00:09:00
consisting of hundreds of different types of
items, how practical is it going to be to design
00:09:04
a table with hundreds of columns? Even if we were
to go ahead and create a super-wide table to hold
00:09:10
all possible inventory data, querying
it would still be extremely awkward.
00:09:16
The bottom line is that storing a repeating group
of data items on a single row violates First
00:09:21
Normal Form. So what sort of alternative
design would respect First Normal Form?
00:09:28
It would be this.
To communicate the fact that
00:09:32
trev73 owns 3 shields, we have a row for Player
“trev73”, Item_Type “shields”, Item_Quantity 3.
00:09:39
To communicate the fact that
trev73 also owns 5 arrows,
00:09:43
we have a row for Player “trev73”, Item_Type
“arrows”, Item_Quantity 5. And so on.
00:09:49
And because each row in the table tells
us about one unique combination of Player
00:09:54
and Item_Type, the primary key is the
combination of Player and Item_Type.
00:10:00
So let’s review what we know
about First Normal Form.
00:10:04
1. using row order to convey
information is not permitted
00:10:08
2. mixing data types within the
same column is not permitted
00:10:13
3. having a table without a
primary key is not permitted
00:10:17
4. repeating groups are not permitted
Next up: Second Normal Form.
00:10:25
Let’s look again at our Player Inventory table.
This table is fully normalized. But suppose we
00:10:32
enhance the table slightly. Let’s imagine
that every player has a rating: Beginner,
00:10:37
Intermediate, or Advanced. We want to record the
current rating of each player – and to achieve
00:10:43
that, we simply include in our table
an extra column called Player_Rating.
00:10:49
Notice what’s happening here. Player
jdog21 has a Player_Rating of Intermediate,
00:10:55
but because jdog21 has two rows in the table,
both those rows have to be marked Intermediate.
00:11:01
Player trev73 has a Player_Rating of Advanced,
00:11:05
but because trev73 has four rows in the table, all
four of those rows have to be marked Advanced.
00:11:11
This is not a good design. Why not? Well,
suppose player gila19 loses all her copper coins,
00:11:19
leaving her with nothing in her inventory.
The single entry that she did have in the
00:11:24
Player_Inventory table is now gone.
If we try to query the database to find
00:11:30
out what gila19’s Player Rating is, we’re out
of luck. We can no longer access gila19’s Player
00:11:36
Rating because the database no longer knows it.
This problem is known as a deletion anomaly.
00:11:43
And that’s not all. Suppose jdog21 improves
his rating from Intermediate to Advanced.
00:11:49
To capture his new Advanced rating
in the Player_Inventory table,
00:11:53
we run an update on his two records.
But let’s imagine the update goes wrong.
00:11:59
By accident, only one of jdog21’s records gets
updated, and the other record gets left alone.
00:12:06
Now the data looks like this.
As far as the database is concerned,
00:12:11
jdog21 is somehow both Intermediate
and Advanced at the same time.
00:12:16
Our table design has left the door open
for this type of logical inconsistency.
00:12:20
This problem is called an update anomaly.
Or suppose a new player called tina42 comes along.
00:12:27
She’s a Beginner and she doesn’t have anything
in her inventory yet. We want to record the fact
00:12:32
that she’s a Beginner, but because she
has nothing in her inventory, we can’t
00:12:37
insert a tina42 row into the Player_Inventory
table. So her rating goes unrecorded. This
00:12:42
problem is known as an insertion anomaly.
The reason our design is vulnerable to these
00:12:49
problems is that it isn’t in Second Normal
Form. Why not? What is Second Normal Form?
00:12:56
Second Normal Form is about how a table’s non-key
columns relate to the primary key. In our table,
00:13:02
the non-key columns – or to use slightly
different terminology, non-key attributes – are
00:13:08
Item_Quantity and Player_Rating. They are columns
(also called attributes), that don’t belong
00:13:15
to the primary key. As we saw earlier, the primary
key is the combination of Player and Item Type.
00:13:22
Now we’re in a position to give a
definition of Second Normal Form.
00:13:25
The definition we’re going to give is
an informal one which leaves out some
00:13:29
nuances – but for most practical
purposes, that shouldn’t matter.
00:13:34
Informally, what Second Normal Form says
is that each non-key attribute in the table
00:13:38
must be dependent on the entire primary key.
How does our table measure up to this definition?
00:13:45
Let’s examine our non-key attributes, which are
the attributes Item_Quantity and Player_Rating.
00:13:51
Does Item_Quantity depend on the entire primary
key? Yes, because an Item_Quantity is about a
00:13:57
specific Item_Type owned by specific
Player. We can express it like this.
00:14:03
The arrow signifies a dependency – or to give
it its proper name, a functional dependency.
00:14:08
This simply means that each value of the thing
on the left side of the arrow is associated with
00:14:13
exactly one value of the thing on the right side
of the arrow. Each combination of Player_ID and
00:14:18
Item_Type is associated with a specific value
of Item_Quantity – for example the combination
00:14:25
of Player_ID jdog21 / Item_Type “amulets”
is associated with an Item_Quantity of 2.
00:14:31
As far as Second Normal Form is
concerned, this dependency is fine,
00:14:34
because it’s a dependency on the entire primary
key. But what about the other dependency?
00:14:40
Does Player_Rating depend on the entire primary
key? No, it doesn’t. Player_Rating is a property
00:14:46
of the Player only. In other words, for any
given Player, there’s one Player_Rating.
00:14:52
This dependency on Player is the problem.
It’s a problem because Player isn’t the
00:14:58
primary key – Player is part of the
primary key, but it’s not the whole key.
00:15:02
That’s why the table isn’t in Second Normal Form,
and that’s why it’s vulnerable to problems.
00:15:08
At what point did our design go wrong, and
how can we fix it? The design went wrong
00:15:13
when we chose to add a Player_Rating column
to a table where it didn’t really belong.
00:15:18
The fact that a Player_Rating is a property
of a Player should have helped us to realise
00:15:22
that a Player is an important concept in its own
right – so surely Player deserves its own table:
00:15:29
Nothing could be simpler than that. A Player
table will contain one row per Player,
00:15:34
and in it we can include as columns the ID of
the player, the rating of the player, as well
00:15:39
as all sorts of other properties of the player
– maybe the player’s date of birth, for example,
00:15:43
maybe the player’s email address. Our other
table, Player_Inventory, can stay as it was.
00:15:50
For both tables, we can say that
there are no part-key dependencies.
00:15:54
In other words, it’s always the case that every
attribute depends on the whole primary key,
00:15:59
not just part of it. And so our
tables are in Second Normal Form.
00:16:06
Now let’s move on to Third Normal Form.
Suppose we decide to enhance the Player table.
00:16:12
We decide to add a new column
called Player_Skill_Level.
00:16:16
Imagine that in this particular multiplayer
game, there’s a nine-point scale for skill level.
00:16:22
At one extreme, a player with skill
level 1 is an absolute beginner;
00:16:25
at the opposite extreme, a player with skill
level 9 is as skilful as it’s possible to be.
00:16:30
And let’s say that we’ve defined exactly how
Player Skill Levels relate to Player Ratings.
00:16:36
“Beginner” means a skill level between
1 and 3. “Intermediate” means a skill
00:16:40
level between 4 and 6. And “Advanced”
means a skill level between 7 and 9.
00:16:46
But now that both the Player_Rating and the
Player_Skill_Level exist in the Player table,
00:16:51
a problem can arise. Let’s say that tomorrow,
player gila19’s skill level increases from 3
00:16:56
to 4. If that happens, we’ll update her row in the
Player table to reflect this new skill level.
00:17:04
By rights, we should also update her Player_Rating
to Intermediate – but suppose something goes
00:17:10
wrong, and we fail to update the Player_Rating.
Now we’ve got a data inconsistency. gila19’s
00:17:15
Player_Rating says she’s a Beginner, but her
Player_Skill_Level implies she’s Intermediate.
00:17:22
How did the design allow this happen? Second
Normal Form didn’t flag up any problems. There’s
00:17:27
no attribute here that depends only partially
on the primary key – as a matter of fact,
00:17:31
the primary key doesn’t have any parts; it’s
just a single attribute. And both Player_Rating
00:17:37
and Player_Skill_Level are dependent on it.
But in what way are they dependent on it? Let’s
00:17:42
look more closely. Player_Skill_Level
is dependent on Player_ID.
00:17:48
Player_Rating is dependent on Player ID
too, but only indirectly – like this.
00:17:54
A dependency of this kind is called a transitive
dependency. Player Rating depends on Player Skill
00:18:00
Level which in turn depends on the primary
key: Player ID. The problem is located just
00:18:07
here – because what Third Normal Form forbids is
exactly this type of dependency: the dependency of
00:18:13
a non-key attribute on another non-key attribute.
Because Player Rating depends on Player Skill
00:18:21
Level – which is a non-key attribute –
this table is not in Third Normal Form.
00:18:26
There’s a very simple way of repairing the
design to get it into Third Normal Form.
00:18:30
We remove Player Rating from the Player table;
so now the Player table looks like this.
00:18:35
And we introduce a new table
called Player_Skill_Levels.
00:18:40
The Player Skill Levels table tells us everything
we need to know about how to translate a player
00:18:44
skill level into a player rating.
Third Normal Form is the culmination of everything
00:18:49
we’ve covered about database normalization so
far. It can be summarised in this way: Every
00:18:55
non-key attribute in a table should depend on
the key, the whole key, and nothing but the key.
00:19:01
If you commit this to memory, and keep it
constantly in mind while you’re designing a
00:19:05
database, then 99% of the time you will
end up with fully normalized tables.
00:19:12
It’s even possible to shorten this guideline
slightly by knocking out the phrase
00:19:15
“non-key” – giving us the revised guideline: every
attribute in a table should depend on the key, the
00:19:21
whole key, and nothing but the key. And this new
guideline represents a slightly stronger flavor of
00:19:27
Third Normal Form known as Boyce-Codd Normal Form.
In practice, the difference between Third Normal
00:19:33
Form and Boyce-Codd Normal Form is extremely
small, and the chances of you ever encountering
00:19:39
a real-life Third Normal Form table that doesn’t
meet Boyce-Codd Normal Form are almost zero.
00:19:45
Any such table would have to have what we call
multiple overlapping candidate keys – which gets
00:19:50
us into realms of obscurity and theoretical
rigor that are a little bit beyond the scope
00:19:55
of this video. So as a practical matter, just
follow the guideline that every attribute in a
00:20:00
table should depend on the key, the whole
key, and nothing but the key, and you can
00:20:05
be confident that the table will be in both
Third Normal Form and Boyce-Codd Normal Form.
00:20:13
In almost all cases, once you’ve normalized
a table this far, you’ve fully normalized
00:20:17
it. There are some instances where this
level of normalization isn’t enough.
00:20:22
These rare instances are dealt with
by Fourth and Fifth Normal Form.
00:20:27
So let’s move on to Fourth Normal Form. We’ll
look at an example of a situation where Third
00:20:32
Normal Form isn’t quite good enough and something
a bit stronger is needed. In our example, there’s
00:20:37
a website called DesignMyBirdhouse.com – the
world’s leading supplier of customized birdhouses.
00:20:44
On DesignMyBirdhouse.com, customers can
choose from different birdhouse models,
00:20:49
and, for the model they’ve selected,
they can choose both a custom color
00:20:53
and a custom style. Each model has its
own range of available colors and styles.
00:21:00
One way of capturing this information
is to put it all the possible
00:21:02
combinations in a single table, like this.
This table is in Third Normal Form. The primary
00:21:10
key consists of all three columns: {Model,
Color, Style}. Everything depends on the key,
00:21:16
the whole key, and nothing but the key.
And yet this table is still vulnerable
00:21:21
to problems. Let’s look at the rows
for the birdhouse model “Prairie”:
00:21:27
The available colors for the “Prairie”
birdhouse model are brown and beige.
00:21:32
Now suppose DesignMyBirdhouse.com decides
to introduce a third available color for
00:21:36
the “Prairie” model: green. This will mean we’ll
have to add two extra “Prairie” rows to the table:
00:21:44
one for green bungalow, and
one for green schoolhouse.
00:21:49
If by mistake we only add a row for green
bungalow, and fail to add the row for green
00:21:53
schoolhouse, then we have a data inconsistency.
Available colors are supposed to be completely
00:22:00
independent of available styles. But our
table is saying that a customer can choose
00:22:05
green only for the bungalow style, not for
the schoolhouse style. That makes no sense.
00:22:11
The prairie birdhouse model is available in green,
so all its styles should be available in green.
00:22:16
Something about the way the table is designed has
allowed us to represent an impossible situation.
00:22:22
To see what’s gone wrong, let’s have a
closer look at the dependencies among Models,
00:22:26
Colors, and styles. Can we say that Color
has a functional dependency on Model?
00:22:33
Actually no, because a specific Model
isn’t associated with just one Color.
00:22:38
And yet it does feel as though Color has some
relationship to Model. How can we express it?
00:22:44
We can say that each Model has a specific set
of available Colors. This kind of dependency is
00:22:50
called a multivalued dependency, and we express
it with a double-headed arrow, like this:
00:22:58
And it’s equally true that each Model
has a specific set of available Styles.
00:23:04
What Fourth Normal Form says is that the only
kinds of multivalued dependency we’re allowed
00:23:08
to have in a table are multivalued dependencies
on the key. Model is not the key; so the table
00:23:15
Model_Colors_And_Styles_Available
is not in Fourth Normal Form.
00:23:20
As always, the fix is to split
things out into multiple tables.
00:23:26
Now, if DesignMyBirdhouse.com expands the range of
Prairie-Model colors to include green, we simply
00:23:32
add a row to the Model_Colors_Available table:
And no anomalies are possible.
00:23:42
We’re now ready for Fifth Normal Form, the
last normal form covered in this video.
00:23:48
For our Fifth Normal Form example, we imagine
that there are three different brands of ice
00:23:52
cream available: Frosty’s, Alpine, and Ice
Queen. Each of the three brands of ice cream
00:23:58
offers a different range of flavors:
Frosty’s offers vanilla, chocolate,
00:24:03
strawberry, and mint chocolate chip
Alpine offers vanilla and rum raisin
00:24:08
Ice Queen offers vanilla,
strawberry, and mint chocolate chip
00:24:13
Now we ask our friend Jason what
types of ice cream he likes.
00:24:16
Jason says: I only like vanilla and chocolate.
And I only like the brands Frosty and Alpine.
00:24:24
We ask our other friend, Suzy, what types of
ice cream she likes. Suzy says: I only like
00:24:30
rum raisin, mint chocolate chip, and strawberry.
And I only like the brands Alpine and Ice Queen.
00:24:37
So, after a little bit of brainwork, we
deduce exactly which ice cream products
00:24:41
Jason and Suzy are willing to eat;
and we express this in a table:
00:24:48
But time passes, tastes change, and at some point
Suzy announces that she now likes Frosty’s brand
00:24:54
ice cream too. So we need to update our table.
It won’t come as any surprise that we might get
00:25:03
this update wrong. We might successfully add a
row for Person Suzy – Brand Frosty’s – Flavor
00:25:09
Strawberry, but fail to add a row for Person Suzy
– Brand Frosty’s – Flavor Mint Chocolate Chip.
00:25:15
And this outcome wouldn’t just be wrong – it
would be logically inconsistent – because we’ve
00:25:20
already established that Suzy likes Frosty’s
brand, and likes Mint Chocolate Chip flavor,
00:25:26
and therefore there’s no way she can
dislike Frosty’s Mint Chocolate Chip.
00:25:32
In this example, we went wrong right at the
beginning. At the beginning, we were given
00:25:36
three pieces of information. First, we were told
which brands offered which flavors. Second, we
00:25:42
were told which people liked which brands. Third,
we were told which people liked which flavors.
00:25:49
From those three pieces of information, we
should have simply created three tables.
00:25:56
And that’s all we needed to do. All the
facts of the situation have been represented.
00:26:02
If we ever want to know what
specific products everyone likes,
00:26:05
we can simply ask the database platform,
expressing our question in the form of a
00:26:10
piece of SQL that logically deduces the
answer by joining the tables together.
00:26:17
To sum things up: if we want to ensure
that a table that’s in Fourth Normal
00:26:21
Form is also in Fifth Normal Form, we need
to ask ourselves whether the table can be
00:26:26
logically thought of as being the result
of joining some other tables together.
00:26:31
If it can be thought of that way,
then it’s not in Fifth Normal Form.
00:26:36
If it can’t be thought of that way,
then it is in Fifth Normal Form.
00:26:42
We’ve now covered all the normal forms from First
Normal Form to Fifth Normal Form. Let’s review,
00:26:49
keeping in mind that for a table to comply with
a particular normal form, it must comply with
00:26:54
all the lower normal forms as well.
The rules for first normal form are:
00:27:00
1. using row order to convey
information is not permitted
00:27:04
2. mixing data types within the
same column is not permitted
00:27:08
3. having a table without a
primary key is not permitted
00:27:12
4. repeating groups are not permitted
The rule for second normal form is:
00:27:19
Each non-key attribute in the table must
be dependent on the entire primary key.
00:27:25
The rule for third normal form is: Each non-key
attribute in a table must depend on the key,
00:27:30
the whole key, and nothing but the key. If we
prefer to drop the phrase “non-key”, we end up
00:27:37
with an even simpler and even stronger version of
third normal form called “Boyce-Codd Normal Form”:
00:27:43
Each attribute in a table must depend on the
key, the whole key, and nothing but the key.
00:27:49
The rule for fourth normal form is that
the only kinds of multivalued dependency
00:27:54
we’re allowed to have in a table are
multivalued dependencies on the key.
00:27:59
Finally, the rule for Fifth Normal Form
is: it must not be possible to describe
00:28:04
the table as being the logical result
of joining some other tables together.
00:28:11
I hope you’ve found this video helpful.
If you have any comments or questions
00:28:14
on what you’ve just seen, by all means
put them in the comments section below.
00:28:19
And if you have any suggestions for other
complex topics that you’d like to see explained
00:28:24
on Decomplexify, again let me know in the
comments. So long, and thanks for watching!