00:00:00
Welcome to part five of my series on exploratory data analysis with Excel. The subject of this video is creating and using bar charts in Excel to explore a data set. If you've reached this video prematurely and need to see video one of the series, go ahead and click up here and you'll find that video right there. Next up, if you want the workbooks in this series, look down in the description: there'll be a link to a GitHub repo where you can download all of the Excel workbooks in this series. Okay, let's go ahead and get started.
00:00:37
Okay, you can see here I'm in Excel. All I've done is take one of the worksheets, copy it, and rename it "Part 5 Bar Charts", and what we can see here is the Titanic data set, which we've been using throughout this series. So we're going to create a bar chart, and the easiest way to create bar charts when you're doing exploratory data analysis with Excel is to create the bar chart as a pivot chart, that is, a bar chart created from a pivot table. It makes things a lot easier: you can drag and drop all kinds of stuff, and you can do a lot of exploratory analyses very quickly using a bar chart created from a pivot table. So that's what we're going to do.
00:01:17
First up, we're going to insert a pivot table. I'm going to put it in the existing worksheet and just drop it in right here. Okay, boom. Now we'll scroll over here. Actually, I'm going to shrink Excel a little bit so that my smiling face doesn't cover it up so much. So we've got a bare pivot table; it's empty.
00:01:47
One of the first things we'd like to explore in this data set is, obviously, survival rates, because that's the business question we're trying to answer with this data: what patterns in the data are highly associated with Titanic passengers who survived? That's what we're looking for.
00:02:06
The bar charts we build from pivot tables work with categorical data, not numeric data. We've already looked at histograms and box-and-whisker plots (box plots) in this series, and those are great for working with numeric data. The next video in this series covers scatter plots, which are also for numeric data. Bar charts, though, are really about categories and counting things. We've got four columns in the data set that are already categorical: Survived; Pclass, which class of ticket the passenger had on the Titanic; the passenger's gender, male or female, recorded in the Sex column (the Sex feature); and lastly Embarked, the port where the passenger got on the Titanic. So let's create a bar chart with that stuff.
00:02:56
First, we'll drag Sex down to Rows, then drag New Pclass down to create a hierarchy, and lastly drag down Embarked. That gives us a nice little pivot table (let me hide the ribbon so you can see it all). Now we'll throw in Survived, because that's what we're interested in: we want to know if there are any patterns in these characteristics of the data that are highly associated with survival. So we'll drag New Survived to Columns and then drag it down to Values as well.
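Under the hood, this pivot table counts rows for each combination of the Rows fields and each value of the Columns field. Here is a minimal Python sketch of that counting, assuming a handful of made-up rows that reuse the video's column names; the values, the "yes"/"no" labels, and the `pivot_counts` helper are hypothetical, not the workbook's actual contents.

```python
from collections import Counter

# Hypothetical toy rows standing in for the Titanic worksheet; the field
# names mirror the columns used in the video, but the values are made up.
rows = [
    {"Sex": "female", "Pclass": "1st", "Embarked": "C", "Survived": "yes"},
    {"Sex": "female", "Pclass": "1st", "Embarked": "C", "Survived": "yes"},
    {"Sex": "male", "Pclass": "3rd", "Embarked": "S", "Survived": "no"},
    {"Sex": "male", "Pclass": "3rd", "Embarked": "S", "Survived": "no"},
    {"Sex": "male", "Pclass": "3rd", "Embarked": "S", "Survived": "yes"},
]


def pivot_counts(rows, row_fields, column_field):
    """Count rows per (row-field combination, column value), like a pivot
    table with row_fields in Rows, column_field in Columns, and a count of
    column_field in Values."""
    counts = Counter()
    for row in rows:
        key = tuple(row[f] for f in row_fields)
        counts[(key, row[column_field])] += 1
    return counts


table = pivot_counts(rows, ["Sex", "Pclass", "Embarked"], "Survived")
for (key, outcome), n in sorted(table.items()):
    print(key, outcome, n)
```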
00:03:34
I'm going to collapse some of this real quick, so I'll collapse first, second, and third class, because the first thing I want to show you is that when you use bar charts, you should use two forms of bar chart, and we'll use both in this video. One, you want a bar chart that shows you the absolute counts, and two, you want a bar chart that shows you the proportions.
00:03:58
Here's the reason why. If you only work with bar charts that show proportions, which are very cool and very useful, they mask where the center of gravity of the data is located. What I mean by that is: where are the largest numbers of individual rows of data? Proportions smooth all that out. You don't know whether one bar's proportion is based on 100 rows of data and another bar's proportion on a thousand rows. But generally speaking, when you're doing exploratory analysis with business data, you tend to want to focus on the groups, the chunks of data, that have the most rows, because those typically have the most impact. So that's why you want both kinds of bar charts, one with counts and one with proportions, and you can see that here in the quick pivot table I've got.
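To see why proportions alone can mislead, here is a small sketch with two hypothetical groups (the numbers are made up for illustration): both show the same survival proportion, but one holds ten times as many rows.

```python
# Two hypothetical groups (made-up numbers) with identical survival
# proportions but very different sizes.
groups = {
    "group A": {"survived": 30, "perished": 70},
    "group B": {"survived": 300, "perished": 700},
}


def summarize(counts):
    """Return (total rows, survival proportion) for one group."""
    total = counts["survived"] + counts["perished"]
    return total, counts["survived"] / total


for name, counts in groups.items():
    total, share = summarize(counts)
    print(f"{name}: {total} rows, {share:.0%} survived")
# group A: 100 rows, 30% survived
# group B: 1000 rows, 30% survived
```

A proportions chart draws these two groups identically; only the counts reveal that group B is where the weight of the data sits.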
00:04:47
You can see here that we have 891 total rows of data. But notice that more than a third of all the rows in the data set, 347 of them to be exact, are males in third class. What that tells you is that, from a center-of-gravity perspective, third-class males are extremely important, because they represent more than a third of all the data. You would lose that if you looked at a bar chart that simply showed proportions.
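The "more than a third" claim is simple arithmetic on the two figures quoted above:

```python
# Figures quoted in the video: 891 rows in total, 347 of them third-class males.
total_rows = 891
third_class_males = 347

share = third_class_males / total_rows
print(f"{share:.1%}")  # 38.9%
assert share > 1 / 3  # "more than a third of all the data"
```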
00:05:19
Okay, enough about that. Let's expand this back out, because we're going to create a pivot chart. Since we don't need to see this anymore, I'm going to maximize Excel again. Let's insert a pivot chart, a bar chart created from this pivot table: we'll go to Insert, go up to PivotChart, and select PivotChart. The first thing we want is counts, and the easiest way to get that is a Stacked Column chart. So we'll choose Stacked Column here, click OK, and we get a nice bar chart. I'm going to move this down so we get some more real estate (it's going to be pretty cool) and make it bigger so we can see it. Awesome. I'm also going to get rid of this, because I think it just makes things a little more complicated. Okay, awesome, so we've got a bar chart here.
00:06:18
What the bar chart is showing us is counts. For example, this is females in first class who got on the ship in Cherbourg, in France, and we can see that basically all of them, except maybe one, survived, and there were around 40-something passengers in this particular category.
00:06:43
Generally speaking, we're going to be looking at two things in this visualization. One, we're going to look at the relative proportion of the colored bars, because orange means the passengers survived and blue means they perished, so that gives us a general indication of the survival rates. Two, we're going to look at the length of each bar, because that tells us how many observations (rows of data) it holds, in other words where the center of gravity is. Not surprisingly, we can see that males in third class who got on in Southampton number more than 250, and very few of them survived.
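Reading a stacked count chart means checking two things per bar: its total length and its orange (survived) share. Here is a sketch of both readings, assuming made-up counts for three groups (these are not the actual Titanic numbers) and a hypothetical `read_bar` helper.

```python
# Hypothetical per-group counts (made-up numbers, not the real Titanic figures).
# Each value is (perished, survived): the blue and orange segments of one bar.
bars = {
    ("female", "1st", "C"): (1, 40),
    ("female", "3rd", "S"): (30, 30),
    ("male", "3rd", "S"): (230, 30),
}


def read_bar(perished, survived):
    """Return the two things a stacked count chart encodes: bar length
    (total passengers) and the orange share (survival proportion)."""
    total = perished + survived
    return total, survived / total


# Rank groups by bar length: the longest bars are the center of gravity.
ranked = sorted(bars, key=lambda g: read_bar(*bars[g])[0], reverse=True)
for group in ranked:
    total, rate = read_bar(*bars[group])
    print(group, total, f"{rate:.0%}")
```

Sorting by length first and only then looking at the orange share mirrors the advice above: find where the rows are, then ask how those rows fared.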
00:07:18
So this visualization right here tells us a lot. It says: okay, look, we've got a lot of males in third class, and they don't survive. Proportion-wise, it doesn't really seem to matter where a third-class male passenger got on the Titanic, because the orange portion of each of these bars is very, very thin. Okay, cool. We already kind of know that females in first and second class overwhelmingly survived, no matter where they got on the ship. Third-class females are kind of interesting: those who got on in Cherbourg and, I believe, Queenstown seem to have much better survival proportions than those who got on in Southampton (S stands for Southampton).
00:08:11
You can see there's a lot going on here; this is a great data visualization. Now, what will really make this pop in terms of proportions, actually answering questions like which segment of the data overwhelmingly survived so that it just jumps out at your eye, is creating a proportions chart, which in Excel is the 100% Stacked Column chart. So once again we'll go up here, click Insert, choose PivotChart, and this time pick 100% Stacked Column. Boom. You can see already, even without increasing the size of the chart, that the orange obviously dominates over here, which is all females, and especially females in first and second class. I'll resize all this to have more real estate, and now we can see the proportions.
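A 100% stacked column chart takes the same counts and rescales each bar so its segments sum to 100%. Here is a minimal sketch of that rescaling, using hypothetical counts and a hypothetical `to_percent_stacked` helper.

```python
def to_percent_stacked(bars):
    """Rescale each bar's segment counts so they sum to 1, which is what a
    100% stacked column chart plots."""
    result = {}
    for group, segments in bars.items():
        total = sum(segments.values())
        result[group] = {name: n / total for name, n in segments.items()}
    return result


# Hypothetical counts (made-up numbers) for two bars of very different sizes.
counts = {
    "small group": {"perished": 5, "survived": 15},
    "large group": {"perished": 150, "survived": 50},
}
shares = to_percent_stacked(counts)
print(shares["small group"])  # {'perished': 0.25, 'survived': 0.75}
print(shares["large group"])  # {'perished': 0.75, 'survived': 0.25}
```

Both bars end up the same height even though one covers ten times as many rows, which is why the counts chart is still worth keeping alongside this one.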
00:09:05
This is really cool, and what we can do now that we've got these pivot charts is take things in and out. For example, we can remove Embarked and just look at passenger class for males and females, or we can put Embarked back in and get rid of Pclass. See? Oh yeah, look at that. These kinds of things are pretty useful: creating pivot charts and being able to quickly and easily move data in and out is one of the hallmarks of doing exploratory data analysis in Excel, and with pivot charts especially, this is wildly awesome stuff.
00:09:37
So let's put New Pclass back in. Now this is a great data visualization, because it incorporates four dimensions at the same time. We've got survival, orange or blue; that's one dimension. We have male versus female; that's our second dimension. We've got Pclass, first, second, or third; that's our third dimension. And lastly we have Embarked, where the passenger got on the ship; that's the fourth. Four dimensions, so this is a pretty powerful visualization.
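Dragging a field into or out of the pivot chart amounts to recounting the same rows grouped by a different set of dimensions. Here is a sketch assuming made-up rows with the video's column names and a hypothetical `count_by` helper.

```python
from collections import Counter

# Hypothetical toy rows (made-up values) using the video's column names.
rows = [
    {"Sex": "male", "Pclass": "3rd", "Embarked": "S", "Survived": "no"},
    {"Sex": "male", "Pclass": "3rd", "Embarked": "Q", "Survived": "no"},
    {"Sex": "male", "Pclass": "1st", "Embarked": "C", "Survived": "yes"},
    {"Sex": "female", "Pclass": "3rd", "Embarked": "S", "Survived": "yes"},
]


def count_by(rows, fields):
    """Count rows per combination of the chosen fields; adding or removing
    a field mirrors dragging it into or out of the pivot chart."""
    return Counter(tuple(row[f] for f in fields) for row in rows)


four_d = count_by(rows, ["Sex", "Pclass", "Embarked", "Survived"])
three_d = count_by(rows, ["Sex", "Pclass", "Survived"])  # Embarked removed
print(three_d[("male", "3rd", "no")])  # 2
```

Removing Embarked merges groups (the two third-class male rows from different ports collapse into one bar) but never changes the total number of rows; each passenger is still counted exactly once.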
00:10:07
Unfortunately, one of the things with Excel, at least to me anyway, is that the way it structures its visualizations makes them a little unwieldy to look at and to understand what's going on.
00:10:25
Three dimensions isn't too bad, so I'm going to remove Embarked again. This isn't too bad, right? These are three dimensions: we've got males and females here, first, second, and third class, and then, of course, the color coding is our third dimension, Survived. This is a pretty decent visualization. However, if you're looking for more power, this is a little unfortunate, because you want to be able to add more dimensions and still have the resulting visualization really work well for you as a data analyst, so you can just sit back, look at it, and see what's going on. When we add dimensions here (I'll put Embarked back in), it gets more and more complicated.
00:11:07
So let me give you an example of what I'm talking about: something that's a little bit better in terms of using bar charts and that also works with lots of dimensions. Okay, this is a good example. This is a four-dimensional bar chart. You can see here we have males and females, males and females, males and females, males and females; that's our first dimension. We have third-class folks, second-class folks, and first-class folks; that's our second dimension. We've got where folks got on the ship, Embarked, which is our third dimension. And of course the color coding is our fourth dimension. I hope you would agree that this representation of the data is superior to what we saw in Excel. This was created using the R programming language, which allows you to quickly and easily create super awesome data visualizations like this.
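The grid layout gives each combination of two dimensions its own panel, with the remaining dimensions shown as bars and colors inside that panel. The R code behind the chart isn't shown in the video, so this is only a Python sketch, under assumptions, of how the counts could be arranged into such panels; the rows, labels, and the `panel_counts` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy rows (made-up values) using the video's column names.
rows = [
    {"Sex": "male", "Pclass": "3rd", "Embarked": "S", "Survived": "no"},
    {"Sex": "female", "Pclass": "3rd", "Embarked": "S", "Survived": "yes"},
    {"Sex": "female", "Pclass": "1st", "Embarked": "C", "Survived": "yes"},
    {"Sex": "male", "Pclass": "1st", "Embarked": "C", "Survived": "no"},
    {"Sex": "male", "Pclass": "3rd", "Embarked": "S", "Survived": "no"},
]


def panel_counts(rows, row_facet, col_facet, x, fill):
    """Group counts into panels keyed by (row_facet, col_facet); inside each
    panel, count rows per x-axis category and fill (color) category."""
    panels = defaultdict(lambda: defaultdict(int))
    for r in rows:
        panel = (r[row_facet], r[col_facet])
        panels[panel][(r[x], r[fill])] += 1
    return panels


grid = panel_counts(rows, "Pclass", "Embarked", x="Sex", fill="Survived")
for panel in sorted(grid):
    print(panel, dict(grid[panel]))
```

Each panel holds only a few short bars, so adding a fourth dimension adds panels rather than stretching one long row of bars along the x-axis.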
00:12:06
So this is the counts chart; you can see we have passenger count on the y-axis, and you can quickly and easily see what's going on in the data. I would argue this grid representation is a lot more powerful than what Excel does out of the box. I can also show you the proportions; this is the proportions chart. So these are the equivalents of the two charts we saw in Excel, and once again I think the grid works a lot better than the way Excel does it, which just puts all the bars along the x-axis. This is good, powerful stuff, and it really catches your eye. Bar charts like this, especially multi-dimensional bar charts, are awesome; they're one of the best ways to create insightful data visualizations and explore your data.
00:12:52
By the way, just so that you know, I have an online program that I teach which takes your skills as an Excel user and teaches you how to do R programming and create data visualizations just like this. If you're interested in learning more, just click up here; I've got a video on my channel that talks all about it.
00:13:11
Bar charts: totally awesome data visualization. If you're going to be serious about working with your business data, exploring it and understanding what's going on, bar charts need to be a tool in your visualization tool belt, without a doubt.
00:13:28
Next up in the series, as I mentioned earlier, we're going to be talking about scatter plots, a data visualization where you have numeric features on both the x-axis and the y-axis. They're super powerful when you color-code the dots to add a third dimension, and that's exactly what we'll do in the next video. When that's ready, it'll show up either here or here on the video; just click the card and it'll take you to that video.
00:13:56
There you have it: part five of Exploratory Data Analysis with Excel, bar charts. Until next time, please stay healthy, and I wish you very happy data sleuthing.