Exploratory Data Analysis With Excel - Part 5 - Bar Charts

00:14:08
https://www.youtube.com/watch?v=pNN1-IfsOF0

Summary

TLDR: In this video, part five of a series on exploratory data analysis with Excel, the presenter demonstrates how to create bar charts using the Titanic dataset to explore survival patterns among passengers. The video covers the steps to set up a pivot table and then create pivot charts to visualize both counts and proportions of survivors grouped by gender, ticket class, and embarkation port. The presenter emphasizes using both types of bar chart, since proportions alone hide how many rows sit behind each bar. The video also compares Excel's output with a grid-style chart built in R, which the presenter considers easier to read with four dimensions, and previews the next tutorial on scatter plots.
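
As a rough illustration of what the pivot table in the video computes, the sketch below counts rows for each combination of gender, ticket class, and embarkation port, split by survival. It uses only Python's standard library. The field names (`Sex`, `Pclass`, `Embarked`, `Survived`) follow the columns mentioned in the video; the six sample rows and the `pivot_counts` helper are made up for illustration and are not taken from the workbook.

```python
from collections import Counter

# Hypothetical sample rows (not real Titanic records), using the
# column names mentioned in the video.
rows = [
    {"Sex": "female", "Pclass": 1, "Embarked": "C", "Survived": 1},
    {"Sex": "female", "Pclass": 1, "Embarked": "C", "Survived": 1},
    {"Sex": "female", "Pclass": 3, "Embarked": "S", "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Embarked": "S", "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Embarked": "S", "Survived": 0},
    {"Sex": "male", "Pclass": 3, "Embarked": "S", "Survived": 1},
]


def pivot_counts(records, row_fields, column_field):
    """Count records per (row key, column value), similar to a pivot
    table with row_fields in Rows and column_field in Columns/Values."""
    counts = Counter()
    for record in records:
        row_key = tuple(record[field] for field in row_fields)
        counts[(row_key, record[column_field])] += 1
    return counts


counts = pivot_counts(rows, ["Sex", "Pclass", "Embarked"], "Survived")
for (row_key, survived), n in sorted(counts.items()):
    print(row_key, "survived" if survived else "perished", n)
```

Changing the `row_fields` list plays the same role as dragging fields in and out of the Rows area in Excel.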

Takeaways

  • 📊 Use bar charts for categorical data analysis.
  • 🔍 Pivot tables make exploratory analysis easier.
  • 🚢 Titanic dataset reveals survival patterns by class.
  • 🔢 Absolute counts and proportions provide insights.
  • 👩‍👦 Female survivors predominantly from higher classes.
  • 🚫 Avoid relying solely on proportions; counts are crucial.
  • 📉 R programming offers better visualization capabilities.
  • 🎓 Learn to create powerful data visualizations.
  • 🔗 Access workbooks via GitHub for hands-on practice.
  • 🔜 Next video: Understanding scatter plots.
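
The takeaway about not relying solely on proportions can be made concrete with a small sketch. The two groups below are hypothetical (the numbers are chosen only to mirror the video's "100 rows versus a thousand rows" remark), and `survival_summary` is an illustrative helper, not something from the workbook.

```python
def survival_summary(survived, perished):
    """Return (row count, survival proportion) for one group."""
    total = survived + perished
    return total, survived / total


# Hypothetical groups: same survival proportion, very different sizes.
small_group = survival_summary(survived=20, perished=80)
large_group = survival_summary(survived=200, perished=800)

print("small group:", small_group)
print("large group:", large_group)
```

A proportion chart would draw these two groups identically, while a count chart would show that one holds ten times as many rows as the other.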

Timeline

  • 00:00:00 - 00:05:00

    In this video, the focus is on creating bar charts in Excel for exploratory data analysis, specifically with the Titanic dataset. The video begins by explaining why pivot charts are the easiest way to visualize categorical data and how they simplify the search for patterns in survival among Titanic passengers. The presenter builds a pivot table and explains why both counts and proportions are needed: counts reveal which groups of passengers hold the most rows, and those groups typically have the most impact on the analysis.

  • 00:05:00 - 00:14:08

    The second part of the video demonstrates the creation of a stacked column chart for counts and a stacked bar chart for proportions, both showing survival among Titanic passengers. The presenter highlights the importance of examining both the counts of passengers and their survival proportions, and shows how to move fields in and out of the pivot charts for deeper insight. Excel's charts are then contrasted with a grid-style, four-dimensional bar chart created in R, which the presenter argues is easier to read as dimensions are added. The presenter also mentions an online program for Excel users who want to learn R and build similar visualizations.
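
The difference between the count chart and the proportion chart described in the timeline can be sketched as text bars. This is only an illustration of the idea, not how Excel draws its charts; the `stacked_bar` helper, the bar symbols, and the group counts are all hypothetical.

```python
def stacked_bar(perished, survived, width=None):
    """Render one bar as text: '#' for perished, 'o' for survived.

    With width=None each character is one passenger (a count chart).
    With a width, the bar is rescaled to that many characters
    (a proportion chart, where every bar has the same length).
    Assumes the group has at least one row.
    """
    if width is None:
        return "#" * perished + "o" * survived
    total = perished + survived
    survived_chars = round(width * survived / total)
    return "#" * (width - survived_chars) + "o" * survived_chars


# Hypothetical counts per group: (perished, survived).
groups = {"group A": (2, 8), "group B": (30, 10)}

for name, (perished, survived) in groups.items():
    print(name, "counts     ", stacked_bar(perished, survived))
    print(name, "proportions", stacked_bar(perished, survived, width=20))
```

In the count version, bar length shows how many rows a group holds; in the proportion version all bars are equally long, so only the split between the two symbols can be compared.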

Video Q&A

  • What is the focus of part five of the EDA series?

    It focuses on creating and using bar charts in Excel to explore the Titanic dataset.

  • How can I access the workbooks for this series?

    You can download the workbooks from a link in the video description that leads to a GitHub repository.

  • What are the types of data visualizations covered in previous videos?

    Previous videos covered histograms and box plots; the upcoming video will cover scatter plots.

  • Why should I use both counts and proportions in bar charts?

    Counts show where the "gravity" of the data lies, that is, which groups hold the most rows, while proportions show the survival rate within each group. Using both helps identify key patterns.

  • What programming language is suggested for advanced data visualizations?

    The viewer is encouraged to consider R programming for creating more powerful and insightful visualizations.

  • What will the next video cover?

    The next video will cover scatter plots and how to visualize numeric data effectively.
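
The "gravity of the data" point from the Q&A above can be checked with the two figures quoted in the video: 891 total rows and 347 third-class males. The variable names below are illustrative.

```python
# Figures quoted in the video's transcript.
total_rows = 891
third_class_males = 347

share = third_class_males / total_rows
print(f"Third-class males: {share:.1%} of all rows")
print("More than a third:", share > 1 / 3)
```

This matches the presenter's statement that third-class males represent more than a third of the dataset.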

Subtitles
  • 00:00:00
    welcome to part five of my series
  • 00:00:03
    on exploratory data analysis with excel
  • 00:00:06
    the subject of this video
  • 00:00:07
    is going to be creating and using bar
  • 00:00:10
    charts in excel
  • 00:00:11
    to explore a data set so if you've
  • 00:00:14
    reached this video prematurely
  • 00:00:16
    if you need to see video one of the
  • 00:00:18
    series go ahead and click up here
  • 00:00:20
    and you can go ahead and find that video
  • 00:00:21
    right there next up
  • 00:00:23
    if you want the workbooks in this series
  • 00:00:25
    you can get them by
  • 00:00:27
    looking down in the description there'll
  • 00:00:28
    be a link to a github repo
  • 00:00:31
    where you can download all of the excel
  • 00:00:32
    workbooks in this series
  • 00:00:34
    okay let's go ahead and get started okay
  • 00:00:37
    you can see here i'm in excel
  • 00:00:39
    and all i've done is taken one of the
  • 00:00:42
    worksheets here
  • 00:00:43
    copied it and renamed it part 5 bar
  • 00:00:46
    charts and what we can see here is the
  • 00:00:48
    titanic data set
  • 00:00:49
    which you've been using throughout this
  • 00:00:51
    series
  • 00:00:53
    so we're going to create a bar chart and
  • 00:00:55
    the easiest way to create bar charts
  • 00:00:57
    when you're doing exploratory data
  • 00:00:59
    analysis with excel
  • 00:01:01
    is to create a bar chart as a pivot
  • 00:01:04
    chart
  • 00:01:04
    create a bar chart from a pivot table it
  • 00:01:06
    makes things a lot easier
  • 00:01:08
    you can drag and drop all the kinds of
  • 00:01:09
    stuff you can do a lot of exploratory
  • 00:01:11
    analyses very quickly
  • 00:01:13
    using a bar chart created from a pivot
  • 00:01:16
    table so that's what we're going to do
  • 00:01:17
    first up we're going to go ahead
  • 00:01:19
    and insert a pivot table and i'm going
  • 00:01:22
    to put it in the existing worksheet here
  • 00:01:25
    and let's just go ahead and drop it in
  • 00:01:28
    right
  • 00:01:28
    here okay
  • 00:01:31
    boom and we're gonna go ahead and scroll
  • 00:01:34
    over
  • 00:01:35
    here actually i'm gonna go ahead and
  • 00:01:38
    maybe shrink up excel a little bit
  • 00:01:40
    so that my smiling face doesn't cover it
  • 00:01:42
    up so much
  • 00:01:44
    so we've got a bare pivot table it's
  • 00:01:47
    empty
  • 00:01:47
    so one of the first things that we would
  • 00:01:49
    like to explore in this data set
  • 00:01:51
    is obviously survival rates
  • 00:01:54
    because that's that's the business
  • 00:01:56
    question that we're trying to answer
  • 00:01:57
    with this data
  • 00:01:58
    what patterns in the data are highly
  • 00:02:00
    associated with
  • 00:02:02
    passengers on the titanic that survived
  • 00:02:04
    that's what we're looking for
  • 00:02:06
    pivot tables work with categorical data
  • 00:02:09
    they don't work with numeric data we've
  • 00:02:11
    looked at histograms already we've
  • 00:02:12
    looked at
  • 00:02:13
    box and whisker plots box plots in this
  • 00:02:15
    series those are great
  • 00:02:17
    for working with numeric data the next
  • 00:02:19
    video in this series will be working
  • 00:02:21
    with scatter plots which is also for
  • 00:02:23
    working with numeric data
  • 00:02:24
    but bar charts are really about
  • 00:02:26
    categories
  • 00:02:27
    and counting things so we've got
  • 00:02:30
    four columns of data in the data set
  • 00:02:33
    already that
  • 00:02:34
    are categorical we've got survived
  • 00:02:37
    p-class right which class of ticket you
  • 00:02:39
    have on the titanic
  • 00:02:41
    your gender male or female as evidenced
  • 00:02:44
    in the data by the sex
  • 00:02:45
    column the sex feature and lastly we
  • 00:02:47
    have embarked
  • 00:02:48
    of what port did you get on the titanic
  • 00:02:51
    or did the passenger get on the titanic
  • 00:02:53
    so let's create a bar chart with that
  • 00:02:55
    stuff
  • 00:02:56
    so first up what we'll do is we'll drag
  • 00:02:59
    sex down to the rows here
  • 00:03:02
    and then we'll drag down new p class to
  • 00:03:05
    create a hierarchy here and then lastly
  • 00:03:07
    we will drag down
  • 00:03:08
    embarked and then we get a nice little
  • 00:03:12
    pivot table let me
  • 00:03:12
    hide the ribbon there so you can see it
  • 00:03:14
    all right
  • 00:03:16
    and now we'll throw in survived because
  • 00:03:19
    that's what we're interested in right we
  • 00:03:20
    want to know
  • 00:03:21
    if there are any patterns in these
  • 00:03:23
    characteristics of the data
  • 00:03:24
    that are highly associated with survival
  • 00:03:26
    so we'll go ahead and drag
  • 00:03:28
    new survived to the columns and then
  • 00:03:31
    we'll drag it down to the values as well
  • 00:03:34
    i'm going to close some of this up real
  • 00:03:35
    quick so close up first and second and
  • 00:03:38
    third class
  • 00:03:39
    because the first thing i want to show
  • 00:03:42
    you
  • 00:03:43
    is when you use bar charts you should
  • 00:03:46
    use
  • 00:03:46
    two forms of a bar chart and we'll use
  • 00:03:49
    both in this video
  • 00:03:50
    one you want a bar chart that shows you
  • 00:03:53
    the absolute
  • 00:03:53
    counts and then you want a bar chart
  • 00:03:56
    that shows you the proportions
  • 00:03:58
    and here's the reason why if you only
  • 00:04:00
    work with bar charts that show
  • 00:04:01
    proportions which are very cool and very
  • 00:04:04
    useful
  • 00:04:05
    they mask where the gravity of the data
  • 00:04:09
    is located
  • 00:04:10
    when what i mean by that is where are
  • 00:04:12
    the most number
  • 00:04:14
    of individual rows of data located
  • 00:04:16
    proportions
  • 00:04:17
    smooth all that out you don't know if
  • 00:04:20
    this bar chart proportion
  • 00:04:22
    is for 100 records or 100 rows of data
  • 00:04:24
    and this proportion is for a thousand
  • 00:04:26
    rows
  • 00:04:27
    but generally speaking when you're doing
  • 00:04:28
    business analysis when you're doing
  • 00:04:30
    exploratory analysis with business data
  • 00:04:32
    you tend to want to focus on groups
  • 00:04:34
    chunks of data that have the most rows
  • 00:04:36
    because those typically have the most
  • 00:04:37
    impact
  • 00:04:38
    so that's why you want both kinds of bar
  • 00:04:39
    charts one with counts
  • 00:04:41
    and one with proportions and you can see
  • 00:04:44
    that here in the quick pivot table that
  • 00:04:46
    i've got
  • 00:04:47
    so you can see here we have 891
  • 00:04:50
    total rows of data that's what we have
  • 00:04:53
    but notice that more than a third of all
  • 00:04:57
    of the rows of data in the data set
  • 00:04:59
    347 of them to be exact are males
  • 00:05:03
    in third class so what that tells you is
  • 00:05:06
    that
  • 00:05:07
    from a center of gravity perspective
  • 00:05:09
    third class males are extremely
  • 00:05:11
    important because they represent
  • 00:05:12
    more than a third of all the data right
  • 00:05:14
    so that
  • 00:05:15
    you would lose that if you just looked
  • 00:05:16
    at a bar chart that simply had
  • 00:05:18
    proportions on it
  • 00:05:19
    okay enough about that so let's go ahead
  • 00:05:21
    and expand this back
  • 00:05:23
    out and we're going to go ahead and
  • 00:05:26
    create
  • 00:05:26
    a pivot chart and since we don't need to
  • 00:05:28
    see this anymore i'm just going to go
  • 00:05:29
    ahead and maximize excel again
  • 00:05:32
    and let's go ahead and insert a pivot
  • 00:05:35
    chart a bar chart
  • 00:05:36
    created from this pivot table so we're
  • 00:05:37
    going to go ahead and insert
  • 00:05:39
    and we're going to go up to pivot chart
  • 00:05:42
    and
  • 00:05:43
    select pivot chart so the first thing
  • 00:05:45
    that we want is we want
  • 00:05:47
    counts so the easiest way to do that is
  • 00:05:49
    to go with a stacked
  • 00:05:51
    column okay so we're going to go with
  • 00:05:53
    stacked column here
  • 00:05:54
    click ok and we get a nice
  • 00:05:58
    bar chart i'm going to go ahead and
  • 00:06:00
    move this down so we get some more real
  • 00:06:01
    estate here it's going to be pretty cool
  • 00:06:05
    move it down all right i'll make this
  • 00:06:08
    bigger so we can see it
  • 00:06:10
    awesome so i'm just going to get rid of
  • 00:06:11
    this because i think it just you know
  • 00:06:14
    just makes things a little more
  • 00:06:15
    complicated okay awesome so we've got a
  • 00:06:18
    bar chart here
  • 00:06:18
    and what the bar chart is showing us is
  • 00:06:21
    counts so for example this is females in
  • 00:06:24
    first class
  • 00:06:26
    first class that got on the ship in
  • 00:06:28
    cherbourg
  • 00:06:29
    in france and we can see here they
  • 00:06:32
    basically
  • 00:06:32
    looks like all of them except for maybe
  • 00:06:34
    one survived and it was around maybe
  • 00:06:37
    40-something people 40-something
  • 00:06:40
    passengers that fall
  • 00:06:41
    in this particular category and just
  • 00:06:43
    generally speaking we're going to be
  • 00:06:44
    looking at
  • 00:06:45
    two things in this particular
  • 00:06:47
    visualization one
  • 00:06:48
    we're going to be looking at the
  • 00:06:49
    relative proportion of the colored bars
  • 00:06:52
    right because orange means that they
  • 00:06:53
    survived and blue means that they
  • 00:06:55
    perished
  • 00:06:56
    so that gives us some general indication
  • 00:06:58
    of the survival rates
  • 00:07:00
    and we're also going to look at the
  • 00:07:02
    length of the bar because that tells us
  • 00:07:03
    how many
  • 00:07:04
    observations how many rows of data where
  • 00:07:06
    the center of gravity is and as
  • 00:07:08
    not surprisingly we can see that males
  • 00:07:11
    in third class they got on in
  • 00:07:13
    um southampton look at that more than
  • 00:07:16
    250 of them and very few survived
  • 00:07:18
    so this visualization right here tells
  • 00:07:20
    us a lot
  • 00:07:22
    it says okay look we got a lot of males
  • 00:07:25
    in third class
  • 00:07:26
    and they don't survive and it doesn't
  • 00:07:28
    really seem like proportion wise
  • 00:07:31
    any particular place where a third class
  • 00:07:34
    male passenger got on the titanic
  • 00:07:36
    matters because there are the orange
  • 00:07:39
    portions of each of these bars is very
  • 00:07:41
    very thin
  • 00:07:41
    it's very skinny you say okay cool
  • 00:07:45
    we already kind of know that females in
  • 00:07:47
    first and second class overwhelmingly
  • 00:07:49
    survive
  • 00:07:50
    no matter where they got on the ship it
  • 00:07:51
    looks like third class females
  • 00:07:54
    okay it's kind of interesting you see
  • 00:07:56
    that cherbourg and uh
  • 00:07:58
    queenstown i believe that this is they
  • 00:07:59
    seem to have much better proportions
  • 00:08:02
    than those that got on in cherbourg
  • 00:08:05
    excuse me southampton excuse me
  • 00:08:07
    southampton s stands for southampton
  • 00:08:11
    and you can see here a lot going on this
  • 00:08:14
    is a great
  • 00:08:14
    data visualization now what really will
  • 00:08:17
    make this pop in terms of
  • 00:08:18
    proportions actually answering the
  • 00:08:20
    questions of like which
  • 00:08:22
    which segment of the data overwhelmingly
  • 00:08:24
    like will just jump out to your
  • 00:08:26
    eye that they survived is creating a
  • 00:08:28
    proportions chart
  • 00:08:30
    um which is known in excel as a
  • 00:08:32
    stacked bar
  • 00:08:33
    chart so once again we'll just go up to
  • 00:08:35
    here we'll click insert
  • 00:08:37
    we're going to do a pivot chart and
  • 00:08:39
    we're gonna do stack this time
  • 00:08:43
    boom and you can see already even
  • 00:08:46
    without me
  • 00:08:47
    increasing the size of the chart that
  • 00:08:50
    obviously the orange dominates over here
  • 00:08:53
    which is all
  • 00:08:54
    females and especially females in first
  • 00:08:55
    and second class so i'll resize all this
  • 00:08:57
    stuff
  • 00:08:58
    just to have more real estate
  • 00:09:01
    and now we can see the proportions
  • 00:09:05
    this is really cool right and what we
  • 00:09:06
    can do now that we've got these pivot
  • 00:09:08
    charts is we can obviously take things
  • 00:09:09
    in and out so like for example we can
  • 00:09:11
    remove embarked
  • 00:09:13
    and we can just look at first class and
  • 00:09:14
    second class third class males
  • 00:09:16
    or we can put embarked back in
  • 00:09:20
    and then get rid of p class
  • 00:09:23
    see oh yeah look at that right so you do
  • 00:09:26
    these kinds of things are pretty useful
  • 00:09:28
    right creating pivot charts and being
  • 00:09:30
    able to quickly and easily
  • 00:09:31
    move data in and out is one of the
  • 00:09:33
    hallmarks of doing exploratory data
  • 00:09:35
    analysis in excel right
  • 00:09:36
    especially with pivot charts this is
  • 00:09:37
    wildly awesome stuff so let's put a new
  • 00:09:40
    p
  • 00:09:40
    class back in now this is a great data
  • 00:09:43
    visualization
  • 00:09:45
    because it incorporates four dimensions
  • 00:09:48
    at the same time
  • 00:09:49
    we've got our survival right orange or
  • 00:09:51
    blue that's one dimension
  • 00:09:52
    we have male versus female that's our
  • 00:09:55
    second dimension
  • 00:09:56
    we've got p-class first second or third
  • 00:09:58
    that's our third dimension and lastly we
  • 00:10:00
    have embarked where you got on the ship
  • 00:10:02
    that's fourth it's four dimensions so
  • 00:10:04
    this is a pretty powerful visualization
  • 00:10:07
    unfortunately one of the things with
  • 00:10:11
    excel and the way it chooses to do
  • 00:10:12
    visualizations
  • 00:10:14
    is at least at least to me anyway is
  • 00:10:17
    that
  • 00:10:17
    they get a little unwieldy to look at
  • 00:10:20
    and understand what's going on because
  • 00:10:22
    of the way they structure the actual
  • 00:10:23
    visualization
  • 00:10:25
    so three dimensions isn't too bad so i'm
  • 00:10:27
    going to remove embarked again
  • 00:10:29
    this isn't too bad right this is three
  • 00:10:31
    dimensions we got males here
  • 00:10:32
    females first second and third and then
  • 00:10:34
    the color coding of course is our
  • 00:10:36
    third dimension of survived this is a
  • 00:10:38
    pretty
  • 00:10:40
    decent visualization however
  • 00:10:43
    if you're looking for more power
  • 00:10:46
    this is a little bit unfortunate because
  • 00:10:47
    you want to be able to add more
  • 00:10:48
    dimensions and then but still have the
  • 00:10:50
    actual resulting visualization
  • 00:10:53
    really work well for you as a data
  • 00:10:56
    analyst so you can just kind of look and
  • 00:10:58
    just kind of sit back
  • 00:10:59
    and you just kind of look at it and see
  • 00:11:01
    you know what's going on
  • 00:11:03
    and when we add dimensions here i put
  • 00:11:05
    embarked back in it gets more and more
  • 00:11:06
    complicated
  • 00:11:07
    so let me give you an example of what
  • 00:11:08
    i'm talking about regarding
  • 00:11:11
    something that's a little bit better in
  • 00:11:13
    terms of using bar charts
  • 00:11:14
    it also works with lots of dimensions
  • 00:11:17
    okay so this is a good example of what
  • 00:11:19
    i'm talking about here so this
  • 00:11:21
    is a four dimensional bar chart
  • 00:11:24
    you can see here we have males and
  • 00:11:27
    females
  • 00:11:28
    males and females males and females
  • 00:11:30
    males and females
  • 00:11:31
    that's our first dimension we have third
  • 00:11:34
    class folks
  • 00:11:35
    we have second class folks and we got
  • 00:11:38
    first class folks
  • 00:11:39
    so that's our second dimension we've got
  • 00:11:41
    where
  • 00:11:42
    folks got on the ship this is embarked
  • 00:11:44
    so it's our third dimension and then of
  • 00:11:45
    course our color coding
  • 00:11:47
    is our fourth dimension and i hope you
  • 00:11:49
    would agree that this
  • 00:11:50
    representation of the data is superior
  • 00:11:54
    to what you would get in excel what we
  • 00:11:56
    saw in excel
  • 00:11:57
    this was created using the r programming
  • 00:11:59
    language
  • 00:12:01
    and it allows you to quickly and easily
  • 00:12:02
    create super awesome data visualizations
  • 00:12:05
    like this
  • 00:12:06
    so this is the counts you can see here
  • 00:12:07
    we have passenger count on the
  • 00:12:09
    y-axis here and you can quickly and
  • 00:12:12
    easily see
  • 00:12:12
    what's going on in the data it's a
  • 00:12:14
    little bit i would argue this grid
  • 00:12:16
    representation is a lot more
  • 00:12:19
    powerful than what excel does out of the
  • 00:12:20
    box now i can also show you the
  • 00:12:22
    proportions here so this is the
  • 00:12:24
    proportion
  • 00:12:25
    chart so these are the two equivalent
  • 00:12:26
    charts that we saw in excel once again
  • 00:12:28
    the grid
  • 00:12:29
    i think works a lot better than the way
  • 00:12:32
    excel does it in terms of just like
  • 00:12:34
    putting all the bars
  • 00:12:35
    along the x-axis so this
  • 00:12:39
    is good stuff powerful stuff and it
  • 00:12:41
    really catches your eye so bar charts
  • 00:12:42
    like this especially multi-dimensional
  • 00:12:44
    bar charts
  • 00:12:45
    are awesome they're one of the best ways
  • 00:12:47
    to create
  • 00:12:48
    insightful data visualizations and
  • 00:12:50
    explore your
  • 00:12:52
    data by the way just so that you know
  • 00:12:55
    i have an online program that i teach
  • 00:12:58
    which takes
  • 00:12:59
    your skills as an excel user and teaches
  • 00:13:02
    you how to do r programming and create
  • 00:13:04
    data visualizations just like this
  • 00:13:06
    and if you're interested in learning
  • 00:13:07
    more about that you just go ahead and
  • 00:13:08
    click up here and i've got a video on my
  • 00:13:10
    channel that will talk
  • 00:13:11
    that talks all about that bar charts
  • 00:13:15
    totally awesome data visualization if
  • 00:13:18
    you're going to be serious about working
  • 00:13:19
    with your business data
  • 00:13:21
    exploring it understanding what's going
  • 00:13:22
    on bar charts
  • 00:13:24
    needs to be a tool in your visualization
  • 00:13:27
    tool belt without a doubt
  • 00:13:28
    so next up in the series as i mentioned
  • 00:13:30
    earlier we're going to be talking about
  • 00:13:31
    scatter plots
  • 00:13:32
    which are a data visualization where you
  • 00:13:36
    have numeric
  • 00:13:36
    features on both the x-axis and on the
  • 00:13:39
    y-axis
  • 00:13:40
    and then they're super powerful when you
  • 00:13:42
    color code the dots you add a third
  • 00:13:44
    dimension to them and that's exactly
  • 00:13:45
    what we will do
  • 00:13:46
    in the next video and when that's ready
  • 00:13:48
    that'll show up either here or here
  • 00:13:51
    on the video here and you just click the
  • 00:13:53
    card and it'll take you to that video
  • 00:13:55
    when it's ready
  • 00:13:56
    there you have it part five of
  • 00:13:58
    exploratory data analysis with excel
  • 00:14:00
    bar charts until next time please stay
  • 00:14:03
    healthy
  • 00:14:04
    and i wish you very happy data sleuthing
Tags
  • Exploratory Data Analysis
  • Excel
  • Bar Charts
  • Titanic Dataset
  • Data Visualization
  • Pivot Table
  • Statistics
  • R Programming
  • Scatter Plots
  • Survival Rates