Python Pandas Tutorial (Part 1): Getting Started with Data Analysis - Installation and Loading Data

00:23:01
https://www.youtube.com/watch?v=ZyhVh-qRZPA

Summary

TL;DR: This video opens a series on using the pandas library in Python for data analysis. Pandas makes it easy to read and analyze data from file types such as CSV and Excel. The tutorial starts by installing pandas and Jupyter with pip, optionally inside a virtual environment, then sets up a project folder and launches a Jupyter Notebook for interactive work. It uses real-world data from the Stack Overflow Developer Survey for its examples, loads the CSV into a DataFrame, and inspects its structure. It also shows how to adjust display settings so that all columns and rows can be viewed in the notebook. This lays the foundation for the data manipulation operations covered in later videos.

Key takeaways

  • 📚 Pandas is essential for data analysis in Python, especially for data science.
  • 🔧 Installation of pandas can be done using pip in a virtual environment or directly in your system.
  • 📊 Jupyter notebooks are a convenient tool for data visualization when using pandas.
  • 📂 Pandas can easily handle CSV and Excel files for data analysis.
  • 🔍 Understanding and setting up the data viewing options in Jupyter can help manage large datasets effectively.
  • 🗃️ The tutorial uses real-world data from Stack Overflow Developer Survey for demonstration.
  • 👨‍💻 Importing data into pandas is straightforward with 'pd.read_csv()'.
  • 🔍 Viewing dataframes in Jupyter offers interactive inspection through features like 'df.head()' and 'df.info()'.
  • 🛠️ Adjusting settings in Jupyter allows full inspection of dataframe columns and rows.
  • 📈 The video sets the stage for more advanced data manipulation techniques in pandas.
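
As a rough illustration of the loading step mentioned in the takeaways, here is a minimal sketch. It assumes a tiny in-memory CSV as a stand-in for the survey file; the sample rows and the variable names `csv_text` and `first_two` are hypothetical and not taken from the video.

```python
import io

import pandas as pd

# Hypothetical stand-in for the survey CSV; the real file is far larger.
csv_text = """Respondent,Hobbyist,Country
1,Yes,United Kingdom
2,No,Bosnia and Herzegovina
3,Yes,Thailand
"""

# read_csv accepts a file path or any file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# head(n) returns the first n rows (5 when n is omitted).
first_two = df.head(2)
print(first_two)
```

With the real download, the same call would take a path instead, for example `pd.read_csv('data/survey_results_public.csv')` if the project folder is laid out as in the video.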

Timeline

  • 00:00:00 - 00:05:00

    In this video series, we will be learning how to use the pandas library in Python, a crucial tool for data analysis. Pandas is highly popular due to its abilities to easily read and work with data in formats like CSV and Excel, and to perform data analysis efficiently. We will start by installing pandas, downloading the relevant dataset, and setting up a Jupyter notebook environment for coding and analysis. The video also mentions a sponsor, brilliant.org, urging viewers to check them out.

  • 00:05:00 - 00:10:00

    To begin setting up, we install pandas and Jupyter using pip commands. The speaker opts to use Jupyter notebooks despite some initial hesitation. The notebooks provide an advantageous interface to visualize data within the browser, which is useful when working with pandas. A project folder is created on the desktop, and the Stack Overflow developer survey data is downloaded and saved within it. This real-world data is chosen for its relatability and ability to maintain interest in the tutorial scenarios.

  • 00:10:00 - 00:15:00

    Files are organized within the project directory, and important files are identified such as the survey results CSV and its schema. The setup process is continued by navigating to the project directory via the terminal and initiating a Jupyter server, which runs locally in a browser. A new Jupyter notebook is created, and pandas is imported to begin data manipulation. The basics of loading and inspecting data with pandas are demonstrated, focusing on CSV file reading and initial data exploration.

  • 00:15:00 - 00:23:01

    Key methods for data exploration in pandas are showcased. The video demonstrates how to load data into a DataFrame and explores its contents using the `df.shape` attribute and the `df.info()` method to examine data structure and types. Adjustments are made to display settings so that all columns and pertinent rows are viewable in Jupyter. The video introduces methods like `df.head()` and `df.tail()` for previewing data subsets. There's a brief mention of the sponsor, highlighting courses that supplement learning.
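
The inspection steps in the last timeline entry can be sketched roughly as follows. This assumes a small synthetic DataFrame rather than the survey data (which the video reports as 88,883 rows by 85 columns); the column values are made up for illustration.

```python
import pandas as pd

# Synthetic stand-in: 100 rows and 3 columns of made-up values.
df = pd.DataFrame({
    "Respondent": range(1, 101),
    "Hobbyist": ["Yes", "No"] * 50,
    "Age": [18.0 + i * 0.5 for i in range(100)],
})

# shape is an attribute (no parentheses): a (rows, columns) tuple.
rows, cols = df.shape

# info() is a method: it prints the row count, column names, and dtypes.
df.info()

# head and tail default to 5 rows; pass a number to change that.
first_ten = df.head(10)
last_ten = df.tail(10)
```

The split between `shape` (attribute) and `info()` (method) is the detail the video calls out, since adding parentheses to `shape` raises an error.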

Frequently asked questions

  • What is the pandas library in Python?

    Pandas is a data analysis library in Python that allows users to easily read and manipulate different types of data.

  • How can I install pandas?

    You can install pandas using the command 'pip install pandas' in your terminal.

  • Why use Jupyter notebooks for working with pandas?

    Jupyter notebooks allow visualization of data directly in the browser, which helps in better understanding and analysis of the data.

  • What type of files can pandas read?

    Pandas can read various file types, including CSV and Excel files.

  • What is the first step to start using pandas?

    The first step is to install pandas and import it in your Python script using 'import pandas as pd'.

  • How can you view all columns of a data frame in Jupyter notebooks?

    You can set the option with `pd.set_option('display.max_columns', 85)` to display all 85 columns of this dataset.

  • What dataset is used in this video series for analysis?

    The dataset used is the Stack Overflow Developer Survey data.

  • How can you see the shape of a data frame in pandas?

    You can use the 'shape' attribute like 'df.shape' to see the rows and columns of a data frame.

  • What method can be used to view the first few rows of a data frame?

    You can use the 'head()' method to view the first few rows of a data frame.

  • How can I learn more about using Jupyter notebooks?

    The creator of the video suggests a tutorial is available with more detailed instructions on using Jupyter notebooks.
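
The display-option answer above can be tried out with a short sketch like the one below. It assumes a synthetic 85-column DataFrame with made-up column names (`col_0` to `col_84`) in place of the survey data; `wide_df`, `max_cols`, and `max_rows` are hypothetical names.

```python
import pandas as pd

# Synthetic wide frame: 3 rows, 85 columns with made-up names.
wide_df = pd.DataFrame({f"col_{i}": [0, 1, 2] for i in range(85)})

# Raise both limits so up to 85 columns and 85 rows are shown in full.
pd.set_option("display.max_columns", 85)
pd.set_option("display.max_rows", 85)

# Read the settings back to confirm they took effect.
max_cols = pd.get_option("display.max_columns")
max_rows = pd.get_option("display.max_rows")
print(max_cols, max_rows)

# Restore the library defaults once the wide view is no longer needed.
pd.reset_option("display.max_columns")
pd.reset_option("display.max_rows")
```

These options affect only how pandas renders a DataFrame; the underlying data is unchanged either way.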

Subtitles
en
  • 00:00:00
    hey there how's it going everybody in
  • 00:00:01
    this series of videos we're going to be
  • 00:00:03
    learning how to use the pandas library
  • 00:00:04
    in Python so pandas is a data analysis
  • 00:00:07
    library that allows us to easily read in
  • 00:00:09
    and work with different types of data so
  • 00:00:12
    we can use this to analyze CSV files
  • 00:00:14
    Excel files and other similar formats so
  • 00:00:17
    if you're getting into the data science
  • 00:00:18
    field then this library is going to be
  • 00:00:20
    essential to learn it's one of the most
  • 00:00:22
    downloaded packages for Python and
  • 00:00:24
    that's for a great reason so not only
  • 00:00:26
    does it allow us to easily read in and
  • 00:00:28
    analyze data but it also has great
  • 00:00:30
    performance since it's built on top of
  • 00:00:32
    NumPy and we'll be learning how to do
  • 00:00:34
    different types of
  • 00:00:36
    data analysis in this series so in this
  • 00:00:38
    video we're going to be going over how
  • 00:00:40
    to get pandas installed how to download
  • 00:00:42
    the data that I'll be using for most of
  • 00:00:44
    this series and also how to get all of
  • 00:00:47
    this open in a Jupyter notebook so that
  • 00:00:49
    we're ready to do some coding and
  • 00:00:50
    analysis now I'd also like to mention
  • 00:00:52
    that we do have a sponsor for the series
  • 00:00:54
    of videos and that is brilliant.org so I
  • 00:00:57
    really want to thank brilliant for
  • 00:00:58
    sponsoring this series and it would be
  • 00:01:00
    great if you all can check them out
  • 00:01:01
    using the link in the description
  • 00:01:02
    section below and support the sponsors
  • 00:01:04
    and I'll talk more about their services
  • 00:01:06
    in just a bit so with that said let's go
  • 00:01:08
    ahead and get started so first of all
  • 00:01:10
    let's install pandas so I'm using a
  • 00:01:13
    clean virtual environment for this
  • 00:01:14
    series but you don't have to use a
  • 00:01:16
    virtual environment if you don't want to
  • 00:01:17
    if you don't know what a virtual
  • 00:01:19
    environment is and would like to learn
  • 00:01:21
    more about those then I'll be sure to
  • 00:01:23
    leave a link to my video on that topic
  • 00:01:25
    in the description section below if
  • 00:01:27
    anyone is interested so it's really easy
  • 00:01:30
    to install pandas here all we need to do
  • 00:01:32
    is say pip install pandas and we will
  • 00:01:37
    let this run through and once we have
  • 00:01:40
    pandas installed then let's also install
  • 00:01:43
    Jupyter so that we can use Jupyter
  • 00:01:45
    notebooks now I was a bit hesitant to
  • 00:01:48
    use Jupyter for this series because some
  • 00:01:50
    people find it difficult to get the hang
  • 00:01:52
    of but honestly if you're going to be
  • 00:01:54
    doing a lot of work with pandas then
  • 00:01:56
    it's definitely a nice tool to use for
  • 00:01:58
    this so now it's not necessary so you
  • 00:02:01
    should be able to follow along with this
  • 00:02:02
    series just fine if you're using a
  • 00:02:04
    regular editor but Jupyter notebooks
  • 00:02:06
    allow us to actually see our data more
  • 00:02:09
    easily by using the browser to print out
  • 00:02:11
    our data in tables that make it
  • 00:02:13
    easier to visualize so I'm gonna use it in
  • 00:02:16
    the series but you don't have to in
  • 00:02:18
    order to follow along so to install
  • 00:02:20
    Jupyter I want to say pip install and
  • 00:02:24
    this is going to be jupyterlab and this
  • 00:02:28
    is spelled j-u-p-y-t-e-r-l-a-b, jupyterlab, so
  • 00:02:34
    we'll get that installed now I'm not
  • 00:02:36
    going to go into a deep dive on how to
  • 00:02:38
    use Jupyter in this series I'm mainly
  • 00:02:40
    going to focus on pandas but if you'd
  • 00:02:42
    like a detailed overview of how to use
  • 00:02:44
    Jupyter then I do have a video on how to
  • 00:02:46
    use Jupyter in depth and I'll leave a
  • 00:02:48
    link to that video in the description
  • 00:02:49
    section below if anyone would like to
  • 00:02:52
    learn more about the details of using
  • 00:02:54
    that ok so now we have pandas and
  • 00:02:56
    Jupyter notebooks installed now we're
  • 00:02:58
    going to need to download the data that
  • 00:03:00
    I'll be using for most of this series
  • 00:03:02
    now for anyone who's been watching my
  • 00:03:04
    latest videos you know that I like to
  • 00:03:06
    use the Stack Overflow Developer Survey
  • 00:03:08
    for different kinds of data analysis now
  • 00:03:10
    the reason that I like to use this data
  • 00:03:12
    is because it's real world data and it
  • 00:03:15
    has a lot of data in there that I think
  • 00:03:16
    would be interesting to most people who
  • 00:03:18
    are watching these types of videos I've
  • 00:03:20
    seen some other tutorials where the data
  • 00:03:22
    just seems kind of unrealistic and not
  • 00:03:24
    very relatable
  • 00:03:26
    so hopefully using this data will keep
  • 00:03:28
    people interested and also give you a
  • 00:03:30
    good idea of what it's like to actually
  • 00:03:32
    download real data from a
  • 00:03:35
    source and start analyzing it with
  • 00:03:37
    pandas so to download this data I have
  • 00:03:40
    this pulled up here in the browser we
  • 00:03:42
    can go over to the Stack Overflow survey
  • 00:03:45
    results page now this is easy to find if
  • 00:03:47
    you just google it but just to keep
  • 00:03:49
    things easy I'll have a link to this
  • 00:03:51
    download page in the description section
  • 00:03:53
    as well ok now on this page you can
  • 00:03:57
    download the data in CSV form for any
  • 00:04:00
    year that they have available and now
  • 00:04:02
    I'm going to go ahead and download the
  • 00:04:04
    2019 data which is the top data here so
  • 00:04:08
    I'm going to download this CSV here and
  • 00:04:12
    then we'll click on download again and
  • 00:04:15
    this should go ahead and download this
  • 00:04:18
    for us ok it did and now I'm going to
  • 00:04:22
    open this in my finder here and I'm
  • 00:04:25
    going to unzip this data it comes
  • 00:04:27
    zipped and once that data is
  • 00:04:29
    downloaded and unzipped I'm going to go
  • 00:04:32
    ahead and drag that folder to a folder
  • 00:04:34
    here on my desktop and that's where
  • 00:04:37
    we'll also create a notebook and analyze
  • 00:04:39
    this data so real quick I don't have
  • 00:04:42
    this open let me open up this pandas
  • 00:04:48
    demo folder and this will open this in
  • 00:04:51
    Finder and now I will take the data
  • 00:04:54
    and drag this into this pandas demo
  • 00:04:56
    folder that is on my desktop so your
  • 00:04:59
    projects can be anywhere but I just had
  • 00:05:02
    I just created a project folder here on
  • 00:05:05
    my desktop called pandas demo and it's
  • 00:05:07
    completely empty except for the data
  • 00:05:09
    that we just dragged in here so now I'm
  • 00:05:12
    going to rename this since this is kind
  • 00:05:14
    of a long name here I'm just going to
  • 00:05:16
    rename this to data that was named
  • 00:05:19
    developer survey 2019 but I'm just gonna
  • 00:05:21
    call that data so that it's easy for us
  • 00:05:23
    to find that within our script okay so
  • 00:05:26
    what files do we have here in the
  • 00:05:28
    directory that we unzipped in this data
  • 00:05:30
    directory let me make this a little
  • 00:05:32
    larger here okay so first of all if you
  • 00:05:36
    download data that comes with a readme
  • 00:05:39
    then this is usually helpful we have a
  • 00:05:41
    readme file right here it tells you what
  • 00:05:43
    these other files are going to be so in
  • 00:05:46
    this case we have this survey results
  • 00:05:48
    public dot CSV and that contains the
  • 00:05:51
    main survey results one respondent per
  • 00:05:54
    row and one column per answer and the
  • 00:05:57
    survey results schema here has the
  • 00:06:00
    questions that correspond to each column
  • 00:06:02
    name and the results now if any of this
  • 00:06:05
    doesn't make sense now then it will
  • 00:06:07
    once we open up this data in Jupyter so
  • 00:06:10
    I'm just giving a broad overview here
  • 00:06:12
    don't let this overwhelm you by
  • 00:06:15
    everything that I'm saying here this
  • 00:06:17
    will make a lot more sense once we open
  • 00:06:18
    this up in Jupyter so let's go ahead and
  • 00:06:21
    do that so to open this in a Jupyter
  • 00:06:23
    notebook I'm going to go back to my
  • 00:06:26
    terminal so I'm going to go ahead and
  • 00:06:27
    close these Finder windows open here go
  • 00:06:30
    back to my terminal and now within here
  • 00:06:33
    I'm going to navigate to my folder where
  • 00:06:35
    I place that data and this should be the
  • 00:06:38
    same command on Mac
  • 00:06:39
    and Windows so I'm gonna say cd and I'm
  • 00:06:43
    gonna go to my desktop this is going to
  • 00:06:45
    be wherever your project directory is
  • 00:06:47
    but mine is in this pandas demo on my
  • 00:06:50
    desktop and once I am navigated to that
  • 00:06:53
    directory to start up a Jupyter notebook
  • 00:06:55
    we just need to say jupyter notebook and
  • 00:06:59
    run that and we should see a server
  • 00:07:02
    start up here
  • 00:07:03
    and it seems like it's taking a second
  • 00:07:05
    ok there we go
  • 00:07:06
    now back in our terminal here this will
  • 00:07:10
    run a Jupyter server and you will need
  • 00:07:13
    to leave that terminal open while you're
  • 00:07:15
    working in Jupyter so Jupyter runs
  • 00:07:18
    in the browser so if you shut down this
  • 00:07:20
    server then you won't be able to access
  • 00:07:22
    our notebook okay so let's go back here
  • 00:07:27
    to the browser and this is where we have
  • 00:07:30
    our Jupyter notebooks so let me zoom in
  • 00:07:32
    here so that we can so that everybody
  • 00:07:34
    can read this fairly well okay I'll zoom
  • 00:07:38
    in to about right there I think is good
  • 00:07:39
    okay so we can see our data folder here
  • 00:07:42
    that we downloaded and placed in our
  • 00:07:44
    pandas demo folder a little bit ago but
  • 00:07:47
    now let's create a new notebook so to
  • 00:07:50
    create a new notebook I'm going to click
  • 00:07:51
    on new up here at the top right and then
  • 00:07:54
    I'm going to use Python 3 and now we can
  • 00:07:59
    name our notebook so up here where it
  • 00:08:01
    says untitled I'm going to click here
  • 00:08:03
    and I'm just going to call this pandas
  • 00:08:06
    demo and rename that ok so now we're
  • 00:08:09
    ready to start using pandas so we can
  • 00:08:12
    import this by saying import pandas as
  • 00:08:16
    pd now importing pandas as pd is just a
  • 00:08:21
    common convention when using pandas so
  • 00:08:23
    let's run that and I ran that cell by
  • 00:08:27
    pressing Shift + Enter and again I'm not
  • 00:08:30
    going to go into the specifics of
  • 00:08:31
    working here within Jupyter in this
  • 00:08:33
    series but if you'd like a rundown of
  • 00:08:35
    the features and shortcuts that I'll be
  • 00:08:37
    using then I do have a link to my
  • 00:08:39
    Jupyter video in the description section
  • 00:08:41
    below ok so for the rest of this video
  • 00:08:43
    we'll see how to load in our data and
  • 00:08:46
    look at some information about that data
  • 00:08:48
    so our data is in a CSV format so in
  • 00:08:53
    order to read
  • 00:08:53
    in that CSV we can simply say df which
  • 00:08:57
    is going to stand for data frame we'll
  • 00:08:59
    learn all about data frames here
  • 00:09:00
    in a bit we're going to say df is equal
  • 00:09:02
    to pd dot read underscore csv we're
  • 00:09:07
    going to use the read CSV method from
  • 00:09:10
    pandas here and now we just want to pass
  • 00:09:13
    in a path to our CSV file now mine was
  • 00:09:16
    within that data folder and that was
  • 00:09:19
    within the file survey underscore
  • 00:09:22
    results underscore public dot CSV so
  • 00:09:26
    now if I hit shift enter then that will
  • 00:09:30
    run that cell so right off the bat we
  • 00:09:33
    can see that this is pretty simple to
  • 00:09:34
    work with so when using native Python in
  • 00:09:37
    order to read in a CSV file we need to
  • 00:09:40
    use the CSV module to create a CSV
  • 00:09:42
    reader and things like that but here
  • 00:09:45
    we're just doing this all in one line so
  • 00:09:48
    when it reads this in it's going to read
  • 00:09:50
    it in as a data frame so data frames are
  • 00:09:53
    pretty much the backbone of pandas and
  • 00:09:55
    we'll go over data
  • 00:09:58
    frames and series objects in depth in
  • 00:10:01
    the next video but for the basics a data
  • 00:10:04
    frame is basically just rows and columns
  • 00:10:07
    of data we can see what a data frame
  • 00:10:09
    looks like just by printing it out
  • 00:10:11
    and this is the great thing about using
  • 00:10:13
    Jupiter notebooks because it allows us
  • 00:10:15
    to visualize these things in ways that
  • 00:10:19
    we can't do in other editors so here in
  • 00:10:22
    Jupiter I can simply just say DF and run
  • 00:10:25
    that and it will print out our data
  • 00:10:29
    frame here so we didn't even need to
  • 00:10:31
    wrap this here in a print function now
  • 00:10:34
    if you're using a normal editor then you
  • 00:10:37
    can still print out data frame
  • 00:10:39
    information but it's not going to look
  • 00:10:42
    as good as it does here in Jupyter where
  • 00:10:45
    we get this interactive table so this is
  • 00:10:48
    a small look at our data now this is
  • 00:10:51
    actually 85 columns here but if I scroll
  • 00:10:55
    through these then it doesn't look like
  • 00:10:57
    there's actually 85 columns printed out
  • 00:11:00
    here so this is actually truncated by
  • 00:11:04
    default just to give us a broad overview
  • 00:11:07
    of the
  • 00:11:07
    data so by default Jupyter is displaying
  • 00:11:10
    20 columns from our data frame now how
  • 00:11:14
    did I know that there was 85 columns for
  • 00:11:17
    this data frame well there are a few
  • 00:11:19
    attributes and methods that we can use
  • 00:11:21
    to get an idea of what our data looks
  • 00:11:24
    like so first we have the shape
  • 00:11:26
    attribute and shape gives us the number
  • 00:11:31
    of rows and columns in a tuple form so
  • 00:11:35
    let's look at this so in our next cell
  • 00:11:37
    down here I'm gonna say DF dot shape and
  • 00:11:40
    I will run that now this is an attribute
  • 00:11:44
    here it's not a method so you don't want
  • 00:11:47
    to put parentheses so DF dot shape and
  • 00:11:50
    we can see that we have 88 thousand rows
  • 00:11:55
    and 85 columns now if you wanted a bit
  • 00:12:00
    more information then we can use the
  • 00:12:02
    info method the info method will give us
  • 00:12:04
    the number of rows and columns and also
  • 00:12:07
    all of the data types of all the columns
  • 00:12:09
    as well
  • 00:12:10
    now before I run that it looks like my
  • 00:12:14
    text is getting cut off here a little
  • 00:12:16
    bit sometimes this happens whenever I'm
  • 00:12:19
    within Jupyter in order to fix this I
  • 00:12:22
    usually just come up here and restart
  • 00:12:25
    and run all my cells again that usually
  • 00:12:28
    takes care of the problem let's see if
  • 00:12:31
    that works okay so that seemed to work
  • 00:12:33
    another thing that you can do here is
  • 00:12:35
    just to totally reload the page and the
  • 00:12:38
    browser and when you reload the page I
  • 00:12:41
    think it's just because of how I have
  • 00:12:44
    this text enlarged so it's kind of
  • 00:12:47
    messing with how these look but now we
  • 00:12:49
    can see these just fine
  • 00:12:50
    okay so like I was saying we can see
  • 00:12:54
    here that we have eighty eight thousand
  • 00:12:55
    eight hundred and eighty three rows and
  • 00:12:58
    eighty five columns now if you wanted
  • 00:13:01
    more information then we can use the
  • 00:13:03
    info method and that will give us the
  • 00:13:06
    number of rows and the number of columns
  • 00:13:08
    but also all of the data types of the
  • 00:13:11
    columns so let's run that so if I do D F
  • 00:13:14
    dot info whoops
  • 00:13:16
    D F dot info now this actually is a
  • 00:13:19
    method so we do want to
  • 00:13:21
    put the parentheses there and let me
  • 00:13:24
    run this and now let's go over this
  • 00:13:27
    output so we can see here that it says
  • 00:13:29
    that we have eighty-eight thousand eight
  • 00:13:31
    hundred and eighty three entries so
  • 00:13:33
    those are our rows we have a total of
  • 00:13:35
    eighty five columns and then it lists
  • 00:13:38
    all of our columns here for our data so
  • 00:13:40
    these are all the columns in our CSV
  • 00:13:43
    file that we have loaded in now it also
  • 00:13:46
    gives us the data types of each of these
  • 00:13:48
    columns and we're going to go over data
  • 00:13:50
    types in a future video but for the most
  • 00:13:54
    part objects usually mean strings and
  • 00:13:57
    then we have other things as well so
  • 00:14:00
    int64 is just an integer float is a float
  • 00:14:04
    so probably a decimal number and there
  • 00:14:08
    are no other data types in this data set
  • 00:14:12
    but there are more data types in general
  • 00:14:14
    so I will be sure to do a video on data
  • 00:14:18
    types specifically in the near future
  • 00:14:21
    okay so now that we know the number of
  • 00:14:23
    rows and columns let's change a setting
  • 00:14:26
    here within Jupyter so that we can see
  • 00:14:28
    all of the columns so I think it would
  • 00:14:31
    be useful to see all of these if we'd
  • 00:14:33
    like to even if there are a lot of these
  • 00:14:36
    to scroll through so to do this we can
  • 00:14:39
    change a setting and I'm gonna come
  • 00:14:41
    down here to the bottom here and I'm
  • 00:14:44
    gonna change a setting by saying PD dot
  • 00:14:46
    set underscore option and within here I
  • 00:14:50
    will say display dot max underscore
  • 00:14:55
    columns and I will set that equal to 85
  • 00:15:00
    so that we can see all of our columns
  • 00:15:02
    and I will run that and now if we print
  • 00:15:06
    out our data frame so I'm going to go
  • 00:15:08
    back up here to where we printed out
  • 00:15:10
    this data frame and I will rerun that
  • 00:15:14
    cell and now if I scroll through these
  • 00:15:16
    columns then we can see that now it
  • 00:15:19
    looks like we actually have these 85
  • 00:15:21
    different columns here so I can keep
  • 00:15:24
    scrolling and keep scrolling and it
  • 00:15:26
    didn't just chop us off at that 20 like
  • 00:15:28
    it was before
  • 00:15:28
    now obviously the rows are also being
  • 00:15:31
    truncated here and we definitely
  • 00:15:33
    don't want to print
  • 00:15:34
    all 89 thousand of these rows but there
  • 00:15:39
    probably are some examples with certain
  • 00:15:41
    datasets where you might want to see all
  • 00:15:43
    of the rows as well so for example I
  • 00:15:46
    said that the survey results schema CSV
  • 00:15:49
    file that was included in our download
  • 00:15:52
    gives the matching questions for all of
  • 00:15:55
    these column names here so if we wanted
  • 00:15:58
    to see what these column names here mean
  • 00:16:02
    for this data then we can load in that
  • 00:16:04
    schema CSV file as well so let me do
  • 00:16:08
    this I'll go down to the bottom of our
  • 00:16:10
    notebook and I will just load this in by
  • 00:16:13
    saying schema underscore D F now I don't
  • 00:16:16
    want to just call this D F because we
  • 00:16:18
    don't want to overwrite our other data
  • 00:16:20
    frame and I will load this in just like
  • 00:16:23
    we saw before by saying PD dot read
  • 00:16:25
    underscore CSV and this is within the
  • 00:16:29
    data folder and this was called survey
  • 00:16:32
    underscore results underscore schema
  • 00:16:37
    CSV so I will run this and now let's
  • 00:16:41
    look at this schema data frame that we
  • 00:16:46
    just loaded in so here we have this Column
  • 00:16:51
    column here this gives us all of the
  • 00:16:53
    columns in our other data frame so we
  • 00:16:57
    have respondent main branch hobbyist and
  • 00:16:59
    if I scroll up to that data frame here
  • 00:17:01
    I'm gonna delete this info here since we
  • 00:17:04
    no longer need that if I scroll up to
  • 00:17:07
    this data frame here then we can see
  • 00:17:09
    respondent main branch hobbyist so if we
  • 00:17:13
    want to know what these mean then that's
  • 00:17:15
    what we use the schema for so we can see
  • 00:17:17
    that main branch or hobbyist means do you
  • 00:17:21
    code as a hobby main branch means which
  • 00:17:24
    of the following options best describes
  • 00:17:25
    now it actually truncates the text
  • 00:17:28
    too in order to actually see the
  • 00:17:31
    full text we could either change an
  • 00:17:34
    option or we could just access this
  • 00:17:36
    value directly and I will be showing you
  • 00:17:38
    how to do that in the next video but for
  • 00:17:41
    now we can see that we can't see all of
  • 00:17:44
    the rows to the questions that correlate
  • 00:17:48
    to each column name here remember we
  • 00:17:50
    have 85 columns but for here we can only
  • 00:17:53
    see the first five and then we get this
  • 00:17:56
    ellipsis here and then we can see the
  • 00:17:58
    last five so let's set this up so that
  • 00:18:02
    we can view 85 rows and then reprint
  • 00:18:06
    this so that we can see all of these so
  • 00:18:08
    back in the same cell where we set our
  • 00:18:11
    max columns now let's also add one for
  • 00:18:17
    rows as well so I'm just going to copy
  • 00:18:19
    and paste that but instead of max
  • 00:18:21
    columns here I'm gonna have this be max
  • 00:18:23
    rows and I will run that and now we will
  • 00:18:27
    rerun this schema here and now we can
  • 00:18:31
    see that we can see all of the columns
  • 00:18:33
    and the corresponding question text so
  • 00:18:37
    if you wanted to know what any of these
  • 00:18:38
    columns mean then this is how we do it
  • 00:18:42
    so we can see IT person the question was
  • 00:18:44
    are you the IT support person for your
  • 00:18:47
    family so that's probably a yes or no
  • 00:18:49
    question so that is what those mean so
  • 00:18:52
    if you're going through this data on
  • 00:18:54
    your own then you can use this as a
  • 00:18:55
    reference anytime you don't know what a
  • 00:18:58
    certain column means in our survey data
  • 00:19:00
    and if you don't know or if you don't
  • 00:19:03
    want to look through all of these to
  • 00:19:05
    find a specific row or a specific column
  • 00:19:09
    name then in a future video we're going
  • 00:19:11
    to learn about filtering data frames and
  • 00:19:14
    see how we can just grab a specific row
  • 00:19:16
    where the column equals a certain value
  • 00:19:19
    okay so now we have all 85 rows visible
  • 00:19:23
    of our schema data frame here but you
  • 00:19:26
    might be thinking well that's nice but I
  • 00:19:29
    don't want to see eighty five rows of my
  • 00:19:31
    survey data every time I want to look at
  • 00:19:34
    it but there are a couple of methods
  • 00:19:36
    that we can use to only see a certain
  • 00:19:39
    number of rows which you'll most likely
  • 00:19:41
    use a lot just to get an idea that your
  • 00:19:44
    filters and data frames seem to be
  • 00:19:46
    working correctly so we can see the
  • 00:19:49
    first five rows by saying instead of
  • 00:19:51
    doing a DF here we can say D F dot head
  • 00:19:55
    and if I run that then we just get the
  • 00:19:58
    first five rows here okay and you can
  • 00:20:01
    pass
  • 00:20:02
    value if you want to see a certain
  • 00:20:03
    number of values so if you wanted to see
  • 00:20:05
    the first ten rows then we could pass in
  • 00:20:08
    a ten to D F dot head and this gives us
  • 00:20:11
    the first ten rows so we can see it goes
  • 00:20:13
    all the way down zero through nine there
  • 00:20:16
    now if you'd like to see the last rows
  • 00:20:18
    instead of the first rows then we can
  • 00:20:20
    use the tail method instead
  • 00:20:23
    so if we say df.tail and
  • 00:20:26
    we could use it without a number also
  • 00:20:28
    but if we pass in a number just like
  • 00:20:31
    with head then now we're going to say
  • 00:20:32
    that we want the last ten entries here
  • 00:20:36
    in our data so those are the last ten
  • 00:20:38
    items of our data okay so this is a
  • 00:20:41
    brief overview of getting pandas
  • 00:20:44
    installed and then downloading our data
  • 00:20:47
    and loading our data into Jupyter and
  • 00:20:50
    how to read this in now before we end
  • 00:20:54
    here I'd like to mention the sponsor of
  • 00:20:56
    this video and that is Brilliant.org so
  • 00:20:59
    in this series we've been learning about
  • 00:21:01
    pandas and how to analyze data in
  • 00:21:03
    Python and Brilliant would be an
  • 00:21:05
    excellent way to supplement what you
  • 00:21:06
    learn here with their hands-on courses
  • 00:21:08
    they have some excellent courses and
  • 00:21:10
    lessons that do a deep dive on how to
  • 00:21:11
    think about and analyze data correctly
  • 00:21:13
    for data analysis fundamentals I would
  • 00:21:16
    really recommend checking out their
  • 00:21:17
    statistics course which shows you how to
  • 00:21:19
    analyze graphs and determine
  • 00:21:20
    significance in the data and I would
  • 00:21:22
    also recommend their machine learning
  • 00:21:24
    course which takes data analysis to a
  • 00:21:26
    new level
  • 00:21:26
    where you'll learn about the techniques
  • 00:21:28
    being used that allow machines to make
  • 00:21:30
    decisions where there's just too many
  • 00:21:32
    variables for a human to consider so to
  • 00:21:34
    support my channel and learn more about
  • 00:21:36
    Brilliant you can go to brilliant.org
  • 00:21:38
    forward slash CMS to sign up for free and
  • 00:21:40
    also the first 200 people that go to
  • 00:21:43
    that link will get 20% off the annual
  • 00:21:45
    premium subscription and you can find
  • 00:21:47
    that link in the description section
  • 00:21:48
    below
  • 00:21:49
    again that's brilliant.org forward slash
  • 00:21:52
    CMS
  • 00:21:54
    okay so I think that is going to do it
  • 00:21:56
    for our first pandas video I hope you
  • 00:21:58
    feel like you've got a good introduction
  • 00:21:59
    on how to install pandas and load in
  • 00:22:01
    your data to a Jupyter notebook in the
  • 00:22:03
    next video we're going to be learning
  • 00:22:05
    more about data frames and also learn
  • 00:22:07
    about the series data type so we'll
  • 00:22:10
    learn how we can think about data frames
  • 00:22:12
    in a way that's easier to understand and
  • 00:22:14
    also see how we can
  • 00:22:16
    grab certain elements columns and rows
  • 00:22:18
    from these as well so be sure to stick
  • 00:22:21
    around for that but if anyone has any
  • 00:22:23
    questions about what we covered in this
  • 00:22:24
    video then feel free to ask in the
  • 00:22:26
    comment section below and I'll do my
  • 00:22:27
    best to answer those and if you enjoyed
  • 00:22:29
    these tutorials and would like to
  • 00:22:30
    support them then there are several ways
  • 00:22:32
    you can do that the easiest way is to
  • 00:22:34
    simply like the video and give it a
  • 00:22:35
    thumbs up and also it's a huge help to
  • 00:22:37
    share these videos with anyone who you
  • 00:22:38
    think would find them useful and if you
  • 00:22:40
    have the means you can contribute through
  • 00:22:41
    Patreon and there's a link to that page
  • 00:22:43
    in the description section below
  • 00:22:44
    be sure to subscribe for future videos
  • 00:22:46
    and thank you all for watching
Tags
  • pandas
  • Python
  • data analysis
  • Jupyter Notebook
  • data manipulation
  • CSV
  • installation
  • pandas tutorial
  • data science
  • Stack Overflow dataset