Python Pandas Tutorial (Part 2): DataFrame and Series Basics - Selecting Rows and Columns

00:33:35
https://www.youtube.com/watch?v=zmdjNSmRXF4

Summary

TLDR: This video is a tutorial on Pandas in Python, focusing on the DataFrame and Series data types, the primary data structures in Pandas for data manipulation and analysis. The tutorial explains what DataFrames and Series are, how to access rows and columns using various methods, and how Pandas compares with plain Python lists and dictionaries. It takes a practical look at the indexers 'loc' and 'iloc' for accessing data by label or by integer position, respectively, and explains how DataFrames make tabular data easier to handle. It also introduces the 'value_counts()' method for counting the occurrences of each unique value in a Series, previews more advanced functionality covered later in the series, and directs viewers to supplementary learning resources on Pandas and data science. Lastly, it mentions Brilliant as the series sponsor.

Takeaways

  • DataFrame and Series are the core data types in Pandas.
  • Access DataFrame columns like dictionary keys or with dot notation.
  • 'loc' is for label-based data access, 'iloc' is for integer-position-based access.
  • A DataFrame is akin to a table with rows and columns.
  • A Series is a single column of data with more functionality than a plain list.
  • Use 'value_counts()' to get counts of unique values in a Series.
  • Pandas enables handling complex data structures effectively.
  • Indexes label DataFrame rows; the labels are usually unique but do not have to be.
  • Slice DataFrames to extract specific rows and columns.
  • Upcoming lessons cover more advanced data filtering.
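
The difference between bracket and dot notation for columns can be sketched as follows. This is a minimal illustration, not code from the video: the DataFrame and its column names ("fruit", "count") are hypothetical, chosen so that one column collides with the existing DataFrame.count method.

```python
import pandas as pd

# Hypothetical data: "count" is picked on purpose because DataFrame
# already has a method with that name.
df = pd.DataFrame({"fruit": ["apple", "pear", "plum"], "count": [3, 1, 2]})

by_bracket = df["count"]  # bracket access always returns the column (a Series)
by_dot = df.count         # dot access finds the DataFrame.count method instead

print(type(by_bracket).__name__)
print(callable(by_dot))

# For a name that does not collide, both notations return the same Series.
same = df.fruit.equals(df["fruit"])
print(same)
```

Bracket access behaves the same for every column name, which is the reason the video gives for preferring it.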

Timeline

  • 00:00:00 - 00:05:00

    In this video, the focus is on advancing the understanding of pandas, particularly the DataFrame and Series data types, which are central to data manipulation. It starts with a sponsor mention of Brilliant.org, thanking them and encouraging viewers to support the sponsors. The video transitions into discussing the foundational concepts of DataFrames (essentially tables with rows and columns) by using Python-based objects and dictionaries to conceptualize this structure. An introductory example shows how a dictionary can simulate a DataFrame, leading to an explanation of how pandas enhances this capability.

  • 00:05:00 - 00:10:00

    The explanation of DataFrames moves on to real Python examples, showing how converting a dictionary of lists into a pandas DataFrame adds functionality and displays the data as rows and columns. This involves creating a simple DataFrame from a dictionary and briefly introducing indexes, which the next video covers in depth. It then turns to basic access techniques, covering how to extract a column much like retrieving a dictionary key, and introduces the Series object that is returned when a single column is accessed. This sets up the view of a Series as the rows of a single column, with a DataFrame acting as a container for multiple Series.

  • 00:10:00 - 00:15:00

    Continuing the hands-on exploration of DataFrame functionality, the video guides through accessing rows and columns using iloc and loc indexers, contrasting integer-based row access (iloc) and label-oriented access (loc), presenting typical use cases and constraints of each. The tutorial provides practical demonstrations of accessing single or multiple rows and how to access columns within those rows, illuminating pandas' robust capabilities over vanilla Python structures. This session builds toward understanding pandas' indexing logic and introduces slicing techniques for convenient data retrieval.

  • 00:15:00 - 00:20:00

    The tutorial progresses into mixing row and column selections using loc and iloc indexers, illustrating how to extract portions of a DataFrame, such as selecting multiple columns by passing a list of column names to the indexer. Practical examples from a Stack Overflow dataset showcase the ease of slicing and retrieving specific pieces of data, reinforcing the conceptual framework built around pandas. Additionally, it briefly previews how pandas simplifies operations like counting occurrences with methods like value_counts.

  • 00:20:00 - 00:25:00

    An application of pandas' indexing in larger data sets is explored, emphasizing its practicality. The video draws examples from a large Stack Overflow dataset, demonstrating how to efficiently extract specific rows and columns using the concepts learned. Emphasis is placed on understanding the labeling of indexes and using advanced slicing for comprehensive data queries. The tutorial underscores pandas' advantages by comparing it with pure Python approaches, highlighting functionalities like method chaining for operations such as count calculations and the nuanced differences in handling data in real projects.

  • 00:25:00 - 00:33:35

    The video concludes by wrapping up the concepts of DataFrames and Series within pandas, appreciating their role in data science workflows. It heralds future episodes that will explore indexes deeper, showcasing the benefits of tailored indexing for datasets. A revisit to the sponsor, Brilliant.org, aligns their offerings with learning goals around data analysis enhancement, encouraging sign-ups for extended learning. Closing remarks focus on engaging with viewers through comments and promoting further learning via likes, shares, and subscriptions, alongside a nod to supporting content production through Patreon.
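
The combined row and column selection with loc and iloc described in the timeline can be sketched like this. The `people` data is reconstructed from the transcript, so the exact spellings are an assumption. One detail not spelled out in the summary but part of standard pandas behavior: a loc slice includes its end label, while an iloc slice excludes its end position.

```python
import pandas as pd

# Sample data reconstructed from the transcript (spellings are assumed).
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
}
df = pd.DataFrame(people)

# loc: rows by label, columns by name. The label slice 0:1 includes label 1.
by_label = df.loc[0:1, ["last", "email"]]

# iloc: rows and columns by integer position. The slice 0:1 stops before position 1.
by_position = df.iloc[0:1, [1, 2]]

print(by_label.shape)
print(by_position.shape)
```

With the default integer index the two indexers look alike, so the slice behavior is the easiest place to see that they differ.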

Frequently Asked Questions

  • What data structures are fundamental in Pandas?

    DataFrame and Series are the fundamental data structures in Pandas.

  • How do you access DataFrame columns in Pandas?

    DataFrame columns can be accessed like dictionary keys or using dot notation.

  • What is a Pandas Series?

    A Pandas Series is a one-dimensional array capable of holding data of any type with axis labels.

  • How do you select multiple columns in a DataFrame?

    Multiple columns are selected by passing a list of column names inside double brackets.

  • What is the purpose of 'value_counts()' in Pandas?

    The 'value_counts()' method counts how many times each unique value occurs in a Series.

  • What is an index in a Pandas DataFrame?

    An index labels the rows of a DataFrame. The labels are usually unique, but they do not have to be.

  • How do 'loc' and 'iloc' differ in Pandas?

    'loc' is label-based selection, whereas 'iloc' is integer-location-based selection.

  • How is a DataFrame different from a Series?

    A DataFrame is two-dimensional (rows and columns), while a Series is one-dimensional.

  • Can you modify the index in a Pandas DataFrame?

    Yes, the index in a DataFrame can be modified.

  • What kind of data structure is a Pandas DataFrame considered?

    A Pandas DataFrame is considered a two-dimensional data structure, similar to a table.
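
Two of the answers above, counting values and modifying the index, can be illustrated with a short sketch. The sample data is reconstructed from the transcript (spellings assumed), and choosing the email column as the new index is only an illustrative assumption, not something this part of the video does.

```python
import pandas as pd

# Sample data reconstructed from the transcript (spellings are assumed).
df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# value_counts() tallies how often each unique value occurs in a Series.
last_name_counts = df["last"].value_counts()
print(last_name_counts)

# set_index() returns a new DataFrame whose row labels come from a column.
by_email = df.set_index("email")

# With string labels in place, loc looks rows up by those labels.
jane_first = by_email.loc["JaneDoe@email.com", "first"]
print(jane_first)
```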

Subtitles
en
  • 00:00:00
    hey there how's it going everybody in
  • 00:00:01
    this video we're gonna continue learning
  • 00:00:02
    more about pandas and specifically we're
  • 00:00:04
    going to be learning about the data
  • 00:00:06
    frame and series data types so like I
  • 00:00:08
    said in the last video these are
  • 00:00:10
    basically the backbone of pandas and are
  • 00:00:12
    the two primary data types that you'll
  • 00:00:14
    likely be using the most so in this
  • 00:00:16
    video we're gonna go over how we can
  • 00:00:18
    think of data frames and series data
  • 00:00:20
    types in a different way and then we'll
  • 00:00:22
    look at the basics of getting
  • 00:00:24
    information from these data types now I
  • 00:00:26
    would like to mention that we do have a
  • 00:00:27
    sponsor for this series of videos and
  • 00:00:29
    that is brilliant.org so I really want
  • 00:00:31
    to thank brilliant for sponsoring the
  • 00:00:32
    series and it would be great if you all
  • 00:00:34
    can check them out using the link in the
  • 00:00:35
    description section below and support
  • 00:00:37
    the sponsors and I'll talk more about
  • 00:00:38
    their services in just a bit so with
  • 00:00:40
    that said let's go ahead and get started
  • 00:00:42
    okay so first let's look at what a data
  • 00:00:45
    frame is and then we'll learn more about
  • 00:00:47
    how we can think about this in terms of
  • 00:00:50
    a Python object so we saw data frames
  • 00:00:52
    briefly in our last video when we checked
  • 00:00:55
    to make sure that our data was loaded in
  • 00:00:57
    correctly so these were the objects that
  • 00:01:00
    were displayed in Jupyter as rows and
  • 00:01:03
    columns basically a table so let's take
  • 00:01:06
    a look at what this looks like so if you
  • 00:01:08
    were following along with the last video
  • 00:01:09
    this is basically the same Jupyter
  • 00:01:12
    notebook that I had before except this
  • 00:01:15
    is just cleaned up a bit so we're
  • 00:01:17
    importing pandas here we are reading in
  • 00:01:20
    our CSV files so one is just our main
  • 00:01:23
    data frame for our survey results one is
  • 00:01:26
    our schema data frame for the schema
  • 00:01:29
    results and then we are setting some
  • 00:01:31
    options here where we have the max
  • 00:01:33
    columns set to 85 so we can see all the
  • 00:01:36
    columns and the max rows set to 85 so
  • 00:01:38
    that we can see all of the schema now if
  • 00:01:41
    you haven't been following along with
  • 00:01:42
    the video so far then I do have a link
  • 00:01:44
    in the description section below that
  • 00:01:46
    links to where you can download this
  • 00:01:47
    data and follow along with this
  • 00:01:50
    okay so this is a data frame here so
  • 00:01:54
    where we are printing out df.head()
  • 00:01:56
    this is what this returns so this here
  • 00:01:59
    is the first five rows of our data frame
  • 00:02:03
    so you can see that a data frame is made
  • 00:02:05
    up of multiple rows here and we also
  • 00:02:08
    have multiple columns so in the case of
  • 00:02:10
    this data
  • 00:02:11
    these are survey results
  • 00:02:13
    but your data can be you know whatever
  • 00:02:16
    your data is but it's most likely going
  • 00:02:18
    to be in rows and columns kind of like a
  • 00:02:21
    table so for this data with these being
  • 00:02:24
    survey results each row is
  • 00:02:27
    one person who answered the survey and
  • 00:02:30
    each column was their answer for that
  • 00:02:33
    question on the survey so for example
  • 00:02:36
    this respondent number one here they
  • 00:02:38
    answered that yes they were a hobbyist
  • 00:02:40
    and if you want to know what hobbyist
  • 00:02:42
    means then we just like we saw in the
  • 00:02:45
    last video we can look at our schema
  • 00:02:47
    data frame so let me go ahead and print
  • 00:02:49
    this out here and let's look at this so
  • 00:02:53
    if I look at what a hobbyist is then we
  • 00:02:56
    can see that that question was do you
  • 00:02:58
    code as a hobby so that's what this data
  • 00:03:01
    is and that kind of gives us an idea of
  • 00:03:03
    what a data frame is basically a data
  • 00:03:05
    frame is just rows and columns but now
  • 00:03:09
    let me explain how I like to think of
  • 00:03:11
    data frames using native Python so if we
  • 00:03:14
    were only using Python and not using
  • 00:03:16
    pandas to store information in rows and
  • 00:03:18
    columns
  • 00:03:19
    then how would we do this well for those
  • 00:03:22
    of you familiar with dictionaries you
  • 00:03:24
    might think that it's a good idea to
  • 00:03:25
    store information that way so let me
  • 00:03:27
    pull up a new notebook here that I have
  • 00:03:30
    open here with some snippets and let's
  • 00:03:33
    take a look at this okay so let's look
  • 00:03:35
    at this first cell here so a lot of us
  • 00:03:37
    are probably familiar with Python
  • 00:03:39
    dictionaries where we have keys and
  • 00:03:41
    values so if I'm representing some data
  • 00:03:43
    in this example it's a person then we
  • 00:03:46
    can use a dictionary so first off I have
  • 00:03:49
    a key of first which is going to be the
  • 00:03:51
    first name and then that has a value of
  • 00:03:54
    Corey and then we also have keys and
  • 00:03:57
    values for the last name and the email
  • 00:03:59
    as well okay so this dictionary here
  • 00:04:02
    represents data for a single person but
  • 00:04:05
    how would we represent data for multiple
  • 00:04:07
    people well there are probably a couple
  • 00:04:09
    of different ways that we can do this
  • 00:04:10
    but the way that I like to think of this
  • 00:04:12
    in terms of learning pandas is to make
  • 00:04:15
    all of the values in our dictionary a
  • 00:04:17
    list so let's take a look in the second
  • 00:04:21
    cell here to see what this would look
  • 00:04:22
    like so here in the second cell now we
  • 00:04:25
    can see that we have a pretty similar
  • 00:04:26
    dictionary
  • 00:04:27
    to what we had above but now instead of
  • 00:04:31
    just a single string here for the values
  • 00:04:33
    I instead have a list and our list
  • 00:04:36
    currently just has one person but now
  • 00:04:39
    since this is a list we can add more
  • 00:04:42
    first names and information in here so
  • 00:04:44
    the first value of our list is going to
  • 00:04:47
    be our first person so if I go to the
  • 00:04:51
    third cell down here at the bottom then
  • 00:04:53
    now we can use this as an example to see
  • 00:04:55
    what this would look like with multiple
  • 00:04:58
    people so the second value in our list
  • 00:05:01
    will be our second person and the third
  • 00:05:04
    value in the list will be our third
  • 00:05:06
    person so if we look here we have people
  • 00:05:08
    we have a key of first so if we want the
  • 00:05:12
    second person here we go to the second
  • 00:05:13
    value that's Jane the last name is Doe
  • 00:05:16
    and the email go to the second value
  • 00:05:19
    here is JaneDoe@email.com
  • 00:05:21
    if you want the third person that would
  • 00:05:23
    be John and then third value in last
  • 00:05:25
    would be Doe then third value in email
  • 00:05:28
    is JohnDoe@email.com
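
The dictionary-of-lists idea described up to this point can be sketched in plain Python. The key names come from the transcript; the exact spelling and casing of the values is an assumption, since the captions only give them phonetically.

```python
# Each key acts as a column; position i in every list belongs to person i.
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
}

# Reading one "column" is a normal key lookup.
emails = people["email"]
print(emails)

# Reading one "row" means taking the same position from every list.
second_person = {key: values[1] for key, values in people.items()}
print(second_person)
```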
  • 00:05:29
    so we can kind of think of this like
  • 00:05:31
    rows and columns the keys are the
  • 00:05:33
    columns and the values are the rows now
  • 00:05:36
    if you look up the definition of a
  • 00:05:38
    pandas data frame online then you'll
  • 00:05:40
    see a lot of definitions that just say
  • 00:05:42
    something like it's a two dimensional
  • 00:05:45
    data structure now that might sound a
  • 00:05:47
    little confusing but in layman's terms
  • 00:05:49
    that basically just means rows and
  • 00:05:51
    columns okay so like I said here the key
  • 00:05:53
    for email here would be our email column
  • 00:05:56
    and contain all of the email values and
  • 00:06:00
    if we wanted to see the email column
  • 00:06:03
    then we can just access that key so if I
  • 00:06:07
    come down here into actually let me run
  • 00:06:11
    all of these really quick here I think I
  • 00:06:14
    open this up without running these so I
  • 00:06:15
    want to make sure that we have this
  • 00:06:17
    registered okay so if I wanted to see
  • 00:06:19
    that email column then I can simply say
  • 00:06:22
    people and then access that email key if
  • 00:06:25
    I run that then we can see that we got
  • 00:06:27
    all of the emails now the reason that I
  • 00:06:29
    wanted to show you this is because I
  • 00:06:30
    feel like this really helped me in terms
  • 00:06:33
    of how I think about data frames so data
  • 00:06:35
    frames are very similar to this but with
  • 00:06:37
    more functionality than what we have
  • 00:06:39
    here in standard
  • 00:06:40
    Python now we can actually create a data
  • 00:06:43
    frame from this dictionary and see what
  • 00:06:45
    this looks like
  • 00:06:46
    so let's do that and look at some basic
  • 00:06:48
    data frame functionality and then we'll
  • 00:06:50
    look at this more using the Stack
  • 00:06:52
    Overflow data from the last video so
  • 00:06:55
    here in this bottom cell in order to
  • 00:06:57
    create a data frame from the information
  • 00:06:59
    that we have here I'm going to go ahead
  • 00:07:01
    and import pandas so I'm going to say
  • 00:07:03
    import pandas as pd and now we can
  • 00:07:07
    create a data frame actually using this
  • 00:07:09
    dictionary that we have up here so to do
  • 00:07:12
    that I can just say df is equal to
  • 00:07:15
    pd.DataFrame and check the casing
  • 00:07:19
    there that's a capital D and a capital F
  • 00:07:21
    and then we'll just pass in that
  • 00:07:24
    dictionary that has values as lists so
  • 00:07:29
    if I run this and that seemed to run
  • 00:07:31
    okay without any errors and now let me
  • 00:07:34
    just print out df here and if I print
  • 00:07:36
    that out then we can see that now our
  • 00:07:38
    data frame is representing this in a way
  • 00:07:41
    to where we do have rows and columns
  • 00:07:43
    that we can visualize so we get these
  • 00:07:45
    people printed out in a nice table of
  • 00:07:47
    rows and columns now we also have these
  • 00:07:50
    over here to the far left that don't
  • 00:07:53
    have column names this 0, 1, and 2 now this
  • 00:07:56
    is an index now I'm not going to go too
  • 00:07:59
    much into indexes right now because
  • 00:08:01
    that's what the next video is going to
  • 00:08:02
    cover but basically it's a unique value
  • 00:08:05
    for our rows now it doesn't need to be
  • 00:08:07
    unique but again we'll talk more about
  • 00:08:10
    that in the video specifically on
  • 00:08:12
    indexes so now that we have a bit of an
  • 00:08:14
    idea of how to think about data frames
  • 00:08:16
    now let's take a look at how to access
  • 00:08:19
    information here within the data frame
  • 00:08:21
    so first let's just access the values of
  • 00:08:24
    a single column so just like we did with
  • 00:08:26
    the dictionary we can access a single
  • 00:08:29
    column just like we were accessing the
  • 00:08:32
    key of a dictionary so just like I did
  • 00:08:35
    people and email up here I can do very
  • 00:08:38
    similar down here and just say that I
  • 00:08:41
    want that email column of my data frame
  • 00:08:43
    now that's not actually a key that is
  • 00:08:47
    going to access the column of a data
  • 00:08:49
    frame but we can see here that we get
  • 00:08:51
    all of the emails back from that data frame
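
The steps just described, building a data frame from the dictionary and pulling out one column with bracket notation, can be sketched as follows. The sample values are reconstructed from the captions, so their exact spelling is an assumption.

```python
import pandas as pd

# Sample data reconstructed from the transcript (spellings are assumed).
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
}

# Note the casing: capital D and capital F.
df = pd.DataFrame(people)
print(df)

# Bracket access looks like a dictionary lookup, but the result is a
# pandas Series with its own index, not a plain Python list.
emails = df["email"]
print(emails)
print(type(emails))
```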
  • 00:08:54
    so again I do want to emphasize that I
  • 00:08:56
    only used the pure Python example so that
  • 00:09:00
    we could get an idea of how to think
  • 00:09:01
    about a data frame but like I said a
  • 00:09:04
    data frame is much much more than just a
  • 00:09:06
    dictionary of lists so for example we
  • 00:09:09
    can see that when we displayed the email
  • 00:09:11
    column here it doesn't look the same as
  • 00:09:14
    when we displayed the list of values
  • 00:09:17
    from that dictionary and that's because
  • 00:09:19
    this is actually returning a series and
  • 00:09:22
    we can see this if we check the type so
  • 00:09:27
    if I check the type of this email column
  • 00:09:32
    here so let me run that we can see that
  • 00:09:36
    this is pandas.core.series.Series so
  • 00:09:39
    this is a series object so what is a
  • 00:09:42
    series so a series is still basically a
  • 00:09:45
    list of data but just like with a data
  • 00:09:48
    frame it has a lot more functionality
  • 00:09:50
    than just that now if you look up the
  • 00:09:53
    definition of a series online then
  • 00:09:55
    you'll see a lot of definitions that
  • 00:09:56
    just say it's a one-dimensional array
  • 00:09:58
    and that might sound a little confusing
  • 00:10:00
    but in layman's terms that basically
  • 00:10:03
    just means that it's rows of data so
  • 00:10:06
    again you can think of a data frame as
  • 00:10:08
    being rows and columns and a series as
  • 00:10:11
    being rows of a single column so a data
  • 00:10:16
    frame is basically a container for
  • 00:10:18
    multiple of these series objects so
  • 00:10:21
    again that's important so let me go over
  • 00:10:23
    that one more time so we can see that a
  • 00:10:25
    data frame here is two-dimensional
  • 00:10:27
    because it has rows and columns so we
  • 00:10:29
    can see here that it has you know first
  • 00:10:32
    name last name email now whenever we
  • 00:10:34
    access just the email then we can see
  • 00:10:37
    that we get all these emails here now
  • 00:10:39
    this is a series and I said that a data
  • 00:10:42
    frame basically is a container
  • 00:10:44
    for multiple series objects so we can
  • 00:10:48
    think of this email column here as a
  • 00:10:49
    series this last column here is a series
  • 00:10:52
    and this first column is a series and
  • 00:10:54
    also we can see where we printed out
  • 00:10:57
    this series here for the emails we can
  • 00:11:00
    see that this series also has an index
  • 00:11:02
    as well just like our data frame did so
  • 00:11:04
    this index is over here on the left the
  • 00:11:06
    0, 1,
  • 00:11:07
    and 2 okay so we can access a single column
  • 00:11:10
    of a data frame like we're accessing a
  • 00:11:13
    key just like we did here in this cell
  • 00:11:17
    but you might also see some people use
  • 00:11:20
    dot notation to do the same thing so you
  • 00:11:22
    might see some people do it like this so
  • 00:11:25
    they might do df.email and if I run
  • 00:11:28
    this cell then we can see that let me
  • 00:11:32
    get rid of this cell here and just so we
  • 00:11:35
    can compare these two we can see that
  • 00:11:37
    this gives us the same thing whether we
  • 00:11:40
    access this like a key or whether we use
  • 00:11:42
    dot notation this returns the same
  • 00:11:45
    series object of the email values now
  • 00:11:48
    whichever way that you want to do this
  • 00:11:49
    is really just a personal preference I
  • 00:11:51
    actually prefer the first way of using
  • 00:11:54
    the brackets and there are a couple of
  • 00:11:57
    reasons that I prefer to use that over
  • 00:11:59
    dot notation first is that I like using
  • 00:12:02
    the brackets because there's a chance
  • 00:12:05
    that one of your columns is named the
  • 00:12:07
    same thing as one of the attributes or
  • 00:12:10
    methods of a data frame and if that's
  • 00:12:12
    the case then using the dot notation
  • 00:12:14
    might give you some errors so for
  • 00:12:17
    example a data frame has
  • 00:12:20
    a method called count so if you had a
  • 00:12:23
    column named count and you
  • 00:12:27
    were trying to access that count column
  • 00:12:29
    using dot notation then that's actually
  • 00:12:32
    going to access the count method from
  • 00:12:36
    data frame instead of that count column
  • 00:12:39
    so that actually wouldn't work how we
  • 00:12:41
    did it here if you wanted to access the
  • 00:12:43
    actual column called count which we
  • 00:12:46
    don't have one in this specific data
  • 00:12:48
    frame but if we did then we would have
  • 00:12:50
    to access it like this so that's kind of
  • 00:12:52
    why I prefer brackets so I'm going to be
  • 00:12:55
    using brackets throughout this series
  • 00:12:59
    but I wanted you to know about dot
  • 00:13:00
    notation because if you're working with
  • 00:13:02
    other people using pandas then you might
  • 00:13:04
    see them access columns using dot
  • 00:13:07
    notation so you need to know that it's
  • 00:13:10
    at least a possibility and again that
  • 00:13:12
    doesn't mean that they're doing it wrong
  • 00:13:13
    it's just a personal preference I just
  • 00:13:16
    prefer using the brackets okay so I said
  • 00:13:19
    that data frames have a lot more
  • 00:13:21
    functionality than what we saw using you
  • 00:13:24
    know standard Python so let's look at
  • 00:13:27
    some other stuff that we can do here so
  • 00:13:29
    let's say that we wanted to access
  • 00:13:30
    multiple columns now in order to access
  • 00:13:33
    multiple columns we can use the bracket
  • 00:13:35
    notation and pass in a list of the
  • 00:13:38
    columns that we want so if I wanted both
  • 00:13:40
    the last name and email columns then we
  • 00:13:44
    could say DF and use our brackets just
  • 00:13:47
    like we saw before but now I'm going to
  • 00:13:49
    put in a set of inner brackets here as a
  • 00:13:51
    list of columns that I want to access so
  • 00:13:55
    for the first value
  • 00:13:56
    I'll put last for the last name and for
  • 00:13:58
    the second value I'll put email for the
  • 00:14:00
    email so if I run this then we can see
  • 00:14:03
    that now we have a data frame returned
  • 00:14:06
    here of the last column and the email
  • 00:14:09
    column now I want to emphasize again
  • 00:14:11
    here that I passed a list inside of
  • 00:14:14
    these brackets here
  • 00:14:16
    so there are two pairs of brackets you
  • 00:14:19
    can't leave off the inner brackets
  • 00:14:21
    because you'll likely get a key error
  • 00:14:23
    because pandas will think that you're
  • 00:14:25
    passing in both of those strings as a
  • 00:14:27
    single column name and another thing
  • 00:14:30
    that I want to point out here is that
  • 00:14:32
    now that we're getting multiple columns
  • 00:14:35
    this can no longer be a series because
  • 00:14:38
    remember a series is basically a single
  • 00:14:40
    column of rows so when we get multiple
  • 00:14:44
    columns like this
  • 00:14:45
    it's just returning another data frame
  • 00:14:47
    and in this case it's a filtered down
  • 00:14:49
    data frame with just these specific
  • 00:14:52
    columns so we filtered out the first
  • 00:14:55
    name column here and we just have the
  • 00:14:57
    last and the email okay so that's how we
  • 00:14:59
    get a specific column or multiple
  • 00:15:01
    columns and we can slice these as well
  • 00:15:04
    similar to how we slice a list
  • 00:15:07
    but I'll show that on our larger Stack
  • 00:15:10
    Overflow data set here in a second now
  • 00:15:12
    if you have a lot of columns and want to
  • 00:15:15
    see all of them easily then we can just
  • 00:15:17
    grab the columns specifically by saying
  • 00:15:19
    df.columns and we can run this and
  • 00:15:24
    we can see here that this gives us all
  • 00:15:28
    of our columns here so our columns are
  • 00:15:30
    an index of first last and email okay so
  • 00:15:34
    now we've seen
  • 00:15:35
    to get a column but how would we get a
  • 00:15:37
    row so in order to get rows we can use
  • 00:15:40
    the loc and iloc indexers so that is
  • 00:15:44
    loc and iloc so let's take a look at
  • 00:15:48
    these so first let's take a look at
  • 00:15:51
    iloc so iloc allows us to access rows
  • 00:15:54
    by integer location hence the name iloc
  • 00:15:58
    is integer location so if I wanted to
  • 00:16:00
    get the first row then we can just say
  • 00:16:03
    df.iloc and then use brackets here
  • 00:16:07
    too since this is an indexer use
  • 00:16:10
    brackets and pass in a 0 and that will
  • 00:16:13
    give us the first row so if I run this
  • 00:16:16
    then we can see that the first row has a
  • 00:16:19
    first name of Corey last name of Schafer
  • 00:16:21
    and email of CoreyMSchafer@gmail.com
  • 00:16:23
    so what that did is it returns
  • 00:16:26
    a series that contains the values of
  • 00:16:28
    that first row of data which like I said
  • 00:16:31
    is the first name last name and email of
  • 00:16:34
    the first person in this example and
  • 00:16:36
    again we haven't discussed indexes yet
  • 00:16:39
    that will be in the next video but the
  • 00:16:42
    index here is the column names so that
  • 00:16:45
    we know what those values are so up here
  • 00:16:49
    our index was 0, 1, and 2 but whenever we're
  • 00:16:53
    actually accessing a row it's going to
  • 00:16:56
    set that index to the column name so
  • 00:16:58
    that we know what those values are
  • 00:16:59
    because if this just said 0, 1, and 2 then
  • 00:17:02
    we might not know what these are
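
A short sketch of the iloc behavior described here: a single integer returns one row as a Series whose index holds the column names, a list of integers returns a data frame, and a second value after the comma selects columns by position. The sample values are reconstructed from the captions, so their spelling is an assumption.

```python
import pandas as pd

# Sample data reconstructed from the transcript (spellings are assumed).
df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# One integer: the first row comes back as a Series labeled by column name.
first_row = df.iloc[0]
print(first_row)

# A list of integers (note the inner brackets): a data frame with those rows.
first_two_rows = df.iloc[[0, 1]]
print(first_two_rows)

# Rows first, then columns: position 2 is the third column, email.
first_two_emails = df.iloc[[0, 1], 2]
print(first_two_emails)
```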
  • 00:17:04
    and just like when we selected multiple
  • 00:17:06
    columns we can select multiple rows as
  • 00:17:08
    well by passing in a list of integers so
  • 00:17:11
    if I want the first and second row then we
  • 00:17:15
    can just say and again this is going to
  • 00:17:17
    be a pair of brackets within these
  • 00:17:20
    brackets because we're passing in a list
  • 00:17:23
    to our indexer here and I'm just going to
  • 00:17:26
    pass in a list of 0 and 1 so if I run this
  • 00:17:30
    then we can see that now we get the
  • 00:17:32
    first two rows of data and again be sure
  • 00:17:35
    to pass in an inner list inside those
  • 00:17:38
    brackets so that it does what you expect
  • 00:17:40
    it to do and also we can see that now
  • 00:17:42
    we're getting a data frame with these
  • 00:17:44
    multiple rows now with these iloc
  • 00:17:48
    and loc
  • 00:17:48
    indexers we can also select columns as
  • 00:17:51
    well and that is going to be the second
  • 00:17:54
    value that we pass into these outer
  • 00:17:56
    brackets so if we thought of iloc
  • 00:17:59
    and loc as functions then we can think
  • 00:18:02
    of the rows that we want as the first
  • 00:18:04
    argument and the columns as the second
  • 00:18:07
    argument so let me show you what this
  • 00:18:08
    looks like so here we have our inner
  • 00:18:11
    bracket those are the rows that we want
  • 00:18:13
    but now after that list we can put a
  • 00:18:15
    comma and now we can specify the column
  • 00:18:19
    that we want now with iloc we can't
  • 00:18:21
    specify an actual column name because
  • 00:18:23
    these use integers integer locations so
  • 00:18:27
    these are for integers only so remember
  • 00:18:30
    our first name is the first column the
  • 00:18:33
    last name is the second column and the
  • 00:18:35
    email is the third column so if we
  • 00:18:37
    wanted to grab the email address of the
  • 00:18:40
    first two rows then we can grab the
  • 00:18:42
    column at index 2 which will be the
  • 00:18:46
    third column since all of these start at
  • 00:18:48
    0 so if I was to pass in a 2 here and
  • 00:18:51
    run that then we can see that now we get
  • 00:18:54
    the email addresses of these first two
  • 00:18:56
    rows. Okay, so that's iloc.
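
The iloc calls described so far can be sketched roughly as follows. The names and email addresses are made-up sample data; only the column order (first, last, email) follows the walkthrough.

```python
import pandas as pd

# Hypothetical sample data: three people with first, last, and email columns.
people = {
    "first": ["Ada", "Grace", "Alan"],
    "last": ["Lovelace", "Hopper", "Turing"],
    "email": ["ada@example.com", "grace@example.com", "alan@example.com"],
}
df = pd.DataFrame(people)

# A list of integer positions selects several rows and returns a DataFrame.
first_two_rows = df.iloc[[0, 1]]

# Rows first, columns second: positions 0 and 1, column position 2 (email).
# A single column position returns a Series.
first_two_emails = df.iloc[[0, 1], 2]

print(first_two_rows)
print(first_two_emails)
```

Note the inner list: the outer brackets belong to the indexer, and the inner brackets are the list of row positions.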
  • 00:18:59
    let's look at loc. With iloc we were
  • 00:19:02
    searching by integer location; with loc
  • 00:19:05
    we're going to be searching by label and
  • 00:19:08
    when we're talking about labels for rows
  • 00:19:10
    these will be the indexes and again we
  • 00:19:13
    don't have custom indexes right now so
  • 00:19:15
    this index is just a default range of
  • 00:19:18
    integers so at the moment this will
  • 00:19:20
    be somewhat similar to the iloc
  • 00:19:23
    indexer, but we'll look at use
  • 00:19:26
    cases for loc with actual labels in the
  • 00:19:29
    next video when we cover indexes so real
  • 00:19:32
    quick let's look at our entire data
  • 00:19:35
    frame again so I'm just going to print
  • 00:19:37
    that out down here so like I said over
  • 00:19:40
    here on the far left these are our
  • 00:19:42
    indexes so these are the labels for that
  • 00:19:45
    row so if I want the first row then by
  • 00:19:48
    default this just has a label of 0 so I
  • 00:19:51
    can say df.loc and pass in a 0 there, and
  • 00:19:55
    if I run that then we can see that we
  • 00:19:58
    get that row with that label of 0 and
  • 00:20:00
    again I know that that looks similar to
  • 00:20:02
    iloc at the moment, but we'll see how to
  • 00:20:04
    use indexes with labels in the next
  • 00:20:06
    video. And just like with iloc, we can
  • 00:20:10
    also pass in a list to specify multiple
  • 00:20:12
    rows so if I wanted the first and second
  • 00:20:15
    row, then just like with iloc I can pass
  • 00:20:18
    in an inner list here so let's say that
  • 00:20:21
    I want the first row and the second row
  • 00:20:23
    so I'll run that we can see that now we
  • 00:20:26
    get the first and the second row and
  • 00:20:28
    again now we can see that we are getting
  • 00:20:30
    a data frame back now that we have
  • 00:20:33
    multiple rows. And just like with iloc,
  • 00:20:36
    we can also pass in a second value into
  • 00:20:39
    our indexer to select specific columns
  • 00:20:42
    for these rows. Now with iloc we used
  • 00:20:45
    integers to select the columns but now
  • 00:20:48
    that we're using loc we can use labels,
  • 00:20:50
    so if we want the email column of these
  • 00:20:53
    first two rows then now we can just pass
  • 00:20:56
    in a value of email so if I run that
  • 00:20:59
    then we can see that now we get the
  • 00:21:01
    email value of these first two rows now
  • 00:21:03
    I didn't show this with iloc, but we can
  • 00:21:06
    also pass in a list for the columns as
  • 00:21:08
    well so if I want the last name and the
  • 00:21:11
    email for these rows then instead of
  • 00:21:14
    just passing in a string as this second
  • 00:21:17
    value here then we can pass in a list of
  • 00:21:20
    strings of the columns that we want so
  • 00:21:22
    I'm gonna wrap this in brackets here I
  • 00:21:25
    know that this can get a little
  • 00:21:27
    confusing with all these inner brackets
  • 00:21:28
    but let's say that we want email and we
  • 00:21:32
    want last name so if I run this then now
  • 00:21:36
    we can see that we got these specific
  • 00:21:38
    columns here email and last name for
  • 00:21:40
    these specific rows the row with label 0
  • 00:21:43
    and the row with the label of 1 and also
  • 00:21:46
    notice that the columns display in the
  • 00:21:50
    order that we used in our list up here
  • 00:21:52
    within loc, which is a different order
  • 00:21:56
    from our original data frame so up here
  • 00:21:58
    it's first, last, email, but we asked for
  • 00:22:01
    email and last, and it gave them back in
  • 00:22:04
    that order of email and last. Okay, so now
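
A minimal sketch of the loc calls just described, again on made-up sample data. The column labels first, last, and email are taken from the walkthrough; the values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with a default integer index (labels 0, 1, 2).
df = pd.DataFrame({
    "first": ["Ada", "Grace", "Alan"],
    "last": ["Lovelace", "Hopper", "Turing"],
    "email": ["ada@example.com", "grace@example.com", "alan@example.com"],
})

# A single row label returns that row as a Series.
row_zero = df.loc[0]

# A list of row labels plus one column label returns a Series of emails.
emails = df.loc[[0, 1], "email"]

# A list of column labels returns a DataFrame whose columns follow the
# order of the list, not the order of the original DataFrame.
subset = df.loc[[0, 1], ["email", "last"]]

print(row_zero)
print(emails)
print(subset)
```

With the default index, the row labels happen to equal the integer positions, which is why these calls look so similar to their iloc counterparts.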
  • 00:22:07
    that we've seen the basics of grabbing
  • 00:22:08
    certain rows and columns from a small
  • 00:22:11
    data set now let's go back to our data
  • 00:22:13
    set from the last video and
  • 00:22:15
    see how we grab some rows and columns
  • 00:22:17
    from the Stack Overflow data set so I'm
  • 00:22:20
    gonna go back over here to our pandas
  • 00:22:22
    demo here and again just a quick
  • 00:22:25
    overview of the data that we have here
  • 00:22:26
    we're importing pandas we have DF as our
  • 00:22:30
    main survey results here, schema_df
  • 00:22:32
    as our schema results we are setting
  • 00:22:36
    some options here this is what our main
  • 00:22:38
    data frame head looks like which is the
  • 00:22:41
    first five rows and then this is what
  • 00:22:43
    our schema looks like so I'm going to go
  • 00:22:45
    down below our schema here and now let's
  • 00:22:48
    mess around with this a little bit so
  • 00:22:50
    let's go over a bit of what we learned
  • 00:22:51
    and pluck out certain rows and columns
  • 00:22:53
    but first let's see how many rows and
  • 00:22:56
    columns that we have in this data frame
  • 00:22:58
    now we saw a couple of different
  • 00:23:00
    ways to do this in the last video but
  • 00:23:02
    the easiest way to do this is to use the
  • 00:23:04
    shape attribute, so if I say df.shape
  • 00:23:07
    and run this then we can see that we
  • 00:23:09
    have 88,000 rows and 85 columns so let's
  • 00:23:14
    grab all of the responses for the
  • 00:23:17
    Hobbyist column, so again what I'm trying
  • 00:23:20
    to do here is if we look at our main
  • 00:23:22
    data frame I want to grab all of the
  • 00:23:24
    responses for this column right here
  • 00:23:27
    Hobbyist. Okay, so how would we do that?
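
As a rough illustration of the shape check mentioned a moment ago, here is a tiny stand-in for the survey DataFrame. The column names are assumed from how they are spoken in the video, and the values are invented; the real data set is far larger.

```python
import pandas as pd

# Hypothetical stand-in for the survey results (invented values).
df = pd.DataFrame({
    "Hobbyist": ["Yes", "No", "Yes"],
    "OpenSourcer": ["Never", "Often", "Never"],
    "OpenSource": ["Higher", "Lower", "Same"],
    "Employment": ["Full-time", "Student", "Full-time"],
    "Country": ["Canada", "India", "Brazil"],
})

# shape is an attribute, not a method: (number of rows, number of columns).
n_rows, n_cols = df.shape
print(n_rows, n_cols)

# columns lists the available column labels.
print(list(df.columns))
```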
  • 00:23:30
    now if you remember if you want to see
  • 00:23:34
    what columns are available then you
  • 00:23:36
    could just say df.columns to see all
  • 00:23:39
    of these we can see that these are kind
  • 00:23:41
    of long we have 85 here but here we have
  • 00:23:44
    Hobbyist, which is the one that we want,
  • 00:23:45
    and that is the question where people
  • 00:23:47
    answered if they code as a hobby or not
  • 00:23:50
    and in the next video we're going to
  • 00:23:52
    cover indexes, and I'll show how we can, you
  • 00:23:56
    know search a schema data frame to find
  • 00:23:59
    exact questions so that we can see what
  • 00:24:03
    questions are what specific columns in
  • 00:24:05
    the data frame but right now let's just
  • 00:24:07
    grab those hobbyist responses so if you
  • 00:24:10
    remember from that small data set that
  • 00:24:12
    we just saw in order to grab that
  • 00:24:15
    hobbyist column we can just access that
  • 00:24:18
    like a key so if I say DF and then pass
  • 00:24:22
    in 'Hobbyist' there, then we get a Series
  • 00:24:24
    of all of those responses and luckily
  • 00:24:27
    that doesn't display the entire 89
  • 00:24:29
    thousand rows in our browser here, but
  • 00:24:32
    we do get the head and the tail of that
  • 00:24:34
    data to get an idea of what those
  • 00:24:36
    responses look like now real quick let
  • 00:24:39
    me show you something that we'll cover
  • 00:24:40
    more of further into the series but I
  • 00:24:43
    want to give you an idea of how powerful
  • 00:24:45
    something like pandas is so let's say
  • 00:24:48
    that we wanted to know how many of these
  • 00:24:50
    responses were answered yes and how many
  • 00:24:53
    were answered no now if we were using
  • 00:24:55
    regular Python then we might import the
  • 00:24:58
    Counter class or write a quick function
  • 00:25:00
    or a loop to do this but pandas has so
  • 00:25:03
    much of this stuff already built in so
  • 00:25:05
    to get the count of unique values in
  • 00:25:08
    this column I can just use this value
  • 00:25:11
    counts method to calculate this so right
  • 00:25:13
    up here I can just tack on a method of
  • 00:25:16
    value_counts. Now again, this
  • 00:25:20
    is going to be for a future video but I
  • 00:25:22
    just want to give you an idea of what
  • 00:25:24
    pandas can do so whenever I add this
  • 00:25:27
    value_counts method, we can see that out
  • 00:25:29
    of this series that we returned here for
  • 00:25:32
    all of our answers for this hobbyist
  • 00:25:34
    question the value counts are seventy
  • 00:25:37
    one thousand people said yes they do
  • 00:25:40
    code as a hobby and about eighteen
  • 00:25:43
    thousand said no they don't code as a
  • 00:25:45
    hobby and again we'll cover more of this
  • 00:25:46
    in future videos when we learn more
  • 00:25:48
    about analyzing data in depth but I
  • 00:25:50
    wanted to give you a quick taste as to
  • 00:25:52
    why it's beneficial to even learn pandas
  • 00:25:55
    like we're doing here it makes this type
  • 00:25:57
    of stuff really easy and we could go
  • 00:26:00
    further and plot that out and everything
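
The comparison above can be sketched as follows, with a handful of invented answers standing in for the real Hobbyist column. The Counter version is shown only to illustrate the plain-Python approach the speaker alludes to.

```python
from collections import Counter

import pandas as pd

# Hypothetical answers standing in for the survey's Hobbyist column.
df = pd.DataFrame({"Hobbyist": ["Yes", "No", "Yes", "Yes", "No"]})

# Access the column like a dictionary key; the result is a Series.
hobbyist = df["Hobbyist"]

# pandas: one method call counts each unique value, most frequent first.
counts = hobbyist.value_counts()
print(counts)

# Plain Python equivalent using collections.Counter on the raw values.
plain_counts = Counter(hobbyist.tolist())
print(plain_counts)
```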
  • 00:26:02
    okay but with that quick sidetrack out
  • 00:26:05
    of the way let's keep going and go over
  • 00:26:08
    the other things that we learned earlier
  • 00:26:10
    so we got a column here so let me get
  • 00:26:14
    rid of that value_counts. So we have our
  • 00:26:16
    column here so now let's grab a specific
  • 00:26:18
    row and a specific column so let's grab
  • 00:26:22
    the first row and we'll also grab that
  • 00:26:24
    same hobbyist column for that row so how
  • 00:26:27
    do we grab rows so remember if we want
  • 00:26:30
    to grab rows then we use the loc or
  • 00:26:33
    iloc indexers, so I'm going to
  • 00:26:36
    go ahead and use loc because remember
  • 00:26:38
    that that's the one that allows me to
  • 00:26:40
    use labels, and I'm going to use a label
  • 00:26:43
    instead of
  • 00:26:43
    an integer for the Hobbyist column name
  • 00:26:46
    now again since we're just using a
  • 00:26:48
    default index and we can see the indexes
  • 00:26:51
    here 0 1 2 3 4 since we're just using a
  • 00:26:54
    default index instead of a custom one
  • 00:26:56
    our current labels for our indexes are
  • 00:26:59
    just a range of values from 0 to 88,000
  • 00:27:03
    something so in order to get the first
  • 00:27:07
    row I can say df.loc and pass in
  • 00:27:10
    that label of that first index which in
  • 00:27:14
    this case is just a 0 and these are all
  • 00:27:18
    of the responses from the first
  • 00:27:20
    respondent so this is one person's
  • 00:27:23
    entire survey results here now if we
  • 00:27:27
    wanted to see their results for just
  • 00:27:30
    that hobbyist question then remember
  • 00:27:33
    within the brackets here I can pass in a
  • 00:27:36
    second value for the columns that I
  • 00:27:39
    would like, so if I pass in 'Hobbyist' then
  • 00:27:41
    we can see that their answer to that
  • 00:27:43
    whether they code as a hobby is yes and
  • 00:27:46
    also like we saw earlier I can also pass
  • 00:27:49
    in a list of multiple rows or multiple
  • 00:27:51
    columns to get the exact rows and
  • 00:27:54
    columns that we want to see so to get
  • 00:27:56
    the first three responses for the
  • 00:27:58
    Hobbyist column, then instead of just
  • 00:28:01
    passing in a single value here then I
  • 00:28:03
    can put in some inner brackets here and
  • 00:28:05
    pass in a list of multiple rows so if I
  • 00:28:09
    pass in a list of three rows here and
  • 00:28:13
    run this then these are the first three
  • 00:28:16
    results for that Hobbyist column. Now one
  • 00:28:19
    thing that we haven't seen yet is that
  • 00:28:21
    we can also use slicing to grab multiple
  • 00:28:23
    rows and columns as well now if you're
  • 00:28:26
    familiar with list slicing then this is
  • 00:28:29
    pretty much the same thing the only
  • 00:28:31
    difference is that our last value is
  • 00:28:33
    going to be inclusive
  • 00:28:35
    at least with loc. So if we wanted the
  • 00:28:38
    first three rows then we could say that
  • 00:28:41
    we want from 0 and then slice to the
  • 00:28:45
    index of 2 and if I run this oops and I
  • 00:28:49
    accidentally made a mistake here
  • 00:28:51
    actually whenever we're using slicing we
  • 00:28:54
    do not wrap these in brackets
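
A small sketch of the slicing behaviour, using an invented stand-in for the survey data. The column names are assumed from how they are spoken in the video. The last line shows that the same slice syntax also works on column labels.

```python
import pandas as pd

# Hypothetical stand-in for the survey results (invented values).
df = pd.DataFrame({
    "Hobbyist": ["Yes", "No", "Yes", "No"],
    "OpenSourcer": ["Never", "Often", "Never", "Sometimes"],
    "OpenSource": ["Higher", "Lower", "Same", "Higher"],
    "Employment": ["Full-time", "Student", "Full-time", "Part-time"],
    "Country": ["Canada", "India", "Brazil", "Kenya"],
})

# A list of labels goes in inner brackets.
by_list = df.loc[[0, 1, 2], "Hobbyist"]

# A slice is written without inner brackets; with loc the end label is
# included, so 0:2 selects labels 0, 1, and 2.
by_slice = df.loc[0:2, "Hobbyist"]

# For contrast, iloc slices by position like a Python list, so 0:2
# stops before position 2 and returns only two rows.
by_position = df.iloc[0:2]

# Label slices work on columns too, and the end column is also included.
col_range = df.loc[0:2, "Hobbyist":"Employment"]

print(by_slice)
print(col_range)
```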
  • 00:28:57
    so I'm gonna take that out so for our
  • 00:28:59
    first value we're just saying we're no
  • 00:29:02
    longer passing in a list of values we're
  • 00:29:04
    just passing in this slice of zero and
  • 00:29:06
    then colon 2 so if I run that then we
  • 00:29:10
    can see that now we get the same result
  • 00:29:12
    that we got before and we can do this
  • 00:29:14
    with the columns as well so right now
  • 00:29:16
    we're only getting the Hobbyist column,
  • 00:29:18
    but let's go back and look at our
  • 00:29:20
    columns and see what columns come after
  • 00:29:22
    the Hobbyist column. So up here these are
  • 00:29:25
    all of our columns here where we printed
  • 00:29:27
    them out so let's look at a few columns
  • 00:29:29
    after Hobbyist here, so we have OpenSourcer,
  • 00:29:31
    OpenSource, Employment, so
  • 00:29:34
    let's say that we wanted to get all of
  • 00:29:36
    the columns from Hobbyist all the way up
  • 00:29:39
    to this Employment column, so to
  • 00:29:41
    do that I'm just gonna copy that we can
  • 00:29:44
    come down here and we can just pass in a
  • 00:29:48
    colon and then 'Employment', and that'll do
  • 00:29:51
    a slice from Hobbyist to Employment. Now
  • 00:29:54
    I also want to point out that this is
  • 00:29:56
    the reason that slicing is inclusive for
  • 00:30:01
    these values because imagine how much of
  • 00:30:03
    a pain it would be if we wanted all of
  • 00:30:06
    the columns from hobbyist to employment
  • 00:30:08
    but the last value here wasn't inclusive
  • 00:30:12
    and we had to come up here and say well
  • 00:30:13
    if I want from Hobbyist to Employment
  • 00:30:15
    then I really need to pass in you know
  • 00:30:18
    Hobbyist to Country, and Country's not
  • 00:30:20
    inclusive that would just be way too
  • 00:30:23
    confusing so it's so much easier for
  • 00:30:25
    this to be inclusive here so if you are
  • 00:30:28
    wondering why they did that then that's
  • 00:30:30
    why they do it so if I run this then we
  • 00:30:33
    can see that now we get these first
  • 00:30:36
    three rows here and for the first three
  • 00:30:38
    rows we get all of those responses for
  • 00:30:42
    the columns of Hobbyist, OpenSourcer,
  • 00:30:44
    all the way up to employment so now
  • 00:30:47
    we've seen an overview of everything
  • 00:30:48
    that we've learned about exploring our
  • 00:30:50
    data frames and series objects so far
  • 00:30:53
    and how we can pluck some you know basic
  • 00:30:55
    information out of these now there's
  • 00:30:57
    still tons to learn about data frames
  • 00:30:59
    and series objects and we'll continue
  • 00:31:01
    learning more about these
  • 00:31:03
    throughout the pandas series since these
  • 00:31:05
    two data types are the main data types
  • 00:31:07
    that we'll be using in pandas, so we'll
  • 00:31:09
    be learning more about advanced
  • 00:31:11
    filtering queries how to see which data
  • 00:31:14
    type each column of our data contains
  • 00:31:17
    and a lot more now before we end here
  • 00:31:19
    I do want to mention that we do have a
  • 00:31:21
    sponsor for this video and that is
  • 00:31:23
    Brilliant.org.
  • 00:31:23
    Brilliant is a problem-solving
  • 00:31:25
    website that helps you understand
  • 00:31:27
    underlying concepts by actively working
  • 00:31:29
    through guided lessons and brilliant
  • 00:31:31
    would be an excellent way to supplement
  • 00:31:32
    what you learn here with their hands-on
  • 00:31:34
    courses they have some excellent courses
  • 00:31:36
    and lessons on data science that do a
  • 00:31:38
    deep dive on how to think about and
  • 00:31:40
    analyze data correctly so if you're
  • 00:31:42
    watching my pandas series because you're
  • 00:31:44
    getting into the data science field then
  • 00:31:46
    I would highly recommend also checking
  • 00:31:47
    out brilliant and seeing what other data
  • 00:31:49
    science skills you can learn they even
  • 00:31:51
    use Python in their statistics course
  • 00:31:53
    and will quiz you on how to correctly
  • 00:31:55
    analyze the data within the language
  • 00:31:57
    their guided lessons will challenge
  • 00:31:58
    you but you'll also have the ability to
  • 00:32:00
    get hints or even solutions if you need
  • 00:32:02
    them it's really tailored towards
  • 00:32:04
    understanding the material so to support
  • 00:32:06
    my channel and learn more about
  • 00:32:07
    Brilliant you can go to brilliant.org/cms
  • 00:32:09
    to sign up for free, and
  • 00:32:12
    also the first 200 people to go to that
  • 00:32:14
    link will get 20% off the annual premium
  • 00:32:17
    subscription and you can find that link
  • 00:32:19
    in the description section below
  • 00:32:20
    again that's brilliant.org/cms.
  • 00:32:23
    Okay, so I think that's gonna
  • 00:32:27
    do it for this pandas video I hope you
  • 00:32:29
    feel like you got a good introduction to
  • 00:32:31
    the data frame and series objects and
  • 00:32:33
    how to navigate through some of your
  • 00:32:35
    data now like I said there's a lot more
  • 00:32:37
    to learn about these data types and some
  • 00:32:40
    advanced filtering that we'll learn in
  • 00:32:42
    future videos so be sure to stick around
  • 00:32:44
    for that now in the next video we're
  • 00:32:46
    going to be learning more about indexes
  • 00:32:48
    so we saw basic default indexes in this
  • 00:32:51
    video but we'll learn how to set the
  • 00:32:53
    index to specific columns and the
  • 00:32:55
    benefits of doing that in the next video
  • 00:32:57
    but if anyone has any questions about
  • 00:32:59
    what we covered here then feel free to
  • 00:33:00
    ask in the comment section below and
  • 00:33:02
    I'll do my best to answer those and if
  • 00:33:04
    you enjoyed these tutorials and would
  • 00:33:05
    like to support them then there are
  • 00:33:06
    several ways you can do that the easiest
  • 00:33:08
    way is to simply like the video and give
  • 00:33:10
    it a thumbs up and also it's a huge help
  • 00:33:12
    to share these videos with anyone who
  • 00:33:13
    you think would find them useful
  • 00:33:14
    and if you have the means you can
  • 00:33:16
    contribute through Patreon, and there's a
  • 00:33:17
    link to that page in the description
  • 00:33:18
    section below be sure to subscribe for
  • 00:33:20
    future videos and thank you all for
  • 00:33:21
    watching
Tags
  • Pandas
  • DataFrame
  • Series
  • Python
  • Data Analysis
  • loc
  • iloc
  • value_counts
  • Indexes
  • Data Science