00:00:00
hey there how's it going everybody in
00:00:01
this video we're gonna continue learning
00:00:02
more about pandas and specifically we're
00:00:04
going to be learning about the data
00:00:06
frame and series data types so like I
00:00:08
said in the last video these are
00:00:10
basically the backbone of pandas and are
00:00:12
the two primary data types that you'll
00:00:14
likely be using the most so in this
00:00:16
video we're gonna go over how we can
00:00:18
think of data frames and series data
00:00:20
types in a different way and then we'll
00:00:22
look at the basics of getting
00:00:24
information from these data types now I
00:00:26
would like to mention that we do have a
00:00:27
sponsor for this series of videos and
00:00:29
that is brilliant.org so I really want
00:00:31
to thank brilliant for sponsoring the
00:00:32
series and it would be great if you all
00:00:34
can check them out using the link in the
00:00:35
description section below and support
00:00:37
the sponsors and I'll talk more about
00:00:38
their services in just a bit so with
00:00:40
that said let's go ahead and get started
00:00:42
okay so first let's look at what a data
00:00:45
frame is and then we'll learn more about
00:00:47
how we can think about this in terms of
00:00:50
a Python object so we saw data frames
00:00:52
briefly in our last video when we checked
00:00:55
to make sure that our data was loaded in
00:00:57
correctly so these were the objects that
00:01:00
were displayed in Jupyter as rows and
00:01:03
columns basically a table so let's take
00:01:06
a look at what this looks like so if you
00:01:08
were following along with the last video
00:01:09
this is basically the same Jupyter
00:01:12
notebook that I had before except this
00:01:15
has just been cleaned up a bit so we're
00:01:17
importing pandas here we are reading in
00:01:20
our CSV files so one is just our main
00:01:23
data frame for our survey results one is
00:01:26
our schema data frame for the schema
00:01:29
results and then we are setting some
00:01:31
options here where we have the max
00:01:33
columns set to 85 so we can see all the
00:01:36
columns and the max rows set to 85 so
00:01:38
that we can see all of the schema now if
00:01:41
you haven't been following along with
00:01:42
the video so far then I do have a link
00:01:44
in the description section below that
00:01:46
links to where you can download this
00:01:47
data and follow along with this
00:01:50
okay so this is a data frame here so
00:01:54
where we are printing out df.head
00:01:56
this is what this returns so this here
00:01:59
is the first five rows of our data frame
00:02:03
so you can see that a data frame is made
00:02:05
up of multiple rows here and we also
00:02:08
have multiple columns so in the case of
00:02:10
this data
00:02:11
these are survey results
00:02:13
but your data can be you know whatever
00:02:16
your data is but it's most likely going
00:02:18
to be in rows and columns kind of like a
00:02:21
table so for this data with these being
00:02:24
survey results each row is
00:02:27
one person who answered the survey and
00:02:30
each column was their answer for that
00:02:33
question on the survey so for example
00:02:36
this respondent number one here they
00:02:38
answered that yes they were a hobbyist
00:02:40
and if you want to know what hobbyist
00:02:42
means then just like we saw in the
00:02:45
last video we can look at our schema
00:02:47
data frame so let me go ahead and print
00:02:49
this out here and let's look at this so
00:02:53
if I look at what a hobbyist is then we
00:02:56
can see that that question was do you
00:02:58
code as a hobby so that's what this data
00:03:01
is and that kind of gives us an idea of
00:03:03
what a data frame is basically a data
00:03:05
frame is just rows and columns but now
00:03:09
let me explain how I like to think of
00:03:11
data frames using native Python so if we
00:03:14
were only using Python and not using
00:03:16
pandas to store information in rows and
00:03:18
columns
00:03:19
then how would we do this well for those
00:03:22
of you familiar with dictionaries you
00:03:24
might think that it's a good idea to
00:03:25
store information that way so let me
00:03:27
pull up a new notebook here that I have
00:03:30
open here with some snippets and let's
00:03:33
take a look at this okay so let's look
00:03:35
at this first cell here so a lot of us
00:03:37
are probably familiar with Python
00:03:39
dictionaries where we have keys and
00:03:41
values so if I'm representing some data
00:03:43
in this example it's a person then we
00:03:46
can use a dictionary so first off I have
00:03:49
a key of first which is going to be the
00:03:51
first name and then that has a value of
00:03:54
Corey and then we also have keys and
00:03:57
values for the last name and the email
00:03:59
as well okay so this dictionary here
00:04:02
represents data for a single person but
00:04:05
how would we represent data for multiple
00:04:07
people well there are probably a couple
00:04:09
of different ways that we can do this
00:04:10
but the way that I like to think of this
00:04:12
in terms of learning pandas is to make
00:04:15
all of our values in our dictionary a
00:04:17
list so let's take a look in the second
00:04:21
cell here to see what this would look
00:04:22
like so here in the second cell now we
00:04:25
can see that we have a pretty similar
00:04:26
dictionary
00:04:27
to what we had above but now instead of
00:04:31
just a single string here for the values
00:04:33
I instead have a list and our list
00:04:36
currently just has one person but now
00:04:39
since this is a list we can add more
00:04:42
first names and information in here so
00:04:44
the first value of our list is going to
00:04:47
be our first person so if I go to the
00:04:51
third cell down here at the bottom then
00:04:53
now we can use this as an example to see
00:04:55
what this would look like with multiple
00:04:58
people so the second value in our list
00:05:01
will be our second person and the third
00:05:04
value in the list will be our third
00:05:06
person so if we look here we have people
00:05:08
we have a key of first so if we want the
00:05:12
second person here we go to the second
00:05:13
value that's Jane the last name is Doe
00:05:16
and the email go to the second value
00:05:19
here is JaneDoe@email.com
00:05:21
if you want the third person that would
00:05:23
be John and then third value in last
00:05:25
would be Doe then third value in email
00:05:28
is JohnDoe@email.com
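A minimal sketch of the dictionary of lists described above. The first person's values are taken from the ones read out later in the video; treat the exact strings as stand-in data rather than the notebook's actual contents.

```python
# Stand-in data: each key plays the role of a column,
# and position i across the lists plays the role of row i.
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
}

# Reading one "row" means taking the same position from every list.
second_person = {key: values[1] for key, values in people.items()}
third_person = {key: values[2] for key, values in people.items()}

print(second_person)
print(third_person)
```

Nothing ties the three lists together here except their positions, which is part of what a data frame adds on top of this idea.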
00:05:29
so we can kind of think of this like
00:05:31
rows and columns the keys are the
00:05:33
columns and the values are the rows now
00:05:36
if you look up the definition of a
00:05:38
pandas data frame online then you'll
00:05:40
see a lot of definitions that just say
00:05:42
something like it's a two dimensional
00:05:45
data structure now that might sound a
00:05:47
little confusing but in layman's terms
00:05:49
that basically just means rows and
00:05:51
columns okay so like I said here the key
00:05:53
for email here would be our email column
00:05:56
and contain all of the email values and
00:06:00
if we wanted to see the email column
00:06:03
then we can just access that key so if I
00:06:07
come down here into actually let me run
00:06:11
all of these really quick here I think I
00:06:14
open this up without running these so I
00:06:15
want to make sure that we have this
00:06:17
registered okay so if I wanted to see
00:06:19
that email column then I can simply say
00:06:22
people and then access that email key if
00:06:25
I run that then we can see that we got
00:06:27
all of the emails now the reason that I
00:06:29
wanted to show you this is because I
00:06:30
feel like this really helped me in terms
00:06:33
of how I think about data frames so data
00:06:35
frames are very similar to this but with
00:06:37
more functionality than what we have
00:06:39
here in standard
00:06:40
Python now we can actually create a data
00:06:43
frame from this dictionary and see what
00:06:45
this looks like
00:06:46
so let's do that and look at some basic
00:06:48
data frame functionality and then we'll
00:06:50
look at this more using the Stack
00:06:52
Overflow data from the last video so
00:06:55
here in this bottom cell in order to
00:06:57
create a data frame from the information
00:06:59
that we have here I'm going to go ahead
00:07:01
and import pandas so I'm going to say
00:07:03
import pandas as pd and now we can
00:07:07
create a data frame actually using this
00:07:09
dictionary that we have up here so to do
00:07:12
that I can just say df is equal to
00:07:15
pd.DataFrame and check the casing
00:07:19
there that's a capital D and a capital F
00:07:21
and then we'll just pass in that
00:07:24
dictionary that has values as lists so
00:07:29
if I run this and that seemed to run
00:07:31
okay without any errors and now let me
00:07:34
just print out DF here and if I print
00:07:36
that out then we can see that now our
00:07:38
data frame is representing this in a way
00:07:41
to where we do have rows and columns
00:07:43
that we can visualize so we get these
00:07:45
people printed out in a nice table of
00:07:47
rows and columns now we also have these
00:07:50
over here to the far left that don't
00:07:53
have column names this 0 1 and 2 now this
00:07:56
is an index now I'm not going to go too
00:07:59
much into indexes right now because
00:08:01
that's what the next video is going to
00:08:02
cover but basically it's a unique value
00:08:05
for our rows now it doesn't need to be
00:08:07
unique but again we'll talk more about
00:08:10
that in the video specifically on
00:08:12
indexes so now that we have a bit of an
00:08:14
idea of how to think about data frames
00:08:16
now let's take a look at how to access
00:08:19
information here within the data frame
00:08:21
so first let's just access the values of
00:08:24
a single column so just like we did with
00:08:26
the dictionary we can access a single
00:08:29
column just like we were accessing the
00:08:32
key of a dictionary so just like I did
00:08:35
people and email up here I can do very
00:08:38
similar down here and just say that I
00:08:41
want that email column of my data frame
00:08:43
now that's not actually a key that is
00:08:47
going to access the column of a data
00:08:49
frame but we can see here that we get
00:08:51
all of the emails back from that data
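A short sketch of the steps just walked through: building a data frame from the dictionary of lists and pulling out one column with brackets. The values are stand-in data assumed for illustration.

```python
import pandas as pd

# Stand-in data in the same dictionary-of-lists shape used in the video.
people = {
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
}

# Note the casing: capital D and capital F.
df = pd.DataFrame(people)

# Bracket access looks like a dictionary lookup but returns a pandas object.
emails = df["email"]

print(df)      # table of rows and columns with a default 0, 1, 2 index
print(emails)  # the email values, shown with the same index
```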
00:08:54
so again I do want to emphasize that I
00:08:56
only use the pure Python example so that
00:09:00
we could get an idea of how to think
00:09:01
about a data frame but like I said a
00:09:04
data frame is much much more than just a
00:09:06
dictionary of lists so for example we
00:09:09
can see that when we displayed the email
00:09:11
column here it doesn't look the same as
00:09:14
when we displayed the list of values
00:09:17
from that dictionary and that's because
00:09:19
this is actually returning a series and
00:09:22
we can see this if we check the type so
00:09:27
if I check the type of this email column
00:09:32
here so let me run that we can see that
00:09:36
this is pandas.core.series.Series so
00:09:39
this is a series object so what is a
00:09:42
series so a series is still basically a
00:09:45
list of data but just like with a data
00:09:48
frame it has a lot more functionality
00:09:50
than just that now if you look up the
00:09:53
definition of a series online then
00:09:55
you'll see a lot of definitions that
00:09:56
just say it's a one-dimensional array
00:09:58
and that might sound a little confusing
00:10:00
but in layman's terms that basically
00:10:03
just means that it's rows of data so
00:10:06
again you can think of a data frame as
00:10:08
being rows and columns and a series as
00:10:11
being rows of a single column so a data
00:10:16
frame is basically a container for
00:10:18
multiple of these series objects so
00:10:21
again that's important so let me go over
00:10:23
that one more time so we can see that a
00:10:25
data frame here is two-dimensional
00:10:27
because it has rows and columns so we
00:10:29
can see here that it has you know first
00:10:32
name last name email now whenever we
00:10:34
access just the email then we can see
00:10:37
that we get all these emails here now
00:10:39
this is a series and I said that a data
00:10:42
frame is basically a container
00:10:44
for multiple series objects so we can
00:10:48
think of this email column here as a
00:10:49
series this last column here is a series
00:10:52
and this first column as a series and
00:10:54
also we can see where we printed out
00:10:57
this series here for the emails we can
00:11:00
see that this series also has an index
00:11:02
as well just like our data frame did so
00:11:04
this index is over here on the left the
00:11:06
0 1
00:11:07
2 okay so we can access a single column
00:11:10
of a data frame like we're accessing a
00:11:13
key just like we did here in this cell
00:11:17
but you might also see some people use
00:11:20
dot notation to do the same thing so you
00:11:22
might see some people do it like this so
00:11:25
they might do df.email and if I run
00:11:28
this cell then we can see that let me
00:11:32
get rid of this cell here and just so we
00:11:35
can compare these two we can see that
00:11:37
this gives us the same thing whether we
00:11:40
access this like a key or whether we use
00:11:42
dot notation this returns the same
00:11:45
series object of the email values now
00:11:48
whichever way that you want to do this
00:11:49
is really just a personal preference I
00:11:51
actually prefer the first way of using
00:11:54
the brackets and there are a couple of
00:11:57
reasons that I prefer to use that over
00:11:59
dot notation first is that I like using
00:12:02
the brackets because there's a chance
00:12:05
that one of your columns is named the
00:12:07
same thing as one of the attributes or
00:12:10
methods of a data frame and if that's
00:12:12
the case then using the dot notation
00:12:14
might give you some errors so for
00:12:17
example a data frame has
00:12:20
a method called count so if you had a
00:12:23
column named count and you
00:12:27
were trying to access that count column
00:12:29
using dot notation then that's actually
00:12:32
going to access the count method from
00:12:36
data frame instead of that count column
00:12:39
so that actually wouldn't work how we
00:12:41
did it here if you wanted to access the
00:12:43
actual column called count which we
00:12:46
don't have one in this specific data
00:12:48
frame but if we did then we would have
00:12:50
to access it like this so that's kind of
00:12:52
why I prefer brackets so I'm going to be
00:12:55
using brackets throughout this series
00:12:59
but I wanted you to know about dot
00:13:00
notation because if you're working with
00:13:02
other people using pandas then you might
00:13:04
see them access columns using dot
00:13:07
notation so you need to know that it's
00:13:10
at least a possibility and again that
00:13:12
doesn't mean that they're doing it wrong
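A small sketch of the name clash mentioned above. The frame and its column called count are hypothetical, made up only to show what happens.

```python
import pandas as pd

# Hypothetical frame whose second column shares its name with DataFrame.count.
df = pd.DataFrame({"name": ["a", "b"], "count": [3, 5]})

by_bracket = df["count"]  # the column, as a Series
by_dot = df.count         # the DataFrame.count method, not the column

print(type(by_bracket))
print(callable(by_dot))
```

With a column name that does not collide with an attribute or method, both forms return the same Series.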
00:13:13
it's just a personal preference I just
00:13:16
prefer using the brackets okay so I said
00:13:19
that data frames have a lot more
00:13:21
functionality than what we saw using you
00:13:24
know standard Python so let's look at
00:13:27
some other stuff that we can do here so
00:13:29
let's say that we wanted to access
00:13:30
multiple columns now in order to access
00:13:33
multiple columns we can use the bracket
00:13:35
notation and pass in a list of the
00:13:38
columns that we want so if I wanted both
00:13:40
the last name and email columns then we
00:13:44
could say DF and use our brackets just
00:13:47
like we saw before but now I'm going to
00:13:49
put in a set of inner brackets here as a
00:13:51
list of columns that I want to access so
00:13:55
for the first value
00:13:56
I'll put last for the last name and for
00:13:58
the second value I'll put email for the
00:14:00
email so if I run this then we can see
00:14:03
that now we have a data frame returned
00:14:06
here of the last column and the email
00:14:09
column now I want to emphasize again
00:14:11
here that I passed a list inside of
00:14:14
these brackets here
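A sketch of selecting several columns with a list inside the brackets, using assumed stand-in data. It also shows what happens on this frame when the inner brackets are left off.

```python
import pandas as pd

# Stand-in data assumed for illustration.
df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# Outer brackets index the frame; inner brackets are the list of column names.
subset = df[["last", "email"]]
print(type(subset))

# Without the inner list, pandas looks for one column named ("last", "email").
raised = False
try:
    df["last", "email"]
except KeyError as err:
    raised = True
    print("KeyError:", err)
```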
00:14:16
so there are two pairs of brackets you
00:14:19
can't leave off the inner brackets
00:14:21
because you'll likely get a key error
00:14:23
because pandas will think that you're
00:14:25
passing in both of those strings as a
00:14:27
single column name and another thing
00:14:30
that I want to point out here is that
00:14:32
now that we're getting multiple columns
00:14:35
this can no longer be a series because
00:14:38
remember a series is basically a single
00:14:40
column of rows so when we get multiple
00:14:44
columns like this
00:14:45
it's just returning another data frame
00:14:47
and in this case it's a filtered down
00:14:49
data frame with just these specific
00:14:52
columns so we filtered out the first
00:14:55
name column here and we just have the
00:14:57
last and the email okay so that's how we
00:14:59
get a specific column or multiple
00:15:01
columns and we can slice these as well
00:15:04
similar to how we slice a list
00:15:07
but I'll show that on our larger Stack
00:15:10
Overflow data set here in a second now
00:15:12
if you have a lot of columns and want to
00:15:15
see all of them easily then we can just
00:15:17
grab the columns specifically by saying
00:15:19
df.columns and we can run this and
00:15:24
we can see here that this gives us all
00:15:28
of our columns here so our columns are
00:15:30
an index of first last and email okay so
00:15:34
now we've seen how
00:15:35
to get a column but how would we get a
00:15:37
row so in order to get rows we can use
00:15:40
the loc and iloc indexers so that is
00:15:44
loc and iloc so let's take a look at
00:15:48
these so first let's take a look at
00:15:51
iloc so iloc allows us to access rows
00:15:54
by integer location hence the name iloc
00:15:58
is integer location so if I wanted to
00:16:00
get the first row then we can just say
00:16:03
df.iloc and then use brackets here
00:16:07
too since this is an indexer use
00:16:10
brackets and pass in a 0 and that will
00:16:13
give us the first row so if I run this
00:16:16
then we can see that the first row has a
00:16:19
first name of Corey last name of Schafer
00:16:21
and email of CoreyMSchafer at
00:16:23
gmail.com so what that did is it returns
00:16:26
a series that contains the values of
00:16:28
that first row of data which like I said
00:16:31
is the first name last name and email of
00:16:34
the first person in this example and
00:16:36
again we haven't discussed indexes yet
00:16:39
that will be in the next video but the
00:16:42
index here is the column names so that
00:16:45
we know what those values are so up here
00:16:49
our index was 0 1 and 2 but whenever we're
00:16:53
actually accessing a row it's going to
00:16:56
set that index to the column name so
00:16:58
that we know what those values are
00:16:59
because if this just said 0 1 and 2 then
00:17:02
we might not know what these are
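A sketch of fetching one row by integer position, on assumed stand-in data. The thing to notice is that the returned Series is labelled by the column names.

```python
import pandas as pd

# Stand-in data assumed for illustration.
df = pd.DataFrame({
    "first": ["Corey", "Jane", "John"],
    "last": ["Schafer", "Doe", "Doe"],
    "email": ["CoreyMSchafer@gmail.com", "JaneDoe@email.com", "JohnDoe@email.com"],
})

# iloc is an indexer, so it takes brackets rather than parentheses.
first_row = df.iloc[0]

print(first_row)
print(list(first_row.index))  # the column names become the index of the row
```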
00:17:04
and just like when we selected multiple
00:17:06
columns we can select multiple rows as
00:17:08
well by passing in a list of integers so
00:17:11
if I want the first and second row then we
00:17:15
can just say and again this is going to
00:17:17
be a pair of brackets within these
00:17:20
brackets because we're passing in a list
00:17:23
to our index here and I'm just going to
00:17:26
pass in a list of 0 and 1 so if I run this
00:17:30
then we can see that now we get the
00:17:32
first two rows of data and again be sure
00:17:35
to pass in an inner list inside those
00:17:38
brackets so that it does what you expect
00:17:40
it to do and also we can see that now
00:17:42
we're getting a data frame with these
00:17:44
multiple rows now with these iloc
00:17:48
and loc
00:17:48
indexers we can also select columns as
00:17:51
well and that is going to be the second
00:17:54
value that we pass into these outer
00:17:56
brackets so if we thought of iloc
00:17:59
and loc as functions then we can think
00:18:02
of the rows that we want as the first
00:18:04
argument and the columns as the second
00:18:07
argument so let me show you what this
00:18:08
looks like so here we have our inner
00:18:11
bracket those are the rows that we want
00:18:13
but now after that list we can put a
00:18:15
comma and now we can specify the column
00:18:19
that we want now with iloc we can't
00:18:21
specify an actual column name because
00:18:23
these use integers integer locations so
00:18:27
these are for integers only so remember
00:18:30
our first name is the first column the
00:18:33
last name is the second column and the
00:18:35
email is the third column so if we
00:18:37
wanted to grab the email address of the
00:18:40
first two rows then we can grab the
00:18:42
column at index 2 which will be the
00:18:46
third column since all of these start at
00:18:48
0 so if I was to pass in a 2 here and
00:18:51
run that then we can see that now we get
00:18:54
the email addresses of these first two
00:18:56
rows okay so that's iloc so now
00:18:59
let's look at loc so with iloc we were
00:19:02
searching by integer location with loc
00:19:05
we're going to be searching by label and
00:19:08
when we're talking about labels for rows
00:19:10
these will be the indexes and again we
00:19:13
don't have custom indexes right now so
00:19:15
this index is just a default range of
00:19:18
integers so at the moment this will
00:19:20
be somewhat similar to iloc the iloc
00:19:23
indexer but we'll look at uses or use
00:19:26
cases for loc with actual labels in the
00:19:29
next video when we cover indexes so real
00:19:32
quick let's look at our entire data
00:19:35
frame again so I'm just going to print
00:19:37
that out down here so like I said over
00:19:40
here on the far left these are our
00:19:42
indexes so these are the labels for that
00:19:45
row so if I want the first row then by
00:19:48
default this just has a label of 0 so I
00:19:51
can say df.loc and pass in a 0 there and
00:19:55
if I run that then we can see that we
00:19:58
get that row with that label of 0 and
00:20:00
again I know that that looks similar to
00:20:02
iloc at the moment but we'll see how to
00:20:04
use indexes with labels in the next
00:20:06
video and just like with iloc we can
00:20:10
also pass in a list to specify multiple
00:20:12
rows so if I wanted the first and second
00:20:15
row then just like with iloc I can pass
00:20:18
in an inner list here so let's say that
00:20:21
I want the first row and the second row
00:20:23
so I'll run that we can see that now we
00:20:26
get the first and the second row and
00:20:28
again now we can see that we are getting
00:20:30
a data frame back now that we have
00:20:33
multiple rows and just like with iloc
00:20:36
we can also pass in a second value into
00:20:39
our indexer to select specific columns
00:20:42
for these rows now with iloc we used
00:20:45
integers to select the columns but now
00:20:48
that we're using loc we can use labels
00:20:50
so if we want the email column of these
00:20:53
first two rows then now we can just pass
00:20:56
in a value of email so if I run that
00:20:59
then we can see that now we get the
00:21:01
email value of these first two rows now
00:21:03
I didn't show this with iloc but we can
00:21:06
also pass in a list for the columns as
00:21:08
well so if I want the last name and the
00:21:11
email for these rows then instead of
00:21:14
just passing in a string as this second
00:21:17
value here then we can pass in a list of
00:21:20
strings of the columns that we want so
00:21:22
I'm gonna wrap this in brackets here I
00:21:25
know that this can get a little
00:21:27
confusing with all these inner brackets
00:21:28
but let's say that we want email and we
00:21:32
want last name so if I run this then now
00:21:36
we can see that we got these specific
00:21:38
columns here email and last name for
00:21:40
these specific rows the row with label 0
00:21:43
and the row with the label of 1 and also
00:21:46
notice that the columns display in the
00:21:50
order that we used in our list up here
00:21:52
within loc which is a different order
00:21:56
from our original data frame so up here
00:21:58
it's first last email but we asked for
00:22:01
email and last and it gave us back in
00:22:04
that order of email and last okay so now
00:22:07
that we've seen the basics of grabbing
00:22:08
certain rows and columns from a small
00:22:11
data set now let's go back to our data
00:22:13
set from the last video and
00:22:15
see how we grab some rows and columns
00:22:17
from the Stack Overflow data set so I'm
00:22:20
gonna go over here to back to our pandas
00:22:22
demo here and again just a quick
00:22:25
overview of the data that we have here
00:22:26
we're importing pandas we have DF as our
00:22:30
main survey results here our schema DF
00:22:32
as our schema results we are setting
00:22:36
some options here this is what our main
00:22:38
data frame head looks like which is the
00:22:41
first five rows and then this is what
00:22:43
our schema looks like so I'm going to go
00:22:45
down below our schema here and now let's
00:22:48
mess around with this a little bit so
00:22:50
let's go over a bit of what we learned
00:22:51
and pluck out certain rows and columns
00:22:53
but first let's see how many rows and
00:22:56
columns that we have in this data frame
00:22:58
now we saw a couple of different
00:23:00
ways to do this in the last video but
00:23:02
the easiest way to do this is to use the
00:23:04
shape attribute so if I say df.shape
00:23:07
and run this then we can see that we
00:23:09
have 88,000 rows and 85 columns so let's
00:23:14
grab all of the responses for the
00:23:17
Hobbyist column so again what I'm trying
00:23:20
to do here is if we look at our main
00:23:22
data frame I want to grab all of the
00:23:24
responses for this column right here
00:23:27
Hobbyist okay so how would we do that
00:23:30
now if you remember if you want to see
00:23:34
what columns are available then you
00:23:36
could just say df.columns to see all
00:23:39
of these we can see that these are kind
00:23:41
of long we have 85 here but here we have
00:23:44
Hobbyist which is the one that we want
00:23:45
and that is the question where people
00:23:47
answered if they code as a hobby or not
00:23:50
and in the next video we're going to
00:23:52
cover indexes I'll show how we can you
00:23:56
know search a schema data frame to find
00:23:59
exact questions so that we can see what
00:24:03
questions are for what specific columns in
00:24:05
the data frame but right now let's just
00:24:07
grab those hobbyist responses so if you
00:24:10
remember from that small data set that
00:24:12
we just saw in order to grab that
00:24:15
hobbyist column we can just access that
00:24:18
like a key so if I say DF and then pass
00:24:22
in Hobbyist there then we get a series
00:24:24
of all of those responses and luckily
00:24:27
that doesn't display the entire 89
00:24:29
thousand rows in our browser here but
00:24:32
we do get the head and the tail of that
00:24:34
data to get an idea of what those
00:24:36
responses look like now real quick let
00:24:39
me show you something that we'll cover
00:24:40
more of further into the series but I
00:24:43
want to give you an idea of how powerful
00:24:45
something like pandas is so let's say
00:24:48
that we wanted to know how many of these
00:24:50
responses were answered yes and how many
00:24:53
were answered no now if we were using
00:24:55
regular Python then we might import the
00:24:58
counter class or write a quick function
00:25:00
or a loop to do this but pandas has so
00:25:03
much of this stuff already built in so
00:25:05
to get the count of unique values in
00:25:08
this column I can just use this value
00:25:11
counts method to calculate this so right
00:25:13
up here I can just tack on a method of
00:25:16
value underscore counts now again this
00:25:20
is going to be for a future video but I
00:25:22
just want to give you an idea of what
00:25:24
pandas can do so whenever I add this
00:25:27
value counts method we can see that out
00:25:29
of this series that we returned here for
00:25:32
all of our answers for this hobbyist
00:25:34
question the value counts are seventy
00:25:37
one thousand people said yes they do
00:25:40
code as a hobby and about eighteen
00:25:43
thousand said no they don't code as a
00:25:45
hobby and again we'll cover more of this
00:25:46
in future videos when we learn more
00:25:48
about analyzing data in depth but I
00:25:50
wanted to give you a quick taste as to
00:25:52
why it's beneficial to even learn pandas
00:25:55
like we're doing here it makes this type
00:25:57
of stuff really easy and we could go
00:26:00
further and plot that out and everything
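A sketch comparing the built-in value_counts method with the Counter approach mentioned above. The five answers are made-up stand-in data, not the survey's real responses.

```python
from collections import Counter

import pandas as pd

# Made-up answers standing in for the Hobbyist column.
hobbyist = pd.Series(["Yes", "No", "Yes", "Yes", "No"], name="Hobbyist")

# pandas: one method call returns the count of each unique value.
counts = hobbyist.value_counts()
print(counts)

# Plain Python: the same tally by iterating over the values.
manual = Counter(hobbyist)
print(manual)
```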
00:26:02
okay but with that quick sidetrack out
00:26:05
of the way let's keep going and go over
00:26:08
the other things that we learned earlier
00:26:10
so we got a column here so let me get
00:26:14
rid of that value counts so we have our
00:26:16
column here so now let's grab a specific
00:26:18
row and a specific column so let's grab
00:26:22
the first row and we'll also grab that
00:26:24
same hobbyist column for that row so how
00:26:27
do we grab rows so remember if we want
00:26:30
to grab rows that we use the loc or
00:26:33
iloc indexers so I'm going to
00:26:36
go ahead and use loc because remember
00:26:38
that that's the one that allows me to
00:26:40
use labels and I'm going to use a label
00:26:43
instead of
00:26:43
an integer for the Hobbyist column name
00:26:46
now again since we're just using a
00:26:48
default index and we can see the indexes
00:26:51
here 0 1 2 3 4 since we're just using a
00:26:54
default index instead of a custom one
00:26:56
our current labels for our indexes are
00:26:59
just a range of values from 0 to 88,000
00:27:03
something so in order to get the first
00:27:07
row I can say df.loc and pass in
00:27:10
that label of that first index which in
00:27:14
this case is just a 0 and these are all
00:27:18
of the responses from the first
00:27:20
respondent so this is one person's
00:27:23
entire survey results here now if we
00:27:27
wanted to see their results for just
00:27:30
that hobbyist question then remember
00:27:33
within the brackets here I can pass in a
00:27:36
second value for the columns that I
00:27:39
would like so if I pass in Hobbyist then
00:27:41
we can see that their answer to that
00:27:43
whether they code as a hobby is yes and
00:27:46
also like we saw earlier I can also pass
00:27:49
in a list of multiple rows or multiple
00:27:51
columns to get the exact rows and
00:27:54
columns that we want to see so to get
00:27:56
the first three responses for the
00:27:58
Hobbyist column then instead of just
00:28:01
passing in a single value here then I
00:28:03
can put in some inner brackets here and
00:28:05
pass in a list of multiple rows so if I
00:28:09
pass in a list of three rows here and
00:28:13
run this then these are the first three
00:28:16
results for that Hobbyist column now one
00:28:19
thing that we haven't seen yet is that
00:28:21
we can also use slicing to grab multiple
00:28:23
rows and columns as well now if you're
00:28:26
familiar with list slicing then this is
00:28:29
pretty much the same thing the only
00:28:31
difference is that our last value is
00:28:33
going to be inclusive
00:28:35
at least with loc so if we wanted the
00:28:38
first three rows then we could say that
00:28:41
we want from 0 and then slice to the
00:28:45
index of 2 and if I run this oops and I
00:28:49
accidentally made a mistake here
00:28:51
actually whenever we're using slicing we
00:28:54
do not wrap these in brackets
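To make that bracket rule concrete, here is a small sketch on a made-up stand-in data frame (not the real survey data). A list of labels goes in inner brackets, a slice does not, and unlike a plain Python list slice the end label of a loc slice is included.

```python
import pandas as pd

# Hypothetical stand-in data: five made-up answers, default index 0..4.
df = pd.DataFrame({"Hobbyist": ["Yes", "No", "Yes", "No", "Yes"]})

# A list of row labels needs its own inner brackets.
by_list = df.loc[[0, 1, 2], "Hobbyist"]

# A slice is written bare; df.loc[[0:2], "Hobbyist"] is a SyntaxError.
by_slice = df.loc[0:2, "Hobbyist"]

# With loc the end label 2 is included, so both give three rows.
print(by_list.tolist())   # ['Yes', 'No', 'Yes']
print(by_slice.tolist())  # ['Yes', 'No', 'Yes']

# A plain Python list slice stops before index 2.
answers = df["Hobbyist"].tolist()
print(answers[0:2])       # ['Yes', 'No']
```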
00:28:57
so I'm gonna take that out so for our
00:28:59
first value we're just saying we're no
00:29:02
longer passing in a list of values we're
00:29:04
just passing in this slice of zero and
00:29:06
then colon 2 so if I run that then we
00:29:10
can see that now we get the same result
00:29:12
that we got before and we can do this
00:29:14
with the columns as well so right now
00:29:16
we're only getting the Hobbyist column
00:29:18
but let's go back and look at our
00:29:20
columns and see what columns come after
00:29:22
the Hobbyist column so up here these are
00:29:25
all of our columns here where we printed
00:29:27
them out so let's look at a few columns
00:29:29
after Hobbyist here so we have OpenSourcer
00:29:31
OpenSource Employment so
00:29:34
let's say that we wanted to get all of
00:29:36
the columns from Hobbyist all the way up
00:29:39
to this Employment column so to
00:29:41
do that I'm just gonna copy that we can
00:29:44
come down here and we can just pass in a
00:29:48
colon and then Employment and that'll do
00:29:51
a slice from Hobbyist to Employment now
00:29:54
I also want to point out that this is
00:29:56
the reason that slicing is inclusive for
00:30:01
these values because imagine how much of
00:30:03
a pain it would be if we wanted all of
00:30:06
the columns from Hobbyist to Employment
00:30:08
but the last value here wasn't inclusive
00:30:12
and we had to come up here and say well
00:30:13
if I want from Hobbyist to Employment
00:30:15
then I really need to pass in you know
00:30:18
Hobbyist to Country and Country's not
00:30:20
inclusive that would just be way too
00:30:23
confusing so it's so much easier for
00:30:25
this to be inclusive here so if you are
00:30:28
wondering why they did that then that's
00:30:30
why they do it so if I run this then we
00:30:33
can see that now we get these first
00:30:36
three rows here and for the first three
00:30:38
rows we get all of those responses for
00:30:42
the columns of Hobbyist OpenSourcer
00:30:44
all the way up to Employment
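A minimal sketch of that label-based column slice, again on a made-up stand-in data frame. The values are invented, and the column order is assumed to mirror the one read off the notebook above. Both end labels are included, so Employment is kept and Country is left out.

```python
import pandas as pd

# Hypothetical stand-in for the survey data frame (made-up values).
df = pd.DataFrame({
    "Respondent": [1, 2, 3, 4],
    "Hobbyist": ["Yes", "No", "Yes", "No"],
    "OpenSourcer": ["Never", "Never", "Often", "Never"],
    "OpenSource": ["Higher", "Lower", "Same", "Same"],
    "Employment": ["Full-time", "Student", "Full-time", "Part-time"],
    "Country": ["A", "B", "C", "D"],
})

# Rows 0 through 2 and columns Hobbyist through Employment,
# with both ends of each slice included.
subset = df.loc[0:2, "Hobbyist":"Employment"]

print(subset.columns.tolist())
# ['Hobbyist', 'OpenSourcer', 'OpenSource', 'Employment']
print(subset.shape)  # (3, 4)
```

Slicing columns by label depends on column order, so the same slice on a data frame with reordered columns would return a different set of columns.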
00:30:47
we've seen an overview of everything
00:30:48
that we've learned about exploring our
00:30:50
data frames and series objects so far
00:30:53
and how we can pluck some you know basic
00:30:55
information out of these now there's
00:30:57
still tons to learn about data frames
00:30:59
and series objects and we'll continue
00:31:01
learning more about these
00:31:03
throughout the pandas series since these
00:31:05
two data types are the main data types
00:31:07
that we'll be using in pandas so we'll
00:31:09
be learning more about advanced
00:31:11
filtering queries how to see which data
00:31:14
type each column of our data contains
00:31:17
and a lot more now before we end here
00:31:19
I do want to mention that we do have a
00:31:21
sponsor for this video and that is
00:31:23
brilliant.org
00:31:23
brilliant is a problem-solving
00:31:25
website that helps you understand
00:31:27
underlying concepts by actively working
00:31:29
through guided lessons and brilliant
00:31:31
would be an excellent way to supplement
00:31:32
what you learn here with their hands-on
00:31:34
courses they have some excellent courses
00:31:36
and lessons on data science that do a
00:31:38
deep dive on how to think about and
00:31:40
analyze data correctly so if you're
00:31:42
watching my pandas series because you're
00:31:44
getting into the data science field then
00:31:46
I would highly recommend also checking
00:31:47
out brilliant and seeing what other data
00:31:49
science skills you can learn they even
00:31:51
use Python in their statistics course
00:31:53
and will quiz you on how to correctly
00:31:55
analyze the data within the language
00:31:57
their guided lessons will challenge
00:31:58
you but you'll also have the ability to
00:32:00
get hints or even solutions if you need
00:32:02
them it's really tailored towards
00:32:04
understanding the material so to support
00:32:06
my channel and learn more about
00:32:07
brilliant you can go to brilliant.org/cms
00:32:09
to sign up for free and
00:32:12
also the first 200 people to go to that
00:32:14
link will get 20% off the annual premium
00:32:17
subscription and you can find that link
00:32:19
in the description section below
00:32:20
again that's brilliant.org/cms
00:32:23
okay so I think that's gonna
00:32:27
do it for this pandas video I hope you
00:32:29
feel like you got a good introduction to
00:32:31
the data frame and series objects and
00:32:33
how to navigate through some of your
00:32:35
data now like I said there's a lot more
00:32:37
to learn about these data types and some
00:32:40
advanced filtering that we'll learn in
00:32:42
future videos so be sure to stick around
00:32:44
for that now in the next video we're
00:32:46
going to be learning more about indexes
00:32:48
so we saw basic default indexes in this
00:32:51
video but we'll learn how to set the
00:32:53
index to specific columns and the
00:32:55
benefits of doing that in the next video
00:32:57
but if anyone has any questions about
00:32:59
what we covered here then feel free to
00:33:00
ask in the comment section below and
00:33:02
I'll do my best to answer those and if
00:33:04
you enjoyed these tutorials and would
00:33:05
like to support them then there are
00:33:06
several ways you can do that the easiest
00:33:08
way is to simply like the video and give
00:33:10
it a thumbs up and also it's a huge help
00:33:12
to share these videos with anyone who
00:33:13
you think would find them useful
00:33:14
and if you have the means you can
00:33:16
contribute through patreon and there's a
00:33:17
link to that page in the description
00:33:18
section below be sure to subscribe for
00:33:20
future videos and thank you all for
00:33:21
watching