00:00:00
hey there how's it going everybody in
00:00:01
this series of videos we're going to be
00:00:03
learning how to use the pandas library
00:00:04
in Python so pandas is a data analysis
00:00:07
library that allows us to easily read in
00:00:09
and work with different types of data so
00:00:12
we can use this to analyze CSV files
00:00:14
Excel files and other similar formats so
00:00:17
if you're getting into the data science
00:00:18
field then this library is going to be
00:00:20
essential to learn it's one of the most
00:00:22
downloaded packages for Python and
00:00:24
that's for a great reason so not only
00:00:26
does it allow us to easily read in and
00:00:28
analyze data but it also has great
00:00:30
performance since it's built on top of
00:00:32
NumPy and we'll be learning how to do
00:00:34
different types of
00:00:36
data analysis in this series so in this
00:00:38
video we're going to be going over how
00:00:40
to get pandas installed how to download
00:00:42
the data that I'll be using for most of
00:00:44
this series and also how to get all of
00:00:47
this open in a Jupyter notebook so that
00:00:49
we're ready to do some coding and
00:00:50
analysis now I'd also like to mention
00:00:52
that we do have a sponsor for the series
00:00:54
of videos and that is brilliant.org so I
00:00:57
really want to thank Brilliant for
00:00:58
sponsoring this series and it would be
00:01:00
great if you all can check them out
00:01:01
using the link in the description
00:01:02
section below and support the sponsors
00:01:04
and I'll talk more about their services
00:01:06
in just a bit so with that said let's go
00:01:08
ahead and get started so first of all
00:01:10
let's install pandas so I'm using a
00:01:13
clean virtual environment for this
00:01:14
series but you don't have to use a
00:01:16
virtual environment if you don't want to
00:01:17
if you don't know what a virtual
00:01:19
environment is and would like to learn
00:01:21
more about those then I'll be sure to
00:01:23
leave a link to my video on that topic
00:01:25
in the description section below if
00:01:27
anyone is interested so it's really easy
00:01:30
to install pandas here all we need to do
00:01:32
is say pip install pandas and we will
00:01:37
let this run through and once we have
00:01:40
pandas installed then let's also install
00:01:43
Jupyter so that we can use Jupyter
00:01:45
notebooks now I was a bit hesitant to
00:01:48
use Jupyter for this series because some
00:01:50
people find it difficult to get the hang
00:01:52
of but honestly if you're going to be
00:01:54
doing a lot of work with pandas then
00:01:56
it's definitely a nice tool to use for
00:01:58
this so now it's not necessary so you
00:02:01
should be able to follow along with this
00:02:02
series just fine if you're using a
00:02:04
regular editor but Jupyter notebooks
00:02:06
allow us to actually see our data more
00:02:09
easily by using the browser to print out
00:02:11
our data in tables that make it
00:02:13
easier to visualize so I'm gonna use it in
00:02:16
the series but you don't have to in
00:02:18
order to follow along so to install
00:02:20
Jupyter I'm going to say pip install and
00:02:24
this is going to be jupyterlab and this
00:02:28
is spelled j-u-p-y-t-e-r-l-a-b jupyterlab so
00:02:34
we'll get that installed now I'm not
00:02:36
going to go into a deep dive on how to
00:02:38
use Jupyter in this series I'm mainly
00:02:40
going to focus on pandas but if you'd
00:02:42
like a detailed overview of how to use
00:02:44
Jupyter then I do have a video on how to
00:02:46
use Jupyter in depth and I'll leave a
00:02:48
link to that video in the description
00:02:49
section below if anyone would like to
00:02:52
learn more about the details of using
00:02:54
that ok so now we have pandas and
00:02:56
Jupyter notebooks installed now we're
00:02:58
going to need to download the data that
00:03:00
I'll be using for most of this series
00:03:02
now for anyone who's been watching my
00:03:04
latest videos you know that I like to
00:03:06
use the Stack Overflow Developer Survey
00:03:08
for different kinds of data analysis now
00:03:10
the reason that I like to use this data
00:03:12
is because it's real world data and it
00:03:15
has a lot of data in there that I think
00:03:16
would be interesting to most people who
00:03:18
are watching these types of videos I've
00:03:20
seen some other tutorials where the data
00:03:22
just seems kind of unrealistic and not
00:03:24
very relatable
00:03:26
so hopefully using this data will keep
00:03:28
people interested and also give you a
00:03:30
good idea of what it's like to actually
00:03:32
download real data from a
00:03:35
source and start analyzing it with
00:03:37
pandas so to download this data I have
00:03:40
this pulled up here in the browser we
00:03:42
can go over to the Stack Overflow survey
00:03:45
results page now this is easy to find if
00:03:47
you just google it but just to keep
00:03:49
things easy I'll have a link to this
00:03:51
download page in the description section
00:03:53
as well ok now on this page you can
00:03:57
download the data in CSV form for any
00:04:00
year that they have available and now
00:04:02
I'm going to go ahead and download the
00:04:04
2019 data which is the top data here so
00:04:08
I'm going to download this CSV here and
00:04:12
then we'll click on download again and
00:04:15
this should go ahead and download this
00:04:18
for us ok it did and now I'm going to
00:04:22
open this in my finder here and I'm
00:04:25
going to unzip this data it comes
00:04:27
zipped and once that data is
00:04:29
downloaded and unzipped I'm going to go
00:04:32
ahead and drag that folder to a folder
00:04:34
here on my desktop and that's where
00:04:37
we'll also create a notebook and analyze
00:04:39
this data so real quick I don't have
00:04:42
this open let me open up this pandas
00:04:48
demo folder and this will open this in
00:04:51
Finder and now I will take the data
00:04:54
and drag this into this pandas demo
00:04:56
folder that is on my desktop so your
00:04:59
projects can be anywhere but
00:05:02
I just created a project folder here on
00:05:05
my desktop called pandas demo and it's
00:05:07
completely empty except for the data
00:05:09
that we just dragged in here so now I'm
00:05:12
going to rename this since this is kind
00:05:14
of a long name here I'm just going to
00:05:16
rename this to data that was named
00:05:19
developer survey 2019 but I'm just gonna
00:05:21
call that data so that it's easy for us
00:05:23
to find that within our script okay so
00:05:26
what files do we have here in the
00:05:28
directory that we unzipped in this data
00:05:30
directory let me make this a little
00:05:32
larger here okay so first of all if you
00:05:36
download data that comes with a readme
00:05:39
then this is usually helpful we have a
00:05:41
readme file right here it tells you what
00:05:43
these other files are going to be so in
00:05:46
this case we have this
00:05:48
survey_results_public.csv and that contains the
00:05:51
main survey results one respondent per
00:05:54
row and one column per answer and the
00:05:57
survey results schema here has the
00:06:00
questions that correspond to each column
00:06:02
name in the results now if any of this
00:06:05
doesn't make sense now then it will
00:06:07
once we open up this data in Jupyter so
00:06:10
I'm just giving a broad overview here
00:06:12
don't be overwhelmed by
00:06:15
everything that I'm saying here this
00:06:17
will make a lot more sense once we open
00:06:18
this up in Jupyter so let's go ahead and
00:06:21
do that so to open this in a Jupyter
00:06:23
notebook I'm going to go back to my
00:06:26
terminal so I'm going to go ahead and
00:06:27
close these Finder windows that are open here go
00:06:30
back to my terminal and now within here
00:06:33
I'm going to navigate to my folder where
00:06:35
I placed that data and this should be the
00:06:38
same command on Mac
00:06:39
and Windows so I'm gonna say cd and I'm
00:06:43
gonna go to my desktop this is going to
00:06:45
be wherever your project directory is
00:06:47
but mine is in this pandas demo on my
00:06:50
desktop and once I am navigated to that
00:06:53
directory to start up a Jupyter notebook
00:06:55
we just need to say jupyter notebook and
00:06:59
run that and we should see a server
00:07:02
start up here
00:07:03
and it seems like it's taking a second
00:07:05
ok there we go
00:07:06
now back in our terminal here this will
00:07:10
run a Jupyter server and you will need
00:07:13
to leave that terminal open while you're
00:07:15
working in Jupyter so Jupyter runs
00:07:18
in the browser so if you shut down this
00:07:20
server then you won't be able to access
00:07:22
our notebook okay so let's go back here
00:07:27
to the browser and this is where we have
00:07:30
our Jupyter notebooks so let me zoom in
00:07:32
here so that everybody
00:07:34
can read this fairly well okay I'll zoom
00:07:38
in to about right there I think is good
00:07:39
okay so we can see our data folder here
00:07:42
that we downloaded and placed in our
00:07:44
pandas demo folder a little bit ago but
00:07:47
now let's create a new notebook so to
00:07:50
create a new notebook I'm going to click
00:07:51
on new up here at the top right and then
00:07:54
I'm going to use Python 3 and now we can
00:07:59
name our notebook so up here where it
00:08:01
says untitled I'm going to click here
00:08:03
and I'm just going to call this pandas
00:08:06
demo and rename that ok so now we're
00:08:09
ready to start using pandas so we can
00:08:12
import this by saying import pandas as
00:08:16
pd now importing pandas as pd is just a
00:08:21
common convention when using pandas so
00:08:23
let's run that and I ran that cell by
00:08:27
pressing Shift + Enter and again I'm not
00:08:30
going to go into the specifics of
00:08:31
working here within Jupyter in this
00:08:33
series but if you'd like a rundown of
00:08:35
the features and shortcuts that I'll be
00:08:37
using then I do have a link to my
00:08:39
Jupyter video in the description section
00:08:41
below ok so for the rest of this video
00:08:43
we'll see how to load in our data and
00:08:46
look at some information about that data
00:08:48
so our data is in a CSV format so in
00:08:53
order to
00:08:53
read in that CSV we can simply say df which
00:08:57
is going to stand for data frame we'll
00:08:59
learn all about data frames here
00:09:00
in a bit we're going to say df is equal
00:09:02
to pd.read_csv we're
00:09:07
going to use the read_csv function from
00:09:10
pandas here and now we just want to pass
00:09:13
in a path to our CSV file now mine was
00:09:16
within that data folder and that was
00:09:19
within the file
00:09:22
survey_results_public.csv so
00:09:26
now if I hit shift enter then that will
00:09:30
run that cell so right off the bat we
00:09:33
can see that this is pretty simple to
00:09:34
work with so when using native Python in
00:09:37
order to read in a CSV file we need to
00:09:40
use the csv module to create a CSV
00:09:42
reader and things like that but here
00:09:45
we're just doing this all in one line so
00:09:48
when it reads this in it's going to read
00:09:50
it in as a data frame so data frames are
00:09:53
pretty much the backbone of pandas and
00:09:55
we'll go over data
00:09:58
frames and series objects in depth in
00:10:01
the next video but for the basics a data
00:10:04
frame is basically just rows and columns
00:10:07
of data we can see what a data frame
00:10:09
looks like just by printing it out
00:10:11
and this is the great thing about using
00:10:13
Jupyter notebooks because it allows us
00:10:15
to visualize these things in ways that
00:10:19
we can't do in other editors so here in
00:10:22
Jupyter I can simply just say df and run
00:10:25
that and it will print out our data
00:10:29
frame here so we didn't even need to
00:10:31
wrap this here in a print function now
00:10:34
if you're using a normal editor then you
00:10:37
can still print out data frame
00:10:39
information but it's not going to look
00:10:42
as good as it does here in Jupyter where
00:10:45
we get this interactive table so this is
00:10:48
a small look at our data now this is
00:10:51
actually 85 columns here but if I scroll
00:10:55
through these then it doesn't look like
00:10:57
there's actually 85 columns printed out
00:11:00
here so this is actually truncated by
00:11:04
default just to give us a broad overview
00:11:07
of the
00:11:07
data so by default Jupyter is displaying
00:11:10
20 columns from our data frame now how
00:11:14
did I know that there were 85 columns for
00:11:17
this data frame well there are a few
00:11:19
attributes and methods that we can use
00:11:21
to get an idea of what our data looks
00:11:24
like so first we have the shape
00:11:26
attribute and shape gives us the number
00:11:31
of rows and columns in a tuple form so
00:11:35
let's look at this so in our next cell
00:11:37
down here I'm gonna say df.shape and
00:11:40
I will run that now this is an attribute
00:11:44
here it's not a method so you don't want
00:11:47
to put parentheses so df.shape and
00:11:50
we can see that we have 88 thousand rows
00:11:55
and 85 columns now if you wanted a bit
00:12:00
more information then we can use the
00:12:02
info method the info method will give us
00:12:04
the number of rows and columns and also
00:12:07
all of the data types of all the columns
00:12:09
as well
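As a rough sketch of the steps described so far (reading a CSV into a data frame, then checking shape and info), the cell below uses a tiny made-up CSV held in memory as a stand-in for data/survey_results_public.csv. The three column names echo ones from the survey, but the rows are invented for illustration only.

```python
import io

import pandas as pd

# Hypothetical stand-in for data/survey_results_public.csv; these rows are made up.
csv_text = """Respondent,Hobbyist,Age
1,Yes,14.0
2,No,19.0
3,Yes,28.0
"""

# read_csv accepts a file path or a file-like object and returns a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# shape is an attribute (no parentheses): a (rows, columns) tuple.
print(df.shape)

# info() is a method (parentheses needed): it prints the row count,
# the column names, and the data type of each column.
df.info()
```

With the real survey file you would pass the path string to pd.read_csv instead of the in-memory text.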
00:12:10
now before I run that it looks like my
00:12:14
text is getting cut off here a little
00:12:16
bit sometimes this happens whenever I'm
00:12:19
within Jupyter in order to fix this I
00:12:22
usually just come up here and restart
00:12:25
and run all my cells again that usually
00:12:28
takes care of the problem let's see if
00:12:31
that works okay so that seemed to work
00:12:33
another thing that you can do here is
00:12:35
just to totally reload the page in the
00:12:38
browser and when you reload the page I
00:12:41
think it's just because of how I have
00:12:44
this text enlarged so it's kind of
00:12:47
messing with how these look but now we
00:12:49
can see these just fine
00:12:50
okay so like I was saying we can see
00:12:54
here that we have eighty eight thousand
00:12:55
eight hundred and eighty three rows and
00:12:58
eighty five columns now if you wanted
00:13:01
more information then we can use the
00:13:03
info method and that will give us the
00:13:06
number of rows and the number of columns
00:13:08
but also all of the data types of the
00:13:11
columns so let's run that so if I do
00:13:14
df.info whoops
00:13:16
df.info now this actually is a
00:13:19
method so we do want to
00:13:21
put the parentheses there and let me
00:13:24
run this and now let's go over this
00:13:27
output so we can see here that it says
00:13:29
that we have eighty-eight thousand eight
00:13:31
hundred and eighty three entries so
00:13:33
those are our rows we have a total of
00:13:35
eighty five columns and then it lists
00:13:38
all of our columns here for our data so
00:13:40
these are all the columns in our CSV
00:13:43
file that we have loaded in now it also
00:13:46
gives us the data types of each of these
00:13:48
columns and we're going to go over data
00:13:50
types in a future video but for the most
00:13:54
part object usually means strings and
00:13:57
then we have other things as well so
00:14:00
int64 is just an integer float64 is a float
00:14:04
so probably a decimal number and there
00:14:08
are no other data types in this data set
00:14:12
but there are more data types in general
00:14:14
so I will be sure to do a video on data
00:14:18
types specifically in the near future
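To make the data type names a little more concrete, here is a small sketch with a hand-built data frame. The values are invented and are not taken from the survey. Whole numbers are stored as int64 and decimal numbers as float64; text columns usually show up as object, although newer pandas versions may report a dedicated string type instead.

```python
import pandas as pd

# Made-up sample values, only meant to show one column of each kind.
df = pd.DataFrame({
    "Respondent": [1, 2, 3],          # whole numbers
    "Hobbyist": ["Yes", "No", "Yes"], # text
    "Age": [14.0, 19.0, 28.0],        # decimal numbers
})

# dtypes lists the data type pandas chose for each column.
print(df.dtypes)
```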
00:14:21
okay so now that we know the number of
00:14:23
rows and columns let's change a setting
00:14:26
here within Jupyter so that we can see
00:14:28
all of the columns so I think it would
00:14:31
be useful to see all of these if we'd
00:14:33
like to even if there are a lot of these
00:14:36
to scroll through so to do this we can
00:14:39
change a setting and I'm gonna come
00:14:41
down here to the bottom here and I'm
00:14:44
gonna change a setting by saying
00:14:46
pd.set_option and within here I
00:14:50
will say display.max_columns
00:14:55
and I will set that equal to 85
00:15:00
so that we can see all of our columns
00:15:02
and I will run that and now if we print
00:15:06
out our data frame so I'm going to go
00:15:08
back up here to where we printed out
00:15:10
this data frame and I will rerun that
00:15:14
cell and now if I scroll through these
00:15:16
columns then we can see that now it
00:15:19
looks like we actually have these 85
00:15:21
different columns here so I can keep
00:15:24
scrolling and keep scrolling and it
00:15:26
didn't just chop us off at that 20 like
00:15:28
it was before
00:15:28
now obviously the rows are also being
00:15:31
truncated here and we definitely
00:15:33
don't want to print
00:15:34
all 89 thousand of these rows but there
00:15:39
probably are some examples with certain
00:15:41
datasets where you might want to see all
00:15:43
of the rows as well so for example I
00:15:46
said that the survey results schema CSV
00:15:49
file that was included in our download
00:15:52
gives the matching questions for all of
00:15:55
these column names here so if we wanted
00:15:58
to see what these column names here mean
00:16:02
for this data then we can load in that
00:16:04
schema CSV file as well so let me do
00:16:08
this I'll go down to the bottom of our
00:16:10
notebook and I will just load this in by
00:16:13
saying schema_df now I don't
00:16:16
want to just call this df because we
00:16:18
don't want to overwrite our other data
00:16:20
frame and I will load this in just like
00:16:23
we saw before by saying
00:16:25
pd.read_csv and this is within the
00:16:29
data folder and this was called
00:16:32
survey_results_schema.csv
00:16:37
so I will run this and now let's
00:16:41
look at this schema data frame that we
00:16:46
just loaded in so here in this Column
00:16:51
column this gives us all of the
00:16:53
columns in our other data frame so we
00:16:57
have Respondent MainBranch Hobbyist and
00:16:59
if I scroll up to that data frame here
00:17:01
I'm gonna delete this info here since we
00:17:04
no longer need that if I scroll up to
00:17:07
this data frame here then we can see
00:17:09
Respondent MainBranch Hobbyist so if we
00:17:13
want to know what these mean then that's
00:17:15
what we use the schema for so we can see
00:17:17
that Hobbyist means do you
00:17:21
code as a hobby MainBranch means which
00:17:24
of the following options best describes
00:17:25
now it actually truncates the text
00:17:28
too in order to actually see the
00:17:31
full text we could either change an
00:17:34
option or we could just access this
00:17:36
value directly and I will be showing you
00:17:38
how to do that in the next video but for
00:17:41
now we can see that we can't see all of
00:17:44
the rows with the questions that correspond
00:17:48
to each column name here remember we
00:17:50
have 85 columns but for here we can only
00:17:53
see the first five and then we get this
00:17:56
ellipsis here and then we can see the
00:17:58
last five so let's set this up so that
00:18:02
we can view 85 rows and then reprint
00:18:06
this so that we can see all of these so
00:18:08
back in the same cell where we set our
00:18:11
max columns now let's also add one for
00:18:17
rows as well so I'm just going to copy
00:18:19
and paste that but instead of max
00:18:21
columns here I'm gonna have this be max
00:18:23
rows and I will run that and now we will
00:18:27
rerun this schema here and now we can
00:18:31
see that we can see all of the columns
00:18:33
and the corresponding question text so
00:18:37
if you wanted to know what any of these
00:18:38
columns mean then this is how we do it
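Here is a minimal sketch of the row display setting together with a schema-style data frame. The two rows use questions quoted in this video, but the frame is built by hand, and the column names Column and QuestionText are assumptions about how the schema file is laid out.

```python
import pandas as pd

# Allow up to 85 rows to be displayed before pandas shortens the output.
pd.set_option("display.max_rows", 85)

# Hand-built stand-in for data/survey_results_schema.csv (assumed layout).
schema_df = pd.DataFrame({
    "Column": ["Hobbyist", "ITperson"],
    "QuestionText": [
        "Do you code as a hobby?",
        "Are you the IT support person for your family?",
    ],
})

# Each row pairs a survey column name with the question behind it.
print(schema_df)
```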
00:18:42
so we can see ITperson the question was
00:18:44
are you the IT support person for your
00:18:47
family so that's probably a yes or no
00:18:49
question so that is what those mean so
00:18:52
if you're going through this data on
00:18:54
your own then you can use this as a
00:18:55
reference anytime you don't know what a
00:18:58
certain column means in our survey data
00:19:00
and if you don't know or if you don't
00:19:03
want to look through all of these to
00:19:05
find a specific row or a specific column
00:19:09
name then in a future video we're going
00:19:11
to learn about filtering data frames and
00:19:14
see how we can just grab a specific row
00:19:16
where the column equals a certain value
00:19:19
okay so now we have all 85 rows visible
00:19:23
of our schema data frame here but you
00:19:26
might be thinking well that's nice but I
00:19:29
don't want to see eighty five rows of my
00:19:31
survey data every time I want to look at
00:19:34
it but there are a couple of methods
00:19:36
that we can use to only see a certain
00:19:39
number of rows which you'll most likely
00:19:41
use a lot just to get an idea that your
00:19:44
filters and data frames seem to be
00:19:46
working correctly so we can see the
00:19:49
first five rows by saying instead of
00:19:51
doing df here we can say df.head
00:19:55
and if I run that then we just get the
00:19:58
first five rows here okay and you can
00:20:01
pass in a
00:20:02
value if you want to see a certain
00:20:03
number of values so if you wanted to see
00:20:05
the first ten rows then we could pass in
00:20:08
a 10 to df.head and this gives us
00:20:11
the first ten rows so we can see it goes
00:20:13
all the way down zero through nine there
00:20:16
now if you'd like to see the last rows
00:20:18
instead of the first rows then we can
00:20:20
use the tail method instead
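A short sketch of head and tail, using a made-up data frame of 20 rows in place of the survey data so the row positions are easy to follow:

```python
import pandas as pd

# 20 made-up rows; Respondent runs from 1 to 20, the index from 0 to 19.
df = pd.DataFrame({"Respondent": range(1, 21)})

first_five = df.head()    # no argument: the first 5 rows
first_ten = df.head(10)   # the first 10 rows (index 0 through 9)
last_ten = df.tail(10)    # the last 10 rows

print(first_ten)
print(last_ten)
```

Both methods return a new, smaller data frame, so they are a cheap way to spot-check results without printing everything.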
00:20:23
so if we say df.tail and
00:20:26
we could use it without a number also
00:20:28
but if we pass in a number just like
00:20:31
with head then now we're going to say
00:20:32
that we want the last ten entries here
00:20:36
in our data so those are the last ten
00:20:38
items of our data okay so this is a
00:20:41
brief overview of getting pandas
00:20:44
installed and then downloading our data
00:20:47
and loading our data in to Jupiter and
00:20:50
how to read this in now before we end
00:20:54
here I'd like to mention the sponsor of
00:20:56
this video and that is brilliant.org so
00:20:59
in this series we've been learning about
00:21:01
pandas and how to analyze data in
00:21:03
Python and Brilliant would be an
00:21:05
excellent way to supplement what you
00:21:06
learn here with their hands-on courses
00:21:08
they have some excellent courses and
00:21:10
lessons that do a deep dive on how to
00:21:11
think about and analyze data correctly
00:21:13
for data analysis fundamentals I would
00:21:16
really recommend checking out their
00:21:17
statistics course which shows you how to
00:21:19
analyze graphs and determine
00:21:20
significance in the data and I would
00:21:22
also recommend their machine learning
00:21:24
course which takes data analysis to a
00:21:26
new level
00:21:26
where you'll learn about the techniques
00:21:28
being used that allow machines to make
00:21:30
decisions where there's just too many
00:21:32
variables for a human to consider so to
00:21:34
support my channel and learn more about
00:21:36
Brilliant you can go to brilliant.org
00:21:38
forward slash cms to sign up for free and
00:21:40
also the first 200 people that go to
00:21:43
that link will get 20% off the annual
00:21:45
premium subscription and you can find
00:21:47
that link in the description section
00:21:48
below
00:21:49
again that's brilliant.org forward slash
00:21:52
cms
00:21:54
okay so I think that is going to do it
00:21:56
for our first pandas video I hope you
00:21:58
feel like you've got a good introduction
00:21:59
on how to install pandas and load in
00:22:01
your data into a Jupyter notebook in the
00:22:03
next video we're going to be learning
00:22:05
more about data frames and also learn
00:22:07
about the series data type so we'll
00:22:10
learn how we can think about data frames
00:22:12
in a way that's easier to understand and
00:22:14
also see how we can
00:22:16
grab certain elements columns and rows
00:22:18
from these as well so be sure to stick
00:22:21
around for that but if anyone has any
00:22:23
questions about what we covered in this
00:22:24
video then feel free to ask in the
00:22:26
comment section below and I'll do my
00:22:27
best to answer those and if you enjoyed
00:22:29
these tutorials and would like to
00:22:30
support them then there are several ways
00:22:32
you can do that the easiest way is to
00:22:34
simply like the video and give it a
00:22:35
thumbs up and also it's a huge help to
00:22:37
share these videos with anyone who you
00:22:38
think would find them useful and if you
00:22:40
have the means you can contribute through
00:22:41
Patreon and there's a link to that page
00:22:43
in the description section below
00:22:44
be sure to subscribe for future videos
00:22:46
and thank you all for watching