What is statistics?

Statistics deals with the collection, analysis, and presentation of data.

What is the difference between descriptive and inferential statistics?

Descriptive statistics summarize data from a sample, while inferential statistics use sample data to make conclusions about a population.

What are measures of central tendency?

Measures of central tendency include the mean, median, and mode, which describe where most values in a dataset are centered.
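As a minimal sketch of these three measures, the snippet below uses only the Python standard library. The height values are hypothetical (they do not come from the video); the commuting counts (14 car, 6 bike, 5 walk, 5 public transport) are the ones quoted in the video. It also illustrates that one extreme value shifts the mean but leaves the median unchanged.

```python
from statistics import mean, median, mode

def central_tendency(values):
    """Return (mean, median) for a list of numbers."""
    return mean(values), median(values)

# Hypothetical heights in cm; the second list replaces the tallest
# person with an extreme value to show the effect of an outlier.
heights = [160, 165, 170, 175, 180]
heights_with_outlier = [160, 165, 170, 175, 250]

# Commuting answers with the counts quoted in the video.
commute = ["car"] * 14 + ["bike"] * 6 + ["walk"] * 5 + ["public transport"] * 5

print("mean, median:", central_tendency(heights))
print("mean, median with outlier:", central_tendency(heights_with_outlier))
print("mode:", mode(commute))
```

The mean moves from 170 to 184 when the outlier is introduced, while the median stays at 170.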

Why are measures of dispersion important?

Measures of dispersion, like standard deviation and range, indicate how spread out data points are, providing insight into variability.
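The following sketch computes the dispersion measures named above with Python's standard `statistics` module. The data are hypothetical heights, and the helper name `dispersion_summary` is an illustrative assumption. Note that `pstdev` divides by n (whole population) while `stdev` divides by n - 1 (sample), and that quartile values depend on the interpolation method; the module's default is used here.

```python
from statistics import pstdev, stdev, quantiles

def dispersion_summary(values):
    """Return common dispersion measures for a list of numbers."""
    # Quartile cut points; uses the module's default interpolation method.
    q1, _, q3 = quantiles(values, n=4)
    return {
        "population_sd": pstdev(values),        # divides by n
        "sample_sd": stdev(values),             # divides by n - 1
        "sample_variance": stdev(values) ** 2,  # squared standard deviation
        "range": max(values) - min(values),
        "iqr": q3 - q1,                         # width of the middle 50%
    }

heights_cm = [160, 165, 170, 175, 180]  # hypothetical sample
summary = dispersion_summary(heights_cm)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The sample standard deviation is always slightly larger than the population version for the same data, because it divides by the smaller number n - 1.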

What is a frequency table?

A frequency table displays how often each value appears in a dataset, summarizing distribution effectively.
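A frequency table like the one in the video's commuting example can be sketched with `collections.Counter`. The counts (14 car, 6 bicycle, 5 walk, 5 public transport, 30 employees in total) are taken from the video; the function name `frequency_table` is an assumption made for illustration.

```python
from collections import Counter

def frequency_table(responses):
    """Map each distinct value to (count, percentage of all responses)."""
    counts = Counter(responses)
    total = len(responses)
    # most_common() orders the rows from most to least frequent.
    return {value: (count, 100.0 * count / total)
            for value, count in counts.most_common()}

responses = ["car"] * 14 + ["bicycle"] * 6 + ["walk"] * 5 + ["public transport"] * 5
table = frequency_table(responses)
for value, (count, percent) in table.items():
    print(f"{value:<17} {count:>3} {percent:>6.1f}%")
```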

How does a contingency table work?

A contingency table (cross-tab) organizes the counts for two categorical variables into rows and columns, so that the relationship between the variables can be analyzed and compared.
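A contingency table can be sketched as a count of category pairs. In the example below, only the car/Detroit cell (6) and the per-mode totals come from the video; the rest of the split between the two sites is assumed for illustration, as is the helper name `contingency_table`.

```python
from collections import Counter

def contingency_table(pairs):
    """Count how often each (row_category, column_category) pair occurs."""
    return Counter(pairs)

# (mode of transport, site) -> count. Only car/Detroit = 6 and the
# per-mode totals are from the video; the remaining split is assumed.
assumed_counts = {
    ("car", "Detroit"): 6, ("car", "Cleveland"): 8,
    ("bicycle", "Detroit"): 3, ("bicycle", "Cleveland"): 3,
    ("walk", "Detroit"): 2, ("walk", "Cleveland"): 3,
    ("public transport", "Detroit"): 3, ("public transport", "Cleveland"): 2,
}
# Expand the counts into one record per surveyed employee.
pairs = [pair for pair, count in assumed_counts.items() for _ in range(count)]
table = contingency_table(pairs)

modes = ["car", "bicycle", "walk", "public transport"]
sites = ["Detroit", "Cleveland"]
print(f"{'':<17}" + "".join(f"{site:>10}" for site in sites))
for mode in modes:
    print(f"{mode:<17}" + "".join(f"{table[(mode, site)]:>10}" for site in sites))
```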

What is the P-value in hypothesis testing?

The P-value is the probability of observing data at least as extreme as the sample, assuming the null hypothesis is true.
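One way to make this definition concrete is a permutation test, sketched below. This is not the method shown in the video (which uses datatab.net); it is swapped in here because it follows the definition directly: if the null hypothesis is true, group labels are interchangeable, so we shuffle them and count how often the shuffled difference is at least as large as the observed one. The blood-pressure values and the function name are hypothetical.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided P-value for the difference in group means."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # relabel the observations at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Adding 1 to both counts keeps the estimate from being exactly 0.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical systolic blood pressure readings (mmHg).
drug = [128, 131, 125, 130, 127, 129, 126, 132]
placebo = [140, 138, 145, 142, 139, 141, 144, 143]
p_value = permutation_p_value(drug, placebo)
print(f"estimated P-value: {p_value:.4f}")
```

With groups this clearly separated, the estimated P-value is far below 0.05; with identical groups it would be 1.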

What are type I and type II errors?

Type I error occurs when a true null hypothesis is rejected; type II error occurs when a false null hypothesis is not rejected.
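Both error types can be illustrated with a small simulation, sketched here under stated assumptions: normally distributed blood pressure with a known standard deviation, a two-sided z-test, and made-up values for the group size, mean, and effect. None of these numbers come from the video. When the null hypothesis is true, the share of rejections estimates the type I error rate; when it is false, the share of non-rejections estimates the type II error rate.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def rejection_rate(true_difference, n=20, sigma=10.0, alpha=0.05,
                   trials=2000, seed=1):
    """Share of simulated studies in which the null hypothesis is rejected."""
    rng = random.Random(seed)
    std_normal = NormalDist()
    rejections = 0
    for _ in range(trials):
        control = [rng.gauss(120.0, sigma) for _ in range(n)]
        treated = [rng.gauss(120.0 + true_difference, sigma) for _ in range(n)]
        # z statistic for a difference of two means with known sigma
        z = (mean(treated) - mean(control)) / (sigma * sqrt(2.0 / n))
        p_value = 2.0 * (1.0 - std_normal.cdf(abs(z)))
        if p_value < alpha:
            rejections += 1
    return rejections / trials

# Null hypothesis true (no effect): every rejection is a type I error.
type_one_rate = rejection_rate(true_difference=0.0)
# Null hypothesis false (drug lowers pressure by 5): every non-rejection
# is a type II error.
type_two_rate = 1.0 - rejection_rate(true_difference=-5.0)
print(f"type I error rate:  {type_one_rate:.3f}")
print(f"type II error rate: {type_two_rate:.3f}")
```

The type I rate lands close to the chosen threshold of 0.05, while the type II rate depends on the effect size and the sample size.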

What is statistical significance?

A result is statistically significant when its P-value falls below a predetermined threshold, often 0.05. This means that data this extreme would be unlikely to occur by chance alone if the null hypothesis were true.

How does datatab.net help with statistical analysis?

Datatab.net offers tools to select appropriate hypothesis tests, calculate results, and provide interpretations based on input data.

What is Statistics? A Beginner's Guide to Statistics (Data Analytics)!

00:20:20

https://www.youtube.com/watch?v=Gi4GxE4obAc

Summary

TL;DR: The video introduces statistics and explains the difference between descriptive and inferential statistics. Descriptive statistics summarizes a sample through measures of central tendency (mean, median, mode) and measures of dispersion (standard deviation, variance, range). Inferential statistics draws conclusions about a population from a sample, using hypothesis tests to judge whether results are statistically significant. The video also covers frequency tables, contingency tables, P-values, and type I and type II errors, and demonstrates datatab.net as a tool for running these analyses.

Key Takeaways

📊 Statistics involves collecting, analyzing, and presenting data.
📉 Descriptive statistics summarize sample data.
📈 Inferential statistics draw conclusions about a population from a sample.
🧮 Mean, median, and mode are key measures of central tendency.
📏 Standard deviation and variance measure data dispersion.
🗃️ Frequency tables show how often values appear in a dataset.
💡 P-value helps determine statistical significance in hypothesis testing.
❌ Type I error: rejecting a true null hypothesis.
✔️ Statistical significance is often assessed against a P-value threshold of 0.05.
⚖️ Use datatab.net for selecting suitable hypothesis tests and analyzing data effectively.

Timeline

00:00:00 - 00:05:00
Statistics involves collecting, analyzing, and presenting data. In investigating if gender influences newspaper preference, gender and newspaper are variables. Data collection is crucial; a survey example shows data in a tabular format with variables as columns and responses as rows. The analysis can focus on just the sample (descriptive statistics) or infer about a population (inferential statistics). Descriptive statistics summarize data but do not infer about the whole population. Key components include measures of central tendency, with examples like mean, median, and mode, each providing different insights into the data.
00:05:00 - 00:10:00
The mode is the most frequent value; in the transport survey, the most common means of transport is the mode. Measures of dispersion describe how spread out the data are, with standard deviation and variance as the key metrics. The standard deviation describes the typical distance of the data points from the mean (strictly, the root mean square of those distances). When the data are a sample rather than the whole population, the formula divides by n - 1 instead of n. Range and interquartile range are further dispersion measures; the interquartile range spans the middle 50% of values. Dispersion shows whether data cluster tightly or spread widely, complementing the measures of central tendency.
00:10:00 - 00:15:00
Contingency tables (cross-tabs) show the relationship between two categorical variables, with the categories of one variable in the rows and those of the other in the columns. Frequency tables and charts such as bar or pie charts visualize distributions and allow comparison between groups. Inferential statistics draws conclusions about a population from sample data. Distinguishing population from sample is key to hypothesis testing, which uses sample data to test a claim about a population parameter.
00:15:00 - 00:20:20
Hypothesis testing starts with a research (alternative) hypothesis and tests it against the null hypothesis, which assumes no effect. The P value is the probability of observing a result at least as extreme as the sample's if the null hypothesis is true. A P value below the chosen threshold (often 0.05) leads to rejecting the null hypothesis, suggesting an effect or difference in the population. Errors remain possible: type I (false positive) and type II (false negative). With datatab.net, users can select and interpret an appropriate hypothesis test, check its assumptions, and choose between parametric and non-parametric tests.
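The two standard deviation formulas mentioned in the 00:05:00 - 00:10:00 segment can be written out as follows. Here n is the number of observations, x_i is each observed value, and x-bar is their mean, as in the video. Using the symbol s for the sample-based estimate is a common convention assumed here; the video does not name it.

```latex
% Population standard deviation: the data cover the whole population
\sigma = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}

% Sample-based estimate: the data are a sample from a larger population
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}

% The variance is the squared standard deviation
\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2
```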

Transcript

00:00:00
if you want to finally understand
00:00:02
statistics this is the place to be after
00:00:05
this video you will know what statistics
00:00:08
is what descriptive statistics is and
00:00:11
what inferential statistics is so let's
00:00:13
start with the first question what is
00:00:16
statistics statistics deals with the
00:00:18
collection analysis and presentation of
00:00:21
data an example we would like to
00:00:24
investigate whether gender has an
00:00:27
influence on the preferred newspaper
00:00:30
then gender and newspaper are our
00:00:32
so-called variables that we want to
00:00:35
analyze in order to analyze whether
00:00:38
gender has an influence on the preferred
00:00:40
newspaper we first need to collect data
00:00:43
to do this we create a questionnaire
00:00:46
that asks about gender and preferred
00:00:49
newspaper we will then send out the
00:00:52
survey and wait 2 weeks afterwards we
00:00:55
can display the received answers in a
00:00:57
table in this table we have one column
00:01:01
for each variable one for gender and one
00:01:04
for newspaper on the other hand each row
00:01:08
is the response of one surveyed person the
00:01:11
first respondent is male and stated New
00:01:14
York Post the second is female and
00:01:17
stated USA Today and so on and so forth
00:01:20
of course the data does not have to be
00:01:22
from a survey the data can also come
00:01:25
from an experiment in which you for
00:01:27
example want to study the effect of two
00:01:30
drugs on blood pressure now the first
00:01:33
step is done we have collected data and
00:01:36
we can start analyzing the data but what
00:01:39
do we actually want to analyze we did
00:01:42
not survey the entire population but we
00:01:44
took a sample now the big question is do
00:01:48
we just want to describe the sample data
00:01:51
or do we want to make a statement about
00:01:53
the whole population if our aim is
00:01:56
limited to the sample itself i.e. we only
00:01:59
want to describe the collected data we
00:02:01
will use descriptive statistics
00:02:04
descriptive statistics will provide a
00:02:06
detailed summary of the sample however
00:02:09
if we want to draw conclusions about the
00:02:11
population as a whole inferential
00:02:14
statistics are used this approach allows
00:02:17
us to make educated guesses about the
00:02:19
population based on the sample data let
00:02:22
us take a closer look at both methods
00:02:25
starting with descriptive statistics why
00:02:28
is descriptive statistics so important
00:02:31
let's say a company wants to know how
00:02:34
its employees travel to work so the
00:02:36
company creates a survey to answer this
00:02:39
question once enough data has been
00:02:41
collected this data can be analyzed
00:02:44
using descriptive statistics but what is
00:02:47
descriptive statistics descriptive
00:02:49
statistics aims to describe and
00:02:51
summarize a data set in a meaningful way
00:02:55
but it is important to note that
00:02:56
descriptive statistics only describe the
00:02:59
collected data without drawing
00:03:01
conclusions about a larger population
00:03:04
put simply just because we know how some
00:03:07
people from one company get to work we
00:03:10
cannot say how all working people of the
00:03:13
company get to work this is the task of
00:03:16
inferential statistics which we will
00:03:18
discuss later to describe data
00:03:20
descriptively we now look at the four
00:03:23
key components measures of central
00:03:25
tendency measures of dispersion
00:03:28
frequency tables and charts let's start
00:03:31
with the first one measures of central
00:03:33
tendency measures of central tendency
00:03:35
are for example the mean the median and
00:03:38
the mode Let's first have a look at the
00:03:41
mean the arithmetic mean is the sum of
00:03:44
all observations divided by the number
00:03:47
of observations an example imagine we
00:03:50
have the test scores of five students to
00:03:53
find the mean score we sum up all the
00:03:56
scores and divide by the number of
00:03:58
scores the mean test score of these five
00:04:01
students is therefore
00:04:03
86.6 what about the median when the
00:04:06
values in a data set are arranged in
00:04:09
ascending order the median is the middle
00:04:12
value if there is an odd number of data
00:04:15
points the median is simply the middle
00:04:18
value if there is an even number of data
00:04:21
points the median is the average of the
00:04:24
two middle values it is important to
00:04:27
note that the median is resistant to
00:04:29
extreme values or outliers let's look at
00:04:32
this example no matter how tall the last
00:04:35
person is the person in the middle
00:04:38
Remains the person in the middle so the
00:04:40
median does not change but if we look at
00:04:43
the mean it does have an effect on how
00:04:46
tall the last person is the mean is
00:04:49
therefore not robust to outliers let's
00:04:52
continue with the mode the mode refers
00:04:55
to the value or values that appear most
00:04:58
frequently in a set of data for
00:05:01
example if 14 people travel to work by
00:05:03
car six by bike five walk and five take
00:05:08
public transport then car occurs most
00:05:12
often and is therefore the mode great
00:05:15
let's continue with the measures of
00:05:17
dispersion measures of dispersion
00:05:20
describe how spread out the values in a
00:05:22
data set are measures of dispersion are
00:05:25
for example the variance and standard
00:05:28
deviation the range
00:05:30
and the interquartile range let's start
00:05:33
with the standard deviation the standard
00:05:35
deviation indicates the average distance
00:05:38
between each data point and the mean but
00:05:41
what does that mean each person has some
00:05:44
deviation from the mean now we want to
00:05:47
know how much the persons deviate from
00:05:49
the mean value on average in this
00:05:52
example the average deviation from the
00:05:54
mean value is 11.5 cm to calculate the
00:05:59
standard deviation we can use this
00:06:02
equation Sigma is the standard deviation
00:06:05
n is the number of persons x i is the
00:06:09
height of each person and x-bar is the mean
00:06:13
value of all persons but attention there
00:06:16
are two slightly different equations for
00:06:18
the standard deviation the difference is
00:06:21
that we have once 1 divided by n and once 1 divided by
00:06:26
n minus 1 to keep it simple if our survey
00:06:30
doesn't cover the whole population we
00:06:32
always use this equation to estimate the
00:06:35
standard deviation likewise if we have
00:06:38
conducted a clinical study then we also
00:06:41
use this equation to estimate the
00:06:43
standard deviation but what is the
00:06:45
difference between the standard
00:06:47
deviation and the variance as we now
00:06:50
know the standard deviation is the
00:06:52
quadratic mean of the distance from the
00:06:55
mean the variance now is the squared
00:06:57
standard deviation if you want to
00:06:59
know more details about the standard
00:07:01
deviation and the variance please watch
00:07:04
our video let's move on to range and
00:07:06
interquartile range it is easy to
00:07:09
understand the range is simply the
00:07:11
difference between the maximum and
00:07:14
minimum value inter quartile range
00:07:17
represents the middle 50% of the data it
00:07:20
is the difference between the first
00:07:22
quartile q1 and the third quartile Q3
00:07:27
therefore 25% of the values are smaller
00:07:30
than the interquartile range and 25% of
00:07:34
the values are larger the inter quartile
00:07:36
range contains exactly the middle 50% of
00:07:40
the values before we get to the last two
00:07:43
points let's briefly compare measures of
00:07:46
central tendency and measures of
00:07:48
dispersion let's say we measure the
00:07:50
blood pressure of patients measures of
00:07:53
central tendency provide a single value
00:07:56
that represents the entire data set
00:07:59
helping to identify a central value
00:08:02
around which data points tend to Cluster
00:08:05
measures of dispersion like the standard
00:08:07
deviation the range and the
00:08:09
interquartile range indicate how spread
00:08:12
out the data points are whether they are
00:08:15
closely packed around the center or
00:08:17
spread far from it in summary while
00:08:20
measures of central tendency provide a
00:08:23
central point of the data set measures
00:08:25
of dispersion describe how the data is
00:08:28
spread around the center let's move on to
00:08:31
tables here we will have a look at the
00:08:33
most important ones frequency tables and
00:08:36
contingency tables a frequency table
00:08:39
displays how often each distinct value
00:08:43
appears in a data set let's have a
00:08:45
closer look at the example from the
00:08:47
beginning a company surveyed its
00:08:50
employees to find out how they get to
00:08:52
work the options given were car bicycle
00:08:56
walk and public transport here are the
00:08:58
results from 30 employees the
00:09:01
first answered car the next walk and so
00:09:04
on and so forth now we can create a
00:09:07
frequency table to summarize this data
00:09:10
to do this we simply enter the four
00:09:13
possible options car bicycle walk and
00:09:16
public transport in the First Column and
00:09:19
then count how often they occurred from
00:09:22
the table it is evident that the most
00:09:25
common mode of Transport among the
00:09:27
employees is by car with 14 employees
00:09:30
preferring it the frequency table thus
00:09:33
provides a clear and concise summary of
00:09:35
the data but what if we have not only
00:09:38
one but two categorical variables this
00:09:41
is where the contingency table also
00:09:43
called cross tab comes in Imagine the
00:09:46
company doesn't have one Factory but two
00:09:50
one in Detroit and one in Cleveland so
00:09:53
we also ask the employees at which
00:09:56
location they work if we want to display
00:09:58
both variables we can use a contingency
00:10:01
table a contingency table provides a way
00:10:04
to analyze and compare the relationship
00:10:07
between two categorical variables the
00:10:10
rows of a contingency table represent
00:10:13
the categories of one variable while the
00:10:16
columns represent the categories of
00:10:18
another variable each cell in the table
00:10:21
shows the number of observations that
00:10:23
fall into the corresponding category
00:10:26
combination for example the first cell
00:10:28
shows that car and Detroit were answered
00:10:31
six times and what about the charts
00:10:35
let's take a look at the most important
00:10:37
ones to do this let's simply use
00:10:40
datatab.net if you like you can load this
00:10:43
sample data set with the link in the
00:10:45
video description or you just copy your
00:10:47
own data into this table here below you
00:10:50
can see the variables distance to work
00:10:53
mode of transport and site DATAtab
00:10:56
gives you a hint about the level of
00:10:58
measurement but you can also change it
00:11:01
here now if we only click on mode of
00:11:04
Transport we get a frequency table and
00:11:07
we can also display the percentage
00:11:10
values if we scroll down we get a bar
00:11:13
chart and a pie chart here on the left
00:11:16
we can adjust further settings for
00:11:19
example we can specify whether we want
00:11:22
to display the frequencies or the
00:11:24
percentage values or whether the bars
00:11:27
should be vertical or
00:11:29
horizontal if you also select site we
00:11:33
get a cross table here and a grouped bar
00:11:37
chart for the diagrams here we can
00:11:40
specify whether we want the chart to be
00:11:42
grouped or stacked if we click on
00:11:45
distance to work and mode of Transport
00:11:48
we get a bar chart where the height of
00:11:51
the bar shows the mean value of the
00:11:53
individual groups here we can also
00:11:56
display the
00:11:57
dispersion we also get a histogram a box
00:12:01
plot a violin plot and a raincloud plot if
00:12:05
you would like to know more about what a
00:12:07
box plot a violin plot and a raincloud
00:12:10
plot are take a look at my videos let's
00:12:13
continue with inferential statistics at
00:12:16
the beginning we briefly go through what
00:12:18
inferential statistics is and then I'll
00:12:21
explain the six key components to you so
00:12:24
what is inferential statistics
00:12:26
inferential statistics allows us to make
00:12:29
a conclusion or inference about a
00:12:32
population based on data from a sample
00:12:35
what is the population and what is the
00:12:38
sample the population is the whole group
00:12:41
we're interested in if you want to study
00:12:43
the average height of all adults in the
00:12:46
United States then the population would
00:12:49
be all adults in the United States the
00:12:52
sample is a smaller group we actually
00:12:54
study chosen from the population for
00:12:57
example 150 adults were selected
00:13:00
from the United States and now we want
00:13:02
to use the sample to make a statement
00:13:05
about the population and here are the
00:13:07
six steps how to do that number one
00:13:11
hypothesis first we need a statement a
00:13:13
hypothesis that we want to test for
00:13:16
example you want to know whether a drug
00:13:19
will have a positive effect on blood
00:13:21
pressure in people with high blood
00:13:23
pressure but what's next in our
00:13:26
hypothesis we stated that we would like
00:13:28
to study people with high blood pressure
00:13:31
so our population is all people with
00:13:34
high blood pressure in for example the
00:13:36
US obviously we cannot collect data from
00:13:39
the whole population so we take a sample
00:13:42
from the population now we use this
00:13:45
sample to make a statement about the
00:13:47
population but how do we do that for
00:13:50
this we need a hypothesis test
00:13:52
hypothesis testing is a method for
00:13:55
testing a claim about a parameter in a
00:13:58
population using data measured in a
00:14:00
sample great that's exactly what we need
00:14:03
there are many different hypothesis
00:14:05
tests and at the end of this video I
00:14:07
will give you a guide on how to find the
00:14:10
right test and of course you can find
00:14:12
videos about many more hypothesis tests
00:14:15
on our Channel but how does a hypothesis
00:14:18
test work when we conduct a hypothesis
00:14:21
test we start with a research hypothesis
00:14:24
also called alternative hypothesis this
00:14:27
is the hypothesis we are trying
00:14:28
to find evidence for in our case the
00:14:31
research hypothesis is the drug has an
00:14:34
effect on blood pressure but we cannot
00:14:37
test this hypothesis directly with a
00:14:39
classical hypothesis test so we test the
00:14:42
opposite hypothesis that the drug has no
00:14:45
effect on blood pressure but what does
00:14:47
that mean first we assume that the drug
00:14:51
has no effect in the population we
00:14:53
therefore assume that in general people
00:14:56
who take the drug and people who don't
00:14:58
take the drug have the same blood
00:15:01
pressure on average if we now take a
00:15:03
random sample and it turns out that the
00:15:06
drug has a large effect in the sample then
00:15:09
we can ask How likely it is to draw such
00:15:13
a sample or one that deviates even more
00:15:16
if the drug actually has no effect so in
00:15:20
reality on average there's no difference
00:15:22
in a population if this probability is
00:15:25
very low we can ask ourselves maybe the
00:15:29
drug has an effect in the population and
00:15:32
we may have enough evidence to reject
00:15:34
the null hypothesis that the drug has no
00:15:37
effect and it is this probability that
00:15:40
is called the P value let's summarize
00:15:43
this in three simple steps number one
00:15:46
the null hypothesis states that there is
00:15:48
no difference in the population number
00:15:51
two the hypothesis test calculates how
00:15:54
much the sample deviates from the null
00:15:56
hypothesis number three the P value
00:15:59
indicates the probability of getting a
00:16:02
sample that deviates as much as our
00:16:05
sample or one that even deviates more
00:16:08
than our sample assuming the null
00:16:11
hypothesis is true but at what point is
00:16:14
the P value small enough for us to
00:16:16
reject the null hypothesis this brings
00:16:19
us to the next Point statistical
00:16:21
significance if the P value is less than
00:16:24
a predetermined threshold the result is
00:16:27
considered statistically significant
00:16:30
this means that the result is unlikely
00:16:32
to have occurred by chance alone and
00:16:35
that we have enough evidence to reject
00:16:37
the null hypothesis this threshold is often
00:16:41
0.05 therefore a small P value suggests
00:16:45
that the observed data or sample is
00:16:48
inconsistent with the null hypothesis
00:16:50
this leads us to reject the null
00:16:52
hypothesis in favor of the alternative
00:16:55
hypothesis a large P value suggests that
00:16:58
the observed data is consistent with
00:17:00
the null hypothesis and we will not
00:17:02
reject it but note there is always a
00:17:05
risk of making an error a small P value
00:17:08
does not prove that the alternative
00:17:10
hypothesis is true it is only saying
00:17:13
that it is unlikely to get such a result
00:17:16
or a more extreme one when the null
00:17:19
hypothesis is true and again if the null
00:17:21
hypothesis is true there is no
00:17:24
difference in the population and the
00:17:26
other way around a large p value does
00:17:29
not prove that the null hypothesis is true
00:17:32
it is only saying that it is likely to
00:17:34
get such a result or a more extreme one when
00:17:38
the null hypothesis is true so there are
00:17:40
two types of Errors which are called
00:17:42
type one and type two error let's start
00:17:45
with the type one error in hypothesis
00:17:48
testing a type one error occurs when a
00:17:51
true null hypothesis is rejected so in
00:17:54
reality the null hypothesis is true but
00:17:57
we make the decision to reject the
00:17:59
null hypothesis in our example it means
00:18:02
that the drug actually had no effect so
00:18:06
in reality there is no difference in
00:18:08
blood pressure whether the drug is taken
00:18:11
or not the blood pressure Remains the
00:18:13
Same in both cases but our sample
00:18:16
happened to be so far off the True Value
00:18:19
that we mistakenly thought the drug was
00:18:22
working and a type two error occurs when
00:18:25
a false null hypothesis is not rejected
00:18:28
so in reality the null hypothesis is
00:18:31
false but we make the decision not to
00:18:34
reject the null hypothesis in our
00:18:36
example this means the drug actually did
00:18:39
work there is a difference between those
00:18:42
who have taken the drug and those who
00:18:44
have not but it was just a coincidence
00:18:47
that the sample taken did not show much
00:18:50
difference and we mistakenly thought the
00:18:53
drug was not working and now I'll show
00:18:56
you how DATAtab helps you to find a
00:18:59
suitable hypothesis test and of course
00:19:02
calculates it and interprets the results
00:19:04
for you let's go to datatab.net and copy
00:19:08
your own data in here we will just use
00:19:11
this example data set after copying your
00:19:13
data into the table the variables appear
00:19:17
down here DATAtab automatically tries
00:19:20
to determine the correct level of
00:19:22
measurement but you can also change it
00:19:25
up here now we just click on hypothesis
00:19:29
testing and select the variables we want
00:19:32
to use for the calculation of a
00:19:34
hypothesis test DATAtab will then
00:19:37
suggest a suitable test for example in
00:19:40
this case a chi-square test or in that
00:19:43
case an analysis of
00:19:46
variance then you will see the hypotheses
00:19:49
and the results if you're not sure how
00:19:52
to interpret the results click on
00:19:54
summary in words further you can check
00:19:57
the assumptions and decide whether you
00:20:00
want to calculate a parametric or a
00:20:03
non-parametric test you can find out the
00:20:06
difference between parametric and
00:20:08
nonparametric tests in my next video
00:20:12
thanks for watching and I hope you
00:20:13
enjoyed the
00:20:19
video