Statistical data analysis always involves uncertainty. In hypothesis testing, the conclusion drawn from a random sample may differ from the truth, and the two ways of being wrong are the so-called Type I and Type II errors. A Type I error is the probability that a true null hypothesis (H0) is incorrectly rejected, written alpha = P(reject H0 | H0 true); a Type II error is the probability that a false H0 is not rejected, written beta = P(do not reject H0 | H1 true).
In this article, we show how to compute both Type I and Type II errors in Python, using an example based on the Binomial distribution.
The Null hypothesis (H0) and Alternative hypothesis (H1) in the example are stated as follows:
H0: the success rate in each Bernoulli trial is p = 0.25
H1: the success rate in each Bernoulli trial is p = 0.50
Decision rule: the critical limit is 8 positives, i.e. if more than 8 positives are found in a random sample of size 20, then H0 is rejected in favor of H1; otherwise H0 is retained.
To solve this problem, the function binom.cdf() from the scipy.stats module in Python is applied. The function computes the cumulative probability of observing up to x successes in N Binomial trials, each with constant success rate p. The basic form of the function is
binom.cdf(x, N, p)
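Mathematically, binom.cdf(x, N, p) evaluates P(X <= x) = sum over k = 0, ..., x of C(N, k) * p^k * (1 - p)^(N - k), where C(N, k) denotes the binomial coefficient.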
The following Python code presents the calculation.
#import module
from scipy.stats import binom
#setting the success number, sample size
#success rate p for H0 and H1
r = 8
n = 20
p0 = 0.25
p1 = 0.5
#Type 1 error is the probability of getting success counts
#larger than 8, given H0 is true
type_1_error = 1 - binom.cdf(r, n, p0)
type_1_error
Out[1]: 0.04092516770651855
#Type 2 error is the probability of getting success counts
#less than or equal to 8, given H1 is true
type_2_error = binom.cdf(r, n, p1)
type_2_error
Out[2]: 0.2517223358154297
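As a side note, scipy.stats also provides the survival function binom.sf, which returns 1 - cdf directly; a minimal sketch, reusing the r, n and p0 defined above:
#alternative: the survival function gives P(X > r) directly
type_1_error_alt = binom.sf(r, n, p0)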
We see that in this example, the Type I and Type II errors are approximately 0.04 and 0.25, respectively.
In fact, Type I and Type II errors trade off against each other with respect to the decision rule (critical limit) for the given H0 and H1. Now we change the critical limit from 8 to 7 to see what happens.
# www.rdatacode.com
#Calculate Type1 and Type2 error in Python
#import module
from scipy.stats import binom
#setting the success number, sample size
#success rate p for H0 and H1
r = 7
n = 20
p0 = 0.25
p1 = 0.5
#Type 1 error is the probability of getting success counts
#larger than 7, given H0 is true
type_1_error = 1 - binom.cdf(r, n, p0)
type_1_error
Out[1]: 0.10181185692272265
#Type 2 error is the probability of getting success counts
#less than or equal to 7, given H1 is true
type_2_error = binom.cdf(r, n, p1)
type_2_error
Out[2]: 0.13158798217773438
The result shows that by reducing the critical value from 8 to 7, the Type I error increases while the Type II error decreases. The reason is the fact that it is now more probable to reject H0 than before, so the probability of committing a wrong rejection grows. Conversely, because there is now a larger chance of rejecting H0, there is less chance that a false H0 escapes rejection, which lowers the Type II error.
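To see the trade-off across a whole range of decision rules at once, a small sweep over the critical limit can help. The loop below is our own sketch (not part of the original example), reusing the same n, p0 and p1:
#sweep the critical limit r and print both error rates
from scipy.stats import binom
n, p0, p1 = 20, 0.25, 0.5
for r in range(5, 13):
    #Type I: P(X > r | H0); Type II: P(X <= r | H1)
    alpha = 1 - binom.cdf(r, n, p0)
    beta = binom.cdf(r, n, p1)
    print(f"r = {r:2d}  Type I = {alpha:.4f}  Type II = {beta:.4f}")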
You can also watch the video on our YouTube channel for a vivid understanding of the topic.