The post How to calculate binomial distribution in Python appeared first on We provide R, Python, Statistics Online-Learning Course.
Binomial distribution in statistics is a discrete probability distribution that models the number of successes in a fixed number of independent Bernoulli trials. A Bernoulli trial is a binary experiment whose outcome is either success or failure, with the probability of success constant across independent trials. The distinctive feature of a binomial distribution is the probability attached to each discrete success count 0, 1, 2, ..., N, where N is the number of trials. In Python, binomial probabilities can be calculated with the pmf() function of the binom object from scipy.stats. Say we are interested in N = 15 Bernoulli trials with success rate 0.15 for each trial. The following code calculates the probability of every possible success count from 0 to 15.
#import binom from scipy.stats
from scipy.stats import binom
#total 15 Bernoulli trials, with success probability 0.15 for each trial
n = 15
p = 0.15
#a list of the integers 0 to n
sn = list(range(n + 1))
#calculate the probability for each count, and store in a list
pn = [binom.pmf(s, n, p) for s in sn]
#print out each possible count and its corresponding probability
for i in range(n + 1):
    print(str(sn[i]) + "\t" + str(pn[i]))
#output
r p(r)
0 0.08735421910125173
1 0.2312317564444899
2 0.28563922854907725
3 0.21842999830223525
4 0.11563941086588901
5 0.0448953006891098
6 0.01320450020267937
7 0.0029959790375827166
8 0.0005287021831028324
9 7.256696630823176e-05
10 7.68356113851866e-06
11 6.163284335710162e-07
12 3.625461373947157e-08
13 1.4764322337341365e-09
14 3.722098068237306e-11
15 4.378938903808602e-13
#sum the value for all probabilities
sum(pn)
#the result is one (up to floating-point error), confirming the probabilities sum correctly
1.0000000000000033
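As a further sanity check, the theoretical mean and variance of this binomial distribution can be read off with binom.stats(); they should equal n·p and n·p·(1 − p). A minimal sketch:

```python
from scipy.stats import binom

n, p = 15, 0.15
# "mv" requests the mean and variance moments
mean, var = binom.stats(n, p, moments="mv")
print(float(mean))  # ≈ 2.25   (= n * p)
print(float(var))   # ≈ 1.9125 (= n * p * (1 - p))
```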
The post Calculate mean, median and mode using Python appeared first on We provide R, Python, Statistics Online-Learning Course.
Mean, median and mode are three statistics that measure the central tendency of numerical data. The mean is the average of a sample; the median is the midpoint of the sample once its values are sorted from lowest to highest; and the mode is the value that occurs most frequently in the sample. Python stores data in various objects, such as lists or NumPy ndarrays. Here we show how to calculate the mean, median and mode for each of them.
1. Calculation of mean, median and mode for a Python list
#mean
testlist = [5,8,22,4,5,13,7,266,1,8,23,14,32,4,7,8,6,9,23,7,8]
def Average(lst):
    return sum(lst) / len(lst)
ave = Average(testlist) #mean
ave
Out[2]: 22.857142857142858
#median
testlist.sort()
mid = len(testlist) // 2
#average the two central elements; for an odd-length list
#both indices point to the same middle element
midn = (testlist[mid] + testlist[-mid-1]) / 2
midn
Out[3]: 8.0
#mode
from statistics import mode
mode(testlist)
Out[4]: 8
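The standard-library statistics module provides all three measures directly, which makes the hand-written Average() helper unnecessary:

```python
import statistics

testlist = [5,8,22,4,5,13,7,266,1,8,23,14,32,4,7,8,6,9,23,7,8]
print(statistics.mean(testlist))    # 22.857142857142858
print(statistics.median(testlist))  # 8
print(statistics.mode(testlist))    # 8
```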
2. Calculation of mean, median and mode for a NumPy array
import numpy as np
#mean
salary = np.random.randint(38, high=60, size=1500)
np.mean(salary)
Out[5]: 48.415333333333336
#median
np.median(salary)
Out[6]: 48.0
#mode
from scipy import stats
stats.mode(salary)
Out[9]: ModeResult(mode=array([58]), count=array([85]))
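Note that the shape of the ModeResult above depends on the SciPy version (newer releases return scalars rather than arrays by default). A version-independent alternative is to derive the mode from np.unique; a minimal sketch:

```python
import numpy as np

salary = np.random.randint(38, high=60, size=1500)
# unique values and how often each occurs
values, counts = np.unique(salary, return_counts=True)
mode_value = values[np.argmax(counts)]  # the most frequent salary
print(mode_value)
```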
You can also watch the video for this Python course on our YouTube channel.
The post Using exponential distributions in Python appeared first on We provide R, Python, Statistics Online-Learning Course.
The exponential distribution models the probability distribution of the random waiting time until the next event in a Poisson process, i.e. a process in which events occur at a constant rate over an interval. The exponential distribution is a special case of the gamma distribution in which the shape parameter (alpha) equals 1.
The exponential distribution has the following probability density function:
f(x) = (1/β) e^(−x/β), for x ≥ 0.
The mean of an exponential distribution equals β (beta), the scale parameter.
Generally, exponential distributions can be used through the expon object from the scipy.stats module. The following code shows several examples of random number generation, calculating probability densities, and calculating cumulative probabilities for exponential distributions in Python.
Example 1: Generating random numbers from an exponential distribution.
from scipy.stats import expon
import numpy as np
import matplotlib.pyplot as plt
#scale, or beta, is the average time between two events;
#it is the reciprocal of the event rate lambda of the
#underlying Poisson process
#draw 10 random variates from an exponential distribution with beta = 2
R = expon.rvs(scale = 2, size = 10)
print ("Random Variates : \n", R)
#output
Random Variates :
[0.23565008 0.39198579 3.78138116 4.38514265 4.05855378 0.61639563
3.65629436 1.49444996 6.99201959 3.47163068]
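Since the mean of an exponential distribution equals the scale parameter β, the sample mean of a large draw should land close to the scale passed to expon.rvs(). A quick check (the seed is arbitrary):

```python
from scipy.stats import expon
import numpy as np

# large sample so the sample mean is a stable estimate of beta
sample = expon.rvs(scale=2, size=100_000, random_state=42)
print(np.mean(sample))  # close to 2
```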
Example 2: Calculating probability densities from an exponential distribution.
from scipy.stats import expon
import numpy as np
import matplotlib.pyplot as plt
#a NumPy array of quantile values
quantile = np.arange(0.01, 3, 0.1)
#probability densities over the array, from an exponential
#distribution with beta = 1
density = expon.pdf(quantile, scale = 1)
plt.plot(quantile, density)
Example 3. Calculating cumulative probabilities for random variates from exponential distributions.
from scipy.stats import expon
import numpy as np
import matplotlib.pyplot as plt
#cumulative probability
quantile = np.arange (0.01, 9, 0.1)
Cum_prob = expon.cdf(quantile, scale = 1)
plt.plot(quantile, Cum_prob)
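The exponential CDF also has the closed form F(x) = 1 − e^(−x/β), so the values returned by expon.cdf() can be verified against the formula directly:

```python
from scipy.stats import expon
import numpy as np

x = np.arange(0.01, 9, 0.1)
cdf_scipy = expon.cdf(x, scale=1)
cdf_formula = 1 - np.exp(-x)  # closed form with beta = 1
print(np.allclose(cdf_scipy, cdf_formula))  # True
```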
Example 4. Calculate the probability that the waiting time until the next event is larger than 3 minutes, given that the average waiting time between events is 2 minutes, using an exponential distribution.
from scipy.stats import expon
import numpy as np
import matplotlib.pyplot as plt
scale = 2
p_large_3 = 1 - expon.cdf(3, scale = scale)
p_large_3
#result
Out[18]: 0.2231301601484298
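The same tail probability can be computed in one step with the survival function expon.sf(), which equals 1 − cdf and is numerically more accurate deep in the tail:

```python
from scipy.stats import expon

p_large_3 = expon.sf(3, scale=2)
print(p_large_3)  # ≈ 0.2231, i.e. e**(-3/2)
```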
You can also watch the full video of our Python course on our YouTube channel.
The post Working with normal distributions in Python appeared first on We provide R, Python, Statistics Online-Learning Course.
The normal distribution describes random variables with a bell-shaped probability density function. It is widely used in data science because, by the central limit theorem, the mean of a large sample of independent random variates is approximately normally distributed regardless of the distribution the variates are drawn from. The normal probability density function is determined by two parameters: the mean (μ) and the standard deviation (σ).
Next, we show how to use the normal distribution in Python programming.
First, we draw a histogram of thousands of random numbers generated from a normal distribution.
#Histogram plotting Normal Distribution
#import numpy and matplotlib module
import numpy as np
import matplotlib.pyplot as plt
# Mean of the distribution
Mean = 32
# standard deviation of the distribution
Standard_deviation = 8
# sample size
size = 3200
# creating a normal distribution data array
values = np.random.normal(Mean, Standard_deviation, size)
# plotting histogram of this array
plt.hist(values, 10)
# plotting mean line
plt.axvline(values.mean(), color='k', linestyle='dashed', linewidth=2)
plt.show()
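As a quick sanity check, the sample mean and standard deviation of such an array should be close to the parameters passed to np.random.normal():

```python
import numpy as np

values = np.random.normal(32, 8, 3200)
print(values.mean())  # close to 32
print(values.std())   # close to 8
```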
The next example uses normal distribution functions in Python to calculate probabilities for a sample of student test scores.
Suppose student test scores follow a normal distribution with mean 81 and standard deviation 18.
Question 1: What percentage of students have scores below 60?
Question 2: What percentage of students scored better than 95?
Question 3: Below which test score do about 80% of all students fall?
The following code examples show how to solve these problems with the normal distribution in Python.
#Solution to question 1
# import required libraries: norm function from scipy.stats
#module, and numpy module
from scipy.stats import norm
import numpy as np
# Given normal distribution information
mean = 81
std_dev = 18
score = 60
# Calculate the z-score, the standardized value for 60
z_score = (score - mean) / std_dev
# Calculate the probability of a score less than 60 using
# norm.cdf(), which returns the cumulative probability of all
# values less than or equal to a given value
prob = norm.cdf(z_score)
# Calculate the percentage of students who got less than 60 marks
percent = prob * 100
# Print the result
print("Percentage of students scoring below 60:", round(percent, 2), "%")
#Result
Percentage of students scoring below 60: 12.17 %
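The explicit z-score step is optional: norm.cdf() also accepts loc and scale arguments, so the raw score can be passed in directly:

```python
from scipy.stats import norm

# probability of a score below 60, without manual standardization
prob = norm.cdf(60, loc=81, scale=18)
print(round(prob * 100, 2))  # 12.17
```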
#Solution to question 2
# import required libraries: norm from the scipy.stats module,
# and numpy
from scipy.stats import norm
import numpy as np
# Given distribution information
mean = 81
std_dev = 18
score = 95
# Calculate the z-score, the standardized value for 95
z_score = (score - mean) / std_dev
# Calculate the probability of a score less than 95
# using norm.cdf()
prob = norm.cdf(z_score)
# Calculate the percentage of students who scored more than 95
percent = (1-prob) * 100
# Print the result
print("Percentage of students scoring above 95:", round(percent, 2), "%")
#Result
Percentage of students scoring above 95: 21.84 %
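The upper tail can also be computed directly with the survival function norm.sf(), which equals 1 − cdf:

```python
from scipy.stats import norm

# probability of a score above 95, as a percentage
percent = norm.sf(95, loc=81, scale=18) * 100
print(percent)  # ≈ 21.84
```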
#Solution to question 3
# import required libraries: norm from the scipy.stats module,
# and numpy
from scipy.stats import norm
import numpy as np
# Given statistical information
mean = 81
std_dev = 18
q_score = 0.8
#find the z-value with cumulative probability 80%
#using norm.ppf(), which is the inverse of norm.cdf()
z_80 = norm.ppf(q_score)
#then transform the z-value, which is standard-normally
#distributed, back to the test-score scale
z_80_score = z_80 * std_dev + mean
z_80_score
#Result
96.14918220431245
#Alternative way
z_80_score = norm.ppf(q_score, loc = mean, scale = std_dev)
z_80_score
#Result
96.14918220431245
You can also watch the full Python tutorial video on our YouTube channel.
The post Poisson distribution implementation in Python appeared first on We provide R, Python, Statistics Online-Learning Course.
Poisson distribution is a discrete distribution. It is frequently used to model the count of events occurring during a specified time interval, such as the number of telephone calls coming in to a call center in a given day. The Poisson probability function has one parameter, λ, which denotes the constant occurrence rate of the Poisson process.
Its probability mass function is
P(X = x) = e^(−λt) (λt)^x / x!, x = 0, 1, 2, ...,
where X represents the discrete event count over an interval of length t; both the mean and the variance of a Poisson distribution equal λt.
To use the Poisson distribution in Python, simply import the poisson object from scipy.stats and use the corresponding functions:
poisson.pmf() for computation of probability functions,
poisson.cdf() for computation of cumulative probability functions, and
poisson.rvs() for random number generation.
Following code examples show how to use these functions in Python environment.
# How to Calculate Probabilities Using a Poisson Distribution
# You can use poisson.pmf(k, mu) to calculate the probability
# of a specific count value from a Poisson distribution.
#Example 1: Probability Equal to Some Value
#A store sells 8 ice creams per day on average. What is the
#probability that it will sell 10 ice creams on a given day?
from scipy.stats import poisson
#calculate probability
poisson.pmf(k=10, mu=8)
Out[1]: 0.09926153383153544
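The value returned by poisson.pmf() can be verified against the probability mass function itself, using only the standard math module:

```python
import math

mu, k = 8, 10
# P(X = k) = e**(-mu) * mu**k / k!
prob = math.exp(-mu) * mu ** k / math.factorial(k)
print(prob)  # ≈ 0.0993, matching poisson.pmf(k=10, mu=8)
```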
#You can use the poisson.cdf(k, mu) function to calculate
#cumulative probabilities up to a certain discrete value
#from a given Poisson distribution.
#Example 2: Probability Less than Some Value
#A call center receives on average 5 calls per hour.
#What is the probability that this call center has four or
#fewer incoming calls during a given hour?
from scipy.stats import poisson
#calculate probability
poisson.cdf(k=4, mu=5)
Out[2]: 0.44049328506521257
#Example 3
#generate random values from a Poisson distribution with
#mean = 8 and sample size = 20
poisson.rvs(mu=8, size=20)
Out[3]:
array([ 5, 13, 7, 9, 11, 10, 8, 8, 6, 9, 5, 6, 6, 13, 5, 6, 4, 4, 10, 11], dtype=int64)
#Example 4: Probability of a Count Greater than Some Value
#A certain shop sells 25 bottles of PersiMax per day on
#average. What is the probability that this shop sells more
#than 90 bottles of PersiMax in 3 days?
from scipy.stats import poisson
#calculate probability
1-poisson.cdf(k=90, mu=25*3)
Out[5]: 0.039923967285473094
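Here as well, the survival function gives the upper tail in a single call; poisson.sf(k, mu) computes 1 − cdf:

```python
from scipy.stats import poisson

# probability of more than 90 sales in 3 days (rate 25 per day)
p_more_than_90 = poisson.sf(k=90, mu=25 * 3)
print(p_more_than_90)  # ≈ 0.0399
```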
You can also watch the video on our YouTube channel, which sheds light on using Python for statistical problems.
The post Calculating Type I Error and Type II Error in Hypothesis Testing using Python appeared first on We provide R, Python, Statistics Online-Learning Course.
Statistical data analysis always involves uncertainty. In hypothesis testing, a decision made from a random sample can be wrong in two ways, known as Type I and Type II errors. A Type I error is the probability that a true null hypothesis (H0) is incorrectly rejected; a Type II error is the probability that a false H0 fails to be rejected.
In this article, we show how to calculate both Type I and Type II errors in Python, using an example based on the binomial distribution.
Null hypothesis (H0) and Alternative hypothesis (H1) in the example are formed as follows:
H0 : the success rate in each Bernoulli trial is p = 0.25
H1: the success rate in each Bernoulli trial is p = 0.50
Decision rule: the critical limit is 8 positives, i.e. if more than 8 positives are found in a random sample of size 20, then H0 will be rejected in favor of H1; otherwise H0 will be retained.
To solve this problem, the binom.cdf() function from the scipy.stats module is applied. It computes the cumulative probability of up to and including x successes in N binomial trials, with constant success rate p for each trial. Its basic form is
binom.cdf(x, N, p)
The following code in Python presents the calculation.
#import module
from scipy.stats import binom
#set the critical success count, the sample size, and the
#success rates p under H0 and H1
r = 8
n = 20
p0 = 0.25
p1 = 0.5
#Type 1 error is the probability of getting success counts
#larger than 8, given H0 is true
type_1_error = 1- binom.cdf(r, n, p0)
type_1_error
Out[1]: 0.04092516770651855
#Type 2 error is the probability of getting success counts
#less than or equal to 8, given H1 is true
type_2_error = binom.cdf(r, n, p1)
type_2_error
Out[2]: 0.2517223358154297
We see that in this example the Type I and Type II errors are about 0.04 and 0.25, respectively.
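A closely related quantity is the power of the test, defined as 1 minus the Type II error: the probability of correctly rejecting a false H0. Continuing with the same numbers:

```python
from scipy.stats import binom

r, n, p1 = 8, 20, 0.5
type_2_error = binom.cdf(r, n, p1)
power = 1 - type_2_error  # chance of rejecting H0 when H1 is true
print(power)  # ≈ 0.748
```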
Type I and Type II errors trade off against each other through the decision rule (the critical limit) for given H0 and H1. Now we change the critical limit from 8 to 7 to see what happens.
#Calculate Type1 and Type2 error in Python
#import module
from scipy.stats import binom
#set the critical success count, the sample size, and the
#success rates p under H0 and H1
r = 7
n = 20
p0 = 0.25
p1 = 0.5
#Type 1 error is the probability of getting success counts
#larger than 7, given H0 is true
type_1_error = 1- binom.cdf(r, n, p0)
type_1_error
Out[1]: 0.10181185692272265
#Type 2 error is the probability of getting success counts
#less than or equal to 7, given H1 is true
type_2_error = binom.cdf(r, n, p1)
type_2_error
Out[2]: 0.13158798217773438
The result shows that reducing the critical value from 8 to 7 increases the Type I error and lowers the Type II error. The reason is that it is now more probable to reject H0 than before, so the probability of committing a wrong rejection grows; on the other hand, with a greater chance of rejecting H0 there is less chance for a false H0 to escape rejection, which shrinks the Type II error.
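The trade-off can be inspected over a whole range of candidate critical limits with a short loop (the range of r values here is illustrative):

```python
from scipy.stats import binom

n, p0, p1 = 20, 0.25, 0.5
print("r\tType I\tType II")
for r in range(4, 12):
    type_1 = 1 - binom.cdf(r, n, p0)  # P(reject H0 | H0 true)
    type_2 = binom.cdf(r, n, p1)      # P(retain H0 | H1 true)
    print(f"{r}\t{type_1:.4f}\t{type_2:.4f}")
```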
You can also watch the video on our YouTube channel for a more vivid understanding of the topic.