scipy normal distribution with scale greater and less than 1
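Problem description
I'm using the normal distribution from scipy.stats and having a hard time understanding its documentation. Let's say I have a normal distribution with a mean of 5 and a standard deviation of 0.25: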
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import norm
mean = 5
std = 0.25
x = np.linspace(mean - 3*std, mean + 3*std, 1000)
y = norm(loc=mean, scale=std).pdf(x)
plt.plot(x, y)
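The resulting chart is the familiar bell curve but with its peak at around 1.6. How can the probability of any value exceed 1? If I multiply it by scale then the probabilities are correct.
No such problem when std (and scale) are greater than 1 however: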
mean = 5
std = 10
x = np.linspace(mean - 3*std, mean + 3*std, 1000)
y = norm(loc=mean, scale=std).pdf(x)
plt.plot(x, y)
The documentation on norm says loc is the mean and scale is the standard deviation. Why does it behave so strangely with scale greater and less than 1?
Python 3.8.2. Scipy 1.4.1
Recommended answer
The "bell curve" you are plotting is a probability density function (PDF). This means that the probability of a random variable with that distribution falling in any interval [a, b] is the area under the curve between a and b. Thus the whole area under the curve (from -infinity to +infinity) must be 1. So when the standard deviation is small, the maximum of the PDF may well be greater than 1; there is nothing strange about that.
Follow-up question: Is the area under the curve in the first plot really 1?
Yes, it is. One way to confirm this is to approximate the area under the curve by calculating the total area of a series of rectangles whose heights are defined by the curve. With 17 rectangles between x = 4 and x = 6, the total is 0.9411599204607589. Okay, that's not quite 1, but extending the x-limits to [0, 10] and increasing the number of rectangles to 100,000 gives a better approximation: 0.9999899999999875.
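To make the density-versus-probability distinction concrete, here is a minimal sketch that is not part of the original answer. It compares the peak of the PDF with the closed-form value 1 / (scale * sqrt(2 * pi)) and computes an actual probability as a difference of CDF values. The `results` dictionary is a hypothetical name used only for illustration. Under these assumptions the peak is about 1.6 for scale = 0.25 and about 0.04 for scale = 10, while the probability of landing within one standard deviation of the mean is the same in both cases.

```python
import math
from scipy.stats import norm

mean = 5
results = {}  # hypothetical container: std -> (peak, formula, probability)

for std in (0.25, 10):
    dist = norm(loc=mean, scale=std)
    # Height of the density at the mean; this is a density, not a probability.
    peak = dist.pdf(mean)
    # Closed-form peak height of a normal PDF.
    formula = 1 / (std * math.sqrt(2 * math.pi))
    # A probability is an area: P(mean - std <= X <= mean + std).
    prob = dist.cdf(mean + std) - dist.cdf(mean - std)
    results[std] = (peak, formula, prob)
    print(f"std={std}: peak={peak:.4f}, formula={formula:.4f}, P(within 1 std)={prob:.4f}")
```

Multiplying the density by scale, as tried in the question, yields the standard normal density at the corresponding z-score. That value is still a density rather than a probability; it only happens to stay below 1.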
# Code for the rectangle approximation (17 rectangles between x = 4 and x = 6):
import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import norm
import matplotlib.patches as patches
mean = 5
std = 0.25
x = np.linspace(4, 6, 1000)
y = norm(loc=mean, scale=std).pdf(x)
fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_aspect('equal')
ax.set_xlim([4, 6])
ax.set_ylim([0, 1.7])
# Approximate area under the curve by summing over rectangles:
xlim_approx = [4, 6] # locations of left- and rightmost rectangle
n_approx = 17 # number of rectangles
# width of one rectangle:
width_approx = (xlim_approx[1] - xlim_approx[0]) / n_approx
# x-locations of rectangles:
x_approx = np.linspace(xlim_approx[0], xlim_approx[1], n_approx)
# heights of rectangles:
y_approx = norm(loc=mean, scale=std).pdf(x_approx)
# plot approximation rectangles:
for i, xi in enumerate(x_approx):
    ax.add_patch(patches.Rectangle((xi - width_approx/2, 0), width_approx,
                                   y_approx[i], facecolor='gray', alpha=.3))
# areas of the rectangles:
areas = y_approx * width_approx
# total area of the rectangles:
print(sum(areas))
# Better approximation: wider x-limits and many more rectangles:
xlim_approx = [0, 10]
n_approx = 100_000
width_approx = (xlim_approx[1] - xlim_approx[0]) / n_approx
x_approx = np.linspace(xlim_approx[0], xlim_approx[1], n_approx)
y_approx = norm(loc=mean, scale=std).pdf(x_approx)
areas = y_approx * width_approx
print(sum(areas))
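One observation that is not part of the original answer: in the code above, np.linspace places the n_approx sample points (b - a)/(n_approx - 1) apart, while each rectangle is given the width (b - a)/n_approx. The sum therefore appears to come out low by roughly a factor of (n_approx - 1)/n_approx, which is about 0.941 for 17 rectangles and 0.99999 for 100,000. A minimal sketch of a midpoint-rule variant, in which the rectangles exactly tile the interval, might look like this. The helper name `rectangle_area` is hypothetical.

```python
import numpy as np
from scipy.stats import norm

mean, std = 5, 0.25


def rectangle_area(lo, hi, n):
    """Midpoint rule: n equal-width rectangles that exactly tile [lo, hi]."""
    width = (hi - lo) / n
    # Centre of each rectangle, half a width in from its left edge.
    centres = lo + width * (np.arange(n) + 0.5)
    heights = norm(loc=mean, scale=std).pdf(centres)
    return float(np.sum(heights * width))


area_narrow = rectangle_area(4, 6, 17)        # same limits and count as the first run
area_wide = rectangle_area(0, 10, 100_000)    # same limits and count as the second run
print(area_narrow)
print(area_wide)
```

With consistent widths, even the 17-rectangle version should land very close to 1. The small remainder comes from the tails outside [4, 6], which lie more than four standard deviations from the mean.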