scipy normal distribution with scale greater and less than 1

Problem description

I'm using the normal distribution from SciPy (scipy.stats.norm) and having a hard time understanding its documentation. Let's say I have a normal distribution with a mean of 5 and a standard deviation of 0.25:

import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import norm

mean = 5
std = 0.25

x = np.linspace(mean - 3*std, mean + 3*std, 1000)
y = norm(loc=mean, scale=std).pdf(x)
plt.plot(x,y)

The resulting chart is the familiar bell curve, but with its peak at around 1.6. How can the probability of any value exceed 1? If I multiply it by scale, the probabilities look correct.

However, there is no such problem when std (and therefore scale) is greater than 1:

mean = 5
std = 10

x = np.linspace(mean - 3*std, mean + 3*std, 1000)
y = norm(loc=mean, scale=std).pdf(x)
plt.plot(x,y)

The documentation on norm says loc is the mean and scale is the standard deviation. Why does it behave so strangely with scale greater and less than 1?

Python 3.8.2, SciPy 1.4.1

Solution

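The "bell curve" you are plotting is a probability density function (PDF). Its height at a point is a density, not a probability: the probability that a random variable with this distribution falls in an interval [a, b] is the area under the curve between a and b. Thus the whole area under the curve (from -infinity to +infinity) must be 1, but nothing limits its height. A normal PDF peaks at x = mean with height 1/(σ√(2π)), where σ is the standard deviation: about 1.6 for σ = 0.25 and about 0.04 for σ = 10. A narrow curve has to be tall to enclose an area of 1, so the peak exceeds 1 whenever σ < 1/√(2π) ≈ 0.399 (not when σ < 1), and there is nothing strange about that.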
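A minimal check with scipy.stats.norm, using the mean and the two standard deviations from the question. The helper peak_density is hypothetical (written for this illustration, not a SciPy function). The sketch also shows that multiplying by scale, as tried in the question, only converts the curve into the standard normal density at z = (x - mean)/std, which is still a density rather than a probability, and that an actual probability comes from the CDF:

```python
import numpy as np
from scipy.stats import norm

MEAN = 5  # mean used in the question


def peak_density(std):
    # Hypothetical helper: a normal PDF peaks at x = mean with height 1 / (std * sqrt(2 * pi)).
    return 1.0 / (std * np.sqrt(2 * np.pi))


for std in (0.25, 10):
    dist = norm(loc=MEAN, scale=std)
    # The closed form agrees with scipy's PDF evaluated at the mean.
    print(f"std={std}: pdf at mean={dist.pdf(MEAN):.4f}, formula={peak_density(std):.4f}")

# The peak exceeds 1 exactly when std is below this value.
print(f"threshold std = {1 / np.sqrt(2 * np.pi):.4f}")

# scale * pdf(x) equals the standard normal PDF at z = (x - mean) / scale.
std = 0.25
x = np.linspace(4, 6, 5)
print(np.allclose(std * norm(loc=MEAN, scale=std).pdf(x), norm.pdf((x - MEAN) / std)))

# A probability is an area: P(4.9 <= X <= 5.1) for std = 0.25, via the CDF.
dist = norm(loc=MEAN, scale=std)
print(f"P(4.9 <= X <= 5.1) = {dist.cdf(5.1) - dist.cdf(4.9):.4f}")
```

For std = 0.25 the density at the mean is about 1.596, while the probability of landing within 0.1 of the mean is only about 0.31.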


Follow-up question: Is the area under the curve in the first plot really 1?

Yes, it is. One way to confirm this is to approximate the area under the curve by calculating the total area of a series of rectangles whose heights are defined by the curve:

import numpy as np
from matplotlib import pyplot as plt
from scipy.stats import norm
import matplotlib.patches as patches

mean = 5
std = 0.25

x = np.linspace(4, 6, 1000)
y = norm(loc=mean, scale=std).pdf(x)

fig, ax = plt.subplots()
ax.plot(x, y)
ax.set_aspect('equal')
ax.set_xlim([4, 6])
ax.set_ylim([0, 1.7])

# Approximate area under the curve by summing over rectangles:

xlim_approx = [4, 6]  # locations of left- and rightmost rectangle
n_approx = 17  # number of rectangles

# width of one rectangle:
width_approx = (xlim_approx[1] - xlim_approx[0]) / n_approx  
# x-locations of rectangles:
x_approx = np.linspace(xlim_approx[0], xlim_approx[1], n_approx)
# heights of rectangles:
y_approx = norm(loc=mean, scale=std).pdf(x_approx)

# plot approximation rectangles:
for i, xi in enumerate(x_approx):
    ax.add_patch(patches.Rectangle((xi - width_approx/2, 0), width_approx, 
                                   y_approx[i], facecolor='gray', alpha=.3))

# areas of the rectangles:
areas = y_approx * width_approx

# total area of the rectangles:
print(sum(areas))

0.9411599204607589
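The window [4, 6] is mean ± 4 standard deviations, so cutting off the tails cannot explain a shortfall of almost 6%. A quick sketch (not part of the original answer) that computes the exact mass inside and outside the window with norm.cdf and norm.sf, where sf(x) = 1 - cdf(x):

```python
from scipy.stats import norm

mean, std = 5, 0.25
dist = norm(loc=mean, scale=std)

# Exact probability mass between the plot limits 4 and 6.
mass_in_window = dist.cdf(6) - dist.cdf(4)

# Mass in the two tails outside [4, 6].
mass_in_tails = dist.cdf(4) + dist.sf(6)

print(f"mass in [4, 6]: {mass_in_window:.6f}")
print(f"mass outside:   {mass_in_tails:.2e}")
```

The window holds about 99.994% of the area; the tails account for only about 6e-5.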

Okay, that's not quite 1, but let's get a better approximation by extending the x-limits and increasing the number of rectangles:

xlim_approx = [0, 10]
n_approx = 100_000

width_approx = (xlim_approx[1] - xlim_approx[0]) / n_approx
x_approx = np.linspace(xlim_approx[0], xlim_approx[1], n_approx)
y_approx = norm(loc=mean, scale=std).pdf(x_approx)

areas = y_approx * width_approx
print(sum(areas))

0.9999899999999875
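Almost all of the gap in both printed results comes from the rectangle width, not from the approximation itself: np.linspace(a, b, n) places its n points (b - a)/(n - 1) apart, but width_approx is (b - a)/n, so the sum is scaled by (n - 1)/n. That factor is 16/17 ≈ 0.941 for the first attempt and 99,999/100,000 = 0.99999 for the second. Below is a minimal sketch using the midpoint rule, where the rectangles tile the interval exactly (rectangle_area is a hypothetical helper), plus scipy.integrate.quad as a grid-independent check:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mean, std = 5, 0.25
pdf = norm(loc=mean, scale=std).pdf


def rectangle_area(a, b, n):
    """Midpoint rule: n rectangles of width (b - a) / n that exactly tile [a, b]."""
    width = (b - a) / n
    # Centre of each rectangle, so neighbouring rectangles touch without overlapping.
    centres = a + width * (np.arange(n) + 0.5)
    return np.sum(pdf(centres) * width)


# Same windows and rectangle counts as in the answer, with width equal to spacing.
print(rectangle_area(4, 6, 17))
print(rectangle_area(0, 10, 100_000))

# Adaptive quadrature over [0, 10] (mean +/- 20 std); returns (value, error estimate).
area, abs_err = quad(pdf, 0, 10)
print(area, abs_err)
```

With matching widths, even 17 rectangles on [4, 6] give a sum well within 0.001 of 1.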
