Normed histogram y-axis larger than 1

Question

Sometimes when I create a histogram, say with seaborn's distplot function and norm_hist=True, the y-axis stays below 1, as I would expect for a PDF. Other times it takes on values greater than 1.

For example, if I run

        import numpy as np
        import seaborn as sns

        sns.set()
        x = np.random.randn(10000)
        ax = sns.distplot(x)

then the y-axis on the histogram goes from 0.0 to 0.4 as expected, but if the data is not normally distributed the y-axis can go as high as 30, even with norm_hist=True.

What am I missing about the normalization arguments of histogram functions, e.g. norm_hist for sns.distplot? Even if I normalize the data myself by creating a new variable like this:

        new_var = data/sum(data)

so that the data sums to 1, the y-axis still shows values far larger than 1 (30, for example), whether norm_hist is True or not.

What interpretation can I give when the y-axis has such a large range?

I think what is happening is that my data is concentrated closely around zero, so for the area (under the KDE, for example) to equal 1, the height of the histogram has to be larger than 1. But since probabilities can't be above 1, what does the result mean?

Also, how can I get these functions to show probability on the y-axis?

Recommended answer

The rule isn't that the heights of all the bars should sum to one. The rule is that the areas of all the bars should sum to one. When the bars are very narrow, their heights can be quite large even though their areas sum to one. The height of a bar times its width is the probability that a value falls in that bar's range. For the height to equal the probability, you need bars of width one.
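The area rule can be checked numerically. The following is a minimal sketch that uses NumPy's np.histogram with density=True instead of seaborn; the seed, sample size, and number of bins are arbitrary choices made for illustration. Multiplying each height by its bin width gives the per-bin probability, which is one way to get probabilities rather than densities.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, arbitrary choice
data = rng.normal(0, 0.01, 100_000)        # tightly concentrated around zero

# density=True scales the heights so that the bar areas sum to 1
heights, edges = np.histogram(data, bins=80, density=True)
widths = np.diff(edges)

total_area = np.sum(heights * widths)      # sum of the bar areas
max_height = heights.max()                 # tallest bar

# Per-bin probability: height times width, same as count / sample size
probs = heights * widths
counts, _ = np.histogram(data, bins=edges)

print(f"sum of areas:   {total_area:.3f}")
print(f"sum of heights: {heights.sum():.1f}")
print(f"tallest bar:    {max_height:.1f}")
print(f"largest bin probability: {probs.max():.3f}")
```

The heights are densities and may exceed 1, while the entries of probs are probabilities and cannot.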

Here is an example to illustrate what's going on.

        import numpy as np
        from matplotlib import pyplot as plt
        import seaborn as sns

        fig, axs = plt.subplots(ncols=2, figsize=(14, 3))

        # Left: data in meters, bins 0.001 m wide
        a = np.random.normal(0, 0.01, 100000)
        sns.distplot(a, bins=np.arange(-0.04, 0.04, 0.001), ax=axs[0])
        axs[0].set_title('Measuring in meters')
        axs[0].containers[0][40].set_color('r')  # highlight the bin just right of zero

        # Right: the same data in millimeters, bins 1 mm wide
        a *= 1000
        sns.distplot(a, bins=np.arange(-40, 40, 1), ax=axs[1])
        axs[1].set_title('Measuring in millimeters')
        axs[1].containers[0][40].set_color('r')

        plt.show()

The plot on the left uses bins 0.001 meter wide. The highest bin (in red) is about 40 high. The probability that a value falls into that bin is 40 * 0.001 = 0.04.

The plot on the right uses exactly the same data, but measured in millimeters. Now the bins are 1 mm wide. The highest bin is about 0.04 high. The probability that a value falls into that bin is also 0.04, because the bin width is 1.
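The same comparison can be made without plotting. This is a rough sketch with np.histogram rather than sns.distplot; the seed and the exact bin edges are assumptions chosen so that both unit systems use matching bins.

```python
import numpy as np

rng = np.random.default_rng(1)                 # fixed seed, arbitrary choice
meters = rng.normal(0, 0.01, 100_000)
millimeters = meters * 1000                    # same data, different unit

# Integer millimeter edges; the meter edges are derived from them so the
# two histograms use matching bins.
edges_mm = np.arange(-40, 41)                  # 80 bins, each 1 mm wide
edges_m = edges_mm / 1000                      # 80 bins, each 0.001 m wide

h_m, _ = np.histogram(meters, bins=edges_m, density=True)
h_mm, _ = np.histogram(millimeters, bins=edges_mm, density=True)

# Probability of each bin = height * width
p_m = h_m * np.diff(edges_m)
p_mm = h_mm * np.diff(edges_mm)

print(f"tallest bar in meters:      {h_m.max():.1f}")
print(f"tallest bar in millimeters: {h_mm.max():.3f}")
print(f"largest bin probability:    {p_m.max():.3f} vs {p_mm.max():.3f}")
```

The heights differ by a factor of about 1000 between the two unit systems, while the bin probabilities agree.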

PS: As an example of a distribution whose probability density function has regions larger than 1, see the Pareto distribution with α = 3.
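To make the Pareto example concrete, here is a small sketch using the standard Pareto density f(x) = α·x_m^α / x^(α+1) for x ≥ x_m. The scale x_m = 1 is an assumption (the answer only fixes α = 3), and the function names are hypothetical helpers written for this illustration.

```python
def pareto_pdf(x, alpha=3.0, x_m=1.0):
    """Density of a Pareto distribution with shape alpha and scale x_m."""
    if x < x_m:
        return 0.0
    return alpha * x_m**alpha / x**(alpha + 1)


def pareto_cdf(x, alpha=3.0, x_m=1.0):
    """Probability that a Pareto-distributed value is at most x."""
    if x < x_m:
        return 0.0
    return 1.0 - (x_m / x)**alpha


for x in (1.0, 1.2, 1.5, 2.0):
    print(f"pdf({x}) = {pareto_pdf(x):.4f}   cdf({x}) = {pareto_cdf(x):.4f}")
```

With x_m = 1 the density at x = 1 is 3, yet the probability of any interval, given by a difference of CDF values, stays between 0 and 1.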
