What is the y axis in seaborn distplot?

Question

I have some geometrically distributed data. When I want to take a look at it, I use

import seaborn as sns
sns.distplot(data, kde=False, norm_hist=True, bins=100)

The result is this plot:

[figure: sns.distplot histogram of data]

However, the bin heights don't add up to 1, which means the y axis doesn't show probability but something else. If instead we use

import numpy as np
import matplotlib.pyplot as plt
weights = np.ones(len(data)) / len(data)  # each sample contributes 1/n
plt.hist(data, weights=weights, bins=100)

the y axis should show probability, since the bin heights sum to 1:

[figure: plt.hist histogram of data]

This is easier to see with a small example. Suppose we have the list

l = [1, 3, 2, 1, 3]

We have two 1s, two 3s and one 2, so their respective probabilities are 2/5, 2/5 and 1/5. When we use seaborn distplot with 3 bins:

sns.distplot(l, kde=False, norm_hist=True, bins=3)

We get:

[figure: sns.distplot histogram of l]

As you can see, the 1st and 3rd bins alone sum to 0.6 + 0.6 = 1.2, which is already greater than 1, so the y axis is not a probability. When we instead use

weights = np.ones(len(l)) / len(l)  # each sample contributes 1/n
plt.hist(l, weights=weights, bins=3)

we get:

[figure: plt.hist histogram of l]

and the y axis is a probability, since 0.4 + 0.4 + 0.2 = 1, as expected.
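The weighting trick can be checked numerically without drawing anything. This is a minimal sketch, assuming NumPy is installed: `np.histogram` is the function `plt.hist` uses internally to compute bar heights, so passing it the same 1/n weights shows the heights directly.

```python
import numpy as np

l = [1, 3, 2, 1, 3]

# Each sample gets weight 1/n, so every bar height is (count in bin) / n.
weights = np.ones(len(l)) / len(l)
heights, edges = np.histogram(l, bins=3, weights=weights)

print(heights)        # approximately 0.4, 0.2, 0.4
print(heights.sum())  # 1 (up to floating-point rounding)
```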

In both cases, each method uses the same number of bins: 100 bins for the geometrically distributed data and 3 bins for the small list l with its 3 possible values. So the number of bins is not the issue.

My question is: in seaborn distplot called with norm_hist=True, what does the y axis represent?

Answer

From the documentation:

norm_hist : bool, optional

If True, the histogram height shows a density rather than a count. This is implied if a KDE or fitted density is plotted.
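Written out, and assuming the standard density normalization that matplotlib applies with `density=True`: a bin $i$ of width $w_i$ holding $c_i$ of the $n$ samples gets height $h_i$, so the heights integrate (rather than sum) to 1. For `l`, the data span [1, 3], so each of the 3 bins is 2/3 wide and the counts are 2, 1, 2:

$$h_i = \frac{c_i}{n\,w_i}, \qquad \sum_i h_i\,w_i = \sum_i \frac{c_i}{n} = 1$$

$$h = \left(\frac{2}{5 \cdot \tfrac{2}{3}},\ \frac{1}{5 \cdot \tfrac{2}{3}},\ \frac{2}{5 \cdot \tfrac{2}{3}}\right) = (0.6,\ 0.3,\ 0.6)$$

$$\sum_i h_i = 1.5 \neq 1, \qquad \sum_i h_i\,w_i = 1.5 \cdot \tfrac{2}{3} = 1$$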

So you also need to take the bin width into account: with a density, it is the area under the histogram (the sum of height × bin width over all bins) that equals 1, not the sum of the bin heights.
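The bin-width point can be checked with a minimal NumPy sketch. It assumes that `np.histogram(..., density=True)` applies the same normalization as distplot's `norm_hist=True`, which is consistent with the 0.6 heights in the question; multiplying each density by its bin width turns it back into a per-bin probability.

```python
import numpy as np

l = [1, 3, 2, 1, 3]

# density=True divides each count by (n * bin width), like norm_hist=True.
density, edges = np.histogram(l, bins=3, density=True)
widths = np.diff(edges)  # every bin is 2/3 wide for data spanning [1, 3]

print(density)                   # approximately 0.6, 0.3, 0.6
print(density.sum())             # about 1.5: heights alone do not sum to 1
print((density * widths).sum())  # about 1.0: the total area is 1

# Height times width is the probability mass of each bin.
probabilities = density * widths
print(probabilities)             # approximately 0.4, 0.2, 0.4
```

For the original geometric data, multiplying the distplot heights by the bin width (the data range divided by 100) should likewise give the probabilities that the weighted plt.hist call shows.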
