Seaborn kdeplot plotting probabilities instead of density (histplot without bars)


Problem description

I have a question about seaborn's kdeplot. In histplot, one can choose which stat to show (count, frequency, density, probability). When the kde argument is used, that choice also applies to the KDE curve. However, I have not found a way to change this directly in kdeplot when I want only the KDE estimate expressed as probabilities. Alternatively, the same result could come from histplot if the bars could be switched off, but I have not found a way to do that either. So how can one do this?

To give a visual example, I would like to have just the red curve, i.e., either pass an argument to kdeplot to use probabilities, or remove the bars from histplot:

import matplotlib.pyplot as plt
import seaborn as sns

penguins = sns.load_dataset("penguins")
sns.histplot(data=penguins, x="flipper_length_mm", kde=True, stat="probability", color="r", label="probabilities")
sns.kdeplot(data=penguins, x="flipper_length_mm", color="k", label="kde density")
plt.legend()
plt.show()

Thanks a lot.

Solution

The y-axis of a histplot with stat="probability" gives the probability that a value falls into a given bar. The value of 0.23 for the highest bar means there is a probability of about 23% that a flipper length lies between 189.7 and 195.6 mm (the edges of that specific bin). By default (bins="auto"), the number of bins is chosen from the data. For this dataset that yields 10 bins, spread out between the minimum and maximum values encountered.
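The relationship between histplot's `stat` options can be reproduced with plain NumPy. The sketch below is a minimal illustration only. It assumes a synthetic normal sample (not the penguins data) and uses `np.histogram` with `bins="auto"` in place of seaborn's own binning code. It follows seaborn's documented definitions:

- `count` is the number of observations per bin.
- `frequency` divides the count by the bin width.
- `probability` divides the count by the number of observations, so the bar heights sum to 1.
- `density` divides by both, so the total bar area is 1.

The helper name `histogram_stats` is hypothetical.

```python
import numpy as np

def histogram_stats(values, bins="auto"):
    """Hypothetical helper: bin edges plus per-bin heights for several histplot-style stats."""
    values = np.asarray(values, dtype=float)
    counts, edges = np.histogram(values, bins=bins)
    widths = np.diff(edges)
    n = values.size
    return edges, {
        "count": counts,
        "frequency": counts / widths,      # observations per unit of x
        "probability": counts / n,         # bar heights sum to 1
        "density": counts / (n * widths),  # bar areas sum to 1
    }

# Synthetic sample for illustration only (NOT the penguins dataset).
rng = np.random.default_rng(0)
sample = rng.normal(loc=200.0, scale=14.0, size=342)

edges, stats = histogram_stats(sample)
tallest = np.argmax(stats["probability"])
print(f"{len(edges) - 1} bins; the tallest bin, between {edges[tallest]:.1f} and "
      f"{edges[tallest + 1]:.1f}, holds {stats['probability'][tallest]:.0%} of the values")
```

The `probability` height of a bin equals its `density` height times its width. This is the same conversion the answer applies to the KDE curve.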

The y-axis of a kdeplot is a probability density. Multiplying the height of the curve by a bin width gives the approximate probability of a value falling within a bin of that width around the corresponding x-value. For a bin of width 1, the height itself is that probability. A value of 0.031 at x=191 means there is a probability of about 3.1% that the length is between 190.5 and 191.5 mm.
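The "density times bin width ≈ probability" reading can be checked numerically. The sketch below makes two assumptions: a synthetic normal sample, and a hand-written Gaussian KDE that uses Scott's rule for the bandwidth. Scott's rule is the default of `scipy.stats.gaussian_kde`, on which seaborn's kdeplot builds with `bw_adjust=1`, but this is not seaborn's code. The sketch compares the curve height at x = 191 with the exact probability mass the same KDE assigns to [190.5, 191.5]. The helper name `gaussian_kde_1d` is hypothetical.

```python
import math
import numpy as np

def gaussian_kde_1d(data):
    """Hypothetical helper: (pdf, cdf) of a 1-D Gaussian KDE with Scott's-rule bandwidth."""
    data = np.asarray(data, dtype=float)
    bw = data.std(ddof=1) * data.size ** (-1 / 5)  # Scott's rule in one dimension

    def pdf(x):
        # Average of Gaussian kernels centred on every observation.
        z = (np.asarray(x, dtype=float)[..., None] - data) / bw
        return np.exp(-0.5 * z**2).sum(axis=-1) / (data.size * bw * math.sqrt(2 * math.pi))

    def cdf(x):
        # Average of the normal CDFs of all kernels, via the error function.
        z = (x - data) / (bw * math.sqrt(2))
        return float(np.mean([0.5 * (1 + math.erf(v)) for v in z]))

    return pdf, cdf

# Synthetic sample for illustration only (NOT the penguins dataset).
rng = np.random.default_rng(1)
sample = rng.normal(loc=200.0, scale=14.0, size=342)
pdf, cdf = gaussian_kde_1d(sample)

height = float(pdf(191.0))          # y-value of the KDE curve at x = 191
mass = cdf(191.5) - cdf(190.5)      # probability of landing in a width-1 bin around 191
print(f"density at 191: {height:.4f}, probability in [190.5, 191.5]: {mass:.4f}")
```

Because the curve is smooth on the scale of one unit, the two numbers agree closely. A wider bin makes the approximation coarser.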

To read probability values directly off a kdeplot, first choose a bin width. Then multiply the y-values by that bin width, so that they correspond to the probability of an x-value falling within a bin of that width. PercentFormatter can set up this correspondence on the axis: ax.yaxis.set_major_formatter(PercentFormatter(1/binwidth)) labels a tick at height y as y / (1/binwidth) = y * binwidth, shown as a percentage.
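The arithmetic behind the formatter is small enough to spell out. `PercentFormatter(xmax)` converts a value y to 100 * y / xmax percent, so passing xmax = 1/binwidth turns a density y into 100 * y * binwidth percent. The sketch below checks this with the formatter's `convert_to_pct` method. It reuses the density 0.031 quoted above and treats bin widths 1 and 5 as illustrative choices. The helper `bin_probability` is hypothetical.

```python
from matplotlib.ticker import PercentFormatter

def bin_probability(density, binwidth):
    """Hypothetical helper: approximate probability of a bin of this width around the point."""
    return density * binwidth

density_at_191 = 0.031  # KDE height quoted in the answer for x = 191 mm
for binwidth in (1, 5):
    formatter = PercentFormatter(xmax=1 / binwidth)
    pct = formatter.convert_to_pct(density_at_191)  # computes 100 * y / xmax
    direct = 100 * bin_probability(density_at_191, binwidth)
    print(f"bin width {binwidth}: formatter {pct:.1f}%, direct {direct:.1f}%")
```

The formatter only changes the tick labels. The plotted curve is still the density.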

The code below illustrates an example with a binwidth of 5 mm, and how a histplot can match a kdeplot.

import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import PercentFormatter

fig, ax1 = plt.subplots()
penguins = sns.load_dataset("penguins")
binwidth = 5
sns.histplot(data=penguins, x="flipper_length_mm", kde=True, stat="probability", color="r", label="Probabilities",
             binwidth=binwidth, ax=ax1)
ax2 = ax1.twinx()
sns.kdeplot(data=penguins, x="flipper_length_mm", color="k", label="kde density", ls=':', lw=5, ax=ax2)
ax2.set_ylim(0, ax1.get_ylim()[1] / binwidth)  # similar limits on the y-axis to align the plots
ax2.yaxis.set_major_formatter(PercentFormatter(1 / binwidth))  # show axis such that 1/binwidth corresponds to 100%
ax2.set_ylabel(f'Probability for a bin width of {binwidth}')
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')
plt.show()

PS: To show only the kdeplot, with the y-axis labeled as probabilities, the code could be:

import seaborn as sns
from matplotlib.ticker import PercentFormatter

penguins = sns.load_dataset("penguins")
binwidth = 5
ax = sns.kdeplot(data=penguins, x="flipper_length_mm")
ax.yaxis.set_major_formatter(PercentFormatter(1 / binwidth))  # show axis such that 1/binwidth corresponds to 100%
ax.set_ylabel(f'Probability for a bin width of {binwidth}')

Another option is to draw a histplot with kde=True and then remove the generated bars. For the result to be interpretable, set a binwidth; with binwidth=1 you get the same y-axis as a density plot. Setting kde_kws={'cut': 3} lets the KDE curve smoothly approach zero. By default, the curve is cut off at the minimum and maximum of the data.

import seaborn as sns

penguins = sns.load_dataset("penguins")
ax = sns.histplot(data=penguins, x="flipper_length_mm", binwidth=1, kde=True, stat='probability', kde_kws={'cut': 3})
ax.containers[0].remove()  # remove the bars
ax.relim()  # the axis limits need to be recalculated without the bars
ax.autoscale_view()
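To see why `cut` matters, recall the answer's note that histplot's KDE stops at the data range by default, which corresponds to cut=0. The sketch below assumes seaborn's approach: the KDE is evaluated on a grid extending `cut` kernel bandwidths beyond the smallest and largest observations. This is an approximation of seaborn's internals, not its code. The sketch uses a synthetic sample and a hand-written Scott's-rule Gaussian KDE, and reports how high the curve still is at the grid endpoints relative to its peak. The helper name `kde_on_support` is hypothetical.

```python
import numpy as np

def kde_on_support(data, cut, gridsize=200):
    """Hypothetical helper: Scott's-rule Gaussian KDE evaluated on [min - cut*bw, max + cut*bw]."""
    data = np.asarray(data, dtype=float)
    bw = data.std(ddof=1) * data.size ** (-1 / 5)  # kernel standard deviation
    grid = np.linspace(data.min() - cut * bw, data.max() + cut * bw, gridsize)
    z = (grid[:, None] - data) / bw
    density = np.exp(-0.5 * z**2).sum(axis=1) / (data.size * bw * np.sqrt(2 * np.pi))
    return grid, density

# Synthetic sample for illustration only (NOT the penguins dataset).
rng = np.random.default_rng(2)
sample = rng.normal(loc=200.0, scale=14.0, size=342)

ratios = {}
for cut in (0, 3):
    grid, density = kde_on_support(sample, cut)
    # Height at the taller of the two grid endpoints, relative to the peak.
    ratios[cut] = max(density[0], density[-1]) / density.max()
    print(f"cut={cut}: endpoint height is {ratios[cut]:.2%} of the peak")
```

With cut=0 the curve ends abruptly at a visibly non-zero height. With cut=3 it has essentially reached zero at the endpoints, which is what kde_kws={'cut': 3} achieves in the plot.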
