Seaborn kde plot plotting probabilities instead of density (histplot without bars)
I have a question about seaborn kdeplot. In histplot one can choose which statistic to show (count, frequency, density, probability), and when used with the kde argument, that choice also applies to the KDE curve. However, I have not found a way to set it directly in kdeplot when I want only the KDE estimate scaled as probabilities. Alternatively, the same result could come from histplot if the bars could be switched off, which I also have not found. So how can one do that?
To give a visual example, I would like to have just the red curve, i.e. either pass an argument to kdeplot to use probabilities, or remove the bars from histplot:
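import matplotlib.pyplot as plt
import seaborn as sns

penguins = sns.load_dataset("penguins")
# red: histogram bars plus KDE curve, both scaled as probabilities
sns.histplot(data=penguins, x="flipper_length_mm", kde=True, stat="probability", color="r", label="probabilities")
# black: plain KDE, scaled as a density
sns.kdeplot(data=penguins, x="flipper_length_mm", color="k", label="kde density")
plt.legend()
plt.show()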
Thanks a lot.
The y-axis of a histplot with stat="probability" corresponds to the probability that a value belongs to a certain bar. The value of 0.23 for the highest bar means that there is a probability of about 23% that a flipper length is between 189.7 and 195.6 mm (the edges of that specific bin). Note that with the default settings, 10 bins are spread out between the minimum and maximum value encountered in this data.
The y-axis of a kdeplot is a probability density. The height of the curve approximates the probability of a value falling within a bin of width 1 centered on the corresponding x-value. A value of 0.031 for x=191 means there is a probability of about 3.1% that the length is between 190.5 and 191.5 mm.
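As a rough numerical check of this reading, the sketch below builds a small Gaussian KDE by hand and compares "density times bin width" with the probability mass the KDE actually assigns to the interval. Everything here is an assumption made for illustration: the data are synthetic (not the penguins dataset), the helper names `kde_density` and `kde_interval_probability` are hypothetical, and the bandwidth uses Scott's rule of thumb.

```python
import math
import random
import statistics


def kde_density(data, x, bw):
    """Height of a Gaussian KDE at x (a density, not a probability)."""
    scale = 1.0 / (len(data) * bw * math.sqrt(2.0 * math.pi))
    return scale * sum(math.exp(-0.5 * ((x - d) / bw) ** 2) for d in data)


def kde_interval_probability(data, lo, hi, bw):
    """Probability mass the same KDE assigns to the interval [lo, hi]."""
    def normal_cdf(z):
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    total = sum(normal_cdf((hi - d) / bw) - normal_cdf((lo - d) / bw) for d in data)
    return total / len(data)


rng = random.Random(0)
# Synthetic stand-in for flipper lengths; NOT the penguins data.
lengths = [rng.gauss(200.0, 14.0) for _ in range(300)]
# Scott's rule of thumb for a 1-D bandwidth (an assumption of this sketch).
bw = statistics.stdev(lengths) * len(lengths) ** (-1.0 / 5.0)

x = 191.0
density = kde_density(lengths, x, bw)
results = {}
for width in (1.0, 5.0):
    approx = density * width  # curve height times bin width
    exact = kde_interval_probability(lengths, x - width / 2, x + width / 2, bw)
    results[width] = (approx, exact)
    print(f"width {width:g}: density*width = {approx:.4f}, exact = {exact:.4f}")
```

The approximation is tightest for narrow bins, because the curve is nearly flat over a short interval; for wider bins it drifts as the curve bends within the bin.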
Now, to directly get probability values next to a kdeplot, first a bin width needs to be chosen. Then the y-values can be multiplied by that bin width to correspond to the probability of an x-value being within a bin of that width. The PercentFormatter provides a way to set such a correspondence, using ax.yaxis.set_major_formatter(PercentFormatter(1/binwidth)), so that a density of 1/binwidth is displayed as 100%.
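The arithmetic behind that axis relabeling can be sketched in plain Python. The helper `density_tick_as_percent` is hypothetical and only reproduces the conversion 100 * value / xmax with xmax = 1/binwidth; it is not matplotlib's formatter, and the tick values are made up for illustration.

```python
def density_tick_as_percent(density, binwidth, decimals=1):
    """Label a density tick as the probability of a bin of the given width.

    Applies 100 * density / xmax with xmax = 1 / binwidth, i.e. a density
    of 1 / binwidth is shown as 100%.
    """
    xmax = 1.0 / binwidth
    return f"{100.0 * density / xmax:.{decimals}f}%"


binwidth = 5
# Illustrative density tick values, not taken from a real plot.
labels = [density_tick_as_percent(d, binwidth) for d in (0.0, 0.01, 0.02, 0.03)]
print(labels)
```

With a bin width of 5, a density tick of 0.02 is relabeled as 10%, which is the same as multiplying the density by the bin width.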
The code below illustrates an example with a binwidth of 5 mm, and how a histplot can match a kdeplot.
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.ticker import PercentFormatter
fig, ax1 = plt.subplots()
penguins = sns.load_dataset("penguins")
binwidth = 5
sns.histplot(data=penguins, x="flipper_length_mm", kde=True, stat="probability", color="r", label="Probabilities",
             binwidth=binwidth, ax=ax1)
ax2 = ax1.twinx()
sns.kdeplot(data=penguins, x="flipper_length_mm", color="k", label="kde density", ls=':', lw=5, ax=ax2)
ax2.set_ylim(0, ax1.get_ylim()[1] / binwidth) # similar limits on the y-axis to align the plots
ax2.yaxis.set_major_formatter(PercentFormatter(1 / binwidth)) # show axis such that 1/binwidth corresponds to 100%
ax2.set_ylabel(f'Probability for a bin width of {binwidth}')
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')
plt.show()
PS: To only show the kdeplot with a probability, the code could be:
binwidth = 5
ax = sns.kdeplot(data=penguins, x="flipper_length_mm")
ax.yaxis.set_major_formatter(PercentFormatter(1 / binwidth)) # show axis such that 1/binwidth corresponds to 100%
ax.set_ylabel(f'Probability for a bin width of {binwidth}')
Another option could be to draw a histplot with kde=True, and remove the generated bars. To be interpretable, a binwidth should be set. With binwidth=1 you'd get the same y-axis as a density plot. (kde_kws={'cut': 3} lets the kde smoothly go to about zero; by default the kde curve in histplot is cut off at the minimum and maximum of the data.)
ax = sns.histplot(data=penguins, x="flipper_length_mm", binwidth=1, kde=True, stat='probability', kde_kws={'cut': 3})
ax.containers[0].remove() # remove the bars
ax.relim() # the axis limits need to be recalculated without the bars
ax.autoscale_view()
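To see why extending the curve by three bandwidths lets it reach about zero, the sketch below evaluates a hand-written Gaussian KDE at the smallest data point and at three bandwidths below it. This is an illustration under assumptions, not seaborn's implementation: the data are synthetic, `kde_density` is a hypothetical helper, and the bandwidth follows Scott's rule of thumb.

```python
import math
import random
import statistics


def kde_density(data, x, bw):
    """Height of a Gaussian KDE at x."""
    scale = 1.0 / (len(data) * bw * math.sqrt(2.0 * math.pi))
    return scale * sum(math.exp(-0.5 * ((x - d) / bw) ** 2) for d in data)


rng = random.Random(1)
# Synthetic stand-in for flipper lengths; NOT the penguins data.
lengths = [rng.gauss(200.0, 14.0) for _ in range(300)]
bw = statistics.stdev(lengths) * len(lengths) ** (-1.0 / 5.0)  # Scott's rule (assumed)

edge = min(lengths)
at_edge = kde_density(lengths, edge, bw)          # where a curve clipped to the data stops
beyond = kde_density(lengths, edge - 3 * bw, bw)  # end of a curve extended by 3 bandwidths
print(f"density at data minimum:    {at_edge:.6f}")
print(f"density 3 bandwidths below: {beyond:.6f}")
```

At the data minimum the curve is still visibly above zero, so a curve clipped there ends abruptly; three bandwidths further out, every Gaussian kernel has decayed by a factor of at least exp(-4.5), roughly 0.011.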