How to better fit seaborn violinplots?

Question

The following code gives me a very nice violinplot (and a boxplot within it).

import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

foo = np.random.rand(100)
sns.violinplot(foo)
plt.boxplot(foo)
plt.show()

So far so good. However, when I look at foo, the variable does not contain any negative values, yet the violin extends below zero. The seaborn plot seems misleading here. The plain matplotlib boxplot gives something closer to what I would expect.

How can I make violinplots with a better fit (that is, without showing false negative values)?

Recommended answer

As the comments note, this is a consequence (I'm not sure I'd call it an "artifact") of the assumptions underlying Gaussian KDE. As has been mentioned, this is somewhat unavoidable, and if your data don't meet those assumptions, you might be better off just using a boxplot, which shows only points that exist in the actual data.
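To make that concrete, here is a minimal sketch of a one-dimensional Gaussian KDE written directly in NumPy. It is not seaborn's implementation; the helper name gaussian_kde_1d, the fixed seed, and the use of Scott's rule for the bandwidth are assumptions made for illustration. Because every data point contributes a Gaussian bump with infinite tails, the estimated density is positive below zero even though no data point is.

```python
import numpy as np

def gaussian_kde_1d(data, grid, bandwidth):
    """Evaluate a 1-D Gaussian KDE on `grid` (hypothetical helper, not seaborn's code)."""
    # One Gaussian kernel per data point, each with standard deviation `bandwidth`.
    z = (grid[:, None] - data[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    # The density estimate is the average of all kernels.
    return kernels.mean(axis=1)

rng = np.random.default_rng(0)
foo = rng.random(100)  # uniform on [0, 1): no negative values

# Assumption: Scott's rule, i.e. factor n**(-1/5) times the sample std.
bandwidth = foo.std(ddof=1) * len(foo) ** (-1 / 5)

grid = np.linspace(-0.5, 1.5, 401)
density = gaussian_kde_1d(foo, grid, bandwidth)

# Density values on the part of the grid that lies below zero.
below_zero = density[grid < 0]
print("smallest data value:", foo.min())
print("largest density below zero:", below_zero.max())
```

The second printed number is strictly positive, which is exactly the tail that the violin draws below zero.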

However, in your response you ask whether it could be fit "tighter", which could mean a few things.

One answer might be to change the bandwidth of the smoothing kernel. You do that with the bw argument, which is actually a scale factor; the bandwidth that will be used is bw * data.std():

data = np.random.rand(100)
sns.violinplot(y=data, bw=.1)
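The effect of the scale factor can be checked numerically without plotting. The sketch below assumes the kernel standard deviation is bw times the sample standard deviation (computed with ddof=1, which is an assumption here) and measures how much of the estimated probability mass falls below zero. The helper name mass_below_zero is hypothetical.

```python
import math
import numpy as np

def mass_below_zero(data, bw):
    """Fraction of a Gaussian KDE's probability mass lying below 0.

    Assumes the kernel std is bw * data.std(ddof=1), i.e. `bw` acts as a
    scale factor on the sample standard deviation.
    """
    h = bw * data.std(ddof=1)
    # Each kernel N(x_i, h^2) places Phi((0 - x_i) / h) of its mass below 0.
    cdf = [0.5 * (1 + math.erf((0 - x) / (h * math.sqrt(2)))) for x in data]
    return float(np.mean(cdf))

rng = np.random.default_rng(0)
data = rng.random(100)

for bw in (1.0, 0.4, 0.1):
    kernel_std = bw * data.std(ddof=1)
    print(f"bw={bw}: kernel std={kernel_std:.3f}, "
          f"mass below 0={mass_below_zero(data, bw):.4f}")
```

A smaller factor shrinks the spurious mass but never removes it entirely, and it also makes the violin bumpier. Note that recent seaborn releases deprecate bw in favour of bw_method and bw_adjust, so it is worth checking the documentation for the installed version.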

Another answer might be to truncate the violin at the extremes of the datapoints. The KDE will still be fit with densities that extend past the bounds of your data, but the tails will not be shown. You do that with the cut parameter, which specifies how many units of bandwidth past the extreme values the density should be drawn. To truncate, set it to 0:

sns.violinplot(y=data, cut=0)
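As a rough illustration of what cut controls, the sketch below builds the grid on which the density would be drawn. The helper kde_support is hypothetical and only mirrors the description above; the default values cut=2 and gridsize=100 are assumptions about seaborn's defaults rather than something stated in this answer.

```python
import numpy as np

def kde_support(data, bandwidth, cut, gridsize=100):
    """Grid on which the density is drawn: `cut` bandwidths past each extreme."""
    lo = data.min() - cut * bandwidth
    hi = data.max() + cut * bandwidth
    return np.linspace(lo, hi, gridsize)

rng = np.random.default_rng(0)
data = rng.random(100)
bandwidth = 0.1 * data.std(ddof=1)  # same scale-factor convention as bw=.1

default_grid = kde_support(data, bandwidth, cut=2)    # extends past the data
truncated_grid = kde_support(data, bandwidth, cut=0)  # stops at the data extremes

print("data range:     ", data.min(), data.max())
print("cut=2 grid ends:", default_grid[0], default_grid[-1])
print("cut=0 grid ends:", truncated_grid[0], truncated_grid[-1])
```

With cut=0 the grid starts and ends at the smallest and largest observations, so the drawn violin cannot extend below zero for non-negative data, even though the underlying density estimate is unchanged.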

By the way, the API for violinplot is going to change in 0.6, and I'm using the development version here, but both the bw and cut arguments exist in the current released version and behave more or less the same way.
