How to better fit seaborn violinplots?

Question

The following code gives me a very nice violinplot (and boxplot within).

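import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# 100 samples drawn uniformly from [0, 1), so none are negative
foo = np.random.rand(100)
# Violin plot: a kernel density estimate of foo
sns.violinplot(foo)
# Box plot drawn on the same axes for comparison
plt.boxplot(foo)
plt.show()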

So far so good. However, when I look at foo, the variable does not contain any negative values. The seaborn plot seems misleading here. The normal matplotlib boxplot gives something closer to what I would expect.

How can I make violinplots with a better fit (not showing false negative values)?

Recommended answer

As the comments note, this is a consequence (I'm not sure I'd call it an "artifact") of the assumptions underlying gaussian KDE. As has been mentioned, this is somewhat unavoidable, and if your data don't meet those assumptions, you might be better off just using a boxplot, which shows only points that exist in the actual data.

However, in your response you ask about whether it could be fit "tighter", which could mean a few things.

One answer might be to change the bandwidth of the smoothing kernel. You do that with the bw argument, which is actually a scale factor; the bandwidth that will be used is bw * data.std():

data = np.random.rand(100)
sns.violinplot(y=data, bw=.1)
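
To see why a smaller scale factor gives a tighter fit, here is a minimal NumPy-only sketch (not seaborn's own code) that measures how much of a Gaussian KDE's probability mass falls below zero. The helper kde_mass_below is hypothetical, the use of ddof=1 for the standard deviation is an assumption, and Scott's rule factor n**(-1/5) is used as the reference bandwidth factor.

```python
import math
import numpy as np

def kde_mass_below(data, threshold, bw_factor):
    """Fraction of a Gaussian KDE's probability mass lying below `threshold`.

    Hypothetical helper for illustration. The kernel bandwidth is taken as
    bw_factor * data.std(ddof=1); ddof=1 is an assumption.
    """
    data = np.asarray(data, dtype=float)
    h = bw_factor * data.std(ddof=1)
    # Each kernel is a normal density centred on one data point, so the KDE's
    # CDF at `threshold` is the average of the individual normal CDFs there.
    z = (threshold - data) / (h * math.sqrt(2.0))
    return float(np.mean([0.5 * (1.0 + math.erf(v)) for v in z]))

rng = np.random.default_rng(0)
data = rng.random(100)              # uniform on [0, 1): no negative values

scott = len(data) ** (-1 / 5)       # Scott's rule factor for 1-D data
leak_default = kde_mass_below(data, 0.0, scott)
leak_narrow = kde_mass_below(data, 0.0, 0.1)

print(f"mass below 0 with factor {scott:.2f}: {leak_default:.4f}")
print(f"mass below 0 with factor 0.10: {leak_narrow:.4f}")
```

A narrower kernel pushes less mass past the edge of the data, but some always remains, because every Gaussian kernel has unbounded tails.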

Another answer might be to truncate the violin at the extremes of the datapoints. The KDE will still be fit with densities that extend past the bounds of your data, but the tails will not be shown. You do that with the cut parameter, which specifies how many units of bandwidth past the extreme values the density should be drawn. To truncate, set it to 0:

sns.violinplot(y=data, cut=0)
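
The effect of cut can be sketched without plotting anything. The snippet below is an illustration under the description above, not seaborn's actual implementation: violin_support is a hypothetical helper that returns the grid of values over which the density would be drawn, extending cut bandwidths beyond the smallest and largest data points. The bandwidth convention (Scott's factor times the ddof=1 standard deviation) is an assumption.

```python
import numpy as np

def violin_support(data, bw_factor, cut, gridsize=100):
    """Grid of values over which the violin's density would be drawn.

    Hypothetical helper: the grid runs from `cut` bandwidths below the minimum
    to `cut` bandwidths above the maximum of the data.
    """
    data = np.asarray(data, dtype=float)
    h = bw_factor * data.std(ddof=1)   # assumed bandwidth convention
    lo = data.min() - cut * h
    hi = data.max() + cut * h
    return np.linspace(lo, hi, gridsize)

rng = np.random.default_rng(0)
data = rng.random(100)                 # uniform on [0, 1)
scott = len(data) ** (-1 / 5)          # Scott's rule factor for 1-D data

extended = violin_support(data, scott, cut=2)
clipped = violin_support(data, scott, cut=0)

print("cut=2 range:", extended.min(), extended.max())
print("cut=0 range:", clipped.min(), clipped.max())
```

With cut=0 the drawn range coincides with the data range, so no negative values appear, even though the underlying density estimate is unchanged.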

By the way, the API for violinplot is going to change in 0.6, and I'm using the development version here, but both the bw and cut arguments exist in the current released version and behave more or less the same way.
