Set confidence levels in seaborn kdeplot

Question

I'm completely new to seaborn, so apologies if this is a simple question, but I can't find anything in the documentation describing how kdeplot chooses the levels it draws for a given n_levels. Here is an example:

import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt

x,y=np.random.randn(2,10000)

fig,ax=plt.subplots()
sns.kdeplot(x,y, shade=True,shade_lowest=False, ax=ax,n_levels=3,cmap="Reds")
plt.show()

This is the resulting plot:

I would like to know which confidence levels are shown, so that I can label my plot "shaded regions show the (a,b,c) percentage confidence intervals." I would naively assume that n_levels is somehow related to equivalent "sigmas" in a Gaussian, but from the example that doesn't appear to be the case.

Ideally, I would like to be able to specify the intervals shown by passing a tuple to kdeplot, such as:

levels=[68,95,99]

and have those confidence regions plotted.

Thanks to @Goyo and @tom, I think I can clarify my question and get partway to the answer I am looking for. As pointed out, n_levels is passed to plt.contourf as levels, so a list can be passed. But sns.kdeplot plots the PDF, and PDF values don't correspond to the confidence intervals I am looking for, since those come from integrating the PDF.

What I need to do is pass sns.kdeplot the x,y values of the integrated (and normalized) PDF, and then I will be able to enter e.g. n_levels=[0.68,0.95,0.99,1].

EDIT 2: I have now solved this problem; see below. I use a normalized 2D histogram to define the confidence intervals, which I then pass as levels to the normalized KDE plot. Apologies for the repetition: I could have written a function to return the levels, but I typed everything out explicitly.

import numpy as np
import scipy.optimize
import matplotlib.pyplot as plt
import seaborn as sns

# Generate some random data
x,y=np.random.randn(2,100000)

# Make a 2D density-normalized histogram
H,xedges,yedges=np.histogram2d(x,y,bins=40,density=True)

norm=H.sum() # Find the norm of the sum
# Set contour levels
contour1=0.99
contour2=0.95
contour3=0.68

# Set target levels as percentage of norm
target1 = norm*contour1
target2 = norm*contour2
target3 = norm*contour3

# Take histogram bin membership as proportional to Likelihood
# This is true when data comes from a Markovian process
def objective(limit, target):
    w = np.where(H>limit)
    count = H[w]
    return count.sum() - target

# Find levels by summing histogram to objective
level1= scipy.optimize.bisect(objective, H.min(), H.max(), args=(target1,))
level2= scipy.optimize.bisect(objective, H.min(), H.max(), args=(target2,))
level3= scipy.optimize.bisect(objective, H.min(), H.max(), args=(target3,))

# For nice contour shading with seaborn, define top level
level4=H.max()
levels=[level1,level2,level3,level4]

# Pass levels to normed kde plot
fig,ax=plt.subplots()
sns.kdeplot(x,y, shade=True,ax=ax,n_levels=levels,cmap="Reds_d",normed=True)
ax.set_aspect('equal')
plt.show()

The resulting plot is as follows:

The levels are slightly wider than I expected, but I think this is correct.
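One possible reason the contours look wide: for a two-dimensional standard normal, the region holding 68% of the probability is not the 1-sigma circle. The mass inside radius r is 1 - exp(-r²/2), so the 1-sigma circle holds only about 39%, and the 68% contour sits near r = 1.51. A quick check, assuming uncorrelated unit-variance data like the example above (the helper name `radius_for_mass` is made up for this sketch):

```python
import numpy as np

def radius_for_mass(p):
    """Radius of the circle enclosing probability p of a 2D standard normal."""
    # Invert P(R <= r) = 1 - exp(-r**2 / 2)
    return np.sqrt(-2.0 * np.log(1.0 - p))

for p in (0.68, 0.95, 0.99):
    print(f"{p:.0%} of the mass lies within r = {radius_for_mass(p):.2f}")

# Empirical fraction of samples inside the 1-sigma circle
rng = np.random.default_rng(0)
x, y = rng.standard_normal((2, 100_000))
inside = np.mean(x**2 + y**2 <= 1.0)
print(f"mass inside r = 1: {inside:.3f}")
```

The three radii come out at roughly 1.51, 2.45 and 3.03, which is noticeably wider than the 1, 2 and 3 sigma one might expect from the one-dimensional case.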

Recommended answer

The levels are not confidence intervals or sigmas but values of the estimated pdf. You can pass the levels as a list instead of an integer for n_levels.

EDIT

Seaborn just plots things. It won't give you the estimated pdf, only a matplotlib axes. So if you want to do calculations with the KDE pdf, you'll have to estimate it yourself.

Seaborn uses statsmodels or scipy under the hood, so you can do the same. Statsmodels can also give you the cdf if that is what you are looking for (maybe scipy can too, but I am not sure). You can compute the levels you are interested in, evaluate the pdf on a grid, and pass everything to contourf, which is more or less what seaborn does.
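The recipe in the previous paragraph could be sketched roughly as follows, using `scipy.stats.gaussian_kde` for the estimate. This is one way to do it, not necessarily what seaborn does internally. The helper `levels_for_mass` is a made-up name, and the approach assumes a uniform grid wide enough to hold essentially all of the probability mass:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import gaussian_kde

def levels_for_mass(pdf_grid, fractions):
    """Density values whose super-level sets hold the given mass fractions."""
    flat = np.sort(pdf_grid.ravel())[::-1]   # densities, highest first
    cum = np.cumsum(flat) / flat.sum()       # mass enclosed so far
    idx = np.searchsorted(cum, fractions)    # first cell reaching each fraction
    return flat[np.minimum(idx, flat.size - 1)]

rng = np.random.default_rng(0)
data = rng.standard_normal((2, 2000))

# Estimate the pdf and evaluate it on a regular grid
kde = gaussian_kde(data)
xs = ys = np.linspace(-4.5, 4.5, 100)
X, Y = np.meshgrid(xs, ys)
Z = kde(np.vstack([X.ravel(), Y.ravel()])).reshape(X.shape)

# contourf needs increasing levels, so the largest region comes first
fractions = [0.99, 0.95, 0.68]
levels = np.append(levels_for_mass(Z, fractions), Z.max())

fig, ax = plt.subplots()
ax.contourf(X, Y, Z, levels=levels, cmap="Reds")
ax.set_aspect("equal")
# plt.show()  # uncomment for interactive use
```

Sorting the grid values and accumulating them replaces the three separate bisection calls from the question, and the shaded bands can then be labeled as the 99%, 95% and 68% regions.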

Unfortunately I am not skilled enough to give you more advice on this (I just use statsmodels for OLS regressions every now and then), but you can look at the code of kdeplot and figure it out.
