How can you create a KDE from histogram values only?

Question

I have a set of values for which I'd like to plot a Gaussian kernel density estimate (KDE), but I have two problems:

  1. I only have the bar heights of the plot, not the underlying values themselves
  2. I'm plotting against a categorical axis

Here's the plot I've generated so far. The order of the y axis matters, since it reflects the phylogeny of the bacterial species.

I'd like to add a Gaussian KDE overlay for each color, but so far I haven't been able to do this with seaborn or scipy.

Here's the code for the grouped bar plot above, using Python and matplotlib:

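import numpy as np
import matplotlib.pyplot as plt

# color1_plotting_values, color2_plotting_values and Species_Ordering come from the asker's data (not shown)
N = len(color1_plotting_values)
fig, ax = plt.subplots(figsize=(20,30))
ind = np.arange(N)    # the y locations for the groups (not used below)
width = .5         # the thickness of each bar
p1 = ax.barh(Species_Ordering.Species.values, color1_plotting_values, width, label='Color1', log=True)
p2 = ax.barh(Species_Ordering.Species.values, color2_plotting_values, width, label='Color2', log=True)
# shift the second series by one bar thickness so the two groups sit side by side
for b in p2:
    b.xy = (b.xy[0], b.xy[1]+width)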

Thanks!

Answer

How to plot a "KDE" starting from a histogram

Kernel density estimation needs the underlying data: the estimate is built by placing a kernel on every individual observation. You could come up with a new method that works from the empirical PDF (i.e. the histogram) instead, but then it wouldn't strictly be a KDE.
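For reference, the standard one-dimensional Gaussian KDE with bandwidth $h$ is a sum over the individual observations $x_1, \dots, x_n$. A histogram keeps only how many observations fall into each bin, not where they fall, so this sum cannot be evaluated exactly from bar heights alone:

```latex
\hat{f}_h(x) = \frac{1}{n h} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right),
\qquad K(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}
```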

Not all hope is lost, though. You can get a good approximation of a KDE by first drawing samples from the histogram (picking bin centers with probability proportional to the bar heights) and then running a KDE on those samples. Here's a complete working example:

import matplotlib.pyplot as plt
import numpy as np
import scipy.stats as sts

n = 100000

# generate some random multimodal histogram data
samples = np.concatenate([np.random.normal(np.random.randint(-8, 8), size=n)*np.random.uniform(.4, 2) for i in range(4)])
h,e = np.histogram(samples, bins=100, density=True)
x = np.linspace(e.min(), e.max())

# plot the histogram
plt.figure(figsize=(8,6))
plt.bar(e[:-1], h, width=np.diff(e), ec='k', align='edge', label='histogram')

# plot the real KDE
kde = sts.gaussian_kde(samples)
plt.plot(x, kde.pdf(x), c='C1', lw=8, label='KDE')

# resample the histogram and find the KDE.
resamples = np.random.choice((e[:-1] + e[1:])/2, size=n*5, p=h/h.sum())
rkde = sts.gaussian_kde(resamples)

# plot the KDE
plt.plot(x, rkde.pdf(x), '--', c='C3', lw=4, label='resampled KDE')
plt.title('n = %d' % n)
plt.legend()
plt.show()

Output:

The red dashed line (resampled KDE) and the orange line (real KDE) almost completely overlap, showing that the real KDE and the KDE computed by resampling the histogram are in excellent agreement.
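The resampling step can also be packaged as a reusable function. This is a minimal sketch, not part of the original answer: `resampled_kde` is a hypothetical name, the heights are assumed to be densities as produced by `np.histogram(..., density=True)`, and the optional `jitter` flag (also an addition) draws each resample uniformly inside its bin instead of exactly at the bin center. Weighting each bin by height times bin width keeps the sampling correct when bins have unequal widths, a case where `h/h.sum()` would be off.

```python
import numpy as np
import scipy.stats as sts


def resampled_kde(heights, edges, n_samples=100_000, jitter=False, seed=None):
    """Approximate a Gaussian KDE from histogram values by resampling (hypothetical helper).

    heights: bar heights, assumed to be densities (np.histogram(..., density=True))
    edges:   bin edges, one more value than heights
    """
    heights = np.asarray(heights, dtype=float)
    edges = np.asarray(edges, dtype=float)
    widths = np.diff(edges)
    rng = np.random.default_rng(seed)

    # Probability mass of each bin = density * bin width, normalized to sum to 1.
    p = heights * widths
    p /= p.sum()

    # Choose a bin for every resample, with probability proportional to its mass.
    idx = rng.choice(len(heights), size=n_samples, p=p)

    if jitter:
        # Optional: spread each resample uniformly inside its bin.
        points = edges[idx] + rng.uniform(size=n_samples) * widths[idx]
    else:
        # As in the example above: put every resample at its bin center.
        points = (edges[idx] + edges[idx + 1]) / 2

    return sts.gaussian_kde(points)


# Usage with synthetic data standing in for "histogram values only".
rng = np.random.default_rng(1)
h, e = np.histogram(rng.normal(0, 1, 2000), bins=30, density=True)
rkde = resampled_kde(h, e, n_samples=50_000, jitter=True, seed=2)
x = np.linspace(e[0], e[-1], 100)
print(rkde(x).max())
```

With `jitter=False` the function performs the same bin-center resampling as the complete example.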

If your histograms are really noisy (as you get if you set n = 10 in the complete example), be cautious about using the resampled KDE for anything other than plotting:

Overall the agreement between the real and resampled KDEs is still good, but the deviations are noticeable.
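An alternative to drawing resamples (not part of the original answer) is to pass the bin centers to `scipy.stats.gaussian_kde` directly, with the bin counts as `weights` (available in SciPy 1.2 and later). This sketch assumes the bar values are counts, so their sum is the number of observations, and uses that number in Scott's rule: by default the bandwidth would be computed from the effective number of weighted bin centers, which is at most the number of bins and tends to over-smooth. The raw samples are synthetic and serve only as a reference.

```python
import numpy as np
import scipy.stats as sts

rng = np.random.default_rng(0)

# Synthetic raw data, used only to build a histogram and a reference KDE.
samples = np.concatenate([rng.normal(-3, 1, 5000), rng.normal(2, 0.7, 5000)])
counts, edges = np.histogram(samples, bins=100)

# Treat each bin center as a data point weighted by its count.
centers = (edges[:-1] + edges[1:]) / 2
n_total = counts.sum()

# Scott's rule with the real number of observations: factor = n_total ** (-1/5).
wkde = sts.gaussian_kde(centers, weights=counts, bw_method=n_total ** (-1 / 5))

# Reference: the ordinary KDE on the raw samples (default bandwidth is Scott's rule).
kde = sts.gaussian_kde(samples)

x = np.linspace(edges[0], edges[-1], 200)
max_diff = np.max(np.abs(wkde(x) - kde(x)))
print(f"max |weighted KDE - raw KDE| = {max_diff:.4f}")
```

If the bar values are not counts (for example relative abundances), there is no natural `n_total`, and `bw_method` has to be chosen by eye.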

Since you haven't posted your actual data, I can't give detailed advice. Your best bet is probably to number your categories in order (0, 1, 2, ... following the phylogenetic ordering), then use that number as the "x" value of each bar in the histogram.
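Following that suggestion, here is a minimal sketch for the categorical case. Assumptions: the species names and bar values are hypothetical placeholders, the categories are already in the desired (phylogenetic) order, and instead of resampling it uses the `weights` argument of `gaussian_kde` (SciPy 1.2+), weighting each category index by its bar value. Because the axis is only a rank, the bandwidth has no physical meaning, so `bw_method` is set by hand.

```python
import numpy as np
import scipy.stats as sts
import matplotlib.pyplot as plt

# Hypothetical data: bar values for 6 species, already in phylogenetic order
# (replace with color1_plotting_values and the real species names).
species = ["sp_A", "sp_B", "sp_C", "sp_D", "sp_E", "sp_F"]
values = np.array([5.0, 40.0, 120.0, 60.0, 8.0, 2.0])

# Number the categories in order; these integers act as the "x" values.
positions = np.arange(len(species))

# Weighted KDE over the category positions (weights = bar values).
# bw_method=0.5 is an arbitrary choice: it scales the kernel width
# relative to the weighted spread of the positions.
kde = sts.gaussian_kde(positions, weights=values, bw_method=0.5)

# Evaluate on a fine grid spanning the category axis.
grid = np.linspace(-0.5, len(species) - 0.5, 200)
density = kde(grid)

# Rescale so the curve's peak matches the tallest bar, then overlay it on a
# horizontal bar chart (categories on the y axis, as in the question).
curve = density * values.max() / density.max()
fig, ax = plt.subplots()
ax.barh(positions, values, height=0.5)
ax.plot(curve, grid, color="C1")
ax.set_yticks(positions)
ax.set_yticklabels(species)
```

Repeat the KDE and `ax.plot` call for each color series, then call `plt.show()`. Rescaling the curve to the tallest bar is a presentational choice, and a KDE over ranks treats neighboring categories as "close", which is only meaningful because the ordering here carries phylogenetic information.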
