python (sklearn) 中的 2d 核密度估计如何工作? [英] how does 2d kernel density estimation in python (sklearn) work?
问题描述
我很抱歉这个可能很愚蠢的问题,但我现在正在尝试几个小时来估计一组二维数据的密度.假设我的数据由数组给出: sample = np.random.uniform(0,1,size=(50,2))
.我只想使用 scipys scikit 学习包来估计样本数组的密度(这里当然是二维均匀密度),我正在尝试以下操作:
将 numpy 导入为 np从 sklearn.neighbors.kde 导入 KernelDensity从 matplotlib 导入 pyplot 作为 pltsp = 0.01samples = np.random.uniform(0,1,size=(50,2)) # 随机样本x = y = np.linspace(0,1,100)X,Y = np.meshgrid(x,y) # 创建数据网格,评估估计密度kde = KernelDensity(kernel='gaussian', bandwidth=0.2).fit(samples) # 从样本创建密度kde.score_samples(X,Y) # 我想评估 X,Y 网格上的估计密度
但最后一步总是会产生错误:score_samples() 需要 2 个位置参数,但给出了 3 个
所以可能 .score_samples 不能将网格作为输入,但是没有针对 2d 案例的教程/文档,所以我不知道如何解决这个问题.如果有人可以提供帮助,那就太好了.
查看
I am sorry for the probably stupid question but I am trying now for hours to estimate a density from a set of 2d data. Let's assume my data is given by the array: sample = np.random.uniform(0,1,size=(50,2))
. I just want to use scipys scikit learn package to estimate the density from the sample array (which is here of course a 2d uniform density) and I am trying the following:
import numpy as np
from sklearn.neighbors.kde import KernelDensity
from matplotlib import pyplot as plt
sp = 0.01
samples = np.random.uniform(0,1,size=(50,2)) # random samples
x = y = np.linspace(0,1,100)
X,Y = np.meshgrid(x,y) # creating grid of data , to evaluate estimated density on
kde = KernelDensity(kernel='gaussian', bandwidth=0.2).fit(samples) # creating density from samples
kde.score_samples(X,Y) # I want to evaluate the estimated density on the X,Y grid
But the last step always yields the error: score_samples() takes 2 positional arguments but 3 were given
So probably .score_samples cannot take a grid as input, but there no tutorials/docs for the 2d case so I don't know how to fix this issue. It would be really great if someone could help.
Looking at the Kernel Density Estimate of Species Distributions example, you have to package the x,y data together (both the training data and the new sample grid).
Below is a function that simplifies the sklearn API.
from sklearn.neighbors import KernelDensity
def kde2D(x, y, bandwidth, xbins=100j, ybins=100j, **kwargs):
"""Build 2D kernel density estimate (KDE)."""
# create grid of sample locations (default: 100x100)
xx, yy = np.mgrid[x.min():x.max():xbins,
y.min():y.max():ybins]
xy_sample = np.vstack([yy.ravel(), xx.ravel()]).T
xy_train = np.vstack([y, x]).T
kde_skl = KernelDensity(bandwidth=bandwidth, **kwargs)
kde_skl.fit(xy_train)
# score_samples() returns the log-likelihood of the samples
z = np.exp(kde_skl.score_samples(xy_sample))
return xx, yy, np.reshape(z, xx.shape)
This gives you the xx, yy, zz needed for something like a scatter or pcolormesh plot. I've copied the example from the scipy page on the gaussian_kde function.
import numpy as np
import matplotlib.pyplot as plt
m1 = np.random.normal(size=1000)
m2 = np.random.normal(scale=0.5, size=1000)
x, y = m1 + m2, m1 - m2
xx, yy, zz = kde2D(x, y, 1.0)
plt.pcolormesh(xx, yy, zz)
plt.scatter(x, y, s=2, facecolor='white')
这篇关于python (sklearn) 中的 2d 核密度估计如何工作?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!