How does 2D kernel density estimation in Python (sklearn) work?


Problem description

Apologies if this is a basic question, but I have spent hours trying to estimate a density from a set of 2D data. Suppose my data is given by the array samples = np.random.uniform(0,1,size=(50,2)). I want to use scikit-learn to estimate the density from this array (which here is of course a 2D uniform density), and I am trying the following:

import numpy as np
from sklearn.neighbors import KernelDensity
from matplotlib import pyplot as plt
sp = 0.01

samples = np.random.uniform(0,1,size=(50,2))  # random samples
x = y = np.linspace(0,1,100)
X,Y = np.meshgrid(x,y)     # creating grid of data , to evaluate estimated density on

kde = KernelDensity(kernel='gaussian', bandwidth=0.2).fit(samples) # creating density from samples

kde.score_samples(X,Y) # I want to evaluate the estimated density on the X,Y grid

But the last step always yields the error: score_samples() takes 2 positional arguments but 3 were given

So probably .score_samples cannot take a grid as input, but I could not find any tutorials or docs for the 2D case, so I don't know how to fix this. Any help would be appreciated.

Solution

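As the Kernel Density Estimate of Species Distributions example in the scikit-learn documentation shows, you have to package the x and y data together into a single array of shape (n_samples, 2), for both the training data and the new sample grid. score_samples() accepts only that one array, which is why passing X and Y as separate arguments raises the error.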
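Applied directly to the code from the question, the fix might look like the sketch below. The seeded generator `rng` and the names `grid_points`, `log_density`, and `Z` are my own additions for illustration; the rest follows the question's setup.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

# Seeded generator so the sketch is reproducible (an assumption, not in the question).
rng = np.random.default_rng(0)
samples = rng.uniform(0, 1, size=(50, 2))   # 50 points, 2 features

x = y = np.linspace(0, 1, 100)
X, Y = np.meshgrid(x, y)                    # each has shape (100, 100)

kde = KernelDensity(kernel="gaussian", bandwidth=0.2).fit(samples)

# Stack the grid into one (10000, 2) array: one row per grid point,
# with the same column order (x, y) as the training data.
grid_points = np.column_stack([X.ravel(), Y.ravel()])

# score_samples() returns the log-density, so exponentiate it and
# reshape the flat result back to the shape of the grid.
log_density = kde.score_samples(grid_points)
Z = np.exp(log_density).reshape(X.shape)
```

Z then has the same shape as X and Y, so it can be passed to a plotting function such as plt.pcolormesh(X, Y, Z).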

Below is a function that simplifies the sklearn API.

import numpy as np
from sklearn.neighbors import KernelDensity

def kde2D(x, y, bandwidth, xbins=100j, ybins=100j, **kwargs):
    """Build 2D kernel density estimate (KDE)."""

    # create grid of sample locations (default: 100x100);
    # a complex step such as 100j tells np.mgrid how many points to generate
    xx, yy = np.mgrid[x.min():x.max():xbins,
                      y.min():y.max():ybins]

    # pack coordinates into (n_points, 2) arrays; the column order (y, x)
    # must be the same for the grid and the training data
    xy_sample = np.vstack([yy.ravel(), xx.ravel()]).T
    xy_train  = np.vstack([y, x]).T

    kde_skl = KernelDensity(bandwidth=bandwidth, **kwargs)
    kde_skl.fit(xy_train)

    # score_samples() returns the log-density at each sample point
    z = np.exp(kde_skl.score_samples(xy_sample))
    return xx, yy, np.reshape(z, xx.shape)

This gives you the xx, yy, zz needed for something like a scatter or pcolormesh plot. I've copied the example from the scipy page on the gaussian_kde function.

import numpy as np
import matplotlib.pyplot as plt

m1 = np.random.normal(size=1000)
m2 = np.random.normal(scale=0.5, size=1000)

x, y = m1 + m2, m1 - m2

xx, yy, zz = kde2D(x, y, 1.0)

plt.pcolormesh(xx, yy, zz)
plt.scatter(x, y, s=2, facecolor='white')
plt.show()
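One way to sanity-check the exponentiated output of score_samples() is to confirm that it integrates to roughly 1 over a grid that covers the data. The following is only a sketch under my own assumptions: the bandwidth of 0.3, the padding of five bandwidths, the 200x200 grid, and the seeded generator are illustrative choices, not part of the answer above.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)                   # seeded for reproducibility
m1 = rng.normal(size=1000)
m2 = rng.normal(scale=0.5, size=1000)
data = np.column_stack([m1 + m2, m1 - m2])       # shape (1000, 2)

bandwidth = 0.3                                  # illustrative value
kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(data)

# Pad the grid by several bandwidths so almost no kernel mass is cut off.
pad = 5 * bandwidth
gx = np.linspace(data[:, 0].min() - pad, data[:, 0].max() + pad, 200)
gy = np.linspace(data[:, 1].min() - pad, data[:, 1].max() + pad, 200)
GX, GY = np.meshgrid(gx, gy)
grid_points = np.column_stack([GX.ravel(), GY.ravel()])

# Convert log-density to density and restore the grid shape.
density = np.exp(kde.score_samples(grid_points)).reshape(GX.shape)

# Riemann-sum approximation of the integral of the density over the grid.
cell_area = (gx[1] - gx[0]) * (gy[1] - gy[0])
total_mass = density.sum() * cell_area
print(f"approximate integral: {total_mass:.3f}")
```

If the printed value is far from 1, the usual suspects are a forgotten np.exp() or a grid that is too small to contain the tails of the estimate.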
