Choosing bandwidth and linspace for kernel density estimation (why doesn't my bandwidth work?)

Question

I followed this link to apply kernel density estimation. My aim is to split each array in a group of arrays into two or more clusters. The code below works for every array in the group except this one:

X = np.array([[77788], [77793],[77798], [77803], [92886], [92891], [92896], [92901]])

I expect to see two distinct clusters, such as:

first_group = ([[77788], [77793],[77798], [77803]])

second_group = ([[92886], [92891], [92896], [92901]])

The list is dynamic, so I cannot hard-code the range for linspace: an array may span 0 to 10 or 100000 to 2000000. That is why I pass the array's minimum and maximum to linspace.

Even after trying various bandwidths, I could not obtain separate clusters. My code is below:

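import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import argrelextrema
from sklearn.neighbors import KernelDensity

a = X.reshape(-1, 1)
kde = KernelDensity(kernel='gaussian', bandwidth=8).fit(a)
s = np.linspace(a.min(), a.max())  # 50 evenly spaced points spanning the data
e = kde.score_samples(s.reshape(-1, 1))  # log-density at each grid point
plt.plot(s, e)

mi, ma = argrelextrema(e, np.less)[0], argrelextrema(e, np.greater)[0]
print("Minima:", s[mi])  # output: []
print("Maxima:", s[ma])  # output: []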

s[mi] and s[ma] are empty, which would mean the array does not contain two distinct clusters. Yet the plot shows at least one minimum. Why does that value not appear in s[mi]?

I ran the same code with different bandwidths, shown below, but still got no minima or maxima for this array. Any idea what I am doing wrong?

bandwidth = 0.008

bandwidth = 0.00002

Recommended answer

Try a bandwidth of 10000, or try relying on heuristics for choosing the bandwidth.
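One heuristic of this kind is Silverman's rule of thumb for a Gaussian kernel. The sketch below is only an illustration of that rule, not necessarily the heuristic the answer had in mind; the function name silverman_bandwidth is made up for this example, and X is the array from the question.

```python
import numpy as np

def silverman_bandwidth(samples):
    """Silverman's rule of thumb for a 1-D Gaussian kernel (hypothetical helper)."""
    x = np.asarray(samples, dtype=float).ravel()
    n = x.size
    std = x.std(ddof=1)                      # sample standard deviation
    q75, q25 = np.percentile(x, [75, 25])
    iqr = q75 - q25                          # interquartile range
    # Use the smaller of the two spread estimates; fall back to std if the IQR is 0.
    spread = min(std, iqr / 1.34) if iqr > 0 else std
    return 0.9 * spread * n ** (-0.2)

X = np.array([[77788], [77793], [77798], [77803],
              [92886], [92891], [92896], [92901]])
h = silverman_bandwidth(X)
print("Suggested bandwidth:", h)
```

Because the rule scales with the spread of the data, the suggested value follows the array whether it spans 0 to 10 or 100000 to 2000000, so it could be passed as the bandwidth argument instead of a fixed number such as 8.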

To make your code more robust, also split clusters at consecutive minima. Your problem is that there is no unique minimum here, but an interval: argrelextrema with np.less uses a strict comparison, so no point inside a flat stretch of equal values qualifies as a minimum.
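Splitting at an interval of minima could look like the sketch below. It rests on several assumptions: gaussian_density is a hand-rolled, unnormalised NumPy stand-in for KernelDensity (it returns densities, not log-densities), and plateau_minima and split_at_minima are hypothetical helper names. With a bandwidth of 8 the density between the two groups underflows to exactly zero, which produces the kind of flat valley described above.

```python
import numpy as np

def gaussian_density(grid, data, bandwidth):
    """Unnormalised Gaussian KDE on a grid (NumPy stand-in for KernelDensity)."""
    z = (grid[:, None] - data[None, :]) / bandwidth
    return np.exp(-0.5 * z ** 2).sum(axis=1)

def plateau_minima(values):
    """Return (start, end) index pairs of strict or flat local minima.

    A run counts only if the value drops into it and rises out of it,
    so the two ends of the array are never reported.
    """
    values = np.asarray(values, dtype=float)
    runs, i, n = [], 1, len(values)
    while i < n - 1:
        if values[i] < values[i - 1]:
            j = i
            # Extend the run across consecutive equal values.
            while j + 1 < n and values[j + 1] == values[i]:
                j += 1
            if j + 1 < n and values[j + 1] > values[i]:
                runs.append((i, j))
            i = j + 1
        else:
            i += 1
    return runs

def split_at_minima(data, grid, density):
    """Cut the sorted data at the midpoint of every minimum run."""
    data = np.sort(np.asarray(data, dtype=float).ravel())
    cuts = [(grid[a] + grid[b]) / 2.0 for a, b in plateau_minima(density)]
    return np.split(data, np.searchsorted(data, cuts))

X = np.array([[77788], [77793], [77798], [77803],
              [92886], [92891], [92896], [92901]])
data = np.sort(X.ravel().astype(float))
grid = np.linspace(data.min(), data.max(), 50)
density = gaussian_density(grid, data, bandwidth=8)
groups = split_at_minima(data, grid, density)
for g in groups:
    print(g)
```

Taking the midpoint of each flat run gives one cut per valley regardless of how many grid points the valley covers, which is the behaviour a strict np.less comparison cannot provide.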
