Assign a new data point to a cluster in kernel k-means (kernlab package in R)?

Question

I have a question about the kkmeans function in R's kernlab package. I am new to this package, so please forgive me if I'm missing something obvious here.

I would like to assign a new data point to one of the clusters created by kernel k-means with the function kkmeans. With regular clustering, one would compute the Euclidean distance between the new data point and each cluster centroid, and choose the cluster with the closest centroid. In kernel k-means, this has to be done in the feature space.
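For comparison, the regular (non-kernel) case described above can be sketched in base R with kmeans() and a nearest-centroid lookup. This is a minimal illustration only, not part of kernlab; the seed and the variable names are arbitrary choices.

```r
# Plain k-means in base R: assign a new point to the nearest centroid
set.seed(42)                                # kmeans() uses random starting centers
km <- kmeans(iris[, -5], centers = 3)

x <- c(5.0, 3.6, 1.2, 0.4)                  # the new point from the question

# Squared Euclidean distance from x to each row of km$centers
d2 <- rowSums(sweep(km$centers, 2, x)^2)
cluster <- which.min(d2)                    # index of the closest centroid
cluster
```

Because kmeans() starts from random centers, the cluster index it reports can change with the seed.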

Take the example from the kkmeans documentation:

library(kernlab)
data(iris)
sc <- kkmeans(as.matrix(iris[,-5]), centers=3)

Say I have the following new data point, which I would like to assign to the closest of the clusters stored in sc:

Sepal.Length  Sepal.Width  Petal.Length  Petal.Width
     5.0         3.6          1.2         0.4 

Any tips on how to do this? Your help is very appreciated.

Recommended answer

Kernel k-means uses a kernel function to compute the similarity of objects. In plain k-means you loop through all centroids and select the one that minimizes the distance (under the chosen metric) to the given data point. With a kernel method (the default kernel in kkmeans is the radial basis function, rbfdot), you loop through the centroids and select the one that maximizes the kernel value (for RBF) or, for any kernel, minimizes the kernel-induced distance.

A detailed description of converting a kernel into a distance measure is provided here. In general, the distance induced by a kernel K can be computed as d^2(a,b) = K(a,a) + K(b,b) - 2K(a,b). For the RBF kernel, K(x,x) = 1 for all x, so d^2(a,b) = 2 - 2K(a,b), and you can simply maximize K(a,b) instead of minimizing the whole expression K(a,a) + K(b,b) - 2K(a,b).
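The RBF equivalence can be checked numerically in base R. The sketch below hand-codes the RBF kernel in kernlab's form exp(-sigma * ||a - b||^2) with an arbitrarily chosen sigma = 0.5, and uses the per-species means of iris purely as stand-in centroids (an assumption for illustration; these are not the centers kkmeans returns). The helper names rbf and kernel_dist2 are hypothetical.

```r
# Hypothetical RBF kernel in kernlab's parameterisation: exp(-sigma * ||a - b||^2)
rbf <- function(a, b, sigma = 0.5) exp(-sigma * sum((a - b)^2))

# Kernel-induced squared distance: d^2(a, b) = K(a, a) + K(b, b) - 2 K(a, b)
kernel_dist2 <- function(k, a, b) k(a, a) + k(b, b) - 2 * k(a, b)

# Stand-in centroids: per-species means of the iris measurements
# (illustration only, NOT the centers returned by kkmeans)
centroids <- as.matrix(aggregate(iris[, -5], by = list(iris$Species), FUN = mean)[, -1])

x <- c(5.0, 3.6, 1.2, 0.4)                  # the new point from the question

sims  <- apply(centroids, 1, function(ctr) rbf(x, ctr))
dists <- apply(centroids, 1, function(ctr) kernel_dist2(rbf, x, ctr))

# For RBF, K(a, a) = 1, so d^2 = 2 - 2 K(a, b): the argmax of K is the argmin of d^2
which.max(sims)   # 1 (the setosa mean)
which.min(dists)  # 1 (the setosa mean)
```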

To get the kernel function from a kkmeans object, use the kernelf function:

> library(kernlab)
> data(iris)
> sc <- kkmeans(as.matrix(iris[,-5]), centers=3)
> K <- kernelf(sc)
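To see what the returned kernel actually computes, a sketch like the following inspects it. It assumes the default kernel ("rbfdot") and uses kernlab's kpar() accessor to read the sigma that kkmeans estimated (kpar = "automatic" by default). The set.seed() call is added only to make the run repeatable.

```r
library(kernlab)

data(iris)
set.seed(1)                                 # kkmeans starts from a random partition
sc <- kkmeans(as.matrix(iris[, -5]), centers = 3)
K  <- kernelf(sc)

class(K)                                    # an "rbfkernel" with the default kernel
sigma <- kpar(K)$sigma                      # sigma estimated from the data
sigma

# Compare kernlab's K(x, y) with exp(-sigma * ||x - y||^2) computed by hand
x <- c(5.0, 3.6, 1.2, 0.4)
y <- centers(sc)[1, ]
c(kernlab = as.numeric(K(x, y)), by_hand = exp(-sigma * sum((x - y)^2)))
```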

So, for your example:

> c=centers(sc)
> x=c(5.0, 3.6, 1.2, 0.4)
> K(x,c[1,])
             [,1]
[1,] 1.303795e-11
> K(x,c[2,])
             [,1]
[1,] 8.038534e-06
> K(x,c[3,])
          [,1]
[1,] 0.8132268
> which.max( c( K(x,c[1,]), K(x,c[2,]), K(x,c[3,]) ) )
[1] 3
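The which.max() call above handles one point and three centroids by hand. For several new points at once, a sketch along these lines evaluates all point-to-centroid kernel values with a single kernelMatrix() call. assign_by_kernel is a hypothetical helper name, and the second new point is made up for illustration.

```r
library(kernlab)

data(iris)
set.seed(1)                                   # kkmeans starts from a random partition
sc  <- kkmeans(as.matrix(iris[, -5]), centers = 3)
K   <- kernelf(sc)
ctr <- centers(sc)                            # one centroid per row

# Hypothetical helper: for each row of newdata, return the index of the
# centroid with the largest kernel value (for RBF, the smallest
# kernel-induced distance)
assign_by_kernel <- function(K, newdata, ctr) {
  newdata <- as.matrix(newdata)
  km  <- kernelMatrix(K, newdata, ctr)
  sim <- matrix(as.numeric(km), nrow = nrow(newdata))  # rows: points, cols: centroids
  max.col(sim, ties.method = "first")
}

new_points <- rbind(c(5.0, 3.6, 1.2, 0.4),    # the point from the question
                    c(6.5, 3.0, 5.5, 2.0))    # an illustrative extra point
res <- assign_by_kernel(K, new_points, ctr)
res
```

For the first row, this returns the same index as the which.max() comparison done by hand above (for the same fitted sc).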

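In the sense of the kernel function used, the closest centroid is c[3,] = 5.032692 3.401923 1.598077 0.3115385. Since kkmeans starts from a random partition, the cluster numbering (and the exact values above) can differ from run to run.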
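One caveat: centers(sc) holds points in the original input space, so the comparison above measures kernel similarity to those input-space points. If you instead want the distance to each cluster's mean in feature space (the quantity standard kernel k-means works with), it can be computed from kernel evaluations alone: d^2(x, m_j) = K(x,x) - (2/n_j) sum_i K(x, x_i) + (1/n_j^2) sum_{i,l} K(x_i, x_l), where the sums run over the n_j training points in cluster j. The sketch below is an alternative to the approach in this answer, not what it does; feature_space_dist2 is a hypothetical helper, and it assumes the cluster labels can be read from sc@.Data.

```r
library(kernlab)

data(iris)
set.seed(1)                                   # kkmeans starts from a random partition
X      <- as.matrix(iris[, -5])
sc     <- kkmeans(X, centers = 3)
K      <- kernelf(sc)
labels <- as.integer(sc@.Data)                # cluster label of each training row

# Hypothetical helper: squared feature-space distance from x to the mean of
# each cluster, using only kernel evaluations:
#   d^2 = K(x,x) - 2/n_j * sum_i K(x, x_i) + 1/n_j^2 * sum_{i,l} K(x_i, x_l)
feature_space_dist2 <- function(K, x, X, labels) {
  x   <- matrix(x, nrow = 1)
  kxx <- as.numeric(kernelMatrix(K, x))                       # K(x, x)
  sapply(sort(unique(labels)), function(j) {
    Xj <- X[labels == j, , drop = FALSE]
    kxx - 2 * mean(as.numeric(kernelMatrix(K, x, Xj))) +      # cross term
      mean(as.numeric(kernelMatrix(K, Xj)))                   # within-cluster term
  })
}

d2 <- feature_space_dist2(K, c(5.0, 3.6, 1.2, 0.4), X, labels)
which.min(d2)                                 # cluster with the nearest feature-space mean
```

This costs one n_j x n_j kernel matrix per cluster, so for large data you would precompute the within-cluster term once and reuse it for every new point.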
