Cluster unseen points using Spectral Clustering


Problem description

I am using the spectral clustering method to cluster my data. The implementation seems to work properly. However, I have one problem: I have a set of unseen points (not present in the training set) and would like to cluster these based on the centroids derived by k-means (step 5 in the paper). However, k-means is computed on the k eigenvectors, so the centroids are low-dimensional.

Does anyone know a method that can be used to map an unseen point to this low-dimensional space and compute the distance between the projected point and the centroids derived by k-means in step 5?

Answer

Late answer, but here's how to do it in R. I was searching for this myself, and finally managed to code it up.

##Let's use kernlab for all kernel stuff
library(kernlab)

##Let's generate two concentric circles to cluster
r1 = 1 + .1*rnorm(500) #inner radius, one per point
r2 = 2 + .1*rnorm(500) #outer radius, one per point
q1 = 2*pi*runif(500) #random angle distribution
q2 = 2*pi*runif(500) #random angle distribution

##This is our data now
data = cbind(x = c(r1*cos(q1),r2*cos(q2)), y = c(r1*sin(q1),r2*sin(q2)))

##Let's take a sample to define train and test data
##(we avoid naming the index "t", which would shadow base::t)
idx = sample(1:nrow(data), 0.95*nrow(data))
train = data[idx,]
test = data[-idx,]

##This is our data
plot(train, pch = 1, col = adjustcolor("black", alpha = .5))
points(test, pch = 16)
legend("topleft", c("train data","test data"), pch = c(1,16), bg = "white")


##The paper gives great instructions on how to perform spectral clustering
##so I'll be following the steps
##Ng, A. Y., Jordan, M. I., & Weiss, Y. (2002). On spectral clustering: Analysis and an algorithm. Advances in neural information processing systems, 2, 849-856.
##Pg.2 http://papers.nips.cc/paper/2092-on-spectral-clustering-analysis-and-an-algorithm.pdf
#1. Form the affinity matrix
k = 2L #This is the number of clusters we will look for
K = rbfdot(sigma = 300) #Our kernel
A = kernelMatrix(K, train) #Careful when choosing the kernel function; some have more numerical imprecision than others
diag(A) = 0
#2. Define the diagonal degree matrix D and the matrix L (the normalized affinity the paper calls the Laplacian)
D = diag(rowSums(A))
L = diag(1/sqrt(diag(D))) %*% A %*% diag(1/sqrt(diag(D)))
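#Note: this is L = D^(-1/2) A D^(-1/2); its top eigenvectors correspond to the
#bottom eigenvectors of the normalized graph Laplacian I - L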
#3. Find the eigenvectors of L
X = eigen(L, symmetric = TRUE)$vectors[,1:k]
#4. Form Y from X by renormalizing each row of X to unit length
Y = X/sqrt(rowSums(X^2))
#5. Cluster (k-means)
kM = kmeans(Y, centers = k, iter.max = 100L, nstart = 1000L)
#6. This is the cluster assignment of the original data
cl = fitted(kM, "classes")
##Projection onto the eigenvectors; the axis ranges show a single preferential direction
plot(jitter(Y, .1), ylab = "2nd eigenfunction", xlab = "1st eigenfunction", col = adjustcolor(rainbow(3)[2*cl-1], alpha = .5))

##LET'S TRY TEST DATA NOW
B = kernelMatrix(K, test, train) #The kernel product between train and test data

##We project onto the learned eigenfunctions
f = as.matrix(B) %*% Y #Each row of f is a test point expressed in eigenfunction space
#This part is described in Bengio, Y., Vincent, P., Paiement, J. F., Delalleau, O., Ouimet, M., & Le Roux, N. (2003). Spectral clustering and kernel PCA are learning eigenfunctions (Vol. 1239). CIRANO.
#Pg.12 http://www.cirano.qc.ca/pdf/publication/2003s-19.pdf
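#Intuition (my reading of the reference): each column of Y defines an eigenfunction,
#and a new point x is evaluated on it as a kernel-weighted sum over the training
#points, f_j(x) ~ sum_i K(x, x_i) * Y[i,j] (up to a per-eigenvector scale factor),
#which is exactly the matrix product B %*% Y computed above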

##And assign clusters based on the centers in that space
new.cl = apply(as.matrix(f), 1, function(x) { which.max(tcrossprod(x, kM$centers)) }) #Assigns each projected point to the k-means center with the largest inner product in the transformed space

##And here's our result
plot(train, pch = 1, col = adjustcolor(rainbow(3)[2*cl-1], alpha = .5))
points(test, pch = 16, col = rainbow(3)[2*new.cl-1])
legend("topleft", c("train data","test data"), pch = c(1,16), bg = "white")

[Output plot: train and test points colored by their assigned cluster]
