Creating a predict function for kmeans in R
Question
I want to create a predict function that determines which cluster a new observation belongs to.
data(iris)
mydata=iris
m=mydata[1:4]
train=head(m,100)
xNew=head(m,10)
rownames(train)<-1:nrow(train)
norm_eucl=function(train)
train/apply(train,1,function(x)sum(x^2)^.5)
m_norm=norm_eucl(train)
result=kmeans(m_norm,3,30)
predict.kmean <- function(cluster, newdata)
{
simMat <- m_norm(rbind(cluster, newdata),
sel=(1:nrow(newdata)) + nrow(cluster))[1:nrow(cluster), ]
unname(apply(simMat, 2, which.max))
}
## assign new data samples to exemplars
predict.kmean(m_norm, x[result$cluster, ], xNew)
This produces the error
Error in predict.kmean(m_norm, x[result$cluster, ], xNew) :
unused argument (xNew)
I understand that I have written the function incorrectly, since I am only just learning how to do this, but I can't work out exactly where the mistake is.
What I actually want is to adapt a similar function written for apcluster (I have seen a similar topic, but it was about apcluster):
predict.apcluster <- function(s, exemplars, newdata)
{
simMat <- s(rbind(exemplars, newdata),
sel=(1:nrow(newdata)) + nrow(exemplars))[1:nrow(exemplars), ]
unname(apply(simMat, 2, which.max))
}
## assign new data samples to exemplars
predict.apcluster(negDistMat(r=2), x[apres@exemplars, ], xNew)
How can I do this?
Recommended answer
Rather than trying to replicate something, let's come up with our own function. For a given vector x, we want to assign a cluster using some prior k-means output. Given how the k-means algorithm works, what we want is to find which cluster's center is closest to x. That can be done as
predict.kmeans <- function(x, newdata)
apply(newdata, 1, function(r) which.min(colSums((t(x$centers) - r)^2)))
That is, we go over newdata row by row, compute that row's squared Euclidean distance to each of the centers, and pick the center with the smallest distance. Then, e.g.,
head(predict(result, train / sqrt(rowSums(train^2))), 3)
# 1 2 3
# 2 2 2
all.equal(predict(result, train / sqrt(rowSums(train^2))), result$cluster)
# [1] TRUE
which confirms that our prediction function assigns the training observations to the same clusters that kmeans did. Then also
predict(result, xNew / sqrt(rowSums(xNew^2)))
# 1 2 3 4 5 6 7 8 9 10
# 2 2 2 2 2 2 2 2 2 2
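For anyone who wants to run the whole thing in one go, here is a minimal self-contained sketch of the steps above. The seed, the helper name normalize_rows and the variable names fit, train_pred and new_pred are my own assumptions and are not part of the original code. The cluster labels you get depend on the random start, so no expected output is shown.

```r
# Sketch only: the seed is an arbitrary choice made for reproducibility.
set.seed(42)

train <- iris[1:100, 1:4]
x_new <- iris[1:10, 1:4]

# Hypothetical helper: scale every row to unit Euclidean length,
# the same normalisation used in the question and the answer.
normalize_rows <- function(d) {
  m <- as.matrix(d)
  m / sqrt(rowSums(m^2))
}

fit <- kmeans(normalize_rows(train), centers = 3, iter.max = 30)

# S3 method for objects of class "kmeans":
# for each row of newdata, return the index of the nearest centre.
predict.kmeans <- function(object, newdata, ...) {
  centers_t <- t(object$centers)  # variables in rows, clusters in columns
  unname(apply(as.matrix(newdata), 1, function(r)
    which.min(colSums((centers_t - r)^2))))
}

train_pred <- predict(fit, normalize_rows(train))
new_pred <- predict(fit, normalize_rows(x_new))

# Share of training rows whose predicted cluster matches the fitted one;
# this should normally be 1 once kmeans has converged.
mean(train_pred == unname(fit$cluster))
new_pred
```

New observations have to be normalised in exactly the same way as the training data before they are passed to predict, otherwise the distances to the centers are not comparable.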
Notice also that I'm simply calling predict rather than predict.kmeans. That is because result is of class kmeans, so the right method is chosen automatically. Also notice how I normalize the data in a vectorized manner, without using apply.
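To make those two remarks concrete, here is a small sketch. It checks that the apply-based normalisation from the question and the vectorized rowSums version give the same matrix, and that predict dispatches to a predict.kmeans method once one is defined. The toy data (the first ten iris rows), the seed and all variable names are assumptions chosen for illustration only.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

m <- as.matrix(iris[1:10, 1:4])

# Two ways of scaling each row to unit length.
by_apply <- m / apply(m, 1, function(x) sqrt(sum(x^2)))  # row by row
by_rowsums <- m / sqrt(rowSums(m^2))                     # vectorized

# Largest absolute difference between the two results.
max(abs(by_apply - by_rowsums))

fit <- kmeans(m, centers = 2)
class(fit)  # the class that predict() dispatches on

# The same nearest-centre rule as above, written as an S3 method.
predict.kmeans <- function(object, newdata, ...) {
  centers_t <- t(object$centers)
  unname(apply(as.matrix(newdata), 1, function(r)
    which.min(colSums((centers_t - r)^2))))
}

# predict() is an S3 generic, so it finds predict.kmeans by itself.
via_generic <- predict(fit, m)
via_method <- predict.kmeans(fit, m)
identical(via_generic, via_method)
```

The method has to be named predict.kmeans, matching the class name exactly. A function called predict.kmean, as in the question, would never be picked up by predict.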