Creating a predict function for kmeans in R


Question

I want to create a predict function that determines which cluster a new observation belongs to.

data(iris)
 mydata=iris
m=mydata[1:4]
train=head(m,100)
xNew=head(m,10)


rownames(train)<-1:nrow(train)

norm_eucl=function(train)
  train/apply(train,1,function(x)sum(x^2)^.5)
m_norm=norm_eucl(train)


result=kmeans(m_norm,3,30)

predict.kmean <- function(cluster, newdata)
{
  simMat <- m_norm(rbind(cluster, newdata),
              sel=(1:nrow(newdata)) + nrow(cluster))[1:nrow(cluster), ]
  unname(apply(simMat, 2, which.max))
}

## assign new data samples to exemplars
predict.kmean(m_norm, x[result$cluster, ], xNew)

This produces the following error:

Error in predict.kmean(m_norm, x[result$cluster, ], xNew) : 
  unused argument (xNew)

I understand that something is wrong with my function, since I am only just learning how to do this, but I cannot tell exactly where.

What I actually want is to adapt a similar function written for apcluster (I have seen a similar topic, but it was about apcluster):

predict.apcluster <- function(s, exemplars, newdata)
{
  simMat <- s(rbind(exemplars, newdata),
              sel=(1:nrow(newdata)) + nrow(exemplars))[1:nrow(exemplars), ]
  unname(apply(simMat, 2, which.max))
}

## assign new data samples to exemplars
predict.apcluster(negDistMat(r=2), x[apres@exemplars, ], xNew)

How can I do this?

Recommended answer

Rather than trying to replicate something, let's come up with our own function. For a given vector x, we want to assign a cluster using some prior k-means output. Given how the k-means algorithm works, what we want is to find which cluster's center is closest to x. That can be done as

predict.kmeans <- function(x, newdata)
  apply(newdata, 1, function(r) which.min(colSums((t(x$centers) - r)^2)))

That is, we go over newdata row by row, compute that row's squared Euclidean distance to each of the centers, and pick the center with the smallest distance. Then, e.g.,
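To make the per-row computation concrete, here is a minimal sketch on made-up numbers. The `centers` matrix and the point `r` are hypothetical values chosen for illustration, not output from the iris fit.

```r
# Hypothetical 3 x 2 matrix of cluster centers (one row per cluster).
centers <- matrix(c(0, 0,
                    1, 1,
                    5, 5), ncol = 2, byrow = TRUE)

# A single new observation with two features.
r <- c(0.9, 1.2)

# t(centers) is 2 x 3, one column per center. Subtracting r recycles it
# down each column, so every column becomes (center - r).
sq_dist <- colSums((t(centers) - r)^2)

# Index of the closest center.
nearest <- which.min(sq_dist)
nearest
```

The transpose matters here: R recycles `r` in column-major order, so the centers have to be laid out as columns for the subtraction to line up feature by feature.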

head(predict(result, train / sqrt(rowSums(train^2))), 3)
# 1 2 3 
# 2 2 2
all.equal(predict(result, train / sqrt(rowSums(train^2))), result$cluster)
# [1] TRUE

which confirms that our prediction function assigns the training observations to exactly the same clusters as kmeans did. Then also

predict(result, xNew / sqrt(rowSums(xNew^2)))
#  1  2  3  4  5  6  7  8  9 10 
#  2  2  2  2  2  2  2  2  2  2 
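Because the model was fit on row-normalized data, new observations need the same normalization before they are passed to predict. One way to avoid forgetting that step is a small helper; the name `normalize_rows` is an assumption introduced here for illustration, and `set.seed(1)` is added only to make the sketch repeatable.

```r
# Hypothetical helper: scale each row to unit Euclidean length.
normalize_rows <- function(d) d / sqrt(rowSums(d^2))

# Same nearest-center method as in the answer, with the usual S3 signature.
predict.kmeans <- function(object, newdata, ...)
  apply(newdata, 1, function(r) which.min(colSums((t(object$centers) - r)^2)))

set.seed(1)  # kmeans uses random starting centers
train <- head(iris[1:4], 100)
result <- kmeans(normalize_rows(train), 3, 30)

# Apply the same normalization to the new data before predicting.
xNew <- head(iris[1:4], 10)
pred <- predict(result, normalize_rows(xNew))
pred
```

Cluster labels depend on the random start, so the numbers printed may differ from the output shown in this answer even though the grouping is the same.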

Notice also that I'm simply calling predict rather than predict.kmeans. That works because result is of class kmeans, so the right method is chosen automatically. Also notice how I normalize the data in a vectorized manner, without using apply.
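Both remarks can be checked directly. The following is a sketch under the same setup as the question (first 100 rows of iris, three clusters); the fixed seed and the variable names `via_generic`, `via_direct`, and `norm_apply` are assumptions added for the example.

```r
set.seed(1)  # kmeans uses random starting centers
train <- head(iris[1:4], 100)
train_norm <- train / sqrt(rowSums(train^2))  # vectorized row normalization
result <- kmeans(train_norm, 3, 30)

predict.kmeans <- function(object, newdata, ...)
  apply(newdata, 1, function(r) which.min(colSums((t(object$centers) - r)^2)))

# result has class "kmeans", so the generic predict() dispatches to predict.kmeans.
class(result)
via_generic <- predict(result, train_norm)
via_direct  <- predict.kmeans(result, train_norm)
identical(via_generic, via_direct)

# The apply-based normalization from the question yields the same values.
norm_apply <- t(apply(train, 1, function(x) x / sqrt(sum(x^2))))
same_norm <- all.equal(unname(as.matrix(train_norm)), unname(norm_apply))
same_norm
```

Naming the first argument `object` and adding `...` mirrors the signature of the generic `predict`, which is the usual convention for S3 methods; the two-argument version in the answer dispatches just as well.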
