Relabel samples in kmeans results considering the order of centers


Problem description

I am using kmeans to cluster my data, and I have a plan for the result it produces.

I want to relabel the samples based on the ordered centres. Consider the following example:

a = c("a","b","c","d","e","F","i","j","k","l","m","n")
b = c(1,2,3,20,21,21,40,41,42,4,23,50)

mydata = data.frame(id=a,amount=b)
result = kmeans(mydata$amount,3,nstart=10)

Here is the result:

result$cluster
2 2 2 3 3 3 1 1 1 2 3 1

result$centers
1 43.25
2  2.50
3 21.25


mydata = data.frame(mydata, label = result$cluster)
mydata
    id amount  label
1   a      1        2
2   b      2        2
3   c      3        2
4   d     20        3
5   e     21        3
6   F     21        3
7   i     40        1
8   j     41        1
9   k     42        1
10  l      4        2
11  m     23        3
12  n     50        1

What I am looking for is to sort the centres and produce the labels accordingly:

1  2.50
2  21.25
3  43.25

and the sample labels would become:

1 1 1 2 2 2 3 3 3 1 2 3 

so the result should be:

    id amount  label
1   a      1        1
2   b      2        1
3   c      3        1
4   d     20        2
5   e     21        2
6   F     21        2
7   i     40        3
8   j     41        3
9   k     42        3
10  l      4        1
11  m     23        2
12  n     50        3

I think this could be done by ordering the centres and then, for each sample, taking the index of the sorted centre at minimum distance from it as that sample's label.

Is there another way for R to do this automatically?
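
For reference, the nearest-centre idea described above could be sketched roughly as follows. This is only an illustration, not code from the original post: the centre values are copied from the output shown earlier so that the run does not depend on kmeans' random initialisation, and the variable names sorted_centers, d and new_label are made up for the example.

```r
# Amounts from the question and the centres kmeans reported for them
# (hard-coded here as an assumption so the example is deterministic).
b <- c(1, 2, 3, 20, 21, 21, 40, 41, 42, 4, 23, 50)
centers <- c(43.25, 2.50, 21.25)

# Sort the centres, then measure the distance from every sample to every
# sorted centre: rows are samples, columns are centres in ascending order.
sorted_centers <- sort(centers)
d <- abs(outer(b, sorted_centers, "-"))

# For each row, pick the column with the smallest distance.
# max.col() finds the per-row maximum, so the distances are negated.
new_label <- max.col(-d, ties.method = "first")
new_label
# [1] 1 1 1 2 2 2 3 3 3 1 2 3
```

This recomputes the assignment from scratch, so it only reproduces the kmeans partition when every sample is already assigned to its nearest centre, and it is written for one-dimensional data.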

Recommended answer

One idea is to create a named vector by matching the sorted centers against the original centers, so that each name is a new label and each value is the old cluster number it replaces. Then match mydata$label against that vector and replace each label with the corresponding name, i.e.

i1 <- setNames(match(sort(result$centers), result$centers), rownames(result$centers))

as.numeric(names(i1)[match(mydata$label, i1)])
# [1] 1 1 1 2 2 2 3 3 3 1 2 3
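
A possibly simpler variant of the same idea uses order() directly and keeps the labels numeric throughout. The sketch below is an alternative, not part of the original answer; relabel_by_center is a hypothetical helper name, it assumes one-dimensional data (a single column of centres), and the input values are copied from the question's output rather than taken from a fresh kmeans run.

```r
# Hypothetical helper (not part of base R): renumber clusters so that
# label 1 has the smallest centre, label 2 the next, and so on.
# Assumes a single column of centres.
relabel_by_center <- function(fit) {
  ord <- order(fit$centers[, 1])             # old cluster ids, smallest centre first
  list(
    cluster = match(fit$cluster, ord),       # new label = position of the old id in ord
    centers = unname(fit$centers[ord, 1])    # centres in ascending order
  )
}

# Values copied from the question, shaped like the relevant parts of a
# kmeans result (a cluster vector and a one-column centers matrix).
fit <- list(
  cluster = c(2, 2, 2, 3, 3, 3, 1, 1, 1, 2, 3, 1),
  centers = matrix(c(43.25, 2.50, 21.25), ncol = 1,
                   dimnames = list(as.character(1:3), NULL))
)

relabelled <- relabel_by_center(fit)
relabelled$cluster
# [1] 1 1 1 2 2 2 3 3 3 1 2 3
relabelled$centers
# [1]  2.50 21.25 43.25
```

Because the helper only reads the cluster and centers components, passing the object returned by kmeans(mydata$amount, 3, nstart = 10) should work the same way.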
