Get ordered kmeans cluster labels

Problem description

Say I have a data set x and run the following k-means clustering:

fit <- kmeans(x,2)

My question concerns the output of fit$cluster. I know it gives a vector of integers (from 1:k) indicating the cluster to which each point is allocated. Is there a way to have the clusters labeled 1, 2, etc. in order of decreasing numerical value of their centers?

For example: if x=c(1.5,1.4,1.45,.2,.3,.3), then fit$cluster should be (1,1,1,2,2,2) and not (2,2,2,1,1,1).

Similarly, if x=c(1.5,.2,1.45,1.4,.3,.3), then fit$cluster should return (1,2,1,1,2,2) instead of (2,1,2,2,1,1).

Right now, fit$cluster seems to assign the cluster numbers randomly. I've looked through the documentation but haven't found anything. Please let me know if you can help!

Recommended answer

I had a similar problem. I had a vector of ages that I wanted to separate into 5 factor groups following a logical ordinal scale. I did the following:

I ran the k-means function:

k5 <- kmeans(all_data$age, centers = 5, nstart = 25)

I built a data frame of the k-means indexes and centres, then arranged it by centre value.

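library(dplyr)  # provides tibble(), arrange(), mutate() and %>%

# Cluster ids (row names of the centres matrix) and their centre values
kmeans_index <- as.numeric(rownames(k5$centers))
k_means_centres <- as.numeric(k5$centers)

# tibble() replaces the deprecated data_frame(); sort ascending by centre
k_means_df <- tibble(index = kmeans_index, centres = k_means_centres) %>%
    arrange(centres)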

Now that the centres are in the data frame in ascending order, I created my 5-element vector of factor labels and bound it to the data frame:

factors <- c("very_young", "young", "middle_age", "old", "very_old")
k_means_df <- cbind(k_means_df, factors)

It looks like this:

> k_means_df
  index  centres    factors
1     2 23.33770 very_young
2     5 39.15239      young
3     1 55.31727 middle_age
4     4 67.49422        old
5     3 79.38353   very_old
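
The same sorted lookup table can also be built in base R with `order()`. The sketch below is only an illustration: `ages` is a made-up vector standing in for `all_data$age`, it uses three clusters instead of five, and `k3`, `ord` and `lookup` are names chosen here.

```r
# Hypothetical stand-in for all_data$age: three well-separated age groups
set.seed(1)
ages <- c(rnorm(20, mean = 25, sd = 2),
          rnorm(20, mean = 50, sd = 2),
          rnorm(20, mean = 75, sd = 2))

k3 <- kmeans(ages, centers = 3, nstart = 25)

# order() returns the cluster ids sorted by ascending centre value
ord <- order(k3$centers[, 1])

# One row per cluster: original id, centre, and the label for that rank
lookup <- data.frame(
  index   = ord,
  centres = unname(k3$centers[ord, 1]),
  factors = c("young", "middle_age", "old")
)
print(lookup)
```

Because the rows are sorted by centre before the labels are attached, the first label always goes to the cluster with the smallest centre, whatever id kmeans happened to give it.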

I saved my cluster values in a data frame and created a placeholder factor column:

# factor starts as a character NA so the labels can be filled in later
cluster_vals <- tibble(cluster = k5$cluster, factor = NA_character_)

Finally, I iterated through the rows of k_means_df and, within the cluster_vals data frame, filled the factor column with the label that corresponds to each cluster value:

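for (i in seq_len(nrow(k_means_df))) {
    # Cluster id and the label assigned to it in the sorted lookup table
    index_val <- k_means_df$index[i]
    factor_val <- as.character(k_means_df$factors[i])

    # Fill in the label wherever the observation belongs to this cluster
    cluster_vals <- cluster_vals %>% 
      mutate(factor = replace(factor, cluster == index_val, factor_val))
}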
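
The loop can also be replaced by a single vectorised lookup with base R's `match()`. This is a minimal sketch rather than the method used above; the small `k_means_df` and `cluster` objects are hand-written stand-ins (illustrative values, not the full data set), and `labels` is a name chosen here.

```r
# Illustrative stand-ins: lookup table sorted by centre, plus raw cluster ids
k_means_df <- data.frame(
  index   = c(2, 5, 1, 4, 3),
  factors = c("very_young", "young", "middle_age", "old", "very_old"),
  stringsAsFactors = FALSE
)
cluster <- c(4, 2, 2, 3, 5, 1)

# match() finds each cluster id's row in the lookup table;
# indexing factors by that row gives the label for every observation
labels <- k_means_df$factors[match(cluster, k_means_df$index)]

# An ordered factor keeps the youngest-to-oldest ordering of the levels
labels <- factor(labels, levels = k_means_df$factors, ordered = TRUE)
print(labels)
```

Storing the result as an ordered factor means comparisons such as `labels < "old"` respect the age ordering, which a plain character vector would not.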

Voila; I now have a vector of factor/character labels, assigned according to their ordinal logic, alongside the arbitrarily numbered cluster vector.

# A tibble: 3,163 x 2
   cluster factor    
     <int> <chr>     
 1       4 old       
 2       2 very_young
 3       2 very_young
 4       2 very_young
 5       3 very_old  
 6       3 very_old  
 7       4 old       
 8       4 old       
 9       2 very_young
10       5 young     
# ... with 3,153 more rows
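
For the original question, where the labels should be the integers 1 to k in order of decreasing centre, the same idea reduces to ranking the centres. The following is a minimal sketch using the second example vector from the question; `new_id` and `ordered_cluster` are names chosen here for illustration, and `nstart = 10` is an added assumption to make the fit stable.

```r
x <- c(1.5, 0.2, 1.45, 1.4, 0.3, 0.3)

set.seed(42)  # kmeans picks random starting centres
fit <- kmeans(x, centers = 2, nstart = 10)

# rank(-centres) gives 1 to the largest centre, 2 to the next, and so on;
# new_id[j] is therefore the new label for original cluster j
new_id <- rank(-fit$centers[, 1])

# Relabel every observation by looking up its original cluster id
ordered_cluster <- unname(new_id[fit$cluster])
print(ordered_cluster)
```

As long as kmeans separates the high values from the low ones, the high group is labeled 1 and the low group 2, regardless of which ids `fit$cluster` used. Ranking with `rank(fit$centers[, 1])` instead would give increasing order.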

Hope this helps.
