Get ordered kmeans cluster labels

Published: 2020/10/3 2:11:10 · Tags: r, cluster-analysis, k-means


Question

Say I have a data set x and run the following k-means clustering:

fit <- kmeans(x,2)

My question concerns the output of fit$cluster: I know it gives a vector of integers (from 1:k) indicating the cluster to which each point is allocated. Is there a way to have the clusters labeled 1, 2, etc. in order of decreasing numerical value of their centers instead?

For example: if x=c(1.5,1.4,1.45,.2,.3,.3), then fit$cluster should be (1,1,1,2,2,2) and not (2,2,2,1,1,1).

Similarly, if x=c(1.5,.2,1.45,1.4,.3,.3), then fit$cluster should return (1,2,1,1,2,2) instead of (2,1,2,2,1,1).

Right now, fit$cluster seems to assign the cluster numbers randomly. I've looked through the documentation but haven't been able to find anything. Please let me know if you can help!

Recommended answer

I had a similar problem. I had a vector of ages that I wanted to separate into 5 factor groups that follow a logical ordinal sequence. I did the following:

I ran the k-means function:

k5 <- kmeans(all_data$age, centers = 5, nstart = 25)

I built a data frame of the k-means indexes and centres, then arranged it by centre value.

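library(dplyr)  # provides tibble(), %>%, arrange() and mutate()

# Pair each cluster index with its centre, then sort by centre value
kmeans_index <- as.numeric(rownames(k5$centers))
k_means_centres <- as.numeric(k5$centers)
k_means_df <- tibble(index = kmeans_index, centres = k_means_centres)
k_means_df <- k_means_df %>%
    arrange(centres)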

Now that the centres are in the df in ascending order, I created my 5-element list of factor labels and bound it to the data frame:

factors <- c("very_young", "young", "middle_age", "old", "very_old")
k_means_df <- cbind(k_means_df, factors)

It looks like this:

> k_means_df
  index  centres    factors
1     2 23.33770 very_young
2     5 39.15239      young
3     1 55.31727 middle_age
4     4 67.49422        old
5     3 79.38353   very_old

I saved my cluster values in a data frame and created a placeholder factor column:

cluster_vals <- tibble(cluster = k5$cluster, factor = NA_character_)

Finally, I iterated through the factor options in k_means_df and replaced the cluster value with my factor/character value within the cluster_vals data frame:

# Replace each cluster index with its ordered label, lowest centre first
for (i in seq_len(nrow(k_means_df))) {
    index_val <- k_means_df$index[i]
    factor_val <- as.character(k_means_df$factors[i])

    cluster_vals <- cluster_vals %>%
      mutate(factor = replace(factor, cluster == index_val, factor_val))
}

Voila; I now have a vector of factor/character labels, assigned in ordinal order to the arbitrarily numbered cluster vector.

# A tibble: 3,163 x 2
   cluster factor    
     <int> <chr>     
 1       4 old       
 2       2 very_young
 3       2 very_young
 4       2 very_young
 5       3 very_old  
 6       3 very_old  
 7       4 old       
 8       4 old       
 9       2 very_young
10       5 young     
# ... with 3,153 more rows
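
The same mapping can probably be done without a loop. The following is a minimal sketch, not the method used above: `order()` sorts the cluster indexes by centre and `match()` looks up each point's position in that ordering. The simulated `age` vector stands in for `all_data$age`, and the names `age_labels`, `centre_order` and `age_group` are made up for illustration.

```r
set.seed(42)  # assumption: fixed seed so the sketch is repeatable

# Hypothetical stand-in for all_data$age: five well-separated age bands
age <- c(rnorm(50, 23, 2), rnorm(50, 39, 2), rnorm(50, 55, 2),
         rnorm(50, 67, 2), rnorm(50, 79, 2))

k5 <- kmeans(age, centers = 5, nstart = 25)

age_labels <- c("very_young", "young", "middle_age", "old", "very_old")

# order() returns the original cluster indexes sorted by ascending centre
centre_order <- order(k5$centers[, 1])

# match() gives each point's position in that ordering (1 = lowest centre),
# which is then used to pick the corresponding label
age_group <- factor(age_labels[match(k5$cluster, centre_order)],
                    levels = age_labels, ordered = TRUE)

table(age_group)
```

Storing the result as an ordered factor keeps the levels in ascending age order for later tables and plots.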

Hope this helps.
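
For the numeric labels the question originally asks about (1 for the largest centre, 2 for the next, and so on), a similar idea may be enough. This is a sketch under assumptions: `relabel_by_centre` is a hypothetical helper, not a base R or `stats` function, and it assumes one-dimensional data so that `fit$centers` has a single column.

```r
# Hypothetical helper: renumber k-means labels so that label 1 belongs to
# the cluster with the largest centre (or the smallest, if decreasing = FALSE)
relabel_by_centre <- function(fit, decreasing = TRUE) {
  # Original cluster indexes, sorted by centre value
  centre_order <- order(fit$centers[, 1], decreasing = decreasing)
  # Position of each point's original label in that ordering
  match(fit$cluster, centre_order)
}

x <- c(1.5, 1.4, 1.45, .2, .3, .3)  # example data from the question
set.seed(1)                          # assumption: fixed only for repeatability
fit <- kmeans(x, centers = 2, nstart = 10)

ordered_cluster <- relabel_by_centre(fit)
print(ordered_cluster)
```

Whatever numbering `kmeans()` happens to pick, the relabelled vector gives the higher-valued group the label 1, which is the ordering the question asks for.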
