Get ordered kmeans cluster labels
Question
Say I have a data set x and run the following k-means clustering:
fit <- kmeans(x,2)
My question is about the output of fit$cluster: I know that it gives me a vector of integers (from 1:k) indicating the cluster to which each point is allocated. Instead, is there a way to have the clusters labeled 1, 2, etc. in order of decreasing numerical value of their centers?
For example: if x = c(1.5, 1.4, 1.45, .2, .3, .3), then fit$cluster should result in (1,1,1,2,2,2), not in (2,2,2,1,1,1).
Similarly, if x = c(1.5, .2, 1.45, 1.4, .3, .3), then fit$cluster should return (1,2,1,1,2,2) instead of (2,1,2,2,1,1).
Right now, fit$cluster seems to number the clusters randomly. I've looked into the documentation but haven't been able to find anything. Please let me know if you can help!
Recommended answer
I had a similar problem. I had a vector of ages that I wanted to separate into 5 factor groups following a logical ordinal scale. I did the following:
I ran the k-means function:
k5 <- kmeans(all_data$age, centers = 5, nstart = 25)
I built a data frame of the k-means cluster indexes and centres, then arranged it by centre value.
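library(dplyr)  # provides tibble(), arrange() and %>%

# The row names of k5$centers are the cluster numbers used in k5$cluster
kmeans_index <- as.numeric(rownames(k5$centers))
k_means_centres <- as.numeric(k5$centers)
# tibble() replaces the deprecated data_frame()
k_means_df <- tibble(index = kmeans_index, centres = k_means_centres)
k_means_df <- k_means_df %>%
  arrange(centres)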
Now that the centres are in the data frame in ascending order, I created my 5-element vector of factor labels and bound it to the data frame:
factors <- c("very_young", "young", "middle_age", "old", "very_old")
k_means_df <- cbind(k_means_df, factors)
It looked like this:
> k_means_df
  index  centres    factors
1     2 23.33770 very_young
2     5 39.15239      young
3     1 55.31727 middle_age
4     4 67.49422        old
5     3 79.38353   very_old
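The sorted index column above is also enough to get plain numeric labels ordered by centre, which is what the question asks for. Here is a minimal base R sketch, assuming the toy vector from the question; the names ord and ordered_cluster are only illustrative, and nstart = 10 is my own choice to make the tiny example stable.

```r
# Toy data from the question
x <- c(1.5, 0.2, 1.45, 1.4, 0.3, 0.3)

set.seed(1)
fit <- kmeans(x, centers = 2, nstart = 10)

# Original cluster ids, listed from largest centre to smallest
ord <- order(fit$centers[, 1], decreasing = TRUE)

# Position of each point's cluster id in that ordering = its new label,
# so label 1 always belongs to the cluster with the largest centre
ordered_cluster <- match(fit$cluster, ord)
ordered_cluster
```

Dropping decreasing = TRUE gives labels in ascending order of centre instead, matching the table above.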
I saved my cluster values in a data frame and created a dummy factor column:
cluster_vals <- tibble(cluster = k5$cluster, factor = NA_character_)
Finally, I iterated through the factor options in k_means_df and, within the cluster_vals data frame, filled in my factor/character value for each matching cluster value:
for (i in seq_len(nrow(k_means_df)))
{
  index_val <- k_means_df$index[i]
  factor_val <- as.character(k_means_df$factors[i])
  # Fill in the label for every row assigned to this cluster
  cluster_vals <- cluster_vals %>%
    mutate(factor = replace(factor, cluster == index_val, factor_val))
}
Voilà; I now have a vector of factors/characters that were applied, according to their ordinal logic, to the arbitrarily numbered cluster vector.
# A tibble: 3,163 x 2
   cluster factor
     <int> <chr>
 1       4 old
 2       2 very_young
 3       2 very_young
 4       2 very_young
 5       3 very_old
 6       3 very_old
 7       4 old
 8       4 old
 9       2 very_young
10       5 young
# ... with 3,153 more rows
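The same mapping can probably be done without a loop by handing the sorted cluster ids to factor() as levels. This is a rough sketch only: the age vector is simulated as a hypothetical stand-in for all_data$age, and I use three groups instead of five to keep it short.

```r
set.seed(42)
# Hypothetical stand-in for all_data$age (simulated, not the real data)
age <- c(rnorm(50, 25, 3), rnorm(50, 55, 3), rnorm(50, 80, 3))

k3 <- kmeans(age, centers = 3, nstart = 25)

# Cluster ids sorted by ascending centre
ord <- order(k3$centers[, 1])
group_labels <- c("young", "middle_age", "old")

# levels = ord pairs the i-th smallest centre with the i-th label
age_group <- factor(k3$cluster, levels = ord,
                    labels = group_labels, ordered = TRUE)

# Mean age per group, listed in level order
group_means <- tapply(age, age_group, mean)
group_means
```

An ordered factor also keeps the ordinal meaning, so comparisons such as age_group < "old" work directly.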
Hope this helps.