Amend original data with cluster indices

Problem description

I am using this toy code to perform some basic hierarchical clustering:

library(dplyr)
library(ggplot2)

OrginalData <- read.table("https://s3.amazonaws.com/Somewhere/IrisTabSepData/IrisData.txt",
                   header = TRUE, sep = "\t")

SubsetData <- subset(OrginalData, select = c(
#"SepalLength"
#,"SepalWidth"
"PetalLength"
,"PetalWidth"
))

clusters = hclust(dist(SubsetData), method = 'average')
plot(clusters)

clusterCut <- cutree(clusters, 3)
table(clusterCut, OrginalData$Species)

ggplot(OrginalData, aes(PetalLength, PetalWidth, color = Species)) + 
  geom_point(alpha = 0.4, size = 3.5) + geom_point(col = clusterCut) + 
  scale_color_manual(values = c('black', 'red', 'green')) 

Is it possible to add an additional column to the original data frame OrginalData that contains the cluster assignments created in the code above (three clusters here, labelled 1 to 3), and then write the result to a CSV file?

Recommended answer

The variable clusterCut that you already created contains the cluster assignments, one per row of OrginalData and in the same row order. You can simply add it to the data.frame as a new column and use write.csv to save the result.

OrginalData$clusterCut <- clusterCut
write.csv(OrginalData, "EnhancedIris.csv", row.names = FALSE)
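
For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same idea. It assumes R's built-in iris data set as a stand-in for IrisData.txt, so the column names are Petal.Length and Petal.Width rather than PetalLength and PetalWidth, and it writes to a temporary path (out_file is a hypothetical name, not part of the original code).

```r
# Assumption: the built-in iris data set stands in for the asker's IrisData.txt,
# so the columns are Petal.Length / Petal.Width, not PetalLength / PetalWidth.
data(iris)
OrginalData <- iris

# Same clustering as in the question: average-linkage on the two petal columns.
clusters <- hclust(dist(OrginalData[, c("Petal.Length", "Petal.Width")]),
                   method = "average")
clusterCut <- cutree(clusters, k = 3)

# cutree() returns one label per observation, in the order of the input rows,
# so a plain column assignment lines up without any join or merge.
stopifnot(length(clusterCut) == nrow(OrginalData))
OrginalData$clusterCut <- clusterCut

# Hypothetical output path; a temporary directory keeps the sketch tidy.
out_file <- file.path(tempdir(), "EnhancedIris.csv")
write.csv(OrginalData, out_file, row.names = FALSE)

# Read the file back to confirm the new column survived the round trip.
check <- read.csv(out_file)
table(check$clusterCut, check$Species)
```

The length check is there only as a guard: if rows had been filtered between clustering and assignment, the labels would no longer match the data frame and the assignment should not go ahead.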
