Amend original data with cluster indices
Question
I am using this toy code to perform some basic hierarchical clustering:
library(dplyr)
library(ggplot2)

# Read the tab-separated iris data
OrginalData <- read.table("https://s3.amazonaws.com/Somewhere/IrisTabSepData/IrisData.txt",
                          header = TRUE, sep = "\t")

# Keep only the petal measurements for clustering
SubsetData <- subset(OrginalData, select = c(
  #"SepalLength"
  #,"SepalWidth"
  "PetalLength"
  ,"PetalWidth"
))

# Average-linkage hierarchical clustering on Euclidean distances
clusters = hclust(dist(SubsetData), method = 'average')
plot(clusters)

# Cut the dendrogram into 3 clusters and compare with the species labels
clusterCut <- cutree(clusters, 3)
table(clusterCut, OrginalData$Species)

ggplot(OrginalData, aes(PetalLength, PetalWidth, color = OrginalData$Species)) +
  geom_point(alpha = 0.4, size = 3.5) + geom_point(col = clusterCut) +
  scale_color_manual(values = c('black', 'red', 'green'))
Is it possible to add a column to the original data frame OrginalData that contains the cluster assignments created in the code above (three clusters in this case, labeled 1 to 3) and write the result to a CSV file?
Recommended answer
The variable clusterCut that you already created contains the cluster assignments, one per row of the data. You can simply add it to the data.frame as a new column and use write.csv to save the data.
OrginalData$clusterCut = clusterCut
write.csv(OrginalData, "EnhancedIris.csv", row.names=FALSE)
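To try the same steps without access to the file at the URL in the question, here is a minimal sketch that assumes R's built-in iris dataset as a stand-in. Its column names (Petal.Length, Petal.Width) differ from those in the question, and the temporary output path and the `check` variable are illustrative choices, not part of the original answer. It relies on cutree returning one label per observation in the same order as the rows passed to dist, which is why a plain column assignment lines up correctly.

```r
# Assumption: the built-in iris data stands in for the file read in the question
OrginalData <- iris

# Cluster on the two petal measurements, as in the question
clusters <- hclust(dist(OrginalData[, c("Petal.Length", "Petal.Width")]),
                   method = "average")
clusterCut <- cutree(clusters, k = 3)

# One cluster label per row, in row order, so it can be attached directly
stopifnot(length(clusterCut) == nrow(OrginalData))
OrginalData$clusterCut <- clusterCut

# Write to a temporary location (hypothetical path) and read it back to check
out_file <- file.path(tempdir(), "EnhancedIris.csv")
write.csv(OrginalData, out_file, row.names = FALSE)
check <- read.csv(out_file)
```

Reading the file back with read.csv is only a sanity check that the new column was written; `row.names = FALSE` keeps R's row names from being written as an extra unnamed first column.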