Extract labels membership / classification from a cut dendrogram in R (i.e.: a cutree function for dendrogram)
I'm trying to extract a classification from a dendrogram in R that I've cut at a certain height. This is easy to do with cutree on an hclust object, but I can't figure out how to do it on a dendrogram object.
Further, I can't just use my clusters from the original hclust, because (frustratingly) the numbering of the classes from cutree is different from the numbering of classes with cut.
hc <- hclust(dist(USArrests), "ave")
classification <- cutree(hc, h = 70)
dend1 <- as.dendrogram(hc)
dend2 <- cut(dend1, h = 70)
str(dend2$lower[[1]]) # group 1 here is not the same as
classification[classification == 1] # group 1 here
Is there a way to either get the classifications to map to each other, or alternatively to extract lower branch memberships from the dendrogram object (perhaps with some clever use of dendrapply?) in a format more like what cutree gives?
I suggest using the cutree function from the dendextend package. It includes a dendrogram method (i.e., dendextend:::cutree.dendrogram).
You can learn more about the package from its introductory vignette.
I should add that while your function (classify) is good, there are several advantages to using cutree from dendextend:
- It also allows you to use a specific k (number of clusters), and not just h (a specific height).
- It is consistent with the result you would get from cutree on hclust (classify will not be).
- It will often be faster.
Here are examples for using the code:
# Toy data:
hc <- hclust(dist(USArrests), "ave")
dend1 <- as.dendrogram(hc)
# Get the package:
install.packages("dendextend")
library(dendextend)
# Use cutree:
cutree(dend1, h = 70) # it now works on a dendrogram
# It is like using:
dendextend:::cutree.dendrogram(dend1, h = 70)
By the way, building on cutree, dendextend also lets you do more cool things, such as coloring branches/labels based on cutting the dendrogram:
dend1 <- color_branches(dend1, k = 4)
dend1 <- color_labels(dend1, k = 5)
plot(dend1)
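For readers who want to see how the two numberings from the question relate without installing anything, here is a minimal base-R sketch. It is not how dendextend does it internally; it only collects the leaf labels of each lower branch returned by cut() and cross-tabulates them against cutree on the hclust object. The names lower, branch_labels, branch_id and mapping are made up for this illustration.

```r
hc <- hclust(dist(USArrests), "ave")
dend <- as.dendrogram(hc)

# Sub-dendrograms below the cut, in the order cut() returns them
lower <- cut(dend, h = 70)$lower

# Leaf labels contained in each lower branch
branch_labels <- lapply(lower, labels)

# Named integer vector: for each observation, the index of its lower branch
branch_id <- rep(seq_along(lower), times = lengths(branch_labels))
names(branch_id) <- unlist(branch_labels)

# Reorder to the original data order, like the output of cutree()
branch_id <- branch_id[hc$labels]

# Cross-tabulate against cutree(): each row should hit exactly one column
classification <- cutree(hc, h = 70)
mapping <- table(cutree = classification, cut = branch_id)
print(mapping)
```

Each non-zero cell of mapping pairs a cutree class number with the index of the matching branch in cut(dend, h = 70)$lower, so the table can be read as a lookup between the two numberings.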
Lastly, here is some more code for demonstrating my other points:
# This would also work with k:
cutree(dend1,k=4)
# and would give identical result as cutree on hclust:
identical(cutree(hc,h=70) , cutree(dend1,h=70) )
# TRUE
# But this is not the case for classify:
identical(classify(dend1,70) , cutree(dend1,h=70) )
# FALSE
install.packages("microbenchmark")
require(microbenchmark)
microbenchmark(classify = classify(dend1,70),
cutree = cutree(dend1,h=70) )
# Unit: milliseconds
#      expr      min       lq   median       uq       max neval
#  classify  9.70135  9.94604 10.25400 10.87552  80.82032   100
#    cutree 37.24264 37.97642 39.23095 43.21233 141.13880   100
# 4 times faster for this tree (it will be more for larger trees)
# To be exact about it: if I force cutree.dendrogram not to go through hclust
# (which can happen for "weird" trees), the speeds are similar:
microbenchmark(classify = classify(dend1,70),
cutree = cutree(dend1,h=70, try_cutree_hclust = FALSE) )
# Unit: milliseconds
#      expr       min        lq    median       uq      max neval
#  classify  9.683433  9.819776  9.972077 10.48497 29.73285   100
#    cutree 10.275839 10.419181 10.540126 10.66863 16.54034   100
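On the point about k versus h: the two ways of cutting are closely related, and the base-R sketch below shows one way to see it on an hclust object. It assumes the merge heights are increasing (true for average linkage) and that there is no tie at the boundary; the names h_lo, h_hi, h_mid, by_k and by_h are only for illustration.

```r
hc <- hclust(dist(USArrests), "ave")
n <- length(hc$order)   # number of observations
k <- 4

# hc$height holds the n - 1 merge heights in increasing order.
# After the first n - k merges exactly k clusters remain, so any cut height
# between merge n - k and merge n - k + 1 should produce k clusters.
h_lo <- hc$height[n - k]
h_hi <- hc$height[n - k + 1]
h_mid <- (h_lo + h_hi) / 2

by_k <- cutree(hc, k = k)
by_h <- cutree(hc, h = h_mid)

print(c(h_lo = h_lo, h_hi = h_hi))
print(all(by_k == by_h))
```

Any height in the interval from h_lo (inclusive) up to h_hi (exclusive) should give the same grouping as asking for k clusters directly, which is why a function that accepts only h can still be used when you know the k you want.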
If you can think of ways to improve cutree.dendrogram, please submit a patch here:
https://github.com/talgalili/dendextend/blob/master/R/cutree.dendrogram.R
I hope you, or others, will find this answer helpful.