How to know about group information in cluster analysis (hierarchical)?

Question

I have a question about groups in hierarchical cluster analysis. As an example, this is the dendrogram for complete linkage on the iris data set.

After running

> table(cutree(hc, 3), iris$Species)

I get this output:

  setosa versicolor virginica
1     50          0         0
2      0         23        49
3      0         27         1

I read on a statistics website that object 1 in the data always belongs to group/cluster 1. From the output above, we know that setosa is in group 1. But how do I find out about the other two species? How do they end up in group 2 or group 3? Is there a calculation I need to know?

Recommended answer

I'm guessing that you used something like this to create the image (which doesn't appear to be there at the moment).

> lmbjck <- cutree(hclust(dist(iris[1:4], "euclidean")), 3)
> table(lmbjck, iris$Species)

lmbjck setosa versicolor virginica
     1     50          0         0
     2      0         23        49
     3      0         27         1
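
One way to read such a table is to label each cluster by the species that contributes most of its members. The sketch below assumes the same clustering as above (Euclidean distance, hclust's default complete linkage, three groups); the names groups, tab and majority are only illustrative.

```r
# Assumption: same clustering as above (Euclidean distance, complete linkage, k = 3).
groups <- cutree(hclust(dist(iris[1:4], "euclidean")), 3)
tab <- table(groups, iris$Species)

# For each cluster (row of the table), pick the species with the largest count.
majority <- colnames(tab)[apply(tab, 1, which.max)]
names(majority) <- rownames(tab)
majority
```

Going by the counts in the table above, group 2 is mostly virginica (49 against 23) and group 3 is mostly versicolor (27 against 1). A majority label is only a summary: group 2 still contains 23 versicolor plants.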

The distance object is computed from measurements of plants from three different species. Its row and column labels are identical.

> iris.dist <- dist(iris[1:4], "euclidean")
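> # a dist object carries no dimnames of its own, so compare the labels of its matrix form
> identical(rownames(as.matrix(iris.dist)), colnames(as.matrix(iris.dist)))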
[1] TRUE

That object is passed on to hclust, which constructs a tree, and cutree then cuts the tree into three groups. The object iris.order holds the order in which the observations are drawn along the dendrogram, given as row numbers of the original data. The data themselves are not reordered; only the drawing of the tree follows this ordering.

> iris.hclust <- hclust(iris.dist)
> iris.cutree <- cutree(iris.hclust, 3)
> iris.order <- iris.hclust$order
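
The statement from the question, that object 1 always lands in cluster 1, can be checked on this data set. The sketch below looks up the first original row belonging to each cluster; first.row is a name introduced here for illustration, and the check describes what cutree does on this example rather than proving general behaviour.

```r
iris.hclust <- hclust(dist(iris[1:4], "euclidean"))
iris.cutree <- cutree(iris.hclust, 3)

# Row index of the first observation that falls into each cluster.
first.row <- tapply(seq_along(iris.cutree), iris.cutree, min)
first.row

# Cluster labels in the order in which they first occur in the data.
unique(unname(iris.cutree))
```

If the cluster numbers follow the order of first appearance in the data, first.row increases with the cluster number and row 1 belongs to cluster 1. The numbers themselves carry no meaning beyond that; which species dominates a cluster has to be read from the cross-tabulation.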

Here's proof. I've put together the original Species designations, the Species designations in the order they appear in the dendrogram, the order number, and the group from cutree. Note that original and cutree are in the original row order, while ordered and order.num follow the dendrogram order.

> data.frame(original = iris$Species, ordered = iris$Species[iris.order],
             order.num = iris.order, cutree = iris.cutree)

      original    ordered order.num cutree
1       setosa  virginica       108      1
2       setosa  virginica       131      1
3       setosa  virginica       103      1
4       setosa  virginica       126      1
5       setosa  virginica       130      1
6       setosa  virginica       119      1
    ...
103  virginica     setosa        31      2
104  virginica     setosa        26      2
105  virginica     setosa        10      2
106  virginica     setosa        35      2
107  virginica     setosa        13      3
108  virginica     setosa         2      2
    ...

Let's look at the output. In the first line, order.num is 108, which means that the first item on the left side of the dendrogram comes from row 108. Skim down to line 108 and you can see that the original Species there is indeed virginica. The cutree column is aligned with the original rows, so the group for this item is also read from line 108: group 2. (The 1 in the first line is the group of original row 1, a setosa.) Now look at line 3. Under order.num you can see that this item comes from row 103. Again, if you go down and check the original species in row 103, it's (still) virginica. I'll leave it as an exercise for you to check other (random) rows and convince yourself that cutree returns its groups in the original row order. The table at the beginning is therefore correct.
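
To see the group of each leaf next to its position in the dendrogram, every column can be indexed by the same ordering. This is a minimal sketch that rebuilds the objects from above; the data frame leaves and its column names are assumptions made for illustration.

```r
iris.hclust <- hclust(dist(iris[1:4], "euclidean"))
iris.cutree <- cutree(iris.hclust, 3)
iris.order  <- iris.hclust$order

# Every column is indexed by iris.order, so each line describes one leaf,
# read from left to right along the dendrogram.
leaves <- data.frame(row     = iris.order,
                     species = iris$Species[iris.order],
                     group   = unname(iris.cutree[iris.order]))
head(leaves)

# Reordering the rows does not change the cross-tabulation.
table(leaves$group, leaves$species)
```

Because species and group are permuted together, the final table has the same counts as table(iris.cutree, iris$Species).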
