Measuring reliability of tree/dendrogram (TraMineR)

Question

I did an analysis using TraMineR to measure the similarity among sequences of spatial use (for example Rural (R) vs. Urban (U); sequence example: RRRRRUUURRUUU). A requirement of my analysis is that states are compared at the same moment in time, so I used the Hamming distance. Based on the resulting distance matrix I created a dendrogram that gives the distances among individual sequences and helps to identify "behavioral similarities" in sequential spatial use. Now I am looking for a way to calculate the robustness or reliability of the tree. Does somebody have an idea how I can calculate a bootstrap tree (with bootstrap values indicated along the branches)?

Kind regards,

Johannes

Recommended answer

The fpc package has a function called clusterboot that can be used to assess the stability of a clustering procedure. It can be used in the following way:

library(TraMineR)
library(fpc)

## Use some sequence data to illustrate
data(mvad)
mvad.alphabet <- c("employment", "FE", "HE", "joblessness", "school", "training")
mvad.labels <- c("employment", "further education", "higher education", "joblessness", "school", "training")
mvad.scodes <- c("EM", "FE", "HE", "JL", "SC", "TR")
mvad.seq <- seqdef(mvad, 17:86, alphabet = mvad.alphabet, states = mvad.scodes, labels = mvad.labels, xtstep = 6)

## Compute Hamming distances
ham <- seqdist(mvad.seq, method = "HAM")

## Bootstrap the clustering: average-linkage hierarchical clustering (method = "average"),
## with the tree cut into k = 5 clusters (cut = "number")
cf2 <- clusterboot(as.dist(ham), clustermethod = disthclustCBI, k = 5, cut = "number", method = "average")
print(cf2)

The clusterboot help page provides the following guidelines to interpret the values.

There is some theoretical justification to consider a Jaccard similarity value smaller or equal to 0.5 as an indication of a "dissolved cluster", see Hennig (2008). Generally, a valid, stable cluster should yield a mean Jaccard similarity value of 0.75 or more. Between 0.6 and 0.75, clusters may be considered as indicating patterns in the data, but which points exactly should belong to these clusters is highly doubtful. Below average Jaccard values of 0.6, clusters should not be trusted. "Highly stable" clusters should yield average Jaccard similarities of 0.85 and above.
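As a rough illustration of these guidelines, the sketch below labels clusterwise mean Jaccard values with the thresholds quoted above. The helper classify_jaccard and the example values are hypothetical; with a real run you would pass cf2$bootmean, the component of the clusterboot result that holds the clusterwise means. How to treat values falling exactly on a threshold is an assumption here.

```r
# Hypothetical helper: map mean Jaccard values to the verbal categories
# from the clusterboot help page (boundary handling is an assumption).
classify_jaccard <- function(x) {
  ifelse(x <= 0.5,  "dissolved",
  ifelse(x <  0.6,  "not trusted",
  ifelse(x <  0.75, "pattern, doubtful membership",
  ifelse(x <  0.85, "stable", "highly stable"))))
}

# Made-up values for illustration; replace with cf2$bootmean after running clusterboot
bootmean <- c(0.91, 0.78, 0.66, 0.55, 0.42)

stability <- data.frame(cluster = seq_along(bootmean),
                        mean_jaccard = bootmean,
                        assessment = classify_jaccard(bootmean))
print(stability)
```

The labels are only a reading aid; the underlying Jaccard values remain the quantity to report.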

Having a stable clustering procedure does not imply that the clustering is good. You may also be interested in cluster quality measures. In that case, you can use the WeightedCluster package, see here: http://mephisto.unige.ch/weightedcluster/
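A minimal sketch of what such a quality check might look like, assuming the TraMineR and WeightedCluster packages are installed. It recomputes the Hamming distances on the mvad data (with the default state codes, for brevity), builds an average-linkage tree, and uses wcClusterQuality and as.clustrange from WeightedCluster. The choice of 5 clusters and of a 2 to 10 cluster range is only illustrative.

```r
library(TraMineR)
library(WeightedCluster)

# Same example data as above, with default alphabet and state codes
data(mvad)
mvad.seq <- seqdef(mvad, 17:86, xtstep = 6)
ham <- seqdist(mvad.seq, method = "HAM")

# Average-linkage hierarchical clustering on the Hamming distances
hc <- hclust(as.dist(ham), method = "average")

# Quality statistics (e.g. average silhouette width, ASW) for the 5-cluster solution
cl5 <- cutree(hc, k = 5)
qual <- wcClusterQuality(ham, cl5)
print(qual$stats)

# Compare the quality of solutions with 2 to 10 clusters
cl_range <- as.clustrange(hc, diss = ham, ncluster = 10)
print(summary(cl_range, max.rank = 2))
```

Looking at several statistics across a range of cluster numbers, rather than a single value, gives a better basis for choosing a solution.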