R cluster with Tanimoto/Jaccard
Question
The input file is:
Mydata <- read.table(con <- textConnection('
gene treatment1 treatment2 treatment3
aaa 1 0 1
bbb 1 1 1
ccc 0 0 0
eee 0 1 0
'), header=TRUE)
close(con)
My data are:
  gene treatment1 treatment2 treatment3
1  aaa          1          0          1
2  bbb          1          1          1
3  ccc          0          0          0
4  eee          0          1          0
To build the cluster, I ran:
d <- dist(Mydata, method = "euclidean")
fit <- hclust(d, method="ward")
plot(fit)
I got a clustering based on the "euclidean" distance.
In my previous Stack Overflow question, "How to use R to compute Tanimoto/Jacquard Score as distance matrix", I found that I can also calculate a Tanimoto/Jaccard distance matrix with R. Could you show me how to combine that with the steps above, so that the clustering is based on a Tanimoto/Jaccard distance matrix instead of a Euclidean one? Thanks a lot.
Recommended answer
What is it you don't understand? ?vegdist tells us that it returns an object of class "dist", so you can just remove the dist(....) line and replace it with one calling vegdist(....). For example:
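library(vegan)
# Drop the gene column so only the 0/1 treatment columns enter the distance
d <- vegdist(Mydata[, -1], method = "jaccard")
# "ward" was renamed "ward.D" in R 3.1.0; the old name still works but prints a message
fit <- hclust(d, method = "ward.D")
plot(fit)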
You need to drop the first column (and should have done so in the Euclidean version you showed in your question), as it is not data that should be used to form the dissimilarity matrix.
This will generate a warning:
Warning message:
In vegdist(Mydata[, -1], method = "jaccard") :
you have empty rows: their dissimilarities may be meaningless in method jaccard
because row 3 contains no information with which to form the Jaccard distance between it and the other samples. You might want to consider whether Jaccard is the most appropriate choice in such cases.
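To see why an all-zero row is a problem, it may help to compute the Jaccard dissimilarity by hand. The following is a minimal sketch, not vegan's implementation: `jaccard_dist` and `x_nonempty` are names made up for illustration, and dropping the empty rows is only one possible way to handle them. It relies on base R's dist(method = "binary"), which for 0/1 data is the proportion of mismatches among positions where at least one row is non-zero, that is, the Jaccard dissimilarity.

```r
# Presence/absence matrix from the question, with genes as row names
x <- matrix(c(1, 0, 1,
              1, 1, 1,
              0, 0, 0,
              0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("aaa", "bbb", "ccc", "eee"),
                            paste0("treatment", 1:3)))

# Hypothetical helper: Jaccard dissimilarity between two binary vectors,
# 1 - (shared presences) / (presences in either vector)
jaccard_dist <- function(a, b) {
  either <- sum(a | b)
  if (either == 0) return(NaN)  # 0/0: undefined when both vectors are empty
  1 - sum(a & b) / either
}

jaccard_dist(x["aaa", ], x["bbb", ])  # 1 - 2/3
jaccard_dist(x["ccc", ], x["eee", ])  # ccc shares nothing with any other row
jaccard_dist(x["ccc", ], x["ccc", ])  # undefined for an empty row

# One option (an assumption about what is wanted): drop empty rows first
x_nonempty <- x[rowSums(x) > 0, , drop = FALSE]
d <- dist(x_nonempty, method = "binary")
d
```

An empty row is maximally distant from every non-empty row regardless of what that row contains, so it carries no information about similarity; whether to drop it or switch to another measure depends on what the zeros mean in the experiment.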
The OP now wants the gene labels as row names. The easiest option is to tell R this when reading the data in, using the row.names argument to read.table():
mydata2 <- read.table(con <- textConnection("gene treatment1 treatment2 treatment3
aaa 1 0 1
bbb 1 1 1
ccc 0 0 0
eee 0 1 0
"), header = TRUE, row.names = 1)
close(con)
Giving:
> mydata2
    treatment1 treatment2 treatment3
aaa          1          0          1
bbb          1          1          1
ccc          0          0          0
eee          0          1          0
Or, if the data are already in R and it is a pain to reload and redo previous computations, just assign the gene column to the row names and then remove the gene column (using the original Mydata):
rownames(Mydata) <- Mydata$gene
Mydata <- Mydata[, -1]
Giving:
> Mydata
    treatment1 treatment2 treatment3
aaa          1          0          1
bbb          1          1          1
ccc          0          0          0
eee          0          1          0
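Once the gene labels are stored as row names, they should carry through the distance object to the leaves of the dendrogram. The sketch below runs the whole pipeline under two assumptions that are not part of the answer: it reads the data with the text argument of read.table() instead of a textConnection, and it uses base R's dist(method = "binary") as a stand-in for vegdist(..., method = "jaccard") so that it runs without vegan installed.

```r
# Data as in the question, with the gene labels still in a column
Mydata <- read.table(text = "gene treatment1 treatment2 treatment3
aaa 1 0 1
bbb 1 1 1
ccc 0 0 0
eee 0 1 0
", header = TRUE)

# Move the gene labels into the row names and drop the column
rownames(Mydata) <- Mydata$gene
Mydata <- Mydata[, -1]

# Binary (Jaccard-type) distances; stand-in for vegdist(), see note above
d <- dist(Mydata, method = "binary")

# Same linkage as in the answer, under its current name
fit <- hclust(d, method = "ward.D")

fit$labels  # the gene names taken from the row names
plot(fit)   # leaves are labelled aaa, bbb, ccc, eee
```

Because the labels travel with the "dist" object, no extra labels argument is needed in plot().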