R cluster with Tanimoto/Jaccard

Question

The input file is:

Mydata <- read.table(con <- textConnection('
gene treatment1 treatment2 treatment3
aaa 1 0 1
bbb 1 1 1
ccc 0 0 0
eee 0 1 0
'), header=TRUE)
close(con)

My data is:

  gene treatment1 treatment2 treatment3
1  aaa          1          0          1
2  bbb          1          1          1
3  ccc          0          0          0
4  eee          0          1          0

To build the clustering, I ran:

d <- dist(Mydata, method = "euclidean")
fit <- hclust(d, method = "ward.D")  # "ward" was renamed "ward.D" in R 3.1.0
plot(fit)

This gave me a clustering based on Euclidean distance.

In my previous Stack Overflow question, "How to use R to compute Tanimoto/Jacquard Score as distance matrix", I found that I can also compute a Tanimoto/Jaccard distance matrix in R. Could you show me how to combine that with the steps above, so that the clustering is based on the Tanimoto/Jaccard distance matrix instead of the Euclidean one? Thanks a lot.

Recommended answer

What is it you don't understand? ?vegdist tells us that it returns an object of class "dist", so you can just remove the dist(....) line and replace it with one calling vegdist(....). For example:

library(vegan)
d <- vegdist(Mydata[, -1], method = "jaccard")  # drop the gene column
fit <- hclust(d, method = "ward.D")  # "ward" was renamed "ward.D" in R 3.1.0
plot(fit)

You need to drop the first column (and should have done so in the Euclidean version shown in your question), because the gene names are not data that should be used to form the dissimilarity matrix.
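The same pattern works with any function that returns a "dist" object. As a minimal sketch that does not need vegan, base R's dist(method = "binary") computes the Jaccard distance for presence/absence data. The object name x, the average linkage and the choice of k = 2 are assumptions made for illustration only; they are not taken from the question or the answer.

```r
# Toy data from the question; gene names become row names, so no ID column
# ends up in the distance calculation.
x <- read.table(text = "gene treatment1 treatment2 treatment3
aaa 1 0 1
bbb 1 1 1
ccc 0 0 0
eee 0 1 0
", header = TRUE, row.names = 1)

# "binary" treats non-zero entries as "on"; the distance is the share of
# mismatching treatments among those that are "on" in at least one row,
# which is the Jaccard distance for 0/1 data.
d <- dist(x, method = "binary")
class(d)  # an object of class "dist", just like the result of vegdist()

# Any "dist" object can be handed to hclust(); average linkage is an
# arbitrary choice for this sketch.
fit <- hclust(d, method = "average")
groups <- cutree(fit, k = 2)  # cut the tree into two groups
print(groups)
```

Ward-type linkages are usually motivated in terms of Euclidean distances, so a linkage such as "average" may be easier to interpret with Jaccard distances; which one fits the data is a judgment call.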

This will produce a warning:

Warning message:
In vegdist(Mydata[, -1], method = "jaccard") :
  you have empty rows: their dissimilarities may be meaningless in method jaccard

because row 3 contains no information from which to form the Jaccard distance between it and the other samples. You might want to consider whether Jaccard is the most appropriate choice in such cases.
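To see why the all-zero row is uninformative, the binary Jaccard distance can be worked out by hand. The sketch below is only an illustration: jaccard_dist() is a hypothetical helper written for this example, not a function from vegan or base R.

```r
# Toy data from the question as a 0/1 matrix with gene names as row names.
x <- matrix(c(1, 0, 1,
              1, 1, 1,
              0, 0, 0,
              0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("aaa", "bbb", "ccc", "eee"),
                            paste0("treatment", 1:3)))

# Hypothetical helper: binary Jaccard distance between two 0/1 vectors.
jaccard_dist <- function(a, b) {
  both   <- sum(a != 0 & b != 0)  # treatments "on" in both rows
  either <- sum(a != 0 | b != 0)  # treatments "on" in at least one row
  1 - both / either               # 0/0 gives NaN when both rows are all zero
}

# Fill a square matrix pair by pair, leaving the diagonal at 0.
n <- nrow(x)
m <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) m[i, j] <- jaccard_dist(x[i, ], x[j, ])
  }
}

d_manual <- as.dist(m)  # convert to the "dist" class that hclust() expects
print(d_manual)
```

Because ccc shares no "on" treatment with any other gene, its distance to every other row is the maximum value of 1, so it cannot be placed meaningfully relative to the others. Two all-zero rows would be worse still, since the ratio becomes 0/0.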

The OP now wants the gene labels as row names. The easiest option is to tell R this when reading the data in, using the row.names argument to read.table():

mydata2 <- read.table(con <- textConnection("gene treatment1 treatment2 treatment3
aaa 1 0 1
bbb 1 1 1
ccc 0 0 0
eee 0 1 0
"), header = TRUE, row.names = 1)
close(con)

giving:

> mydata2
    treatment1 treatment2 treatment3
aaa          1          0          1
bbb          1          1          1
ccc          0          0          0
eee          0          1          0

Or, if the data are already in R and it is a pain to reload and redo previous computations, just assign the gene column to the row names and then remove the gene column (using the original Mydata):

rownames(Mydata) <- Mydata$gene
Mydata <- Mydata[, -1]

giving:

> Mydata
    treatment1 treatment2 treatment3
aaa          1          0          1
bbb          1          1          1
ccc          0          0          0
eee          0          1          0
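Once the genes are row names, they are carried through the distance matrix and into the tree, so the dendrogram leaves are labelled with gene names rather than row numbers. A minimal sketch, assuming base R's dist(method = "binary") as a stand-in for vegdist() and average linkage as an arbitrary choice; genes is a name made up for this example:

```r
# Read the toy data with the gene column used as row names.
genes <- read.table(text = "gene treatment1 treatment2 treatment3
aaa 1 0 1
bbb 1 1 1
ccc 0 0 0
eee 0 1 0
", header = TRUE, row.names = 1)

d <- dist(genes, method = "binary")  # Jaccard distance for 0/1 data
attr(d, "Labels")                    # row names stored on the "dist" object

fit <- hclust(d, method = "average")
fit$labels                           # the same names label the leaves
plot(fit)                            # dendrogram with gene names at the tips
```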
