Match and Count the Data Matrix in R
Question
The data set looks like this:
Gene SampleName
gene1 sample1
gene1 sample2
gene1 sample3
gene2 sample2
gene2 sample3
gene2 sample4
gene3 sample1
gene3 sample5
My goal is to make a data matrix like this:
gene1 gene2 gene3
gene1 - 2 1
gene2 - - 0
gene3 - - -
gene1 vs gene2 is 2 because they share two samples, sample2 and sample3. gene1 vs gene3 is 1 because they share only one sample, sample1.
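To make the counting rule concrete, here is a minimal brute-force sketch that counts shared samples for a pair of genes with intersect. The helper name shared_count is hypothetical and introduced only for illustration. It is meant as a check of the definition, not as a solution for large data.

```r
# Example data from the question
df <- data.frame(
  Gene = c("gene1", "gene1", "gene1", "gene2", "gene2", "gene2", "gene3", "gene3"),
  SampleName = c("sample1", "sample2", "sample3", "sample2", "sample3", "sample4",
                 "sample1", "sample5"),
  stringsAsFactors = FALSE
)

# Samples observed for each gene, as a named list
samples_by_gene <- split(df$SampleName, df$Gene)

# Hypothetical helper: number of distinct samples two genes have in common
shared_count <- function(a, b) {
  length(intersect(samples_by_gene[[a]], samples_by_gene[[b]]))
}

shared_count("gene1", "gene2")  # 2 (sample2 and sample3)
shared_count("gene1", "gene3")  # 1 (sample1)
shared_count("gene2", "gene3")  # 0
```

Calling this for every pair of genes scales quadratically in the number of genes, which is why a matrix-based approach is preferable for a much larger data set.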
My question is how can I achieve this goal in R or Perl? The actual data set is much larger. I highly appreciate your help.
Here is the output of dput(df) in R:
df <- structure(list(Gene = c("gene1", "gene1", "gene1", "gene2", "gene2",
"gene2", "gene3", "gene3"), SampleName = c("sample1", "sample2",
"sample3", "sample2", "sample3", "sample4", "sample1", "sample5"
)), .Names = c("Gene", "SampleName"), row.names = c(NA, -8L), class = "data.frame")
Recommended answer
You can use the crossprod (or tcrossprod) function along with table:
out <- tcrossprod(table(df))
out
#        Gene
# Gene    gene1 gene2 gene3
#   gene1     3     2     1
#   gene2     2     3     0
#   gene3     1     0     2
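As a rough sketch of why this works: table(df) is a gene-by-sample matrix whose entries are 1 where a gene occurs in a sample and 0 otherwise, and tcrossprod(m) computes m %*% t(m). Entry (i, j) of that product sums, over all samples, the product of the two genes' indicators, which is the number of samples they share. The diagonal holds each gene's own sample count. The variable names m and by_hand are illustrative.

```r
# Example data from the question
df <- data.frame(
  Gene = c("gene1", "gene1", "gene1", "gene2", "gene2", "gene2", "gene3", "gene3"),
  SampleName = c("sample1", "sample2", "sample3", "sample2", "sample3", "sample4",
                 "sample1", "sample5"),
  stringsAsFactors = FALSE
)

# Gene-by-sample incidence matrix: genes in rows, samples in columns
m <- unclass(table(df))

# Explicit product of the incidence matrix with its transpose
by_hand <- m %*% t(m)

# tcrossprod(m) computes the same product without forming t(m) explicitly
same <- isTRUE(all.equal(by_hand, tcrossprod(m), check.attributes = FALSE))
same

by_hand["gene1", "gene2"]  # 2 shared samples
by_hand["gene1", "gene1"]  # 3, the number of samples containing gene1
```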
Drop the diagonal and the lower-triangle to get the exact output you show.
diag(out) <- NA
out[lower.tri(out)] <- NA
print.table(out) ## print.table deals with NAs differently
#        Gene
# Gene    gene1 gene2 gene3
#   gene1           2     1
#   gene2                 0
#   gene3
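Two caveats may matter for the larger real data set; both are assumptions beyond what the answer states. First, if a gene/sample pair appears more than once, table returns counts above 1 and the cross product overcounts, so deduplicating with unique first is a reasonable precaution. Second, a dense gene-by-sample table can become large. The sketch below assumes the Matrix package is available and builds a sparse incidence matrix with xtabs(sparse = TRUE) instead.

```r
library(Matrix)  # assumed to be installed; it is a recommended R package

# Example data from the question
df <- data.frame(
  Gene = c("gene1", "gene1", "gene1", "gene2", "gene2", "gene2", "gene3", "gene3"),
  SampleName = c("sample1", "sample2", "sample3", "sample2", "sample3", "sample4",
                 "sample1", "sample5"),
  stringsAsFactors = FALSE
)

# Simulate a repeated gene/sample row (gene1, sample1 appears twice)
df_dup <- rbind(df, df[1, ])

# With the duplicate, gene1 vs gene3 is overcounted (2 instead of 1)
overcounted <- tcrossprod(table(df_dup))["gene1", "gene3"]

# Deduplicate first so every incidence entry is 0 or 1
df_unique <- unique(df_dup)

# Sparse gene-by-sample incidence matrix
sp <- xtabs(~ Gene + SampleName, data = df_unique, sparse = TRUE)

# Sparse gene-by-gene matrix of shared-sample counts
out_sparse <- tcrossprod(sp)
out_sparse["gene1", "gene3"]
```

The sparse version stores only the nonzero entries, so memory use grows with the number of gene/sample pairs rather than with the full number of genes times samples.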