Row operations in data.table
Problem description
I'm trying to compute a simple sum and mean by row using data.table, but I am getting unexpected results. I followed section 2 of the data.table FAQ. I found an approach that works, but I am not sure why the method from section 2 of the FAQ does not.
This method gives me an incorrect result (it gives me the value of the first column):
dt[, genesum:=lapply(.SD,sum), by=gene]
head(dt)
      gene TCGA_04_1348 TCGA_04_1362  genesum
1:    A1BG      0.94565      0.70585  0.94565
2: A1BG-AS      0.97610      1.15850  0.97610
3:    A1CF      0.00000      0.02105  0.00000
4:   A2BP1      0.00300      0.04150  0.00300
5:   A2LD1      4.57975      5.02820  4.57975
6:     A2M     60.37320     36.09715 60.37320
And this gives me the desired result:
dt[, genesum:=apply(dt[,-1, with=FALSE],1, sum)]
head(dt)
      gene TCGA_04_1348 TCGA_04_1362  genesum
1:    A1BG      0.94565      0.70585  1.65150
2: A1BG-AS      0.97610      1.15850  2.13460
3:    A1CF      0.00000      0.02105  0.02105
4:   A2BP1      0.00300      0.04150  0.04450
5:   A2LD1      4.57975      5.02820  9.60795
6:     A2M     60.37320     36.09715 96.47035
I have many more columns and rows; this is just a subset. Does this have anything to do with the way I set the key?
tables()
     NAME   NROW MB COLS                                           KEY
[1,] dt   20,785  2 gene,TCGA_04_1348_01A,TCGA_04_1362_01A,genesum gene
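Solution
A few things:
dt[, genesum:=lapply(.SD,sum), by=gene]
and
dt[, genesum:=apply(dt[,-1, with=FALSE],1, sum)]
are quite different.
dt[, genesum:=lapply(.SD,sum), by=gene]
loops over the columns of the .SD data.table and sums each one within each gene group. If each gene appears in only one row, each of those sums is just that row's own value in the column, which is consistent with genesum matching the first data column.
dt[, genesum:=apply(dt[,-1, with=FALSE],1, sum)]
loops over the rows (i.e. apply(x, 1, function) applies function to every row in x).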
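To make the column-wise versus row-wise distinction concrete, here is a minimal sketch on a three-row toy table. The values are copied from the first rows of the question; the helper name sample_cols is an assumption introduced for illustration, and the sketch deliberately avoids := so that it only shows what each expression computes.

```r
library(data.table)

# Toy data: first three rows from the question
dt <- data.table(
  gene         = c("A1BG", "A1BG-AS", "A1CF"),
  TCGA_04_1348 = c(0.94565, 0.97610, 0.00000),
  TCGA_04_1362 = c(0.70585, 1.15850, 0.02105)
)

# Hypothetical helper: every column except the gene identifier
sample_cols <- setdiff(names(dt), "gene")

# lapply over .SD works column by column: one total per column
col_sums <- dt[, lapply(.SD, sum), .SDcols = sample_cols]

# Grouped by gene (one row per gene here), each "sum" is the row's own value
by_gene <- dt[, lapply(.SD, sum), by = gene, .SDcols = sample_cols]

# apply(x, 1, sum) works row by row: one total per gene
row_sums <- apply(dt[, sample_cols, with = FALSE], 1, sum)

print(col_sums)
print(by_gene)
print(row_sums)
```

In this sketch by_gene simply reproduces the input columns, while row_sums holds the per-gene totals the question was after.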
I think you can get what you want by calling rowSums, like so:
dt[, genesum := rowSums(dt[, -1, with=FALSE])]
Is that what you're after?
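Since the question asks for both a sum and a mean per row, the rowSums suggestion can be extended as in the sketch below. This is one possible variant rather than the answer's exact code: it uses .SD with .SDcols instead of dt[, -1, with=FALSE], and the names sample_cols and genemean are assumptions introduced here.

```r
library(data.table)

# Toy data: first three rows from the question
dt <- data.table(
  gene         = c("A1BG", "A1BG-AS", "A1CF"),
  TCGA_04_1348 = c(0.94565, 0.97610, 0.00000),
  TCGA_04_1362 = c(0.70585, 1.15850, 0.02105)
)

# Fix the set of sample columns before adding any new columns
sample_cols <- setdiff(names(dt), "gene")

# Row-wise sum and mean over the sample columns only
dt[, genesum  := rowSums(.SD),  .SDcols = sample_cols]
dt[, genemean := rowMeans(.SD), .SDcols = sample_cols]

print(dt)
```

Capturing sample_cols up front is a design choice: once genesum exists, an expression such as dt[, -1, with=FALSE] would also pick up genesum, so a second row-wise calculation would include it.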