Row operations in data.table

Views: 153. Published: 2017/3/12 11:43:51. Tags: r, data.table, mean


Question

 
I'm trying to compute a simple sum and mean by row using data.table, but I'm getting unexpected results. I followed the help in section 2 of the data.table FAQ. I found a way that works, but I'm not sure why the method from section 2 of the FAQ does not.
This method gives me an incorrect result (it just returns the value of the first column):

  dt[, genesum:=lapply(.SD,sum), by=gene]
  
  head(dt)


      gene      TCGA_04_1348      TCGA_04_1362   genesum  
  1:    A1BG          0.94565          0.70585  0.94565   
  2: A1BG-AS          0.97610          1.15850  0.97610   
  3:    A1CF          0.00000          0.02105  0.00000   
  4:   A2BP1          0.00300          0.04150  0.00300   
  5:   A2LD1          4.57975          5.02820  4.57975  
  6:     A2M         60.37320         36.09715 60.37320 

This, on the other hand, gives me the desired result:

  dt[, genesum:=apply(dt[,-1, with=FALSE],1, sum)]
  
  head(dt)


       gene     TCGA_04_1348       TCGA_04_1362 genesum
  1:    A1BG          0.94565          0.70585  1.65150
  2: A1BG-AS          0.97610          1.15850  2.13460
  3:    A1CF          0.00000          0.02105  0.02105
  4:   A2BP1          0.00300          0.04150  0.04450
  5:   A2LD1          4.57975          5.02820  9.60795
  6:     A2M         60.37320         36.09715 96.47035

I have many more columns and rows; this is just a subset. Does this have anything to do with the way I set the key?

  tables()


       NAME   NROW MB COLS                                            KEY
  [1,] dt   20,785  2 gene,TCGA_04_1348_01A,TCGA_04_1362_01A,genesum  gene

Answer
A few things:

dt[, genesum:=lapply(.SD,sum), by=gene] and dt[, genesum:=apply(dt[,-1, with=FALSE],1, sum)] are quite different.


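dt[, genesum:=lapply(.SD,sum), by=gene] loops over the columns of the .SD data.table (the non-grouping columns for the current gene) and sums each column separately. The result is a list with one sum per column, not one value per row, and only one of those can land in the single genesum column, which is consistent with the first column's value showing up.
dt[, genesum:=apply(dt[,-1, with=FALSE],1, sum)] loops over the rows: apply(x, 1, function) applies function to every row of x.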
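
To make the column-versus-row distinction concrete, here is a minimal base R sketch. It uses a plain data.frame rather than a data.table, and the object names x, col_sums and row_sums are made up for the illustration; the numbers are the first three rows from the question.

```r
# Toy data copied from the first three rows shown in the question
# (the gene column is left out so that every column is numeric).
x <- data.frame(
  TCGA_04_1348 = c(0.94565, 0.97610, 0.00000),
  TCGA_04_1362 = c(0.70585, 1.15850, 0.02105)
)

# lapply() walks over the columns: one sum per column, returned as a list.
col_sums <- lapply(x, sum)

# apply(x, 1, sum) walks over the rows: one sum per row.
row_sums <- apply(x, 1, sum)

print(col_sums)  # a list of length 2 (one entry per sample column)
print(row_sums)  # 1.65150 2.13460 0.02105, matching the desired genesum
```

Assuming each gene appears only once, grouping by gene makes every group a single row, so each per-column "sum" is just that row's own value in the column.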

I think you can get what you want by calling rowSums, like so:
  dt[, genesum := rowSums(dt[, -1, with=FALSE])]

Is that what you're after?
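
A variation on the rowSums idea, offered as a sketch rather than a definitive recipe: passing the sample columns through .SDcols avoids re-subsetting dt inside j. It also keeps an already existing genesum column (the tables() output in the question lists one) out of the sum if the line is run again. The toy table and the names sample_cols and genemean are assumptions made for this example.

```r
library(data.table)

# Toy table built from the first three rows shown in the question.
dt <- data.table(
  gene         = c("A1BG", "A1BG-AS", "A1CF"),
  TCGA_04_1348 = c(0.94565, 0.97610, 0.00000),
  TCGA_04_1362 = c(0.70585, 1.15850, 0.02105)
)
setkey(dt, gene)

# Name the sample columns explicitly, so that neither the gene column nor a
# previously added genesum column is ever included in the row totals.
sample_cols <- setdiff(names(dt), c("gene", "genesum"))

# .SD holds only the columns listed in .SDcols; rowSums()/rowMeans() then
# return one value per row, which := stores in the new column.
dt[, genesum  := rowSums(.SD),  .SDcols = sample_cols]
dt[, genemean := rowMeans(.SD), .SDcols = sample_cols]

print(dt)
```

No by = is needed here, because rowSums and rowMeans already work row by row over the whole table.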
