Using data.table to aggregate

Question

After multiple suggestions from SO users, I am finally trying to convert my code over to using data.table.

library(data.table)
DT <- data.table(plate = paste0("plate",rep(1:2,each=5)),
                 id = rep(c("CTRL","CTRL","ID1","ID2","ID3"),2),
                 val = 1:10)

> DT
     plate   id val
 1: plate1 CTRL   1
 2: plate1 CTRL   2
 3: plate1  ID1   3
 4: plate1  ID2   4
 5: plate1  ID3   5
 6: plate2 CTRL   6
 7: plate2 CTRL   7
 8: plate2  ID1   8
 9: plate2  ID2   9
10: plate2  ID3  10

What I would like to do is take the average of DT[,val] by plate when the id is "CTRL".

I would normally aggregate the data frame, then use match to map the values back to a new column, 'ctrl'.
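For reference, that data-frame approach might look roughly like the sketch below. It assumes base R's aggregate and match; the names df and ctrl_means are hypothetical and chosen only for illustration.

```r
# Same example data, held in a plain data frame (df is a hypothetical name)
df <- data.frame(plate = paste0("plate", rep(1:2, each = 5)),
                 id = rep(c("CTRL", "CTRL", "ID1", "ID2", "ID3"), 2),
                 val = 1:10)

# Mean of val per plate, computed from the CTRL rows only
ctrl_means <- aggregate(val ~ plate, data = df[df$id == "CTRL", ], FUN = mean)

# Look up each row's plate in the summary and copy its control mean back
df$ctrl <- ctrl_means$val[match(df$plate, ctrl_means$plate)]
```

This takes two steps (summarise, then look up), which is the pattern I am hoping to replace with a single data.table call.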

Using the data.table package, I can get this far:

DT[id=="CTRL",ctrl:=mean(val),by=plate]

> DT
     plate   id val ctrl
 1: plate1 CTRL   1  1.5
 2: plate1 CTRL   2  1.5
 3: plate1  ID1   3   NA
 4: plate1  ID2   4   NA
 5: plate1  ID3   5   NA
 6: plate2 CTRL   6  6.5
 7: plate2 CTRL   7  6.5
 8: plate2  ID1   8   NA
 9: plate2  ID2   9   NA
10: plate2  ID3  10   NA

What I need is:

DT <- data.table(plate = paste0("plate",rep(1:2,each=5)),
                 id = rep(c("CTRL","CTRL","ID1","ID2","ID3"),2),
                 val = 1:10,
                 ctrl = rep(c(1.5,6.5),each=5))

> DT
     plate   id val ctrl
 1: plate1 CTRL   1  1.5
 2: plate1 CTRL   2  1.5
 3: plate1  ID1   3  1.5
 4: plate1  ID2   4  1.5
 5: plate1  ID3   5  1.5
 6: plate2 CTRL   6  6.5
 7: plate2 CTRL   7  6.5
 8: plate2  ID1   8  6.5
 9: plate2  ID2   9  6.5
10: plate2  ID3  10  6.5

Eventually I would like to use much more complicated selections of the values, but I do not know how to select specific values, run some function, then map those values back to the appropriate row using data frames.

Recommended answer

This is what you want to do:

DT[,ctrl:=mean(val[id=="CTRL"]),by=plate]

     plate   id val ctrl
 1: plate1 CTRL   1  1.5
 2: plate1 CTRL   2  1.5
 3: plate1  ID1   3  1.5
 4: plate1  ID2   4  1.5
 5: plate1  ID3   5  1.5
 6: plate2 CTRL   6  6.5
 7: plate2 CTRL   7  6.5
 8: plate2  ID1   8  6.5
 9: plate2  ID2   9  6.5
10: plate2  ID3  10  6.5

Your original code DT[id=="CTRL",ctrl:=mean(val),by=plate] did not make an assignment for rows where id=="CTRL" was not true because, when you use the first argument of [, you are subsetting; the operations in the second argument are only done for the subsetted data.table.
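To see the difference side by side, here is a small sketch that runs both forms on copies of the example table. The names a and b are hypothetical; copy() is used because := modifies a data.table by reference.

```r
library(data.table)

DT <- data.table(plate = paste0("plate", rep(1:2, each = 5)),
                 id = rep(c("CTRL", "CTRL", "ID1", "ID2", "ID3"), 2),
                 val = 1:10)

# Subset in i: only the CTRL rows are grouped and assigned,
# so ctrl stays NA on every other row
a <- copy(DT)[id == "CTRL", ctrl := mean(val), by = plate]

# Subset inside j: every row of each plate is assigned, but the
# mean is computed from that plate's CTRL values only
b <- copy(DT)[, ctrl := mean(val[id == "CTRL"]), by = plate]

a[, sum(is.na(ctrl))]  # 6 rows left unassigned
b[, sum(is.na(ctrl))]  # 0 rows left unassigned
```

The same pattern should carry over to more complicated selections: keep i empty so that all rows receive a value, and put whatever logical condition picks the reference values inside the j expression.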
