Comparing Groups in data.table Columns
Question
I have a dataset that I need to split by one variable (Day) and then compare between groups of another variable (Group), computing per-group statistics (e.g. mean) and also running tests.
A contrived example:
library(data.table)
data = data.table(Day = rep(1:10, each = 10),
                  Group = rep(1:2, times = 50),
                  V = rnorm(100))
# per Day: the mean of V in each group, plus a two-sided t-test between the groups
data[, .(g1_mean = mean(.SD[Group == 1]$V),
         g2_mean = mean(.SD[Group == 2]$V),
         p.value = t.test(V ~ Group, .SD, alternative = "two.sided")$p.value),
     by = list(Day)]
Which produces:
    Day      g1_mean     g2_mean   p.value
 1:   1  0.883406048  0.67177271 0.6674138
 2:   2  0.007544956 -0.55609722 0.3948459
 3:   3  0.409248637  0.28717183 0.8753213
 4:   4 -0.540075365  0.23181458 0.1785854
 5:   5 -0.632543900 -1.09965990 0.6457325
 6:   6 -0.083221671 -0.96286343 0.2011136
 7:   7 -0.044674252 -0.27666473 0.7079499
 8:   8  0.260795244 -0.15159164 0.4663712
 9:   9 -0.134164758  0.01136245 0.7992453
10:  10  0.496144329  0.76168408 0.1821123
I'm hoping that there's a less roundabout way of arriving at this result.
Recommended answer
A possible compact alternative, which can also apply more functions to each group:
DTnew <- dcast(DT[, pval := t.test(V ~ Group, .SD, alternative = "two.sided")$p.value, Day],
               Day + pval ~ paste0("g", Group), fun = list(mean, sd), value.var = "V")
Which gives:
> DTnew
    Day      pval   V_mean_g1    V_mean_g2   V_sd_g1   V_sd_g2
 1:   1 0.4763594 -0.11630634  0.178240714 0.7462975 0.4516087
 2:   2 0.5715001 -0.29689807  0.082970631 1.3614177 0.2745783
 3:   3 0.2295251 -0.48792449 -0.031328749 0.3723247 0.6703694
 4:   4 0.5565573  0.33982242  0.080169698 0.5635136 0.7560959
 5:   5 0.5498684 -0.07554433  0.308661427 0.9343230 1.0100788
 6:   6 0.4814518  0.57694034  0.885968245 0.6457926 0.6773873
 7:   7 0.8053066  0.29845913  0.116217727 0.9541060 1.2782210
 8:   8 0.3549573  0.14827289 -0.319017581 0.5328734 0.9036501
 9:   9 0.7290625 -0.21589411 -0.005785092 0.9639758 0.8859461
10:  10 0.9899833  0.84034529  0.850429982 0.6645952 1.5809149
Explanation:
- First, a pval variable is added to the dataset with DT[, pval := t.test(V ~ Group, .SD, alternative = "two.sided")$p.value, Day].
- Because DT is updated in place and by reference by the previous step, the dcast function can be applied to it directly.
- In the casting formula, you specify the variables that need to stay in their current form on the LHS (here Day + pval) and the variable that needs to be spread over columns on the RHS (here paste0("g", Group)).
- With the fun argument you can specify which aggregation function has to be used on the value.var (here V). If multiple aggregation functions are needed, you can specify them in a list (e.g. list(mean, sd)). This can be any type of function, so custom-made functions can be used as well.
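The steps above can be sketched end to end as follows. This is only an illustration under stated assumptions: the seed is added purely for reproducibility (so the numbers will not match the output shown earlier), and se is a hypothetical custom aggregator that does not appear in the original answer.

```r
library(data.table)

set.seed(42)  # assumption: added only so the sketch is reproducible
DT <- data.table(Day   = rep(1:10, each = 10),
                 Group = rep(1:2, times = 50),
                 V     = rnorm(100))

# Step 1: add the per-Day p-value by reference; DT itself gains a pval column
DT[, pval := t.test(V ~ Group, .SD, alternative = "two.sided")$p.value, by = Day]

# Hypothetical custom aggregator: standard error of the mean
se <- function(x) sd(x) / sqrt(length(x))

# Step 2: Day and pval stay as rows (LHS of the formula),
# Group is spread over columns (RHS of the formula)
DTnew <- dcast(DT, Day + pval ~ paste0("g", Group),
               fun.aggregate = list(mean, se), value.var = "V")
DTnew
```

Writing fun.aggregate in full is equivalent to the shorter fun used above, which relies on R's partial argument matching.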
If you want to remove the V_ prefix from the column names, you can do:
names(DTnew) <- gsub("V_","",names(DTnew))
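As a variation on the line above, the renaming can also be done by reference with data.table's setnames, anchoring the pattern so that only a leading V_ is stripped. The toy DTnew below is an assumption: it merely stands in for the real result, with column names copied from the output shown earlier and illustrative values.

```r
library(data.table)

# Toy stand-in for DTnew (illustrative values, real column names)
DTnew <- data.table(Day = 1:2, pval = c(0.48, 0.57),
                    V_mean_g1 = c(-0.12, -0.30), V_mean_g2 = c(0.18, 0.08),
                    V_sd_g1 = c(0.75, 1.36), V_sd_g2 = c(0.45, 0.27))

# "^" anchors the pattern at the start of the name, so a "V_" occurring
# elsewhere in a column name would be left alone
setnames(DTnew, sub("^V_", "", names(DTnew)))
names(DTnew)
# "Day" "pval" "mean_g1" "mean_g2" "sd_g1" "sd_g2"
```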
NOTE: I renamed the data.table to DT, as it is often not wise to name your dataset after a function (check ?data).