Adding column sums in a dataframe row-wise, conditional on a dummy
Problem description
I would like to add the sums of the columns of my dataframe one row at a time, conditional on another column that has a binary variable.
So for each row, I would like to compute the sum of each column over that row and all rows above it in which the binary variable has the same value as in that row.
Here is an example:
dummy var1 var2
1 x1 y1
0 x2 y2
0 x3 y3
1 x4 y4
My goal is to obtain this:
dummy var1 var2
1 x1 y1
0 x2 y2
0 x3+x2 y3+y2
1 x4+x1 y4+y1
I previously asked a simplified version of this question (Adding columns sums in dataframe row wise), where I just add all of the values above without the condition. Is there a way to incorporate this condition?
Recommended answer
data.table::rleid will give you the grouping you want. (Note: this assumes that your text is accurate and your example incorrect: it groups by consecutive equal values in the dummy column.) If you convert your data frame to a data.table, it looks like this:
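library(data.table)
setDT(your_data)  # convert to a data.table by reference
# id labels each run of consecutive equal dummy values;
# the cumulative sums are then taken within each run
your_data[, id := rleid(dummy)][
  , c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = id
]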
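To make the grouping concrete, here is a small sketch with made-up numbers (the values in `your_data` are illustrative assumptions; the question only gives symbolic values). `rleid` starts a new id every time `dummy` changes, so the cumulative sums restart at each run:

```r
library(data.table)

# Illustrative data only; the question uses symbolic values x1..x4, y1..y4.
your_data <- data.table(
  dummy = c(1, 0, 0, 1),
  var1  = c(1, 2, 3, 4),
  var2  = c(10, 20, 30, 40)
)

# rleid() numbers each run of consecutive equal values: 1, 2, 2, 3
your_data[, id := rleid(dummy)]

# Cumulative sums restart within each run
your_data[, c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = id]

print(your_data)
```

With these numbers the last row keeps var1 = 4 rather than 4 + 1, because the two rows with dummy = 1 are not adjacent.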
If you need to do this for many columns, set the id as above, define your vector of column names, and then:
cols = c("var1", "var2", "var3", ...)
your_data[, (cols) := lapply(.SD, cumsum), by = id, .SDcols = cols]
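A self-contained sketch of the same pattern, assuming three numeric columns with invented values. `.SDcols` restricts `.SD` to the columns named in `cols`, and `(cols) :=` overwrites those columns in place:

```r
library(data.table)

# Illustrative data; column names follow the answer, values are invented.
your_data <- data.table(
  dummy = c(1, 1, 0, 0, 1),
  var1  = c(1, 2, 3, 4, 5),
  var2  = c(10, 20, 30, 40, 50),
  var3  = c(100, 200, 300, 400, 500)
)

cols <- c("var1", "var2", "var3")

# Runs of consecutive equal dummy values get ids 1, 1, 2, 2, 3
your_data[, id := rleid(dummy)]

# Apply cumsum to every column in cols, separately within each run
your_data[, (cols) := lapply(.SD, cumsum), by = id, .SDcols = cols]

print(your_data)
```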
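If you just want to group by the dummy column, ignoring consecutiveness, then your question is exactly the same as this one, and you can do it like this: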
setDT(your_data)
your_data[, c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = dummy]
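As a quick check, the sketch below runs this on made-up numbers (an assumption for illustration; the question only has symbolic values). Grouping by `dummy` itself appears to reproduce the pattern in the question's target table, where the fourth row accumulates the first:

```r
library(data.table)

# Illustrative data only, standing in for x1..x4 and y1..y4.
your_data <- data.table(
  dummy = c(1, 0, 0, 1),
  var1  = c(1, 2, 3, 4),
  var2  = c(10, 20, 30, 40)
)

# Cumulative sums within each dummy value, regardless of adjacency;
# := writes the results back in the original row order
your_data[, c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = dummy]

print(your_data)
```

Here the last row becomes var1 = 4 + 1 and var2 = 40 + 10, matching the x4+x1 and y4+y1 entries in the question.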