Adding column sums in a dataframe row-wise, conditional on a dummy

Question

I would like to add the sums of the columns of my dataframe one row at a time, conditional on another column that has a binary variable.

So for each row, I would like to compute the sum of the entire column above it for all rows where the binary variable in the corresponding row has the same value.

Here is an example:

dummy var1  var2
1     x1     y1
0     x2     y2
0     x3     y3
1     x4     y4

My goal is to get this:

dummy var1     var2
1     x1       y1
0     x2       y2
0     x3+x2    y3+y2
1     x4+x1    y4+y1

I have asked this question previously for a simplified version (Adding columns sums in dataframe row wise) where I just add all of the values above without the condition. Is there a way to incorporate this condition?

Recommended answer

data.table::rleid will give you the grouping you want. If you convert your data frame to a data.table, it's like this:

(Note: this assumes that your text is accurate and your example incorrect: it groups by consecutive equal values in the dummy column.)

library(data.table)
setDT(your_data)
your_data[, id := rleid(dummy)][
  , c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = id
]
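
To make the grouping concrete, here is a minimal sketch on made-up numbers (the values in `your_data` are assumptions for illustration, not data from the question). It shows the ids that `rleid` assigns and the resulting running sums:

```r
library(data.table)

# Hypothetical sample data; the numbers are invented for illustration.
your_data <- data.table(
  dummy = c(1, 0, 0, 1),
  var1  = c(10, 20, 30, 40),
  var2  = c(1, 2, 3, 4)
)

# rleid() starts a new id each time dummy differs from the previous row,
# so the two 1s (rows 1 and 4) land in different groups.
your_data[, id := rleid(dummy)]
# id is 1, 2, 2, 3

# Running sum within each run of equal dummy values.
your_data[, c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = id]
# var1 is 10, 20, 50, 40 and var2 is 1, 2, 5, 4
```

Note that the last row stays at 40 rather than 40 + 10, because the two rows with dummy equal to 1 are not consecutive.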

If you need to do this to a bunch of columns, set the id as above, define your vector of columns, and then:

cols = c("var1", "var2", "var3", ...)
your_data[, (cols) := lapply(.SD, cumsum), by = id, .SDcols = cols]
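
If listing the columns by hand is impractical, one possible approach is to derive `cols` from the column names. The sketch below assumes that every column other than `dummy` and `id` should be accumulated, which may not hold for your data; the sample values and the `var3` column are invented. It passes the column vector through `.SDcols`, the full name of that argument.

```r
library(data.table)

# Hypothetical sample data with three value columns.
your_data <- data.table(
  dummy = c(1, 0, 0, 1),
  var1  = c(10, 20, 30, 40),
  var2  = c(1, 2, 3, 4),
  var3  = c(5, 5, 5, 5)
)
your_data[, id := rleid(dummy)]

# Assumption: everything except the grouping columns gets a running sum.
cols <- setdiff(names(your_data), c("dummy", "id"))

# .SD holds only the columns named in .SDcols; cumsum is applied to each.
your_data[, (cols) := lapply(.SD, cumsum), by = id, .SDcols = cols]
```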


If you just want to group by the dummy column, ignoring consecutiveness, then your question is an exact duplicate of this one, and you can do it like this:

setDT(your_data)
your_data[, c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = dummy]
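
As a rough check of how this differs from grouping by runs, the sketch below applies the `by = dummy` version to made-up numbers (the values are illustrative assumptions, not from the question). Rows with the same dummy value accumulate even when they are not adjacent, which matches the x4+x1 row in the question's target table.

```r
library(data.table)

# Hypothetical sample data; the numbers are invented for illustration.
your_data <- data.table(
  dummy = c(1, 0, 0, 1),
  var1  = c(10, 20, 30, 40),
  var2  = c(1, 2, 3, 4)
)

# Group on the dummy value itself, ignoring whether rows are consecutive.
# := writes the results back in the original row order.
your_data[, c("var1", "var2") := .(cumsum(var1), cumsum(var2)), by = dummy]
# var1 is 10, 20, 50, 50 and var2 is 1, 2, 5, 5
```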
