cumsum for unique values using dplyr mutate

Published: 2021/5/2 20:50:19 · Tags: r, dplyr, cumsum, mutate


Question

The dummy data set is:

data <- data.frame(
  id = c(1,1,2,2,3,4,5,6),
  value = c(10,10,20,20,10,30,40,50),
  other = c(1,2,3,4,5,6,7,8)
)

The data is the output of a group_by(id) operation in a dplyr pipe. Each id is associated with at most one value, and two different ids can have the same value. I need a cumulative sum across ids, added as a new column: cum_col = c(10,10,30,30,40,70,110,160). A cumsum inside mutate computes the cumulative sum over the whole value column rather than picking one value per id. summarise is not useful because there are other columns I need to keep intact.
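To make the problem concrete, here is a minimal sketch (an illustration, not part of the original post) of what a plain cumsum over the value column returns for the dummy data. The variable name naive is hypothetical.

```r
# Dummy data from the question
data <- data.frame(
  id = c(1,1,2,2,3,4,5,6),
  value = c(10,10,20,20,10,30,40,50),
  other = c(1,2,3,4,5,6,7,8)
)

# Plain cumsum counts the value of ids 1 and 2 twice,
# because each of them appears on two rows
naive <- cumsum(data$value)
print(naive)
# 10 20 40 60 70 100 140 190, versus the wanted 10 10 30 30 40 70 110 160
```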

Is there a way to do this without using summarise and then joining the result back? Or, if this has been answered before, please point me to the link.

For reference, the actual data has about 2 million rows and 100 columns.

Recommended answer

Another alternative is to create a dummy column (cols) that holds the value only on the first row of each group and 0 on the remaining rows, then take cumsum over the entire column.

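library(dplyr)

data %>%
  group_by(id) %>%
  # keep value only on the first row of each id; 0 on the remaining rows
  mutate(cols = c(value[1], rep(0, n() - 1))) %>%
  ungroup() %>%
  # running total over the whole column, then drop the helper column
  mutate(cum_col = cumsum(cols)) %>%
  select(-cols)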


# A tibble: 8 x 4
#     id value other cum_col
#  <dbl> <dbl> <dbl>   <dbl>
#1     1    10     1      10
#2     1    10     2      10
#3     2    20     3      30
#4     2    20     4      30
#5     3    10     5      40
#6     4    30     6      70
#7     5    40     7     110
#8     6    50     8     160
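As a possible variation on the same idea (a sketch, not from the original answer), the helper column can be skipped by zeroing out repeated ids with base R's duplicated() inside a single mutate. This assumes the rows of each id are contiguous, as in the dummy data; the name result is hypothetical.

```r
library(dplyr)

# Dummy data from the question
data <- data.frame(
  id = c(1,1,2,2,3,4,5,6),
  value = c(10,10,20,20,10,30,40,50),
  other = c(1,2,3,4,5,6,7,8)
)

result <- data %>%
  ungroup() %>%  # no-op here; needed if the input is still grouped by id
  # count value only on the first occurrence of each id, 0 on repeats
  mutate(cum_col = cumsum(ifelse(duplicated(id), 0, value)))

print(result$cum_col)
# 10 10 30 30 40 70 110 160
```

Because this avoids the per-group mutate, it may be worth timing against the grouped version on the full data, although no benchmark is claimed here.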
