R Cumulative Sum with a condition and a reset


Problem description


I have a signal position indicator vector consisting of -1s and 1s. In addition, I have volume data that I want to sum based on the value of Signal. The basic data table looks like this:

df <- cbind(Signal, Volume)
head(df, 20)

           Signal    Volume
2016-01-04     NA  37912403
2016-01-05     -1  23258238
2016-01-06     -1  25096183
2016-01-07     -1  45172906
2016-01-08     -1  35402298
2016-01-11     -1  29932385
2016-01-12     -1  28395390
2016-01-13     -1  33410553
2016-01-14     -1  48658623
2016-01-15      1  46132781
2016-01-19      1  30998256
2016-01-20     -1  59051429
2016-01-21      1  30518939
2016-01-22      1  30495387
2016-01-25      1  32482015
2016-01-26     -1  26877080
2016-01-27     -1  58699359
2016-01-28      1 107475327
2016-01-29      1  62739548
2016-02-01      1  46132726

What I would like to achieve (without using a for loop) is to produce a vector of cumulative Volume that is reset every time the Signal changes. In addition, the values of Volume should be multiplied by the value of the Signal, i.e. when Signal is -1, -Volume should be added to the current cumulative Volume. Based on similar questions on SO, I have tried

ave(df$a, cumsum(c(F, diff(sign(diff(df$a))) != 0)*df$Volume), FUN=seq_along) 

which produces the right grouping of Signal, but the Volume is not included for some reason. Without the reset, the solution is fairly straightforward (posted on SO):

require(data.table)
DT <- data.table(dt)
DT[, Cum.Sum := cumsum(Volume), by=Signal]

Does anyone know a dplyr or data.table kind of solution for both resetting and conditioning a cumulative sum? Thanks.

Solution

This can be achieved by:

library(tidyverse)
library(data.table)     

z %>%
  group_by(rleid(Signal)) %>% #advance value every time Signal changes and group by that
  mutate(cum = Signal*cumsum(Volume)) %>% #cumsum in each group
  ungroup() %>% #ungroup so you could remove the grouping column
  select(-4) #remove grouping column

or without data.table by using rle:

z %>%
  mutate(rl = rep(1:length(rle(Signal)$lengths), times = rle(Signal)$lengths)) %>% #rle() names its component "lengths"
  group_by(rl) %>%
  mutate(cum = Signal*cumsum(Volume)) %>%
  ungroup() %>%
  select(-4)

#output
   date       Signal    Volume        cum
   <fct>       <int>     <int>      <int>
 1 2016-01-04     NA  37912403         NA
 2 2016-01-05     -1  23258238  -23258238
 3 2016-01-06     -1  25096183  -48354421
 4 2016-01-07     -1  45172906  -93527327
 5 2016-01-08     -1  35402298 -128929625
 6 2016-01-11     -1  29932385 -158862010
 7 2016-01-12     -1  28395390 -187257400
 8 2016-01-13     -1  33410553 -220667953
 9 2016-01-14     -1  48658623 -269326576
10 2016-01-15      1  46132781   46132781
11 2016-01-19      1  30998256   77131037
12 2016-01-20     -1  59051429  -59051429
13 2016-01-21      1  30518939   30518939
14 2016-01-22      1  30495387   61014326
15 2016-01-25      1  32482015   93496341
16 2016-01-26     -1  26877080  -26877080
17 2016-01-27     -1  58699359  -85576439
18 2016-01-28      1 107475327  107475327
19 2016-01-29      1  62739548  170214875
20 2016-02-01      1  46132726  216347601
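
To make the grouping step easier to follow, here is a minimal base R sketch of how a run ID can be built from rle() and then used to restart the cumulative sum. The vectors signal and volume are short made-up values for illustration only, not the data from the question, and ave() is used here as one possible way to apply cumsum() per run.

```r
# Illustrative vectors (hypothetical values, not the question's data):
# one NA, then runs of -1 and 1
signal <- c(NA, -1, -1, 1, 1, -1)
volume <- c(10, 20, 30, 40, 50, 60)

# rle() finds runs of equal consecutive values; an NA forms its own run
runs <- rle(signal)

# Expand the run lengths into one group ID per observation
run_id <- rep(seq_along(runs$lengths), times = runs$lengths)
print(run_id)
# 1 2 2 3 3 4

# Signed cumulative sum that restarts at the start of every run
cum <- ave(signal * volume, run_id, FUN = cumsum)
print(cum)
# NA -20 -50 40 90 -60
```

Because Signal is constant within a run, cumsum(signal * volume) per run gives the same result as Signal * cumsum(Volume) in the pipelines above.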

data:

z <- read.table(text =      "date     Signal    Volume
           2016-01-04     NA  37912403
           2016-01-05     -1  23258238
           2016-01-06     -1  25096183
           2016-01-07     -1  45172906
           2016-01-08     -1  35402298
           2016-01-11     -1  29932385
           2016-01-12     -1  28395390
           2016-01-13     -1  33410553
           2016-01-14     -1  48658623
           2016-01-15      1  46132781
           2016-01-19      1  30998256
           2016-01-20     -1  59051429
           2016-01-21      1  30518939
           2016-01-22      1  30495387
           2016-01-25      1  32482015
           2016-01-26     -1  26877080
           2016-01-27     -1  58699359
           2016-01-28      1 107475327
           2016-01-29      1  62739548
           2016-02-01      1  46132726", header = TRUE)
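
Since the question also asks for a data.table kind of solution, the same idea can probably be written without dplyr. The sketch below assumes the data.table package is installed and uses only five rows taken from the data above (2016-01-14 to 2016-01-21), so the first value differs from the full output, where the run of -1s starts earlier. The name DT is just a placeholder.

```r
library(data.table)

# Five rows copied from the data above, kept short for illustration
DT <- data.table(
  date   = c("2016-01-14", "2016-01-15", "2016-01-19", "2016-01-20", "2016-01-21"),
  Signal = c(-1, 1, 1, -1, 1),
  Volume = c(48658623, 46132781, 30998256, 59051429, 30518939)
)

# rleid(Signal) starts a new group at every change in Signal,
# so cumsum() restarts within each group; := adds the column by reference
DT[, cum := Signal * cumsum(Volume), by = rleid(Signal)]
print(DT)
```

Volume is stored as a double here. With integer columns, cumsum() returns NA with a warning once a running total exceeds the integer range, which may matter for longer series.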
