R Cumulative Sum with a condition and a reset
Problem description
I have a signal position indicator vector consisting of -1s and 1s. In addition, I have volume data that I want to sum based on the value of Signal. The basic data looks like this:
df <- cbind(Signal, Volume)
head(df, 20)
Signal Volume
2016-01-04 NA 37912403
2016-01-05 -1 23258238
2016-01-06 -1 25096183
2016-01-07 -1 45172906
2016-01-08 -1 35402298
2016-01-11 -1 29932385
2016-01-12 -1 28395390
2016-01-13 -1 33410553
2016-01-14 -1 48658623
2016-01-15 1 46132781
2016-01-19 1 30998256
2016-01-20 -1 59051429
2016-01-21 1 30518939
2016-01-22 1 30495387
2016-01-25 1 32482015
2016-01-26 -1 26877080
2016-01-27 -1 58699359
2016-01-28 1 107475327
2016-01-29 1 62739548
2016-02-01 1 46132726
What I would like to achieve (without using a for loop) is a vector of cumulative Volume that is reset every time Signal changes. In addition, the Volume values should be multiplied by the value of Signal, i.e. when Signal is -1, -Volume should be added to the current cumulative Volume. Based on similar questions on SO, I have tried
ave(df$a, cumsum(c(F, diff(sign(diff(df$a))) != 0)*df$Volume), FUN=seq_along)
which produces the right grouping of Signal, but for some reason Volume is not included. Without the reset, the solution is fairly straightforward (posted on SO):
library(data.table)
DT <- data.table(df)
DT[, Cum.Sum := cumsum(Volume), by = Signal] # running Volume total per Signal value; no reset between runs
Does anyone know a dplyr or data.table solution that both resets and conditions the cumulative sum? Thanks.
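As for why the ave() attempt returns counters: FUN = seq_along numbers the rows inside each group (1, 2, 3, ...) and ignores the values passed as the first argument, while Volume only enters the grouping expression, where it shapes the group labels rather than being summed. A small base-R illustration with made-up numbers, contrasting FUN = seq_along with FUN = cumsum:

```r
# Made-up values and run labels (two runs: rows 1-3 and row 4)
x   <- c(5, 7, 9, 4)
grp <- c(1, 1, 1, 2)

ave(x, grp, FUN = seq_along)  # position of each row within its group
#> [1] 1 2 3 1
ave(x, grp, FUN = cumsum)     # running total of x within each group
#> [1]  5 12 21  4
```

Passing the signed volume (Signal * Volume) as the first argument with FUN = cumsum therefore gives the reset behaviour, provided the grouping labels each run of equal consecutive Signal values.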
This can be achieved by:
library(tidyverse)
library(data.table)
z %>%
group_by(rleid(Signal)) %>% #advance value every time Signal changes and group by that
mutate(cum = Signal*cumsum(Volume)) %>% #cumsum in each group
ungroup() %>% #ungroup so you could remove the grouping column
select(-4) #remove grouping column
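Since the question also asked for a data.table solution, the same idea can be written in data.table alone: grouping by rleid(Signal) instead of by = Signal (as in the no-reset version in the question) restarts the sum at every change. A minimal sketch on a small made-up sample shaped like z; the values are hypothetical:

```r
library(data.table)

# Made-up sample in the same shape as z (date, Signal, Volume)
DT <- data.table(
  date   = as.Date("2016-01-04") + 0:5,
  Signal = c(NA, -1L, -1L, 1L, 1L, -1L),
  Volume = c(100L, 10L, 20L, 5L, 5L, 7L)
)

# rleid() labels each run of identical consecutive Signal values, so
# grouping by it (rather than by = Signal) restarts the sum at each change
DT[, cum := cumsum(Signal * Volume), by = rleid(Signal)]
DT
```

Applied to the question's data, setDT(z)[, cum := cumsum(Signal * Volume), by = rleid(Signal)] would add the cum column to z by reference, without the extra grouping column that select(-4) removes in the dplyr version.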
Or, without data.table, by using rle:
z %>%
mutate(rl = rep(seq_along(rle(Signal)$lengths), times = rle(Signal)$lengths)) %>% #run id that advances every time Signal changes
group_by(rl) %>%
mutate(cum = Signal*cumsum(Volume)) %>% #cumsum in each run
ungroup() %>%
select(-4) #remove the rl column
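The mutate(rl = ...) step turns the output of rle(), which stores the lengths and values of consecutive runs, into one id per row. An equivalent base-R construction, offered as an alternative rather than the answer's own code, overwrites each run's value with its run number and expands it back with inverse.rle(); the toy vector is made up:

```r
sig <- c(NA, -1, -1, 1, 1, -1, -1)

r <- rle(sig)                     # runs of equal consecutive values (each NA forms its own run)
r$values <- seq_along(r$values)   # replace each run's value by its run number
run_id <- inverse.rle(r)          # expand back to one id per element
run_id
#> [1] 1 2 2 3 3 4 4

# Same ids as a rep()-based construction
identical(run_id, rep(seq_along(rle(sig)$lengths), times = rle(sig)$lengths))
#> [1] TRUE
```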
# output
   date       Signal    Volume        cum
   <fct>       <int>     <int>      <int>
 1 2016-01-04     NA  37912403         NA
 2 2016-01-05     -1  23258238  -23258238
 3 2016-01-06     -1  25096183  -48354421
 4 2016-01-07     -1  45172906  -93527327
 5 2016-01-08     -1  35402298 -128929625
 6 2016-01-11     -1  29932385 -158862010
 7 2016-01-12     -1  28395390 -187257400
 8 2016-01-13     -1  33410553 -220667953
 9 2016-01-14     -1  48658623 -269326576
10 2016-01-15      1  46132781   46132781
11 2016-01-19      1  30998256   77131037
12 2016-01-20     -1  59051429  -59051429
13 2016-01-21      1  30518939   30518939
14 2016-01-22      1  30495387   61014326
15 2016-01-25      1  32482015   93496341
16 2016-01-26     -1  26877080  -26877080
17 2016-01-27     -1  58699359  -85576439
18 2016-01-28      1 107475327  107475327
19 2016-01-29      1  62739548  170214875
20 2016-02-01      1  46132726  216347601
data:
z <- read.table(text = "date Signal Volume
2016-01-04 NA 37912403
2016-01-05 -1 23258238
2016-01-06 -1 25096183
2016-01-07 -1 45172906
2016-01-08 -1 35402298
2016-01-11 -1 29932385
2016-01-12 -1 28395390
2016-01-13 -1 33410553
2016-01-14 -1 48658623
2016-01-15 1 46132781
2016-01-19 1 30998256
2016-01-20 -1 59051429
2016-01-21 1 30518939
2016-01-22 1 30495387
2016-01-25 1 32482015
2016-01-26 -1 26877080
2016-01-27 -1 58699359
2016-01-28 1 107475327
2016-01-29 1 62739548
2016-02-01 1 46132726", header = TRUE)
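Neither package is strictly needed for this. A minimal base-R sketch, assuming a data frame shaped like z (columns Signal and Volume); the helper name reset_cumsum and the small sample are hypothetical:

```r
# Hypothetical helper: cumulative sum of signal * volume that restarts
# every time the signal changes; no loop and no add-on packages.
reset_cumsum <- function(signal, volume) {
  r <- rle(signal)                                # runs of equal consecutive signals
  run_id <- rep(seq_along(r$lengths), r$lengths)  # one id per run, repeated per row
  ave(signal * volume, run_id, FUN = cumsum)      # signed running total within each run
}

# Made-up sample in the same shape as z
d <- data.frame(
  Signal = c(NA, -1L, -1L, 1L, 1L, -1L),
  Volume = c(100L, 10L, 20L, 5L, 5L, 7L)
)
d$cum <- reset_cumsum(d$Signal, d$Volume)
d$cum
#> [1]  NA -10 -30   5  10  -7
```

On the question's data this would be called as z$cum <- reset_cumsum(z$Signal, z$Volume). Because Signal is constant within a run, cumsum(Signal * Volume) equals Signal * cumsum(Volume), the form used in the answers, so it should give the same cum column as the output above.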