dplyr group by, carry forward value from previous group to next


Problem description

OK, this is the overall view of what I'm trying to achieve with dplyr:

Using dplyr, I am making calculations to form new columns:

initial.capital
x.long.shares
x.end.value
x.net.profit
new.initial.capital

The code that does this:

# Calculate share prices for each ETF
library(dplyr)
library(data.table)  # only used for data.table::rleid()

# Initialize the start-capital column (named inital.capital, as in the sample data)
df$inital.capital <- 10000

output <- df %>%
  dplyr::mutate(RunID = data.table::rleid(x.long)) %>%   # one group per run of x.long
  group_by(RunID) %>%
  dplyr::mutate(x.long.shares = ifelse(x.long == 0, 0,
                                       ifelse(row_number() == n(),
                                              first(inital.capital) / first(close.x), 0))) %>%
  dplyr::mutate(x.end.value = ifelse(x.long == 0, 0,
                                     ifelse(row_number() == n(),
                                            last(x.long.shares) * last(close.x), 0))) %>%
  dplyr::mutate(x.net.profit = ifelse(x.long == 0, 0,
                                      ifelse(row_number() == n(),
                                             last(inital.capital) - last(x.end.value), 0))) %>%
  dplyr::mutate(new.initial.capital = ifelse(x.long == 0, 0,
                                             ifelse(row_number() == n(),
                                                    last(x.net.profit) + last(inital.capital), 0))) %>%
  ungroup() %>%
  select(-RunID)

I am grouping on runs of the x.long column. Within each group, I make calculations from different columns using the first/last positions within the group. My basic question is:
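The "one group per run of x.long" grouping comes from data.table::rleid(). As a rough base-R illustration of what such a run id looks like, here is a hypothetical run_id() helper (not part of any package); it assumes x contains no NA values, so it would not handle the sample's leading NA row.

```r
# Hypothetical helper: a new run id starts wherever x differs from the previous element.
# Assumes x contains no NA values.
run_id <- function(x) cumsum(c(TRUE, x[-1] != x[-length(x)]))

x.long <- c(0, 0, 1, 1, 1, 0, 1)
run_id(x.long)
# [1] 1 1 2 2 2 3 4
```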

In the photo, see the red highlight under the new.initial.capital column. How can I 'save' this value (10185.33) and insert it into the NEXT group, storing it in the initial.capital column (again highlighted in red), where it would replace 10,000 or be stored on the first row of the group?

What I really need to do is save the final value of the new.initial.capital column into a variable. That variable can then be used in the next group's calculations (see code below). When the new.initial.capital at the end of that group is updated, the new value goes back into the variable and carries over to the start of the following group, and all the values update again. The variable would be placed here:

output <- df %>%
  dplyr::mutate(RunID = data.table::rleid(x.long)) %>%
  group_by(RunID) %>%
  dplyr::mutate(x.long.shares = ifelse(x.long == 0, 0,
                                       ifelse(row_number() == n(),
                                              first(end_of_new.initial.capital_variable_from_previous_group) / first(close.x), 0))) %>%
  # ... remaining mutate() steps as above
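In a grouped dplyr pipeline, each mutate() step is evaluated for all groups before the next step starts, so a value produced at the end of one group cannot feed an earlier step of the next group within a single pipeline. One way around this is an explicit loop over the runs that keeps the capital in an ordinary variable. The sketch below is hypothetical: carry_capital() is a made-up name, the formulas are copied from the question's mutate() calls (including its net-profit sign convention), the column is spelled initial.capital, capital is assumed to pass unchanged through x.long == 0 runs, and the sample's leading NA row is dropped because the simple run-id line does not handle NA.

```r
# Hypothetical sketch: carry the capital from one run of x.long to the next.
# Formulas mirror the question's mutate() calls; the leading NA row is dropped.
close.x <- c(36.52, 38.32, 38.5504, 38.17, 38.85, 38.53, 39.13, 38.13,
             37.01, 36.14, 35.27, 35.13, 32.2, 33.03, 34.94, 34.57,
             33.6, 34.34, 35.86)
x.long <- c(0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0)
df <- data.frame(close.x, x.long)

carry_capital <- function(df, start = 10000) {
  n <- nrow(df)
  # Run id that increases whenever x.long changes (assumes no NAs)
  run <- cumsum(c(TRUE, df$x.long[-1] != df$x.long[-n]))
  df$initial.capital <- NA_real_
  df$x.long.shares <- 0
  df$x.end.value <- 0
  df$x.net.profit <- 0
  df$new.initial.capital <- 0
  capital <- start                          # the value carried between groups
  for (r in unique(run)) {
    rows <- which(run == r)
    df$initial.capital[rows] <- capital     # every row of the group sees the carried value
    if (df$x.long[rows[1]] == 1) {
      last_row <- rows[length(rows)]
      shares <- capital / df$close.x[rows[1]]
      end_value <- shares * df$close.x[last_row]
      net_profit <- capital - end_value     # same sign convention as the question
      new_capital <- net_profit + capital
      df$x.long.shares[last_row] <- shares
      df$x.end.value[last_row] <- end_value
      df$x.net.profit[last_row] <- net_profit
      df$new.initial.capital[last_row] <- new_capital
      capital <- new_capital                # hand the result to the next group
    }
  }
  df
}

res <- carry_capital(df)
res[c(8, 14, 18), c("close.x", "initial.capital", "new.initial.capital")]
```

With this data, row 14 (the first row of the second long run) starts from about 10185.33 instead of 10000, and its end-of-run result is computed from that carried value.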

I essentially want to carry over values between dplyr groups. Is this possible? Or can I store it in a variable each time?

Here's some example data (the data shown in the photo); save it to a .txt file:

df <- read.table("your_dir/df.txt", header = TRUE, sep = "", stringsAsFactors = FALSE)

    close.x x.long  y.short x.short y.long  inital.capital  x.long.shares   x.end.value x.net.profit    new.initial.capital
37.96   NA  NA  NA  NA  10000   NA  NA  NA  NA
36.52   0   0   0   0   10000   0   0   0   0
38.32   0   0   0   0   10000   0   0   0   0
38.5504 0   0   0   0   10000   0   0   0   0
38.17   0   0   0   0   10000   0   0   0   0
38.85   1   1   0   0   10000   0   0   0   0
38.53   1   1   0   0   10000   0   0   0   0
39.13   1   1   0   0   10000   0   0   0   0
38.13   1   1   0   0   10000   257.4002574 9814.671815 185.3281853 10185.32819
37.01   0   0   1   1   10000   0   0   0   0
36.14   0   0   1   1   10000   0   0   0   0
35.27   0   0   1   1   10000   0   0   0   0
35.13   0   0   1   1   10000   0   0   0   0
32.2    0   0   1   1   10000   0   0   0   0
33.03   1   1   0   0   10000   0   0   0   0
34.94   1   1   0   0   10000   0   0   0   0
34.57   1   1   0   0   10000   0   0   0   0
33.6    1   1   0   0   10000   0   0   0   0
34.34   1   1   0   0   10000   302.7550711 10396.60914 -396.6091432    9603.390857
35.86   0   0   1   1   10000   0   0   0   0
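If you'd rather not create a file, read.table() can also parse the data from a string through its text argument. A minimal sketch using the first nine rows and only the input columns (the remaining columns are computed by the code above):

```r
# Read sample rows from a string instead of a file (read.table's `text` argument).
txt <- "close.x x.long y.short x.short y.long inital.capital
37.96 NA NA NA NA 10000
36.52 0 0 0 0 10000
38.32 0 0 0 0 10000
38.5504 0 0 0 0 10000
38.17 0 0 0 0 10000
38.85 1 1 0 0 10000
38.53 1 1 0 0 10000
39.13 1 1 0 0 10000
38.13 1 1 0 0 10000"
df <- read.table(text = txt, header = TRUE, stringsAsFactors = FALSE)
str(df)
```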

What I have tried

I tried to make a variable:

initial.capital <- 10000

And inserted it in the code:

output <- df %>%
  dplyr::mutate(RunID = data.table::rleid(x.long)) %>%
  group_by(RunID) %>%
  dplyr::mutate(x.long.shares = ifelse(x.long == 0, 0,
                                       ifelse(row_number() == n(),
                                              initial.capital / first(close.x), 0))) %>%   # initial.capital variable, initialized to 10000
  dplyr::mutate(x.end.value = ifelse(x.long == 0, 0,
                                     ifelse(row_number() == n(),
                                            last(x.long.shares) * last(close.x), 0))) %>%
  dplyr::mutate(x.net.profit = ifelse(x.long == 0, 0,
                                      ifelse(row_number() == n(),
                                             last(initial.capital) - last(x.end.value), 0))) %>%
  dplyr::mutate(new.initial.capital = ifelse(x.long == 0, 0,
                                             ifelse(row_number() == n(),
                                                    last(x.net.profit) + last(inital.capital), 0))) %>%
  dplyr::mutate(new.initial.capitals = ifelse(x.long == 0, 0,
                                              ifelse(row_number() == n(),
                                                     initial.capital <- last(new.initial.capital), 0))) %>%   # try to update the variable with the final new.initial.capital
  ungroup() %>%
  select(-RunID)

If I could update the initial.capital variable each time, it would serve as the 'link' between groups. However, this idea does not work in the dplyr setup.

Any help is appreciated.

Recommended answer

You're using data.table in the question and have tagged the question data.table, so here is a data.table answer. When j is evaluated, it runs in a static scope where local variables retain their values from the previous group.
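For comparison, the same "state survives from one group to the next" idea can be sketched in plain R with a closure: a variable in the enclosing environment of a function persists between calls. This is an analogy, not how data.table implements it; make_step() is a hypothetical name, the groups are assumed to be processed in order with vapply() over split(), and the data match the dummy data used below.

```r
# Hypothetical base-R analogue: state kept in a closure survives between groups.
make_step <- function() {
  prev <- NA                          # previous group's value, like `prev` in j
  function(val) {
    ans <- val[length(val)] / prev    # last value divided by previous group's sum
    prev <<- sum(val)                 # updates make_step()'s environment, not the global one
    ans
  }
}

long <- c(0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1)
val  <- c(2, 2, 3, 5, 2, 5, 5, 4, 4, 1, 2, 1)
grp  <- cumsum(c(TRUE, long[-1] != long[-length(long)]))   # run ids 1..4

step <- make_step()
v2 <- unname(vapply(split(val, grp), step, numeric(1)))   # one value per group, in order
v2[grp]                                                     # broadcast back to rows
```

The per-group values are NA, 5/7, 4/12 and 1/13, and no global variable is modified.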

Using dummy data to demonstrate:

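library(data.table)
# These are the values set.seed(1); sample(5, 12, replace=TRUE) returned before
# R 3.6.0 changed sample(); written out so the output below is reproducible.
DT = data.table(long = rep(c(0,1,0,1), each=3),
                val = c(2,2,3,5,2,5,5,4,4,1,2,1))
DT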
    long val
 1:    0   2
 2:    0   2
 3:    0   3
 4:    1   5
 5:    1   2
 6:    1   5
 7:    0   5
 8:    0   4
 9:    0   4
10:    1   1
11:    1   2
12:    1   1

DT[, v1:=sum(val), by=rleid(long)][]
    long val v1
 1:    0   2  7
 2:    0   2  7
 3:    0   3  7
 4:    1   5 12
 5:    1   2 12
 6:    1   5 12
 7:    0   5 13
 8:    0   4 13
 9:    0   4 13
10:    1   1  4
11:    1   2  4
12:    1   1  4

So far, simple enough.

prev = NA  # initialize previous group value
DT[, v2:={ans<-last(val)/prev; prev<-sum(val); ans}, by=rleid(long)][]
    long val v1         v2
 1:    0   2  7         NA
 2:    0   2  7         NA
 3:    0   3  7         NA
 4:    1   5 12 0.71428571
 5:    1   2 12 0.71428571
 6:    1   5 12 0.71428571
 7:    0   5 13 0.33333333
 8:    0   4 13 0.33333333
 9:    0   4 13 0.33333333
10:    1   1  4 0.07692308
11:    1   2  4 0.07692308
12:    1   1  4 0.07692308

> 3/NA
[1] NA
> 5/7
[1] 0.7142857
> 4/12
[1] 0.3333333
> 1/13
[1] 0.07692308
> prev
[1] NA

Notice that the global prev value did not update, because prev and ans are local variables inside j's scope that were being updated as each group ran. Just to illustrate, the global prev can be updated from within each group using R's <<- operator:

DT[, v2:={ans<-last(val)/prev; prev<<-sum(val); ans}, by=rleid(long)]
prev
[1] 4
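The difference between <- and <<- here is ordinary R scoping rather than anything specific to data.table. A minimal base-R illustration (the function names are made up):

```r
prev <- NA
set_local  <- function(x) { prev <- x; invisible(NULL) }   # creates a local 'prev' only
set_global <- function(x) { prev <<- x; invisible(NULL) }  # assigns to 'prev' in the enclosing environment
set_local(7)
prev  # still NA
set_global(4)
prev  # now 4
```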

But there's no need to use <<- in data.table, since local variables in j are static (they retain their values from the previous group), unless you need to use the final group's value after the query has finished.
