How would I be able to remove opposite values (e.g. refunds) in panel data?
Question
Given the following data:
id  datee       price  quant  discrete_x
1   2018-12-19  4      -3000  A
1   2018-12-04  4       3000  A
1   2018-12-21  4       3000  B
1   2018-12-20  3       2000  A
...
Desired output:
id  datee       price  quant  discrete_x
1   2018-12-21  4       3000  B
1   2018-12-20  3       2000  A
...
In this case, it is quite clear that the quantity (quant) of 3000 is refunded, then bought again. I would like to remove the two rows that cancel each other out. Given that id and quant match, and that each refund happens once and after a purchase of the matching quant, how would I be able to remove all such pairs for each id value?
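For anyone who wants to reproduce this, the sample rows above can be entered as a data frame. This is only a sketch of the four rows shown; the name df and the column types (Date for datee, numeric for the rest) are assumptions, and the real data has more rows and ids.

```r
# Sample rows copied from the table in the question.
# The column types are an assumption (Date for datee, numeric otherwise).
df <- data.frame(
  id         = c(1, 1, 1, 1),
  datee      = as.Date(c("2018-12-19", "2018-12-04", "2018-12-21", "2018-12-20")),
  price      = c(4, 4, 4, 3),
  quant      = c(-3000, 3000, 3000, 2000),
  discrete_x = c("A", "A", "B", "A")
)

# Show the rows in date order, which is how the refund rule is described.
print(df[order(df$id, df$datee), ])
```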
I've been considering (but am stuck on) two ideas so far:
1) Within group_by groups arranged by date, check the later dates to see whether quant matches as an opposite value
2) A for loop within a for loop
I feel that a for loop within a for loop is better, but I'm not sure how I would match on discrete_x.
What would your approach be? Would you use a for loop within a for loop?
Recommended answer
This is a very ugly implementation, but I think it might work. We can create a filtering column after grouping by id and arranging by datee.
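library(dplyr)
library(tidyr)

df %>%
  group_by(id) %>%
  arrange(datee) %>%
  # TRUE when the next row's quant exactly cancels this row's quant
  mutate(f = lead(quant) + quant == 0,
         # also flag the row right after a flagged row (the other half of the pair)
         f = ifelse(f, f, lag(f)),
         # remaining NAs (from lead()/lag() at the group edges) count as FALSE
         f = tidyr::replace_na(f, FALSE)) %>%
  filter(!f) %>%
  select(-f)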
#> # A tibble: 2 x 6
#> # Groups: id [1]
#> id datee price quant discrete_x
#> <dbl> <date> <dbl> <dbl> <chr>
#> 1 1 2018-12-20 3 2000 A
#> 2 1 2018-12-21 4 3000 B
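One caveat: the lead()/lag() version only pairs rows that sit directly next to each other in date order within an id, and it does not check that the purchase comes before the refund. If other transactions can fall between a purchase and its refund, a loop over each group may be easier to reason about. The following is a minimal sketch of that idea, not part of the original answer. The helper flag_cancelled is a hypothetical name, and matching on discrete_x as well as id is an assumption taken from the question.

```r
library(dplyr)

# Hypothetical helper: takes the quant values of ONE group, already sorted
# by date, and returns TRUE for each refund and the purchase it cancels.
flag_cancelled <- function(quant) {
  drop <- rep(FALSE, length(quant))
  idx <- seq_along(quant)
  for (i in idx) {
    if (!is.na(quant[i]) && quant[i] < 0) {
      # earlier purchases of the same size that are not yet cancelled
      earlier <- which(!drop & quant == -quant[i] & idx < i)
      if (length(earlier) > 0) {
        # cancel the refund against the most recent matching purchase
        drop[c(max(earlier), i)] <- TRUE
      }
    }
  }
  drop
}

# Sample rows from the question (column types are an assumption).
df <- data.frame(
  id         = c(1, 1, 1, 1),
  datee      = as.Date(c("2018-12-19", "2018-12-04", "2018-12-21", "2018-12-20")),
  price      = c(4, 4, 4, 3),
  quant      = c(-3000, 3000, 3000, 2000),
  discrete_x = c("A", "A", "B", "A")
)

result <- df %>%
  arrange(id, datee) %>%          # sort first so each group is in date order
  group_by(id, discrete_x) %>%    # assumption: a refund must match on both
  mutate(cancelled = flag_cancelled(quant)) %>%
  ungroup() %>%
  filter(!cancelled) %>%
  select(-cancelled)

print(result)
```

On the four sample rows this keeps the same two rows as above (2000 on 2018-12-20 and 3000 on 2018-12-21). It uses a single loop per group rather than a loop within a loop, since which() does the backward search. Drop discrete_x from group_by() if refunds should match on id alone.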