%within% in case_when() under group_modify() not working

Problem description

I have the following kind of data:

library(tidyverse)
library(lubridate)


data <- tibble(a = c(1, 1, 2, 3, 3, 3, 3),
               b = c('x', 'y', 'z', 'z', 'z', 'z', 'z'),
               c = c('ps', 'ps', 'qs', 'rs', 'rs', 'rs', 'rs'),
               d = c(100, 200, 300, 400, 500, 600, 700),
               strt = ymd(c('2019-03-20', '2020-01-01', '2018-01-02', '2020-05-01', '2016-01-01', '2020-03-01', '2020-01-01')),
               fnsh = ymd(c('3019-03-20', '3020-01-01', '3018-01-02', '2020-06-01', '2016-05-01', '2020-04-01', '2020-06-10')))

I am doing a group-wise operation based on the variables a, b and c (i.e. data %>% group_by(a, b, c)) using group_modify(). For each group, I need to find the rows with genuine starting dates within the last year. A strt is genuine if it doesn't fall between the strt and fnsh of any other row in the group. My current approach is:

test <- data %>%
  group_by(a, b, c) %>%
  group_modify(function(.x, .y) {
               .x %>%
               mutate(startLatestYear = case_when(strt > today(tzone = 'CET') - years(1) &
                                                  strt <= today(tzone = 'CET') &
                                                  !strt %within% (.x %>%
                                                                  mutate(pushInterval = interval(strt + days(1), fnsh)) %>%
                                                                  select(pushInterval)) ~ 1,
                                                  TRUE ~ 0))}) %>%
  ungroup()

This approach gives:

data <- tibble(a = c(1, 1, 2, 3, 3, 3, 3),
               b = c('x', 'y', 'z', 'z', 'z', 'z', 'z'),
               c = c('ps', 'ps', 'qs', 'rs', 'rs', 'rs', 'rs'),
               d = c(100, 200, 300, 400, 500, 600, 700),
               strt = ymd(c('2019-03-20', '2020-01-01', '2018-01-02', '2020-05-01', '2016-01-01', '2020-03-01', '2020-01-01')),
               fnsh = ymd(c('3019-03-20', '3020-01-01', '3018-01-02', '2020-06-01', '2016-05-01', '2020-04-01', '2020-06-10')),
               startLatestYear = c(0, 1, 0, 1, 0, 1, 1))

The desired result is:

data <- tibble(a = c(1, 1, 2, 3, 3, 3, 3),
               b = c('x', 'y', 'z', 'z', 'z', 'z', 'z'),
               c = c('ps', 'ps', 'qs', 'rs', 'rs', 'rs', 'rs'),
               d = c(100, 200, 300, 400, 500, 600, 700),
               strt = ymd(c('2019-03-20', '2020-01-01', '2018-01-02', '2020-05-01', '2016-01-01', '2020-03-01', '2020-01-01')),
               fnsh = ymd(c('3019-03-20', '3020-01-01', '3018-01-02', '2020-06-01', '2016-05-01', '2020-04-01', '2020-06-10')),
               startLatestYear = c(0, 1, 0, 0, 0, 0, 1))

The group defined by a == 3, b == 'z' and c == 'rs' has one row (the very last row) that should be the only row in the group with 1 in startLatestYear. It is the only row in the group whose strt falls within the latest year and outside the intervals of the other rows in the group.
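To make the rule concrete, here is a minimal base R sketch of the "genuine start" check for this one group, leaving out the last-year condition. The helper name is_genuine_start() is hypothetical (it is not from the question or any package), and it assumes both interval endpoints are inclusive while comparing each strt only against the intervals of the other rows.

```r
# Hypothetical helper: TRUE where strt[i] lies outside the [strt, fnsh]
# interval of every *other* row in the group (endpoints inclusive).
is_genuine_start <- function(strt, fnsh) {
  vapply(seq_along(strt), function(i) {
    others <- seq_along(strt) != i
    !any(strt[others] <= strt[i] & strt[i] <= fnsh[others])
  }, logical(1))
}

# The a == 3, b == 'z', c == 'rs' group from the data above
strt <- as.Date(c("2020-05-01", "2016-01-01", "2020-03-01", "2020-01-01"))
fnsh <- as.Date(c("2020-06-01", "2016-05-01", "2020-04-01", "2020-06-10"))

genuine <- is_genuine_start(strt, fnsh)
genuine
#> [1] FALSE  TRUE FALSE  TRUE
```

The 2016-01-01 row is also a genuine start, but it fails the last-year condition, which is why only the very last row should end up with 1.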

The first two conditions in the present use of case_when() seem to work. The third condition, using %within%, does not. How can I make the %within% condition work? Or how else could this be implemented?

PS: I have tried making pushInterval before grouping the tibble. Doing so produces the same column for startLatestYear, but the operation leads to the 'problem' of bind_rows_() stripping away the interval attributes. Hence the current solution that produces pushInterval on the fly.

Recommended answer

I don't think you need group_modify(); this works with a simple grouped mutate():

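data %>%
  group_by(a, b, c) %>%
  # x = 1 if strt is covered only by its own row's [strt, fnsh] interval
  # and lies within the last 365 days, otherwise 0
  mutate(x = +(purrr::map_lgl(strt, ~ sum(strt <= .x & .x <= fnsh) < 2) &
                 difftime(Sys.time(), strt, units = "days") < 365)) %>%
  ungroup()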
# # A tibble: 7 x 7
#       a b     c         d strt       fnsh           x
#   <dbl> <chr> <chr> <dbl> <date>     <date>     <int>
# 1     1 x     ps      100 2019-03-20 3019-03-20     0
# 2     1 y     ps      200 2020-01-01 3020-01-01     1
# 3     2 z     qs      300 2018-01-02 3018-01-02     0
# 4     3 z     rs      400 2020-05-01 2020-06-01     0
# 5     3 z     rs      500 2016-01-01 2016-05-01     0
# 6     3 z     rs      600 2020-03-01 2020-04-01     0
# 7     3 z     rs      700 2020-01-01 2020-06-10     1
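Because the answer measures "the last year" from Sys.time(), its output depends on when it is run: the table above corresponds to a run in 2020, and running it today marks every row 0 because every strt is now more than a year old. The sketch below is an assumption, not part of the original answer: it pins "today" to a hypothetical fixed date ref_date and writes the two date conditions the way the question's case_when() did, with 365 days standing in for years(1), so the expected result can be reproduced.

```r
library(dplyr)
library(purrr)

data <- tibble(a = c(1, 1, 2, 3, 3, 3, 3),
               b = c('x', 'y', 'z', 'z', 'z', 'z', 'z'),
               c = c('ps', 'ps', 'qs', 'rs', 'rs', 'rs', 'rs'),
               d = c(100, 200, 300, 400, 500, 600, 700),
               strt = as.Date(c('2019-03-20', '2020-01-01', '2018-01-02', '2020-05-01', '2016-01-01', '2020-03-01', '2020-01-01')),
               fnsh = as.Date(c('3019-03-20', '3020-01-01', '3018-01-02', '2020-06-01', '2016-05-01', '2020-04-01', '2020-06-10')))

ref_date <- as.Date("2020-06-15")  # hypothetical fixed "today"

result <- data %>%
  group_by(a, b, c) %>%
  mutate(x = +(map_lgl(strt, ~ sum(strt <= .x & .x <= fnsh) < 2) &  # genuine start
               strt > ref_date - 365 & strt <= ref_date)) %>%        # within last year
  ungroup()

result$x
#> [1] 0 1 0 0 0 0 1
```

Replacing ref_date with Sys.Date() restores the answer's behaviour of measuring from the current date.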

.x is the placeholder for each element of the vector passed as the first argument to map_lgl(). In this case that vector is also strt, but set that aside for a moment.

Inside the tilde function, strt refers to the whole vector for the current group, while .x refers to a single strt value (always length 1). On the first iteration, strt <= .x is effectively strt <= strt[1]. The sum counts how many rows' [strt, fnsh] intervals contain .x. That count is always at least one, because every strt lies within its own row's interval, so < 2 keeps only the rows whose strt is not covered by any other row's interval.
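The counting step can be checked outside of dplyr. This base R sketch uses a hypothetical helper, count_covering(), that computes the same sum(strt <= .x & .x <= fnsh) for each strt in the a == 3 group; a count of 1 means only the row's own interval contains its strt.

```r
# Hypothetical helper: for each strt[i], count how many rows' [strt, fnsh]
# intervals (including row i's own) contain it, as the map_lgl() lambda does.
count_covering <- function(strt, fnsh) {
  vapply(seq_along(strt), function(i) {
    sum(strt <= strt[i] & strt[i] <= fnsh)
  }, integer(1))
}

strt <- as.Date(c("2020-05-01", "2016-01-01", "2020-03-01", "2020-01-01"))
fnsh <- as.Date(c("2020-06-01", "2016-05-01", "2020-04-01", "2020-06-10"))

counts <- count_covering(strt, fnsh)
counts
#> [1] 2 1 2 1
counts < 2
#> [1] FALSE  TRUE FALSE  TRUE
```

The 2020-05-01 and 2020-03-01 starts each get a count of 2 because both also fall inside the last row's 2020-01-01 to 2020-06-10 interval.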
