R using any() on multiple conditions within row


Problem description

I have a data frame like the one below:


library(dplyr);library(anytime)
set.seed(2450)
a <- c('V1','V1','V1','V1','V1','V1','V2','V2','V2','V3','V3','V3','V3','V4','V4','V4')
b <- c('Farm','Farm','Meat','Fish','Farm','Tag','Farm','Farm','Reg','Meat','Farm', 'Farm','Tag','Meat','Lifestyle','Reg')

c <-  sample(seq(anydate('2017-01-01'), anydate('2020-01-01'), by="day"), 16)
df <- data.frame(a,b,c) %>% group_by(a) %>% arrange(a, c) %>% mutate(Rank = row_number()) 

I am trying to identify any rows that meet various criteria, which sometimes involve the group the rows belong to. I generally use case_when() for this. For example, if I want to identify a 'Farm' row in a group that also contains a 'Meat' row, I'd do:

df1 <- df %>% mutate(ID_col = case_when(b == 'Farm' & any(b == 'Meat') ~ TRUE))
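To make the group-level behaviour of any() concrete, here is a minimal sketch on a small hand-made data frame. The names toy and toy_flagged and the data itself are illustrative assumptions, not the sampled data above. It uses a plain logical expression instead of case_when(), so rows that do not match come out as FALSE rather than NA.

```r
library(dplyr)

# Hypothetical toy data: V1 contains a 'Meat' row, V2 does not
toy <- data.frame(
  a = c("V1", "V1", "V1", "V2", "V2"),
  b = c("Farm", "Meat", "Tag", "Farm", "Reg")
)

toy_flagged <- toy %>%
  group_by(a) %>%
  # any(b == "Meat") is evaluated once per group and recycled to every row
  mutate(ID_col = b == "Farm" & any(b == "Meat")) %>%
  ungroup()

toy_flagged$ID_col
# TRUE FALSE FALSE FALSE FALSE
```

Only the 'Farm' row in V1 is flagged, because V1 is the only group that contains 'Meat'.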

But in one case I am trying to identify whether any row with an earlier date than the current row has b == 'Meat'. So I added a Rank column, hoping to write an any() condition that checks for a row that is ranked ahead of the row of interest and also has b == 'Meat'.

In cases where I don't care about row position, I've previously done this:

library(stringr)
library(tidyr)  # pivot_wider(), unite()
# pivot wider, unite, str_extract_all to get a list of words, then detect in that list using case_when
wide <- df %>% 
        pivot_wider(id_cols = a, names_from = c, values_from = b) %>%
        unite(d, contains("-"), sep = ",", na.rm = TRUE) %>% 
        mutate(Extract = str_extract_all(d, "[A-Za-z]+")) %>% 
        full_join(df) %>% 
        mutate(SY_Del = case_when(b == 'Farm' &
                                  str_detect(Extract, 'Meat') ~ TRUE,
                              TRUE ~ FALSE))

I'd thought about using an additional mutate() in which I extract everything in the list that falls below the row's rank, i.e. something like mutate(List_of_Interest = Extract[1][3:5]), but using the rank itself to build the subset.

But I keep getting error messages, which I think is a symptom of not extracting from the list correctly.

It would be great to get some ideas on how to achieve this, as I think I'm overthinking it and I'm sure there's an easier way.

In reality I just need to check whether 'Meat' is %in% the vector of column b values below the row of interest in rank (that is, earlier by date) within that group.

The expected output flags any row for which b == 'Meat' occurs in an earlier (by date) row within the same group.

     a         b          c Rank  Flag
1  V1      Farm 2017-01-08    1 FALSE
2  V1       Tag 2017-07-28    2 FALSE
3  V1      Fish 2017-11-13    3 FALSE
4  V1      Farm 2017-11-15    4 FALSE
5  V1      Meat 2018-03-27    5 FALSE
6  V1      Farm 2018-09-19    6  TRUE
7  V2      Farm 2017-07-20    1 FALSE
8  V2      Farm 2017-08-01    2 FALSE
9  V2       Reg 2018-09-27    3 FALSE
10 V3      Meat 2018-07-28    1 FALSE
11 V3      Farm 2018-09-28    2  TRUE
12 V3      Farm 2018-11-04    3  TRUE
13 V3       Tag 2018-12-16    4  TRUE
14 V4       Reg 2017-01-19    1 FALSE
15 V4 Lifestyle 2017-05-13    2 FALSE
16 V4      Meat 2017-12-31    3 FALSE

Recommended answer

With your data frame ordered by group and date, you can use tidyr::fill() to keep track of where b == 'Meat' has occurred in each group:

library(tidyr)

data.frame(a, b, c) %>% 
  group_by(a) %>% 
  arrange(a, c) %>% 
  mutate(has_meat = if_else(b == "Meat", TRUE, NA)) %>%
  fill(has_meat, .direction = "down") %>%
  mutate(has_meat = if_else(b == "Meat", NA, has_meat)) %>%
  rename(meat_occurs_earlier = has_meat)

# A tibble: 16 x 4
# Groups:   a [4]
   a     b         c          meat_occurs_earlier
   <fct> <fct>     <date>     <lgl>              
 1 V1    Farm      2017-06-17 NA                 
 2 V1    Fish      2018-02-25 NA                 
 3 V1    Farm      2018-04-19 NA                 
 4 V1    Meat      2018-05-16 NA                 
 5 V1    Farm      2019-04-20 TRUE               
 6 V1    Tag       2019-08-10 TRUE               
 7 V2    Reg       2017-03-14 NA                 
 8 V2    Farm      2017-12-22 NA                 
 9 V2    Farm      2018-03-31 NA                 
10 V3    Meat      2017-01-15 NA                 
11 V3    Farm      2017-03-03 TRUE               
12 V3    Farm      2018-01-25 TRUE               
13 V3    Tag       2019-11-25 TRUE               
14 V4    Lifestyle 2017-03-18 NA                 
15 V4    Meat      2018-01-16 NA                 
16 V4    Reg       2018-10-27 TRUE 

Steps:

  1. Create a simple has_meat column: TRUE if b == 'Meat', NA otherwise.

  2. Since the data frame is ordered by group and date, a downward fill() makes all the subsequent has_meat entries within each group TRUE as well.

  3. Your problem statement says we should only flag rows for which b == 'Meat' occurs before the row of interest, which means that the rows where b == 'Meat' themselves should not be flagged. So we change has_meat on those rows back to NA.

  4. Rename has_meat, which isn't really an accurate column name anymore, to meat_occurs_earlier.
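The steps above leave NA where a row is not flagged. If a plain TRUE/FALSE column is wanted instead, one possible variant (a sketch, not part of the answer's method) uses dplyr's cumany() together with lag(). The toy data and the names toy and toy_flagged are illustrative assumptions. Note one behavioural difference: a second 'Meat' row that follows an earlier 'Meat' row in the same group is flagged TRUE here, whereas step 3 above sets every 'Meat' row to NA.

```r
library(dplyr)

# Hypothetical toy data, already ordered by group and date
toy <- data.frame(
  a = c("V1", "V1", "V1", "V1", "V2", "V2", "V3", "V3"),
  b = c("Farm", "Meat", "Farm", "Tag", "Farm", "Reg", "Meat", "Meat"),
  c = as.Date(c("2017-01-08", "2018-03-27", "2018-09-19", "2019-01-01",
                "2017-07-20", "2017-08-01", "2018-07-28", "2018-09-28"))
)

toy_flagged <- toy %>%
  arrange(a, c) %>%
  group_by(a) %>%
  # cumany() is TRUE from the first 'Meat' row onwards within the group;
  # lag() shifts it down one row so that only strictly earlier rows count
  mutate(Flag = lag(cumany(b == "Meat"), default = FALSE)) %>%
  ungroup()

toy_flagged$Flag
# FALSE FALSE TRUE TRUE FALSE FALSE FALSE TRUE
```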

Note: Without example output, it's a little hard to be sure this answers your question exactly. The steps can be easily tweaked if, for example, you need to fill up instead of down.
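As a sketch of that tweak, the same pipeline with .direction = "up" marks rows that have a 'Meat' row later in their group. The toy data and the column name meat_occurs_later are assumptions made for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, already ordered by group and date
toy <- data.frame(
  a = c("V1", "V1", "V1", "V2", "V2"),
  b = c("Farm", "Meat", "Tag", "Farm", "Reg"),
  c = as.Date(c("2017-01-08", "2018-03-27", "2018-09-19",
                "2017-07-20", "2017-08-01"))
)

toy_later <- toy %>%
  arrange(a, c) %>%
  group_by(a) %>%
  mutate(has_meat = if_else(b == "Meat", TRUE, NA)) %>%
  # fill upwards so rows *before* a 'Meat' row pick up TRUE
  fill(has_meat, .direction = "up") %>%
  # the 'Meat' rows themselves are not flagged
  mutate(has_meat = if_else(b == "Meat", NA, has_meat)) %>%
  rename(meat_occurs_later = has_meat) %>%
  ungroup()

toy_later$meat_occurs_later
# TRUE NA NA NA NA
```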
