R using any() on multiple conditions within row
Question
I have a data frame like the one below:
library(dplyr);library(anytime)
set.seed(2450)
a <- c('V1','V1','V1','V1','V1','V1','V2','V2','V2','V3','V3','V3','V3','V4','V4','V4')
b <- c('Farm','Farm','Meat','Fish','Farm','Tag','Farm','Farm','Reg','Meat','Farm', 'Farm','Tag','Meat','Lifestyle','Reg')
c <- sample(seq(anydate('2017-01-01'), anydate('2020-01-01'), by="day"), 16)
df <- data.frame(a,b,c) %>% group_by(a) %>% arrange(a, c) %>% mutate(Rank = row_number())
I am trying to flag rows that meet various criteria, which sometimes depend on the group the row belongs to. I generally use case_when() for this. For example, to identify a Farm row where any other row in the same group is 'Meat', I would do:
df1 <- df %>% mutate(ID_col = case_when(b == 'Farm' & any(b == 'Meat') ~ TRUE))
For one case, though, I need to know whether any row with an earlier date than the current row has b == 'Meat'. I added a Rank column hoping to write an any() query that finds a row ranked ahead of the row of interest that also has b == 'Meat'.
In cases where I don't care about row position, I have previously done this:
library(stringr)
library(tidyr)  # provides pivot_wider() and unite()
# Pivot wider, unite, str_extract_all to get a list of words, then detect in that list using case_when
wide <- df %>%
  pivot_wider(id_cols = a, names_from = c, values_from = b) %>%
  unite(d, contains("-"), sep = ",", na.rm = TRUE) %>%
  mutate(Extract = str_extract_all(d, "[A-Za-z]+")) %>%
  full_join(df) %>%
  mutate(SY_Del = case_when(b == 'Farm' &
                              str_detect(Extract, 'Meat') ~ TRUE,
                            TRUE ~ FALSE))
I thought about adding another mutate that extracts everything in the list below the current rank, e.g. mutate(List_of_Interest = Extract[1][3:5]), but using Rank to define the subset instead of fixed positions.
But I keep getting error messages, which I think is a symptom of not extracting from the list correctly.
It would be great to get some ideas on how to achieve this. I think I'm overthinking it, and I'm sure there is an easier way.
In reality, I just need to check whether 'Meat' is %in% the vector of column b values ranked below the row of interest within that group.
The expected output flags every row for which some earlier row (by date) in the same group has b == 'Meat':
a b c Rank Flag
1 V1 Farm 2017-01-08 1 FALSE
2 V1 Tag 2017-07-28 2 FALSE
3 V1 Fish 2017-11-13 3 FALSE
4 V1 Farm 2017-11-15 4 FALSE
5 V1 Meat 2018-03-27 5 FALSE
6 V1 Farm 2018-09-19 6 TRUE
7 V2 Farm 2017-07-20 1 FALSE
8 V2 Farm 2017-08-01 2 FALSE
9 V2 Reg 2018-09-27 3 FALSE
10 V3 Meat 2018-07-28 1 FALSE
11 V3 Farm 2018-09-28 2 TRUE
12 V3 Farm 2018-11-04 3 TRUE
13 V3 Tag 2018-12-16 4 TRUE
14 V4 Reg 2017-01-19 1 FALSE
15 V4 Lifestyle 2017-05-13 2 FALSE
16 V4 Meat 2017-12-31 3 FALSE
Recommended answer
With your data frame ordered by group and date, you can use tidyr::fill() to keep track of where b == 'Meat' in each group:
library(tidyr)
data.frame(a, b, c) %>%
group_by(a) %>%
arrange(a, c) %>%
mutate(has_meat = if_else(b == "Meat", TRUE, NA)) %>%
fill(has_meat, .direction = "down") %>%
mutate(has_meat = if_else(b == "Meat", NA, has_meat)) %>%
rename(meat_occurs_earlier = has_meat)
# A tibble: 16 x 4
# Groups: a [4]
a b c meat_occurs_earlier
<fct> <fct> <date> <lgl>
1 V1 Farm 2017-06-17 NA
2 V1 Fish 2018-02-25 NA
3 V1 Farm 2018-04-19 NA
4 V1 Meat 2018-05-16 NA
5 V1 Farm 2019-04-20 TRUE
6 V1 Tag 2019-08-10 TRUE
7 V2 Reg 2017-03-14 NA
8 V2 Farm 2017-12-22 NA
9 V2 Farm 2018-03-31 NA
10 V3 Meat 2017-01-15 NA
11 V3 Farm 2017-03-03 TRUE
12 V3 Farm 2018-01-25 TRUE
13 V3 Tag 2019-11-25 TRUE
14 V4 Lifestyle 2017-03-18 NA
15 V4 Meat 2018-01-16 NA
16 V4 Reg 2018-10-27 TRUE
Steps:
- Create a simple has_meat column: TRUE if b == 'Meat', NA otherwise.
- Since the data frame is ordered by group and date, you can use a downward fill() to make all the subsequent has_meat entries within each group TRUE as well.
- Your problem statement says we should only flag rows where b == 'Meat' occurs before the row of interest, which means that the rows where b == 'Meat' should not be flagged themselves. So we change has_meat on those rows from TRUE to NA.
- Rename has_meat, which isn't really an accurate column name anymore, to meat_occurs_earlier.
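The steps above can be traced on a small example. This is only a sketch: the toy data frame and the intermediate column names step1 and step2 are made up for illustration and are not part of the original answer, which overwrites has_meat in place.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two groups, one of which contains a 'Meat' row
toy <- data.frame(
  a = c("G1", "G1", "G1", "G1", "G2", "G2"),
  b = c("Farm", "Meat", "Farm", "Tag", "Farm", "Reg"),
  c = as.Date(c("2018-01-01", "2018-02-01", "2018-03-01", "2018-04-01",
                "2018-01-15", "2018-02-15"))
)

steps <- toy %>%
  group_by(a) %>%
  arrange(a, c) %>%
  # Step 1: TRUE on 'Meat' rows, NA everywhere else
  mutate(step1 = if_else(b == "Meat", TRUE, NA)) %>%
  # Step 2: copy the column, then carry TRUE down to later rows in the group
  mutate(step2 = step1) %>%
  fill(step2, .direction = "down") %>%
  # Step 3: un-flag the 'Meat' rows themselves
  mutate(meat_occurs_earlier = if_else(b == "Meat", NA, step2)) %>%
  ungroup()

print(steps)
```

Keeping each step in its own column makes it easy to see that fill() works within each group: the G2 rows stay NA throughout because that group has no 'Meat' row.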
Note: Without example output, it is a little hard to be sure this answers your question exactly. The steps are easy to tweak if, for example, you need to fill up instead of down.
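As a rough sketch of such tweaks, the example below fills in both directions and uses dplyr::coalesce() to turn the remaining NA values into FALSE, which is closer to the TRUE/FALSE Flag column shown in the question. The toy data and the helper columns is_meat, earlier, and later are assumptions made for illustration, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, one group with a 'Meat' row and one without
toy <- data.frame(
  a = c("G1", "G1", "G1", "G2", "G2"),
  b = c("Farm", "Meat", "Farm", "Farm", "Reg"),
  c = as.Date(c("2018-01-01", "2018-02-01", "2018-03-01",
                "2018-01-15", "2018-02-15"))
)

tweaked <- toy %>%
  group_by(a) %>%
  arrange(a, c) %>%
  mutate(is_meat = if_else(b == "Meat", TRUE, NA),
         earlier = is_meat,
         later   = is_meat) %>%
  fill(earlier, .direction = "down") %>%  # 'Meat' on this row or an earlier one
  fill(later, .direction = "up") %>%      # 'Meat' on this row or a later one
  mutate(
    # Exclude the 'Meat' rows themselves and replace NA with FALSE
    meat_occurs_earlier = coalesce(earlier, FALSE) & b != "Meat",
    meat_occurs_later   = coalesce(later, FALSE) & b != "Meat"
  ) %>%
  select(-is_meat, -earlier, -later) %>%
  ungroup()

print(tweaked)
```

Like the original steps, this never flags a 'Meat' row itself, even when another 'Meat' row precedes it in the same group; whether that is the desired behavior depends on the requirements.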