Using dplyr summarise with conditions

Problem description

I am currently trying to apply the summarise function to isolate the relevant observations from a large data set. A simple reproducible example is given here:

df <- data.frame(ID     = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 Status = c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE),
                 Price  = c(0, 5, 0, 0, 0, 0, 7, 0, 7))

  ID Status Price
1  1   TRUE     0
2  1  FALSE     5
3  1   TRUE     0
4  2   TRUE     0
5  2   TRUE     0
6  2   TRUE     0
7  3  FALSE     7
8  3   TRUE     0
9  3  FALSE     7

I would like to summarise the table by ID so that Status is TRUE only if all three observations for that ID are TRUE (this part I have figured out), and then get the price corresponding to that status (i.e. 5 for ID 1, which is FALSE; 0 for ID 2, which is TRUE; and 7 for ID 3, which is FALSE).

From the question "Summarize with conditions in dplyr" I have figured out that I can, as usual, specify the conditions in square brackets. My code so far looks like this:

library(dplyr)
result <- df %>%
  group_by(ID) %>%
  summarize(Status = all(Status),
            Test = ifelse(all(Status) == TRUE,
                          first(Price[Status == TRUE]),
                          first(Price[Status == FALSE])))

# This is what I get: 
# A tibble: 3 x 3
     ID Status  Test
  <dbl> <lgl>  <dbl>
1    1. FALSE     0.
2    2. TRUE      0.
3    3. FALSE     7.

But as you can see, for ID = 1 it gives an incorrect price (0 instead of 5). I have been trying this for a long time, so I would appreciate any hint as to where I am going wrong.

Recommended answer

The problem is the order of the arguments: summarise evaluates them in sequence, so once Status = all(Status) has been created, every later reference to Status sees that single summarised value rather than the original column. We could move all(Status) to be the second argument in summarise (or give the new column a different name). The test can also be written with if/else instead of ifelse, because all(Status) returns a single TRUE/FALSE for each group:

df %>%
   group_by(ID) %>% 
   summarise( Test = if(all(Status)) first(Price[Status]) else 
                   first(Price[!Status]), Status = all(Status))
# A tibble: 3 x 3
#     ID  Test Status
#   <dbl> <dbl> <lgl> 
#1     1     5 FALSE 
#2     2     0 TRUE  
#3     3     7 FALSE 
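
The alternative mentioned above, changing the column name, could be sketched roughly as follows. The name AllTrue is an assumption chosen for illustration; any name that does not overwrite the original Status column should work the same way, since later expressions in summarise then still see the per-row Status values.

```r
library(dplyr)

df <- data.frame(ID     = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
                 Status = c(TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE, FALSE),
                 Price  = c(0, 5, 0, 0, 0, 0, 7, 0, 7))

# AllTrue is a hypothetical column name; because it does not replace Status,
# Price[Status] and Price[!Status] still index with the original rows.
result <- df %>%
  group_by(ID) %>%
  summarise(AllTrue = all(Status),
            Test = if (AllTrue) first(Price[Status]) else first(Price[!Status]))

print(result)
```

With this ordering the summary flag can come first, which may read more naturally than placing it after Test.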

NOTE: It is better not to use ifelse when its arguments have unequal lengths, because the result takes its length from the test argument.
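
A small base R sketch of why this matters: with a length-one test, ifelse keeps only the first element of a longer yes or no argument and gives no warning, whereas if/else returns the selected branch in full. The vector prices is made up for illustration.

```r
prices <- c(5, 7, 9)

# test has length 1, so only the first element of prices is returned
from_ifelse <- ifelse(TRUE, prices, 0)

# if/else returns whichever branch is selected, at its full length
from_if <- if (TRUE) prices else 0

print(from_ifelse)
print(from_if)
```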
