How to filter (with dplyr) for all values of a group if a variable limit is reached?
Problem description
Here is the dummy data:
library(dplyr)

cases <- rep(1:5, times = 2)
var1 <- as.numeric(c(450,100,250,999,200,500,980,10,700,1000))
var2 <- as.numeric(c(111,222,333,444,424,634,915,12,105,152))
maindata1 <- data.frame(cases, var1, var2)
df1 <- maindata1 %>%
  filter(var1 > 950) %>%
  distinct(cases) %>%
  select(cases)
table1 <- maindata1 %>%
  filter(cases == 2 | cases == 4 | cases == 5) %>%
  arrange(cases)
> table1
cases var1 var2
1 2 100 222
2 2 980 915
3 4 999 444
4 4 700 105
5 5 200 424
6 5 1000 152
I'm trying to build a data frame that contains all the data for the cases where var1 > 950. It should show every value of var1 for those cases (including the values below 950) and all values of var2, and it should drop every case where var1 never exceeds 950. table1 produces the desired data frame, but I had to type the filtering conditions in manually. Is there a way to use df1$cases as the filtering condition to get the same data frame?
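For the literal question, one minimal sketch (assuming dplyr is installed) is to pass the vector df1$cases to %in% inside filter(). This keeps every row whose cases value appears in df1, so the case numbers no longer have to be typed by hand. The name table2 is only an illustrative choice.

```r
library(dplyr)

# Same dummy data as in the question
cases <- rep(1:5, times = 2)
var1 <- c(450, 100, 250, 999, 200, 500, 980, 10, 700, 1000)
var2 <- c(111, 222, 333, 444, 424, 634, 915, 12, 105, 152)
maindata1 <- data.frame(cases, var1, var2)

# Cases that have at least one var1 value above 950
df1 <- maindata1 %>%
  filter(var1 > 950) %>%
  distinct(cases)

# Keep all rows of those cases, using df1$cases as the filter condition
# (table2 is a hypothetical name used only for this illustration)
table2 <- maindata1 %>%
  filter(cases %in% df1$cases) %>%
  arrange(cases)

print(table2)
```

This relies on df1 being computed first; the grouped approach in the answer below avoids the intermediate data frame entirely.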
I'm new to R and am learning data manipulation mainly with dplyr, because its syntax is almost understandable for a layman. So a dplyr-based solution would be fantastic, but of course I'm happy to hear solutions based on other packages as well.
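Since other approaches are welcome, here is a sketch of the same filter in base R with no packages at all. It uses ave() to attach the per-case maximum of var1 to every row, then keeps the rows whose case maximum exceeds 950. The names case_max and table3 are hypothetical, chosen only for this example.

```r
# Same dummy data as in the question
cases <- rep(1:5, times = 2)
var1 <- c(450, 100, 250, 999, 200, 500, 980, 10, 700, 1000)
var2 <- c(111, 222, 333, 444, 424, 634, 915, 12, 105, 152)
maindata1 <- data.frame(cases, var1, var2)

# ave() returns, for every row, the maximum var1 within that row's case
case_max <- ave(maindata1$var1, maindata1$cases, FUN = max)

# Keep rows whose case reaches var1 > 950, then sort by case
# (order() is stable, so rows within a case keep their original order)
table3 <- maindata1[case_max > 950, ]
table3 <- table3[order(table3$cases), ]
rownames(table3) <- NULL

print(table3)
```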
Recommended answer
Filter on max(var1) within each group defined by cases:
maindata1 %>%
  group_by(cases) %>%
  filter(max(var1) > 950) %>%
  arrange(cases)
# cases var1 var2
# 1 2 100 222
# 2 2 980 915
# 3 4 999 444
# 4 4 700 105
# 5 5 200 424
# 6 5 1000 152
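The same group-wise idea can also be written with any(), which reads as "keep the case if any of its var1 values exceeds 950"; for a strict threshold like this it selects the same cases as max(var1) > 950. The sketch below (assuming dplyr is installed; table4 is a hypothetical name) also adds ungroup(), since the result of group_by() stays grouped and later steps would otherwise be computed per case.

```r
library(dplyr)

# Same dummy data as in the question
cases <- rep(1:5, times = 2)
var1 <- c(450, 100, 250, 999, 200, 500, 980, 10, 700, 1000)
var2 <- c(111, 222, 333, 444, 424, 634, 915, 12, 105, 152)
maindata1 <- data.frame(cases, var1, var2)

table4 <- maindata1 %>%
  group_by(cases) %>%
  filter(any(var1 > 950)) %>%  # keep whole case if any var1 exceeds 950
  ungroup() %>%                # drop grouping for later steps
  arrange(cases)

print(table4)
```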