Filter group of rows based on sum of values from different column
Question
I'm trying to filter out whole rows in R, but only if the frequencies for a particular set don't add up to more than 5.
The data I have looks a bit like this. It's a dataframe that I'm currently calling "Words":
HEADWORD  VARIANT  FREQUENCY
SWORD     sword    2
SWORD     swerd    1
SWORD     sworde   1
KNIGHT    knight   6
KNIGHT    kniht    2
KNIGHT    knyt     1
I only want rows for which the frequencies within a particular headword add up to more than 5. So here, I want to keep all the instances of KNIGHT but I want to get rid of all the SWORD rows entirely.
I tried to do this with dplyr, but without success. This is the code I tried:
Words1 %>% group_by(HW) %>% filter(Fr > 5)
Recommended answer
After grouping by 'HEADWORD', we need to compute the sum of 'FREQUENCY' inside filter and check whether it is greater than 5. Because the condition is a single value per group, it keeps or drops all rows of a headword together:
library(dplyr)

Words1 %>%
  group_by(HEADWORD) %>%
  filter(sum(FREQUENCY) > 5)
#  HEADWORD VARIANT FREQUENCY
#  <chr>    <chr>       <int>
#1 KNIGHT   knight          6
#2 KNIGHT   kniht           2
#3 KNIGHT   knyt            1
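For comparison, the same group-wise filter can be sketched in base R without dplyr. This is only an illustration: the data frame below is rebuilt by hand from the table in the question (the column types are an assumption), and group_total, kept and row_wise are names chosen just for this example. It also shows why a row-wise test on the frequency column keeps only the single "knight" row rather than the whole KNIGHT group.

```r
# Rebuild the example data from the question (column types are assumed).
Words1 <- data.frame(
  HEADWORD  = c("SWORD", "SWORD", "SWORD", "KNIGHT", "KNIGHT", "KNIGHT"),
  VARIANT   = c("sword", "swerd", "sworde", "knight", "kniht", "knyt"),
  FREQUENCY = c(2L, 1L, 1L, 6L, 2L, 1L),
  stringsAsFactors = FALSE
)

# ave() returns one value per row: the sum of FREQUENCY within
# that row's HEADWORD group (4 for SWORD rows, 9 for KNIGHT rows).
group_total <- ave(Words1$FREQUENCY, Words1$HEADWORD, FUN = sum)

# Keep only the rows whose group total exceeds 5.
kept <- Words1[group_total > 5, ]

# A row-wise test compares each FREQUENCY on its own,
# so it does not keep or drop headwords as a whole.
row_wise <- Words1[Words1$FREQUENCY > 5, ]

print(kept)
print(row_wise)
```

Here kept should hold the three KNIGHT rows, matching the dplyr result, while row_wise holds only the row with a frequency of 6.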