Filter group of rows based on sum of values from different column


Problem description


I'm trying to filter out whole rows in R, but only if the frequencies for a particular set don't add up to more than 5.

The data I have looks a bit like this. It's a dataframe that I'm currently calling "Words":

HEADWORD VARIANT FREQUENCY
 SWORD    sword      2
 SWORD    swerd      1
 SWORD    sworde     1
 KNIGHT   knight     6
 KNIGHT   kniht      2
 KNIGHT   knyt       1

I only want rows for which the frequencies within a particular headword add up to more than 5. So here, I want to keep all the instances of KNIGHT but I want to get rid of all the SWORD rows entirely.

I tried to do this with dplyr, but without success. This is the code I tried:

Words1 %>% group_by(HW) %>%  filter(Fr > 5)

Solution

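After grouping by 'HEADWORD', we need to compute the sum of 'FREQUENCY' within each group and check inside filter() whether that sum is greater than 5. Because the condition yields a single TRUE or FALSE per group, filter() keeps or drops all rows of a group together.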

library(dplyr)

Words1 %>%
  group_by(HEADWORD) %>%
  filter(sum(FREQUENCY) > 5)
#   HEADWORD VARIANT FREQUENCY
#     <chr>   <chr>     <int>
#1   KNIGHT  knight         6
#2   KNIGHT   kniht         2
#3   KNIGHT    knyt         1
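
For readers who want to check the logic without dplyr, here is a minimal base R sketch of the same idea. It is not the answer's method: it swaps in ave() to compute the per-group total. The data frame is rebuilt by hand from the table in the question, and the names group_total, kept and row_wise are made up for illustration.

```r
# Rebuild the example data from the question
# (the name Words1 follows the code used in the question and answer).
Words1 <- data.frame(
  HEADWORD  = c("SWORD", "SWORD", "SWORD", "KNIGHT", "KNIGHT", "KNIGHT"),
  VARIANT   = c("sword", "swerd", "sworde", "knight", "kniht", "knyt"),
  FREQUENCY = c(2L, 1L, 1L, 6L, 2L, 1L),
  stringsAsFactors = FALSE
)

# For every row, ave() returns the sum of FREQUENCY within that row's
# HEADWORD group (hypothetical helper vector, one value per row).
group_total <- ave(Words1$FREQUENCY, Words1$HEADWORD, FUN = sum)

# Keep only the rows whose group total exceeds 5.
kept <- Words1[group_total > 5, ]

# For contrast: a per-row test on FREQUENCY ignores the group totals.
row_wise <- Words1[Words1$FREQUENCY > 5, ]

print(kept)
print(row_wise)
```

Here kept holds the three KNIGHT rows, while row_wise holds only the single row whose own FREQUENCY is above 5. That contrast is presumably why a per-row condition such as the one in the question cannot give the desired result: the comparison has to be made on the group sum, not on individual values.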
