Subset data frame using dplyr with a vector


Problem description

I know how to use dplyr, but here I'm stuck.

I have a vector, for example:

v <- c("A","B","C")

and a data frame, for example:

Groups letters 
G1 A
G1 B
G1 C
G1 C
G2 A
G2 C
G3 A
G3 A
G3 C
G4 C

I would like to keep only the Groups that contain all the letters in v.

In this example only G1 would be kept, because A, B and C, that is, all the letters in v, are present in it.

I tried:

filtred_df2=filtred_df %>%
  group_by(Groups) %>%
  filter(all(letters %in% v))

Recommended answer

There's probably a shorter way, but this should work. First, restrict the data to rows whose letter is in v. Then count how many distinct letters each group has and compare that to the number of unique letters in v. Finally, join back to the original data so that only the groups containing all the letters remain.

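library(dplyr)

filtred_df %>%
  filter(letters %in% v) %>%            # only care about letters that are in v
  distinct(Groups, letters) %>%         # one row per group/letter pair
  count(Groups) %>%                     # distinct letters from v in each group
  filter(n == length(unique(v))) %>%    # keep groups that have every letter in v
  select(-n) %>%
  left_join(filtred_df, by = "Groups")  # bring back the original rows of those groups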
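
As for a shorter way, here is a minimal sketch, assuming the data frame is called filtred_df as in the question and that its columns are plain character vectors (the data.frame() call below is only a reconstruction of the printed example). The attempt in the question tests whether every letter of a group is in v, which is true for every group in this data, so nothing is filtered out. Swapping the operands of %in% tests whether every element of v appears in the group instead.

```r
library(dplyr)

# Example data reconstructed from the question (assumed to be character columns)
v <- c("A", "B", "C")
filtred_df <- data.frame(
  Groups  = c("G1", "G1", "G1", "G1", "G2", "G2", "G3", "G3", "G3", "G4"),
  letters = c("A", "B", "C", "C", "A", "C", "A", "A", "C", "C"),
  stringsAsFactors = FALSE
)

# Original attempt: "is every letter of the group in v?"
# TRUE for all groups here, so all rows are kept.
attempt <- filtred_df %>%
  group_by(Groups) %>%
  filter(all(letters %in% v)) %>%
  ungroup()

# Reversed test: "is every element of v present in the group?"
result <- filtred_df %>%
  group_by(Groups) %>%
  filter(all(v %in% letters)) %>%
  ungroup()

print(result)
```

With the example data, attempt still has all ten rows, while result keeps only the four G1 rows. Inside filter(), letters refers to the data frame column rather than R's built-in letters constant, and v is looked up in the calling environment because the data frame has no column of that name.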
