How to find values shared between groups in a data frame?
Question
I have a tidy data.frame with two columns: exp and val. I want to find which values of val are shared among all the different experiments.
df <- data.frame(exp = c('A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'C', 'C'),
val = c(10, 20, 15, 10, 10, 15, 99, 2, 15, 20, 10, 4))
df
exp val
1 A 10
2 A 20
3 A 15
4 A 10
5 B 10
6 B 15
7 B 99
8 B 2
9 C 15
10 C 20
11 C 10
12 C 4
The expected result could be either a vector of values:
10, 15
or a column on the data frame telling whether that value is shared:
exp val shared
<fct> <dbl> <lgl>
1 A 10 TRUE
2 A 20 FALSE
3 A 15 TRUE
4 A 10 TRUE
5 B 10 TRUE
6 B 15 TRUE
7 B 99 FALSE
8 B 2 FALSE
9 C 15 TRUE
10 C 20 FALSE
11 C 10 TRUE
12 C 4 FALSE
I was able to find an answer (see the self-answer below), but this seems like a common enough question that there must be a better way than the really hacky solution I came up with.
I tried to solve this problem in dplyr since that's what I'm familiar with, but I'm interested in any kind of solution.
Recommended answer
You can group by val and then check whether the number of distinct exp values for that val is equal to the number of distinct exp values in the whole data frame:
library(dplyr)

df %>%
  group_by(val) %>%
  mutate(shared = n_distinct(exp) == n_distinct(.$exp))
# Note: the first exp refers to the exp values within each val group, while
# .$exp refers to the whole exp column of the data frame.
# A tibble: 12 x 3
# Groups: val [6]
# exp val shared
# <fct> <dbl> <lgl>
# 1 A 10 TRUE
# 2 A 20 FALSE
# 3 A 15 TRUE
# 4 A 10 TRUE
# 5 B 10 TRUE
# 6 B 15 TRUE
# 7 B 99 FALSE
# 8 B 2 FALSE
# 9 C 15 TRUE
#10 C 20 FALSE
#11 C 10 TRUE
#12 C 4 FALSE
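The question also asks for the shared values as a plain vector. One possible base R sketch (not part of the original answer) splits val by exp and intersects the resulting vectors. The names vals_by_exp and shared_vals are illustrative, and the approach assumes the values can be compared for exact equality, which holds for the whole numbers in this example but may not for arbitrary floating-point data.

```r
# Same example data as in the question.
df <- data.frame(
  exp = c("A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C"),
  val = c(10, 20, 15, 10, 10, 15, 99, 2, 15, 20, 10, 4)
)

# Split val into one vector per experiment (a named list keyed by exp).
vals_by_exp <- split(df$val, df$exp)

# Intersect the per-experiment vectors pairwise; what survives is the set
# of values present in every experiment.
shared_vals <- Reduce(intersect, vals_by_exp)

# Optionally flag each row whose val appears in every experiment.
df$shared <- df$val %in% shared_vals
```

With the example data, shared_vals should contain 10 and 15, and df$shared should match the shared column produced by the dplyr approach above.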