How do I select all unique combinations of two columns in an R data frame?
Question
I have a correlation matrix that I put in a dataframe like so:
row | var1 | var2 | cor
1 | A | B | 0.6
2 | B | A | 0.6
3 | A | C | 0.4
4 | C | A | 0.4
These results are duplicated into 2 rows each, with both combinations of "var1" and "var2". I only need one, preferably with the lower variable first (e.g. rows 1 and 3).
I've been playing with dplyr for two hours and reading old threads, but not finding what I need.
library(dplyr)   # %>%, select()
library(tibble)  # rownames_to_column()
library(tidyr)   # gather()

# get correlation of every concept versus every concept
data.cor <- data.jobs %>%
  select(-y, -X) %>%
  as.matrix %>%
  cor %>%
  as.data.frame %>%
  rownames_to_column(var = 'var1') %>%
  gather(var2, value, -var1)
I would like output to look like so:
row | var1 | var2 | cor
1 | A | B | 0.6
3 | A | C | 0.4
I am trying to do this without resorting to a loop.
Recommended answer
Here's one way with tidyverse:
dat2 <- dat %>%
  filter(!duplicated(paste0(pmax(var1, var2), pmin(var1, var2))))
# A tibble: 2 x 3
  var1  var2    cor
  <chr> <chr> <dbl>
1 A     B     0.600
2 A     C     0.400
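The filter above keeps whichever row of each pair comes first in the data, so the lower variable ends up first only if the data happens to be ordered that way. A minimal sketch of a variant that forces the lower variable into var1 is shown here; it assumes the correlation is symmetric (so either row of a pair carries the same value), and the helper columns lo and hi are names made up for this illustration.

library(dplyr)

# Same example data as in the answer, built with tibble()
dat <- tibble(
  var1 = c("A", "B", "A", "C"),
  var2 = c("B", "A", "C", "A"),
  cor  = c(0.6, 0.6, 0.4, 0.4)
)

dat_unique <- dat %>%
  # lo/hi are hypothetical helper columns: the pair in sorted order
  mutate(lo = pmin(var1, var2), hi = pmax(var1, var2)) %>%
  # keep the first row for each sorted pair
  distinct(lo, hi, .keep_all = TRUE) %>%
  # write the sorted pair back into var1/var2
  transmute(var1 = lo, var2 = hi, cor)

When every pair is known to appear in both orders, as with a full correlation matrix, filter(var1 < var2) on its own would return the same rows and would also drop the diagonal (A versus A).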
Data:
# tibble() replaces the deprecated data_frame()
dat <- tibble(
  var1 = LETTERS[c(1, 2, 1, 3)],
  var2 = LETTERS[c(2, 1, 3, 1)],
  cor = c(0.6, 0.6, 0.4, 0.4))
Note: cleaned up the logic thanks to @tmfmnk
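If the correlation matrix itself is still available, another option is to take only its upper triangle before reshaping, so the duplicate pairs never get created. This is a rough base R sketch, not part of the original answer; the matrix m is made-up stand-in data, because data.jobs is not shown in the question.

# Hypothetical stand-in for the numeric columns of data.jobs
set.seed(1)
m <- matrix(rnorm(30), ncol = 3, dimnames = list(NULL, c("A", "B", "C")))
cm <- cor(m)

# Row/column positions of the cells strictly above the diagonal
idx <- which(upper.tri(cm), arr.ind = TRUE)

# One row per unordered pair, lower variable first
pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  cor  = cm[idx]
)

Because upper.tri() excludes the diagonal by default, the self-correlations are left out as well.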