Calculation on every pair from grouped data.frame
Problem description
My question is about performing a calculation between each pair of groups in a data.frame, and I'd like the solution to be more vectorized.
I have a data.frame that consists of the following columns: Location, Sample, Var1, and Var2. I'd like to find the closest match for each Sample for each pair of Locations, for both Var1 and Var2.
I can accomplish this for one pair of locations as follows:
library(dplyr)
library(tidyr)

# Two batches of samples per location, drawn from different value ranges
df0 <- data.frame(Location = rep(c("A", "B", "C"), each = 30),
                  Sample = rep(1:30, times = 3),
                  Var1 = sample(1:25, 90, replace = TRUE),
                  Var2 = sample(1:25, 90, replace = TRUE))
df00 <- data.frame(Location = rep(c("A", "B", "C"), each = 30),
                   Sample = rep(31:60, times = 3),
                   Var1 = sample(1:100, 90, replace = TRUE),
                   Var2 = sample(1:100, 90, replace = TRUE))
df000 <- rbind(df0, df00)
df <- sample_n(df000, 100) # data

# Long format: one row per Location, Sample and variable
dfl <- df %>% gather(VAR, value, 3:4)

# Manual comparison of a single pair of locations
df1 <- dfl %>% filter(Location == "A")
df2 <- dfl %>% filter(Location == "B")
# Cartesian join within each VAR
df3 <- merge(df1, df2, by = "VAR", all.x = TRUE)
df3 <- df3 %>% mutate(DIFF = abs(value.x - value.y))
# Closest match (smallest DIFF) for each variable and Location A sample
result <- df3 %>% group_by(VAR, Sample.x) %>% top_n(-1, DIFF)
I tried other possibilities, such as using tidyr::spread, but could not avoid either the "Error: Duplicate identifiers for rows" message or columns half filled with NA.
Is there a cleaner, more automated way to do this for each possible pair of groups? I'd like to avoid the manual subset-and-merge routine for each pair.
Recommended answer
One option is to create the pairwise combinations of 'Location' with combn and then apply the remaining steps from the OP's code to each pair:
library(tidyverse)

df %>%
  # get the unique elements of Location
  distinct(Location) %>%
  # pull the column as a vector
  pull() %>%
  # convert to character in case it is a factor
  as.character() %>%
  # get the pairwise combinations in a list
  combn(m = 2, simplify = FALSE) %>%
  # loop through the list with map and do the full_join
  # on the long-format data dfl from the question
  map(~ full_join(dfl %>%
                    filter(Location == first(.x)),
                  dfl %>%
                    filter(Location == last(.x)), by = "VAR") %>%
        # create a column of absolute differences
        mutate(DIFF = abs(value.x - value.y)) %>%
        # group by VAR, Sample.x
        group_by(VAR, Sample.x) %>%
        # keep the row(s) with the smallest DIFF in each group
        top_n(-1, DIFF))
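To make the combn step concrete, here is a minimal base R sketch of what it produces for three locations. The "A_B" style names are an illustrative assumption and not part of the answer; they only make the resulting list easier to index.

```r
# Base R only: enumerate every unordered pair of locations.
locations <- c("A", "B", "C")
pairs <- combn(locations, m = 2, simplify = FALSE)

# Label each pair so per-pair results can be looked up by name later
# (the "A_B" naming scheme is an illustrative choice, not from the answer).
names(pairs) <- vapply(pairs, paste, character(1), collapse = "_")
str(pairs)
```

With simplify = FALSE, combn returns a list with one character vector per pair, which is the shape map (or lapply) expects.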
Also, since the OP asked about picking up each pair automatically instead of filtering twice (the expected output is not clear, though):
df %>%
  distinct(Location) %>%
  pull() %>%
  as.character() %>%
  combn(m = 2, simplify = FALSE) %>%
  map(~ dfl %>%
        # change here, i.e. filter both Locations at once
        filter(Location %in% .x) %>%
        # spread it to wide format, one column per Location
        spread(Location, value, fill = 0) %>%
        # create the DIFF column by taking the difference
        mutate(DIFF = abs(!! rlang::sym(first(.x)) -
                          !! rlang::sym(last(.x)))) %>%
        group_by(VAR, Sample) %>%
        top_n(-1, DIFF))
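For comparison, the per-pair join and closest-match step can also be sketched in base R alone. This is not the answerer's method: the helper name closest_by_pair and the toy data are hypothetical, and the sketch assumes a long-format input with columns Location, Sample, VAR and value, like dfl in the question.

```r
# Hypothetical helper (name and structure are illustrative, not from the answer).
# `long` must have the columns Location, Sample, VAR and value.
closest_by_pair <- function(long) {
  locs <- sort(unique(as.character(long$Location)))
  pairs <- combn(locs, 2, simplify = FALSE)
  out <- lapply(pairs, function(p) {
    x <- long[long$Location == p[1], ]
    y <- long[long$Location == p[2], ]
    # Cartesian join within each VAR: every x sample against every y sample
    m <- merge(x, y, by = "VAR")
    m$DIFF <- abs(m$value.x - m$value.y)
    # Keep the rows that attain the minimum DIFF per (VAR, Sample.x); ties are kept
    m[m$DIFF == ave(m$DIFF, m$VAR, m$Sample.x, FUN = min), ]
  })
  names(out) <- vapply(pairs, paste, character(1), collapse = "_")
  out
}

# Small deterministic toy data set (made up for illustration)
toy <- data.frame(
  Location = c("A", "A", "B", "B", "C"),
  Sample   = c(1, 2, 1, 2, 1),
  VAR      = "Var1",
  value    = c(10, 20, 12, 19, 30)
)
res <- closest_by_pair(toy)
res$A_B[, c("Sample.x", "Sample.y", "DIFF")]
```

Comparing DIFF against the group minimum from ave keeps ties, as top_n(-1, DIFF) does, and the whole step stays vectorized within each pair.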