Pairs of Observations within Groups
Problem description
I've got a problem that I know how to solve using SQL, but I'm looking to implement a solution in R with a new data set. I've been trying to figure out things with the reshape2 package, but I haven't had any luck with what I'm trying to accomplish. Here's my problem:
I have a dataset in which I need to look at all pairs of items that are together from within another group. I've created a toy example below to further explain.
BUNCH FRUITS
1 apples
1 bananas
1 mangos
2 apples
3 bananas
3 apples
4 bananas
4 apples
What I want is a listing of all possible pairs and sum the frequency they occur together within a bunch. My output would ideally look like this:
FRUIT1 FRUIT2 FREQUENCY
APPLES BANANAS 3
APPLES MANGOS 1
My end goal is to make something that I'll eventually be able to import into Gephi for a network analysis. For this I need a Source and Target column (aka FRUIT1 and FRUIT2 above).
The original solution in SQL is here if that would help anyone: PROC SQL in SAS - All Pairs of Items
Recommended answer
The following seems to work:
# Logical fruit-by-bunch matrix: TRUE where the fruit appears in the bunch
tmp <- table(DF$FRUITS, DF$BUNCH) != 0
#> tmp
#             1     2     3     4
#  apples  TRUE  TRUE  TRUE  TRUE
#  bananas TRUE FALSE  TRUE  TRUE
#  mangos  TRUE FALSE FALSE FALSE

# For every pair of fruits, count the bunches (columns) in which both rows are TRUE
do.call(rbind,
        combn(unique(as.character(DF$FRUITS)),
              2,
              function(x) data.frame(fr1 = x[1],
                                     fr2 = x[2],
                                     freq = sum(colSums(tmp[x, ]) == 2)),
              simplify = FALSE))
#      fr1     fr2 freq
#1  apples bananas    3
#2  apples  mangos    1
#3 bananas  mangos    1
where DF is:
DF = structure(list(BUNCH = c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 4L), FRUITS = structure(c(1L,
2L, 3L, 1L, 2L, 1L, 2L, 1L), .Label = c("apples", "bananas",
"mangos"), class = "factor")), .Names = c("BUNCH", "FRUITS"), class = "data.frame", row.names = c(NA,
-8L))
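The same counts can also be obtained without looping over pairs, by multiplying the bunch-by-fruit incidence matrix by its own transpose. The sketch below is not part of the original answer. It rebuilds the toy data with data.frame() so that it runs on its own, and it names the columns Source, Target and Weight on the assumption that these are the headers Gephi's edge-list import expects. The file name in the commented-out write.csv() line is hypothetical.

```r
# Toy data from the question, rebuilt with data.frame() for readability.
DF <- data.frame(
  BUNCH  = c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 4L),
  FRUITS = c("apples", "bananas", "mangos", "apples",
             "bananas", "apples", "bananas", "apples"),
  stringsAsFactors = TRUE
)

# Bunch-by-fruit incidence matrix: 1 if the fruit appears in the bunch, else 0.
inc <- 1L * (unclass(table(DF$BUNCH, DF$FRUITS)) > 0)

# t(inc) %*% inc: cell [i, j] counts the bunches that contain both fruit i and fruit j.
co <- crossprod(inc)

# Keep each unordered pair once by taking the upper triangle (diagonal excluded).
idx <- which(upper.tri(co), arr.ind = TRUE)
edges <- data.frame(
  Source = rownames(co)[idx[, "row"]],
  Target = colnames(co)[idx[, "col"]],
  Weight = co[idx],
  stringsAsFactors = FALSE
)

# Drop pairs that never occur together in any bunch.
edges <- edges[edges$Weight > 0, ]
print(edges)

# Assumed export step for Gephi; "edges.csv" is a hypothetical file name.
# write.csv(edges, "edges.csv", row.names = FALSE)
```

On larger data this does the counting in a single matrix product instead of one data.frame per pair, at the cost of holding a fruit-by-fruit matrix in memory.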