Pairs of Observations within Groups

Question

I've got a problem that I know how to solve using SQL, but I'm looking to implement a solution in R with a new data set. I've been experimenting with the reshape2 package, but I haven't managed to get the result I'm after. Here's my problem:

I have a dataset in which I need to look at all pairs of items that occur together within the same group. I've created a toy example below to explain further.

BUNCH    FRUITS
1        apples
1        bananas
1        mangos
2        apples
3        bananas
3        apples
4        bananas
4        apples

What I want is a list of all possible pairs, with a count of how often each pair occurs together within a bunch. Ideally, my output would look like this:

FRUIT1    FRUIT2     FREQUENCY
APPLES    BANANAS    3
APPLES    MANGOS     1

My end goal is to produce something I can eventually import into Gephi for a network analysis. For this I need a Source and a Target column (FRUIT1 and FRUIT2 above).

The original solution in SQL is here, in case it helps anyone: PROC SQL in SAS - All Pairs of Items

Recommended answer

The following seems to work:

# Logical presence matrix: rows are fruits, columns are bunches
tmp <- table(DF$FRUITS, DF$BUNCH) != 0
tmp
#             1     2     3     4
#  apples  TRUE  TRUE  TRUE  TRUE
#  bananas TRUE FALSE  TRUE  TRUE
#  mangos  TRUE FALSE FALSE FALSE

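# For every unordered pair of fruits, count the bunches in which both appear
# (a column sum of 2 over the pair's two rows means both are present)
do.call(rbind,
        combn(unique(as.character(DF$FRUITS)),
              2,
              function(x) data.frame(fr1 = x[1],
                                     fr2 = x[2],
                                     freq = sum(colSums(tmp[x, ]) == 2)),
              simplify = FALSE))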
#      fr1     fr2 freq
#1  apples bananas    3
#2  apples  mangos    1
#3 bananas  mangos    1
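
To see why the count works, it may help to look at a single pair in isolation. The sketch below rebuilds the toy data from the question (so that it runs on its own) and inspects the two rows of the presence matrix for apples and bananas; the names `pair` and `both` are introduced here purely for illustration.

```r
# Toy data from the question, rebuilt so this snippet is self-contained
DF <- data.frame(
  BUNCH  = c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 4L),
  FRUITS = c("apples", "bananas", "mangos", "apples",
             "bananas", "apples", "bananas", "apples")
)

# Presence matrix: rows are fruits, columns are bunches
tmp <- table(DF$FRUITS, DF$BUNCH) != 0

# Take one pair and sum its two rows column by column:
# a sum of 2 means both fruits are present in that bunch
pair <- c("apples", "bananas")
both <- colSums(tmp[pair, ]) == 2

both        # one TRUE/FALSE per bunch
sum(both)   # number of bunches that contain both fruits
```

For this pair, bunches 1, 3 and 4 contain both fruits, which gives the frequency of 3 reported in the first row above.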

Here DF is the example data from the question:

DF = structure(list(BUNCH = c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 4L), FRUITS = structure(c(1L, 
2L, 3L, 1L, 2L, 1L, 2L, 1L), .Label = c("apples", "bananas", 
"mangos"), class = "factor")), .Names = c("BUNCH", "FRUITS"), class = "data.frame", row.names = c(NA, 
-8L))
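
As an alternative sketch (not part of the original answer), the same counts can be obtained with a matrix cross-product, and the result can be shaped into an edge list. The column names `Source`, `Target` and `Weight` are an assumption based on the Gephi goal stated in the question, and the file name `edges.csv` is hypothetical.

```r
# Toy data from the question, rebuilt so this snippet is self-contained
DF <- data.frame(
  BUNCH  = c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 4L),
  FRUITS = c("apples", "bananas", "mangos", "apples",
             "bananas", "apples", "bananas", "apples")
)

# 0/1 incidence matrix: one row per bunch, one column per fruit
inc <- 1 * (unclass(table(DF$BUNCH, DF$FRUITS)) > 0)

# t(inc) %*% inc: entry [i, j] is the number of bunches
# that contain both fruit i and fruit j
co <- crossprod(inc)

# Keep each unordered pair once (upper triangle, diagonal excluded)
idx <- which(upper.tri(co), arr.ind = TRUE)
edges <- data.frame(
  Source = rownames(co)[idx[, 1]],
  Target = colnames(co)[idx[, 2]],
  Weight = co[idx],
  stringsAsFactors = FALSE
)

# Drop pairs that never occur together
edges <- edges[edges$Weight > 0, ]
edges

# Hypothetical output file for import as an edges table
write.csv(edges, file.path(tempdir(), "edges.csv"), row.names = FALSE)
```

This avoids looping over pairs explicitly, which may matter when there are many distinct items, at the cost of building a full item-by-item matrix in memory.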
