Generating an edge list from ID and grouping vectors

Question

I have a data frame of 205,000+ rows formatted as follows:

df <- data.frame(project.id = c('SP001', 'SP001', 'SP001', 'SP017', 'SP018', 'SP017'),
                 supplier.id = c('1224', '5542', '7741', '1224', '2020', '9122'))

In the actual data frame there are 6700+ unique values of project.id. I would like to create an edge list that pairs suppliers who have worked on the same project.

Desired final result for project.id = SP001:

to     from
1224   5542
1224   7741
5542   7741
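For one project, the rows above are exactly the 2-element combinations of its suppliers. Base R's `combn()` returns those combinations as columns, and transposing gives one pair per row. A project with k suppliers therefore contributes choose(k, 2) = k(k-1)/2 edges, which is what drives the size of the output. A minimal sketch for SP001, with the supplier IDs copied from the sample data (`sp001`, `pairs` and `n.edges` are illustrative names):

```r
# Suppliers on project SP001 in the sample data
sp001 <- c('1224', '5542', '7741')

# combn() returns each 2-element combination as a column; t() makes one pair per row
pairs <- t(combn(sp001, 2))
colnames(pairs) <- c("to", "from")
print(pairs)

# Number of edges a project with k suppliers produces: k * (k - 1) / 2
n.edges <- choose(length(sp001), 2)
print(n.edges)
```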

So far I've tried using split to create a list by project.id and then running lapply+combn to generate all possible combinations of supplier.id within each list/group:

try.list <- split(df, df$project.id)
try.output <- lapply(try.list, function(x) combn(x$supplier.id, 2))
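As written, `combn(x$supplier.id, 2)` stops with an "n < m" error on any project that has only one supplier, such as SP018 in the sample. In addition, `lapply` leaves the pairs in a list of matrices rather than a single edge list. The following is a minimal base R sketch of the same split/combn idea that skips single-supplier projects and stacks everything into one data frame. The `to`/`from` column names follow the desired output above; `by.project`, `pair.list`, `pairs` and `edges` are illustrative names, not part of the original code:

```r
df <- data.frame(project.id = c('SP001', 'SP001', 'SP001', 'SP017', 'SP018', 'SP017'),
                 supplier.id = c('1224', '5542', '7741', '1224', '2020', '9122'),
                 stringsAsFactors = FALSE)

# Split only the supplier IDs (not the whole data frame) by project
by.project <- split(df$supplier.id, df$project.id)

# Drop projects with fewer than two suppliers: combn(x, 2) errors on them
by.project <- by.project[lengths(by.project) >= 2]

# One matrix of pairs per project, one pair per row
pair.list <- lapply(by.project, function(s) t(combn(s, 2)))

# Stack all pairs and repeat each project ID once per pair it produced
pairs <- do.call(rbind, pair.list)
edges <- data.frame(
  project.id = rep(names(pair.list), vapply(pair.list, nrow, integer(1))),
  to = pairs[, 1],
  from = pairs[, 2],
  stringsAsFactors = FALSE
)
print(edges)
```

Splitting only the `supplier.id` vector avoids building 6700+ small data frames, which is likely to be cheaper than `split(df, df$project.id)`. The total number of pairs is still set by the largest projects.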

Is there a more elegant/efficient (read "computed in less than 2hrs") way to generate something like this?

Any help would be appreciated.

Recommended answer

Instead of using split and lapply, you can use the dplyr package.

df <- data.frame(project.id = c('SP001', 'SP001', 'SP001', 'SP017', 'SP018', 'SP017'),
                 supplier.id = c('1224', '5542', '7741', '1224', '2020', '9122'),
                 stringsAsFactors = FALSE)

library(dplyr)

df %>% group_by(project.id) %>%
  filter(n() >= 2) %>%  # combn(x, 2) needs at least two suppliers per project
  do(data.frame(t(combn(.$supplier.id, 2)), stringsAsFactors = FALSE))
# Source: local data frame [4 x 3]
# Groups: project.id [2]

#   project.id    X1    X2
#        (chr) (chr) (chr)
# 1      SP001  1224  5542
# 2      SP001  1224  7741
# 3      SP001  5542  7741
# 4      SP017  1224  9122
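A different technique, not the one used in this answer, is a self-join on project.id. Base R `merge()` pairs every supplier with every other supplier on the same project in one vectorised step. Keeping only the rows where the first ID sorts before the second then removes self-pairs and mirrored duplicates. The sketch below runs on the sample data and assumes each supplier appears at most once per project. The join builds k^2 rows for a project with k suppliers before filtering, so memory use grows with the largest projects:

```r
df <- data.frame(project.id = c('SP001', 'SP001', 'SP001', 'SP017', 'SP018', 'SP017'),
                 supplier.id = c('1224', '5542', '7741', '1224', '2020', '9122'),
                 stringsAsFactors = FALSE)

# Join the table to itself: every supplier paired with every supplier on the same project
joined <- merge(df, df, by = "project.id", suffixes = c(".to", ".from"))

# Keep each unordered pair once (drops self-pairs and the mirrored duplicates)
edges <- joined[joined$supplier.id.to < joined$supplier.id.from,
                c("project.id", "supplier.id.to", "supplier.id.from")]
names(edges) <- c("project.id", "to", "from")
rownames(edges) <- NULL
print(edges)
```

Here each pair is ordered so that `to` sorts before `from`, rather than by order of appearance as with `combn()`. Single-supplier projects such as SP018 drop out automatically.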