Using dplyr to create vector of unique combinations of values for a given group

Views: 55 | Published: 2021/5/2 20:50:43 | Tags: r, dplyr


Problem description

I have a dataset where each row contains an event identifier and the columns contain information on an invitee and an organizer. Multiple rows can share the same event identifier. I want to aggregate over the event identifier, generating a list of unique invitees and organizers.

Let's say I have the following dataset:

test <- data.frame(
  id = stringi::stri_rand_strings(100, 1, '[A-Z]'),
  invitee_id = floor(runif(100, min = 0, max = 500)),
  organizer_id = floor(runif(100, min = 0, max = 500))
)

I want to group_by the 'id' variable and create a new column that is a comma-delimited vector of all the unique values of invitee_id and organizer_id. The end result for the first row may look like:

> final_df
    id invitee_id organizer_id unique_vals
1    L        481          396 (481, 396, 300, 100, 200)

Here final_df is the collapsed result.

I tried something like this:

library(dplyr)

final_df <- test %>%
  group_by(id) %>%
  distinct(invitee_id, .keep_all = TRUE)
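For comparison, here is a minimal sketch of one way to get the comma-delimited column described above. `distinct()` only drops repeated invitee rows within each group; it does not pool the two ID columns. The sketch pools them with `c()`, de-duplicates with `unique()`, and joins with base R's `toString()`. The three-row sample is made up for illustration and is not the random data from the question.

```r
library(dplyr)

# Small hand-made sample (hypothetical values, chosen so that 478 repeats)
test <- data.frame(
  id = c("A", "A", "B"),
  invitee_id = c(478, 226, 478),
  organizer_id = c(444, 478, 327)
)

final_df <- test %>%
  group_by(id) %>%
  # pool both ID columns within the group, drop duplicates, join with ", "
  mutate(unique_vals = toString(unique(c(invitee_id, organizer_id)))) %>%
  ungroup()

final_df
```

Because `mutate()` is used rather than `summarise()`, every original row is kept and the same string is repeated for all rows of a group, which matches the shape of the example output in the question.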

The end goal is an adjacency matrix where rows and columns are the IDs of attendees and the values represent the number of shared events.

A clearer example:

Say I have this test data:

> test
   id invitee_id organizer_id
1   A        478          444
2   A        226          346
3   A        338          320
4   A        286          497
5   B        478          327
6   B        226          354
7   B        123          272
8   C        226          297
9   C        338          144
10  C        477           73

I'm trying to group_by id and aggregate across invitees and organizers like so:

> final_df
   id invitee_id_merged   organizer_id_merged  grouped_values
1   A  c(478, 226, 338)   c(444, 346, 320)     c(478, 226, 338, 444, 346, 320)
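A possible way to produce this shape is `summarise()` with list columns, so each cell holds a vector rather than a string. This is a sketch under that assumption, not necessarily what the asker ended up using; the data frame is typed in by hand from the test data shown above.

```r
library(dplyr)

# The ten-row test data from the question, entered manually
test <- data.frame(
  id = rep(c("A", "B", "C"), times = c(4, 3, 3)),
  invitee_id = c(478, 226, 338, 286, 478, 226, 123, 226, 338, 477),
  organizer_id = c(444, 346, 320, 497, 327, 354, 272, 297, 144, 73)
)

final_df <- test %>%
  group_by(id) %>%
  summarise(
    # wrap each vector in list() so one group yields one row
    invitee_id_merged = list(unique(invitee_id)),
    organizer_id_merged = list(unique(organizer_id)),
    grouped_values = list(unique(c(invitee_id, organizer_id)))
  )

# Inspect the pooled IDs for event A
final_df$grouped_values[[1]]
```

Note that event A has four rows in the test data, so the vectors computed for A also include 286 and 497, which the hand-written illustration above leaves out.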

The end goal is an adjacency matrix where the unique list of invitee and organizer IDs forms both the rows and the columns. The value at a given row and column should represent the number of times those two individuals met in an event. So the first row would look like this:

> final_matrix
invitee_or_organizer

    478 226 338 286 123 477 ...
478 2
226 1
338 1
286 1
123 0
477 0 
 ...

Recommended answer

Update

Based on the edit in the OP's post:

crossprod(table(rep(test$id, 2), unlist(test[-1])))
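To see why the one-liner works, it may help to unpack it into steps. The sketch below uses the ten-row test data from the question (entered by hand); the intermediate names `event`, `person`, `incidence` and the result name `final_matrix` are introduced here for illustration only.

```r
# The ten-row test data from the question, entered manually
test <- data.frame(
  id = rep(c("A", "B", "C"), times = c(4, 3, 3)),
  invitee_id = c(478, 226, 338, 286, 478, 226, 123, 226, 338, 477),
  organizer_id = c(444, 346, 320, 497, 327, 354, 272, 297, 144, 73)
)

# One (event, person) pair per ID: event ids are repeated once
# for each of the two ID columns
event  <- rep(test$id, 2)
person <- unlist(test[-1])  # all invitee_ids followed by all organizer_ids

# Incidence table: events in rows, people in columns, counts in cells
incidence <- table(event, person)

# crossprod(x) computes t(x) %*% x, so cell [i, j] sums, over events,
# the product of person i's and person j's counts in that event
final_matrix <- crossprod(incidence)

final_matrix["478", "226"]  # shared events for 478 and 226
final_matrix["478", "478"]  # diagonal: events attended by 478
```

With this data, 478 and 226 both appear in events A and B, so their cell is 2, and the diagonal entry for 478 is also 2. One caveat, stated as an assumption about other data: if the same person can appear more than once within a single event, the counts in `incidence` exceed 1 and the products are inflated; converting the table to 0/1 with `(incidence > 0) * 1` before `crossprod()` would then count each shared event once.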
