Aggregate in R by permutations of column values


Question

Background: I'm working with origin-destination data and would like to calculate the proportional flow between each pair of cities. I'm finding it difficult to aggregate by city pair, because the same pair can appear in either order across the Origin and Destination columns. I can do it by brute force with lots of for loops and conditionals, but that takes far too long to compute.

Specifically, given a matrix of the following form:

Origin     Destination    Flow   
a          b              f1  
b          a              f2    
c          d              f3    
d          c              f4

I would like to compute the aggregate:

Pair      Flow
a,b       f1+f2
c,d       f3+f4

I tried to do this by reversing the Origin and Destination columns, appending the result to the original data set, aggregating by the Origin and Destination columns, using xtabs to create a symmetric matrix, and then taking just the upper triangle. However, this doesn't seem to be working properly.

Recommended answer

Here is one solution:

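library(dplyr)

# Build an order-independent key: the alphabetically smaller city goes first,
# so a -> b and b -> a both become "a,b". Origin and Destination must be
# character vectors here; `<` is not meaningful for unordered factors.
df$pair <- ifelse(df$Destination < df$Origin,
                  paste(df$Destination, df$Origin, sep = ','),
                  paste(df$Origin, df$Destination, sep = ','))

# Combine the flows within each unordered pair.
df %>% group_by(pair) %>% summarise(Flow = paste(Flow, collapse = ' + '))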

Source: local data frame [2 x 2]

   pair    Flow
  (chr)   (chr)
1   a,b f1 + f2
2   c,d f3 + f4

The Flow column is pasted together here because the example gives it as character values. If your flows are numeric, change the summary to sum(Flow).
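
For numeric flows, the same idea can be sketched in base R. This is only an illustration under stated assumptions: the flow values below are made up, and it swaps in pmin/pmax plus aggregate instead of the ifelse/dplyr combination used above.

```r
# Hypothetical numeric flows; the values are made up for illustration.
df <- data.frame(
  Origin      = c("a", "b", "c", "d"),
  Destination = c("b", "a", "d", "c"),
  Flow        = c(10, 5, 7, 3),
  stringsAsFactors = FALSE
)

# Order each pair alphabetically so that a -> b and b -> a share one key.
df$pair <- paste(pmin(df$Origin, df$Destination),
                 pmax(df$Origin, df$Destination),
                 sep = ",")

# Sum the flows within each unordered pair.
result <- aggregate(Flow ~ pair, data = df, FUN = sum)
print(result)
```

With dplyr, the equivalent change is to replace paste(Flow, collapse = ' + ') with sum(Flow) in the summarise call.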

EDITED: Sorry, I was summing the wrong column earlier. Fixed.
