Count shared occurrences and remove duplicates

Question

I have this data.frame:

df <- read.table(text= "   section to from    time
                             a     1  5        9       
                             a     2  5        9        
                             a     1  5        10       
                             a     2  6        10       
                             a     2  7        11       
                             a     2  7        12       
                             a     3  7        12       
                             a     4  7        12
                             a     4  6        13  ", header = TRUE)   

Each row identifies the simultaneous occurrence of an id in to and an id in from at a time point time. Basically, it is a time-explicit network of the ids in to and from.

I want to know which to ids shared a from id within a particular time range, which is 2. In other words, I want to know whether ids 1 and 2 in to both went to coffee shop 5 within two days of each other, i.e.

ids 1 and 2 in to shared id 5 in from at times 9 and 10 respectively, and so would have 1 shared event within the time window of 2. If they also shared a from id at time point 13, e.g.

                             a     1  5        9       
                             a     2  5        9        
                             a     1  7        13       
                             a     2  7        13       

then ids 1 and 2 would get a 2.

So the final output I would like for the df would be:

                           section to.a to.b    noShared
                             a     1    2        1       
                             a     2    3        1        
                             a     2    4        1       
                             a     3    4        1       

I can get some of the way there with:

library(plyr)                            
library(tnet)


a <- ddply(df, .(section,to,time), function(x)  
          data.frame(from = unique(x$from)) )

b <- ddply(a, .(section,time), function(x) {

            b <- as.tnet(x[, c("to","from")], type="binary two-mode tnet")
            b <- projecting_tm(b, method="sum")
            return(b)

       })

This gets me which ids in to shared ids in from within each time point.

However, there are two main problems with b.

Firstly, within each time point every pair of ids appears twice, once in each direction, i.e.

 1  2  5  9 # id 1 and 2 went to coffee shop 5  at time 9
 2  1  5  9 # id 2  and 1 went to coffee shop 5 at time 9

I only want each combination to appear once:

  1  2  5  # ids 1 and 2 went to coffee shop 5 at time 9

Secondly, I need to bin the results within the time window so that my final result doesn't have time, just the number of shared events.

Edit

The time issue is more complicated than I expected. The first problem is enough for this question.

Answer

For the generation of b (the first part of the question):

I changed the code of projecting_tm, which performs the transformation of the network.

b <- ddply(a, .(section,time), function(x) {
  ## first I create the original network
  net2 <- x[, c("to","from")]
  colnames(net2) <- c('i','p')
  net2 <- net2[order(net2[, "i"], net2[, "p"]), ]
  np <- table(net2[, "p"])
  net2 <- merge(net2, cbind(p = as.numeric(rownames(np)),np = np))
  ## transformed network
  net1 <- merge(net2, cbind(j = net2[, "i"], p = net2[, "p"]))
  net1 <- net1[net1[, "i"] != net1[, "j"], c("i", "j","np")]
  net1 <- net1[order(net1[, "i"], net1[, "j"]), ]
  index <- !duplicated(net1[, c("i", "j")])
  net1 <- cbind(net1[index, c("i", "j")])
  net1
})

So here you get your b without any warning:

> b
  section time i j
1       a    9 1 2
2       a    9 2 1
3       a   12 2 3
4       a   12 2 4
5       a   12 3 2
6       a   12 3 4
7       a   12 4 2
8       a   12 4 3

For the second part of the question, do you want to remove the duplicates from b?

b[!duplicated(t(apply(b[3:4], 1, sort))), ]
  section time i j
1       a    9 1 2
3       a   12 2 3
4       a   12 2 4
6       a   12 3 4
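Note that the one-liner above only looks at columns i and j, so a pair that shows up again at a later time point would be dropped as well. If that is not what you want, a minimal base R sketch (not part of the original answer) is to order each pair with pmin/pmax and include section and time in the key. The names lo, hi, key and b_unique are just illustrative, and b is rebuilt by hand here so the snippet runs on its own.

```r
# b as printed above, rebuilt by hand so the example is self-contained
b <- data.frame(
  section = "a",
  time    = c(9, 9, 12, 12, 12, 12, 12, 12),
  i       = c(1, 2, 2, 2, 3, 3, 4, 4),
  j       = c(2, 1, 3, 4, 2, 4, 2, 3)
)

# Put the smaller id first so (1, 2) and (2, 1) get the same key
lo <- pmin(b$i, b$j)
hi <- pmax(b$i, b$j)

# Key on section, time and the unordered pair; keep the first row per key
key <- data.frame(section = b$section, time = b$time, lo = lo, hi = hi)
b_unique <- b[!duplicated(key), ]
b_unique
```

On this b it keeps the same four rows as the one-liner (rows 1, 3, 4 and 6), but a pair seen at two different time points would survive once per time point.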

For the duplicate-removal part I used an answer to this question.
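As for the time window that the question's edit sets aside, here is one rough base R sketch, not part of the original answer. It assumes that a "shared event" means one distinct from id that two to ids both visited within window time units of each other. That is only my reading of the examples in the question, which does not pin the definition down. The names window, pairs and res are made up for illustration.

```r
df <- read.table(text = "section to from time
a 1 5 9
a 2 5 9
a 1 5 10
a 2 6 10
a 2 7 11
a 2 7 12
a 3 7 12
a 4 7 12
a 4 6 13", header = TRUE)

window <- 2  # assumed: inclusive time window

# Self-join: all pairs of rows with the same section and the same from id
pairs <- merge(df, df, by = c("section", "from"), suffixes = c(".a", ".b"))

# Keep each unordered pair once (to.a < to.b) and only visits within the window
pairs <- pairs[pairs$to.a < pairs$to.b &
               abs(pairs$time.a - pairs$time.b) <= window, ]

# Assumption: one shared event per distinct from id for a given pair
pairs <- unique(pairs[, c("section", "to.a", "to.b", "from")])
res <- aggregate(from ~ section + to.a + to.b, data = pairs, FUN = length)
names(res)[names(res) == "from"] <- "noShared"
res <- res[order(res$section, res$to.a, res$to.b), ]
res
```

On the sample df this should give the four pairs from the desired output, each with noShared equal to 1. Under this assumption, repeated visits to the same from id that are far apart in time still count only once, so a proper binning by time would need more work.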
