Count shared occurrences and remove duplicates
Question
I have this data.frame:
df <- read.table(text= " section to from time
a 1 5 9
a 2 5 9
a 1 5 10
a 2 6 10
a 2 7 11
a 2 7 12
a 3 7 12
a 4 7 12
a 4 6 13 ", header = TRUE)
Each row identifies the simultaneous occurrence of an id in to and an id in from at a time point time. Basically, it is a time-explicit network of the ids in to and from.
I want to know which to ids shared a from id within a particular time range, which here is 2. In other words, I want to know whether ids 1 and 2 in to both went to coffee shop 5 within two days of each other. For example, ids 1 and 2 in to shared id 5 in from at times 9 and 10 respectively, and so would have 1 shared event within the time window of 2. If they also shared a from id at time point 13, e.g.
a 1 5 9
a 2 5 9
a 1 7 13
a 2 7 13
then ids 1 and 2 would get a 2.
So the final output I would like for df would be:
section to.a to.b noShared
a 1 2 1
a 2 3 1
a 2 4 1
a 3 4 1
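One possible reading of this target can be sketched in base R: for each unordered pair of to ids, count the distinct from ids that both visited at times no more than 2 apart. The counting rule, the inclusive comparison (difference <= 2), and the names window, pairs, shared and result are assumptions made for illustration; the question does not pin them down.

```r
df <- read.table(text = "section to from time
a 1 5 9
a 2 5 9
a 1 5 10
a 2 6 10
a 2 7 11
a 2 7 12
a 3 7 12
a 4 7 12
a 4 6 13", header = TRUE)

window <- 2  # assumption: two visits match when |time.a - time.b| <= window

# Self-join: every pair of rows that has the same section and the same from id
pairs <- merge(df, df, by = c("section", "from"), suffixes = c(".a", ".b"))

# Keep each unordered pair of to ids once (to.a < to.b) and only close visits
close <- abs(pairs$time.a - pairs$time.b) <= window
pairs <- pairs[pairs$to.a < pairs$to.b & close, ]

# A from id counts once per pair, however many close visits it has
shared <- unique(pairs[, c("section", "to.a", "to.b", "from")])

# Number of distinct shared from ids per pair
result <- aggregate(from ~ section + to.a + to.b, data = shared, FUN = length)
names(result)[names(result) == "from"] <- "noShared"
result <- result[order(result$section, result$to.a, result$to.b), ]
rownames(result) <- NULL
print(result)
```

Under these assumptions the sketch gives the four rows listed above. Ids 2 and 4 also share from id 6, but at times 10 and 13, which are 3 apart, so that visit is not counted.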
I can get some of the way there with:
library(plyr)
library(tnet)
a <- ddply(df, .(section,to,time), function(x)
data.frame(from = unique(x$from)) )
b <- ddply(a, .(section,time), function(x) {
b <- as.tnet(x[, c("to","from")], type="binary two-mode tnet")
b <- projecting_tm(b, method="sum")
return(b)
})
This tells me which ids in to shared ids in from within each time point.
However, there are two main problems with b.
Firstly, within each time point each pair of ids appears twice, once in each direction, i.e.
1 2 5 9 # id 1 and 2 went to coffee shop 5 at time 9
2 1 5 9 # id 2 and 1 went to coffee shop 5 at time 9
I only want each combination to appear once:
1 2 5 # id 1 and 2 went to coffee shop 5 at time 9
Secondly, I need to bin the results within the time window so that my final result doesn't have time, just the number of shared events.
EDIT
The time issue turned out to be more involved than expected. Solving the first problem is enough for this question.
Recommended answer
For the generation of b (the first part of the question), I changed the code of projecting_tm, which performs the transformation of the network.
b <- ddply(a, .(section, time), function(x) {
  ## first, build the original two-mode network: i = to id, p = from id
  net2 <- x[, c("to", "from")]
  colnames(net2) <- c("i", "p")
  net2 <- net2[order(net2[, "i"], net2[, "p"]), ]
  ## np = number of rows attached to each p
  np <- table(net2[, "p"])
  net2 <- merge(net2, cbind(p = as.numeric(rownames(np)), np = np))
  ## transformed network: pair up ids i and j that share the same p
  net1 <- merge(net2, cbind(j = net2[, "i"], p = net2[, "p"]))
  net1 <- net1[net1[, "i"] != net1[, "j"], c("i", "j", "np")]
  net1 <- net1[order(net1[, "i"], net1[, "j"]), ]
  ## keep one row per ordered (i, j) pair
  index <- !duplicated(net1[, c("i", "j")])
  net1 <- cbind(net1[index, c("i", "j")])
  net1
})
So here you get your b without any warning:
> b
section time i j
1 a 9 1 2
2 a 9 2 1
3 a 12 2 3
4 a 12 2 4
5 a 12 3 2
6 a 12 3 4
7 a 12 4 2
8 a 12 4 3
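As a cross-check that does not need plyr or tnet, the same per-time-point pairs can be sketched with a base R self-merge. This is an illustrative alternative rather than the code above, and the names a2, m and b2 are made up for the sketch.

```r
df <- read.table(text = "section to from time
a 1 5 9
a 2 5 9
a 1 5 10
a 2 6 10
a 2 7 11
a 2 7 12
a 3 7 12
a 4 7 12
a 4 6 13", header = TRUE)

# One row per distinct (section, to, time, from) combination
a2 <- unique(df[, c("section", "to", "time", "from")])

# Self-join on section, time and from: all ids at the same place and time
m <- merge(a2, a2, by = c("section", "time", "from"), suffixes = c(".i", ".j"))

# Drop the pairs of an id with itself, then keep one row per ordered pair
m <- m[m$to.i != m$to.j, ]
b2 <- unique(m[, c("section", "time", "to.i", "to.j")])
names(b2)[3:4] <- c("i", "j")
b2 <- b2[order(b2$section, b2$time, b2$i, b2$j), ]
rownames(b2) <- NULL
print(b2)
```

On the sample data this gives the same eight rows as b, with every pair still listed in both directions.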
For the second part of the question, do you want to remove the duplicates from b?
b[!duplicated(t(apply(b[3:4], 1, sort))), ]
section time i j
1 a 9 1 2
3 a 12 2 3
4 a 12 2 4
6 a 12 3 4
For this part, I use an answer to this question.
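A variant of the de-duplication step, sketched with pmin and pmax instead of apply and sort. The one-liner above compares only the i and j columns, so as far as I can tell the same pair occurring at a different time point would also be dropped; the sketch below assumes that is not wanted and puts section and time into the key as well. The names key and dedup are illustrative, and b is rebuilt by hand from the output printed above so that the block runs on its own.

```r
# b as printed above, retyped so the example is self-contained
b <- data.frame(
  section = "a",
  time = c(9, 9, 12, 12, 12, 12, 12, 12),
  i = c(1, 2, 2, 2, 3, 3, 4, 4),
  j = c(2, 1, 3, 4, 2, 4, 2, 3)
)

# Order-independent key: smaller id first, larger id second,
# plus section and time so pairs at different time points stay distinct
key <- data.frame(
  section = b$section,
  time = b$time,
  lo = pmin(b$i, b$j),
  hi = pmax(b$i, b$j)
)

# Keep the first occurrence of every key
dedup <- b[!duplicated(key), ]
print(dedup)
```

For this b the rows kept are 1, 3, 4 and 6, the same as with the apply and sort version, because no pair occurs at more than one time point here.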