How can I calculate network measures separately for different periods using igraph in R?
Question
This is my transaction data:
data:
id from_id to_id amount date_trx
<fctr> <fctr> <fctr> <dbl> <date>
0 7468 5695 700.0 2005-01-04
1 6213 9379 11832.0 2005-01-08
2 7517 8170 1000.0 2005-01-10
3 6143 9845 4276.0 2005-01-12
4 6254 9640 200.0 2005-01-14
5 6669 5815 200.0 2005-01-20
6 6934 8583 49752.0 2005-01-24
7 9240 8314 19961.0 2005-01-26
8 6374 8865 1000.0 2005-01-30
9 6143 6530 13.4 2005-01-31
...
I built a network in which the nodes are the accounts (the from_id and to_id values), each edge connects a sender to its recipient, and the edge weights are the amounts transacted. I then calculated network measures such as degree centrality, betweenness centrality and closeness centrality.
That is:
library(igraph)
# Edge list: one row per transaction
relations <- data.frame(from = data$from_id,
to = data$to_id)
network <- graph_from_data_frame(relations, directed = TRUE)
# Edges keep the row order of relations, so the amounts line up with the edges
E(network)$weight <- data$amount
V(network)$degree <- degree(network, normalized=TRUE)
V(network)$betweenness <- betweenness(network, normalized=TRUE)
V(network)$closeness <- closeness(network, normalized=TRUE)
But now I want to calculate these measures per period. For example, I want to divide my data into weeks (starting from the very first transaction date) and calculate the network measures for each account within each week.
library(dplyr)
# Week index counted from the first transaction date (the column is date_trx)
data$week <- unsplit(tapply(data$date_trx, data$from_id, function(x) (as.numeric(x-min(data$date_trx)) %/% 7)+1),data$from_id)
select(data, from_id, to_id, date_trx, week, amount) %>% arrange(date_trx)
from_id to_id date_trx week amount
<fctr> <fctr> <date> <dbl> <dbl>
6644 6934 2005-01-01 1 700
6753 8456 2005-01-01 1 600
9242 9333 2005-01-01 1 1000
9843 9115 2005-01-01 1 900
7075 6510 2005-01-02 1 400
8685 7207 2005-01-02 1 1100
... ... ... ... ...
9866 6697 2010-12-31 313 95.8
9866 5992 2010-12-31 313 139.1
9866 5797 2010-12-31 313 72.1
9866 9736 2010-12-31 313 278.9
9868 8644 2010-12-31 313 242.8
9869 8399 2010-12-31 313 372.2
Now that I have divided my data into weekly periods, I need to form a separate network of accounts for each week so that I can calculate the network measures for the accounts week by week. How can I do that for all 313 weeks at once?
Recommended answer
One possibility is to split your data by week, convert each week into an igraph object, and then add the centralities and degree to all graphs at once using lapply. My initial data.frame is named d (see below):
library(igraph)
head(d)
from_id to_id weight date_trx
1 D I 8 1999-09-12
2 E H 10 1999-10-20
3 A G 10 1999-09-10
4 C G 13 1999-04-15
5 E J 9 1999-06-26
6 B F 15 1999-04-30
First, get the week number (%V is the ISO 8601 week of the year):
d$week <- strftime(d$date_trx, format = "%V")
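One caveat about this week label: `%V` only gives the week of the year (01 to 53), so with data covering several years, as in the question (2005 to 2010), the same week number from different years would end up in the same group. A minimal sketch of two possible alternatives follows, using a small made-up data frame `toy` (not part of the original answer): a running week index counted from the first transaction date, or a year-qualified label built with `%G`, the ISO week-based year.

```r
# Hypothetical toy data; only the date column matters here
toy <- data.frame(date_trx = as.Date(c("2005-01-01", "2005-01-07",
                                       "2005-01-08", "2006-01-03")))

# Option 1: running week index, starting at 1 on the first transaction date
toy$week_idx <- as.numeric(toy$date_trx - min(toy$date_trx)) %/% 7 + 1

# Option 2: ISO year plus ISO week, so weeks from different years stay apart
toy$week_iso <- format(toy$date_trx, "%G-%V")

toy
```

Either column can then be passed to `split()` in place of `d$week`; the running index matches the 313 weeks described in the question.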
Now split by week:
dd <- split(d, d$week )
Convert each week into an igraph object:
dd <- lapply(dd, function(x) graph_from_data_frame(x, directed = T))
Write a function that does all the operations you want to carry out, and then apply it to each graph:
my.funct <- function(x) {
V(x)$degree <- degree(x, normalized=TRUE)
V(x)$betweenness <- betweenness(x, normalized=TRUE)
V(x)$closeness <- closeness(x, normalized=TRUE)
return(x)
}
dd <- lapply(dd, my.funct)
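A note on weights: when a graph has an edge attribute called `weight`, igraph's `betweenness()` and `closeness()` use it by default and treat it as a distance, so a larger amount makes two accounts look further apart. If larger transactions should instead count as stronger ties, one common workaround is to pass inverted weights explicitly. The sketch below assumes that interpretation; the function name `my.funct.weighted` and the `strength` attribute are illustrative and not part of the original answer.

```r
library(igraph)

# Illustrative variant of my.funct; assumes every edge has a positive `weight`
my.funct.weighted <- function(g) {
  inv_w <- 1 / E(g)$weight                       # large amount -> short distance
  V(g)$strength    <- strength(g, weights = E(g)$weight)  # summed amounts per node
  V(g)$betweenness <- betweenness(g, weights = inv_w, normalized = TRUE)
  V(g)$closeness   <- closeness(g, weights = inv_w, normalized = TRUE)
  g
}

# Small made-up graph to try it on
toy <- data.frame(from = c("A", "A", "B"), to = c("B", "C", "C"),
                  weight = c(10, 1, 10))
g <- my.funct.weighted(graph_from_data_frame(toy, directed = TRUE))
igraph::as_data_frame(g, what = "vertices")
```

It can be applied to the weekly list in the same way, for example `lapply(dd, my.funct.weighted)`. Vertices that cannot reach any other vertex may get `NaN` closeness, depending on the igraph version.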
For example, for the first week:
dd[[1]]
IGRAPH f515e52 DN-- 4 2 --
+ attr: name (v/c), degree (v/n), betweenness (v/n), closeness (v/n), weight (e/n), date_trx
| (e/n), week (e/c)
+ edges from f515e52 (vertex names):
[1] B->F C->G
get.vertex.attribute(dd[[1]])
$name
[1] "B" "C" "F" "G"
$degree
[1] 0.3333333 0.3333333 0.3333333 0.3333333
$betweenness
[1] 0 0 0 0
$closeness
[1] 0.3333333 0.3333333 0.2500000 0.2500000
get.edge.attribute(dd[[1]])
$weight
[1] 9 7
$date_trx
[1] 10595 10601
$week
[1] "01" "01"
You can then retrieve all centralities and degrees for all weeks:
ddd <- lapply(dd, function(x) igraph::as_data_frame(x, what = "vertices") )
# keep in mind that `split` names the objects in the list according to
# the value it used to split, therefore the name of the data.frames in
# the list is the name of the week.
library(dplyr)
ddd <- bind_rows(ddd, .id="week")
head(ddd)
week name degree betweenness closeness
1 01 E 1.4444444 0 0.2000000
2 01 D 1.5555556 0 0.1666667
3 01 B 0.7777778 0 0.2000000
4 01 A 1.0000000 0 0.2000000
5 01 C 0.7777778 0 0.1666667
6 01 F 1.0000000 0 0.1000000
If needed, you can merge this back into the original edge list.
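As a rough sketch of such a merge, assuming the edge list still carries the `week` column and the vertex table has `week` and `name` columns as in `ddd`, the sending account's measures can be joined onto each transaction by week and account. The data frames `d_toy` and `ddd_toy` below are made-up stand-ins so the snippet runs on its own.

```r
library(dplyr)

# Made-up stand-ins for d (edge list) and ddd (per-week vertex measures)
d_toy <- data.frame(from_id = c("A", "B", "A"), to_id = c("F", "G", "G"),
                    week = c("01", "01", "02"), stringsAsFactors = FALSE)
ddd_toy <- data.frame(week = c("01", "01", "02"), name = c("A", "B", "A"),
                      degree = c(0.5, 0.25, 1), stringsAsFactors = FALSE)

# Attach the sending account's measures to each transaction of the same week
merged <- left_join(d_toy, ddd_toy, by = c("week", "from_id" = "name"))
merged
```

The same join with `"to_id" = "name"` would attach the receiving account's measures instead. If the ID columns are factors, as in the question's data, converting them with `as.character()` first avoids type mismatches in the join.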
Data used in this example:
set.seed(123)
d <- data.frame(from_id = sample(LETTERS[1:5], 2000, replace = T),
to_id = sample(LETTERS[6:10], 2000, replace = T),
weight = rpois(2000, 10),
date_trx = sample(seq(as.Date('1999/01/01'), as.Date('2000/01/01'), by="day"), 2000, replace = T))