Sankey diagram for Discrete State Sequences in R using networkD3

Problem description

I am trying to create an interactive Sankey diagram in R using the networkD3 package, as described at http://christophergandrud.github.io/networkD3/#sankey. My data is in the format of Discrete State Sequences (DSS): each row represents one event sequence, and NA indicates that the sequence has ended. Recreating a sample of the data in R:

x1 <- c('06002100', '06002001', '06001304', '06002100')
x2 <- c('06002100', '06002001', 'NA', 'NA')
x3 <- c('06001304', '06002100', '06002001', 'NA')
test <- as.data.frame(rbind(x1,x2,x3))

The networkD3 package requires data in the JSON form used by its example file:

URL <- paste0("https://cdn.rawgit.com/christophergandrud/networkD3/","master/JSONdata/energy.json")

Casting the sample data above into the required format would give me (test.json):

{"nodes":[
{"name":"06002100"},
{"name":"06002001"},
{"name":"06001304"}
],
"links":[
{"source":0,"target":1,"value":3},
{"source":1,"target":2,"value":1},
{"source":2,"target":0,"value":2}
]}
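
For readers unfamiliar with this layout: the source and target fields are zero-based positions into the nodes array, not node names. A minimal base R sketch of that lookup, where the data frames simply retype the JSON above and the name decoded is hypothetical:

```r
# Same content as the JSON above, written as two data frames
nodes <- data.frame(name = c("06002100", "06002001", "06001304"),
                    stringsAsFactors = FALSE)
links <- data.frame(source = c(0, 1, 2), target = c(1, 2, 0), value = c(3, 1, 2))

# R vectors are 1-based, so add 1 to the zero-based indices to look up names
decoded <- data.frame(from  = nodes$name[links$source + 1],
                      to    = nodes$name[links$target + 1],
                      value = links$value,
                      stringsAsFactors = FALSE)
print(decoded)
```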

Once the data is in the above format, I can use the following code to plot the Sankey network:

library(networkD3)
library(jsonlite)
Energy <- fromJSON(txt = 'test.json') # Load the data
result <- as.data.frame(Energy)
sankeyNetwork(Links = Energy$links, Nodes = Energy$nodes, Source = "source",
              Target = "target", Value = "value", NodeID = "name",
              fontSize = 12, nodeWidth = 30)

I want to transform the DSS data that I have into the format required by networkD3. Is there a direct way to do this?

The networkD3 examples page mentions that I can use the igraph package to create network graph data that can be plotted with networkD3. Unfortunately, I couldn't find good examples of that.

Recommended answer

What sankeyNetwork() ultimately wants is a Links and a Nodes data frame. Assuming that in your DSS data each side-by-side pair of nodes defines a link from left to right, each pair of contiguous columns of your data frame looks like part of a Links data frame with a source and a target column.

First, I fixed your code so that it produces real NA values rather than the string "NA":

x1 <- c('06002100', '06002001', '06002425', '06009347', '06010001', '06010383', '06009348')
x2 <- c('06002100', '06040401', '06009347', '06039301', NA, NA, NA)
x3 <- c('06001304', '06002001', '06009346', '06002425', '06003303', NA, NA)
x4 <- c('06002100', '06040401', '06009347', '06039301', '06039302', '06032301', '06032301')
test <- as.data.frame(rbind(x1,x2,x3,x4))

Next, extract a data frame for each pair of contiguous columns in your data frame, bind them into one long Links data frame, and omit rows that contain NA:

linklist <- lapply(seq_len(ncol(test) - 1), function(x)
  data.frame(source = test[[x]], target = test[[x + 1]], stringsAsFactors = FALSE))
links <- na.omit(do.call(rbind, linklist))
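
To see what this step produces, here is a minimal sketch on a hypothetical toy data frame (the names toy, pairs, and toy_links and the states A, B, C are illustrative only, not from the original data). The second sequence ends after two states, so the pair containing NA is dropped:

```r
# Toy DSS data: two sequences; the second ends after two states
toy <- data.frame(V1 = c("A", "A"),
                  V2 = c("B", "C"),
                  V3 = c("C", NA),
                  stringsAsFactors = FALSE)

# One small source/target data frame per pair of adjacent columns
pairs <- lapply(seq_len(ncol(toy) - 1), function(i) {
  data.frame(source = toy[[i]], target = toy[[i + 1]],
             stringsAsFactors = FALSE)
})

# Stack the pieces and drop transitions that run into NA
toy_links <- na.omit(do.call(rbind, pairs))
print(toy_links)
```

Three links remain: A to B, A to C, and B to C. The transition from C to NA is removed by na.omit().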

Then make a vector of all unique node names and build a Nodes data frame from it, rebuild the Links data frame using the zero-based positions of those names in the Nodes data frame, and plot it:

node_names <- factor(sort(unique(c(as.character(links$source), 
                                   as.character(links$target)))))
nodes <- data.frame(name = node_names)
links <- data.frame(source = match(links$source, node_names) - 1, 
                    target = match(links$target, node_names) - 1,
                    value = 1)

library(networkD3)
sankeyNetwork(links, nodes, "source", "target", "value", "name")
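
Note that the code above gives every transition a value of 1, so a transition that occurs in several sequences appears as several separate links. If a single link per source/target pair with a summed value is preferred, as in the question's JSON, one possible approach is base R's aggregate(). This is a sketch under that assumption; the small links data frame below is a hypothetical stand-in for the one built above, and links_agg is an illustrative name:

```r
# Hypothetical stand-in for the `links` data frame built above:
# zero-based node indices, one row per observed transition, value = 1
links <- data.frame(source = c(0, 0, 1, 0),
                    target = c(1, 1, 2, 1),
                    value  = 1)

# Sum the values of identical source/target pairs
links_agg <- aggregate(value ~ source + target, data = links, FUN = sum)
print(links_agg)

# links_agg could then replace links in the plotting call, e.g.
# sankeyNetwork(links_agg, nodes, "source", "target", "value", "name")
```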
