Creating a Sankey Diagram using the networkD3 package in R

Question

Currently I am trying to create an interactive Sankey diagram with the networkD3 package, following the instructions by Chris Gandrud (https://christophergandrud.github.io/networkD3/).
What I don't understand is the table format, since he uses just two columns to visualise multiple transitions. To be more specific, I have a dataset containing four columns that represent four years. These columns contain different hotel names, and each row represents one customer who is "tracked" over these four years.

    library(networkD3)

    # Example data from the networkD3 documentation
    URL <- paste0(
        "https://cdn.rawgit.com/christophergandrud/networkD3/",
        "master/JSONdata/energy.json")
    Energy <- jsonlite::fromJSON(URL)

    # Links and Nodes are two separate tables inside the JSON file
    sankeyNetwork(Links = Energy$links, Nodes = Energy$nodes, Source = "source",
         Target = "target", Value = "value", NodeID = "name",
         units = "TWh", fontSize = 12, nodeWidth = 30)

To give you an overview of my data, here is a screenshot:

I would give you more "coded" information, but since I am very new to R, I hope you can follow my train of thought on this problem. If not, please do not hesitate to ask.

Thanks :)

Recommended answer

You need two data frames: one listing all nodes (containing the names) and one listing the links. The latter contains three columns: the source node, the target node, and a value indicating the strength or width of the link. In the links data frame you refer to the nodes by their (zero-based) position in the nodes data frame.
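
As a minimal sketch of that layout, the two data frames could look like the following. The hotel names and counts are made up purely for illustration and are not taken from the question's data.

```r
# Toy nodes table: a node's id is its row position, counted from zero
nodes <- data.frame(name = c("Year1_HotelA", "Year1_HotelB",
                             "Year2_HotelA", "Year2_HotelB"),
                    stringsAsFactors = FALSE)

# Toy links table: source and target are zero-based row positions in `nodes`
links <- data.frame(source = c(0, 0, 1, 1),
                    target = c(2, 3, 2, 3),
                    value  = c(30, 10, 5, 25))

# Look up the names behind the first link (add 1 because R indexes from one)
first_source <- nodes$name[links$source[1] + 1]
first_target <- nodes$name[links$target[1] + 1]
```

Here the first link runs from "Year1_HotelA" to "Year2_HotelA" with a width of 30.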

Suppose your data looks like this:

df <- data.frame(Year1=sample(paste0("Hotel", 1:4), 1000, replace = TRUE),
                 Year2=sample(paste0("Hotel", 1:4), 1000, replace = TRUE),
                 Year3=sample(paste0("Hotel", 1:4), 1000, replace = TRUE),
                 Year4=sample(paste0("Hotel", 1:4), 1000, replace = TRUE),
                 stringsAsFactors = FALSE)

For the diagram you need to differentiate not only between the hotels but between the hotel/year combinations, since each combination should be its own node:

df$Year1 <- paste0("Year1_", df$Year1)
df$Year2 <- paste0("Year2_", df$Year2)
df$Year3 <- paste0("Year3_", df$Year3)
df$Year4 <- paste0("Year4_", df$Year4)
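
If the number of year columns varies, the same prefixing can be written as a loop over the column names. This is only a sketch on a small stand-in data frame; the name `toy` and its values are assumptions for illustration.

```r
# Small stand-in for the customer data (hotel names are made up)
toy <- data.frame(Year1 = c("Hotel1", "Hotel2"),
                  Year2 = c("Hotel2", "Hotel2"),
                  stringsAsFactors = FALSE)

# Prefix every column's values with that column's own name
for (col in names(toy)) {
  toy[[col]] <- paste0(col, "_", toy[[col]])
}
```

This assumes every column in the data frame is a year column; any other columns would need to be excluded from the loop first.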

The links are the "transitions" between the hotels from one year to the next:

library(dplyr)

# Count the customers for each pair of hotels in consecutive years
trans1_2 <- df %>% group_by(Year1, Year2) %>% summarise(sum = n())
trans2_3 <- df %>% group_by(Year2, Year3) %>% summarise(sum = n())
trans3_4 <- df %>% group_by(Year3, Year4) %>% summarise(sum = n())

# Give all three tables the same column names so they can be stacked
colnames(trans1_2)[1:2] <- colnames(trans2_3)[1:2] <- colnames(trans3_4)[1:2] <- c("source", "target")

# Stack the transitions into one links data frame
links <- rbind(as.data.frame(trans1_2),
               as.data.frame(trans2_3),
               as.data.frame(trans3_4))
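
The same counting can be sketched for any number of year columns without dplyr, by looping over consecutive column pairs in base R. This is an alternative to the code above, not the answer's own method; `toy`, `count_pair`, and `toy_links` are hypothetical names, and the data is made up.

```r
# Small stand-in for the prefixed data (values are made up)
toy <- data.frame(Year1 = c("Year1_A", "Year1_A", "Year1_B"),
                  Year2 = c("Year2_A", "Year2_B", "Year2_B"),
                  Year3 = c("Year3_B", "Year3_B", "Year3_B"),
                  stringsAsFactors = FALSE)

# Count the transitions between column i and column i + 1
count_pair <- function(i) {
  pair <- data.frame(source = toy[[i]], target = toy[[i + 1]],
                     stringsAsFactors = FALSE)
  pair$sum <- 1L  # one customer per row
  aggregate(sum ~ source + target, data = pair, FUN = sum)
}

# Stack the counts for every consecutive pair of columns
toy_links <- do.call(rbind, lapply(seq_len(ncol(toy) - 1), count_pair))
```

The result has the same three columns (`source`, `target`, `sum`) as the `links` data frame, so the remaining steps would apply unchanged.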

Finally, the two data frames need to be cross-referenced, so that the links point to nodes by zero-based position:

nodes <- data.frame(name=unique(c(links$source, links$target)))
links$source <- match(links$source, nodes$name) - 1
links$target <- match(links$target, nodes$name) - 1
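
To see what the `match(...) - 1` step does, here is a small sketch on made-up data, followed by a sanity check that every index is valid. The names `toy_links` and `toy_nodes` and their values are assumptions for illustration.

```r
# Toy links that still refer to nodes by name
toy_links <- data.frame(source = c("Year1_A", "Year1_B"),
                        target = c("Year2_A", "Year2_A"),
                        sum    = c(3, 2),
                        stringsAsFactors = FALSE)

# Every distinct name becomes one node
toy_nodes <- data.frame(name = unique(c(toy_links$source, toy_links$target)),
                        stringsAsFactors = FALSE)

# match() gives one-based row positions; subtracting 1 makes them zero-based
toy_links$source <- match(toy_links$source, toy_nodes$name) - 1
toy_links$target <- match(toy_links$target, toy_nodes$name) - 1

# Sanity check: no missing indices, and all of them fall inside the nodes table
stopifnot(!anyNA(toy_links$source), !anyNA(toy_links$target),
          all(toy_links$target < nrow(toy_nodes)))
```

A check like this is cheap to run before plotting, since an out-of-range or missing index is an easy mistake to make at this step.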

Then the diagram can be drawn:

library(networkD3)
sankeyNetwork(Links = links, Nodes = nodes, Source = "source",
              Target = "target", Value = "sum", NodeID = "name",
              fontSize = 12, nodeWidth = 30)

There might be more elegant solutions, but this could be a starting point for your problem. If you don't like the "Year..." in the nodes' names, you can remove it after setting up the data frames.
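
One way to strip the prefix, sketched on a small stand-in nodes table (the name `toy_nodes` and its values are made up), is a regular-expression substitution:

```r
# Small stand-in for the nodes table
toy_nodes <- data.frame(name = c("Year1_Hotel1", "Year2_Hotel1", "Year2_Hotel3"),
                        stringsAsFactors = FALSE)

# Drop the leading "YearN_" prefix. Do this only after the links were indexed,
# because the prefixed names are what keeps the per-year nodes distinct.
toy_nodes$name <- sub("^Year[0-9]+_", "", toy_nodes$name)
```

Because the links refer to nodes by position rather than by name, repeated labels such as "Hotel1" in several years should not affect how the links are drawn.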
