Sankey图,其中节点之间的边缘对应于N3列 [英] Sankey plot where edges between nodes correspond to an N3 column

查看:505
本文介绍了Sankey图,其中节点之间的边缘对应于N3列的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想根据下面生成的数据结构绘制一张sankey图,其中节点之间的边对应于 N3 列,并且它们的存在和厚度取决于列。对于下面的虚拟数据,情节看起来像


I would like to draw a sankey plot based on the below generated data structure where the edges between nodes correspond to an N3 column and their presence and thickness depend on the Value column. For the below dummy data, the plot would look like this (but with edge thickness corresponding to the value in the Value column). I haven't seen any example of sankey plots built like this. I've tried different options using the riverplot package, but as it doesn't seem to be able to handle the N3 column, it removes all duplicates of, e.g., edges between A and C.

set.seed(123)    

mat <- matrix(rbinom(20,100,0.01),4,5,dimnames=list(LETTERS[1:4],letters[1:5]))
mat

#   a b c d e
# A 0 3 1 1 0
# B 2 0 1 1 0
# C 1 1 3 0 0
# D 2 2 1 2 3

rowKey <- c("A"="N1","B"="N1","C"="N2","D"="N2")

edges = expand.grid(c(split(names(rowKey), rowKey), list(N3 = colnames(mat))))

edges2 = sapply(1:nrow(edges), function(i)
mat[row.names(mat) == edges$N1[i] | row.names(mat) == edges$N2[i],
    colnames(mat) == edges$N3[i]])

edges$Value = colSums(edges2) * (colSums(edges2 > 0) == nrow(edges2))
edges

#   N1 N2 N3 Value
#1   A  C  a     0
#2   B  C  a     3
#3   A  D  a     0
#4   B  D  a     4
#5   A  C  b     4
#6   B  C  b     0
#7   A  D  b     5
#8   B  D  b     0
#9   A  C  c     4
#10  B  C  c     4
#11  A  D  c     2
#12  B  D  c     2
#13  A  C  d     0
#14  B  C  d     0
#15  A  D  d     3
#16  B  D  d     3
#17  A  C  e     0
#18  B  C  e     0
#19  A  D  e     0
#20  B  D  e     0


# Plotting a sankey plot using the riverplot package
require(riverplot)
require(RColorBrewer)

nodes = data.frame(ID = unique(c(as.character(edges$N1),      
as.character(edges$N2))), stringsAsFactors = FALSE)
nodes$x <- c(rep(1,2),rep(2,2))
nodes$y <- c(0:1,0:1)

palette = paste0(brewer.pal(3, "Set1"), "60")
styles = lapply(nodes$y, function(n) {
  list(col = palette[n+1], lty = 0, textcol = "black")
})
names(styles) = nodes$ID

rp <- list(nodes=nodes, edges=edges[,-3], styles=styles)
class(rp) <- c(class(rp), "riverplot")
plot(rp, plot_area = 0.95, yscale=0.06, srt=0)

# Warning message:
# In checkedges(x2$edges, names(x2)) :
# duplicated edge information, removing 16 edges 

解决方案

Here's a solution using the geom_parallel_sets() from the ggforce package

devtools::install_github('thomasp85/ggforce')

edges1 <- gather_set_data(edges, 1:2)

ggplot(edges1, aes(x, id = id, split = y, value = Value)) +
      geom_parallel_sets(aes(fill = N3), alpha = 0.3, axis.width = 0.1) +
      geom_parallel_sets_axes(axis.width = 0.1) +
      geom_parallel_sets_labels(colour = 'white')

这篇关于Sankey图,其中节点之间的边缘对应于N3列的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆