Draw Sankey Diagram from dataframe

Question

I have a dataframe:

Vendor Name                 Category                    Count
AKJ Education               Books                       846888
AKJ Education               Computers & Tablets         1045
Amazon                      Books                       1294423
Amazon                      Computers & Tablets         42165
Amazon                      Other                       415
Flipkart                    Books                       1023

I am trying to draw a Sankey diagram from the above dataframe, with Vendor Name as the source, Category as the target, and Count as the flow (link width). I tried using Plotly, but without success. Does anyone have a solution for making a Sankey diagram with Plotly?

Thanks

Recommended answer

The answer to the post "How to define the structure of a sankey diagram using a dataframe?" shows that forcing your Sankey data sources into a single dataframe can quickly lead to confusion. You are better off keeping nodes and links separate, since they are constructed differently: nodes are a list of labels (and colors), while links are source/target/value rows that refer to nodes by their index in that list.

So your node dataframe should look something like this:

ID               Label    Color
0        AKJ Education  #4994CE
1               Amazon  #8A5988
2             Flipkart  #449E9E
3                Books  #7FC241
4  Computers & tablets  #D3D3D3
5                Other  #4994CE
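Rather than typing the node table by hand, you could derive it from the original dataframe. The following is a minimal sketch, assuming the question's columns are named `Vendor Name`, `Category` and `Count`, and that no vendor name also appears as a category. The palette is a hypothetical choice that reuses the hex colors above and repeats when there are more nodes than colors.

```python
import pandas as pd
from itertools import cycle

# The question's data, reproduced as a DataFrame
df = pd.DataFrame({
    'Vendor Name': ['AKJ Education', 'AKJ Education', 'Amazon', 'Amazon', 'Amazon', 'Flipkart'],
    'Category': ['Books', 'Computers & Tablets', 'Books', 'Computers & Tablets', 'Other', 'Books'],
    'Count': [846888, 1045, 1294423, 42165, 415, 1023],
})

# Sources (vendors) first, then targets (categories), each in order of first appearance
labels = list(df['Vendor Name'].unique()) + list(df['Category'].unique())

# Hypothetical palette; cycle() repeats it if there are more nodes than colors
palette = cycle(['#4994CE', '#8A5988', '#449E9E', '#7FC241', '#D3D3D3'])

df_nodes = pd.DataFrame({
    'ID': range(len(labels)),
    'Label': labels,
    'Color': [next(palette) for _ in labels],
})
print(df_nodes)
```

Because vendors are listed before categories, every source ID ends up smaller than every target ID, which keeps the left/right split of the diagram predictable.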

And your links dataframe should look like this:

Source  Target      Value      Link Color
0       3          846888      rgba(127, 194, 65, 0.2)
0       4            1045      rgba(127, 194, 65, 0.2)
1       3         1294423      rgba(211, 211, 211, 0.5)
1       4           42165      rgba(211, 211, 211, 0.5)
1       5             415      rgba(211, 211, 211, 0.5)
2       3            1023      rgba(253, 227, 212, 1)
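The link table can be generated the same way by mapping each name to its node ID. This is a sketch under the same column-name assumptions; `node_id` and `vendor_colors` are hypothetical helper dicts, and the per-vendor link colors are copied from the table above. Mapping from the raw rows sends Flipkart to Books (ID 3) with its count of 1023, as in the question's data.

```python
import pandas as pd

# The question's data, reproduced as a DataFrame
df = pd.DataFrame({
    'Vendor Name': ['AKJ Education', 'AKJ Education', 'Amazon', 'Amazon', 'Amazon', 'Flipkart'],
    'Category': ['Books', 'Computers & Tablets', 'Books', 'Computers & Tablets', 'Other', 'Books'],
    'Count': [846888, 1045, 1294423, 42165, 415, 1023],
})

# Node IDs: vendors first, then categories (same ordering as the node table)
labels = list(df['Vendor Name'].unique()) + list(df['Category'].unique())
node_id = {label: i for i, label in enumerate(labels)}

# Hypothetical per-vendor link colors, taken from the table above
vendor_colors = {
    'AKJ Education': 'rgba(127, 194, 65, 0.2)',
    'Amazon': 'rgba(211, 211, 211, 0.5)',
    'Flipkart': 'rgba(253, 227, 212, 1)',
}

# One link per row of the original dataframe
df_links = pd.DataFrame({
    'Source': df['Vendor Name'].map(node_id),
    'Target': df['Category'].map(node_id),
    'Value': df['Count'],
    'Link Color': df['Vendor Name'].map(vendor_colors),
})
print(df_links)
```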

Now, if you use a setup similar to the Scottish referendum diagram on plot.ly, you'll be able to build a Sankey diagram from these two dataframes.

That particular diagram looks a bit odd because of the huge difference between the numbers. For illustrative purposes, I've replaced all your numbers with 1:
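Replacing every value with 1 hides the real proportions entirely. If you would rather keep the ordering while taming the range, one option (an alternative, not what this answer does) is to draw a log-scaled value. The sketch below assumes pandas and NumPy; `np.log10(count + 1)` is an arbitrary choice, and link widths are then no longer proportional to the counts.

```python
import numpy as np
import pandas as pd

counts = pd.Series([846888, 1045, 1294423, 42165, 415, 1023])

# Ratio between the largest and smallest link: the reason the raw plot looks odd
print(counts.max() / counts.min())

# Log-scale for drawing; the +1 keeps a count of 0 at width 0 instead of -inf
scaled = np.log10(counts + 1)
print(scaled.round(2).tolist())
```

The largest link is roughly 3,000 times the smallest, while after scaling the ratio drops below 3. If you feed `scaled.tolist()` into the link values, the hover text shows the scaled numbers, so you may want to supply the raw counts separately (for example as link labels).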

Here's the whole thing for easy copying and pasting into a Jupyter Notebook:

# imports
import pandas as pd
from plotly.offline import init_notebook_mode, iplot
init_notebook_mode(connected=True)

# Nodes: vendors (sources) first, then categories (targets)
nodes = [['ID', 'Label', 'Color'],
         [0, 'AKJ Education', '#4994CE'],
         [1, 'Amazon', '#8A5988'],
         [2, 'Flipkart', '#449E9E'],
         [3, 'Books', '#7FC241'],
         [4, 'Computers & tablets', '#D3D3D3'],
         [5, 'Other', '#4994CE']]

# Links with every value set to 1, for illustrative purposes
links = [['Source', 'Target', 'Value', 'Link Color'],

         # AKJ
         [0, 3, 1, 'rgba(127, 194, 65, 0.2)'],
         [0, 4, 1, 'rgba(127, 194, 65, 0.2)'],

         # Amazon
         [1, 3, 1, 'rgba(211, 211, 211, 0.5)'],
         [1, 4, 1, 'rgba(211, 211, 211, 0.5)'],
         [1, 5, 1, 'rgba(211, 211, 211, 0.5)'],

         # Flipkart
         [2, 3, 1, 'rgba(253, 227, 212, 1)']]

# Links with your actual data (uncomment to use) ##################
#links = [
#    ['Source', 'Target', 'Value', 'Link Color'],
#
#    # AKJ
#    [0, 3, 846888, 'rgba(127, 194, 65, 0.2)'],
#    [0, 4, 1045, 'rgba(127, 194, 65, 0.2)'],
#
#    # Amazon
#    [1, 3, 1294423, 'rgba(211, 211, 211, 0.5)'],
#    [1, 4, 42165, 'rgba(211, 211, 211, 0.5)'],
#    [1, 5, 415, 'rgba(211, 211, 211, 0.5)'],
#
#    # Flipkart
#    [2, 3, 1023, 'rgba(253, 227, 212, 1)']]
###################################################################

# Split off the header rows and build the dataframes
nodes_headers = nodes.pop(0)
links_headers = links.pop(0)
df_nodes = pd.DataFrame(nodes, columns=nodes_headers)
df_links = pd.DataFrame(links, columns=links_headers)

# Sankey plot setup; each column is passed as a plain list
data_trace = dict(
    type='sankey',
    domain=dict(x=[0, 1], y=[0, 1]),
    orientation='h',
    valueformat='.0f',
    node=dict(
        pad=10,
        # thickness=30,
        line=dict(color='black', width=0),
        label=df_nodes['Label'].tolist(),
        color=df_nodes['Color'].tolist(),
    ),
    link=dict(
        # Links refer to nodes by their ID (row position in df_nodes)
        source=df_links['Source'].tolist(),
        target=df_links['Target'].tolist(),
        value=df_links['Value'].tolist(),
        color=df_links['Link Color'].tolist(),
    ),
)

layout = dict(
    title='Draw Sankey Diagram from dataframes',
    height=772,
    font=dict(size=10),
)

fig = dict(data=[data_trace], layout=layout)
iplot(fig, validate=False)
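Before plotting, a quick consistency check can catch mistakes such as a link pointing at the wrong category: every link should reference an existing node ID, and each vendor's outgoing total should equal its total count in the original data. This is a minimal sketch; `check_links` is a hypothetical helper, and the column names are assumed to match the tables above.

```python
import pandas as pd


def check_links(df_nodes, df_links, df_raw):
    """Hypothetical helper: sanity-check Sankey node/link tables against the raw data."""
    ids = set(df_nodes['ID'])
    # 1. Every link must start and end at an existing node ID
    unknown = df_links[~df_links['Source'].isin(ids) | ~df_links['Target'].isin(ids)]
    if not unknown.empty:
        raise ValueError(f'links reference unknown node IDs:\n{unknown}')
    # 2. Each vendor's outgoing flow must equal its total count in the raw data
    label_of = dict(zip(df_nodes['ID'], df_nodes['Label']))
    outflow = df_links.groupby(df_links['Source'].map(label_of))['Value'].sum().to_dict()
    expected = df_raw.groupby('Vendor Name')['Count'].sum().to_dict()
    if outflow != expected:
        raise ValueError(f'flow mismatch: {outflow} != {expected}')
    return True


df_raw = pd.DataFrame({
    'Vendor Name': ['AKJ Education', 'AKJ Education', 'Amazon', 'Amazon', 'Amazon', 'Flipkart'],
    'Category': ['Books', 'Computers & Tablets', 'Books', 'Computers & Tablets', 'Other', 'Books'],
    'Count': [846888, 1045, 1294423, 42165, 415, 1023],
})
df_nodes = pd.DataFrame({
    'ID': range(6),
    'Label': ['AKJ Education', 'Amazon', 'Flipkart', 'Books', 'Computers & Tablets', 'Other'],
})
df_links = pd.DataFrame({
    'Source': [0, 0, 1, 1, 1, 2],
    'Target': [3, 4, 3, 4, 5, 3],
    'Value': [846888, 1045, 1294423, 42165, 415, 1023],
})
print(check_links(df_nodes, df_links, df_raw))  # prints True
```

Run it against the real-data links rather than the all-ones illustration, since the latter deliberately differs from the raw counts and would fail the flow check.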
