Plotly: How to draw a sankey diagram from a dataframe?

Problem description

I have a dataframe:

Vendor Name                 Category                    Count
AKJ Education               Books                       846888
AKJ Education               Computers & Tablets         1045
Amazon                      Books                       1294423
Amazon                      Computers & Tablets         42165
Amazon                      Other                       415
Flipkart                    Books                       1023

I am trying to draw a Sankey diagram from the above dataframe, with Vendor Name as the source, Category as the target, and Count as the flow (link width). I tried using Plotly, but without success. Does anyone have a solution with Plotly for making a Sankey diagram?

Thanks

Solution

The answer to the post How to define the structure of a sankey diagram using a dataframe? will show you that forcing your Sankey data sources into one dataframe may quickly lead to confusion. You'll be better off separating nodes from links since they are constructed differently.

So your node dataframe should look something like this:

ID               Label    Color
0        AKJ Education  #4994CE
1               Amazon  #8A5988
2             Flipkart  #449E9E
3                Books  #7FC241
4  Computers & tablets  #D3D3D3
5                Other  #4994CE

And your links dataframe should look like this:

Source  Target      Value      Link Color
0       3          846888      rgba(127, 194, 65, 0.2)
0       4            1045      rgba(127, 194, 65, 0.2)
1       3         1294423      rgba(211, 211, 211, 0.5)
1       4           42165      rgba(211, 211, 211, 0.5)
1       5             415      rgba(211, 211, 211, 0.5)
2       3            1023      rgba(253, 227, 212, 1)
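
The two tables above do not have to be typed by hand. Here is a minimal sketch of deriving them from the original three-column dataframe with pandas. The variable names (`df`, `labels`, `node_id`) are my own, the colors are left out, and it assumes no name appears both as a vendor and as a category:

```python
import pandas as pd

# The question's data, retyped by hand; column names follow the question.
df = pd.DataFrame({
    "Vendor Name": ["AKJ Education", "AKJ Education", "Amazon",
                    "Amazon", "Amazon", "Flipkart"],
    "Category": ["Books", "Computers & Tablets", "Books",
                 "Computers & Tablets", "Other", "Books"],
    "Count": [846888, 1045, 1294423, 42165, 415, 1023],
})

# Node labels: vendors first, then categories, each in order of first appearance.
# Assumption: a label never occurs in both columns (it would collide otherwise).
labels = list(df["Vendor Name"].unique()) + list(df["Category"].unique())
node_id = {label: i for i, label in enumerate(labels)}

# Node table: one row per label, ID is the position in the label list.
df_nodes = pd.DataFrame({"ID": list(range(len(labels))), "Label": labels})

# Link table: replace each name by its node ID, keep the count as the value.
df_links = pd.DataFrame({
    "Source": df["Vendor Name"].map(node_id),
    "Target": df["Category"].map(node_id),
    "Value": df["Count"],
})
```

With this ordering the vendors get IDs 0 to 2 and the categories IDs 3 to 5, which matches the node table above. Color columns can then be added to `df_nodes` and `df_links` in whatever way suits the plot.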

Now, if you use a setup similar to the Scottish referendum diagram on plot.ly, you'll be able to build this:

That particular diagram looks a bit odd because of the huge difference between the numbers. For illustrative purposes, I've replaced all your numbers with 1:

Here's the whole thing for easy copy and paste into a Jupyter Notebook:

# imports (only what the script below actually uses)
import pandas as pd
from plotly.offline import init_notebook_mode, iplot

# load plotly.js from a CDN so the figure renders inside the notebook
init_notebook_mode(connected=True)

# Nodes & links
nodes = [['ID', 'Label', 'Color'],
        [0,'AKJ Education','#4994CE'],
        [1,'Amazon','#8A5988'],
        [2,'Flipkart','#449E9E'],
        [3,'Books','#7FC241'],
        [4,'Computers & tablets','#D3D3D3'],
        [5,'Other','#4994CE'],]

# links with every value set to 1, for illustrative purposes
links = [['Source','Target','Value','Link Color'],

        # AKJ
        [0,3,1,'rgba(127, 194, 65, 0.2)'],
        [0,4,1,'rgba(127, 194, 65, 0.2)'],

        # Amazon
        [1,3,1,'rgba(211, 211, 211, 0.5)'],
        [1,4,1,'rgba(211, 211, 211, 0.5)'],
        [1,5,1,'rgba(211, 211, 211, 0.5)'],

        # Flipkart
        [2,5,1,'rgba(253, 227, 212, 1)'],
        [2,3,1,'rgba(253, 227, 212, 1)'],]

# links with your data (uncomment to use the real counts) ########
#links = [
#    ['Source','Target','Value','Link Color'],
#    
#    # AKJ
#    [0,3,846888,'rgba(127, 194, 65, 0.2)'],
#    [0,4,1045,'rgba(127, 194, 65, 0.2)'],
#    
#    # Amazon
#    [1,3,1294423,'rgba(211, 211, 211, 0.5)'],
#    [1,4,42165,'rgba(211, 211, 211, 0.5)'],
#    [1,5,415,'rgba(211, 211, 211, 0.5)'],
#    
#    # Flipkart (Books, as in the question's dataframe)
#    [2,3,1023,'rgba(253, 227, 212, 1)'],]
#################################################################


# Retrieve headers and build dataframes
nodes_headers = nodes.pop(0)
links_headers = links.pop(0)
df_nodes = pd.DataFrame(nodes, columns = nodes_headers)
df_links = pd.DataFrame(links, columns = links_headers)

# Sankey plot setup
data_trace = dict(
    type='sankey',
    domain = dict(
      x =  [0,1],
      y =  [0,1]
    ),
    orientation = "h",
    valueformat = ".0f",
    node = dict(
      pad = 10,
    # thickness = 30,
      line = dict(
        color = "black",
        width = 0
      ),
      label =  df_nodes['Label'].dropna(axis=0, how='any'),
      color = df_nodes['Color']
    ),
    link = dict(
      source = df_links['Source'].dropna(axis=0, how='any'),
      target = df_links['Target'].dropna(axis=0, how='any'),
      value = df_links['Value'].dropna(axis=0, how='any'),
      color = df_links['Link Color'].dropna(axis=0, how='any'),
  )
)

layout = dict(
    title = "Draw Sankey Diagram from dataframes",
    height = 772,
    font = dict(size = 10),
)

fig = dict(data=[data_trace], layout=layout)
iplot(fig, validate=False)
