Create a tree structure from a graph

Problem description

I'm trying to find the right approach to graphing a dataset that contains information on the amount of time users typically spend in various locations. Importantly, my data has categories and subcategories with increasing levels of granularity (for example, 60% of people are at "home", and of those 40% are in the "living room"). I am aware of TreeMaps, which would display the information and relationships I need, but I have been asked to make a "network" visualization of the data.

What I specifically am looking for is a graphing approach in Python that would allow me to visualize my data with the nodes (better yet, the node labels) automatically sized according to the number of users that fall within its category. Importantly, all the child node counts would also be counted in the parent nodes as well (so dendrograms aren't really an option because I need to display information at every branching point).

My data looks somewhat like this (note that some locations get more granular than others):

| ID | BUILDING | subcat01  | subcat02 |
----------------------------------------
| 00 |  home    | kitchen   | fridge   |
| 01 |  office  | desk      | NaN      |
| 02 |  office  | reception | NaN      |
| 03 |  home    | bedroom   | bed      |
| 04 |  home    | yard      | NaN      |
| 05 |  home    | livingroom| couch    |
| 06 |  office  | conf_room | NaN      |
| 07 | outdoors | NaN       | NaN      |
|... | ...      | ...       | ...      |

For a very rough approximation of what I want to produce, see the image below. The important thing is that I'm able to size the nodes according to the sum of their children (or just themselves if it's an end node). I will be running lots of iterations with different filters, so I need something that I can easily iterate on rather than manually coding the appearance of each graph.

Any suggestions on which Python libraries might best accomplish this? I've briefly looked into networkX, graph-tool, and etetoolkit, but I'm not sure if any of them have exactly the functionality I'm seeking.

Here's a rough approximation of what I want to produce:

Solution

To generate the graph, you could treat each row as a path in a directed graph. A simple way is to load the data into a pandas DataFrame and stack it to remove the missing values:

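import networkx as nx
from networkx.drawing.nx_agraph import graphviz_layout
from matplotlib import rcParams
import pandas as pd
#df = pd.read_csv....
# Stack the location columns into one long Series, drop missing levels,
# then collect each row's remaining values into a list (one path per row).
# The explicit dropna() covers pandas versions where stack() keeps NaNs.
paths = (df.loc[:, 'BUILDING':].stack().dropna()
           .groupby(level=0).agg(list).values.tolist())
# [['home', 'kitchen', 'fridge'], ['office', 'desk'], ['office', 'reception'],...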

Note that stack is useful here since it drops NaNs (newer pandas versions may keep them, in which case chain .dropna() after it); we can then just groupby on the index and aggregate as lists. Then create a directed graph and add the paths with nx.add_path:

G = nx.DiGraph()
for path in paths:
    nx.add_path(G, path)

Now, to visualize the graph with a tree-like layout, we could use graphviz_layout, which is basically a wrapper for pygraphviz_layout (so it requires pygraphviz and Graphviz to be installed):

rcParams['figure.figsize'] = 14, 10
pos=graphviz_layout(G, prog='dot')
nx.draw(G, pos=pos,
        node_color='lightgreen', 
        node_size=1500,
        with_labels=True, 
        arrows=True)
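
The drawing above uses a fixed node_size, while the question asks for nodes sized by how many users fall under them, with children counted in their parents. One possible way to get there is sketched below: count how many rows pass through each label and scale the sizes from that. The helper names count_rows_per_node and node_sizes, the base_size value, and the hard-coded sample paths are assumptions made for illustration, not part of the original answer.

```python
from collections import Counter

# Sample rows from the question's table, with missing levels already dropped.
paths = [
    ["home", "kitchen", "fridge"],
    ["office", "desk"],
    ["office", "reception"],
    ["home", "bedroom", "bed"],
    ["home", "yard"],
    ["home", "livingroom", "couch"],
    ["office", "conf_room"],
    ["outdoors"],
]


def count_rows_per_node(paths):
    """Count how many rows pass through each node (parents include children)."""
    counts = Counter()
    for path in paths:
        # set() guards against a label repeating within a single row.
        counts.update(set(path))
    return counts


def node_sizes(nodes, counts, base_size=500):
    """Return one size per node, in the given node order (hypothetical helper)."""
    return [base_size * counts[node] for node in nodes]


counts = count_rows_per_node(paths)
sizes = node_sizes(["home", "office", "outdoors", "fridge"], counts)
print(counts["home"], counts["office"], counts["outdoors"])
print(sizes)
```

With the graph G and positions pos from above, something like nx.draw(G, pos=pos, node_size=node_sizes(G.nodes, counts), with_labels=True) should then scale each node, since nx.draw accepts a list of sizes in node order. This assumes each row stands for one user and that labels are unique across branches: a "desk" under both home and office would be merged into a single node.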

If you wanted to add a common source node for all buildings, you could insert a column named ALL right after ID:

df.insert(1, 'ALL', 'ALL')
paths = (df.loc[:, 'ALL':].stack().dropna()
           .groupby(level=0).agg(list).values.tolist())

Then build and draw the graph as above; all buildings now hang off a single ALL root node.
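
As a quick sanity check on the structure, networkx can confirm whether the result is a forest of separate trees or a single rooted tree. The sketch below rebuilds the graph from hard-coded sample paths, with and without a shared root; build_graph and its root parameter are hypothetical names used only for this illustration.

```python
import networkx as nx

# Sample rows from the question's table, with missing levels already dropped.
paths = [
    ["home", "kitchen", "fridge"],
    ["office", "desk"],
    ["office", "reception"],
    ["home", "bedroom", "bed"],
    ["home", "yard"],
    ["home", "livingroom", "couch"],
    ["office", "conf_room"],
    ["outdoors"],
]


def build_graph(paths, root=None):
    """Build a DiGraph from row paths, optionally under a shared root node."""
    G = nx.DiGraph()
    for path in paths:
        full_path = [root, *path] if root is not None else path
        nx.add_path(G, full_path)
    return G


forest = build_graph(paths)            # one tree per building
tree = build_graph(paths, root="ALL")  # everything under a single root

# A branching is a directed forest; an arborescence is a single rooted tree.
print(nx.is_branching(forest), nx.is_arborescence(forest))
print(nx.is_arborescence(tree))

# Nodes with no incoming edge are the roots.
roots = [node for node, degree in tree.in_degree() if degree == 0]
print(roots)
```

Without the shared root, the buildings form separate trees, so the graph is a branching but not an arborescence; adding ALL connects them into one rooted tree, which is usually what tree layouts such as dot display most cleanly.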

Note that there are several other graphviz layout programs that may come closer to what you have in mind. For instance, circo:

pos=graphviz_layout(G, prog='circo')
nx.draw(G, pos=pos,
        node_color='lightgreen', 
        node_size=1500,
        with_labels=True, 
        arrows=True)
