How to visualize (dendrogram) a dictionary of hierarchical items?


Question



This is my first time visualizing hierarchical data stored in a dictionary with Python. The last part of the data looks like this:

d = {^2820: [^391, ^1024], ^2821: [^759, 'w', ^118, ^51], ^2822: [^291, 'o'], ^2823: [^25, ^64], ^2824: [^177, ^2459], ^2825: [^338, ^1946], ^2826: [^186, ^1511], ^2827: [^162, 'i']}

So the lists contain indices that refer back to the keys (indices) of the dictionary. I suppose this could serve as the base structure for the visualization; please correct me if I'm wrong. The characters in the data are "end nodes/leaves", which don't refer back to any index.
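To make that reading concrete, here is a minimal sketch that separates the two kinds of items. It assumes the `^n` markers are written as plain integer keys (`^2820` -> `2820`), which is an assumption about the raw format; `split_items` is a hypothetical helper name. Since only the last part of the data is shown, none of the referenced indices appear as keys in this excerpt.

```python
# Assumption: ^n references are written as plain ints (^2820 -> 2820);
# one-character strings are leaves.
d = {2820: [391, 1024], 2821: [759, 'w', 118, 51], 2822: [291, 'o'],
     2823: [25, 64], 2824: [177, 2459], 2825: [338, 1946],
     2826: [186, 1511], 2827: [162, 'i']}

def split_items(rules):
    """Return (references, leaves) found in the rule bodies (hypothetical helper)."""
    references, leaves = set(), set()
    for body in rules.values():
        for item in body:
            # ints point back to a key; anything else is an end node
            if isinstance(item, int):
                references.add(item)
            else:
                leaves.add(item)
    return references, leaves

references, leaves = split_items(d)
# References whose rule is not part of this excerpt
missing = sorted(r for r in references if r not in d)
print(sorted(leaves))  # ['i', 'o', 'w']
```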

I found NetworkX, which could possibly be used for the visualization, but I have no idea where to start with it and my data. I was hoping it would be something as simple as:

import networkx as nx
import matplotlib.pyplot as plt

# ^n references written as plain integer keys (^2820 -> 2820); strings are leaves
d = {2820: [391, 1024], 2821: [759, 'w', 118, 51], 2822: [291, 'o'], 2823: [25, 64], 2824: [177, 2459], 2825: [338, 1946], 2826: [186, 1511], 2827: [162, 'i']}

G = nx.Graph(d)
nx.draw(G)
plt.show()

I'm looking for a hierarchical dendrogram or some sort of clustering as output. Sorry, at this point I'm not sure what the best visualization would be; maybe something similar to this: [example image not preserved]

UPDATE

Using NetworkX was actually very simple. I'm providing another simple sample dataset: can it also be visualized as a dendrogram instead of a wired network graph?

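import networkx as nx
import matplotlib.pyplot as plt
# original sequence: a,b,c,d,b,c,a,b,c,d,b,c (references ^1, ^2 written as the ints 1, 2)
d = {1: ['b', 'c'], 2: ['a', 1, 'd', 1], 'S': [2, 2]}
G = nx.Graph(d)
nx.draw_spring(G, node_size=300, with_labels=True)
plt.show()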

As we can see, the graph shows plain relations, but not the hierarchy and order of the data, which is what I want. A DiGraph gives more detail, but it is still not possible to reconstruct the original sequence from it: [DiGraph drawing not preserved]
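The order and repetition are still present in the dictionary itself: each list keeps its children in order, whereas a DiGraph stores the two references from ^2 to ^1 as a single edge. A minimal sketch, assuming ^1 and ^2 are written as ints and 'S' is the start symbol, that expands the rules recursively (`expand` is a hypothetical helper):

```python
# Assumption: ^1 and ^2 are written as the ints 1 and 2; 'S' is the start symbol.
d = {1: ['b', 'c'], 2: ['a', 1, 'd', 1], 'S': [2, 2]}

def expand(rules, symbol):
    """Recursively replace each reference by its body, keeping list order (hypothetical helper)."""
    body = rules.get(symbol)
    if not body:  # no entry or an empty list -> treat as a leaf
        return [symbol]
    out = []
    for item in body:
        out.extend(expand(rules, item))
    return out

print(','.join(expand(d, 'S')))  # a,b,c,d,b,c,a,b,c,d,b,c
```

Because an entry with an empty list is also treated as a leaf, the same function should work on the variant of the data where 'a', 'b', 'c' and 'd' map to [].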

For a dendrogram, the weights and end nodes apparently need to be calculated, as pointed out in the first answer. For that approach, the data structure could be something like this:

d = {'a': [], 'b': [], 'c': [], 'd': [], 1: ['b', 'c'], 2: ['a', 1, 'd', 1], 'S': [2, 2]}
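If the data starts out in the compact form without leaf entries, a small conversion can add them. This is a sketch under the same assumption (references written as plain ints); `add_leaf_entries` is a hypothetical name:

```python
# Assumption: references are plain ints (^1 -> 1); any item that is not a key is a leaf.
compact = {1: ['b', 'c'], 2: ['a', 1, 'd', 1], 'S': [2, 2]}

def add_leaf_entries(rules):
    """Return a new dict in which every leaf also maps to an empty list (hypothetical helper)."""
    full = {}
    for body in rules.values():
        for item in body:
            if item not in rules:
                full.setdefault(item, [])
    # Copy the original rules after the leaves; order and repeats inside each list are kept
    full.update({key: list(body) for key, body in rules.items()})
    return full

d = add_leaf_entries(compact)
print(d)
```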

Solution

One idea is to use SciPy's dendrogram function to draw your dendrogram. To do so, you just need to create the linkage matrix Z, which is described in the documentation of the SciPy linkage function. Each row [x, y, w, z] of the linkage matrix Z describes the weight w at which x and y merge to form a rooted subtree with z leaves.
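To see what a single row means, here is a hand-built linkage matrix for three leaves, checked with SciPy. It is a sketch with made-up heights (1.0 and 2.0); `is_valid_linkage` and `dendrogram` come from `scipy.cluster.hierarchy`, and `no_plot=True` only computes the layout without drawing:

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, is_valid_linkage

labels = ['a', 'b', 'c']  # the n = 3 leaves get cluster ids 0, 1, 2

# Each row [x, y, w, z] merges clusters x and y at height w into a new cluster
# with z leaves; the new cluster's id is n + row index (3, then 4).
Z = np.array([
    [0, 1, 1.0, 2],  # a + b -> cluster 3
    [3, 2, 2.0, 3],  # cluster 3 + c -> cluster 4 (the root)
], dtype=float)

is_valid_linkage(Z, throw=True)  # raises if the matrix is malformed

# 'ivl' lists the leaf labels in plot order
tree = dendrogram(Z, labels=labels, no_plot=True)
print(tree['ivl'])  # ['a', 'b', 'c']
```

New clusters are numbered from n upward in row order, which is why the second row can refer to cluster 3.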

To demonstrate, I've created a simple example: a small dendrogram with four leaves (a, b, c and d).

I created this visualization with the following code:

# Load required modules
import networkx as nx
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

# Construct the graph/hierarchy
d           = { 0: [1, 'd'], 1: ['a', 'b', 'c'], 'a': [], 'b': [], 'c': [], 'd': []}
G           = nx.DiGraph(d)
nodes       = G.nodes()
leaves      = set( n for n in nodes if G.out_degree(n) == 0 )
inner_nodes = [ n for n in nodes if G.out_degree(n) > 0 ]

# Compute the size of each subtree
subtree = dict( (n, [n]) for n in leaves )
for u in inner_nodes:
    children = set()
    node_list = list(d[u])
    while len(node_list) > 0:
        v = node_list.pop(0)
        children.add( v )
        node_list += d[v]

    subtree[u] = sorted(children & leaves)

inner_nodes.sort(key=lambda n: len(subtree[n])) # <-- order inner nodes ascending by subtree size, root is last

# Construct the linkage matrix
leaves = sorted(leaves)
index  = dict( (tuple([n]), i) for i, n in enumerate(leaves) )
Z = []
k = len(leaves)
for n in inner_nodes:
    children = d[n]
    x = children[0]
    for y in children[1:]:
        z = tuple(subtree[x] + subtree[y])
        i, j = index[tuple(subtree[x])], index[tuple(subtree[y])]
        Z.append([i, j, float(len(subtree[n])), len(z)]) # <-- float is required by the dendrogram function
        index[z] = k
        subtree[z] = list(z)
        x = z
        k += 1

# Visualize
dendrogram(Z, labels=leaves)
plt.show()

There are a few key items to note:

  1. Given the d data structure, I use a NetworkX directed graph (DiGraph). The directionality is important so we can determine which nodes are leaves (no children -> out-degree of zero) and which are inner_nodes (at least one child -> non-zero out-degree).
  2. Usually there is some weight associated with each edge in your dendrogram, but there weren't any weights in your example. Instead, I used the number of leaves in the subtree rooted at each internal node n as the weight for n.
  3. If an inner node has more than two children, you have to add "dummy" internal nodes, since each row of the linkage matrix merges exactly two nodes. This is why the loop runs for y in children[1:]: each child after the first is merged into the running cluster x.
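Point 3 in isolation: a sketch of how one merge with k children becomes k - 1 pairwise rows. The ids follow the example code (leaves a, b, c, d are 0 to 3, so new clusters start at 4); `binarize` is a hypothetical helper:

```python
def binarize(children, first_new_id):
    """Split one merge with k children into k - 1 pairwise merges (hypothetical helper).

    Returns (pairs, root_id); each pair (x, y, new_id) merges two clusters, and
    every new_id except the last acts as a "dummy" internal node.
    """
    pairs = []
    x = children[0]
    new_id = first_new_id
    for y in children[1:]:
        pairs.append((x, y, new_id))
        x = new_id  # the merged cluster is combined with the next child
        new_id += 1
    return pairs, x

# Node 1 from the example: children a, b, c have ids 0, 1, 2; new ids start at 4
pairs, root = binarize([0, 1, 2], first_new_id=4)
print(pairs, root)  # [(0, 1, 4), (4, 2, 5)] 5
```

These pairs match the first two rows the example code appends to Z; cluster 4 is the dummy node and cluster 5 stands for node 1.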

I'm guessing you may be able to simplify this code based on what your data looks like before creating the d in your example, so this may be more of a proof of concept.
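As one possible simplification, here is a sketch that builds the same kind of linkage matrix by recursing over d directly, without NetworkX or the subtree sort. Assumptions: d is a tree (no node reused), the root is known, and a node with no entry or an empty list is a leaf; `build_linkage` is a hypothetical name. As in the answer, each row's weight is the leaf count of the inner node being built.

```python
def build_linkage(d, root):
    """Return (Z, leaves): a SciPy-style linkage matrix for the tree in d (hypothetical helper).

    Assumes d is a tree (no node reused) and that a node with no entry or an
    empty list is a leaf.
    """
    # Pass 1: number the leaves 0..n-1 in left-to-right order
    leaves = []

    def collect(node):
        children = d.get(node, [])
        if not children:
            leaves.append(node)
        for child in children:
            collect(child)

    collect(root)
    n = len(leaves)
    leaf_id = {leaf: i for i, leaf in enumerate(leaves)}

    # Pass 2: emit one row per pairwise merge; new clusters are numbered n, n+1, ...
    Z = []

    def visit(node):
        """Return (cluster_id, leaf_count) for the subtree rooted at node."""
        children = d.get(node, [])
        if not children:
            return leaf_id[node], 1
        results = [visit(child) for child in children]
        total = sum(count for _, count in results)
        x, x_count = results[0]
        for y, y_count in results[1:]:
            x_count += y_count
            Z.append([x, y, float(total), x_count])  # float weight, as SciPy expects
            x = n + len(Z) - 1  # id of the cluster just created
        return x, total

    visit(root)
    return Z, leaves

d = {0: [1, 'd'], 1: ['a', 'b', 'c'], 'a': [], 'b': [], 'c': [], 'd': []}
Z, leaves = build_linkage(d, 0)
print(Z)  # [[0, 1, 3.0, 2], [4, 2, 3.0, 3], [5, 3, 4.0, 4]]
```

Z and leaves can then be passed to dendrogram(Z, labels=leaves) as in the code above. This sketch does not handle reused rules such as ^1 appearing twice in ^2; those subtrees would likely have to be expanded into separate copies first.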
