Proportional Venn diagram for more than 3 sets


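Question

I have a set of documents in MongoDB, each of which has one or more categories in a list. Using map reduce, I can get the number of documents for each unique combination of categories: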

['cat1']               = 523
['cat2']               = 231
['cat3']               = 102
['cat4']               = 72
['cat1','cat2']        = 710
['cat1','cat3']        = 891
['cat1','cat3','cat4'] = 621 ...

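where each total is the number of documents with exactly that combination of categories.

I am looking for a sensible way to present this data, and I think a Venn diagram with proportional areas would be a good idea. Using the example above, the area of cat1 would be 523 + 710 + 891 + 621, the overlap between cat1 and cat3 would be 891 + 621, the overlap between cat1, cat3 and cat4 would be 621, and so on.

Does anyone have any tips on how I could achieve this? I would prefer to do it in Python (+ NumPy/Matplotlib) or MATLAB.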
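
The area arithmetic described in the question can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name `region_total` and the dictionary layout are hypothetical, and the counts are just the seven combinations listed above (the original list is truncated).

```python
from typing import Dict, FrozenSet, Iterable

# Exact-combination counts copied from the question.
# Assumption: only the seven listed combinations are included here.
exact_counts: Dict[FrozenSet[str], int] = {
    frozenset({"cat1"}): 523,
    frozenset({"cat2"}): 231,
    frozenset({"cat3"}): 102,
    frozenset({"cat4"}): 72,
    frozenset({"cat1", "cat2"}): 710,
    frozenset({"cat1", "cat3"}): 891,
    frozenset({"cat1", "cat3", "cat4"}): 621,
}


def region_total(categories: Iterable[str],
                 counts: Dict[FrozenSet[str], int]) -> int:
    """Sum the counts of every exact combination containing all given categories."""
    wanted = frozenset(categories)
    return sum(n for combo, n in counts.items() if wanted <= combo)


print(region_total(["cat1"], exact_counts))                  # 523 + 710 + 891 + 621
print(region_total(["cat1", "cat3"], exact_counts))          # 891 + 621
print(region_total(["cat1", "cat3", "cat4"], exact_counts))  # 621
```

Using `frozenset` keys makes the order of the categories irrelevant, so `['cat1','cat3']` and `['cat3','cat1']` refer to the same region.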

Answer

The Problem

We need to represent counts of multiple interconnected categories of object, and a Venn diagram cannot represent more than a handful of categories and their overlaps.

A Solution

Consider each of the categories and their combinations as a node in a graph. Draw the graph such that the size of the node represents the count in each category, and the edges connect the related categories. The advantage of this approach is: multiple categories can be accommodated with ease, and this becomes a type of connected bubble chart.
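
As a rough sketch of this idea, the node and edge lists could be derived directly from the combination counts rather than typed by hand. The helper name `build_graph_data` and the `"1+3+4"` label convention are assumptions made for illustration; the output uses the `(label, {'size': n})` node format and `(from, to)` edge format that NetworkX's `add_nodes_from` and `add_edges_from` accept.

```python
from typing import Dict, List, Tuple

Node = Tuple[str, Dict[str, int]]
Edge = Tuple[str, str]


def build_graph_data(counts: Dict[str, int]) -> Tuple[List[Node], List[Edge]]:
    """Turn {'1+3': 891, ...} into a node list and an edge list.

    Assumption: combination labels join single-category labels with '+'.
    """
    # One node per category or combination, carrying its count as 'size'.
    nodes = [(label, {"size": size}) for label, size in counts.items()]
    # Connect each combination to every single category it contains,
    # skipping categories that have no node of their own.
    edges = [(label, part)
             for label in counts if "+" in label
             for part in label.split("+") if part in counts]
    return nodes, edges


counts = {"1": 523, "2": 231, "3": 102, "4": 72,
          "1+2": 710, "1+3": 891, "1+3+4": 621}
nodes, edges = build_graph_data(counts)
print(edges)
```

Here a combination is linked only to its individual categories, not to smaller combinations such as `1+3`; that is one possible choice and keeps the edge count low.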

The Result

The Code

The proposed solution uses NetworkX to create the data structure and matplotlib to draw it. If data is presented in the right format, this will scale to a large number of categories with multiple connections.

import networkx as nx
import matplotlib.pyplot as plt

def load_nodes():
    text = '''  Node    Size
                1        523
                2        231
                3        102
                4         72
                1+2      710
                1+3      891
                1+3+4    621'''
    # load nodes into list, discard header
    # this may be replaced by some appropriate output 
    # from your program
    data = text.split('\n')[1:]
    data = [ d.split() for d in data ]
    data = [ tuple([ d[0], 
                    dict( size=int(d[1]) ) 
                    ]) for d in data]
    return data

def load_edges():
    text = '''  From   To
                1+2    1
                1+2    2
                1+3    1
                1+3    3
                1+3+4    1
                1+3+4    3
                1+3+4    4'''
    # load edges into list, discard header
    # this may be replaced by some appropriate output 
    # from your program
    data = text.split('\n')[1:]
    data = [ tuple( d.split() ) for d in data ]
    return data

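if __name__ == '__main__':
    scale_factor = 5
    G = nx.Graph()
    nodes = load_nodes()
    # add the nodes first so that each one carries its 'size' attribute
    G.add_nodes_from(nodes)

    edges = load_edges()
    G.add_edges_from(edges)

    # read the sizes back in the graph's own node order, so that each
    # size is matched to the correct node when drawing
    node_sizes = [attrs['size'] * scale_factor
                  for _, attrs in G.nodes(data=True)]

    nx.draw_networkx(G,
                     pos=nx.spring_layout(G),
                     node_size=node_sizes)
    plt.axis('off')
    plt.show()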

Other Solutions

Other solutions might include: bubble charts, Voronoi diagrams, chord diagrams, and hive plots among others. None of the linked examples use Python; they are just given for illustrative purposes.
