Proportional Venn diagram for more than 3 sets
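Question
I have a set of documents in MongoDB, each of which has one or more categories in a list. Using map reduce, I can get the number of documents for each unique combination of categories: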
['cat1'] = 523
['cat2'] = 231
['cat3'] = 102
['cat4'] = 72
['cat1','cat2'] = 710
['cat1','cat3'] = 891
['cat1','cat3','cat4'] = 621 ...
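where each total is the number of documents that have exactly that combination of categories.
I'm looking for a sensible way to present this data, and I think a Venn diagram with proportional areas would be a good idea. Using the example above, the area of cat1 would be 523 + 710 + 891 + 621, the overlap between cat1 and cat3 would be 891 + 621, the overlap between cat1, cat3, and cat4 would be 621, and so on.
Does anyone have any tips on how I could achieve this? I would prefer to do it in Python (+ NumPy/Matplotlib) or MATLAB.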
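The area arithmetic described above can be sketched in a few lines of Python. This is only an illustration: the `exact_counts` dictionary and the `region_total` helper are hypothetical names, and only the seven combinations listed in the question are included (the entries elided by "..." are left out).

```python
# Exact-combination counts as listed in the question.
# Assumption: keys are frozensets of category names; elided rows are omitted.
exact_counts = {
    frozenset({"cat1"}): 523,
    frozenset({"cat2"}): 231,
    frozenset({"cat3"}): 102,
    frozenset({"cat4"}): 72,
    frozenset({"cat1", "cat2"}): 710,
    frozenset({"cat1", "cat3"}): 891,
    frozenset({"cat1", "cat3", "cat4"}): 621,
}


def region_total(categories, counts):
    """Sum the counts of every combination that contains all given categories."""
    wanted = frozenset(categories)
    return sum(n for combo, n in counts.items() if wanted <= combo)


print(region_total({"cat1"}, exact_counts))                  # 523 + 710 + 891 + 621 = 2745
print(region_total({"cat1", "cat3"}, exact_counts))          # 891 + 621 = 1512
print(region_total({"cat1", "cat3", "cat4"}, exact_counts))  # 621
```

With the full map-reduce output in place of the sample dictionary, the same helper would give the target area for any region of the diagram.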
The Problem
We need to represent counts for multiple interconnected categories of objects, and a Venn diagram cannot represent more than a handful of categories and their overlaps.
A Solution
Consider each of the categories and their combinations as a node in a graph. Draw the graph such that the size of each node represents the count for that category or combination, and the edges connect each combination to the categories it contains. The advantage of this approach is that multiple categories can be accommodated with ease; the result is a type of connected bubble chart.
The Result
The Code
The proposed solution uses NetworkX to create the data structure and matplotlib to draw it. If data is presented in the right format, this will scale to a large number of categories with multiple connections.
import networkx as nx
import matplotlib.pyplot as plt


def load_nodes():
    text = '''Node Size
1 523
2 231
3 102
4 72
1+2 710
1+3 891
1+3+4 621'''
    # load nodes into a list, discarding the header;
    # this may be replaced by some appropriate output
    # from your program
    rows = [line.split() for line in text.split('\n')[1:]]
    # each node is a (name, attribute dict) pair, as networkx expects
    return [(name, {'size': int(size)}) for name, size in rows]


def load_edges():
    text = '''From To
1+2 1
1+2 2
1+3 1
1+3 3
1+3+4 1
1+3+4 3
1+3+4 4'''
    # load edges into a list, discarding the header;
    # this may be replaced by some appropriate output
    # from your program
    return [tuple(line.split()) for line in text.split('\n')[1:]]


if __name__ == '__main__':
    scale_factor = 5
    G = nx.Graph()
    # add the nodes together with their 'size' attribute, then the edges
    G.add_nodes_from(load_nodes())
    G.add_edges_from(load_edges())
    # read the sizes back in the graph's own node order so that each
    # size is drawn on the node it belongs to
    node_sizes = [G.nodes[n]['size'] * scale_factor for n in G.nodes]
    nx.draw_networkx(G,
                     pos=nx.spring_layout(G),
                     node_size=node_sizes)
    plt.axis('off')
    plt.show()
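The hardcoded node and edge tables above could instead be derived from the exact-combination counts that the map-reduce step produces. The following is a minimal sketch under stated assumptions: `build_graph_data` is a hypothetical helper, and the counts are assumed to arrive as a dictionary keyed by tuples of category names.

```python
def build_graph_data(exact_counts):
    """Return (nodes, edges) in the same shape as load_nodes() and load_edges().

    Assumption: exact_counts maps a tuple of category names to the number
    of documents having exactly that combination.
    """
    sizes = {}
    edges = []
    for combo, count in exact_counts.items():
        name = "+".join(combo)
        sizes[name] = count
        if len(combo) > 1:
            for member in combo:
                # a category seen only inside combinations still gets a node
                sizes.setdefault(member, 0)
                edges.append((name, member))
    nodes = [(name, {"size": size}) for name, size in sizes.items()]
    return nodes, edges


# Sample data matching the tables above
exact_counts = {
    ("1",): 523,
    ("2",): 231,
    ("3",): 102,
    ("4",): 72,
    ("1", "2"): 710,
    ("1", "3"): 891,
    ("1", "3", "4"): 621,
}

nodes, edges = build_graph_data(exact_counts)
print(nodes)
print(edges)
```

The returned lists can be passed to `G.add_nodes_from` and `G.add_edges_from` in place of the parsed text tables.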
Other Solutions
Other solutions might include bubble charts, Voronoi diagrams, chord diagrams, and hive plots, among others. None of the linked examples use Python; they are given for illustrative purposes only.