How to improve the label placement for matplotlib scatter chart (code,algorithm,tips)?

Problem description

I use matplotlib to plot a scatter chart:

I label each bubble with a transparent box, following the tip in the question "matplotlib: how to annotate point on a scatter automatically placed arrow?".

Here is the code:

if show_annote:
    for i in range(len(x)):
        annote_text = annotes[i][0][0]  # STK_ID
        ax.annotate(annote_text, xy=(x[i], y[i]), xytext=(-10,3),
            textcoords='offset points', ha='center', va='bottom',
            bbox=dict(boxstyle='round,pad=0.2', fc='yellow', alpha=0.2),
            fontproperties=ANNOTE_FONT) 

and the resulting plot:

But there is still room for improvement to reduce overlap (for instance the label box offset is fixed as (-10,3)). Are there algorithms that can:

  1. dynamically change the offset of the label box according to how crowded its neighbourhood is
  2. dynamically place the label box further away and draw an arrow line between the bubble and the label box
  3. change the label orientation where that helps
  4. prefer a label box overlapping a bubble over a label box overlapping another label box?

I just want to make the chart easy for human eyes to comprehend, so some overlap is OK; the constraint is not as rigid as http://en.wikipedia.org/wiki/Automatic_label_placement suggests. The number of bubbles in the chart is less than 150 most of the time.

I find the so-called force-based label placement at http://bl.ocks.org/MoritzStefaner/1377729 quite interesting. I don't know whether any Python code or package is available that implements the algorithm.

I am not an academic and am not looking for an optimal solution. My Python code needs to label a large number of charts, so speed and memory use are also considerations.

I am looking for a quick and effective solution. Any help (code,algorithm,tips,thoughts) on this subject? Thanks.

Solution

The following builds on tcaswell's answer.

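Networkx layout methods such as nx.spring_layout rescale the positions so that they all fit in a unit square (by default). Even the positions of the fixed data_nodes are rescaled. So, to apply the pos to the original scatter_data, an unshifting and unscaling must be performed. (This is the behaviour observed when the answer was written; some later networkx releases appear to skip the rescaling when fixed nodes are given, in which case the correction is simply a no-op.)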
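To make the unshifting and unscaling step concrete, here is a minimal sketch that uses only NumPy. The rescaling (true_scale, true_shift) is a made-up stand-in for whatever a layout routine might apply, not the transform networkx actually uses; the point is only that a degree-1 np.polyfit per axis recovers the mapping back to the original coordinates from the fixed points alone.

```python
import numpy as np

rng = np.random.default_rng(0)
original = rng.random((5, 2)) * 10   # stand-in for the fixed data points

# Hypothetical rescaling, for illustration only.
true_scale = 0.1
true_shift = np.array([-0.5, -0.3])
rescaled = original * true_scale + true_shift

# Fit original = a * rescaled + b, separately for the x and y columns.
a_x, b_x = np.polyfit(rescaled[:, 0], original[:, 0], 1)
a_y, b_y = np.polyfit(rescaled[:, 1], original[:, 1], 1)

# Apply the fitted per-axis scale and shift to map back to data coordinates.
recovered = rescaled * np.array([a_x, a_y]) + np.array([b_x, b_y])
print(np.allclose(recovered, original))
```

Because the same fitted scale and shift can then be applied to every node, the label positions land in the same coordinate system as the scatter data.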

Note also that nx.spring_layout has a k parameter which controls the optimal distance between nodes. As k increases, so does the distance of the annotations from the data points.

import numpy as np
import matplotlib.pyplot as plt
import networkx as nx
np.random.seed(2016)

N = 20
scatter_data = np.random.rand(N, 3)*10


def repel_labels(ax, x, y, labels, k=0.01):
    G = nx.DiGraph()
    data_nodes = []
    init_pos = {}
    for xi, yi, label in zip(x, y, labels):
        data_str = 'data_{0}'.format(label)
        G.add_node(data_str)
        G.add_node(label)
        G.add_edge(label, data_str)
        data_nodes.append(data_str)
        init_pos[data_str] = (xi, yi)
        init_pos[label] = (xi, yi)

    pos = nx.spring_layout(G, pos=init_pos, fixed=data_nodes, k=k)

    # undo spring_layout's rescaling
    pos_after = np.vstack([pos[d] for d in data_nodes])
    pos_before = np.vstack([init_pos[d] for d in data_nodes])
    # fit a separate scale and shift for each axis
    scale_x, shift_x = np.polyfit(pos_after[:,0], pos_before[:,0], 1)
    scale_y, shift_y = np.polyfit(pos_after[:,1], pos_before[:,1], 1)
    scale = np.array([scale_x, scale_y])
    shift = np.array([shift_x, shift_y])
    for key, val in pos.items():
        pos[key] = (val*scale) + shift

    for label, data_str in G.edges():
        ax.annotate(label,
                    xy=pos[data_str], xycoords='data',
                    xytext=pos[label], textcoords='data',
                    arrowprops=dict(arrowstyle="->",
                                    shrinkA=0, shrinkB=0,
                                    connectionstyle="arc3", 
                                    color='red'), )
    # expand limits by 15% of each axis span so the labels stay in view
    all_pos = np.vstack(list(pos.values()))
    span = np.ptp(all_pos, axis=0)
    mins = all_pos.min(axis=0) - span*0.15
    maxs = all_pos.max(axis=0) + span*0.15
    ax.set_xlim([mins[0], maxs[0]])
    ax.set_ylim([mins[1], maxs[1]])

fig, ax = plt.subplots()
ax.scatter(scatter_data[:, 0], scatter_data[:, 1],
           c=scatter_data[:, 2], s=scatter_data[:, 2] * 150)
labels = ['ano_{}'.format(i) for i in range(N)]
repel_labels(ax, scatter_data[:, 0], scatter_data[:, 1], labels, k=0.008)

plt.show()

With k=0.011 and with k=0.008 the script yields two different layouts [figures omitted].
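The role of k can also be seen in isolation with a small NumPy-only sketch. This is a simplified spring-style update written for illustration, not networkx's implementation: the function name repel, the step size, the cooling schedule and the starting jitter are all assumptions. It uses a repulsion of k²/d between every pair of nodes and an attraction of d²/k along each label-to-point link, with the data points held fixed.

```python
import numpy as np

def repel(data, k, iterations=100, step=0.1, seed=0):
    """Return one label position per fixed data point (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = len(data)
    # Labels start almost on top of their data point, with a tiny jitter
    # so that the repulsive force has a direction to act along.
    labels = data + rng.normal(scale=1e-3, size=data.shape)
    pos = np.vstack([data, labels])      # rows 0..n-1 fixed, rows n..2n-1 free
    # Link label i to data point i.
    adj = np.zeros((2 * n, 2 * n))
    idx = np.arange(n)
    adj[idx, idx + n] = adj[idx + n, idx] = 1.0
    for it in range(iterations):
        cap = 0.1 * (1 - it / iterations)            # maximum move, cooling down
        delta = pos[:, None, :] - pos[None, :, :]
        dist = np.clip(np.linalg.norm(delta, axis=-1), 1e-3, None)
        # Repulsion k^2/d for all pairs, attraction d^2/k along links.
        coeff = k * k / dist**2 - adj * dist / k
        np.fill_diagonal(coeff, 0.0)
        move = step * np.einsum('ijk,ij->ik', delta, coeff)
        length = np.linalg.norm(move, axis=1, keepdims=True)
        move *= np.minimum(1.0, cap / np.maximum(length, 1e-12))
        pos[n:] += move[n:]                          # only the labels move
    return pos[n:]

# One isolated point at the origin: how far does its label settle for each k?
for k in (0.1, 0.2, 0.4):
    label = repel([[0.0, 0.0]], k=k)[0]
    print(k, round(float(np.linalg.norm(label)), 3))
```

For a single isolated pair the two forces balance where k²/d = d²/k, that is at d = k, so the label should settle at a distance close to k from its point. With many points the labels also repel each other and the neighbouring points, so the distances are no longer exactly k, but a larger k still tends to push annotations further out.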
