Matplotlib: avoiding overlapping datapoints in a "scatter/dot/beeswarm" plot


Problem description

When drawing a dot plot using matplotlib, I would like to offset overlapping datapoints to keep them all visible. For example, if I have

CategoryA: 0,0,3,0,5  
CategoryB: 5,10,5,5,10  

I want each of the CategoryA "0" datapoints to be set side by side, rather than right on top of each other, while still remaining distinct from CategoryB.

In R (ggplot2) there is a "jitter" option that does this. Is there a similar option in matplotlib, or is there another approach that would lead to a similar result?

Edit: to clarify, the "beeswarm" plot in R is essentially what I have in mind, and pybeeswarm is an early but useful start at a matplotlib/Python version.

Edit: to add that Seaborn's Swarmplot, introduced in version 0.7, is an excellent implementation of what I wanted.
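
For reference, a minimal sketch of the Seaborn approach mentioned in the edit above, using the example data from this question. It assumes seaborn 0.7 or later is installed; the variable names `values` and `categories` are chosen here for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import seaborn as sns

# Example data from the question, flattened into parallel lists
values = [0, 0, 3, 0, 5, 5, 10, 5, 5, 10]
categories = ["CategoryA"] * 5 + ["CategoryB"] * 5

fig, ax = plt.subplots()
# swarmplot shifts points along the category axis so identical values sit side by side
sns.swarmplot(x=categories, y=values, ax=ax)
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the figure.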

Solution

Extending the answer by @user2467675, here's how I did it:

import numpy as np
from matplotlib.pyplot import scatter

def rand_jitter(arr):
    # Gaussian noise with a standard deviation of 1% of the data range
    stdev = .01*(max(arr)-min(arr))
    return arr + np.random.randn(len(arr)) * stdev

def jitter(x, y, s=20, c='b', marker='o', cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, **kwargs):
    # verts and hold are omitted: recent matplotlib releases no longer accept them in scatter
    return scatter(rand_jitter(x), rand_jitter(y), s=s, c=c, marker=marker, cmap=cmap, norm=norm, vmin=vmin, vmax=vmax, alpha=alpha, linewidths=linewidths, **kwargs)

The stdev variable is set to 1% of the data range, which makes sure that the jitter is enough to be seen on different scales, but it assumes that the limits of the axes are 0 and the max value.

You can then call jitter instead of scatter.
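
For the categorical data in the question, one possible variation is to jitter only the category position and leave the measured values exact. This matters because a range-based jitter is zero when every value in the array is identical, as it would be for the x positions of a single category. The sketch below is an illustration under stated assumptions, not part of the original answer: `jitter_width` is a hypothetical fixed spread in category units, and the random generator is seeded only to make the figure reproducible.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Example data from the question
data = {"CategoryA": [0, 0, 3, 0, 5], "CategoryB": [5, 10, 5, 5, 10]}

rng = np.random.default_rng(0)  # seeded so the offsets are reproducible
jitter_width = 0.08  # hypothetical value: std dev of the horizontal offset, in category units

fig, ax = plt.subplots()
offsets_by_category = {}
for i, (name, values) in enumerate(data.items()):
    # Jitter only the category (x) position; the y values are plotted unchanged
    x = i + rng.normal(0.0, jitter_width, size=len(values))
    offsets_by_category[name] = x
    ax.scatter(x, values, alpha=0.7, label=name)

# Label each integer position with its category name
ax.set_xticks(range(len(data)))
ax.set_xticklabels(list(data.keys()))
ax.set_xlim(-0.5, len(data) - 0.5)
```

Because the offsets are random, points with the same value can still overlap occasionally; a beeswarm layout avoids that by placing points deterministically.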
