Matplotlib: avoiding overlapping datapoints in a "scatter/dot/beeswarm" plot
Question
When drawing a dot plot using matplotlib, I would like to offset overlapping datapoints to keep them all visible. For example, if I have

CategoryA: 0,0,3,0,5
CategoryB: 5,10,5,5,10

I want each of the CategoryA "0" datapoints to be set side by side, rather than right on top of each other, while still remaining distinct from CategoryB.

In R (ggplot2) there is a "jitter" option that does this. Is there a similar option in matplotlib, or is there another approach that would lead to a similar result?

Edit: to clarify, the "beeswarm" plot in R is essentially what I have in mind, and pybeeswarm is an early but useful start at a matplotlib/Python version.

Edit: Seaborn's swarmplot, introduced in version 0.7, is an excellent implementation of what I wanted.
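The side-by-side layout described above can be sketched without any plotting library: give each repeated value a small, centred horizontal offset within its category. This is a minimal sketch, assuming categories sit at integer x positions (0 for CategoryA, 1 for CategoryB). `side_by_side_offsets` and `step` are hypothetical names, and this is a simple deterministic dodge, not the packing algorithm used by seaborn's swarmplot or pybeeswarm.

```python
import numpy as np


def side_by_side_offsets(values, step=0.1):
    """Hypothetical helper: horizontal offsets that place repeated values side by side.

    Points sharing the same value are spread symmetrically around 0 in
    increments of `step`; a value that occurs only once gets offset 0.
    """
    values = np.asarray(values)
    offsets = np.zeros(len(values), dtype=float)
    for v in np.unique(values):
        idx = np.flatnonzero(values == v)  # positions where this value occurs
        n = len(idx)
        # Centred slots: n=3 -> -1, 0, 1; n=2 -> -0.5, 0.5; n=1 -> 0
        offsets[idx] = (np.arange(n) - (n - 1) / 2) * step
    return offsets


categories = {"CategoryA": [0, 0, 3, 0, 5], "CategoryB": [5, 10, 5, 5, 10]}
for i, (name, vals) in enumerate(categories.items()):
    x = i + side_by_side_offsets(vals)  # category index plus per-point dodge
    print(name, np.round(x, 2).tolist())
```

The resulting `x` can be passed together with the raw values to `plt.scatter(x, vals)`. Because only the x position is shifted, the plotted values stay exact; for large groups, `step` has to shrink so neighbouring categories do not run into each other.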
Solution

Extending the answer by @user2467675, here's how I did it:
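    import numpy as np
    import matplotlib.pyplot as plt

    def rand_jitter(arr):
        # Noise scaled to 1% of the data range, so it stays visible at any scale
        arr = np.asarray(arr, dtype=float)
        stdev = .01 * (arr.max() - arr.min())
        return arr + np.random.randn(len(arr)) * stdev

    def jitter(x, y, s=20, c='b', marker='o', cmap=None, norm=None,
               vmin=None, vmax=None, alpha=None, linewidths=None, **kwargs):
        # Drop-in replacement for plt.scatter that jitters both x and y.
        # The old `hold` and `verts` arguments are not accepted by current
        # Matplotlib, so they are no longer passed through.
        return plt.scatter(rand_jitter(x), rand_jitter(y), s=s, c=c,
                           marker=marker, cmap=cmap, norm=norm, vmin=vmin,
                           vmax=vmax, alpha=alpha, linewidths=linewidths,
                           **kwargs)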
The stdev variable makes sure that the jitter is enough to be seen on different scales, but it assumes that the limits of the axes are 0 and the max value.

You can then call jitter instead of scatter.
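The `rand_jitter` approach above adds Gaussian noise to both axes, which also shifts the plotted values themselves. A common alternative, sketched here under stated assumptions, is to jitter only the categorical x position with bounded uniform noise, drawn from a seeded generator so the figure is reproducible. `jitter_x`, `width`, and `seed` are hypothetical names; `np.random.default_rng` and `Generator.uniform` are standard NumPy (1.17 and later).

```python
import numpy as np


def jitter_x(categories, values, width=0.2, seed=None):
    """Hypothetical helper: jitter only the categorical x position.

    Each point gets a uniform offset in [-width, width) around its integer
    category index; the y values are returned unchanged.
    """
    rng = np.random.default_rng(seed)  # seeded Generator -> reproducible plot
    x = np.asarray(categories, dtype=float)
    x = x + rng.uniform(-width, width, size=x.shape)
    return x, np.asarray(values, dtype=float)


cats = [0] * 5 + [1] * 5  # CategoryA -> 0, CategoryB -> 1
vals = [0, 0, 3, 0, 5] + [5, 10, 5, 5, 10]
x, y = jitter_x(cats, vals, width=0.2, seed=0)
print(np.round(x, 2))
```

To draw it, pass the result to `plt.scatter(x, y)` and label the positions with `plt.xticks([0, 1], ["CategoryA", "CategoryB"])`. Keeping `width` below 0.5 keeps the two categories visually distinct, which is the requirement stated in the question.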