Allocate scatter plot into specific bins
I have a scatter plot that gets sorted into 4 bins. These are separated by two arcs and a line in the middle (see figure below).
There's a slight problem with the two arcs. If the x-coordinate of a point is greater than the x-value of ang2, the point doesn't get attributed to the correct bin (see figure below).
import math
import matplotlib.pyplot as plt
import matplotlib as mpl
X = [24,15,71,72,6,13,77,52,52,62,46,43,31,35,41]
Y = [94,61,76,83,69,86,78,57,45,94,82,74,56,70,94]
fig, ax = plt.subplots()
ax.set_xlim(-100,100)
ax.set_ylim(-40,140)
ax.grid(False)
plt.scatter(X,Y)
#middle line
BIN_23_X = 0
#two arcs
ang1 = -60, 60
ang2 = 60, 60
angle = math.degrees(math.acos(2/9.15))
E_xy = 0,60
Halfway = mpl.lines.Line2D((BIN_23_X,BIN_23_X), (0,125), color = 'white', lw = 1.5, alpha = 0.8, zorder = 1)
arc1 = mpl.patches.Arc(ang1, 70, 110, angle = 0, theta2 = angle, theta1 = 360-angle, color = 'white', lw = 2)
arc2 = mpl.patches.Arc(ang2, 70, 110, angle = 0, theta2 = 180+angle, theta1 = 180-angle, color = 'white', lw = 2)
Oval = mpl.patches.Ellipse(E_xy, 160, 130, lw = 3, edgecolor = 'black', color = 'white', alpha = 0.2)
ax.add_line(Halfway)
ax.add_patch(arc1)
ax.add_patch(arc2)
ax.add_patch(Oval)
#Sorting the coordinates into bins
def get_nearest_arc_vert(x, y, arc_vertices):
    err = (arc_vertices[:,0] - x)**2 + (arc_vertices[:,1] - y)**2
    nearest = (arc_vertices[err == min(err)])[0]
    return nearest
arc1v = ax.transData.inverted().transform(arc1.get_verts())
arc2v = ax.transData.inverted().transform(arc2.get_verts())
def classify_pointset(vx, vy):
    bins = {(k+1):[] for k in range(4)}
    for (x,y) in zip(vx, vy):
        nx1, ny1 = get_nearest_arc_vert(x, y, arc1v)
        nx2, ny2 = get_nearest_arc_vert(x, y, arc2v)
        if x < nx1:
            bins[1].append((x,y))
        elif x > nx2:
            bins[4].append((x,y))
        else:
            if x < BIN_23_X:
                bins[2].append((x,y))
            else:
                bins[3].append((x,y))
    return bins
#Bins Output
bins_red = classify_pointset(X,Y)
all_points = [None] * 5
for bin_key in [1,2,3,4]:
    all_points[bin_key] = bins_red[bin_key]
Output:
[[], [], [(24, 94), (15, 61), (71, 76), (72, 83), (6, 69), (13, 86), (77, 78), (62, 94)], [(52, 57), (52, 45), (46, 82), (43, 74), (31, 56), (35, 70), (41, 94)]]
This isn't quite right. Looking at the figure output below, 4 coordinates are in Bin 3 and 11 are in Bin 4. But 8 are attributed to Bin 3 and 7 are attributed to Bin 4.
I think the problem is the blue coordinates, specifically those whose x-coordinate is greater than the x-value of ang2, which is 60. If I alter these to be less than 60 they will be corrected into Bin 3.
I'm not sure whether I should extend the arcs beyond 60 or whether the code can be improved.
Please note that the example only shows this for Bin 4 and ang2. The same issue will occur for Bin 1 and ang1: if the x-coordinate is less than -60, the point won't get attributed to Bin 1.
Intended Output:
[[], [], [(24, 94), (15, 61), (6, 69), (13, 86)], [(71, 76), (72, 83), (52, 57), (52, 45), (46, 82), (43, 74), (31, 56), (35, 70), (41, 94), (77, 78), (62, 94)]]
Note: The intended output is preferred. The example uses one row of input data. However, my dataset is much larger. If we use numerous rows, the output should be row by row, e.g.
#Numerous rows
X = np.random.randint(50, size=(100, 10))
Y = np.random.randint(80, size=(100, 10))
Out:
Row 0 = [(x,y)],[(x,y)],[(x,y)],[(x,y)]
Row 1 = [(x,y)],[(x,y)],[(x,y)],[(x,y)]
Row 2 = [(x,y)],[(x,y)],[(x,y)],[(x,y)]
etc
Patches have a test for whether they contain a point, contains_point, and even one for arrays of points, contains_points.
Just to play with, here is a code snippet that you can add between the part where you add your patches and the #Sorting the coordinates into bins code block.
It adds two additional (transparent) ellipses to check whether the arcs would contain a point if they were fully closed ellipses. Your bin calculation is then just a boolean combination of tests: whether a point belongs to the big oval, to the left or right ellipse, and whether its x-coordinate is positive or negative.
ov1 = mpl.patches.Ellipse(ang1, 70, 110, alpha=0)
ov2 = mpl.patches.Ellipse(ang2, 70, 110, alpha=0)
ax.add_patch(ov1)
ax.add_patch(ov2)
for px, py in zip(X, Y):
    in_oval = Oval.contains_point(ax.transData.transform(([px, py])), 0)
    in_left = ov1.contains_point(ax.transData.transform(([px, py])), 0)
    in_right = ov2.contains_point(ax.transData.transform(([px, py])), 0)
    on_left = px < 0
    on_right = px > 0
    if in_oval:
        if in_left:
            n_bin = 1
        elif in_right:
            n_bin = 4
        elif on_left:
            n_bin = 2
        elif on_right:
            n_bin = 3
        else:
            n_bin = -1
    else:
        n_bin = -1
    print('({:>2}/{:>2}) is {}'.format(px, py, 'in Bin ' +str(n_bin) if n_bin>0 else 'outside'))
The output is:
(24/94) is in Bin 3
(15/61) is in Bin 3
(71/76) is in Bin 4
(72/83) is in Bin 4
( 6/69) is in Bin 3
(13/86) is in Bin 3
(77/78) is outside
(52/57) is in Bin 4
(52/45) is in Bin 4
(62/94) is in Bin 4
(46/82) is in Bin 4
(43/74) is in Bin 4
(31/56) is in Bin 4
(35/70) is in Bin 4
(41/94) is in Bin 4
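As a cross-check that does not depend on matplotlib's display transform, the same tests can be sketched with the plain ellipse inequality. This is not the method used in the snippet above; the helper names in_ellipse and classify are made up for illustration, and the centres and sizes are copied from E_xy, ang1 and ang2 in the question. Points very close to a boundary could in principle come out differently from contains_point, which works on a curve approximation of the patch.

```python
def in_ellipse(px, py, center, width, height):
    """Return True if (px, py) lies inside or on an axis-aligned ellipse."""
    cx, cy = center
    a, b = width / 2, height / 2  # semi-axes from the full width and height
    return ((px - cx) / a) ** 2 + ((py - cy) / b) ** 2 <= 1


def classify(px, py):
    """Return the bin number 1-4, or -1 for points that fall in no bin."""
    if not in_ellipse(px, py, (0, 60), 160, 130):  # big oval (E_xy)
        return -1
    if in_ellipse(px, py, (-60, 60), 70, 110):     # closed left ellipse (ang1)
        return 1
    if in_ellipse(px, py, (60, 60), 70, 110):      # closed right ellipse (ang2)
        return 4
    if px < 0:
        return 2
    if px > 0:
        return 3
    return -1  # x == 0 is left undecided here


X = [24, 15, 71, 72, 6, 13, 77, 52, 52, 62, 46, 43, 31, 35, 41]
Y = [94, 61, 76, 83, 69, 86, 78, 57, 45, 94, 82, 74, 56, 70, 94]
labels = [classify(px, py) for px, py in zip(X, Y)]
print(labels)
```

For the sample data this gives the same assignment as the listing above: four points in Bin 3, ten in Bin 4, and (77, 78) outside the oval.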
Note that you still have to decide which bin points with x-coordinate 0 belong to. At the moment they are treated as outside, since neither on_left nor on_right is true for them.
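One possible convention for the x = 0 case, shown only as a sketch: treat the middle line as part of Bin 3 by using a one-sided comparison. The helper name middle_bin and its boundary_x parameter are hypothetical; choosing Bin 3 over Bin 2 is an arbitrary assumption.

```python
def middle_bin(px, boundary_x=0):
    """Pick Bin 2 or Bin 3 for a point that is inside the big oval
    but inside neither side ellipse.

    Assumption: points exactly on the boundary line go to Bin 3.
    """
    return 2 if px < boundary_x else 3


print(middle_bin(-5), middle_bin(0), middle_bin(5))
```

In the loop above this would replace the on_left / on_right branches, so the final else branch inside in_oval is never reached.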
PS: Thanks to @ImportanceOfBeingErnest for the hint to the necessary transformation: https://stackoverflow.com/a/49112347/8300135
Note: for all the following EDITS you'll need to import numpy as np
EDIT:
Function for counting the bin distribution per X, Y array input:
def bin_counts(X, Y):
    bc = dict()
    E = Oval.contains_points(ax.transData.transform(np.array([X, Y]).T), 0)
    E_l = ov1.contains_points(ax.transData.transform(np.array([X, Y]).T), 0)
    E_r = ov2.contains_points(ax.transData.transform(np.array([X, Y]).T), 0)
    L = np.array(X) < 0
    R = np.array(X) > 0
    bc[1] = np.sum(E & E_l)
    bc[2] = np.sum(E & L & ~E_l)
    bc[3] = np.sum(E & R & ~E_r)
    bc[4] = np.sum(E & E_r)
    return bc
Will lead to this result:
bin_counts(X, Y)
Out: {1: 0, 2: 0, 3: 4, 4: 10}
EDIT2: many rows in two 2D-arrays for X and Y:
np.random.seed(42)
X = np.random.randint(-80, 80, size=(100, 10))
Y = np.random.randint(0, 120, size=(100, 10))
looping over all the rows:
for xr, yr in zip(X, Y):
print(bin_counts(xr, yr))
result:
{1: 1, 2: 2, 3: 6, 4: 0}
{1: 1, 2: 0, 3: 4, 4: 2}
{1: 5, 2: 2, 3: 1, 4: 1}
...
{1: 3, 2: 2, 3: 2, 4: 0}
{1: 2, 2: 4, 3: 1, 4: 1}
{1: 1, 2: 1, 3: 6, 4: 2}
EDIT3: for returning not the number of points in each bin, but an array with four arrays containing the x,y-coordinates of the points in each bin, use the following:
X = [24,15,71,72,6,13,77,52,52,62,46,43,31,35,41]
Y = [94,61,76,83,69,86,78,57,45,94,82,74,56,70,94]
def bin_points(X, Y):
    X = np.array(X)
    Y = np.array(Y)
    E = Oval.contains_points(ax.transData.transform(np.array([X, Y]).T), 0)
    E_l = ov1.contains_points(ax.transData.transform(np.array([X, Y]).T), 0)
    E_r = ov2.contains_points(ax.transData.transform(np.array([X, Y]).T), 0)
    L = X < 0
    R = X > 0
    bp1 = np.array([X[E & E_l], Y[E & E_l]]).T
    bp2 = np.array([X[E & L & ~E_l], Y[E & L & ~E_l]]).T
    bp3 = np.array([X[E & R & ~E_r], Y[E & R & ~E_r]]).T
    bp4 = np.array([X[E & E_r], Y[E & E_r]]).T
    return [bp1, bp2, bp3, bp4]
print(bin_points(X, Y))
[array([], shape=(0, 2), dtype=int32), array([], shape=(0, 2), dtype=int32), array([[24, 94],
[15, 61],
[ 6, 69],
[13, 86]]), array([[71, 76],
[72, 83],
[52, 57],
[52, 45],
[62, 94],
[46, 82],
[43, 74],
[31, 56],
[35, 70],
[41, 94]])]
...and again, for applying this to the big 2D-arrays, just iterate over them:
np.random.seed(42)
X = np.random.randint(-100, 100, size=(100, 10))
Y = np.random.randint(-40, 140, size=(100, 10))
bincol = ['r', 'g', 'b', 'y', 'k']
for xr, yr in zip(X, Y):
    for i, binned_points in enumerate(bin_points(xr, yr)):
        ax.scatter(*binned_points.T, c=bincol[i], marker='o' if i<4 else 'x')
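To get the "Row 0 = ..." style output asked for in the question, the per-row results can be collected into four lists and printed one line per row. The sketch below is independent of matplotlib: group_row takes any function that maps a point to a bin number 1-4 (or -1 for outside). Both group_row and demo_classify are hypothetical names, and demo_classify is only a placeholder that splits on x. In practice you would pass in a classifier built on the containment tests above.

```python
def group_row(xs, ys, classify):
    """Return four lists of (x, y) tuples, one per bin, for a single row."""
    bins = [[], [], [], []]
    for px, py in zip(xs, ys):
        n_bin = classify(px, py)
        if 1 <= n_bin <= 4:  # skip points reported as outside (-1)
            bins[n_bin - 1].append((px, py))
    return bins


def demo_classify(px, py):
    """Placeholder classifier (assumption): uses x thresholds only."""
    if px < -60:
        return 1
    if px < 0:
        return 2
    if px == 0:
        return -1
    if px <= 60:
        return 3
    return 4


rows_x = [[-70, -5, 30, 65], [10, 20, -30, 0]]
rows_y = [[10, 20, 40, 50], [5, 6, 7, 8]]

lines = []
for i, (xr, yr) in enumerate(zip(rows_x, rows_y)):
    row_bins = group_row(xr, yr, demo_classify)
    lines.append("Row {} = {}".format(i, ",".join(str(b) for b in row_bins)))
print("\n".join(lines))
```

Keeping the classifier as a parameter means the row formatting does not need to change if the bin geometry does.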