Removing Data Below A Line In A Scatterplot (Python)

Question

I had code that plotted a 2D histogram of my dataset. I plotted it like so:

histogram = plt.hist2d(fehsc, ofesc, bins=nbins, range=[[-1,.5],[0.225,0.4]])

I only wanted to look at the data above a certain line, though, so I added the following, and it worked just fine:

counts = histogram[0]
xpos = histogram[1]
ypos = histogram[2]
image = histogram[3]
newcounts = counts #we're going to iterate over this

for i in range (nbins):
    xin = xpos[i]
    yin = ypos
    yline = m*xin + b
    reset = np.where(yin < yline) #anything less than yline we want to be 0
    #index = index[0:len(index)-1]  
    countout = counts[i]
    countout[reset] = 0
    newcounts[i] = countout

However, I now need to draw a regression line through that cut region. As far as I know, that is not possible with plt.hist2d, so I'm using plt.scatter instead. The problem is that I no longer know how to make the cut, because I can't index the scatter plot.

I have this right now:

plt.xlim(-1,.5)
plt.ylim(.225, .4)

scatter = plt.scatter(fehsc,ofesc, marker = ".")

and I only want to retain the data above some line:

xarr = np.arange(-1,0.5, 0.015)
yarr = m*xarr + b
plt.plot(xarr, yarr, color='r')

I've tried running the loop with some variations of the variables, but I don't really understand how to get it to work.

Recommended answer

You could define a mask for your data before plotting and then plot only the data points that meet your criterion. In the example below, all data points above the line are plotted in green and all data points below it are plotted in black.

from matplotlib import pyplot as plt
import numpy as np

#the scatterplot data
xvals = np.random.rand(100)
yvals = np.random.rand(100)

#the line
b  = 0.1
m = 1
x = np.linspace(0,1,num=100)
y = m*x+b

mask = yvals > m*xvals+b

plt.scatter(xvals[mask],yvals[mask],color='g')
plt.scatter(xvals[~mask],yvals[~mask],color='k')
plt.plot(x,y,'r')
plt.show()

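The result looks like this: [figure: scatter plot with the points above the red line in green and the points below it in black]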

Hope this helps.
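The question also asks for a regression line through the points that survive the cut, which the example above does not show. A minimal sketch of that step, assuming an ordinary least-squares straight-line fit with np.polyfit is what is wanted; the helper name fit_above_line and the synthetic data are hypothetical and not part of the original answer:

```python
import numpy as np

def fit_above_line(xvals, yvals, m, b):
    """Least-squares line through the points lying above y = m*x + b.

    Hypothetical helper: returns (slope, intercept, mask).
    """
    xvals = np.asarray(xvals, dtype=float)
    yvals = np.asarray(yvals, dtype=float)
    # Same criterion as the mask in the answer: keep points above the cut line
    mask = yvals > m * xvals + b
    # For degree 1, np.polyfit returns the coefficients as [slope, intercept]
    slope, intercept = np.polyfit(xvals[mask], yvals[mask], 1)
    return slope, intercept, mask

# Synthetic data, for illustration only
rng = np.random.default_rng(0)
xvals = rng.random(200)
yvals = rng.random(200)
slope, intercept, mask = fit_above_line(xvals, yvals, m=1, b=0.1)
```

The fitted line can then be drawn over the retained points with plt.scatter(xvals[mask], yvals[mask]) followed by plt.plot(x, slope*x + intercept), where x is any array spanning the plotted range.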

EDIT:

If you want to create a 2D histogram in which the portion below the line is set to zero, first generate the histogram as an array with numpy, then set the entries of that array to zero wherever the bin falls below the line. After that, you can plot the array using plt.pcolormesh:

from matplotlib import pyplot as plt
import numpy as np

#the scatterplot data
xvals = np.random.rand(1000)
yvals = np.random.rand(1000)
histogram,xbins,ybins = np.histogram2d(xvals,yvals,bins=50)

#computing the bin centers from the bin edges:
xcenters = 0.5*(xbins[:-1]+xbins[1:])
ycenters = 0.5*(ybins[:-1]+ybins[1:])

#the line
b  = 0.1
m = 1
x = np.linspace(0,1,num=100)
y = m*x+b

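#hiding the part of the histogram below the line
#np.histogram2d indexes its result as [x, y], so the mesh is built with indexing='ij' to match
xmesh,ymesh = np.meshgrid(xcenters,ycenters,indexing='ij')
mask = m*xmesh+b > ymesh
histogram[mask] = 0

#making the plot
#pcolormesh expects rows to run along y, hence the transpose; the bin edges give each cell its true extent
mat = plt.pcolormesh(xbins,ybins,histogram.T)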
line = plt.plot(x,y,'r')
plt.xlim([0,1])
plt.ylim([0,1])
plt.show()

The result will look like this: [figure: 2D histogram with the bins below the red line set to zero]
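The per-column loop from the question can also be written without a loop. A rough sketch, assuming the counts array is laid out as [x_bin, y_bin] (the layout returned by np.histogram2d and by plt.hist2d) and that a bin counts as "below the line" when its center is below it; the helper name zero_below_line is hypothetical:

```python
import numpy as np

def zero_below_line(counts, xedges, yedges, m, b):
    """Return a copy of counts with bins whose centers lie below y = m*x + b set to 0.

    counts is expected to be indexed as [x_bin, y_bin].
    """
    xcenters = 0.5 * (xedges[:-1] + xedges[1:])
    ycenters = 0.5 * (yedges[:-1] + yedges[1:])
    # Broadcasting a column of x centers against a row of y centers gives shape (nx, ny)
    below = ycenters[np.newaxis, :] < m * xcenters[:, np.newaxis] + b
    out = np.array(counts, dtype=float, copy=True)  # leave the input untouched
    out[below] = 0
    return out

# Synthetic data with different bin counts in x and y, for illustration only
rng = np.random.default_rng(1)
x = rng.random(1000)
y = rng.random(1000)
counts, xedges, yedges = np.histogram2d(x, y, bins=[20, 10], range=[[0, 1], [0, 1]])
cut = zero_below_line(counts, xedges, yedges, m=1, b=0.1)
```

Using a different number of bins along each axis makes any mix-up between the x and y dimensions show up as a shape error. To display the result, the array has to be transposed, for example plt.pcolormesh(xedges, yedges, cut.T).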
