How to sample points from a data set using a grid?

Question

I have some data with around a million (r, phi) coordinates, along with their intensities. I want to sample this data in a grid pattern so that I can reduce the memory used and plot faster. However, I want to sample the data in X, Y, because I will be converting the coordinates to (X, Y) to plot them.

I was thinking I could use a meshgrid to come up with a template of the points I'd like to sample, but I'm stuck on the next step.

I can't seem to find anything useful by searching Google or here, but apologies if this is too simple a question!

I'm using numpy, and my data is currently stored as three separate arrays. I was planning to use np.meshgrid and later scipy.interpolate.griddata for interpolation.

r, phi and intensity are all np.arrays with shape (million,)

For example:

r = array([1560.8, 1560.8003119, 1560.8006238, ..., 3556.831746,
           3558.815873 , 3560.8      ])

I started with this:

r = data[:, 0]  # radius
phi = data[:, 1]  # altitude angle
h2o = data[:, 2]  # intensity

x = r * np.sin(phi)  # It's a left handed coordinate system
z = r * np.cos(phi)

And for the sampling grid I have this:

Xscale = np.linspace(x.min(), x.max(), 1000)  # array methods are much faster than the built-in min/max
Zscale = np.linspace(z.min(), z.max(), 1000)

X, Z = np.meshgrid(Xscale, Zscale)
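
For reference, the interpolation route mentioned in the question could be sketched roughly as follows. This is only an illustration under stated assumptions: the data is synthetic (random r and phi, with a made-up intensity equal to z so the result is easy to check), and the grid is 50 x 50 rather than 1000 x 1000 to keep it quick.

```python
import numpy as np
from scipy.interpolate import griddata

# Synthetic stand-in for the real data (assumption: not the asker's values).
rng = np.random.default_rng(0)
r = rng.uniform(1560.8, 3560.8, 5000)     # radius
phi = rng.uniform(0.0, np.pi / 2, 5000)   # altitude angle in radians
h2o = r * np.cos(phi)                     # made-up intensity, equal to z

x = r * np.sin(phi)
z = r * np.cos(phi)

# Coarse sampling grid spanning the data.
Xscale = np.linspace(x.min(), x.max(), 50)
Zscale = np.linspace(z.min(), z.max(), 50)
X, Z = np.meshgrid(Xscale, Zscale)

# Interpolate the scattered intensities onto the grid nodes.
# Nodes outside the convex hull of the data come back as NaN.
grid_h2o = griddata((x, z), h2o, (X, Z), method="linear")

print(grid_h2o.shape)
```

Note that interpolation picks a value at each grid node, whereas binning (used in the answer below) averages all points that fall into each cell, so the two approaches can give different results on noisy data.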

Recommended answer

It would have been nice if you had provided some data to work with. No matter, we will create some.

Let's create x, y values from arbitrary r, theta values:

import numpy as np
import matplotlib.pyplot as plt

theta=np.linspace(0.,50.,1000)
r=np.linspace(5.,10,1000)

x=r*np.sin(theta)
y=r*np.cos(theta)

plt.plot(x,y,linestyle='',marker='.')

The plot shows the points tracing a spiral from radius 5 out to radius 10.

Now add arbitrary intensity values:

intensity=np.sqrt(x**2+y**2)

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x, y, intensity)

The scatter plot gives:

If I understand correctly, we should not be far from your starting point. We now have 3 arrays with 1000 values each. We are going to reduce them to a coarse grid: 20 bin edges per axis, which gives 19x19 bins. We first create the x and y bin edges, then call the binned_statistic_2d function from scipy.stats, and that's it.

import scipy.stats as stats

binx=np.linspace(-10.,10.,20)
biny=np.linspace(-10.,10.,20)

ret = stats.binned_statistic_2d(x, y, intensity, 'mean', bins=[binx,biny])

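Z = ret.statistic  # shape (nx, ny): first axis follows x, second follows y
Z = np.ma.masked_invalid(Z)  # mask the NaN values of bins that contain no points
X, Y = np.meshgrid(binx, biny, indexing='ij')  # 'ij' matches the (nx, ny) layout of Z

plt.pcolor(X, Y, Z)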
plt.show()

The pcolor plot gives:
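
One detail of the binning step that is easy to get wrong is the layout of the returned array. The small check below is a sketch with made-up points: it suggests that 20 edges yield 19 bins per axis, and that the first index of `statistic` follows x while the second follows y.

```python
import numpy as np
from scipy import stats

# Three made-up points that all fall in the last x bin and the first y bin.
x = np.array([9.0, 9.1, 9.2])
y = np.array([-9.0, -9.1, -9.2])
values = np.array([1.0, 2.0, 3.0])

edges = np.linspace(-10.0, 10.0, 20)   # 20 edges -> 19 bins per axis
ret = stats.binned_statistic_2d(x, y, values, 'mean', bins=[edges, edges])

print(ret.statistic.shape)             # (19, 19)

# Locate the only non-empty bin; empty bins hold NaN for the 'mean' statistic.
ix, iy = np.argwhere(np.isfinite(ret.statistic))[0]
print(ix, iy)                          # 18 0: x index first, y index second
```

This is the reverse of the row/column order produced by np.meshgrid with its default indexing, so the grid and the statistic have to be brought into the same layout (by transposing or by passing indexing='ij') before plotting.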

As requested in your comment, we can now go back to the original x, y, z array structure.

First, we have to calculate the center coordinates of the bins:

binx_centers=(binx[1:] + binx[:-1])/2
biny_centers=(biny[1:] + biny[:-1])/2
Xcenters, Ycenters = np.meshgrid(binx_centers, biny_centers, indexing='ij')  # 'ij' so rows follow x, like the statistic array

Then we can keep only the unmasked values (see the explanation above):

xnew=np.ma.masked_array(Xcenters, Z.mask).compressed()
ynew=np.ma.masked_array(Ycenters, Z.mask).compressed()
znew=Z.compressed()
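
To make the masking step concrete, here is a tiny sketch with invented numbers (a 2x2 grid where two bins are empty). It illustrates how reusing the mask of the statistic on a coordinate array drops the same entries from both.

```python
import numpy as np

# A 2x2 grid of bin means where two bins are empty (NaN).
Z = np.ma.masked_invalid(np.array([[1.0, np.nan],
                                   [np.nan, 4.0]]))
# Invented coordinate values, one per bin.
centers = np.array([[10.0, 20.0],
                    [30.0, 40.0]])

# Reusing Z.mask drops the coordinates of the empty bins as well.
kept_centers = np.ma.masked_array(centers, Z.mask).compressed()
kept_values = Z.compressed()

print(kept_centers)  # [10. 40.]
print(kept_values)   # [1. 4.]
```

The compressed arrays are flat and stay aligned element by element, which is what allows them to be passed directly to a scatter plot.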

We can check the new size:

print(znew.shape)

This gives only 235 values (instead of 1000):

(235L,) 

And the new scatter plot with the compressed values:

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(xnew, ynew, znew)

This gives a much sparser scatter plot of the binned values.
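
Putting the steps together for data shaped like the question's (r, phi, intensity), a minimal sketch might look like the following. The helper name `grid_downsample`, its `nbins` parameter, and the synthetic data are all assumptions made for illustration; they do not come from the original post.

```python
import numpy as np
from scipy import stats

def grid_downsample(r, phi, values, nbins=100):
    """Average scattered polar samples over a regular Cartesian grid.

    Hypothetical helper: returns the bin-center coordinates and the
    mean value of every non-empty bin as three flat arrays.
    """
    x = r * np.sin(phi)
    z = r * np.cos(phi)

    ret = stats.binned_statistic_2d(x, z, values, 'mean', bins=nbins)
    means = ret.statistic                        # shape (nbins, nbins), x first

    xc = (ret.x_edge[1:] + ret.x_edge[:-1]) / 2  # bin centers along x
    zc = (ret.y_edge[1:] + ret.y_edge[:-1]) / 2  # bin centers along z
    Xc, Zc = np.meshgrid(xc, zc, indexing='ij')  # same layout as `means`

    keep = np.isfinite(means)                    # True for non-empty bins
    return Xc[keep], Zc[keep], means[keep]

# Synthetic stand-in for the real data; the intensity is simply the radius.
rng = np.random.default_rng(0)
r = rng.uniform(1560.8, 3560.8, 100_000)
phi = rng.uniform(0.0, np.pi / 2, 100_000)
h2o = r.copy()

xs, zs, hs = grid_downsample(r, phi, h2o, nbins=100)
print(r.size, "->", xs.size)
```

The number of returned points is bounded by nbins squared regardless of the input size, so nbins is the knob that trades plotting speed and memory against spatial detail.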
