scipy.stats.binned_statistic_2d works for count but not mean


Problem description


I have some satellite data which looks like the following (scatter plot):

I now want to bin this data into a regular grid over time and latitude, with each bin equal to the mean of all the data points that fall within it. I have been experimenting with scipy.stats.binned_statistic_2d and am baffled by the results I am getting.

First, if I pass the "count" statistic into the scipy binning function, it appears to work correctly (minimal code and plot below).

# id1 is the actual data; I have tried both with and without this mask, to the same effect
id1 = np.ma.masked_where(id1==0, id1)

x_range = np.arange(0,24.25,.25) #setting grid spacing for x and y
y_range = np.arange(-13,14,1)

xbins, ybins = len(x_range), len(y_range) #number of bins in each dimension

H, xedges, yedges, binnumber = stats.binned_statistic_2d(idtime, idlat, values = id1, statistic='count' , bins = [xbins, ybins])  #idtime and idlat are the locations of each id1 value in time and latitude
H = np.ma.masked_where(H==0, H) #masking where there was no data
XX, YY = np.meshgrid(xedges, yedges)

fig = plt.figure(figsize = (13,7))
ax1=plt.subplot(111)
plot1 = ax1.pcolormesh(XX,YY,H.T)

Resulting Plot

Now if I change the statistic to mean, np.mean, np.ma.mean, etc., the plot I get only seems to distinguish the places where there is data from the places where there is none, even though the min and max values for this data are 612 and 2237026 respectively.

I have written some code that does this manually, but it isn't pretty and takes forever (and I haven't completely accounted for edge effects, so running to an error and then fixing it is taking forever).

I would love some advice to get this to work. Thanks!

Edit: I just noticed that I am getting a runtime warning after running the script, and I can't find any information about it online. A Google search for the warning returns zero results. The warning occurs for every statistic option except count.

AppData\Local\Enthought\Canopy\edm\envs\User\lib\site-packages\matplotlib\colors.py:494: RuntimeWarning: invalid value encountered in less
  cbook._putmask(xa, xa < 0.0, -1)

Edit 2: I am attaching some code below that reproduces my problem. It works for the count statistic but not for mean or any other statistic, and it produces the same runtime warning as before.

import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

x = np.random.rand(1000)
y = np.random.rand(1000)

z = np.arange(1000)

H, xedges, yedges, binnumber = stats.binned_statistic_2d(x, y, values = z, statistic='count' , bins = [20, 20])
H2, xedges2, yedges2, binnumber2 = stats.binned_statistic_2d(x, y, values = z, statistic='mean' , bins = [20, 20])

XX, YY = np.meshgrid(xedges, yedges)
XX2, YY2 = np.meshgrid(xedges2, yedges2)

fig = plt.figure(figsize = (13,7))
ax1=plt.subplot(111)
plot1 = ax1.pcolormesh(XX,YY,H.T)
cbar = plt.colorbar(plot1,ax=ax1, pad = .015, aspect=10)
plt.show()

fig2 = plt.figure(figsize = (13,7))
ax2=plt.subplot(111)
plot2 = ax2.pcolormesh(XX2,YY2,H2.T)
cbar = plt.colorbar(plot2,ax=ax2, pad = .015, aspect=10)
plt.show()

Edit 3: User8153 was able to identify the problem. The solution was to mask the array returned by scipy.stats wherever NaNs occur. I used np.ma.masked_invalid() to do this. Plots of my original data and test data are below for the mean statistic.

Solution

When using the 'count' statistic in binned_statistic_2d, empty bins are marked as zero, which you mask in your code. If you switch to the 'mean' or 'median' statistic, empty bins are represented by NaN instead, so you have to adjust the mask for that. One way to do that is to replace

H = np.ma.masked_where(H==0, H)

by

H = np.ma.masked_invalid(H)
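
The difference between the two statistics can be seen on a tiny grid. This is only a minimal sketch: the three points, the 2x2 grid and all variable names are made up for illustration and are not taken from the question's data.

```python
import numpy as np
from scipy import stats

# Three illustrative points on a 2x2 grid over [0, 1] x [0, 1].
# No point falls in the bin with x > 0.5 and y > 0.5, so that bin is empty.
x = np.array([0.1, 0.2, 0.9])
y = np.array([0.1, 0.9, 0.1])
z = np.array([10.0, 20.0, 30.0])
edges = np.array([0.0, 0.5, 1.0])

count, _, _, _ = stats.binned_statistic_2d(
    x, y, values=z, statistic='count', bins=[edges, edges])
mean, _, _, _ = stats.binned_statistic_2d(
    x, y, values=z, statistic='mean', bins=[edges, edges])

# The empty bin holds 0 for 'count' but NaN for 'mean',
# so each result needs its own kind of mask.
masked_count = np.ma.masked_where(count == 0, count)
masked_mean = np.ma.masked_invalid(mean)

print(count)
print(mean)
print(masked_mean)
```

Passing masked_mean.T to pcolormesh instead of the raw array should then leave the empty bins blank rather than feeding NaN into the colormap.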
