Find mean bin values using histogram2d in Python

Problem description

How do you calculate the mean values for the bins of a 2D histogram in Python? I have temperature ranges on the x and y axes, and I am trying to plot the probability of lightning in bins of the respective temperatures. I am reading the data from a CSV file, and my code is as follows:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

filename = 'Random_Events_All_Sorted_85GHz.csv'
df = pd.read_csv(filename)

min37 = df.min37
min85 = df.min85
verification = df.five_min_1

#Numbers
x = min85
y = min37
H = verification

#Estimate the 2D histogram
nbins = 4
H, xedges, yedges = np.histogram2d(x,y,bins=nbins)

#Rotate and flip H
H = np.rot90(H) 
H = np.flipud(H)

#Mask zeros
Hmasked = np.ma.masked_where(H==0,H)

#Plot 2D histogram using pcolormesh
fig1 = plt.figure()
plt.pcolormesh(xedges,yedges,Hmasked)
plt.xlabel('min 85 GHz PCT (K)')
plt.ylabel('min 37 GHz PCT (K)')
cbar = plt.colorbar()
cbar.ax.set_ylabel('Probability of Lightning (%)')

plt.show()

This makes a nice-looking plot, but the data plotted is the count, that is, the number of samples that fall into each bin. The verification variable is an array of 1s and 0s, where 1 indicates lightning and 0 indicates no lightning. I want the plot to show the probability of lightning for a given bin, based on the data in the verification variable, so I need bin_mean*100 to get this percentage.

I tried using an approach similar to what is shown here (binning data in python with scipy/numpy), but I was having difficulty getting it to work for a 2D histogram.

Recommended answer

This can be done with, for example, the following method:

# xedges, yedges as returned by 'histogram2d'

# create an array for the output quantities
avgarr = np.zeros((nbins, nbins))

# determine the X and Y bins each sample coordinate belongs to
xbins = np.digitize(x, xedges[1:-1])
ybins = np.digitize(y, yedges[1:-1])

# calculate the bin sums (note: with very many samples it is more
# efficient to use 'bincount', but that requires some index arithmetic)
for xb, yb, v in zip(xbins, ybins, verification):
    avgarr[yb, xb] += v

# replace 0s in H by NaNs (avoids divide-by-zero warnings)
# if you have no further use for H after plotting, the
# copy operation is unnecessary, and this will then also take care
# of the masking (NaNs are plotted transparent)
divisor = H.copy()
divisor[divisor==0.0] = np.nan

# calculate the average
avgarr /= divisor

# now 'avgarr' contains the averages (NaNs for no-sample bins)
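
The comment above mentions 'bincount' as the faster option for large sample counts. A minimal sketch of that variant follows. The data here is synthetic and only stands in for the question's columns, and the flattened index `ybins * nbins + xbins` is one possible choice of index arithmetic, not necessarily the one the answer had in mind.

```python
import numpy as np

# Synthetic stand-ins for the question's columns (hypothetical data).
rng = np.random.default_rng(0)
x = rng.uniform(150.0, 300.0, size=1000)       # plays the role of min85
y = rng.uniform(200.0, 300.0, size=1000)       # plays the role of min37
verification = rng.integers(0, 2, size=1000)   # 1 = lightning, 0 = none

nbins = 4
H, xedges, yedges = np.histogram2d(x, y, bins=nbins)

# Bin index (0 .. nbins-1) of every sample along each axis.
xbins = np.digitize(x, xedges[1:-1])
ybins = np.digitize(y, yedges[1:-1])

# Combine the (row = y, column = x) index pair into one flat index per sample.
flat = ybins * nbins + xbins

# Per-bin sum of the verification values and per-bin sample count.
sums = np.bincount(flat, weights=verification, minlength=nbins * nbins)
counts = np.bincount(flat, minlength=nbins * nbins)

sums = sums.reshape(nbins, nbins)
counts = counts.reshape(nbins, nbins).astype(float)

# Mean per bin; bins without samples become NaN instead of dividing by zero.
counts[counts == 0] = np.nan
avgarr = sums / counts
probability = 100.0 * avgarr   # percentage, as asked for in the question
```

The resulting array is indexed as [y bin, x bin], matching the rotated and flipped H from the question, so it can be passed to pcolormesh in the same way.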

If you know the bin edges beforehand, you can do the histogram (counting) part in the same loop just by adding one line.
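
One way to read that remark is sketched below: with edges fixed ahead of time, the loop accumulates both the sums and the counts, so no separate call to histogram2d is needed. The edge values and the data are hypothetical, and the names `sumarr` and `cntarr` are introduced here only for illustration.

```python
import numpy as np

# Synthetic stand-ins for the question's columns (hypothetical data).
rng = np.random.default_rng(1)
x = rng.uniform(150.0, 300.0, size=500)
y = rng.uniform(200.0, 300.0, size=500)
verification = rng.integers(0, 2, size=500)

# Bin edges chosen beforehand (assumed values, 4 bins per axis).
xedges = np.linspace(150.0, 300.0, 5)
yedges = np.linspace(200.0, 300.0, 5)
nx, ny = len(xedges) - 1, len(yedges) - 1

# Using only the interior edges yields indices 0 .. n-1; samples outside
# the outer edges would be assigned to the first or last bin.
xbins = np.digitize(x, xedges[1:-1])
ybins = np.digitize(y, yedges[1:-1])

sumarr = np.zeros((ny, nx))
cntarr = np.zeros((ny, nx))
for xb, yb, v in zip(xbins, ybins, verification):
    sumarr[yb, xb] += v
    cntarr[yb, xb] += 1   # the one extra line: the histogram counts

# Mean per bin, with NaN for bins that received no samples.
cntarr[cntarr == 0] = np.nan
avgarr = sumarr / cntarr
```

Note that, unlike histogram2d, this version does not drop samples lying outside the outer edges, so the two agree only when all data fall within the edge range.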
