Pandas/Python: 2D histogram fails with value error


Problem description

I am trying to create a 2D histogram from a Pandas data frame "rates". The X and Y axes are supposed to be transforms of the data frame, i.e., the X and Y values are 'scaled' from the original frame columns, and the bin heights correspond to the number of hits in each x/y bin.

import numpy, pylab, pandas
import matplotlib.pyplot as plt

list(rates.columns.values)
['sizes', 'transfers', 'positioning']

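# scale "sizes" down by 1024*1024
x = rates["sizes"] / 1024. / 1024.
# size per transfer, scaled down by 1024, plus the positioning column
y = ((rates["sizes"] / rates["transfers"]) / 1024.) + rates["positioning"]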

So I tried using

histo, xedges, yedges = numpy.histogram2d(x, y, bins=(100,100))

However, this fails:

File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/site-packages/numpy/lib/twodim_base.py", line 650, in histogram2d
 hist, edges = histogramdd([x, y], bins, range, normed, weights)
File "/usr/lib64/python2.7/site-packages/numpy/lib/function_base.py", line 363, in histogramdd
 decimal = int(-log10(mindiff)) + 6
ValueError: cannot convert float NaN to integer

I have already dropped all NaNs from my frame with 'rates.dropna()', but judging from the error I guess that it is not due to NaNs in my frame.

Maybe somebody has an idea what goes wrong here?

Recommended answer

With help from @jme I got on the right track.

I had not checked for problematic value pairs: x:y = 0.0:inf can obviously not be a good 2D histogram vector, i.e., when transforming the original values I have to catch such cases.
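One possible way to catch such cases is to mask out every pair in which either coordinate is not finite before calling histogram2d. The following is only a sketch: the sample values in "rates" are made up for illustration, and only the column names come from the question. Note that 'rates.dropna()' on the original frame cannot help here, because the NaN and inf values only appear after the division.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
rates = pd.DataFrame({
    "sizes": [0.0, 1048576.0, 2097152.0, 4194304.0],
    "transfers": [0.0, 4.0, 8.0, 0.0],
    "positioning": [1.0, 2.0, 3.0, 4.0],
})

x = rates["sizes"] / 1024.0 / 1024.0
# Division by zero yields NaN (0/0) or inf (n/0) instead of raising.
y = (rates["sizes"] / rates["transfers"]) / 1024.0 + rates["positioning"]

# Keep only pairs where both coordinates are finite (drops NaN and +/-inf).
finite = np.isfinite(x) & np.isfinite(y)
x_ok = x[finite].to_numpy()
y_ok = y[finite].to_numpy()

histo, xedges, yedges = np.histogram2d(x_ok, y_ok, bins=(100, 100))
```

With the sample data above, the first row (0/0) and the last row (division by zero transfers) are dropped, and the remaining two pairs are binned.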

Another thing: numpy's histogram had some issues for me with DataFrame series, so I had to get a proper numpy.array from each series to plot them properly, e.g.,

import numpy as np
histo, xedges, yedges = np.histogram2d(np.array(x[1:MAX]), np.array(y[1:MAX]), bins=(100, 100))

where the series are sliced up to some variable MAX.
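The same conversion can also be written with explicit positional slicing, which may be easier to read once rows have been dropped and the index has gaps. This is a sketch under assumptions: the data and the value of MAX are hypothetical, and 'Series.iloc' and 'Series.to_numpy()' are used here in place of 'np.array(x[1:MAX])' from the answer.

```python
import numpy as np
import pandas as pd

MAX = 3  # hypothetical slice bound
x = pd.Series([0.5, 1.0, 1.5, 2.0, 2.5])
y = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0])

# Slice by position, then convert each Series to a plain NumPy array.
x_arr = x.iloc[1:MAX].to_numpy()
y_arr = y.iloc[1:MAX].to_numpy()

histo, xedges, yedges = np.histogram2d(x_arr, y_arr, bins=(10, 10))
```

As in the answer, the slice starts at position 1, so the first element of each series is skipped.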
