Pandas/Python: 2D histogram fails with value error
Problem description
I am trying to create a 2D histogram from a Pandas data frame "rates". The X and Y axes are supposed to be transforms of the data frame columns, i.e., they are 'scaled' from the original frame columns, and the bin heights are given by the number of hits in each x/y bin.
import numpy, pylab, pandas
import matplotlib.pyplot as plt
list(rates.columns.values)
['sizes', 'transfers', 'positioning']
x=(rates["sizes"]/1024./1024.)
y=((rates["sizes"]/rates["transfers"])/1024.)+rates["positioning"]
So I tried to use
histo, xedges, yedges = numpy.histogram2d(x, y, bins=(100,100))
However, this fails:
File "<stdin>", line 1, in <module>
File "/usr/lib64/python2.7/site-packages/numpy/lib/twodim_base.py", line 650, in histogram2d
hist, edges = histogramdd([x, y], bins, range, normed, weights)
File "/usr/lib64/python2.7/site-packages/numpy/lib/function_base.py", line 363, in histogramdd
decimal = int(-log10(mindiff)) + 6
ValueError: cannot convert float NaN to integer
I have already dropped all NaNs from my frame with 'rates.dropna()', but judging from the error, I guess it is not due to NaNs in my frame.
Maybe somebody has an idea what is going wrong here?
Recommended answer
With help from @jme I got on the right track.
I had not checked for problematic values: a pair such as x:y = 0.0:inf obviously cannot be a valid 2D histogram point, i.e., when transforming the original values I have to catch such cases.
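One way to catch such cases is to mask out every pair in which either coordinate is non-finite before calling histogram2d. The following is only a minimal sketch: the sample values in "rates" are made up for illustration, and using numpy.isfinite for the mask is an assumption, not necessarily what was done originally. Note that the frame itself contains no NaN, so rates.dropna() changes nothing; the inf and NaN only appear after the division by a zero in "transfers".

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: rows 1 and 2 have transfers == 0
rates = pd.DataFrame({
    "sizes": [1048576.0, 2097152.0, 0.0, 4194304.0, 3145728.0],
    "transfers": [4.0, 0.0, 0.0, 16.0, 8.0],
    "positioning": [1.0, 2.0, 3.0, 4.0, 5.0],
})

# Same transforms as in the question
x = rates["sizes"] / 1024.0 / 1024.0
y = (rates["sizes"] / rates["transfers"]) / 1024.0 + rates["positioning"]

# Row 1 gives y == inf (non-zero / 0), row 2 gives y == NaN (0 / 0).
# Keep only the pairs where both coordinates are finite.
mask = np.isfinite(x) & np.isfinite(y)
x_ok = x[mask].to_numpy()
y_ok = y[mask].to_numpy()

histo, xedges, yedges = np.histogram2d(x_ok, y_ok, bins=(100, 100))
```

Filtering on the transformed x and y, rather than on the original frame, also covers values that only become infinite during the transformation.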
Another thing: numpy's histogram had some issues for me with DataFrame series, so I had to get a proper numpy.array from the series to plot them properly, e.g.,
histo, xedges, yedges = numpy.histogram2d(numpy.array(x[1:MAX]), numpy.array(y[1:MAX]), bins=(100, 100))
for slicing the series up to some variable MAX.
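The conversion step can also be written with explicit positional slicing and Series.to_numpy() (available in pandas 0.24 and later). This is a sketch under assumptions: the random data and the value of MAX are hypothetical, and .iloc is used here only to make it explicit that the slice is by position rather than by index label.

```python
import numpy as np
import pandas as pd

MAX = 500  # hypothetical cut-off

# Hypothetical stand-ins for the transformed series x and y
rng = np.random.default_rng(0)
x = pd.Series(rng.uniform(0.0, 10.0, size=1000))
y = pd.Series(rng.uniform(0.0, 5.0, size=1000))

# Positional slice, then conversion to plain numpy arrays.
# As in the original snippet, the slice starts at 1 and so skips the first element.
x_arr = x.iloc[1:MAX].to_numpy()
y_arr = y.iloc[1:MAX].to_numpy()

histo, xedges, yedges = np.histogram2d(x_arr, y_arr, bins=(100, 100))
```

With bins=(100, 100), histo has shape (100, 100) and each edge array has 101 entries; histo, xedges and yedges can then be passed to a plotting call such as matplotlib's pcolormesh.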