Why does "numpy.mean" return 'inf'?
Question
I need to calculate the column means of an array with more than 1000 rows.
np.mean(some_array)
gives me
inf
as output,
but I am pretty sure the values are OK. I am loading a CSV from here into my Data variable, and the column 'Cement' looks "healthy" from my point of view.
In[254]:np.mean(Data[:230]['Cement'])
Out[254]:275.75
But if I increase the number of rows, the problem starts:
In [259]:np.mean(Data[:237]['Cement'])
Out[259]:inf
But when I look at the data:
In [261]:Data[230:237]['Cement']
Out[261]:
array([[ 425. ],
[ 333. ],
[ 250.25],
[ 491. ],
[ 160. ],
[ 229.75],
[ 338. ]], dtype=float16)
I cannot find a reason for this behaviour. P.S. This happens in Python 3.x using Wakari (cloud-based IPython).
NumPy version: '1.8.1'
I am loading the data with:
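import numpy as np

No_Col = 9
# Convert decimal commas to decimal points before parsing.
conv = lambda valstr: float(valstr.replace(',', '.'))
# Apply the same converter to every column.
c = {i: conv for i in range(No_Col)}
# get_data (defined elsewhere) is the source of the tab-separated data.
Data = np.genfromtxt(get_data, dtype=np.float16, delimiter='\t', skip_header=0, names=True, converters=c)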
Recommended answer
I would guess that the problem is precision (as others have also commented). Quoting directly from the documentation for mean(), we see:
Notes
The arithmetic mean is the sum of the elements along the axis divided by the number of elements.
Note that for floating-point input, the mean is computed using the same precision the input has. Depending on the input data, this can cause the results to be inaccurate, especially for float32 (see example below). Specifying a higher-precision accumulator using the dtype keyword can alleviate this issue.
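Since your array is of type float16, you have very limited precision and range: the largest finite float16 value is 65504. The first 230 rows, with a mean of about 275.75, sum to roughly 63,400, and the seven rows you printed add another 2,227, which pushes the running sum past that limit, so it overflows to inf. Using dtype=np.float64 will probably alleviate the overflow. Also see the examples in the mean() documentation.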
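To make the overflow concrete, here is a minimal sketch. It assumes a hypothetical stand-in for the 'Cement' column (300 copies of 275.75) rather than the real CSV. It uses np.sum with an explicit dtype because newer NumPy releases may compute float16 means with a wider intermediate type, in which case np.mean alone would no longer reproduce the inf seen under 1.8.1.

```python
import numpy as np

# Hypothetical stand-in for the 'Cement' column: 300 identical values.
values = np.full(300, 275.75, dtype=np.float16)

# Largest finite value that float16 can represent.
f16_max = float(np.finfo(np.float16).max)

# The exact sum (300 * 275.75 = 82725) does not fit into float16,
# so a float16 result overflows to inf.
with np.errstate(over="ignore"):
    total_f16 = np.sum(values, dtype=np.float16)

# Accumulating in float64 keeps the sum finite and the mean exact.
total_f64 = np.sum(values, dtype=np.float64)
mean_f64 = np.mean(values, dtype=np.float64)

print(f16_max, total_f16, total_f64, mean_f64)
```

If the data does not have to be stored in half precision, loading it with dtype=np.float64 in the first place avoids the issue without passing dtype to every reduction.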