How can I prevent NaN issues?
Problem description
I'm getting `Mean of empty slice` runtime warnings. When I print out my variables (NumPy arrays), several of them contain `nan` values. The runtime warning points to line 58 as the issue. What can I change to make it work?
Sometimes the program runs with no issues, but most of the time it does not.
This is a from-scratch K-Means implementation that clusters the Iris data set. It first prompts the user for the number of centroids (clusters) they want. It then randomly generates that many centroids, with every coordinate drawn from the range between the smallest and largest values in the loaded text file.
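The initialization described above can be sketched more compactly with NumPy. This is a rough equivalent of the `while` loop in the program, not the asker's code; the helper name `random_centroids`, the use of `np.random.default_rng`, and the sample data are assumptions for illustration. Because every coordinate is drawn from one global range shared by all features, a centroid can land far from every data point, which is one plausible way a cluster ends up with no members.

```python
import numpy as np

def random_centroids(data, k, rng):
    """Return k random centroids, one row per centroid (hypothetical helper).

    Every coordinate is drawn uniformly from [data.min(), data.max()] over the
    whole array (one global range for all features) and rounded to one
    decimal place, mirroring the initialization described in the question.
    """
    low, high = np.amin(data), np.amax(data)
    return np.round(rng.uniform(low, high, size=(k, data.shape[1])), 1)

# Hypothetical sample data: 3 points with 4 features each.
data = np.array([[5.1, 3.5, 1.4, 0.2],
                 [7.0, 3.2, 4.7, 1.4],
                 [6.3, 3.3, 6.0, 2.5]])
rng = np.random.default_rng(0)
centroids = random_centroids(data, 3, rng)
print(centroids)
```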
I put a `break` in the `else` branch to prevent infinite loops.
Is it because some numbers go below zero when I subtract the centroids from the data points in the file?
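A quick check suggests negative differences are not the cause: `np.linalg.norm` squares each component before summing, so the distances used for cluster assignment are never negative and no NaN is produced at this step. The values below are illustrative, chosen so that the difference equals the second row of the first `t0` block in the output.

```python
import numpy as np

# Illustrative values: a data point and a centroid whose difference has
# negative components.
point = np.array([5.1, 3.5, 1.4, 0.2])
centroid = np.array([7.8, 0.2, 4.3, 1.4])

diff = centroid - point         # some entries are negative
dist = np.linalg.norm(diff)     # Euclidean norm: sqrt of the sum of squares

print(diff)
print(dist, np.isnan(dist))
```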
Error output when running:
```
How Many Centrouds? 3
Dimensionality of Data: (150, 4)
Starting Centroiuds:
 [[ 1.4 7.9 0.2 3.4]
 [ 7.8 0.2 4.3 1.4]
 [ 5.7 6.9 3. 6.6]]
t0 :
 [[[-3.7 4.4 -1.2 3.2]
  [ 2.7 -3.3 2.9 1.2]
  [ 0.6 3.4 1.6 6.4]]
 [[-3.5 4.9 -1.2 3.2]
  [ 2.9 -2.8 2.9 1.2]
  [ 0.8 3.9 1.6 6.4]]
 [[-3.3 4.7 -1.1 3.2]
  [ 3.1 -3. 3. 1.2]
  [ 1. 3.7 1.7 6.4]]
 ...,
 [[-5.1 4.9 -5. 1.4]
  [ 1.3 -2.8 -0.9 -0.6]
  [-0.8 3.9 -2.2 4.6]]
 [[-4.8 4.5 -5.2 1.1]
  [ 1.6 -3.2 -1.1 -0.9]
  [-0.5 3.5 -2.4 4.3]]
 [[-4.5 4.9 -4.9 1.6]
  [ 1.9 -2.8 -0.8 -0.4]
  [-0.2 3.9 -2.1 4.8]]]
Warning (from warnings module):
  File "C:\Python27\lib\site-packages\numpy\core\_methods.py", line 59
    warnings.warn("Mean of empty slice.", RuntimeWarning)
RuntimeWarning: Mean of empty slice.
Warning (from warnings module):
  File "C:\Python27\lib\site-packages\numpy\core\_methods.py", line 68
    ret, rcount, out=ret, casting='unsafe', subok=False)
RuntimeWarning: invalid value encountered in true_divide
---------------
Starting Centroids:
[[ 1.4 7.9 0.2 3.4]
 [ 7.8 0.2 4.3 1.4]
 [ 5.7 6.9 3. 6.6]]
Starting NewMeans:
[[ nan nan nan nan]
 [ 5.84333333 3.054 3.75866667 1.19866667]
 [ nan nan nan nan]]
Starting Centroids Now:
[[ nan nan nan nan]
 [ 5.84333333 3.054 3.75866667 1.19866667]
 [ nan nan nan nan]]
NewMeans now:
[[ nan nan nan nan]
 [ 5.84333333 3.054 3.75866667 1.19866667]
 [ nan nan nan nan]]
```
Python code:
```python
import numpy as np
from pprint import pprint
import random
import sys
import warnings

arglist = sys.argv
#UNCOMMENT BELOW IN FINAL PROGRAM
'''
NoOfCentroids = int(arglist[2])
dataPointsFromFile = np.array(np.loadtxt(sys.argv[1], delimiter = ','))
'''
dataPointsFromFile = np.array(np.loadtxt('iris.txt', delimiter = ','))
NoOfCentroids = input('How Many Centrouds? ')
dataRange = ([])
#UNCOMMENT BELOW IN FINAL PROGRAM
'''
with open(arglist[1]) as f:
    print 'Points in data set: ',sum(1 for _ in f)
'''
dataRange.append(round(np.amin(dataPointsFromFile),1))
dataRange.append(round(np.amax(dataPointsFromFile),1))
dataRange = np.asarray(dataRange)
dataPoints = np.array(dataPointsFromFile)
print 'Dimensionality of Data: ', dataPoints.shape

randomCentroids = []
data = ([])
templist = []
i = 0
while i<NoOfCentroids:
    for j in range(len(dataPointsFromFile[1,:])):
        cat = round(random.uniform(np.amin(dataPointsFromFile),np.amax(dataPointsFromFile)),1)
        templist.append(cat)
    randomCentroids.append(templist)
    templist = []
    i = i+1
centroids = np.asarray(randomCentroids)

def kMeans(array1, array2):
    ConvergenceCounter = 1
    keepGoing = True
    StartingCentroids = np.copy(centroids)
    print 'Starting Centroiuds:\n {}'.format(StartingCentroids)
    while keepGoing:
        #--------------Find The new means---------#
        t0 = StartingCentroids[None, :, :] - dataPoints[:, None, :]
        print 't0 :\n {}'.format(t0)
        t1 = np.linalg.norm(t0, axis=-1)
        t2 = np.argmin(t1, axis=-1)
        #------Push the new means to a new array for comparison---------#
        CentroidMeans = []
        for x in range(len(StartingCentroids)):
            CentroidMeans.append(np.mean(dataPoints[t2 == [x]], axis=0))
        #--------Convert to a numpy array--------#
        NewMeans = np.asarray(CentroidMeans)
        #------Compare the New Means with the Starting Means------#
        if np.array_equal(NewMeans,StartingCentroids):
            print ('Convergence has been reached after {} moves'.format(ConvergenceCounter))
            print ('Starting Centroids:\n{}'.format(centroids))
            print ('Final Means:\n{}'.format(NewMeans))
            print ('Final Cluster assignments: {}'.format(t2))
            for x in xrange(len(StartingCentroids)):
                print ('Cluster {}:\n'.format(x)), dataPoints[t2 == [x]]
            for x in xrange(len(StartingCentroids)):
                print ('Size of Cluster {}:'.format(x)), len(dataPoints[t2 == [x]])
            keepGoing = False
        else:
            print 15*'-'
            ConvergenceCounter = ConvergenceCounter +1
            print 'Starting Centroids:\n'
            print StartingCentroids
            print '\n'
            print 'Starting NewMeans:\n'
            print NewMeans
            StartingCentroids =np.copy(NewMeans)
            print 'Starting Centroids Now:\n'
            print StartingCentroids
            print '\n'
            print 'NewMeans now:'
            print NewMeans
            break

kMeans(centroids, dataPoints)
```
Recommended answer
I think the warning arises in `np.mean(dataPoints[t2 == [x]], axis=0)`. If `t2 == [x]` is all False (no match between `t2` and `x`), then `dataPoints[...]` will be an empty array, resulting in the `mean` warning.
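A minimal reproduction of that situation, assuming a tiny hypothetical data set and hypothetical cluster labels in which no point was assigned to cluster 2. The mask is built exactly as in the question (`t2 == [x]`); the warnings are recorded with `warnings.catch_warnings` so they can be inspected.

```python
import warnings
import numpy as np

# Hypothetical toy data: 4 points with 2 features.
dataPoints = np.array([[1.0, 2.0],
                       [1.5, 1.8],
                       [5.0, 8.0],
                       [8.0, 8.0]])
# Hypothetical assignments from argmin: nobody was assigned to cluster 2.
t2 = np.array([0, 0, 1, 1])

means, messages = [], []
for x in range(3):
    selected = dataPoints[t2 == [x]]        # same mask as in the question
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")     # record every warning, even repeats
        means.append(np.mean(selected, axis=0))
    messages.append([str(w.message) for w in caught])
    print(x, selected.shape, means[-1], messages[-1])
```

For cluster 2 the selection has shape `(0, 2)`, the mean comes back as `[nan nan]`, and the recorded messages include `Mean of empty slice.`; the exact text of the second "invalid value" warning varies between NumPy versions.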
I think you need to be more careful with that test. Maybe even skip the `mean` if the masked array is empty.
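One way to skip the `mean` for an empty cluster is sketched below. The helper name `update_centroids` is hypothetical, and keeping the previous centroid for an empty cluster is an assumption of this sketch; re-seeding it from a random data point is another common choice.

```python
import numpy as np

def update_centroids(dataPoints, labels, old_centroids):
    """Recompute centroids, skipping the mean for clusters with no points."""
    new_centroids = np.array(old_centroids, dtype=float)   # float copy of the old centroids
    for x in range(len(new_centroids)):
        members = dataPoints[labels == x]                   # rows assigned to cluster x
        if len(members) > 0:                                # only average non-empty clusters
            new_centroids[x] = members.mean(axis=0)
    return new_centroids

# Hypothetical toy data; the third centroid is far away from every point.
dataPoints = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0]])
centroids = np.array([[1.0, 1.0], [6.0, 6.0], [20.0, 20.0]])

dists = np.linalg.norm(centroids[None, :, :] - dataPoints[:, None, :], axis=-1)
labels = np.argmin(dists, axis=-1)      # [0, 0, 1, 1]: cluster 2 is empty
new = update_centroids(dataPoints, labels, centroids)
print(new)
```

The empty cluster simply keeps its old coordinates `[20, 20]`, so no NaN enters the next iteration.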
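`==` tests with floating-point values are unpredictable, and the convergence check `np.array_equal(NewMeans, StartingCentroids)` is exactly such an element-wise `==` comparison. You need to use something like `np.isclose` or `np.allclose` to test equivalence within a tolerance.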
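A small illustration of the difference, using the classic `0.1 + 0.2` rounding case as hypothetical data. It also shows, as a side note, that NaN never compares equal to NaN by default, which may explain why the loop in the question needed a `break`: once the means contain NaN, neither test can ever report convergence.

```python
import numpy as np

# Two arrays that are equal mathematically but differ by rounding error.
a = np.array([0.1 + 0.2, 1.0])
b = np.array([0.3, 1.0])

exact = np.array_equal(a, b)    # exact element-wise ==
close = np.allclose(a, b)       # equal within rtol=1e-05, atol=1e-08 (the defaults)

# NaN != NaN, and equal_nan defaults to False, so this is never "converged".
nan_means = np.array([np.nan, 1.0])
nan_close = np.allclose(nan_means, nan_means)

print(exact, close, nan_close)   # False True False
```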
The second warning comes from later in the `mean` calculation, presumably when it tries to divide by 0, the number of elements.
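A simplified picture of that step: a mean over `axis=0` is a sum divided by a count, and for an empty selection the sum is all zeros and the count is 0. This is only an illustration of the arithmetic, not NumPy's actual code path.

```python
import numpy as np

# What dataPoints[mask] looks like when the mask is all False: 0 rows, 4 features.
empty = np.empty((0, 4))
total = empty.sum(axis=0)     # summing zero rows gives [0. 0. 0. 0.]
count = empty.shape[0]        # number of rows averaged: 0

with np.errstate(invalid="warn"):   # the default setting, made explicit here
    manual_mean = total / count     # 0.0 / 0 -> nan, with an "invalid value" RuntimeWarning

print(total, count, manual_mean)
```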
The full `mean` code can be found in `numpy.core._methods.py`.
In sum, don't try to take the `mean` of an empty array.
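Putting the answer's suggestions together, a minimal sketch of the main loop might look like this. It is written for Python 3 (the question's code is Python 2), and its policies are assumptions rather than the asker's design: an empty cluster keeps its previous centroid, convergence is tested with `np.allclose`, and `max_iter` replaces the unconditional `break` as the guard against infinite loops. The function name `k_means` and the toy data are hypothetical.

```python
import numpy as np

def k_means(dataPoints, centroids, max_iter=100):
    """Minimal K-Means loop that never takes the mean of an empty cluster."""
    centroids = np.asarray(centroids, dtype=float)
    labels = None
    for iteration in range(1, max_iter + 1):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(centroids[None, :, :] - dataPoints[:, None, :], axis=-1)
        labels = np.argmin(dists, axis=-1)

        new_centroids = np.copy(centroids)
        for x in range(len(centroids)):
            members = dataPoints[labels == x]
            if len(members) > 0:              # only average non-empty clusters
                new_centroids[x] = members.mean(axis=0)

        # Tolerance-based convergence test instead of exact equality.
        if np.allclose(new_centroids, centroids):
            return new_centroids, labels, iteration
        centroids = new_centroids
    return centroids, labels, max_iter

# Hypothetical data: two well-separated blobs of 20 points in 4 dimensions.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.5, size=(20, 4)),
                  rng.normal(5.0, 0.5, size=(20, 4))])
# The third starting centroid is far from all points and never gets members.
start = np.array([[0.1, 0.1, 0.1, 0.1],
                  [4.0, 4.0, 4.0, 4.0],
                  [50.0, 50.0, 50.0, 50.0]])
final, labels, n_iter = k_means(data, start)
print(final)
print(n_iter)
```

The far-away centroid stays where it started instead of turning into NaN, and the loop stops on its own once the centroids stop moving.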