How can I prevent NaN issues?

Question

I'm getting "Mean of empty slice" runtime warnings. When I print my variables (NumPy arrays), several of them contain nan values. The runtime warning points to line 58 as the issue. What can I change to make it work?

Sometimes the program runs with no issues. Most of the time it does not.

This is a from-scratch K-Means algorithm that clusters the iris data set. It first prompts the user for the number of centroids (clusters) they want. It then randomly generates that many centroids, with each coordinate drawn from the range of values found in the loaded text file.

I have a break in the else branch to prevent infinite loops.

Is it because some numbers go below zero when I subtract the centroids from the data points in the file?

Error output when run:

How Many Centrouds? 3
Dimensionality of Data:  (150, 4)
Starting Centroiuds:
 [[ 1.4  7.9  0.2  3.4]
 [ 7.8  0.2  4.3  1.4]
 [ 5.7  6.9  3.   6.6]]
t0 :
 [[[-3.7  4.4 -1.2  3.2]
  [ 2.7 -3.3  2.9  1.2]
  [ 0.6  3.4  1.6  6.4]]

 [[-3.5  4.9 -1.2  3.2]
  [ 2.9 -2.8  2.9  1.2]
  [ 0.8  3.9  1.6  6.4]]

 [[-3.3  4.7 -1.1  3.2]
  [ 3.1 -3.   3.   1.2]
  [ 1.   3.7  1.7  6.4]]

 ..., 
 [[-5.1  4.9 -5.   1.4]
  [ 1.3 -2.8 -0.9 -0.6]
  [-0.8  3.9 -2.2  4.6]]

 [[-4.8  4.5 -5.2  1.1]
  [ 1.6 -3.2 -1.1 -0.9]
  [-0.5  3.5 -2.4  4.3]]

 [[-4.5  4.9 -4.9  1.6]
  [ 1.9 -2.8 -0.8 -0.4]
  [-0.2  3.9 -2.1  4.8]]]

Warning (from warnings module):
  File "C:\Python27\lib\site-packages\numpy\core\_methods.py", line 59
    warnings.warn("Mean of empty slice.", RuntimeWarning)
RuntimeWarning: Mean of empty slice.

Warning (from warnings module):
  File "C:\Python27\lib\site-packages\numpy\core\_methods.py", line 68
    ret, rcount, out=ret, casting='unsafe', subok=False)
RuntimeWarning: invalid value encountered in true_divide
---------------
Starting Centroids:

[[ 1.4  7.9  0.2  3.4]
 [ 7.8  0.2  4.3  1.4]
 [ 5.7  6.9  3.   6.6]]


Starting NewMeans:

[[        nan         nan         nan         nan]
 [ 5.84333333  3.054       3.75866667  1.19866667]
 [        nan         nan         nan         nan]]
Starting Centroids Now:

[[        nan         nan         nan         nan]
 [ 5.84333333  3.054       3.75866667  1.19866667]
 [        nan         nan         nan         nan]]


NewMeans now:
[[        nan         nan         nan         nan]
 [ 5.84333333  3.054       3.75866667  1.19866667]
 [        nan         nan         nan         nan]]

Python code:

import numpy as np
from pprint import pprint
import random
import sys
import warnings

arglist = sys.argv 

#UNCOMMENT BELOW IN FINAL PROGRAM
'''
NoOfCentroids = int(arglist[2])
dataPointsFromFile = np.array(np.loadtxt(sys.argv[1], delimiter = ','))
'''

dataPointsFromFile = np.array(np.loadtxt('iris.txt', delimiter = ','))

NoOfCentroids = input('How Many Centrouds? ')

dataRange = ([])

#UNCOMMENT BELOW IN FINAL PROGRAM
'''
with open(arglist[1]) as f:
    print 'Points in data set: ',sum(1 for _ in f)
'''
dataRange.append(round(np.amin(dataPointsFromFile),1))
dataRange.append(round(np.amax(dataPointsFromFile),1))
dataRange = np.asarray(dataRange)

dataPoints = np.array(dataPointsFromFile)
print 'Dimensionality of Data: ', dataPoints.shape

randomCentroids = []
data = ([])
templist = []
i = 0

while i<NoOfCentroids:
    for j in range(len(dataPointsFromFile[1,:])):
        cat = round(random.uniform(np.amin(dataPointsFromFile),np.amax(dataPointsFromFile)),1)
        templist.append(cat)
    randomCentroids.append(templist)
    templist = []
    i = i+1

centroids = np.asarray(randomCentroids)

def kMeans(array1, array2):
    ConvergenceCounter = 1
    keepGoing = True
    StartingCentroids = np.copy(centroids)
    print 'Starting Centroiuds:\n {}'.format(StartingCentroids)
    while keepGoing:      
        #--------------Find The new means---------#
        t0 = StartingCentroids[None, :, :] - dataPoints[:, None, :]
        print 't0 :\n {}'.format(t0)
        t1 = np.linalg.norm(t0, axis=-1)
        t2 = np.argmin(t1, axis=-1)
        #------Push the new means to a new array for comparison---------#
        CentroidMeans = []
        for x in range(len(StartingCentroids)):
            CentroidMeans.append(np.mean(dataPoints[t2 == [x]], axis=0))
        #--------Convert to a numpy array--------#
        NewMeans = np.asarray(CentroidMeans)
        #------Compare the New Means with the Starting Means------#
        if np.array_equal(NewMeans,StartingCentroids):
            print ('Convergence has been reached after {} moves'.format(ConvergenceCounter))
            print ('Starting Centroids:\n{}'.format(centroids))
            print ('Final Means:\n{}'.format(NewMeans))
            print ('Final Cluster assignments: {}'.format(t2))
            for x in xrange(len(StartingCentroids)):
                print ('Cluster {}:\n'.format(x)), dataPoints[t2 == [x]]
            for x in xrange(len(StartingCentroids)):
                print ('Size of Cluster {}:'.format(x)), len(dataPoints[t2 == [x]])
            keepGoing = False
        else:
            print 15*'-'
            ConvergenceCounter  = ConvergenceCounter +1
            print 'Starting Centroids:\n'
            print StartingCentroids
            print '\n'
            print 'Starting NewMeans:\n'
            print NewMeans
            StartingCentroids =np.copy(NewMeans)
            print 'Starting Centroids Now:\n'
            print StartingCentroids
            print '\n'
            print 'NewMeans now:'
            print NewMeans
            break


kMeans(centroids, dataPoints)

Recommended answer

I think the warning comes from this expression:

np.mean(dataPoints[t2 == [x]], axis=0)

If t2 == [x] is all False (no match between t2 and x), then dataPoints[...] will be an empty array, resulting in the mean warning.
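The effect can be reproduced in isolation. The following is a minimal sketch in Python 3 syntax; `points` and `labels` are made-up stand-ins for the question's `dataPoints` and `t2`, not the real iris data:

```python
import warnings
import numpy as np

# Hypothetical stand-ins for dataPoints and t2 from the question.
points = np.array([[5.1, 3.5], [4.9, 3.0], [6.2, 2.9]])
labels = np.array([1, 1, 1])   # every point was assigned to centroid 1

mask = labels == 0             # all False: centroid 0 owns no points
selected = points[mask]        # empty selection, shape (0, 2)

# Record the warnings instead of printing them to stderr.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    empty_mean = np.mean(selected, axis=0)   # nan in every column

print(selected.shape)
print(empty_mean)
print([str(w.message) for w in caught])
```

A randomly placed centroid that is farther from every data point than some other centroid ends up with exactly this kind of empty selection, which would explain why the program only fails on some runs.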

I think you need to be more careful with that test. Maybe even skip the mean if the masked array is empty.
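One way to skip the mean for an empty selection is sketched below (Python 3 syntax). The helper name `update_centroids` is hypothetical, and keeping the previous centroid for an empty cluster is an assumption made for this example; re-seeding the centroid at a randomly chosen data point would be another reasonable choice.

```python
import numpy as np

def update_centroids(points, labels, centroids):
    """Return new centroids, keeping the old one for any empty cluster."""
    new_centroids = centroids.astype(float)  # astype returns a copy
    for k in range(len(centroids)):
        members = points[labels == k]
        if len(members) > 0:                 # only average non-empty clusters
            new_centroids[k] = members.mean(axis=0)
    return new_centroids

# Made-up data: nobody is assigned to centroid 2.
points = np.array([[1.0, 1.0], [1.2, 0.8], [5.0, 5.0]])
centroids = np.array([[1.0, 1.0], [5.0, 5.0], [9.0, 9.0]])
labels = np.array([0, 0, 1])

result = update_centroids(points, labels, centroids)
print(result)
```

Because the empty cluster never reaches `mean`, no warning is raised and no nan enters the centroid array.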

== tests with floating-point values are unreliable. You need to use something like np.isclose or np.allclose to test equivalence within a tolerance.
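Applied to the convergence check in the question (np.array_equal on the old and new means), the difference looks roughly like this. The arrays are made up for illustration, and the default tolerances of np.allclose are used:

```python
import numpy as np

old = np.array([0.1 + 0.2, 1.0])
new = np.array([0.3, 1.0])

exact = np.array_equal(old, new)   # False: 0.1 + 0.2 is not exactly 0.3
close = np.allclose(old, new)      # True: equal within the default tolerance

# nan never compares equal to itself, so once a nan mean appears,
# an equality-based convergence test cannot succeed.
with_nan = np.array([np.nan, 1.0])
nan_equal = np.array_equal(with_nan, with_nan)

print(exact, close, nan_equal)
```

By default np.allclose also treats nan as unequal to nan, so a tolerance-based check does not replace the empty-cluster guard; it only makes the comparison robust to rounding.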

The second warning comes from later in the mean calculation, presumably when it divides by the number of elements, which is 0.

The full mean code can be found in numpy/core/_methods.py, the file named in both warnings.

In sum, don't try to take the mean of an empty array.
