Scikit-learn - ValueError: Input contains NaN, infinity or a value too large for dtype('float32') with Random Forest


Problem description

First, I have checked the other posts about this error, and none of them solves my issue.

I am using RandomForest. I am able to build the forest and make predictions, but sometimes, while the forest is being built, I get the following error.

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

The error occurs with the same dataset each time. Most runs on that dataset complete without an error, but some fail. When it fails, the error sometimes appears at the start of training and sometimes in the middle.
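To see which entries trip this check, one option is to inspect the inputs right before calling fit. The sketch below assumes a 2-D numeric array. The helper name find_invalid_entries is hypothetical, and the code only mimics the condition named in the message (NaN, infinity, or a value too large for float32) rather than calling scikit-learn's own validation.

```python
import numpy as np

def find_invalid_entries(X):
    """Return (row, col) indices that are NaN, infinite, or overflow float32."""
    X64 = np.asarray(X, dtype=np.float64)
    # Casting to float32 turns values beyond the float32 range into inf.
    # Newer NumPy versions warn about that overflow, so silence it here.
    with np.errstate(over="ignore"):
        X32 = X64.astype(np.float32)
    return np.argwhere(~np.isfinite(X32))

# Hypothetical sample data, not taken from the question's dataset.
sample = np.array([[1.0, np.nan],
                   [1e300, 2.0],
                   [np.inf, 3.0e-314]])
print(find_invalid_entries(sample))
```

Because the failure is intermittent, running such a check on every run (not just once) gives a better chance of catching the offending values.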

Here is my code:

import pandas as pd
from sklearn import ensemble
import numpy as np

def azureml_main(dataframe1 = None, dataframe2 = None):

    # Execution logic goes here

    Input = dataframe1.values[:,:]
    InputData = Input[:,:15]
    InputTarget = Input[:,16:]

    limitTrain = 2175

    clf = ensemble.RandomForestClassifier(n_estimators = 10000, n_jobs = 4 );

    features=np.empty([len(InputData),10])
    j=0
    for i in range (0,14):
        if (i == 1 or i == 4 or i == 5 or i == 6 or i == 8 or i == 9 or  i == 10 or i == 11 or i == 13 or i == 14):
            features[:,j] = (InputData[:, i])
            j += 1     
        
    clf.fit(features[:limitTrain,:],np.asarray(InputTarget[:limitTrain,1],dtype = np.float32))

    res = clf.predict_proba(features[limitTrain+1:,:])

    listreu = np.empty([len(res),5])
    for i in range(len(res)):
        if(res[i,0] > 0.5):
            listreu[i,4] = 0;
        elif(res[i,1] > 0.5):
            listreu[i,4] = 1;
        elif(res[i,2] > 0.5):
            listreu[i,4] = 2;
        else:
            listreu[i,4] = 3;
    

    listreu[:,0] = features[limitTrain+1:,0]
    listreu[:,1] = InputData[limitTrain+1:,2]
    listreu[:,2] = InputData[limitTrain+1:,3]
    listreu[:,3] = features[limitTrain+1:,1]



    # Return value must be of a sequence of pandas.DataFrame
    return pd.DataFrame(listreu),

I ran my code both locally and on Azure ML Studio, and the error occurs in both cases.

I am sure it is not caused by my dataset, since most of the time I don't get the error and I generate the dataset myself from different inputs.

Here is part of the dataset I use (the excerpt is not preserved in this copy).

Edit: I think I found the cause. Some values that should have been 0 were not exactly 0. They looked like this:

3.0x10^-314
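For reference, a value of that size is a subnormal (denormal) float64: it lies below the smallest normal double, about 2.2e-308, and becomes 0 when cast to float32. A minimal sketch for spotting such entries follows; the function name subnormal_mask and the sample values are hypothetical and not from the original post.

```python
import numpy as np

def subnormal_mask(X):
    """Boolean mask of entries that are nonzero but below the smallest normal float64."""
    X = np.asarray(X, dtype=np.float64)
    tiny = np.finfo(np.float64).tiny  # smallest positive normal float64
    return (X != 0) & (np.abs(X) < tiny)

values = np.array([0.0, 3.0e-314, 1.5, -2.0e-310])
print(subnormal_mask(values))      # marks the two subnormal entries
print(values.astype(np.float32))   # subnormal entries underflow to zero in float32
```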

Recommended answer

Since I corrected the problem described in the edit, I have had no more errors. I simply replaced the 3.0x10^-314 values with zeros.
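The replacement described above could be sketched as below. One further observation, which is not raised in the original thread and should be treated as a possible contributing factor only: the question's loop runs over range(0, 14), which stops at 13, so the i == 14 test never matches and only nine of the ten columns of features are written. np.empty does not initialize its memory, so the last column can hold arbitrary values, which might explain both the stray tiny numbers and the intermittent failures. The names zero_out_subnormals, select_features and FEATURE_COLUMNS are hypothetical.

```python
import numpy as np

# Column indices tested in the question's loop (14 is listed there as well).
FEATURE_COLUMNS = [1, 4, 5, 6, 8, 9, 10, 11, 13, 14]

def zero_out_subnormals(X):
    """Return a float64 copy of X with subnormal entries replaced by 0."""
    X = np.array(X, dtype=np.float64)  # np.array copies, so the input is untouched
    tiny = np.finfo(np.float64).tiny   # smallest positive normal float64
    X[(X != 0) & (np.abs(X) < tiny)] = 0.0
    return X

def select_features(input_data, columns=FEATURE_COLUMNS):
    """Pick the feature columns by index, so no cell is left uninitialized."""
    return np.asarray(input_data)[:, columns].astype(np.float64)

# Hypothetical stand-in for the question's InputData (15 columns).
demo = np.arange(30, dtype=float).reshape(2, 15)
features = select_features(demo)
cleaned = zero_out_subnormals([[3.0e-314, 1.0], [0.0, -2.0e-310]])
print(features.shape)
print(cleaned)
```

Selecting columns by index (or allocating with np.zeros instead of np.empty) means every cell has a defined value before it reaches fit.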
