ValueError: Unknown label type: while implementing MLPClassifier


Problem description

I have a dataframe with the columns year, month, day, hour, minute, second and Daily_KWH_System. I need to predict the daily kWh using a neural network. Please let me know how to go about it.

      Daily_KWH_System  year  month  day  hour  minute  second
0          4136.900384  2016      9    7     0       0       0
1          3061.657187  2016      9    8     0       0       0
2          4099.614033  2016      9    9     0       0       0
3          3922.490275  2016      9   10     0       0       0
4          3957.128982  2016      9   11     0       0       0

I get a ValueError when I fit the model.

Code so far:

X = df[['year','month','day','hour','minute','second']]
y = df['Daily_KWH_System']

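# train_test_split lives in sklearn.model_selection (sklearn.cross_validation was removed in scikit-learn 0.20)
from sklearn.model_selection import train_test_split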
X_train, X_test, y_train, y_test = train_test_split(X, y)

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit only to the training data
scaler.fit(X_train)

#y_train.shape
#X_train.shape

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)

from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(hidden_layer_sizes=(30,30,30))

#y_train = np.asarray(df['Daily_KWH_System'], dtype="|S6") 

mlp.fit(X_train,y_train)

Error:

ValueError: Unknown label type: (array([  2.27016856e+02,   3.02173014e+03,   4.29404190e+03,
     2.41273427e+02,   1.76714247e+02,   4.23374425e+03,

Solution

First of all, this is a regression problem and not a classification problem, as the values in the Daily_KWH_System column do not form a set of labels. Instead, they seem to be (at least based on the provided example) real numbers.

If you want to approach it as a classification problem regardless, then according to the scikit-learn documentation:

When doing classification in scikit-learn, y is a vector of integers or strings.

In your case, y is a vector of floats, and therefore you get the error. Thus, instead of the line

y = df['Daily_KWH_System']

write the line

y = np.asarray(df['Daily_KWH_System'], dtype="|S6")

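and the error will go away. Note that dtype="|S6" turns each value into a byte string of at most six characters, so every distinct truncated value becomes its own class. (You can read more about this approach here: Python RandomForest - Unknown label Error)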
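To see why the float target is rejected, you can ask scikit-learn how it categorizes the target. The sketch below applies `sklearn.utils.multiclass.type_of_target` to the five sample values from the question. The cast to integers is only there for contrast and is an assumption of this illustration, not a recommended fix.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# The five sample targets shown in the question.
y = np.array([4136.900384, 3061.657187, 4099.614033, 3922.490275, 3957.128982])

# Non-integer floats are categorized as a continuous (regression) target,
# which classifiers reject with "Unknown label type".
float_kind = type_of_target(y)

# Integers (or strings) are categorized as class labels instead.
int_kind = type_of_target(y.astype(int))

print(float_kind, int_kind)
```

Any target that scikit-learn categorizes as continuous is a hint that a regressor, not a classifier, is the appropriate estimator.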

However, since regression is more appropriate in this case, instead of making the above change, replace the lines

from sklearn.neural_network import MLPClassifier
mlp = MLPClassifier(hidden_layer_sizes=(30,30,30))

with

from sklearn.neural_network import MLPRegressor
mlp = MLPRegressor(hidden_layer_sizes=(30,30,30))

The code will then run without raising an error (although there certainly isn't enough data here to check whether the resulting model performs well).

That said, I don't think this is the right way to choose features for this problem.
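Putting the pieces together, a minimal sketch of the regression version might look as follows. It rebuilds the five sample rows from the question by hand; `random_state=0` is an assumption added here for reproducibility, and the import uses `sklearn.model_selection`, which provides `train_test_split` in current scikit-learn releases.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

# The five sample rows from the question, rebuilt by hand.
df = pd.DataFrame({
    "Daily_KWH_System": [4136.900384, 3061.657187, 4099.614033,
                         3922.490275, 3957.128982],
    "year": [2016] * 5,
    "month": [9] * 5,
    "day": [7, 8, 9, 10, 11],
    "hour": [0] * 5,
    "minute": [0] * 5,
    "second": [0] * 5,
})

X = df[["year", "month", "day", "hour", "minute", "second"]]
y = df["Daily_KWH_System"]

# random_state is fixed only so the sketch is reproducible.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Fit the scaler on the training data only, then apply it to both splits.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# A regressor accepts the float target that the classifier rejected.
mlp = MLPRegressor(hidden_layer_sizes=(30, 30, 30), random_state=0)
mlp.fit(X_train_scaled, y_train)

predictions = mlp.predict(X_test_scaled)
print(predictions)
```

With so few rows the optimizer will probably warn that it has not converged, and the predictions are not meaningful. The point is only that `fit` no longer raises the ValueError.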

In this problem we deal with a sequence of real numbers that form a time series. One reasonable feature is the number of seconds (or minutes, hours, days, etc.) that have passed since the starting point. Since this particular data contains only days, months and years (the other values are always 0), we could use the number of days that have passed since the beginning as a feature. Your data frame would then look like this:

      Daily_KWH_System  days_passed
0          4136.900384            0
1          3061.657187            1
2          4099.614033            2
3          3922.490275            3
4          3957.128982            4
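One possible way to derive such a column with pandas is sketched below. It assumes the column names from the question and relies on `pd.to_datetime` assembling timestamps from a frame of year/month/day/hour/minute/second columns. The sample rows are rebuilt by hand.

```python
import pandas as pd

# The five sample rows from the question, rebuilt by hand.
df = pd.DataFrame({
    "Daily_KWH_System": [4136.900384, 3061.657187, 4099.614033,
                         3922.490275, 3957.128982],
    "year": [2016] * 5,
    "month": [9] * 5,
    "day": [7, 8, 9, 10, 11],
    "hour": [0] * 5,
    "minute": [0] * 5,
    "second": [0] * 5,
})

# Assemble one timestamp per row from the component columns.
timestamps = pd.to_datetime(
    df[["year", "month", "day", "hour", "minute", "second"]]
)

# Whole days elapsed since the earliest observation.
df["days_passed"] = (timestamps - timestamps.min()).dt.days

print(df[["Daily_KWH_System", "days_passed"]])
```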

You could take the values in the column days_passed as features and the values in Daily_KWH_System as targets. You may also add some indicator features. For example, if you think that the end of the year may affect the target, you can add an indicator feature that indicates whether the month is December or not.
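An indicator feature of this kind is just a 0/1 column. The sketch below uses made-up `month` and `days_passed` values (not taken from the question's data) so that both outcomes appear. The column name `is_december` is a hypothetical choice.

```python
import pandas as pd

# Hypothetical rows spanning several months, for illustration only.
df = pd.DataFrame({
    "days_passed": [0, 61, 91, 105, 122],
    "month": [9, 11, 12, 12, 1],
})

# 1 when the observation falls in December, 0 otherwise.
df["is_december"] = (df["month"] == 12).astype(int)

# Feature matrix combining the elapsed time and the indicator.
X = df[["days_passed", "is_december"]]
print(X)
```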

If the data is indeed daily (at least in this example you have one data point per day) and you want to tackle this problem with neural networks, then another reasonable approach would be to handle it as a time series and try to fit a recurrent neural network. Here are a couple of great blog posts that describe this approach:

http://machinelearningmastery.com/time-series-prediction-lstm-recurrent-neural-networks-python-keras/

http://machinelearningmastery.com/time-series-forecasting-long-short-term-memory-network-python/
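Before a recurrent network can be fitted, the series is usually reframed as supervised examples in which a window of past values predicts the next one. The sketch below shows only that preparation step with NumPy. The helper `make_windows` and the `look_back` value are hypothetical names chosen for this illustration, not code taken from the linked posts.

```python
import numpy as np


def make_windows(series, look_back):
    """Split a 1-D series into (X, y) pairs.

    Each row of X holds `look_back` consecutive values and the matching
    entry of y is the value that immediately follows them.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - look_back
    X = np.array([series[i:i + look_back] for i in range(n_samples)])
    y = series[look_back:]
    return X, y


# The five sample targets from the question.
daily_kwh = [4136.900384, 3061.657187, 4099.614033, 3922.490275, 3957.128982]

look_back = 2
X, y = make_windows(daily_kwh, look_back)

# Recurrent layers typically expect input shaped
# (samples, timesteps, features); here there is one feature per timestep.
X_rnn = X.reshape((X.shape[0], look_back, 1))
print(X_rnn.shape, y.shape)
```

Five data points yield only three windows, so a realistic experiment would need a much longer history.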
