Getting ValueError: y contains new labels when using scikit learn's LabelEncoder

Views: 339 | Posted: 2020/5/4 9:47:55 | Tags: python, machine-learning, scikit-learn, prediction

Problem description

I have a series like:

df['ID'] = ['ABC123', 'IDF345', ...]

I'm using scikit-learn's LabelEncoder to convert it to numerical values to be fed into a RandomForestClassifier.

During training, I do the following:

le_id = LabelEncoder()
df['ID'] = le_id.fit_transform(df.ID)

But now, for testing/prediction, when I pass in new data, I want to transform the 'ID' column of that data based on le_id: if a value was already seen, encode it according to the label encoder above; otherwise assign it a new numerical value.

In the test file, I was doing the following:

new_df['ID'] = le_id.transform(new_df.ID)

But I'm getting the following error: ValueError: y contains new labels

How do I resolve this? Thanks!
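
One way to get the behaviour described above is to keep an explicit label-to-code mapping and extend it whenever an unseen ID arrives. The sketch below is plain Python, not part of scikit-learn: `build_mapping` and `encode_with_new_labels` are hypothetical helper names, and the known labels are assumed to come from the fitted encoder (for example `le_id.classes_`). Keep in mind that a code assigned to an ID the model never saw during training carries no learned meaning for the classifier.

```python
def build_mapping(known_labels):
    """Map each known label to its position, mirroring a fitted encoder's classes."""
    return {label: code for code, label in enumerate(known_labels)}


def encode_with_new_labels(values, mapping):
    """Encode values, assigning the next free integer to each unseen label.

    The mapping is updated in place so repeated unseen labels share one code.
    """
    codes = []
    for value in values:
        if value not in mapping:
            mapping[value] = len(mapping)  # next unused integer code
        codes.append(mapping[value])
    return codes


# Labels seen during training (assumed; in practice e.g. list(le_id.classes_)).
train_labels = ["ABC123", "IDF345"]
mapping = build_mapping(train_labels)

# New data containing one known ID and one unseen ID that appears twice.
new_codes = encode_with_new_labels(["IDF345", "XYZ999", "XYZ999"], mapping)
print(new_codes)
```

The known ID keeps its training code, while the unseen ID receives one new code that is reused on its second occurrence.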

UPDATE:

So the task is to use the data below (for example) as training data and predict the 'High', 'Mod', 'Low' values for new BankNum, ID combinations. The model should learn from the training dataset the characteristics under which a 'High' is given and under which a 'Low' is given. For example, below, a 'High' is given when there are multiple entries with the same BankNum and different IDs.

df = 

BankNum   | ID    | Labels

0098-7772 | AB123 | High
0098-7772 | ED245 | High
0098-7772 | ED343 | High
0870-7771 | ED200 | Mod
0870-7771 | ED100 | Mod
0098-2123 | GH564 | Low

And then predict on something like:

BankNum   | ID

00982222  | AB999
00982222  | AB999
00981111  | AB890

I'm doing something like this:

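import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

df['BankNum'] = df.BankNum.astype(np.float128)

# Encode the ID strings as integers
le_id = LabelEncoder()
df['ID'] = le_id.fit_transform(df.ID)

# Hold out 25% of the rows for testing, then fit the classifier
X_train, X_test, y_train, y_test = train_test_split(df[['BankNum', 'ID']], df.Labels, test_size=0.25, random_state=42)
clf = RandomForestClassifier(random_state=42, n_estimators=140)
clf.fit(X_train, y_train)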
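
The pattern described in the update (a 'High' label when one BankNum appears with several different IDs) is a property of the group rather than of any single encoded ID, so it may help to give the model that count directly. The sketch below is an assumption about a possible feature, not something from the original post; `distinct_ids_per_bank` is a hypothetical helper, written in plain Python so it runs without pandas.

```python
from collections import defaultdict


def distinct_ids_per_bank(rows):
    """Return, for each (bank_num, id) row, how many distinct IDs share its bank_num."""
    ids_by_bank = defaultdict(set)
    for bank_num, id_value in rows:
        ids_by_bank[bank_num].add(id_value)
    return [len(ids_by_bank[bank_num]) for bank_num, _ in rows]


# Training rows copied from the example table above.
train_rows = [
    ("0098-7772", "AB123"),
    ("0098-7772", "ED245"),
    ("0098-7772", "ED343"),
    ("0870-7771", "ED200"),
    ("0870-7771", "ED100"),
    ("0098-2123", "GH564"),
]

id_counts = distinct_ids_per_bank(train_rows)
print(id_counts)
```

Because this count can be computed on new rows without a fitted encoder, previously unseen IDs do not cause an error for this particular feature.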
