Getting ValueError: y contains new labels when using scikit learn's LabelEncoder

Views: 22 Published: 2021/12/14 10:00:11 Tags: python machine-learning encoding scikit-learn categorical-data

Question

I have a series like:

df['ID'] = ['ABC123', 'IDF345', ...]

I'm using scikit-learn's LabelEncoder to convert it to numerical values to be fed into a RandomForestClassifier.

During training, I do the following:

from sklearn.preprocessing import LabelEncoder
le_id = LabelEncoder()
df['ID'] = le_id.fit_transform(df.ID)

But now, for testing/prediction, when I pass in new data, I want to transform the 'ID' column of that data using le_id: if a value was seen during training, encode it with the fitted label encoder above; otherwise, assign it a new numerical value.

In the test file, I was doing the following:

new_df['ID'] = le_id.transform(new_df.ID)

But I'm getting the following error: ValueError: y contains new labels
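For reference, the error can be reproduced in isolation. The minimal sketch below uses made-up ID values (assumptions, not from the original data) and shows that a fitted LabelEncoder raises a ValueError as soon as transform meets a label that was not present when it was fitted. The exact wording of the message differs between scikit-learn versions, so it is printed rather than hard-coded.

```python
from sklearn.preprocessing import LabelEncoder

# Fit on the labels seen during training (illustrative values)
le_id = LabelEncoder()
le_id.fit(["ABC123", "IDF345"])

try:
    # "XYZ999" was not present at fit time, so transform refuses to encode it
    le_id.transform(["IDF345", "XYZ999"])
    raised = False
except ValueError as err:
    raised = True
    print(err)
```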

How do I fix this? Thanks!

Update:

So the task I have is to use the data below (for example) as training data and predict the 'High', 'Mod', or 'Low' value for new BankNum, ID combinations. The model should learn from the training dataset the characteristics under which a 'High' is given and those under which a 'Low' is given. For example, below, a 'High' is given when there are multiple entries with the same BankNum and different IDs.

df = 

BankNum   | ID    | Labels

0098-7772 | AB123 | High
0098-7772 | ED245 | High
0098-7772 | ED343 | High
0870-7771 | ED200 | Mod
0870-7771 | ED100 | Mod
0098-2123 | GH564 | Low

And then predict on something like:

BankNum   |  ID | 

00982222  | AB999 | 
00982222  | AB999 |
00981111  | AB890 |

I'm doing something like this:

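import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Cast BankNum to a numeric type; np.float128 is not available on every
# platform, and the cast fails on values that still contain hyphens
df['BankNum'] = df.BankNum.astype(np.float128)

# Encode the string IDs as integers
le_id = LabelEncoder()
df['ID'] = le_id.fit_transform(df.ID)

X_train, X_test, y_train, y_test = train_test_split(df[['BankNum', 'ID']], df.Labels, test_size=0.25, random_state=42)
clf = RandomForestClassifier(random_state=42, n_estimators=140)
clf.fit(X_train, y_train)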
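One way to get the behavior described in the question (reuse the existing code for a known ID, otherwise assign a new number) is sketched below. ExtendableLabelEncoder is a hypothetical helper written for illustration, not part of scikit-learn, and the ID values are made up. Unlike LabelEncoder, which numbers labels in sorted order, this sketch numbers them in order of first appearance.

```python
class ExtendableLabelEncoder:
    """Hypothetical helper (not a scikit-learn class): maps labels to
    integers and gives any unseen label the next free integer."""

    def __init__(self):
        self.mapping = {}  # label -> integer code

    def _encode(self, label):
        # Extend the mapping the first time a label is encountered
        if label not in self.mapping:
            self.mapping[label] = len(self.mapping)
        return self.mapping[label]

    def fit_transform(self, labels):
        return [self._encode(label) for label in labels]

    def transform(self, labels):
        # Same behavior as fit_transform: known labels keep their code,
        # unseen labels are appended instead of raising ValueError
        return [self._encode(label) for label in labels]


le_id = ExtendableLabelEncoder()
train_codes = le_id.fit_transform(["ABC123", "IDF345", "ABC123"])
test_codes = le_id.transform(["IDF345", "XYZ999"])
print(train_codes)  # [0, 1, 0]
print(test_codes)   # [1, 2]
```

Note that this only avoids the exception. A classifier trained on the encoded IDs has never seen the newly assigned codes, so it has no information about them at prediction time.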
