Is numerical encoding necessary for the target variable in classification?

Views: 108 Posted: 2020/5/4 10:08:47 Tags: python machine-learning sklearn-pandas


Question

I am using sklearn for text classification. All my features are numerical, but my target variable's labels are text. I understand the rationale for encoding features as numbers, but I don't think it applies to the target variable. Is that right?

Recommended answer

If your target variable is text, you can convert it to numeric form (or leave it alone; see the note below) so that any Scikit-learn algorithm can handle it in an OVA (One Versus All) scheme. In that scheme the learning algorithm tries to tell each class apart from the remaining ones, and it works on the classes as numeric codes running from 0 to (number of classes - 1).

For instance, in this example from the Scikit-Learn documentation, you can work out the class of an iris because three models each evaluate one possible class against the others:

class 0 versus classes 1 and 2
class 1 versus classes 0 and 2
class 2 versus classes 0 and 1
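The three-model setup listed above can be sketched with Scikit-learn's `OneVsRestClassifier`. This is only an illustration, not the code of the documentation example the answer refers to; the choice of `LogisticRegression` as the base estimator and the `max_iter=1000` setting are assumptions made for this sketch.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Load the iris data; the targets are already integer codes 0, 1, 2.
iris = load_iris()
X, y = iris.data, iris.target

# Wrap a binary classifier so that one model is fitted per class,
# each separating that class from the remaining two.
# (Base estimator and max_iter are illustrative assumptions.)
ova = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)

n_models = len(ova.estimators_)  # one fitted binary model per class
print(n_models)
print(ova.classes_)
```

Each entry of `ova.estimators_` is a binary model for one class, in the same order as `ova.classes_`.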

Naturally, classes 0, 1 and 2 are Setosa, Versicolor and Virginica, but the algorithm needs them expressed as numeric codes, as you can verify by exploring the results of the example code:

list(iris.target_names)
['setosa', 'versicolor', 'virginica']

np.unique(Y)
array([0, 1, 2])
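If the labels start out as strings, the same kind of 0 to (number of classes - 1) coding can be produced explicitly with `LabelEncoder`. The following is a minimal sketch; the `labels` array is made-up data for illustration and is not part of the original example.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical string labels, made up for this illustration.
labels = np.array(["versicolor", "setosa", "virginica", "setosa"])

# fit() learns the sorted set of distinct labels;
# transform() replaces each label with its index in that sorted set.
encoder = LabelEncoder().fit(labels)
codes = encoder.transform(labels)

# inverse_transform() maps the integer codes back to the original strings.
decoded = encoder.inverse_transform(codes)

print(list(encoder.classes_))  # ['setosa', 'versicolor', 'virginica']
print(list(codes))             # [1, 0, 2, 0]
```

Keeping the fitted encoder around is what lets you turn numeric predictions back into readable class names later.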

NOTE: Scikit-learn does encode the target labels by itself if they are strings. In the logistic regression source on Scikit-learn's GitHub page (https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/logistic.py), rows 1623 and 1624 (as of the time of the answer) are where the code calls the label encoder and encodes the labels automatically:

# Encode for string labels
label_encoder = LabelEncoder().fit(y)
y = label_encoder.transform(y)
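In practice this means a classifier can be fitted on string labels directly. The sketch below assumes `LogisticRegression` with `max_iter=1000` (an illustrative setting) and builds the text labels from the iris data; the variable names are hypothetical.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

iris = load_iris()

# Build a string-valued target by indexing the class names with the integer codes.
y_text = iris.target_names[iris.target]

# Fit on the string labels without any manual encoding step.
clf = LogisticRegression(max_iter=1000).fit(iris.data, y_text)

# Predictions come back as the original string labels, not as integer codes.
predicted = clf.predict(iris.data[:3])

print(list(clf.classes_))
print(list(predicted))
```

The fitted `classes_` attribute holds the distinct string labels, so no separate decoding step is needed when reading the predictions.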
