Given a numerical target variable, should I transform the target variable to obtain an indicator matrix for multiclass classification?

Question

I am working on a multiclass classification problem using RandomForestClassifier. The target variable Y only contains one of three values, {-1, 0, 1}. I understand that numerical encoding is necessary.

However, I would like to understand whether it is necessary to transform Y into an indicator matrix like the one below, by calling pd.get_dummies(Y), and then feed this indicator matrix into the RandomForestClassifier.

      -1.0   0.0   1.0
0        0     0     1
1        1     0     0
2        0     0     1
3        1     0     0
4        1     0     0
...     ...   ...   ...
6516     1     0     0
6517     0     0     1
6518     0     0     1
6519     0     0     1
6520     1     0     0
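A minimal sketch of how an indicator matrix like the one above is produced, assuming a small hypothetical Y in place of the question's 6,521-row series. `dtype=int` is passed because recent pandas versions return boolean indicator columns by default.

```python
import pandas as pd

# Hypothetical stand-in for the question's target; the real Y has 6,521 rows.
Y = pd.Series([1.0, -1.0, 1.0, -1.0, -1.0, 0.0])

# One indicator column per distinct value (-1.0, 0.0, 1.0), in sorted order.
# dtype=int gives 0/1 columns instead of the booleans newer pandas returns.
Y_ind = pd.get_dummies(Y, dtype=int)
print(Y_ind)
```

Every row contains exactly one 1, in the column named after that sample's original label.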

Compared with feeding the untransformed target variable Y (i.e. a one-dimensional Series) into RandomForestClassifier, how would this affect the machine learning algorithm? Would the results be different, and why?

Does RandomForestClassifier do different things in these two scenarios? Which approach is recommended (indicator matrix vs. untransformed)?

Recommended answer

I don't think there's any reason to prefer one over the other. The documentation states that you can pass an array-like of shape (n_samples,) or (n_samples, n_outputs) as y to sklearn.ensemble.RandomForestClassifier.fit(). Passing the indicator matrix therefore turns the task into a multi-output problem with three binary outputs (one per column) instead of a single three-class output.
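A minimal sketch of both options side by side, on hypothetical random data. The feature matrix X, the sample size and `n_estimators=50` are arbitrary choices for illustration, not taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples, 4 features, labels drawn from {-1, 0, 1}.
X = rng.normal(size=(200, 4))
y = pd.Series(rng.choice([-1, 0, 1], size=200))
y_ind = pd.get_dummies(y, dtype=int)  # indicator matrix, shape (200, 3)

# 1-D target: one multiclass output.
clf_1d = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
# Indicator matrix: three separate binary outputs (multi-output problem).
clf_ind = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y_ind)

print(clf_1d.classes_)      # a single array of class labels
print(clf_ind.n_outputs_)   # one output per indicator column
print(clf_1d.predict(X[:5]).shape, clf_ind.predict(X[:5]).shape)
```

The forest fit on `y_ind` stores one `classes_` array per column, and its `.predict()` returns a 0/1 matrix rather than a vector of labels.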

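The main practical difference is how .predict() returns the predicted classes. With the 1-D target it returns one label per sample; with the indicator matrix it returns an (n_samples, 3) array of 0/1 values in which each column is predicted separately, so a row can come out as all zeros (no class predicted). I recommend deciding the shape of Y based on the format you need the predictions in; if you need exactly one label per sample, the 1-D target is the simpler choice.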
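If you do train on the indicator matrix, you will probably want to map predictions back to -1/0/1. Below is a small sketch using a hypothetical prediction matrix and a hypothetical helper `decode()`. It marks rows with no predicted class as NaN, because each column of a multi-output prediction is decided on its own.

```python
import numpy as np

# Column labels in the order pd.get_dummies produces them.
labels = np.array([-1.0, 0.0, 1.0])

# Hypothetical output of .predict() from a model fit on the indicator matrix.
# The last row has no 1 at all, which can happen with independent columns.
pred_ind = np.array([
    [0, 0, 1],
    [1, 0, 0],
    [0, 0, 0],
])

def decode(pred, labels):
    """Map indicator rows back to labels; rows without exactly one 1 become NaN."""
    has_class = pred.sum(axis=1) == 1
    return np.where(has_class, labels[pred.argmax(axis=1)], np.nan)

print(decode(pred_ind, labels))
```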

Aside from that, with the default criterion="gini", the splitting process of each estimator is the same.
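The claim about identical splits can be checked for the Gini criterion. This assumes, as scikit-learn's tree criteria do for multi-output targets, that a node's impurity is the average of the per-output impurities. Let $p_k$ be the fraction of samples in a node with label $k \in \{-1, 0, 1\}$. The indicator column for $k$ is then a binary output whose positive-class fraction is $p_k$, and $\sum_k p_k = 1$.

```latex
\begin{aligned}
G_{\text{multi}} &= 1 - \sum_{k} p_k^2, \\
G_{\text{ind}} &= \frac{1}{3}\sum_{k}\left[1 - p_k^2 - (1-p_k)^2\right]
  = \frac{1}{3}\sum_{k} 2p_k(1-p_k) \\
 &= \frac{2}{3}\left(\sum_k p_k - \sum_k p_k^2\right)
  = \frac{2}{3}\left(1 - \sum_k p_k^2\right)
  = \frac{2}{3}\,G_{\text{multi}}.
\end{aligned}
```

Every node's impurity is scaled by the same constant, so every candidate split's impurity decrease is scaled by 2/3, and the best split is unchanged (up to floating-point ties). The argument does not carry over to criterion="entropy". There, the averaged binary entropy is 2/3 of the multiclass entropy at $p = (1/2, 1/2, 0)$ but only about 0.58 of it at $p = (1/3, 1/3, 1/3)$, so the two setups can choose different splits.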
