scikit-learn creation of dummy variables
Question
In scikit-learn, for which models do I need to break categorical variables into dummy binary fields?
For example, if the column is political-party and the values are democrat, republican, and green, then for many algorithms you have to break this into three columns, where each row can hold a 1 in only one of them and the rest must be 0.
This avoids enforcing an ordinality that doesn't exist: discretizing [democrat, republican, green] => [0, 1, 2] implies that democrat and green are "farther" apart than other pairs, which isn't actually the case.
For which algorithms in scikit-learn is this transformation into dummy variables necessary? And for those algorithms where it isn't, it can't hurt, right?
Answer
> For which algorithms in scikit-learn is this transformation into dummy variables necessary? And for those algorithms where it isn't, it can't hurt, right?
All algorithms in sklearn, with the notable exception of tree-based methods, require one-hot encoding (also known as dummy variables) for nominal categorical variables.
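Within scikit-learn itself, this is done with `sklearn.preprocessing.OneHotEncoder` (the data below is the hypothetical example from the question; `.toarray()` densifies the sparse matrix the encoder returns by default):

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# One nominal feature, three categories.
X = np.array([["democrat"], ["republican"], ["green"]])

enc = OneHotEncoder()
# fit_transform returns a sparse matrix by default; densify for inspection.
X_hot = enc.fit_transform(X).toarray()

print(enc.categories_)  # learned category order per feature
print(X_hot)            # one binary column per category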
Using dummy variables for categorical features with very large cardinalities can actually hurt tree-based methods, especially randomized tree methods, by introducing a bias in the feature split sampler. Tree-based methods tend to work reasonably well with a basic integer encoding of categorical features.
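A sketch of that integer encoding, using `sklearn.preprocessing.OrdinalEncoder` on the same hypothetical party column (categories are mapped to 0..k-1 in sorted order, which is fine for trees since they only split on thresholds):

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

X = np.array([["democrat"], ["republican"], ["green"], ["democrat"]])

enc = OrdinalEncoder()
# Single column of integers: one code per category, no column explosion.
X_int = enc.fit_transform(X)

print(enc.categories_)  # sorted: democrat=0, green=1, republican=2
print(X_int.ravel())
```

The resulting single integer column can be fed directly to a tree ensemble such as `RandomForestClassifier`, avoiding the wide, sparse matrix that one-hot encoding would create for high-cardinality features.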