Can sklearn random forest directly handle categorical features?

Views: 152 | Published: 2021/7/2 20:04:56 | Tags: python, scikit-learn, random-forest, one-hot-encoding

Problem description

Say I have a categorical feature, color, which takes the values

['red', 'blue', 'green', 'orange'],

and I want to use it to predict something in a random forest. If I one-hot encode it (i.e. I change it to four dummy variables), how do I tell sklearn that the four dummy variables are really one variable? Specifically, when sklearn is randomly selecting features to use at different nodes, it should either include the red, blue, green and orange dummies together, or it shouldn't include any of them.

I've heard that there's no way to do this, but I'd imagine there must be a way to deal with categorical variables without arbitrarily coding them as numbers or something like that.
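
To make the setup in the question concrete, here is a minimal sketch of one-hot encoding the color feature and fitting a forest on the result. The toy data, the target values, and the `color_` column prefix are made up for illustration; they are not from the original post. The sketch only shows what the forest receives. It does not group the dummies, and as far as I know `RandomForestClassifier` has no parameter for declaring that several columns belong to one variable.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import OneHotEncoder

# Toy data (invented for this sketch): one categorical column and a binary target.
colors = np.array([["red"], ["blue"], ["green"], ["orange"],
                   ["red"], ["blue"], ["green"], ["orange"]])
y = np.array([1, 0, 0, 1, 1, 0, 0, 1])

# One-hot encode: the single color column becomes one 0/1 column per category.
encoder = OneHotEncoder(handle_unknown="ignore")
X = encoder.fit_transform(colors).toarray()

# Column names built from the learned categories (the "color_" prefix is our choice).
dummy_names = [f"color_{c}" for c in encoder.categories_[0]]
print(dummy_names)
print(X.shape)

# The forest receives four separate numeric columns. max_features samples
# among these columns individually at each split, not as a group.
forest = RandomForestClassifier(n_estimators=50, max_features="sqrt", random_state=0)
forest.fit(X, y)

# One importance value per dummy column, again treated as separate features.
print(len(forest.feature_importances_))
```

Each dummy column gets its own importance score, which is the behavior the question is asking how to avoid.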
