How to manually select the features of the decision tree
Question
I need to be able to choose the features (in the machine learning sense) that are used to build the decision tree. Taking the Iris dataset as an example, I want to be able to select sepal length as the feature used in the root node, petal length as the feature used in the first-level nodes, and so on.
To be clear, my aim is not to change the minimum samples per split or the random state of the decision tree, but to select the features (the characteristics of the elements being classified) and assign them to particular nodes of the decision tree.
The code should then find the best threshold for each node, that is, the cut-off value that produces the best split.
Here is some general code for generating the tree.
from sklearn.tree import DecisionTreeClassifier
from sklearn.datasets import load_iris
clf = DecisionTreeClassifier(random_state=0)
iris = load_iris()
clf.fit(iris.data,iris.target)
Has any of you ever done this?
Recommended answer
"Has any of you ever done this?"
No, you might be the first one!
Haha, but you can select the features in several ways. One of them is shown in the official documentation: https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html
from sklearn import datasets

# import some data to play with
iris = datasets.load_iris()
X = iris.data[:, :2]  # we only take the first two features.
y = iris.target
Then you fit the classifier on the reduced data: clf.fit(X, y)
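Putting the pieces together, a minimal runnable sketch might look like the following. The choice of sepal length and petal length is only an illustration taken from the question, and the names wanted, cols and used are made up for this example. Note that slicing columns restricts the whole tree to those features; it does not pin a particular feature to a particular node.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Look up column positions by name instead of hard-coding indices.
# The two names below are an illustrative choice, not a recommendation.
wanted = ["sepal length (cm)", "petal length (cm)"]
cols = [iris.feature_names.index(name) for name in wanted]

X = iris.data[:, cols]  # keep only the chosen features
y = iris.target

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

# tree_.feature stores the column index used at each node (negative for
# leaves), relative to the reduced X, so map it back to the names.
used = {wanted[i] for i in clf.tree_.feature if i >= 0}
print(used)
```

The tree still decides by itself which of the remaining features goes in which node and which threshold to use.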
Other ways to do it are explained here: Selecting multiple columns in a pandas dataframe
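With named pandas columns, the per-level request from the question can also be approximated. As far as I know DecisionTreeClassifier has no option for fixing the feature of a given node, so the sketch below is a workaround, not a scikit-learn feature: it fits a depth-1 tree on a single column at each level, which leaves the threshold search to scikit-learn. FEATURE_PER_LEVEL, build and predict_one are hypothetical names invented for this example, and the result is a plain nested dict rather than a fitted estimator.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris(as_frame=True)
X, y = iris.data, iris.target  # X is a DataFrame, y is a Series

# Hypothetical plan: which column to split on at each depth.
FEATURE_PER_LEVEL = ["sepal length (cm)", "petal length (cm)"]


def build(X, y, depth=0):
    """Build a nested-dict tree that splits on a fixed column per depth."""
    majority = int(y.mode().iloc[0])
    if depth == len(FEATURE_PER_LEVEL) or y.nunique() == 1:
        return {"leaf": majority}
    col = FEATURE_PER_LEVEL[depth]
    # A depth-1 tree on one column: scikit-learn only chooses the threshold.
    stump = DecisionTreeClassifier(max_depth=1, random_state=0)
    stump.fit(X[[col]], y)
    if stump.tree_.node_count == 1:  # no useful split was found
        return {"leaf": majority}
    thr = float(stump.tree_.threshold[0])
    left = X[col] <= thr
    return {
        "feature": col,
        "threshold": thr,
        "left": build(X[left], y[left], depth + 1),
        "right": build(X[~left], y[~left], depth + 1),
    }


def predict_one(node, row):
    """Walk the nested dict until a leaf is reached."""
    while "leaf" not in node:
        go_left = row[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["leaf"]


tree = build(X, y)
preds = [predict_one(tree, row) for _, row in X.iterrows()]
```

Each split is chosen greedily for its own node, exactly as in an ordinary tree, so the only thing that changes is that the feature is fixed by depth instead of being searched for.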