在SciKit-Learn中使用管道的排列重要性 [英] Permutation importance using a Pipeline in SciKit-Learn

查看：18 发布时间：2022/4/9 18:44:56 python machine-learning scikit-learn pipeline

本文介绍了在SciKit-Learn中使用管道的排列重要性的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

我使用的是SciKit中的确切示例，它将permutation_importance与tree feature_importances

进行了比较

如您所见，使用了管道：

rf = Pipeline([
    ('preprocess', preprocessing),
    ('classifier', RandomForestClassifier(random_state=42))
])
rf.fit(X_train, y_train)

permutation_importance：

现在，当您拟合Pipeline时，它将逐个拟合所有转换并转换数据，然后使用最终估计器拟合转换后的数据。

在后面的示例中，他们在拟合的模型上使用permutation_importance：

result = permutation_importance(rf, X_test, y_test, n_repeats=10,
                                random_state=42, n_jobs=2)

问题：我不明白的是result中的功能仍然是原始的未转换功能。为何会是这样呢？这能正常工作吗？那么Pipeline的用途是什么？

tree feature_importance：在同一示例中，当它们使用feature_importance时，结果会被转换：

tree_feature_importances = (
    rf.named_steps['classifier'].feature_importances_)

我显然可以转换我的功能，然后使用permutation_importance，但示例中提供的步骤似乎是故意的，应该有permutation_importance不转换功能的原因。