Is it possible to toggle a certain step in an sklearn pipeline?
Question
I wonder whether we can set up an "optional" step in sklearn.pipeline. For example, for a classification problem, I may want to try an ExtraTreesClassifier both with and without a PCA transformation ahead of it. In practice, this could be a pipeline with an extra parameter that toggles the PCA step, so that I can optimize over it via GridSearch and similar tools. I don't see such an implementation in the sklearn source, but is there a workaround?
Furthermore, the possible parameter values of a later step in the pipeline might depend on the parameters of an earlier step (e.g., the valid values of ExtraTreesClassifier.max_features depend on PCA.n_components). Is it possible to specify such a conditional dependency in sklearn.pipeline and sklearn.grid_search?
Thanks!
Recommended answer
Pipeline steps cannot currently be made optional in a grid search, but as a quick workaround you could wrap the PCA class in your own OptionalPCA component, with a boolean parameter that turns PCA off when requested. You might also want to have a look at hyperopt to set up more complex search spaces. I think it has good sklearn integration that supports this kind of pattern by default, but I cannot find the doc anymore. Maybe have a look at this talk.
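A minimal sketch of the OptionalPCA idea might look like the following. The answer only names the wrapper, so the parameter names `use_pca` and `n_components`, the fitted attribute `pca_`, and the toy dataset are all assumptions made for illustration. It also imports GridSearchCV from sklearn.model_selection, which is where it lives in recent scikit-learn releases rather than in sklearn.grid_search.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline


class OptionalPCA(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper: applies PCA when use_pca is True, else returns X unchanged."""

    def __init__(self, use_pca=True, n_components=None):
        # Store constructor arguments untouched so get_params/set_params/clone work.
        self.use_pca = use_pca
        self.n_components = n_components

    def fit(self, X, y=None):
        if self.use_pca:
            # Fit the wrapped PCA only when the step is switched on.
            self.pca_ = PCA(n_components=self.n_components).fit(X)
        return self

    def transform(self, X):
        if self.use_pca:
            return self.pca_.transform(X)
        return X  # step switched off: pass the data through


# Toy data, purely for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ("pca", OptionalPCA(n_components=5)),
    ("clf", ExtraTreesClassifier(n_estimators=20, random_state=0)),
])

# The boolean toggle is searched like any other hyperparameter.
search = GridSearchCV(pipe, {"pca__use_pca": [True, False]}, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Because the toggle is an ordinary constructor argument, the grid search treats "with PCA" and "without PCA" as two candidate settings of the same pipeline.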
For the dependent parameters problem, GridSearchCV supports trees of parameters to handle this case, as demonstrated in the documentation.
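As a rough illustration of such a parameter tree, GridSearchCV accepts a list of dicts and searches each dict as its own sub-grid, so `clf__max_features` can be restricted to values that are valid for each `pca__n_components`. The step names, grid values, and toy data below are assumptions chosen for the example. The last sub-grid additionally assumes a scikit-learn release recent enough to accept the string "passthrough" in place of a pipeline step, which skips PCA without a wrapper class.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import GridSearchCV, ParameterGrid
from sklearn.pipeline import Pipeline

# Toy data with 10 features, purely for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

pipe = Pipeline([
    ("pca", PCA()),
    ("clf", ExtraTreesClassifier(n_estimators=20, random_state=0)),
])

# Each dict is an independent sub-grid, so max_features never exceeds
# the number of features the classifier actually receives.
param_grid = [
    {"pca__n_components": [2], "clf__max_features": [1, 2]},
    {"pca__n_components": [5], "clf__max_features": [1, 2, 5]},
    # Assumption: this scikit-learn version supports "passthrough" steps.
    {"pca": ["passthrough"], "clf__max_features": [1, 5, 10]},
]

# Number of candidate settings across all sub-grids.
n_candidates = len(ParameterGrid(param_grid))

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(n_candidates, search.best_params_)
```

Listing one dict per value of `pca__n_components` is verbose, but it keeps invalid combinations out of the search instead of letting them fail at fit time.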