Getting transformer results from sklearn.pipeline.Pipeline

Question

I am using a sklearn.pipeline.Pipeline object for my clustering.

import sklearn.pipeline
pipe = sklearn.pipeline.Pipeline([('transformer1', transformer1),
                                  ('transformer2', transformer2),
                                  ('clusterer', clusterer)])

Then I evaluate the result using the silhouette score.

import sklearn.metrics
sil = sklearn.metrics.silhouette_score(X, y)

How can I get X, i.e. the transformed data, out of the pipeline? The pipeline only returns the result of clusterer.fit_predict(X).

I understand that I can do this by splitting the pipeline:

import sklearn.metrics
import sklearn.pipeline
pipe = sklearn.pipeline.Pipeline([('transformer1', transformer1),
                                  ('transformer2', transformer2)])

X = pipe.fit_transform(data)  # transformed data
res = clusterer.fit_predict(X)  # cluster labels
sil = sklearn.metrics.silhouette_score(X, res)

But I would like to do it all in a single pipeline.
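For reference, the split workaround from the question can be made self-contained roughly as follows. This is a sketch under assumptions that are not in the question: StandardScaler and PCA stand in for transformer1 and transformer2, KMeans stands in for the clusterer, and the data is synthetic (make_blobs).

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data standing in for the question's `data`.
data, _ = make_blobs(n_samples=200, centers=4, n_features=6, random_state=42)

# Preprocessing-only pipeline: the transformers without the clusterer.
pipe = Pipeline([
    ("transformer1", StandardScaler()),
    ("transformer2", PCA(n_components=3)),
])
clusterer = KMeans(n_clusters=4, n_init=10, random_state=42)

X = pipe.fit_transform(data)    # transformed data is available directly
res = clusterer.fit_predict(X)  # cluster labels computed on the transformed data
sil = silhouette_score(X, res)  # silhouette measured in the transformed space
print(X.shape, round(float(sil), 3))
```

The cost of this layout is that the clusterer lives outside the pipeline, which is exactly what the question wants to avoid.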

Recommended answer

If you want to both fit and transform the data on the intermediate steps of the pipeline, there is no point in reusing the same pipeline; it is better to use a new one, as you suggested, because calling fit() discards everything the pipeline learned previously.

However, if you only want to call transform() and inspect the intermediate data of an already fitted pipeline, you can do so through its named_steps attribute:

new_pipe = sklearn.pipeline.Pipeline([('transformer1',
                                       old_pipe.named_steps['transformer1']),
                                      ('transformer2',
                                       old_pipe.named_steps['transformer2'])])

Or use the pipeline's steps attribute directly, which is a list of (name, estimator) tuples:

transformer_steps = old_pipe.steps  # list of (name, estimator) tuples
new_pipe = sklearn.pipeline.Pipeline([('transformer1', transformer_steps[0][1]),
                                      ('transformer2', transformer_steps[1][1])])

Then call new_pipe.transform() on your data. Because the steps are the already fitted objects, nothing is refit.
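A minimal runnable sketch of the named_steps approach follows. The estimators (StandardScaler, PCA, KMeans) and the make_blobs data are hypothetical stand-ins for the question's objects. It fits the full pipeline once, reuses the fitted transformers in a new pipeline, and scores the clusterer's labels on the transformed data. It also compares the result with slicing (old_pipe[:-1]), which scikit-learn supports from version 0.21 and which returns a sub-pipeline sharing the same fitted steps.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data standing in for the question's `data`.
data, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)

# Full pipeline, fitted once; step names mirror the question.
old_pipe = Pipeline([
    ("transformer1", StandardScaler()),
    ("transformer2", PCA(n_components=2)),
    ("clusterer", KMeans(n_clusters=3, n_init=10, random_state=0)),
])
labels = old_pipe.fit_predict(data)

# New pipeline built from the already fitted transformer objects (no refitting).
new_pipe = Pipeline([
    ("transformer1", old_pipe.named_steps["transformer1"]),
    ("transformer2", old_pipe.named_steps["transformer2"]),
])
X = new_pipe.transform(data)
sil = silhouette_score(X, labels)

# Slicing (scikit-learn >= 0.21) gives the same transformed data.
X_slice = old_pipe[:-1].transform(data)
print(X.shape, round(float(sil), 3), np.allclose(X, X_slice))
```

Slicing is the shorter option on recent releases; building the pipeline by hand also works on older versions.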

Update: if you have scikit-learn 0.18 or later, you can set the unneeded estimator inside the pipeline to None and get the result from the same pipeline. This is discussed in this issue on the scikit-learn GitHub. For your case:

pipe.set_params(clusterer=None)  # disable the final step
pipe.transform(data)  # returns the output of transformer2

Be aware, though, that you should store the fitted clusterer somewhere else before doing this; otherwise you will need to fit the whole pipeline again when you want to cluster with it.
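The set_params route, including the advice to keep the fitted clusterer, might look like the sketch below. It saves a reference to the fitted step, disables it, transforms, and then puts it back so that predict() works without refitting. It uses the string "passthrough", which releases from 0.21 onward accept in place of None. The estimators and make_blobs data are hypothetical stand-ins for the question's objects.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical data and estimators standing in for the question's objects.
data, _ = make_blobs(n_samples=300, centers=3, n_features=5, random_state=0)
pipe = Pipeline([
    ("transformer1", StandardScaler()),
    ("transformer2", PCA(n_components=2)),
    ("clusterer", KMeans(n_clusters=3, n_init=10, random_state=0)),
])
labels = pipe.fit_predict(data)

# Keep a reference to the fitted clusterer before taking it out of the pipeline.
fitted_clusterer = pipe.named_steps["clusterer"]

# Disable the final step; transform() now returns the output of transformer2.
pipe.set_params(clusterer="passthrough")
X = pipe.transform(data)
sil = silhouette_score(X, labels)

# Restore the fitted clusterer so predict() works again without refitting.
pipe.set_params(clusterer=fitted_clusterer)
new_labels = pipe.predict(data)
print(X.shape, round(float(sil), 3))
```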
