Getting transformer results from sklearn.pipeline.Pipeline
Problem description
I am using a sklearn.pipeline.Pipeline object for my clustering.
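import sklearn.pipeline
# Each step is a (name, estimator) tuple, so the separator is a comma, not a colon.
pipe = sklearn.pipeline.Pipeline([('transformer1', transformer1),
                                  ('transformer2', transformer2),
                                  ('clusterer', clusterer)])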
Then I evaluate the result using the silhouette score.
import sklearn.metrics
sil = sklearn.metrics.silhouette_score(X, y)  # silhouette_score lives in sklearn.metrics
I'm wondering how I can get X, the transformed data, from the pipeline, as it only returns the result of clusterer.fit_predict(X).
I understand that I can do this by just splitting the pipeline as
pipe = sklearn.pipeline.Pipeline([('transformer1', transformer1),
                                  ('transformer2', transformer2)])
X = pipe.fit_transform(data)       # transformed data from the two transformers
res = clusterer.fit_predict(X)     # cluster labels
sil = sklearn.metrics.silhouette_score(X, res)
but I would like to do it all in one pipeline.
Recommended answer
If you want to both fit and transform the data at intermediate steps of the pipeline, it makes no sense to reuse the same pipeline; it is better to use a new one as you specified, because calling fit() discards everything that was previously learnt.
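To illustrate the point about fit() discarding earlier state, here is a minimal sketch using StandardScaler as a stand-in transformer (the arrays and variable names are made up for illustration and do not come from the question):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
first = np.array([[0.0], [2.0]])     # hypothetical first dataset
second = np.array([[10.0], [30.0]])  # hypothetical second dataset

scaler.fit(first)
mean_after_first = scaler.mean_.copy()   # learnt from `first` only

scaler.fit(second)
mean_after_second = scaler.mean_.copy()  # refit: state from `first` is gone
```

The second call to fit() replaces the learnt mean rather than updating it, which is why refitting a pipeline just to inspect intermediate data loses the earlier fit.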
However, if you only want to transform() and see the intermediate data on an already fitted pipeline, then it's possible by accessing the named_steps attribute.
# Reuse the already fitted transformers from old_pipe; do not call fit() again.
new_pipe = sklearn.pipeline.Pipeline([('transformer1', old_pipe.named_steps['transformer1']),
                                      ('transformer2', old_pipe.named_steps['transformer2'])])
Or directly using the steps attribute, like:
transformer_steps = old_pipe.steps  # list of (name, estimator) tuples
new_pipe = sklearn.pipeline.Pipeline([('transformer1', transformer_steps[0][1]),
                                      ('transformer2', transformer_steps[1][1])])
Then call new_pipe.transform().
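As a rough end-to-end sketch of this approach: StandardScaler, PCA, KMeans and the make_blobs data below are assumptions standing in for the question's transformer1, transformer2, clusterer and data, which are not shown in the original.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's `data`.
data, _ = make_blobs(n_samples=200, centers=3, n_features=5, random_state=0)

old_pipe = Pipeline([
    ("transformer1", StandardScaler()),
    ("transformer2", PCA(n_components=2)),
    ("clusterer", KMeans(n_clusters=3, n_init=10, random_state=0)),
])
labels = old_pipe.fit_predict(data)  # fits every step, returns cluster labels

# Build a transformer-only pipeline from the already fitted steps (no refit).
new_pipe = Pipeline([
    ("transformer1", old_pipe.named_steps["transformer1"]),
    ("transformer2", old_pipe.named_steps["transformer2"]),
])
X_transformed = new_pipe.transform(data)
sil = silhouette_score(X_transformed, labels)

# On scikit-learn 0.21 or later, slicing yields the same sub-pipeline.
X_sliced = old_pipe[:-1].transform(data)
```

The new pipeline shares the fitted estimator objects with old_pipe rather than copying them, so refitting one affects the other.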
Update:
If you have scikit-learn 0.18 or above, you can set the estimator you don't need inside the pipeline to None to get the result from the same pipeline. This is discussed in this issue on the scikit-learn GitHub.
Usage for the above in your case:
pipe.set_params(clusterer=None)
pipe.transform(df)
But be aware that you may want to store the fitted clusterer somewhere else before doing so; otherwise you will need to fit the whole pipeline again when you want to use the clustering step.
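A minimal sketch of that workflow, keeping a reference to the fitted clusterer and putting it back afterwards. The concrete estimators and the make_blobs data are assumptions chosen for illustration, not the asker's actual setup.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the question's data.
data, _ = make_blobs(n_samples=200, centers=3, n_features=5, random_state=0)

pipe = Pipeline([
    ("transformer1", StandardScaler()),
    ("transformer2", PCA(n_components=2)),
    ("clusterer", KMeans(n_clusters=3, n_init=10, random_state=0)),
])
labels = pipe.fit_predict(data)

# Keep a reference to the fitted clusterer before removing it from the pipeline.
fitted_clusterer = pipe.named_steps["clusterer"]

# With the final step set to None, the pipeline only applies the transformers.
pipe.set_params(clusterer=None)
X_transformed = pipe.transform(data)
sil = silhouette_score(X_transformed, labels)

# Restore the fitted clusterer so the pipeline can predict again without refitting.
pipe.set_params(clusterer=fitted_clusterer)
restored_labels = pipe.predict(data)
```

Later scikit-learn releases (0.21 and up) also accept the string 'passthrough' in place of None for a step that should be skipped.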