How to transform items using sklearn Pipeline?
Question
I have a simple scikit-learn Pipeline of two steps: a TfidfVectorizer followed by a LinearSVC.
I have fit the pipeline using my data. All good.
Now I want to transform (not predict!) an item using my fitted pipeline.
I tried pipeline.transform([item]), but it gives a different result from pipeline.named_steps['tfidf'].transform([item]). Even the shape and type of the results differ: the first is a 1x3000 CSR matrix, the second a 1x15000 CSC matrix. Which one is correct? Why do they differ?
How do I transform items, i.e. get an item's vector representation before the final estimator, when using scikit-learn's Pipeline?
Recommended answer
You can't call the transform method on a pipeline whose last step is not a transformer. If you want to call transform on such a pipeline, the last estimator must be a transformer.
Even the method's documentation says so:
Applies transforms to the data, and the transform method of the final estimator. Valid only if the final estimator implements transform.
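As a rough illustration of that docstring, the sketch below checks whether a pipeline exposes transform at all. It assumes a recent scikit-learn release, in which LinearSVC does not implement transform; the step names "tfidf" and "svc" are illustrative and not taken from the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Step names "tfidf" and "svc" are illustrative.
classifier_pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svc", LinearSVC())])
transformer_pipeline = Pipeline([("tfidf", TfidfVectorizer())])

# The pipeline only exposes transform when its final step implements it.
print(hasattr(LinearSVC(), "transform"))
print(hasattr(classifier_pipeline, "transform"))
print(hasattr(transformer_pipeline, "transform"))
```

Under that assumption, only the pipeline that ends in a transformer should report having a transform method.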
Also, there is no method that applies every estimator except the last one. However, you can make your own Pipeline that inherits everything from scikit-learn's Pipeline and adds one method, something like:
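from sklearn.pipeline import Pipeline


class TransformOnlyPipeline(Pipeline):  # class name is illustrative
    def just_transforms(self, X):
        """Apply all transforms to the data, without applying the last
        estimator.

        Parameters
        ----------
        X : iterable
            Data to transform. Must fulfill input requirements of the first
            step of the pipeline.
        """
        Xt = X
        for name, transform in self.steps[:-1]:
            # Skip steps that are disabled with None or "passthrough".
            if transform is None or transform == "passthrough":
                continue
            Xt = transform.transform(Xt)
        return Xt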
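The same loop can also be written as a free function, so that it works on an already fitted plain Pipeline without subclassing. The following is a minimal sketch under stated assumptions: the helper name transform_without_last_step, the toy corpus, the labels, and the step names are all made up for illustration and do not come from the question.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


def transform_without_last_step(pipeline, X):
    """Run X through every fitted step of `pipeline` except the final one."""
    Xt = X
    for name, step in pipeline.steps[:-1]:
        # Skip steps that are disabled with None or "passthrough".
        if step is None or step == "passthrough":
            continue
        Xt = step.transform(Xt)
    return Xt


# Toy corpus and labels, made up purely for illustration.
docs = [
    "the cat sat on the mat",
    "dogs chase cats",
    "stocks fell sharply",
    "markets rallied today",
]
labels = [0, 0, 1, 1]

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svc", LinearSVC())])
pipeline.fit(docs, labels)

item = "the cat chased the dog"
via_helper = transform_without_last_step(pipeline, [item])
via_step = pipeline.named_steps["tfidf"].transform([item])

# With a single transformer before the classifier, both routes should agree.
print(via_helper.shape == via_step.shape)
print((via_helper != via_step).nnz == 0)
```

For a two-step pipeline like the one in the question, this should give the same vector as calling the vectorizer step directly. With several transformers chained before the classifier, the helper applies each of them in order.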