How to transform items using sklearn Pipeline?


Question

I have a simple scikit-learn Pipeline of two steps: a TfidfVectorizer followed by a LinearSVC.

I have fit the pipeline using my data. All good.

Now I want to transform (not predict!) an item, using my fitted pipeline.

I tried pipeline.transform([item]), but it gives a different result from pipeline.named_steps['tfidf'].transform([item]). Even the shape and type of the results differ: the first is a 1x3000 CSR matrix, the second a 1x15000 CSC matrix. Which one is correct? Why do they differ?

How do I transform items, i.e. get an item's vector representation before the final estimator, when using scikit-learn's Pipeline?

Recommended answer

You can't call the transform method on a pipeline whose last step is not a transformer. If you want to call transform on such a pipeline, the last estimator must be a transformer.

Even the method's docstring says so:

"Applies transforms to the data, and the transform method of the final estimator. Valid only if the final estimator implements transform."
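As a quick check of that docstring, the sketch below builds two unfitted pipelines and asks whether each exposes transform. It assumes a recent scikit-learn release in which LinearSVC has no transform method of its own; the step names "tfidf" and "svc" are chosen here for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Pipeline ending in a classifier: the final step has no transform method
# (assumption: a recent scikit-learn release).
pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svc", LinearSVC())])
can_transform = hasattr(pipeline, "transform")

# Pipeline ending in a transformer: transform is available.
tfidf_only = Pipeline([("tfidf", TfidfVectorizer())])
tfidf_only_can_transform = hasattr(tfidf_only, "transform")

print(can_transform, tfidf_only_can_transform)
```

Pipeline only exposes transform when its final estimator implements it, so the attribute check is a cheap way to see which case you are in before calling it.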

Also, Pipeline has no dedicated method that applies every step except the last one. You can, however, write your own Pipeline subclass that inherits everything from scikit-learn's Pipeline and adds one method, something like:

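from sklearn.pipeline import Pipeline


class TransformPipeline(Pipeline):  # hypothetical name for the subclass
    def just_transforms(self, X):
        """Apply every step except the final estimator.

        Parameters
        ----------
        X : iterable
            Data to transform. Must fulfill the input requirements of the
            first step of the pipeline.
        """
        Xt = X
        # self.steps is a list of (name, estimator) pairs; skip the last one.
        for _name, step in self.steps[:-1]:
            if step is None or step == "passthrough":
                continue  # placeholder steps leave the data unchanged
            Xt = step.transform(Xt)
        return Xt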
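If you are on scikit-learn 0.21 or later, slicing the pipeline should give the same effect without a subclass: pipeline[:-1] is a new Pipeline that reuses the already fitted steps and drops the final estimator. The following is a minimal sketch under that version assumption; the toy corpus, labels and step names are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Toy corpus and labels, invented for illustration only.
docs = ["good movie", "great film", "bad movie", "terrible film"]
labels = [1, 1, 0, 0]

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("svc", LinearSVC())])
pipeline.fit(docs, labels)

item = "good film"

# Slice off the final estimator; the slice reuses the fitted steps.
via_slice = pipeline[:-1].transform([item])

# Calling the fitted vectorizer directly should give the same vector.
via_step = pipeline.named_steps["tfidf"].transform([item])

same = via_slice.toarray().tolist() == via_step.toarray().tolist()
print(via_slice.shape, same)
```

With a single preprocessing step, as in the question, the slice and named_steps['tfidf'] are equivalent; slicing becomes more convenient when several transformers precede the final estimator.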
