What exactly is sklearn.pipeline.Pipeline?

Question

I can't figure out exactly how sklearn.pipeline.Pipeline works.

There are a few explanations in the docs. For example, what do they mean by:

Pipeline of transforms with a final estimator.

To make my question clearer: what are steps? How do they work?

Edit

Thanks to the answers, I can make my question clearer:

When I create a pipeline and pass, as steps, two transformers and one estimator, e.g.:

pipln = Pipeline([("trsfm1",transformer_1),
                  ("trsfm2",transformer_2),
                  ("estmtr",estimator)])

What happens when I call this?

pipln.fit()
OR
pipln.fit_transform()

I can't figure out how an estimator can be a transformer, or how a transformer can be fitted.

Recommended answer

Transformer in scikit-learn: a class that has fit and transform methods, or a fit_transform method.

Predictor: a class that has fit and predict methods, or a fit_predict method.
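
To make the two roles concrete, here is a minimal sketch in plain Python. The classes MeanCenterer and MajorityClassPredictor are hypothetical names invented for this illustration and are not part of scikit-learn; they only mimic the method names (fit, transform, predict) described above.

```python
class MeanCenterer:
    """Toy transformer (hypothetical, not a scikit-learn class)."""

    def fit(self, X, y=None):
        # Learn the per-column mean from the training data.
        n = len(X)
        self.means_ = [sum(col) / n for col in zip(*X)]
        return self

    def transform(self, X):
        # Subtract the learned means from whatever data is passed in.
        return [[v - m for v, m in zip(row, self.means_)] for row in X]

    def fit_transform(self, X, y=None):
        # Convenience: fit, then transform the same data.
        return self.fit(X, y).transform(X)


class MajorityClassPredictor:
    """Toy predictor (hypothetical): always predicts the most frequent label."""

    def fit(self, X, y):
        # Remember the label that occurs most often in the training labels.
        self.majority_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        # Return the remembered label once per input row.
        return [self.majority_ for _ in X]


X = [[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]]
y = ["a", "b", "a"]

centered = MeanCenterer().fit_transform(X)
labels = MajorityClassPredictor().fit(X, y).predict(X)
```

The transformer returns modified data, while the predictor returns labels; that difference is what decides where each can sit in a pipeline.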

Pipeline is just an abstract notion; it is not an existing ML algorithm. In ML tasks you often need to perform a sequence of different transformations on the raw dataset (find a set of features, generate new features, select only some good features) before applying the final estimator.

Here is a good example of Pipeline usage. Pipeline gives you a single interface for all three steps: the transformations and the final estimator. It encapsulates the transformers and the predictor, so you can do something like:

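    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.linear_model import SGDClassifier

    vect = CountVectorizer()
    tfidf = TfidfTransformer()
    clf = SGDClassifier()

    # Fit each step on the training set (ytrain: training labels, assumed to exist)
    vX = vect.fit_transform(Xtrain)
    tfidfX = tfidf.fit_transform(vX)
    predicted = clf.fit(tfidfX, ytrain).predict(tfidfX)

    # Now evaluate all steps on test set: transform only, no refitting
    vX = vect.transform(Xtest)
    tfidfX = tfidf.transform(vX)
    predicted = clf.predict(tfidfX)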

With just:

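from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier()),
])
# fit needs the training labels (ytrain, assumed to exist) for the classifier
predicted = pipeline.fit(Xtrain, ytrain).predict(Xtrain)
# Now evaluate all steps on test set
predicted = pipeline.predict(Xtest)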
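
Because every step in the list has a name, the parameters of an individual step can be addressed as <step name>__<parameter name>, which is what makes a grid search over the whole pipeline convenient. The following is a minimal sketch, assuming a tiny made-up corpus (docs, labels) and an arbitrary parameter grid chosen only for illustration; the values are not recommendations.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Tiny made-up corpus, only for illustration.
docs = ["good movie", "great film", "nice plot", "fine acting",
        "bad movie", "awful film", "poor plot", "weak acting"]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", SGDClassifier(random_state=0)),
])

# Each key is <step name>__<parameter name>; the values are illustrative.
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "tfidf__use_idf": [True, False],
    "clf__alpha": [1e-4, 1e-3],
}

# The whole pipeline is refitted for every parameter combination and fold,
# so the transformers never see the held-out fold during fitting.
search = GridSearchCV(pipeline, param_grid, cv=2)
search.fit(docs, labels)
best = search.best_params_
```

After fitting, search.best_estimator_ is itself a fitted pipeline, so it can be used with predict() on raw text directly.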

With pipelines you can easily perform a grid search over a set of parameters for each step of this meta-estimator, as described in the example linked above. All steps except the last one must be transformers; the last step can be a transformer or a predictor.

Answer to the edit: when you call pipln.fit(), each transformer inside the pipeline is fitted on the output of the previous transformer (the first transformer is fitted on the raw dataset). The last estimator may be a transformer or a predictor. You can call fit_transform() on the pipeline only if the last estimator is a transformer (one that implements fit_transform, or fit and transform separately). You can call fit_predict() or predict() on the pipeline only if the last estimator is a predictor. So you can't call fit_transform or transform on a pipeline whose last step is a predictor.
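
The fitting behaviour described above can be sketched in a few lines of plain Python. MiniPipeline, MaxScaler and MeanThreshold are hypothetical names invented for this illustration; this is a simplified model of the idea, not scikit-learn's actual implementation, which handles many more cases.

```python
class MaxScaler:
    """Toy transformer (hypothetical): divides by the largest training value."""

    def fit(self, X, y=None):
        self.max_ = max(abs(x) for x in X)
        return self

    def transform(self, X):
        return [x / self.max_ for x in X]


class MeanThreshold:
    """Toy predictor (hypothetical): predicts 1 for values above the training mean."""

    def fit(self, X, y=None):
        self.threshold_ = sum(X) / len(X)
        return self

    def predict(self, X):
        return [int(x > self.threshold_) for x in X]


class MiniPipeline:
    """Simplified illustration of the idea; NOT scikit-learn's implementation."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, object) pairs

    def fit(self, X, y=None):
        # Every step but the last is fitted on the previous step's output.
        for _, transformer in self.steps[:-1]:
            X = transformer.fit(X, y).transform(X)
        # The final step is only fitted, on the fully transformed data.
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        # Reuse the already fitted transformers: transform only, no refit.
        for _, transformer in self.steps[:-1]:
            X = transformer.transform(X)
        return self.steps[-1][1].predict(X)


pipe = MiniPipeline([("scale", MaxScaler()), ("clf", MeanThreshold())])
pipe.fit([2.0, 4.0, 6.0, 8.0])
predictions = pipe.predict([1.0, 7.0])
```

Note that predict() scales the new values with the maximum learned during fit() rather than relearning it from the new data, which is why the test set must go through transform and never through fit_transform.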
