What exactly is sklearn.pipeline.Pipeline?

Question

I can't figure out how the sklearn.pipeline.Pipeline works exactly.

There are a few explanations in the docs. For example, what do they mean by:

Pipeline of transforms with a final estimator.

To make my question clearer, what are steps? How do they work?

Edit

Thanks to the answers I can make my question clearer:

When I call Pipeline and pass two transformers and one estimator as steps, e.g.:

pipln = Pipeline([("trsfm1",transformer_1),
                  ("trsfm2",transformer_2),
                  ("estmtr",estimator)])

What happens when I call:

pipln.fit()
OR
pipln.fit_transform()

I can't figure out how an estimator can be a transformer and how a transformer can be fitted.

Answer

Transformer (in scikit-learn): a class that has fit and transform methods, or a fit_transform method.

Predictor: a class that has fit and predict methods, or a fit_predict method.
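To make these two definitions concrete, here is a minimal sketch of a hand-written transformer and predictor. The class names MeanCenterer and ThresholdClassifier are hypothetical; BaseEstimator, TransformerMixin and ClassifierMixin are real scikit-learn base classes, and TransformerMixin is what supplies fit_transform() when you only write fit() and transform().

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin


class MeanCenterer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: learns column means in fit, subtracts them in transform."""

    def fit(self, X, y=None):
        self.mean_ = np.asarray(X, dtype=float).mean(axis=0)
        return self  # returning self lets calls be chained

    def transform(self, X):
        return np.asarray(X, dtype=float) - self.mean_


class ThresholdClassifier(BaseEstimator, ClassifierMixin):
    """Hypothetical predictor: predicts 1 when feature 0 exceeds a learned threshold."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        # Threshold = midpoint between the class-0 and class-1 means of feature 0
        self.threshold_ = (X[y == 0, 0].mean() + X[y == 1, 0].mean()) / 2
        return self

    def predict(self, X):
        return (np.asarray(X, dtype=float)[:, 0] > self.threshold_).astype(int)


X = [[1.0], [2.0], [3.0], [4.0]]
y = [0, 0, 1, 1]

# fit_transform() is not written above; TransformerMixin provides it
centered = MeanCenterer().fit_transform(X)
print(centered.ravel().tolist())  # [-1.5, -0.5, 0.5, 1.5]

clf = ThresholdClassifier().fit(centered, y)
print(clf.predict(centered).tolist())  # [0, 0, 1, 1]
```

Because both classes follow the fit/transform and fit/predict conventions, they could be used as an intermediate step and a final step of a Pipeline, respectively.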

Pipeline is just an abstraction; it is not a machine learning algorithm in itself. In ML tasks you often need to apply a sequence of different transformations to the raw dataset (find a set of features, generate new features, select only some good features) before applying the final estimator.
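The chaining idea can be illustrated with a toy class in plain Python. This is a simplified sketch, not scikit-learn's actual Pipeline implementation (which does considerably more, e.g. optional caching of fitted transformers via its memory parameter); ToyPipeline, MinShifter, Doubler and MeanThreshold are hypothetical names used only for illustration.

```python
class ToyPipeline:
    """Toy sketch of the chaining idea; NOT scikit-learn's real Pipeline."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, step) pairs

    def fit(self, X, y=None):
        # Every step but the last is a transformer: fit it, then pass its output on
        for _, transformer in self.steps[:-1]:
            X = transformer.fit(X, y).transform(X)
        # The last step is only fitted, on the output of the last transformer
        self.steps[-1][1].fit(X, y)
        return self

    def predict(self, X):
        # Reuse the already-fitted transformers (no refitting), then predict
        for _, transformer in self.steps[:-1]:
            X = transformer.transform(X)
        return self.steps[-1][1].predict(X)


class MinShifter:
    """Hypothetical transformer: learns the minimum in fit, subtracts it in transform."""

    def fit(self, X, y=None):
        self.min_ = min(X)
        return self

    def transform(self, X):
        return [x - self.min_ for x in X]


class Doubler:
    """Hypothetical stateless transformer."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return [2 * x for x in X]


class MeanThreshold:
    """Hypothetical predictor: predicts 1 when a value exceeds the mean seen in fit."""

    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def predict(self, X):
        return [int(x > self.mean_) for x in X]


pipe = ToyPipeline([
    ("shift", MinShifter()),
    ("double", Doubler()),
    ("clf", MeanThreshold()),
])
pipe.fit([10, 11, 12, 13])
print(pipe.predict([12, 13]))  # [1, 1]
```

predict([12, 13]) returns [1, 1] because the shift learned during fit (minimum 10) is reused. Refitting the shift on the test values would give [0, 0] instead, which is exactly the kind of mistake a pipeline helps avoid.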

Here is a good example of Pipeline usage. A Pipeline gives you a single interface for all three steps: the two transformations and the final estimator. It encapsulates the transformers and the predictor, so instead of writing something like:

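    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.linear_model import SGDClassifier

    # Xtrain, ytrain and Xtest are assumed to be defined (raw text documents and labels)
    vect = CountVectorizer()
    tfidf = TfidfTransformer()
    clf = SGDClassifier()

    vX = vect.fit_transform(Xtrain)
    tfidfX = tfidf.fit_transform(vX)
    clf.fit(tfidfX, ytrain)
    predicted = clf.predict(tfidfX)

    # Now evaluate all steps on the test set:
    # reuse the fitted steps with transform()/predict(), do not refit them
    vX = vect.transform(Xtest)
    tfidfX = tfidf.transform(vX)
    predicted = clf.predict(tfidfX)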

With a Pipeline, this becomes just:

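from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier()),
])
# fit() fits every step in order; the classifier also needs the labels
predicted = pipeline.fit(Xtrain, ytrain).predict(Xtrain)
# Now evaluate all steps on test set
predicted = pipeline.predict(Xtest)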
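To check that the manual chain and the Pipeline really do the same thing, the following sketch runs both on a tiny made-up dataset (the documents and labels are invented for illustration) and compares the predictions. random_state=0 is set on SGDClassifier, as an added assumption, so that both fits are reproducible.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import Pipeline

# Tiny made-up dataset, for illustration only
Xtrain = ["good movie", "great film", "bad movie", "awful film",
          "good film", "terrible movie"]
ytrain = [1, 1, 0, 0, 1, 0]
Xtest = ["great movie", "bad film"]

# Manual chain: fit on the training set, then only transform the test set
vect = CountVectorizer()
tfidf = TfidfTransformer()
clf = SGDClassifier(random_state=0)
clf.fit(tfidf.fit_transform(vect.fit_transform(Xtrain)), ytrain)
manual = clf.predict(tfidf.transform(vect.transform(Xtest)))

# Same steps wrapped in a Pipeline
pipeline = Pipeline([
    ('vect', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', SGDClassifier(random_state=0)),
])
pipeline.fit(Xtrain, ytrain)
piped = pipeline.predict(Xtest)

print(np.array_equal(manual, piped))  # True
```

The fitted steps remain accessible by name through pipeline.named_steps, e.g. pipeline.named_steps['vect'].vocabulary_.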

With pipelines you can easily perform a grid search over a set of parameters for each step of this meta-estimator, as described in the link above. All steps except the last one must be transformers; the last step can be a transformer or a predictor.

Answer to the edit: when you call pipln.fit(), each transformer inside the pipeline is fitted on the output of the previous transformer (the first transformer is fitted on the raw dataset), and its transformed output is passed on to the next step. The last step is then fitted on the output of the last transformer.

The last estimator may be a transformer or a predictor. You can call fit_transform() on the pipeline only if the last estimator is a transformer (one that implements fit_transform, or fit and transform separately). You can call predict() on the pipeline only if the last estimator is a predictor, and fit_predict() only if it implements fit_predict. So you simply cannot call fit_transform() or transform() on a pipeline whose last step is a predictor.
