Any way to access methods from individual stages in PySpark PipelineModel?
Question
I've created a PipelineModel for doing LDA in Spark 2.0 (via the PySpark API):
from pyspark.ml import Pipeline
from pyspark.ml.clustering import LDA
from pyspark.ml.feature import CountVectorizer, RegexTokenizer


def create_lda_pipeline(minTokenLength=1, minDF=1, minTF=1, numTopics=10, seed=42, pattern=r'[\W]+'):
    """
    Create a pipeline for running an LDA model on a corpus. This function does not need data and will
    not actually do any fitting until invoked by the caller.

    Args:
        minTokenLength: minimum token length to keep after tokenization
        minDF: minimum number of documents a word must appear in across the corpus
        minTF: minimum number of times a word must appear in a document
        numTopics: number of LDA topics to infer
        seed: random seed, for reproducibility
        pattern: regular expression used to split text into words

    Returns:
        pipeline: pyspark.ml.Pipeline (fitting it yields a pyspark.ml.PipelineModel)
    """
    reTokenizer = RegexTokenizer(inputCol="text", outputCol="tokens", pattern=pattern, minTokenLength=minTokenLength)
    cntVec = CountVectorizer(inputCol=reTokenizer.getOutputCol(), outputCol="vectors", minDF=minDF, minTF=minTF)
    lda = LDA(k=numTopics, seed=seed, optimizer="em", featuresCol=cntVec.getOutputCol())
    pipeline = Pipeline(stages=[reTokenizer, cntVec, lda])
    return pipeline
I want to calculate the perplexity on a dataset using the trained model with the LDAModel.logPerplexity() method, so I tried running the following:
from pprint import pprint

training = get_20_newsgroups_data(test_or_train='train')
pipeline = create_lda_pipeline(numTopics=20, minDF=3, minTokenLength=5)
model = pipeline.fit(training)  # train model on training data
testing = get_20_newsgroups_data(test_or_train='test')
perplexity = model.logPerplexity(testing)
pprint(perplexity)
This just results in the following AttributeError:
'PipelineModel' object has no attribute 'logPerplexity'
I understand why this error happens, since the logPerplexity method belongs to LDAModel, not PipelineModel, but I am wondering if there is a way to access the method from that stage.
Answer
All transformers in the pipeline are stored in the stages property. Extract stages, take the last one, and you're ready to go:
model.stages[-1].logPerplexity(testing)