'SparkSession' object has no attribute 'serializer' when evaluating a classifier in Pyspark
Problem description
I am using Apache Spark in batch mode. I have set up an entire pipeline that transforms text into TFIDF vectors and then predicts a boolean class using logistic regression:
# Chain previously created feature transformers, indexers and regression in a Pipeline
pipeline = Pipeline(stages=[tokenizer, hashingTF, idf,
                            labelIndexer, featureIndexer, lr])
#Fit the full model to the training data
model = pipeline.fit(trainingData)
#Predict test data
predictions = model.transform(testData)
I can examine predictions, which is a Spark dataframe, and it is what I expect it to be.
Next, I want to see a confusion matrix, so I convert the scores and labels to an RDD and pass that to BinaryClassificationMetrics():
predictionAndLabels = predictions.select('prediction','label').rdd
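One caveat worth flagging (an assumption based on the pyspark.mllib API, not something stated in the question): BinaryClassificationMetrics expects an RDD of (score, label) pairs of floats, while .select(...).rdd yields Row objects, so a mapping step such as predictionAndLabels.map(lambda row: (float(row[0]), float(row[1]))) is often added. Since a Row behaves like a tuple under positional indexing, the mapping itself can be sketched locally with plain tuples standing in for Rows:

```python
# Illustrative sketch only: plain tuples stand in for pyspark Row objects,
# which also support positional indexing like row[0], row[1].
to_float_pair = lambda row: (float(row[0]), float(row[1]))

rows = [(1.0, 1), (0.0, 0), (1.0, 0)]  # hypothetical collected (prediction, label) rows
pairs = [to_float_pair(r) for r in rows]
print(pairs)
```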
Finally, I pass that to BinaryClassificationMetrics:
metrics = BinaryClassificationMetrics(predictionAndLabels) #this errors out
Here is the error:
AttributeError: 'SparkSession' object has no attribute 'serializer'
This error is not helpful, and searching for it turns up a broad spectrum of issues. The only thing I've found that seems similar is this post, which has no answers: How to resolve error "AttributeError: 'SparkSession' object has no attribute 'serializer'"?
Thanks for any assistance!
Answer
For posterity's sake, here's what I did to fix this. When I initiated the Spark session and the SQL context, I was doing this, which is not right:
sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
sqlContext = SQLContext(sc)
The problem was resolved by doing this instead:
sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
sqlContext = SQLContext(sparkContext=sc.sparkContext, sparkSession=sc)
I'm not sure why that needed to be explicit, and would welcome clarification from the community if someone knows.
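A note on the original goal: the question asks for a confusion matrix, but BinaryClassificationMetrics mainly exposes threshold metrics such as areaUnderROC and areaUnderPR (MulticlassMetrics is the pyspark.mllib class that has a confusionMatrix method). As a minimal, pyspark-free sketch of what that tally computes, assuming the (prediction, label) pairs have been collected locally and both values are 0.0/1.0 floats:

```python
from collections import Counter

# Hypothetical collected (prediction, label) pairs; the data here is
# illustrative, not taken from the post.
pairs = [(1.0, 1.0), (1.0, 0.0), (0.0, 0.0), (0.0, 0.0), (1.0, 1.0)]

# Tally each (prediction, label) combination into the four confusion cells.
counts = Counter(pairs)
tp = counts[(1.0, 1.0)]  # predicted positive, actually positive
fp = counts[(1.0, 0.0)]  # predicted positive, actually negative
tn = counts[(0.0, 0.0)]  # predicted negative, actually negative
fn = counts[(0.0, 1.0)]  # predicted negative, actually positive
print(tp, fp, tn, fn)
```

For a full Spark dataframe this counting would of course be done distributed (e.g. with groupBy on the two columns) rather than after a collect; the sketch only shows the bookkeeping.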