'SparkSession' object has no attribute 'serializer' when evaluating a classifier in Pyspark


Question

I am using Apache Spark in batch mode. I have set up an entire pipeline that transforms text into TFIDF vectors and then predicts a boolean class using logistic regression:

from pyspark.ml import Pipeline

# Chain previously created feature transformers, indexers and regression in a Pipeline
pipeline = Pipeline(stages=[tokenizer, hashingTF, idf,
                            labelIndexer, featureIndexer, lr])

# Fit the full model to the training data
model = pipeline.fit(trainingData)

# Predict test data
predictions = model.transform(testData)

I can examine predictions, which is a Spark dataframe, and it is what I expect it to be. Next, I want to see a confusion matrix, so I convert the scores and labels to an RDD and pass that to BinaryClassificationMetrics():

predictionAndLabels = predictions.select('prediction','label').rdd
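
A note in passing: BinaryClassificationMetrics expects an RDD of (score, label) pairs of doubles, while the select above yields an RDD of Row objects. A minimal sketch of that mapping, assuming the prediction and label columns are numeric:

# Map each Row to a plain (score, label) tuple of floats,
# the input shape BinaryClassificationMetrics documents
predictionAndLabels = predictions.select('prediction', 'label').rdd \
    .map(lambda row: (float(row.prediction), float(row.label)))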

Finally, I pass that to the BinaryClassificationMetrics:

from pyspark.mllib.evaluation import BinaryClassificationMetrics

metrics = BinaryClassificationMetrics(predictionAndLabels)  # this errors out

This is the error:

AttributeError: 'SparkSession' object has no attribute 'serializer'

This error is not helpful, and searching for it raises a broad spectrum of issues. The only thing I've found that seems similar is this post, which has no answers: How to resolve error "AttributeError: 'SparkSession' object has no attribute 'serializer'?

Thanks for your help!

Answer

For posterity's sake, here's what I did to fix this. When I initiated the Spark session and the SQL context, I was doing this, which is not right:

from pyspark.sql import SparkSession, SQLContext

sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
sqlContext = SQLContext(sc)  # wrong: sc here is a SparkSession, not a SparkContext

This problem was resolved by doing this instead:

sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
sqlContext = SQLContext(sparkContext=sc.sparkContext, sparkSession=sc)
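
With the contexts wired up this way, the metrics constructor goes through. As a hedged follow-up sketch (not part of the original answer), this is how the evaluation could then proceed, including the confusion matrix the question was after, which lives on MulticlassMetrics rather than BinaryClassificationMetrics:

from pyspark.mllib.evaluation import BinaryClassificationMetrics, MulticlassMetrics

# Binary metrics: areas under the ROC and precision-recall curves
metrics = BinaryClassificationMetrics(predictionAndLabels)
print(metrics.areaUnderROC)
print(metrics.areaUnderPR)

# The confusion matrix itself is exposed by MulticlassMetrics
multiMetrics = MulticlassMetrics(predictionAndLabels)
print(multiMetrics.confusionMatrix().toArray())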

I'm not sure why that needed to be explicit, and would welcome clarification from the community if someone knows. A likely explanation: SQLContext's first positional parameter is a SparkContext, so SQLContext(sc) stores the SparkSession where a SparkContext belongs; when the mllib evaluator later reaches through that context for its serializer attribute, it finds a SparkSession instead, which is exactly the AttributeError above.
