How to evaluate a classifier with Apache Spark 2.4.5 and PySpark (Python)


Problem Description


I'm wondering what the best way is to evaluate a fitted binary classification model using Apache Spark 2.4.5 and PySpark (Python). I want to consider different metrics such as accuracy, precision, recall, auc and f1 score.


Let us assume that the following is given:

# pyspark.sql.dataframe.DataFrame in VectorAssembler format containing two columns: target and features
# DataFrame we want to evaluate
df

# Fitted pyspark.ml.tuning.TrainValidationSplitModel (any arbitrary ml algorithm)
model

Option 1


Neither BinaryClassificationEvaluator nor MulticlassClassificationEvaluator can calculate all metrics mentioned above on their own. Thus, we use both evaluators.

from pyspark.ml.evaluation import BinaryClassificationEvaluator, MulticlassClassificationEvaluator

# Create both evaluators
evaluatorMulti = MulticlassClassificationEvaluator(labelCol="target", predictionCol="prediction")
evaluator = BinaryClassificationEvaluator(labelCol="target", rawPredictionCol="prediction", metricName='areaUnderROC')

# Make predictions
predictionAndTarget = model.transform(df).select("target", "prediction")

# Get metrics
acc = evaluatorMulti.evaluate(predictionAndTarget, {evaluatorMulti.metricName: "accuracy"})
f1 = evaluatorMulti.evaluate(predictionAndTarget, {evaluatorMulti.metricName: "f1"})
weightedPrecision = evaluatorMulti.evaluate(predictionAndTarget, {evaluatorMulti.metricName: "weightedPrecision"})
weightedRecall = evaluatorMulti.evaluate(predictionAndTarget, {evaluatorMulti.metricName: "weightedRecall"})
auc = evaluator.evaluate(predictionAndTarget)

Downside

  • It seems weird and contradictory to use MulticlassClassificationEvaluator when evaluating a binary classifier
  • I have to use two different evaluators to calculate five metrics
  • MulticlassClassificationEvaluator only calculates weightedPrecision and weightedRecall (which is ok for a multi-class classification). However, are these two metrics equal to precision and recall in the binary case?
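One way to check this last point by hand is to compute both quantities from a small confusion matrix; the counts below are made up purely for illustration:

```python
# Toy confusion-matrix counts (made up for illustration only)
TP, FN = 8, 2    # actual 1: predicted 1 / predicted 0
FP, TN = 5, 85   # actual 0: predicted 1 / predicted 0

# Plain binary precision, treating label 1 as the positive class
precision = TP / (TP + FP)

# Per-class precision and class weights, as weightedPrecision combines them
prec_1 = TP / (TP + FP)
prec_0 = TN / (TN + FN)
w_1 = (TP + FN) / (TP + FN + FP + TN)
w_0 = (FP + TN) / (TP + FN + FP + TN)
weighted_precision = prec_1 * w_1 + prec_0 * w_0

# Under class imbalance the two clearly diverge
print(precision, weighted_precision)
```

With these counts the plain binary precision is 8/13 ≈ 0.615, while the weighted version is pulled up toward the precision of the majority class, so in general the two are not equal once the classes are imbalanced.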

Option 2


Using the RDD-based API with BinaryClassificationMetrics and MulticlassMetrics. Again, neither can calculate all the metrics mentioned above on its own (at least not in Python), so we use both.

from pyspark.mllib.evaluation import BinaryClassificationMetrics, MulticlassMetrics

# Make prediction
predictionAndTarget = model.transform(df).select("target", "prediction")

# Create both evaluators
metrics_binary = BinaryClassificationMetrics(predictionAndTarget.rdd.map(tuple))
metrics_multi = MulticlassMetrics(predictionAndTarget.rdd.map(tuple))

acc = metrics_multi.accuracy
f1 = metrics_multi.fMeasure(1.0)
precision = metrics_multi.precision(1.0)
recall = metrics_multi.recall(1.0)
auc = metrics_binary.areaUnderROC

Upside

  • In my case (~1,000,000 rows) Option 2 seems to be faster than Option 1

Surprise

  • In my case I get different values for f1 and areaUnderROC when using Option 1 vs. Option 2.

Option 3


Use numpy and sklearn

import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, roc_auc_score, f1_score

# Make predictions
predictionAndTarget = model.transform(df).select("target", "prediction")

predictionAndTargetNumpy = np.array((predictionAndTarget.collect()))

acc = accuracy_score(predictionAndTargetNumpy[:,0], predictionAndTargetNumpy[:,1])
f1 = f1_score(predictionAndTargetNumpy[:,0], predictionAndTargetNumpy[:,1])
precision = precision_score(predictionAndTargetNumpy[:,0], predictionAndTargetNumpy[:,1])
recall = recall_score(predictionAndTargetNumpy[:,0], predictionAndTargetNumpy[:,1])
auc = roc_auc_score(predictionAndTargetNumpy[:,0], predictionAndTargetNumpy[:,1])

Downside

  • It seems weird to use sklearn and numpy since Apache Spark claims to have its own evaluation API
  • numpy and sklearn can't even be used if the dataset gets too big.

To sum up my questions:

  1. Which of the options above (if any) is recommended for evaluating a binary classifier with Apache Spark 2.4.5 and PySpark?
  2. Are there other options? Am I missing something important?
  3. Why do I get different results for the metrics when using Option 1 vs. Option 2?

Answer


Not sure if it is relevant now, but I can answer your question 3, and thus maybe question 1 in turn:


Spark ML provides weighted precision and weighted recall metrics only, as part of the MulticlassClassificationEvaluator module. If you are looking for an interpretation equivalent to the overall precision metric from the Scikit world, especially for binary classification, then it is better to compute the confusion matrix and evaluate it using the formulas for precision and recall.
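A minimal sketch of that suggestion in plain Python, assuming the (target, prediction) pairs have already been collected to the driver (e.g. via predictionAndTarget.collect()); the pair list here is purely illustrative:

```python
# Hypothetical (target, prediction) pairs, e.g. from predictionAndTarget.collect()
pairs = [(1.0, 1.0), (1.0, 0.0), (0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.0, 0.0)]

# Build the confusion-matrix counts, treating label 1.0 as the positive class
tp = sum(1 for t, p in pairs if t == 1.0 and p == 1.0)
fp = sum(1 for t, p in pairs if t == 0.0 and p == 1.0)
fn = sum(1 for t, p in pairs if t == 1.0 and p == 0.0)
tn = sum(1 for t, p in pairs if t == 0.0 and p == 0.0)

# Overall (Scikit-style) metrics from the raw counts
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
```

For a dataset too large to collect, the same four counts can of course be aggregated distributedly (e.g. by grouping the DataFrame on the target and prediction columns) and plugged into the same formulas.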


The weighted precision used by Spark ML is computed from the precision of both classes, combined using the weight of each class label in the test set, i.e.

Prec(Label 1) = TP / (TP + FP)
Prec(Label 0) = TN / (TN + FN)
Weight of Label 1 in test set: WL1 = L1 / (L1 + L2)
Weight of Label 0 in test set: WL2 = L2 / (L1 + L2)
Weighted precision = (PrecL1 * WL1) + (PrecL0 * WL2)
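Spelled out as code, the formulas above look like this (the counts are placeholders; substitute the values from your own confusion matrix):

```python
# Placeholder counts; substitute the values from your own confusion matrix
TP, FP = 90, 30    # label 1 predicted as 1 / label 0 predicted as 1
TN, FN = 160, 20   # label 0 predicted as 0 / label 1 predicted as 0

L1 = TP + FN                 # total label-1 rows in the test set
L2 = TN + FP                 # total label-0 rows in the test set

prec_l1 = TP / (TP + FP)     # Prec(Label 1)
prec_l0 = TN / (TN + FN)     # Prec(Label 0)
wl1 = L1 / (L1 + L2)         # weight of label 1
wl2 = L2 / (L1 + L2)         # weight of label 0

weighted_precision = prec_l1 * wl1 + prec_l0 * wl2
```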


Weighted precision and recall will be higher than overall precision and recall whenever there is even a slight class imbalance in the dataset, and thus the Sklearn-based and Spark-ML-based metrics will differ.


As an illustration, take the confusion matrix of a class-imbalanced dataset:

array([[3969025,  445123],
       [ 284283, 1663913]])

Total 1 class labels: 1948196
Total 0 class labels: 4414148

Proportion Label 1: 0.306207272
Proportion Label 0: 0.693792728


Spark ML will give these metrics:

Accuracy:           0.8853557745384405
Weighted Precision: 0.8890015815237463
Weighted Recall:    0.8853557745384406
F-1 Score:          0.8865644697253956


whereas the actual overall metrics computation gives (Scikit equivalent):

Accuracy:  0.8853557745384405
Precision: 0.7889448070113549
Recall:    0.8540788503826103
AUC:       0.8540788503826103
f1:        0.8540788503826103
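The key numbers can be re-derived from the confusion matrix alone. A quick pure-Python check, reading the counts off the matrix above (rows = actual class 0/1, columns = predicted class 0/1):

```python
# Counts read from the confusion matrix above
tn, fp = 3969025, 445123   # actual 0: predicted 0 / predicted 1
fn, tp = 284283, 1663913   # actual 1: predicted 0 / predicted 1
total = tn + fp + fn + tp

# Overall (Scikit-style) metrics, with label 1 as the positive class
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)

# Spark ML style weighted precision/recall
w1 = (tp + fn) / total                 # proportion of label 1
w0 = (tn + fp) / total                 # proportion of label 0
weighted_precision = (tp / (tp + fp)) * w1 + (tn / (tn + fn)) * w0
weighted_recall = (tp / (tp + fn)) * w1 + (tn / (tn + fp)) * w0

# Values match the figures quoted above
print(accuracy, precision, recall, weighted_precision, weighted_recall)
```

Note in particular that weighted recall collapses algebraically to (TP + TN) / total, which is why Spark's weighted recall equals its accuracy, while the weighted precision (0.889) sits well above the plain binary precision (0.789).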


Thus the Spark ML weighted version inflates the overall metric computation that we would otherwise observe, especially for binary classification.

