Does SparkML Cross Validation Only Work With a "label" Column?

Views: 50 Published: 2021/6/24 20:36:20 Tags: apache-spark pyspark cross-validation apache-spark-ml


Problem description

When I run the cross-validation example with a dataset whose label is in a column not named "label", I get an IllegalArgumentException on Spark 3.1.1. Why?

The code below has been modified to rename the "label" column to "target", and labelCol has been set to "target" for the logistic regression model. This version raises the exception, while leaving everything at "label" works fine.

from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.feature import HashingTF, Tokenizer
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

training = spark.createDataFrame([
    (0, "a b c d e spark", 1.0),
    (1, "b d", 0.0),
    (2, "spark f g h", 1.0),
    (3, "hadoop mapreduce", 0.0),
    (4, "b spark who", 1.0),
    (5, "g d a y", 0.0),
    (6, "spark fly", 1.0),
    (7, "was mapreduce", 0.0),
    (8, "e spark program", 1.0),
    (9, "a e c l", 0.0),
    (10, "spark compile", 1.0),
    (11, "hadoop software", 0.0)
], ["id", "text", "target"]) # try switching between "target" and "label"

tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol=tokenizer.getOutputCol(), outputCol="features")

lr = LogisticRegression(maxIter=10, labelCol="target")  # try switching between "target" and "label"

pipeline = Pipeline(stages=[tokenizer, hashingTF, lr])

paramGrid = ParamGridBuilder() \
    .addGrid(hashingTF.numFeatures, [10, 100, 1000]) \
    .addGrid(lr.regParam, [0.1, 0.01]) \
    .build()

crossval = CrossValidator(estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=BinaryClassificationEvaluator(),
                          numFolds=2)  


cvModel = crossval.fit(training)

Is this the expected behavior?
