Spark error: Exception in thread "main" java.lang.UnsupportedOperationException


Question

I am writing a Scala/Spark program to find the maximum employee salary. The employee data is in a CSV file, and the salary column uses a comma as the thousands separator and has a $ prefix, e.g. $74,628.00.

To handle the comma and the dollar sign, I wrote a parser function in Scala that splits each line on "," and maps each column to a variable, which is then assigned to a case class field.

My parser looks like this. To eliminate the comma and the dollar sign, I use the replace function to replace them with an empty string, and then typecast the result to Int.

def ParseEmployee(line: String): Classes.Employee = {
    val fields = line.split(",")
    val Name = fields(0)
    val JOBTITLE = fields(2)
    val DEPARTMENT = fields(3)
    val temp = fields(4)

    temp.replace(",","")//To eliminate the ,
    temp.replace("$","")//To remove the $
    val EMPLOYEEANNUALSALARY = temp.toInt //Type cast the string to Int

    Classes.Employee(Name, JOBTITLE, DEPARTMENT, EMPLOYEEANNUALSALARY)
  }

My case class is as follows:

case class Employee (Name: String,
                      JOBTITLE: String,
                     DEPARTMENT: String,
                     EMPLOYEEANNUALSALARY: Number,
)

My Spark DataFrame SQL query looks like this:

val empMaxSalaryValue = sc.sqlContext.sql("Select Max(EMPLOYEEANNUALSALARY) From EMP")
empMaxSalaryValue.show

When I run this program, I get the following exception:

Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for Number
- field (class: "java.lang.Number", name: "EMPLOYEEANNUALSALARY")
- root class: "Classes.Employee"
    at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:625)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$10.apply(ScalaReflection.scala:619)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$10.apply(ScalaReflection.scala:607)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:344)
    at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:607)
    at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:438)
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
    at org.apache.spark.sql.Encoders$.product(Encoders.scala:275)
    at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:282)
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:272)
    at CalculateMaximumSalary$.main(CalculateMaximumSalary.scala:27)
    at CalculateMaximumSalary.main(CalculateMaximumSalary.scala)

  1. Any idea why I am getting this error? What mistake am I making here, and why can't the value be typecast to a number?

  2. Is there a better approach to finding the employee's maximum salary?

Recommended answer

Spark SQL provides only a limited number of Encoders, and these target concrete classes. Abstract classes like Number are not supported (they can only be used with the generic binary Encoders, which are limited).
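To make the distinction concrete, here is a minimal sketch, assuming a Spark SQL dependency is on the classpath. The class names SalaryAsInt and SalaryAsNumber and the object EncoderCheck are hypothetical and exist only for illustration. It derives an encoder for a case class with concrete field types, attempts the same for a Number field (which is expected to fail in the way the stack trace above shows), and then builds a Kryo-based binary encoder, which stores the whole object in a single binary column.

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import scala.util.Try

// Hypothetical classes for illustration only (not from the question).
case class SalaryAsInt(name: String, salary: Int)
case class SalaryAsNumber(name: String, salary: Number)

object EncoderCheck {
  def main(args: Array[String]): Unit = {
    // Concrete field types: the encoder is derived by reflection.
    val ok: Encoder[SalaryAsInt] = Encoders.product[SalaryAsInt]
    println(ok.schema.treeString)

    // Abstract field type: derivation is expected to fail at runtime,
    // so the attempt is wrapped in Try instead of crashing the program.
    val bad = Try(Encoders.product[SalaryAsNumber])
    println(s"Encoder for Number field derived: ${bad.isSuccess}")

    // Generic binary encoder: the object is serialized into one opaque
    // binary column, so individual fields are not visible to SQL.
    val binary: Encoder[SalaryAsNumber] = Encoders.kryo[SalaryAsNumber]
    println(binary.schema.treeString)
  }
}
```

Because a binary encoder hides the fields from Spark SQL, a query such as Max(EMPLOYEEANNUALSALARY) would have no salary column to work on, which is why switching to a concrete type is the simpler fix here.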

Since you convert to Int anyway, just redefine the class:

case class Employee (
  Name: String,
  JOBTITLE: String,
  DEPARTMENT: String,
  EMPLOYEEANNUALSALARY: Int
)
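Separately from the encoder error, the parser in the question would likely fail once the encoder is fixed: String.replace returns a new string rather than modifying temp, "74628.00".toInt throws a NumberFormatException because of the decimal part, and splitting on "," also cuts through a salary such as $74,628.00. The following is a minimal sketch of one way around this, assuming that fields containing commas are wrapped in double quotes and contain no escaped quotes. The object EmployeeParser and its helpers are hypothetical names, the Employee class below stands in for Classes.Employee, and the field indexes are copied from the question.

```scala
// Stand-in for Classes.Employee, using the Int field suggested above.
case class Employee(Name: String, JOBTITLE: String, DEPARTMENT: String, EMPLOYEEANNUALSALARY: Int)

object EmployeeParser {
  // Matches only commas that are followed by an even number of double quotes,
  // i.e. commas outside quoted fields (assumes no escaped quotes in the data).
  private val CsvSplit = ",(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)"

  // Removes surrounding whitespace and one pair of enclosing double quotes.
  private def unquote(s: String): String =
    s.trim.stripPrefix("\"").stripSuffix("\"")

  // "$74,628.00" -> 74628. replace returns a new string, so the calls are chained.
  // BigDecimal handles the decimal part; toInt then drops the cents.
  def parseSalary(raw: String): Int = {
    val cleaned = unquote(raw).replace("$", "").replace(",", "")
    BigDecimal(cleaned).toInt
  }

  // Same column positions as the question: 0 = name, 2 = job title,
  // 3 = department, 4 = annual salary.
  def parseEmployee(line: String): Employee = {
    val fields = line.split(CsvSplit, -1).map(unquote)
    Employee(fields(0), fields(2), fields(3), parseSalary(fields(4)))
  }
}
```

With a made-up input line such as "DOE, JANE",X,ANALYST,FINANCE,"$74,628.00", parseEmployee keeps the quoted name and salary intact as single fields. If the cents matter, the salary field could be declared as Double or BigDecimal instead of Int. Spark's built-in CSV reader (spark.read.csv) is another option worth considering, since it handles quoted fields itself.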
