火花错误:线程"main"中的异常; java.lang.UnsupportedOperationException [英] Spark error: Exception in thread "main" java.lang.UnsupportedOperationException

查看:488
本文介绍了火花错误:线程"main"中的异常; java.lang.UnsupportedOperationException的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在编写一个Scala/spark程序,该程序可以找到员工的最高薪水.员工数据在CSV文件中可用,并且薪金列具有数千个逗号分隔符,并且其前缀为$,例如$ 74,628.00.

I am writing a Scala/spark program which would find the max salary of the employee. The employee data is available in a CSV file, and the salary column has a comma separator for thousands and also it has a $ prefixed to it e.g. $74,628.00.

为了处理这个逗号和美元符号,我在scala中编写了一个解析器函数,该函数将在,"上的每一行分开,然后将每一列映射到要分配给case类的各个变量.

To handle this comma and dollar sign, I have written a parser function in scala which would split each line on "," and then map each column to individual variables to be assigned to a case class.

我的解析器程序如下所示.为了消除逗号和美元符号,我正在使用replace函数将其替换为空,然后最后将大小写键入Int.

My parser program looks like below. In this to eliminate the comma and dollar signs I am using the replace function to replace it with empty, and then finally typecase to Int.

def ParseEmployee(line: String): Classes.Employee = {
    val fields = line.split(",")
    val Name = fields(0)
    val JOBTITLE = fields(2)
    val DEPARTMENT = fields(3)
    val temp = fields(4)

    temp.replace(",","")//To eliminate the ,
    temp.replace("$","")//To remove the $
    val EMPLOYEEANNUALSALARY = temp.toInt //Type cast the string to Int

    Classes.Employee(Name, JOBTITLE, DEPARTMENT, EMPLOYEEANNUALSALARY)
  }

我的Case类如下所示

My Case class look like below

case class Employee (Name: String,
                      JOBTITLE: String,
                     DEPARTMENT: String,
                     EMPLOYEEANNUALSALARY: Number,
)

我的spark dataframe sql查询如下所示

My spark dataframe sql query looks like below

val empMaxSalaryValue = sc.sqlContext.sql("Select Max(EMPLOYEEANNUALSALARY) From EMP")
empMaxSalaryValue.show

运行该程序时,出现以下异常

when I Run this program I am getting this below exception

Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for Number
- field (class: "java.lang.Number", name: "EMPLOYEEANNUALSALARY")
- root class: "Classes.Employee"
    at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:625)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$10.apply(ScalaReflection.scala:619)
    at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$10.apply(ScalaReflection.scala:607)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
    at scala.collection.immutable.List.foreach(List.scala:381)
    at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
    at scala.collection.immutable.List.flatMap(List.scala:344)
    at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:607)
    at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:438)
    at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
    at org.apache.spark.sql.Encoders$.product(Encoders.scala:275)
    at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:282)
    at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:272)
    at CalculateMaximumSalary$.main(CalculateMaximumSalary.scala:27)
    at CalculateMaximumSalary.main(CalculateMaximumSalary.scala)

  1. 知道为什么我会收到此错误吗?我在这里犯的错误是什么?为什么不能将其强制转换为数字?

  1. Any idea why I am getting this error? what is the mistake I am doing here and why it is not able to typecast to number?

有没有更好的方法来解决这个问题,从而获得员工的最高薪水?

Is there any better approach to handle this problem of getting maximum salary of the employee?

推荐答案

Spark SQL仅提供有限数量的Encoders作为目标具体类.不支持像Number这样的抽象类(可以与有限的二进制Encoders一起使用).

Spark SQL provides only a limited number of Encoders which target concrete classes. Abstract classes like Number are not supported (can be used with limited binary Encoders).

由于您仍然要转换为Int,因此只需重新定义类即可:

Since you convert to Int anyway, just redefine the class:

case class Employee (
  Name: String,
  JOBTITLE: String,
  DEPARTMENT: String,
  EMPLOYEEANNUALSALARY: Int
)

这篇关于火花错误:线程"main"中的异常; java.lang.UnsupportedOperationException的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆