Passing case class into function arguments


Problem description


Sorry for asking a simple question. I want to pass a case class to a function argument and use it further inside the function. So far I have tried this with TypeTag and ClassTag, but for some reason I am unable to use them properly, or maybe I am not looking in the right place.

The use case is something like this:

case class infoData(colA: Int, colB: String)
case class someOtherData(col1: String, col2: String, col3: Int)

def readCsv[T: ???](path: String, passedCaseClass: ???): Dataset[???] = {
  sqlContext
    .read
    .option("header", "true")
    .csv(path)
    .as[passedCaseClass]
}

It will be called something like this:

val infoDf = readCsv("/src/main/info.csv", infoData)
val otherDf = readCsv("/src/main/someOtherData.csv", someOtherData)

Solution

First, change your function definition to:

import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

object t0 {
  // Generic CSV reader: the Encoder[T] needed by .as[T] is taken as an
  // implicit parameter, so the caller's scope supplies it at compile time
  def readCsv[T](path: String)(implicit spark: SparkSession, encoder: Encoder[T]): Dataset[T] = {
    spark
      .read
      .option("header", "true")
      .csv(path)
      .as[T]
  }
}

You don't need any kind of reflection to create a generic readCsv function. The key here is that Spark needs the encoder at compile time, so you can take it as an implicit parameter and the compiler will supply it.
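If you prefer, the same requirement can be written as a context bound. This is just a sketch of the equivalent signature (the object name t0WithContextBound is made up for illustration; [T: Encoder] desugars into the same implicit Encoder[T] parameter as above):

import org.apache.spark.sql.{Dataset, Encoder, SparkSession}

object t0WithContextBound {
  // [T: Encoder] asks the compiler for an implicit Encoder[T], exactly
  // like the explicit implicit parameter in the version above
  def readCsv[T: Encoder](path: String)(implicit spark: SparkSession): Dataset[T] = {
    spark
      .read
      .option("header", "true")
      .csv(path)
      .as[T]
  }
}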

Because Spark SQL ships default encoders for product types (your case classes), it is easy to call your function like this:

import org.apache.spark.sql.SparkSession

case class infoData(colA: Int, colB: String)
case class someOtherData(col1: String, col2: String, col3: Int)

object test {
  import t0._

  implicit val spark: SparkSession = SparkSession.builder().getOrCreate()

  // spark.implicits._ brings the default encoders for product types into scope
  import spark.implicits._
  readCsv[infoData]("/tmp")
}
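Applied to the asker's original use case (paths taken from the question; spark.implicits._ must already be in scope so the encoders can be derived), the calls would look roughly like:

val infoDf = readCsv[infoData]("/src/main/info.csv")
val otherDf = readCsv[someOtherData]("/src/main/someOtherData.csv")

Note that the case class is now passed as a type parameter rather than a value argument, which is what lets the compiler pick the right Encoder.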

Hope it helps
