Spark/Scala: create an empty Dataset using generics in a trait


Question

I have a trait that takes a type parameter, and one of its methods needs to be able to create an empty typed Dataset.

trait MyTrait[T] {
    val sparkSession: SparkSession
    val spark = sparkSession.session
    val sparkContext = spark.sparkContext

    def createEmptyDataset(): Dataset[T] = {
        import spark.implicits._ // to access .toDS() function
        // DOESN'T WORK.
        val emptyRDD = sparkContext.parallelize(Seq[T]())
        val accumulator = emptyRDD.toDS()
        ...
    }
}

So far I have not gotten it to work. The compiler complains that there is no ClassTag for T, and that value toDS is not a member of org.apache.spark.rdd.RDD[T].

Any help would be appreciated. Thanks!

Recommended answer

You have to provide both a ClassTag[T] and an Encoder[T] in the same scope, because T is abstract inside the trait and the compiler cannot derive either of them there. For example:

import org.apache.spark.sql.{SparkSession, Dataset, Encoder}
import scala.reflect.ClassTag


trait MyTrait[T] {
    val ct: ClassTag[T]
    val enc: Encoder[T]

    val sparkSession: SparkSession
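    // lazy: the abstract sparkSession is not yet initialized while the trait body runs
    lazy val sparkContext = sparkSession.sparkContext

    def createEmptyDataset(): Dataset[T] = {
        val emptyRDD = sparkContext.emptyRDD[T](ct)
        sparkSession.createDataset(emptyRDD)(enc)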
    }
}

with a concrete implementation:

class Foo extends MyTrait[Int] {
   val sparkSession = SparkSession.builder.getOrCreate()
   import sparkSession.implicits._

   val ct = implicitly[ClassTag[Int]]
   val enc = implicitly[Encoder[Int]]
}

It is also possible to skip the RDD entirely:

import org.apache.spark.sql.{SparkSession, Dataset, Encoder}

trait MyTrait[T] {
    val enc: Encoder[T]

    val sparkSession: SparkSession

    def createEmptyDataset(): Dataset[T] = {
        // Only the encoder is needed; no ClassTag or SparkContext is involved.
        sparkSession.emptyDataset[T](enc)
    }
}

Also check How to declare traits as taking implicit "constructor parameters"?, in particular the answers by Blaisorblade and Alexey Romanov.
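
One way to apply the implicit "constructor parameter" idea is to replace the trait with an abstract class, since a Scala 2 trait cannot take constructor parameters. The following is only a sketch under assumptions: the names EmptyDatasetSupport, IntDatasets and EmptyDatasetDemo are hypothetical, and the demo runs Spark with a local master.

```scala
import org.apache.spark.sql.{Dataset, Encoder, Encoders, SparkSession}

// Hypothetical name. An abstract class can take the encoder as an
// implicit constructor parameter, which a Scala 2 trait cannot.
abstract class EmptyDatasetSupport[T](implicit enc: Encoder[T]) {
  // def instead of val avoids initialization-order problems.
  def sparkSession: SparkSession

  // enc is in implicit scope here, so emptyDataset picks it up.
  def createEmptyDataset(): Dataset[T] =
    sparkSession.emptyDataset[T]
}

// Hypothetical concrete class. The encoder is passed explicitly because
// sparkSession.implicits._ cannot be imported before the session exists.
class IntDatasets(val sparkSession: SparkSession)
    extends EmptyDatasetSupport[Int]()(Encoders.scalaInt)

object EmptyDatasetDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master, for demonstration only.
    val spark = SparkSession.builder
      .master("local[*]")
      .appName("empty-dataset-demo")
      .getOrCreate()

    val ds = new IntDatasets(spark).createEmptyDataset()
    assert(ds.count() == 0L) // an empty Dataset has no rows

    spark.stop()
  }
}
```

Compared with abstract vals for ct and enc, this keeps the evidence out of the subclass body, at the cost of using up the single class-inheritance slot.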
