Spark/scala create empty dataset using generics in a trait
Problem description
I have a trait, MyTrait, that takes a type parameter, and one of its methods needs to be able to create an empty typed Dataset.
trait MyTrait[T] {
  val sparkSession: SparkSession
  val spark = sparkSession.session
  val sparkContext = spark.sparkContext

  def createEmptyDataset(): Dataset[T] = {
    import spark.implicits._ // to access .toDS() function

    // DOESN'T WORK.
    val emptyRDD = sparkContext.parallelize(Seq[T]())
    val accumulator = emptyRDD.toDS()
    ...
  }
}
So far I have not gotten it to work. It complains "no ClassTag for T", and that "value toDS is not a member of org.apache.spark.rdd.RDD[T]".
Any help would be appreciated. Thanks!
Answer
You have to provide both ClassTag[T] and Encoder[T] in the same scope. For example:
import org.apache.spark.sql.{SparkSession, Dataset, Encoder}
import scala.reflect.ClassTag

trait MyTrait[T] {
  // Supplied by the concrete implementation:
  val ct: ClassTag[T]
  val enc: Encoder[T]

  val sparkSession: SparkSession
  val sparkContext = sparkSession.sparkContext

  def createEmptyDataset(): Dataset[T] = {
    val emptyRDD = sparkContext.emptyRDD[T](ct)
    sparkSession.createDataset(emptyRDD)(enc)
  }
}
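The key move here is passing the implicit parameter lists explicitly, as in sparkSession.createDataset(emptyRDD)(enc). That mechanism can be sketched in plain Scala without a Spark dependency; emptyArrayOf below is a hypothetical helper (not part of any library), standing in for a method like createDataset that takes an implicit type-class instance:

```scala
import scala.reflect.ClassTag

object ClassTagSketch {
  // A generic method with an implicit ClassTag parameter, mirroring how
  // createDataset takes an implicit Encoder in a separate parameter list.
  def emptyArrayOf[T](implicit ct: ClassTag[T]): Array[T] = Array.empty[T](ct)

  def main(args: Array[String]): Unit = {
    // The instance can be resolved from implicit scope...
    val a = emptyArrayOf[Int]
    // ...or passed explicitly, as the trait does with (ct) and (enc).
    val b = emptyArrayOf[Int](implicitly[ClassTag[Int]])
    println(a.length + b.length)
  }
}
```

Either call compiles to the same thing; the explicit form just makes the dependency visible, which is what the trait needs since it cannot summon the implicits itself.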
With a concrete implementation:
class Foo extends MyTrait[Int] {
  val sparkSession = SparkSession.builder.getOrCreate()
  import sparkSession.implicits._

  val ct = implicitly[ClassTag[Int]]
  val enc = implicitly[Encoder[Int]]
}
It is also possible to skip the RDD entirely:
import org.apache.spark.sql.{SparkSession, Dataset, Encoder}

trait MyTrait[T] {
  val enc: Encoder[T]
  val sparkSession: SparkSession

  def createEmptyDataset(): Dataset[T] = {
    sparkSession.emptyDataset[T](enc)
  }
}
Check "How to declare traits as taking implicit constructor parameters?", in particular the answers by Blasorblade and Alexey Romanov.