Why does Spark/Scala compiler fail to find toDF on RDD[Map[Int, Int]]?

Question

Why does the following end up with an error?

scala> import sqlContext.implicits._
import sqlContext.implicits._

scala> val rdd = sc.parallelize(1 to 10).map(x => (Map(x -> 0), 0))
rdd: org.apache.spark.rdd.RDD[(scala.collection.immutable.Map[Int,Int], Int)] = MapPartitionsRDD[20] at map at <console>:27

scala> rdd.toDF
res8: org.apache.spark.sql.DataFrame = [_1: map<int,int>, _2: int]

scala> val rdd = sc.parallelize(1 to 10).map(x => Map(x -> 0))
rdd: org.apache.spark.rdd.RDD[scala.collection.immutable.Map[Int,Int]] = MapPartitionsRDD[23] at map at <console>:27

scala> rdd.toDF
<console>:30: error: value toDF is not a member of org.apache.spark.rdd.RDD[scala.collection.immutable.Map[Int,Int]]
              rdd.toDF

So what exactly is happening here? toDF can convert an RDD of type (scala.collection.immutable.Map[Int,Int], Int) to a DataFrame, but not an RDD of type scala.collection.immutable.Map[Int,Int]. Why is that?

Answer

For the same reason you cannot use

sqlContext.createDataFrame((1 to 10).map(x => Map(x -> 0)))

If you take a look at the org.apache.spark.sql.SQLContext source you'll find two different implementations of the createDataFrame method:

def createDataFrame[A <: Product : TypeTag](rdd: RDD[A]): DataFrame  

def createDataFrame[A <: Product : TypeTag](data: Seq[A]): DataFrame 
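The toDF you are calling goes through the same machinery: the implicit conversion brought in by import sqlContext.implicits._ carries the identical bound. In Spark 1.x it looks roughly like this (a sketch of the SQLImplicits signature, so treat the exact shape as an assumption):

implicit def rddToDataFrameHolder[A <: Product : TypeTag](rdd: RDD[A]): DataFrameHolder

Since no conversion applies when the element type is not a Product, the compiler reports that toDF is not a member of RDD[Map[Int,Int]] rather than a constraint violation.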

As you can see, all of these require A to be a subclass of Product. When you call toDF on an RDD[(Map[Int,Int], Int)] it works because Tuple2 is indeed a Product. Map[Int,Int] by itself is not, hence the error.
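You can check the Product distinction in plain Scala, without Spark at all (a minimal sketch):

val ok: Product = (Map(1 -> 0), 0)  // compiles: Tuple2 extends Product
// val ko: Product = Map(1 -> 0)    // does not compile: Map does not extend Product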

You can make it work by wrapping Map with Tuple1:

sc.parallelize(1 to 10).map(x => Tuple1(Map(x -> 0))).toDF
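Tuple1 gives the column the auto-generated name _1, as in the first example. As an alternative sketch, a case class (which also extends Product) lets you name the column; Entry below is a hypothetical name, not part of the original answer:

case class Entry(m: Map[Int, Int])  // any case class extends Product, so toDF applies

sc.parallelize(1 to 10).map(x => Entry(Map(x -> 0))).toDF  // schema: [m: map<int,int>]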
