Convert Dataframe back to RDD of case class in Spark
Question

I am trying to convert a DataFrame of multiple case classes to an RDD of these case classes. I can't find any solution; this WrappedArray has driven me crazy :P
For example, assume I have the following:
case class randomClass(a:String,b: Double)
case class randomClass2(a:String,b: Seq[randomClass])
case class randomClass3(a:String,b:String)
import spark.implicits._ // needed for toDF / as (assumes an active SparkSession named spark)

val anRDD = sc.parallelize(Seq(
(randomClass2("a",Seq(randomClass("a1",1.1),randomClass("a2",1.1))),randomClass3("aa","aaa")),
(randomClass2("b",Seq(randomClass("b1",1.2),randomClass("b2",1.2))),randomClass3("bb","bbb")),
(randomClass2("c",Seq(randomClass("c1",3.2),randomClass("c2",1.2))),randomClass3("cc","Ccc"))))
val aDF = anRDD.toDF()
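For reference, printSchema on the resulting DataFrame shows that both tuple elements become struct columns named _1 and _2 (a sketch; the exact nullability flags may differ):

```scala
// Sketch: inspecting the schema of aDF (output shape is approximate)
aDF.printSchema()
// root
//  |-- _1: struct (nullable = true)
//  |    |-- a: string
//  |    |-- b: array
//  |    |    |-- element: struct
//  |    |    |    |-- a: string
//  |    |    |    |-- b: double
//  |-- _2: struct (nullable = true)
//  |    |-- a: string
//  |    |-- b: string
```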
Assuming I have aDF, how can I get anRDD back?
I tried something like this just to get the second column, but it gives an error:
aDF.map { case r:Row => r.getAs[randomClass3]("_2")}
Accepted answer
You can convert indirectly using Dataset[randomClass3]:
aDF.select($"_2.*").as[randomClass3].rdd
Spark DataFrame / Dataset[Row] represents data as Row objects using the mapping described in the Spark SQL, DataFrames and Datasets Guide. Any call to getAs should use this mapping.
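Under that mapping, a nested struct comes back as a Row and an array of structs as a Seq[Row], so field access goes through getAs with the SQL-side types rather than the case classes. A minimal sketch (column and field names taken from the example above):

```scala
import org.apache.spark.sql.Row

aDF.rdd.map { row =>
  val second = row.getAs[Row]("_2")   // struct -> Row, not randomClass3
  val a = second.getAs[String]("a")   // primitive fields map directly
  val b = second.getAs[String]("b")
  (a, b)
}
```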
For the second column, which is struct<a: string, b: string>, it would be a Row as well:
aDF.rdd.map { _.getAs[Row]("_2") }
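If you want to stay on the RDD[Row] route, you can rebuild the case class by hand from those Row fields (a sketch, equivalent to the Dataset-based conversion above):

```scala
val anRDDofClass3 = aDF.rdd.map { row =>
  val r = row.getAs[Row]("_2")
  randomClass3(r.getAs[String]("a"), r.getAs[String]("b"))
}
```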
As commented by Tzach Zohar, to get back the full RDD you'll need:
aDF.as[(randomClass2, randomClass3)].rdd
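This also handles the nested Seq[randomClass] in the first column: the tuple encoder turns the underlying WrappedArray of Rows back into a Seq of case-class instances, so no manual unwrapping is needed. Doing the same by hand would look roughly like this (a sketch):

```scala
import org.apache.spark.sql.Row

val manual = aDF.rdd.map { row =>
  val first = row.getAs[Row]("_1")
  val bs = first.getAs[Seq[Row]]("b").map { r =>  // WrappedArray[Row] -> case classes
    randomClass(r.getAs[String]("a"), r.getAs[Double]("b"))
  }
  randomClass2(first.getAs[String]("a"), bs)
}
```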