Convert a DataFrame back to an RDD of case classes in Spark

Question

I am trying to convert a DataFrame built from several case classes back into an RDD of those case classes. I can't find any solution. This WrappedArray has driven me crazy :P

For example, assume I have the following:

case class randomClass(a:String,b: Double)
case class randomClass2(a:String,b: Seq[randomClass])
case class randomClass3(a:String,b:String)

val anRDD = sc.parallelize(Seq(
 (randomClass2("a",Seq(randomClass("a1",1.1),randomClass("a2",1.1))),randomClass3("aa","aaa")),
 (randomClass2("b",Seq(randomClass("b1",1.2),randomClass("b2",1.2))),randomClass3("bb","bbb")),
 (randomClass2("c",Seq(randomClass("c1",3.2),randomClass("c2",1.2))),randomClass3("cc","Ccc"))))

val aDF = anRDD.toDF()

Given aDF, how can I get anRDD back?

I tried something like this just to get the second column, but it gave an error:

aDF.map { case r:Row => r.getAs[randomClass3]("_2")}

Recommended answer

You can convert indirectly using Dataset[randomClass3]:

aDF.select($"_2.*").as[randomClass3].rdd

A Spark DataFrame / Dataset[Row] represents data as Row objects, using the type mapping described in the Spark SQL, DataFrames and Datasets Guide. Any call to getAs should use this mapping.

For the second column, which is struct<a: string, b: string>, the value is a Row as well:

aDF.rdd.map { _.getAs[Row]("_2") }
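The same mapping is what produces the WrappedArray from the question: an array of structs comes back as a Seq[Row] (typically a WrappedArray at runtime), not as a Seq[randomClass]. As a rough sketch of converting by hand, assuming the case classes from the question, the following uses two hypothetical helpers, toRandomClass2 and toRandomClass3, which are not part of Spark:

```scala
import org.apache.spark.sql.Row

// Case classes from the question.
case class randomClass(a: String, b: Double)
case class randomClass2(a: String, b: Seq[randomClass])
case class randomClass3(a: String, b: String)

object RowConversions {
  // Hypothetical helper: a struct<a: string, b: string> value arrives as a Row,
  // so each field is read by name with the type the mapping prescribes.
  def toRandomClass3(r: Row): randomClass3 =
    randomClass3(r.getAs[String]("a"), r.getAs[String]("b"))

  // Hypothetical helper: the array-of-structs field arrives as a Seq[Row],
  // so every element has to be converted in turn.
  def toRandomClass2(r: Row): randomClass2 =
    randomClass2(
      r.getAs[String]("a"),
      r.getAs[Seq[Row]]("b").map { x =>
        randomClass(x.getAs[String]("a"), x.getAs[Double]("b"))
      })
}
```

With these in scope, something like aDF.rdd.map(r => (RowConversions.toRandomClass2(r.getAs[Row]("_1")), RowConversions.toRandomClass3(r.getAs[Row]("_2")))) should rebuild the pairs, although the encoder-based as[...] route is shorter and less error-prone.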

As Tzach Zohar commented, to get back the full RDD you'll need:

aDF.as[(randomClass2, randomClass3)].rdd 
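Putting the pieces together, here is a minimal end-to-end sketch of the round trip. It assumes a Spark version that provides SparkSession (2.x or later) and a local master; the object name RoundTrip and the app name are illustrative only. The case classes are declared at the top level so that Spark can derive encoders for them.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Case classes from the question, declared outside the method that uses them.
case class randomClass(a: String, b: Double)
case class randomClass2(a: String, b: Seq[randomClass])
case class randomClass3(a: String, b: String)

object RoundTrip {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("round-trip")
      .getOrCreate()
    import spark.implicits._ // brings in toDF, the $"..." syntax and the encoders

    val anRDD = spark.sparkContext.parallelize(Seq(
      (randomClass2("a", Seq(randomClass("a1", 1.1), randomClass("a2", 1.1))),
        randomClass3("aa", "aaa")),
      (randomClass2("b", Seq(randomClass("b1", 1.2), randomClass("b2", 1.2))),
        randomClass3("bb", "bbb"))))

    val aDF = anRDD.toDF() // columns are named _1 and _2

    // Second column only: expand the struct's fields, then decode them.
    val onlySecond: RDD[randomClass3] =
      aDF.select($"_2.*").as[randomClass3].rdd

    // Full round trip: decode each row as a tuple of the two case classes.
    val restored: RDD[(randomClass2, randomClass3)] =
      aDF.as[(randomClass2, randomClass3)].rdd

    onlySecond.collect().foreach(println)
    restored.collect().foreach(println)

    spark.stop()
  }
}
```

The typed as[...] call lets the encoder handle the nested Seq[randomClass], so no WrappedArray handling is needed in user code.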
