How to convert RDD[List[Int]] to DataFrame?

Question

I have an RDD[List[Int]], and I do not know how many elements each List[Int] contains. How can I convert this RDD[List[Int]] to a DataFrame?

This is my input:

    val l1=Array(1,2,3,4)
    val l2=Array(1,2,3,4)
    val Lz=Seq(l1,l2)
    val rdd1=sc.parallelize(Lz,2) 

This is my expected result:

+---+---+---+---+
| _1| _2| _3| _4|
+---+---+---+---+
|  1|  2|  3|  4|
|  1|  2|  3|  4|
+---+---+---+---+

Recommended answer

You can do the following:

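import spark.implicits._ // enables .toDF on RDDs (already in scope in spark-shell)

val l1 = Array(1, 2, 3, 4)
val l2 = Array(1, 2, 3, 4)
val Lz = Seq(l1, l2)
// Destructure each 4-element array into a tuple; an array of any other length throws a MatchError.
val df = sc.parallelize(Lz, 2).map {
  case Array(val1, val2, val3, val4) => (val1, val2, val3, val4)
}.toDF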

df.show
// +---+---+---+---+
// | _1| _2| _3| _4|
// +---+---+---+---+
// |  1|  2|  3|  4|
// |  1|  2|  3|  4|
// +---+---+---+---+

If you have many columns, you need to proceed differently, and you must know the schema of your data; otherwise you will not be able to do the following:

import org.apache.spark.sql.Row

val sch = df.schema // reuses the schema of the earlier df; you can also build one programmatically

val df2 = spark.createDataFrame(sc.parallelize(Lz, 2).map(arr => Row.fromSeq(arr.toSeq)), sch)

df2.show
// +---+---+---+---+
// | _1| _2| _3| _4|
// +---+---+---+---+
// |  1|  2|  3|  4|
// |  1|  2|  3|  4|
// +---+---+---+---+
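
The comment above mentions that the schema can be built programmatically. A minimal sketch of that idea follows, assuming every row has the same length as the first one and that all values are non-null integers. The names `rdd`, `width`, `schema`, and `dfDyn` are illustrative and not part of the original answer, and the local `SparkSession` is only there to make the snippet self-contained.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Local session for illustration only; in spark-shell, `spark` and `sc` already exist.
val spark = SparkSession.builder().master("local[*]").appName("rdd-to-df").getOrCreate()
val sc = spark.sparkContext

val rdd = sc.parallelize(Seq(Array(1, 2, 3, 4), Array(1, 2, 3, 4)), 2)

// Assumption: every row has as many elements as the first row.
val width = rdd.first().length

// One non-nullable integer column per position, named _1, _2, ... like tuple columns.
val schema = StructType((1 to width).map(i => StructField(s"_$i", IntegerType, nullable = false)))

// Turn each array into a Row and attach the generated schema.
val dfDyn = spark.createDataFrame(rdd.map(arr => Row.fromSeq(arr.toSeq)), schema)
dfDyn.show()
```

If the rows can differ in length, this approach does not apply as written, because each Row must match the schema's field count.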

Unless you provide a schema, there is not much you can do other than end up with a single array column:

val df3 = sc.parallelize(Lz,2).toDF
// df3: org.apache.spark.sql.DataFrame = [value: array<int>]
df3.show
// +------------+
// |       value|
// +------------+
// |[1, 2, 3, 4]|
// |[1, 2, 3, 4]|
// +------------+
df3.printSchema
//root
// |-- value: array (nullable = true)
// |    |-- element: integer (containsNull = false)
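
If the array column is the starting point, one possible way to reach the expected layout is to expand it into one column per index. The following is a sketch rather than part of the original answer: it assumes all arrays have the same length, and the names `arrDf`, `width`, and `wide` are illustrative. For arrays shorter than `width`, the result of `getItem` depends on Spark's ANSI settings, so that case is not handled here.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, max, size}

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("array-to-columns").getOrCreate()
import spark.implicits._

// Same shape as df3 above: a single array<int> column named "value".
val arrDf = spark.sparkContext
  .parallelize(Seq(Array(1, 2, 3, 4), Array(1, 2, 3, 4)), 2)
  .toDF("value")

// Find the longest array so we know how many columns to create.
val width = arrDf.select(max(size(col("value")))).head().getInt(0)

// Select element i of the array as column _(i+1).
val wide = arrDf.select((0 until width).map(i => col("value").getItem(i).as(s"_${i + 1}")): _*)
wide.show()
```

Computing `width` triggers a Spark job, so the data is scanned once before the final DataFrame is defined.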
