How to convert RDD[List[Int]] to DataFrame?
Question
I have an RDD[List[Int]], and I do not know in advance how many elements each List[Int] contains. I want to convert my RDD[List[Int]] to a DataFrame. How should I do this?
This is my input:
val l1=Array(1,2,3,4)
val l2=Array(1,2,3,4)
val Lz=Seq(l1,l2)
val rdd1=sc.parallelize(Lz,2)
This is my expected result:
+---+---+---+---+
| _1| _2| _3| _4|
+---+---+---+---+
| 1| 2| 3| 4|
| 1| 2| 3| 4|
+---+---+---+---+
Recommended answer
You can do the following, which works when the number of columns is small and known in advance:
val l1=Array(1,2,3,4)
val l2=Array(1,2,3,4)
val Lz=Seq(l1,l2)
val df = sc.parallelize(Lz,2).map{
case Array(val1, val2, val3, val4) => (val1, val2, val3, val4)
}.toDF
df.show
// +---+---+---+---+
// | _1| _2| _3| _4|
// +---+---+---+---+
// | 1| 2| 3| 4|
// | 1| 2| 3| 4|
// +---+---+---+---+
If you have many columns, pattern matching into a tuple is no longer practical and you need to proceed differently. You still need to know the schema of your data; otherwise you will not be able to do the following:
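import org.apache.spark.sql.Row // needed for Row.fromSeq outside of an import-all shell session
val sch = df.schema // I just took the schema from the old df, but you can also build one programmatically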
val df2 = spark.createDataFrame(sc.parallelize(Lz,2).map{ Row.fromSeq(_) }, sch)
df2.show
// +---+---+---+---+
// | _1| _2| _3| _4|
// +---+---+---+---+
// | 1| 2| 3| 4|
// | 1| 2| 3| 4|
// +---+---+---+---+
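The answer mentions that the schema can be built programmatically instead of being borrowed from an existing DataFrame. A minimal sketch of that idea follows. It assumes every list in the RDD has the same length, so the first row can be used to determine the column count; the names `rdd`, `numCols`, and `schema`, and the locally created `SparkSession`, are illustrative and not part of the original answer.

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StructField, StructType}

// Assumption: a local session for illustration; in spark-shell, `spark` and `sc` already exist.
val spark = SparkSession.builder().master("local[2]").appName("rdd-to-df").getOrCreate()
val sc = spark.sparkContext

val rdd = sc.parallelize(Seq(List(1, 2, 3, 4), List(1, 2, 3, 4)), 2)

// Assumption: all lists share the same length, so the first row gives the column count.
val numCols = rdd.first().length

// One integer column per position, named _1, _2, ... to mirror the tuple-style names.
val schema = StructType(
  (1 to numCols).map(i => StructField(s"_$i", IntegerType, nullable = false))
)

// Turn each list into a Row and attach the generated schema.
val df2 = spark.createDataFrame(rdd.map(Row.fromSeq(_)), schema)
df2.show()
```

If the lists can differ in length, this sketch does not apply as written: rows whose length does not match the schema are likely to fail when the DataFrame is evaluated, so they would have to be padded or filtered first.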
Unless you provide a schema, you will not be able to do much beyond getting a single array column:
val df3 = sc.parallelize(Lz,2).toDF
// df3: org.apache.spark.sql.DataFrame = [value: array<int>]
df3.show
// +------------+
// | value|
// +------------+
// |[1, 2, 3, 4]|
// |[1, 2, 3, 4]|
// +------------+
df3.printSchema
//root
// |-- value: array (nullable = true)
// | |-- element: integer (containsNull = false)
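One possible way to go from the array column to separate columns, when the length is only known at runtime, is to read the length from the first row and select each array position as its own column. This is a sketch under the assumption that all arrays have the same length; the names `n` and `wide` and the locally created `SparkSession` are illustrative and not part of the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: a local session for illustration; in spark-shell, `spark` already exists.
val spark = SparkSession.builder().master("local[2]").appName("array-to-columns").getOrCreate()
import spark.implicits._

// Same shape as df3 above: a single array<int> column named "value".
val df3 = spark.sparkContext
  .parallelize(Seq(List(1, 2, 3, 4), List(1, 2, 3, 4)), 2)
  .toDF("value")

// Assumption: every array has the same length as the one in the first row.
val n = df3.first().getSeq[Int](0).length

// Select position i of the array as column _(i+1).
val wide = df3.select((0 until n).map(i => col("value")(i).as(s"_${i + 1}")): _*)
wide.show()
```

With default (non-ANSI) settings, indexing past the end of a shorter array yields null rather than an error, so rows with fewer elements would show nulls in the trailing columns.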