How do I flatMap a row of arrays into multiple rows?
Problem description
After parsing some JSON files, I have a one-column DataFrame of arrays:
scala> val jj =sqlContext.jsonFile("/home/aahu/jj2.json")
res68: org.apache.spark.sql.DataFrame = [r: array<bigint>]
scala> jj.first()
res69: org.apache.spark.sql.Row = [List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]
I'd like to explode each row out into several rows. How?
The original JSON file:
{"r": [0,1,2,3,4,5,6,7,8,9]}
{"r": [0,1,2,3,4,5,6,7,8,9]}
I want an RDD or a DataFrame with 20 rows.
I can't simply use flatMap here, and I'm not sure what the appropriate command in Spark is:
scala> jj.flatMap(r => r)
<console>:22: error: type mismatch;
found : org.apache.spark.sql.Row
required: TraversableOnce[?]
jj.flatMap(r => r)
Recommended answer
You can use DataFrame.explode to achieve what you want. Below is what I tried in spark-shell with your sample JSON data.
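import scala.collection.mutable.ArrayBuffer
// Emit one output row per element of the array column "r", in a new column "r1"
val jj1 = jj.explode("r", "r1") {list : ArrayBuffer[Long] => list.toList }
// Keep only the exploded column
val jj2 = jj1.select($"r1")
// Bring the rows back to the driver to inspect them
jj2.collect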
You can refer to the API documentation to learn more about DataFrame.explode.
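For readers on newer Spark versions, here is a minimal sketch of the same idea. DataFrame.explode has been deprecated since Spark 2.0 in favor of select() with functions.explode, or flatMap. The sketch assumes Spark 2.x or later with a SparkSession, and it builds the two sample records in memory rather than reading /home/aahu/jj2.json. The names exploded and flat are made up for illustration and do not come from the original answer.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.explode

// Assumption: Spark 2.x+; in spark-shell this simply returns the existing session.
val spark = SparkSession.builder()
  .master("local[1]")
  .appName("explode-sketch")
  .getOrCreate()
import spark.implicits._

// In-memory stand-in for the two JSON records from the question (column "r": array<bigint>).
val jj = Seq(
  Seq(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L),
  Seq(0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L)
).toDF("r")

// Option 1: functions.explode produces one output row per array element.
val exploded = jj.select(explode($"r").as("r1"))

// Option 2: flatMap works once the array is pulled out of the Row,
// because a Seq is traversable while a Row is not.
val flat = jj.flatMap(row => row.getSeq[Long](0))

println(exploded.count()) // 20
println(flat.count())     // 20
```

The second option also shows why the flatMap attempt in the question fails to compile: the function passed to flatMap has to return a collection, and Row is not one, so the array has to be extracted first (here with Row.getSeq).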