How do I flatMap a row of arrays into multiple rows?


Problem description

After parsing some JSON, I have a one-column DataFrame of arrays:

scala> val jj = sqlContext.jsonFile("/home/aahu/jj2.json")
res68: org.apache.spark.sql.DataFrame = [r: array<bigint>]
scala> jj.first()
res69: org.apache.spark.sql.Row = [List(0, 1, 2, 3, 4, 5, 6, 7, 8, 9)]

I'd like to explode each row out into several rows. How?

Edit:

Original JSON file:

{"r": [0,1,2,3,4,5,6,7,8,9]}
{"r": [0,1,2,3,4,5,6,7,8,9]}

I want an RDD or a DataFrame with 20 rows.

I can't simply use flatMap here, and I'm not sure what the appropriate command in Spark is:

scala> jj.flatMap(r => r)
<console>:22: error: type mismatch;
 found   : org.apache.spark.sql.Row
 required: TraversableOnce[?]
              jj.flatMap(r => r)
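The error occurs because each element being flatMapped is a Row, not a collection, so there is nothing to iterate over until the array is pulled out of the row. The minimal sketch below shows the intended semantics using plain Scala collections only. The assumption is that each row is modeled as just its r array (no Spark types are involved). The Spark line in the comment is a hedged illustration that assumes your Spark version provides Row.getSeq.

```scala
// Plain-Scala model of the two JSON records: each "row" is reduced to its
// array column r. This stands in for Spark Rows; it is not Spark code.
val rows: Seq[Seq[Long]] = Seq(
  (0L to 9L).toList,
  (0L to 9L).toList
)

// flatMap needs a function that returns a collection. Here each element
// already is a collection, so returning it unchanged flattens the data.
val flattened: Seq[Long] = rows.flatMap(r => r)

// In Spark each element is a Row, so the array must be extracted first,
// for example (assuming Row.getSeq exists in your Spark version):
//   jj.rdd.flatMap(row => row.getSeq[Long](0))

println(flattened.length)
```

With the two sample records this yields 20 values, matching the 20 rows the question asks for.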

Solution

You can use DataFrame.explode to do this. Below is what I tried in spark-shell with your sample JSON data:

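// Provides the $"..." column syntax (spark-shell may already import this).
import sqlContext.implicits._
// Expand array column r into a new column r1, one output row per element.
// Seq[Long] also matches the ArrayBuffer that Spark returns for JSON arrays.
val jj1 = jj.explode("r", "r1") { list: Seq[Long] => list }
// Keep only the exploded values.
val jj2 = jj1.select($"r1")
jj2.collect()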

Refer to the API documentation for more on DataFrame.explode.
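DataFrame.explode behaves like a flatMap that also carries every input column along. Each output row joins the original columns with one value produced from the array, which is why the answer then selects only r1. The plain-Scala sketch below illustrates that behavior. The Record and Exploded case classes, the extra id field, and the explodeR helper are all hypothetical and not part of the question's data or of Spark.

```scala
// Hypothetical input record: an id column plus the array column r.
final case class Record(id: Int, r: Seq[Long])

// Hypothetical output record: the original columns plus the new column r1,
// which holds a single element of r.
final case class Exploded(id: Int, r: Seq[Long], r1: Long)

// One output row per array element; a row with an empty array yields none.
def explodeR(rows: Seq[Record]): Seq[Exploded] =
  rows.flatMap(rec => rec.r.map(x => Exploded(rec.id, rec.r, x)))

val input = Seq(
  Record(1, (0L to 9L).toList),
  Record(2, (0L to 9L).toList)
)

val output = explodeR(input)

// Analogue of jj1.select($"r1"): keep only the new column.
val r1Only: Seq[Long] = output.map(_.r1)

println(r1Only.length)
```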
