Spark flattening out dataframes
Question
Getting started with Spark, I would like to know how to flatMap or explode a DataFrame.
It was created using df.groupBy("columName").count and has the following structure if I collect it:
[[Key1, count], [Key2, count2]]
But I would rather like to have something like
Map(bar -> 1, foo -> 1, awesome -> 1)
What is the right tool to achieve something like this? flatMap, explode, or something else?
Context: I want to use spark-jobserver. It only seems to provide meaningful results (e.g. a working JSON serialization) if I supply the data in the latter form.
Answer
I'm assuming you're calling collect or collectAsList on the DataFrame? That would return an Array[Row] / List[Row].
If so, the easiest way to transform these into a map is to use the underlying RDD, map its records into key-value tuples, and call collectAsMap:
val counted = df.groupBy("columName").count()
// replace "keyColumn" and "valueColumn" with your actual column names
// (after the groupBy/count above, they would be "columName" and "count")
val result = counted.rdd
  .map(r => (r.getAs[String]("keyColumn"), r.getAs[Long]("valueColumn")))
  .collectAsMap()
result is of type Map[String, Long], as expected.
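Outside Spark, the reshaping this answer performs is just turning a sequence of (key, count) rows into a map. A minimal, Spark-free Python sketch of that step (the sample keys here are hypothetical, mirroring the Map(bar -> 1, ...) example above):

```python
# Collected groupBy(...).count() output looks like a list of (key, count) rows
rows = [("bar", 1), ("foo", 1), ("awesome", 1)]

# Equivalent of rdd.map(r => (key, value)).collectAsMap():
# pack each (key, count) pair into a dictionary entry
result = {key: count for key, count in rows}
print(result)  # {'bar': 1, 'foo': 1, 'awesome': 1}
```

This is why dropping to the RDD works: once each Row is reduced to a key-value pair, collectAsMap does exactly this dictionary packing on the driver.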