Spark flattening out dataframes
Question
Getting started with Spark, I would like to know how to flatMap or explode a DataFrame.
It was created using df.groupBy("columName").count and has the following structure if I collect it:
[[Key1, count], [Key2, count2]]
But I would rather like to have something like
Map(bar -> 1, foo -> 1, awesome -> 1)
What is the right tool to achieve something like this? flatMap, explode, or something else?
Context: I want to use spark-jobserver. It only seems to provide meaningful results (e.g. a working JSON serialization) if I supply the data in the latter form.
Answer
I'm assuming you're calling collect or collectAsList on the DataFrame? That would return an Array[Row] / List[Row].
If so, the easiest way to transform these into a map is to use the underlying RDD, map its records into key-value tuples, and call collectAsMap:
val counted = df.groupBy("columName").count()
// replace "keyColumn" and "valueColumn" with your actual column names
// (after the groupBy/count above, they would be "columName" and "count")
val result = counted.rdd
  .map(r => (r.getAs[String]("keyColumn"), r.getAs[Long]("valueColumn")))
  .collectAsMap()
result is of type Map[String, Long], as expected.
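Outside Spark, the reshaping this answer performs is just turning a sequence of (key, count) rows into a map. A minimal, Spark-free Python sketch of that step (the sample keys here are hypothetical, mirroring the Map(bar -> 1, ...) example above):

```python
# Collected groupBy(...).count() output looks like a list of (key, count) rows
rows = [("bar", 1), ("foo", 1), ("awesome", 1)]

# Equivalent of rdd.map(r => (key, value)).collectAsMap():
# pack each (key, count) pair into a dictionary entry
result = {key: count for key, count in rows}
print(result)  # {'bar': 1, 'foo': 1, 'awesome': 1}
```

This is why dropping to the RDD works: once each Row is reduced to a key-value pair, collectAsMap does exactly this dictionary packing on the driver.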