Aggregate rows of Spark DataFrame to String after groupby

Views: 525 | Published: 2020/9/4 6:20:24 | Tags: scala apache-spark dataframe

Question

I'm quite new to both Spark and Scala and could really use a hint to solve my problem. I have two DataFrames, A (columns id and name) and B (columns id and text), and I would like to join them, group by id, and combine all rows of text into a single String:

+--------+--------+
|      id|    name|
+--------+--------+
|       0|       A|
|       1|       B|
+--------+--------+

+--------+--------+
|      id|    text|
+--------+--------+
|       0|     one|
|       0|     two|
|       1|   three|
|       1|    four|
+--------+--------+

Desired result:

+--------+--------+----------+
|      id|    name|     texts|
+--------+--------+----------+
|       0|       A|   one two|
|       1|       B|three four|
+--------+--------+----------+

So far I'm trying the following:

import org.apache.spark.sql.functions.collect_list

val C = A.join(B, "id")
val D = C.groupBy("id", "name").agg(collect_list("text") as "texts")

This works quite well, except that my texts column is an Array of Strings instead of a single String. I would really appreciate some help.
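To reproduce the behaviour described in the question, here is a minimal sketch that builds the two sample tables as DataFrames and runs the same join and collect_list aggregation; printSchema then reports texts as an array of strings rather than a string. The object name ReproduceQuestion, the local master and the app name are assumptions for illustration only, not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.collect_list

// Hypothetical helper object, only for reproducing the question's setup
object ReproduceQuestion {
  // Builds the sample tables A and B and runs the asker's join + collect_list
  def build(spark: SparkSession): DataFrame = {
    import spark.implicits._
    val a = Seq((0, "A"), (1, "B")).toDF("id", "name")
    val b = Seq((0, "one"), (0, "two"), (1, "three"), (1, "four")).toDF("id", "text")

    val c = a.join(b, "id")
    // collect_list gathers each group's values into an array<string> column, not a String
    c.groupBy("id", "name").agg(collect_list("text").as("texts"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, just for experimenting with the example
    val spark = SparkSession.builder().master("local[*]").appName("reproduce-question").getOrCreate()
    val d = build(spark)
    d.printSchema() // texts is reported as an array of strings
    d.show()
    spark.stop()
  }
}
```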

Recommended answer

I am just adding a few minor functions to your code to give the right solution:

import org.apache.spark.sql.functions.{collect_list, concat_ws}

A.join(B, Seq("id"), "left").orderBy("id").groupBy("id", "name").agg(concat_ws(" ", collect_list("text")) as "texts")
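A minimal self-contained sketch of the answer, assuming Spark 2.x or later running in local mode; the object name AggregateTexts and the app name are hypothetical. It uses the same left join, collect_list and concat_ws as the answer, but calls orderBy after the aggregation instead of before it. The Spark documentation describes collect_list as non-deterministic because the order of collected values depends on row order, which may change after a shuffle, so sorting by id before groupBy does not fix the order of words inside texts.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{collect_list, concat_ws}

// Hypothetical helper object wrapping the answer's aggregation
object AggregateTexts {
  // Joins A (id, name) with B (id, text) and concatenates all texts of each (id, name)
  // group into one space-separated String. The left join keeps ids without any text:
  // collect_list skips the null text, so they end up with an empty string.
  def aggregateTexts(a: DataFrame, b: DataFrame): DataFrame =
    a.join(b, Seq("id"), "left")
      .groupBy("id", "name")
      .agg(concat_ws(" ", collect_list("text")).as("texts"))
      .orderBy("id") // sorts the result rows; it does not order the words inside texts

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, just for trying the example out
    val spark = SparkSession.builder().master("local[*]").appName("aggregate-texts").getOrCreate()
    import spark.implicits._

    val a = Seq((0, "A"), (1, "B")).toDF("id", "name")
    val b = Seq((0, "one"), (0, "two"), (1, "three"), (1, "four")).toDF("id", "text")

    aggregateTexts(a, b).show()
    spark.stop()
  }
}
```

With the sample data each id gets its words joined by a single space, though the word order within a group is not guaranteed. If a fixed order matters, one option is to wrap the array in sort_array before concat_ws; note that this sorts alphabetically, so id 1 would become "four three" rather than "three four".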
