How to get all columns after groupBy on Dataset<Row> in Spark SQL 2.1.0

Views: 579 Published: 2020/9/4 2:56:42 Tags: apache-spark apache-spark-sql


Question

First, I am very new to Spark.

I have millions of records in my Dataset. I want to group by the name column and find the maximum age for each name. I am getting correct results, but I need all columns in my result set.

Dataset<Row> resultset = studentDataSet.select("*").groupBy("name").max("age");
resultset.show(1000,false);

I am getting only name and max(age) in my resultset Dataset.

Recommended answer

You need a different approach here. You were almost there, but let me help you understand what is happening.

Dataset<Row> resultset = studentDataSet.groupBy("name").max("age");

Now you can join resultset back to studentDataSet on the name column:

Dataset<Row> joinedDS = studentDataSet.join(resultset, "name");
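
The join above attaches max(age) to every row of studentDataSet, so each name still appears once per original record. If only the rows holding the maximum age are wanted, one possible follow-up is to filter on age = max(age). The sketch below is an illustration under stated assumptions, not tested against the asker's data: the sample rows, the city column, and the class and method names are hypothetical, and it runs Spark in local mode.

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import static org.apache.spark.sql.functions.col;

public class MaxAgeJoinSketch {

    // Tiny stand-in for studentDataSet; the rows and the "city" column are hypothetical.
    static Dataset<Row> sampleStudents(SparkSession spark) {
        StructType schema = new StructType()
                .add("name", DataTypes.StringType)
                .add("age", DataTypes.IntegerType)
                .add("city", DataTypes.StringType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("Ann", 20, "Pune"),
                RowFactory.create("Ann", 25, "Delhi"),
                RowFactory.create("Bob", 30, "Goa"));
        return spark.createDataFrame(rows, schema);
    }

    // Joins every row with its group's max(age), then keeps only the rows
    // whose age equals that maximum and drops the helper column.
    static Dataset<Row> rowsWithMaxAge(Dataset<Row> studentDataSet) {
        Dataset<Row> resultset = studentDataSet.groupBy("name").max("age");
        return studentDataSet
                .join(resultset, "name")
                .filter(col("age").equalTo(col("max(age)")))
                .drop("max(age)");
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("MaxAgeJoinSketch")
                .getOrCreate();
        rowsWithMaxAge(sampleStudents(spark)).show(false);
        spark.stop();
    }
}
```

If several rows of the same name share the maximum age, this filter keeps all of them; whether that is acceptable depends on the data.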

The catch with groupBy is that it returns a RelationalGroupedDataset, so what you get back depends on the aggregation you apply next (sum, min, mean, max, etc.). The result of that aggregation contains only the grouping column(s) together with the aggregated values.

In your case the name column is combined with the max of age, so only two columns are returned. Likewise, if you group by age and then apply max on the age column, you get two columns: the first is age and the second is max(age).
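
To see this directly, the column names of the aggregated result can be inspected. The following is a small sketch, assuming a local Spark session and hypothetical sample data (the city column and the class and method names are made up for illustration):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class GroupByColumnsSketch {

    // Hypothetical three-column input standing in for studentDataSet.
    static Dataset<Row> sampleStudents(SparkSession spark) {
        StructType schema = new StructType()
                .add("name", DataTypes.StringType)
                .add("age", DataTypes.IntegerType)
                .add("city", DataTypes.StringType);
        List<Row> rows = Arrays.asList(
                RowFactory.create("Ann", 20, "Pune"),
                RowFactory.create("Bob", 30, "Goa"));
        return spark.createDataFrame(rows, schema);
    }

    // Returns the column names left after grouping by groupCol and taking max(age).
    static String[] groupedColumns(Dataset<Row> students, String groupCol) {
        return students.groupBy(groupCol).max("age").columns();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("GroupByColumnsSketch")
                .getOrCreate();
        Dataset<Row> students = sampleStudents(spark);
        // Grouping by name: only name and max(age) survive; city is gone.
        System.out.println(Arrays.toString(groupedColumns(students, "name")));
        // Grouping by age: the columns are age and max(age).
        System.out.println(Arrays.toString(groupedColumns(students, "age")));
        spark.stop();
    }
}
```

Any column that is neither a grouping column nor part of the aggregation (city here) does not appear in the output, which is why a join back to the original Dataset is needed to recover it.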

Note: the code is not tested, so please make changes if needed. I hope this answers your question.
