Can Dataframe joins in Spark preserve order?

Question
I'm currently trying to join two DataFrames together while retaining the original row order of one of them.
From Which operations preserve RDD order?, it seems (correct me if this is inaccurate, as I'm new to Spark) that joins do not preserve order: because the data lives in different partitions, rows "arrive" at the final dataframe in no specified order.
How could one perform a join of two DataFrames while preserving the order of one table?
For example,

+------+------+
| col1 | col2 |
+------+------+
| 0    | a    |
| 1    | b    |
+------+------+
joined with

+------+------+
| col2 | col3 |
+------+------+
| b    | x    |
| a    | y    |
+------+------+
on col2 should give

+------+------+------+
| col1 | col2 | col3 |
+------+------+------+
| 0    | a    | y    |
| 1    | b    | x    |
+------+------+------+
I've heard some things about using coalesce or repartition, but I'm not sure. Any suggestions/methods/insights are appreciated.
Edit: would this be analogous to having one reducer in MapReduce? If so, what would that look like in Spark?
Answer

It can't. You can add monotonically_increasing_id and reorder the data after the join.