How to add an integer unique ID to query results - efficiently?

Posted: 2021/11/14 22:11:49 | Tags: hadoop apache-spark hive apache-spark-sql hiveql

Given a query, select * from ... (which might be part of a CTAS statement)

The goal is to add an additional column, ID, where ID is a unique integer.

select ... as ID,* from ...

P.S.

The ID does not have to be sequential (there could be gaps).
The ID could be arbitrary (it doesn't have to represent a specific order within the result set).
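Because gaps and arbitrary order are allowed, uniqueness can in principle be obtained without any global coordination. Each partition numbers only its own rows and combines that number with its partition index. The sketch below is plain Python (standard library only), not Hive or Spark code. It models the partitions as a list of lists, and `make_id`, `assign_ids` and the constants are hypothetical names. The bit layout follows the one Spark's documentation describes for `monotonically_increasing_id()`: partition ID in the upper bits, record number within the partition in the lower 33 bits. The partition limit of 2^30 is an assumption chosen here so the packed value fits in a signed 64-bit BIGINT.

```python
from typing import List, Tuple

# Layout modeled on Spark's documented monotonically_increasing_id():
# partition index in the upper bits, row position inside the partition in the lower 33 bits.
ROW_BITS = 33
ROW_LIMIT = 1 << ROW_BITS        # rows per partition that fit in the low bits
PARTITION_LIMIT = 1 << 30        # keeps the packed value within a signed 64-bit BIGINT


def make_id(partition_index: int, row_index: int) -> int:
    """Pack (partition, row-within-partition) into one integer; distinct pairs give distinct IDs."""
    if not 0 <= partition_index < PARTITION_LIMIT:
        raise ValueError("partition_index out of range")
    if not 0 <= row_index < ROW_LIMIT:
        raise ValueError("row_index out of range")
    return (partition_index << ROW_BITS) | row_index


def assign_ids(partitions: List[List[str]]) -> List[Tuple[int, str]]:
    """Number rows partition by partition; no partition looks at any other partition's rows."""
    result: List[Tuple[int, str]] = []
    for p, rows in enumerate(partitions):  # on a cluster, each partition would be its own task
        for i, row in enumerate(rows):
            result.append((make_id(p, i), row))
    return result


print(assign_ids([["a", "b"], ["c"], ["d", "e", "f"]]))
# -> [(0, 'a'), (1, 'b'), (8589934592, 'c'), (17179869184, 'd'), (17179869185, 'e'), (17179869186, 'f')]
```

Every partition can compute its IDs independently, so no single reducer or task is involved. The price is large gaps between partitions, which the P.S. explicitly permits.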

row_number logically solves the problem:

select row_number() over () as ID,* from ...

The problem is that, at least for now, a global row_number (no partition by) is implemented using a single reducer (Hive) / a single task (Spark), so every row is funneled through one worker.
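Even if consecutive numbers were wanted, the single worker can be avoided with two passes. The first pass collects only the row count of each partition. The second pass has every partition number its own rows, starting at an offset computed from the counts of the partitions before it. Spark's RDD `zipWithIndex()` is documented to work along these lines: it triggers an extra job to compute partition sizes when there is more than one partition. The sketch below is a plain-Python model (standard library only, Python 3.8+ for `accumulate(..., initial=...)`), not Hive or Spark code. It models partitions as a list of lists, and `partition_offsets` and `row_numbers` are hypothetical names.

```python
from itertools import accumulate
from typing import List, Tuple


def partition_offsets(partition_sizes: List[int]) -> List[int]:
    """Exclusive prefix sum: how many rows come before each partition."""
    return list(accumulate(partition_sizes, initial=0))[:-1]


def row_numbers(partitions: List[List[str]]) -> List[Tuple[int, str]]:
    # Pass 1: each partition reports only its row count (small, cheap to gather).
    sizes = [len(rows) for rows in partitions]
    # Gather step: turn the counts into a starting offset per partition.
    offsets = partition_offsets(sizes)
    # Pass 2: each partition numbers its own rows from its offset, 1-based like row_number().
    result: List[Tuple[int, str]] = []
    for offset, rows in zip(offsets, partitions):
        for n, row in enumerate(rows, start=offset + 1):
            result.append((n, row))
    return result


print(row_numbers([["a", "b"], ["c"], ["d", "e", "f"]]))
# -> [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f')]
```

Only the per-partition counts are gathered in one place, not the rows themselves. The cost is an extra pass over the data.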

如何在查询结果中添加整数唯一 ID - __有效地__? [英] How to add an integer unique id to query results - __efficiently__?