How to add an integer unique id to query results - __efficiently__?

Problem description

Given a query, select * from ... (which might be part of a CTAS statement)

The goal is to add an additional column, ID, where ID is a unique integer.

select ... as ID,* from ...

P.S.

  • The ID does not have to be sequential (there could be gaps)
  • The ID could be arbitrary (it doesn't have to represent a specific order within the result set)

row_number logically solves the problem:

select row_number() over () as ID,* from ...

The problem is that, at least for now, a global row_number (no partition by) is implemented using a single reducer (Hive) / task (Spark), so all rows pass through one worker.

Recommended answer

If you are using Spark SQL, your best bet would be to use the built-in function

monotonically_increasing_id

which generates a unique 64-bit integer ID in a separate column. The IDs are monotonically increasing rather than random, and they are not consecutive. Since you said you don't need them to be sequential, this should suffice for your requirement.
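
Why this avoids the single-reducer problem can be sketched in plain Python. According to the Spark documentation, the current implementation puts the partition ID in the upper 31 bits and the record number within each partition in the lower 33 bits, so every task can number its own rows without coordinating with the others. The function name assign_ids, the constant ROW_BITS and the sample partitions below are made up for illustration; this is a simulation of the idea, not Spark's code.

```python
# Sketch only: mimics the ID layout documented for Spark's
# monotonically_increasing_id (partition index in the upper bits,
# per-partition row counter in the lower 33 bits). Not Spark code.

ROW_BITS = 33  # bits reserved for the row counter within a partition


def assign_ids(partitions):
    """Return (id, row) pairs; each partition is numbered independently."""
    result = []
    for partition_index, rows in enumerate(partitions):
        base = partition_index << ROW_BITS  # first ID owned by this partition
        for offset, row in enumerate(rows):
            result.append((base + offset, row))
    return result


# Hypothetical data: three partitions of a query result.
partitions = [["a", "b"], ["c"], ["d", "e", "f"]]
ids = [row_id for row_id, _ in assign_ids(partitions)]
print(ids)
```

In Spark SQL itself the call would be along the lines of select monotonically_increasing_id() as ID,* from ... The resulting IDs have large gaps between partitions, which the question explicitly allows.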
