Amazon Redshift - lateral column alias reference


Problem description

Based on the announcement "Amazon Redshift announces support for lateral column alias reference":

The support for lateral column alias reference enables you to write queries without repeating the same expressions in the SELECT list. For example, you can define the alias 'probability' and use it within the same select statement:

select clicks / impressions as probability, 
        round(100 * probability, 1) as percentage from raw_data;

Which is basically the same as:

select 1 AS col
      ,col + 1 AS col2;

db<>fiddle demo

Most SQL RDBMSes will return an error: Unknown column 'col' in 'field list'


It looks like an interesting language extension, but there is a caveat. What if I use a non-deterministic function?

select RAND() AS col
      ,col + 1 AS col2

-- if RAND() returns 0.5 then I would expect
-- 0.5 and 1.5

-- I get: 0.3 and 1.7
-- it means that the query was evaluated as:
select RAND() AS col,
       RAND() + 1 AS col2

Compare this with LATERAL JOIN in PostgreSQL (yes, I am aware this is a different feature, but I would expect a "lateral column alias" to behave the same way):

SELECT s.col, s.col+1 AS col2
FROM t ,LATERAL (SELECT RANDOM()) AS s(col)  
-- 0.19089933477628307  1.190899334776283

db<>fiddle demo

But that is not the case. I am getting two independent evaluations, which is consistent with simple "inlining" as described in the documentation:

SELECT List

The alias is recognized right after it is defined in the target list. You can use an alias in other expressions defined after it in the same target list. The following example illustrates this.

The benefit of the lateral alias reference is you don't need to repeat the aliased expression when building more complex expressions in the same target list. When Amazon Redshift parses this type of reference, it just inlines the previously defined aliases. If there is a column with the same name defined in the FROM clause as the previously aliased expression, the column in the FROM clause takes priority.

Is my understanding correct that this functionality is not "safe" when the aliased expression uses non-deterministic or time-sensitive functions, references, or subqueries?

Solution

This syntax is not safe. In fact, merely inlining the code means that it does not even provide a performance advantage. It is only syntactic sugar.

Given that there are easy alternatives -- CTEs and subqueries -- I would just avoid this new "feature".
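As a rough sketch of those alternatives, the non-deterministic expression can be computed once in a derived table or CTE and then referenced by name from the outer SELECT. RANDOM() is borrowed from the PostgreSQL example in the question, and the alias s is only illustrative. PostgreSQL does not flatten a subquery whose select list contains a volatile function; I am assuming Redshift behaves the same way, which is worth verifying on your own cluster.

```sql
-- Sketch only: evaluate the non-deterministic expression once in a subquery,
-- then reference the resulting column from the outer SELECT.
select s.col, s.col + 1 as col2
from (select random() as col) s;

-- The same idea written as a CTE.
with s as (
    select random() as col
)
select s.col, s.col + 1 as col2
from s;
```

If the engine evaluates the inner query once per row, col2 is always exactly col + 1, which is the behaviour the question expected from the lateral alias.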

If there were a setting to turn this off, I would recommend using it.

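Incidentally, many newcomers to SQL find it quite disconcerting that standard SQL does not let you reference a column alias elsewhere in the same SELECT list. The purpose of that restriction is to avoid ambiguity. What should the following query return?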

select (a + 1) as b, b 
from (select 1 as a, 0 as b) x;

The designers of SQL probably felt that the rules around resolving such situations are more complex than merely rewriting a subquery.

The one "database" that I know of that resolves this well is actually SAS proc SQL. It introduced the calculated keyword, so you can write:

select (a + 1) as b, calculated b, b
from (select 1 as a, 0 as b) x;

And this would return 2, 2, 0.
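For comparison, here is a sketch of how the same three values can be produced in plain SQL without any alias lookup rules, by computing the new value under a distinct name in a subquery. The names b_calc, calculated_b, original_b and y are hypothetical and chosen only to keep the two meanings of b apart.

```sql
-- The inner query computes a + 1 under a separate name (b_calc),
-- so the outer query can refer to both values unambiguously.
select y.b_calc as b,
       y.b_calc as calculated_b,
       y.b      as original_b
from (
    select x.a + 1 as b_calc, x.b
    from (select 1 as a, 0 as b) x
) y;
-- returns 2, 2, 0
```

The references are qualified with y so that the result does not depend on whether an engine prefers the alias or the FROM-clause column when both are named b.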

In other words, I don't think Amazon put much thought into the implementation of this "feature".
