Amazon Redshift - lateral column alias reference


Problem description

Based on the announcement "Amazon Redshift announces support for lateral column alias reference":

The support for lateral column alias reference enables you to write queries without repeating the same expressions in the SELECT list. For example, you can define the alias 'probability' and use it within the same select statement:

select clicks / impressions as probability, 
        round(100 * probability, 1) as percentage from raw_data;

Which is basically the same as:

select 1 AS col
      ,col + 1 AS col2;

db<>fiddle demo

Most SQL RDBMSes will return an error: Unknown column 'col' in 'field list'


It looks like an interesting language extension, but there is a caveat. What if I use a nondeterministic function?

select RAND() AS col
      ,col + 1 AS col2

-- if RAND() returns 0.5 then I would expect
-- 0.5 and 1.5

-- I get: 0.3 and 1.7
-- it means that the query was evaluated as:
select RAND() AS col,
       RAND() + 1 AS col2

Compare this with LATERAL JOIN in PostgreSQL (yes, I am aware this is a different feature, but I would expect a "lateral column alias" to behave the same way):

SELECT s.col, s.col+1 AS col2
FROM t ,LATERAL (SELECT RANDOM()) AS s(col)  
-- 0.19089933477628307  1.190899334776283

db<>fiddle demo

But that is not the case. I get two independent evaluations, which is consistent with simple "inlining", as described in the documentation for the SELECT list:

The alias is recognized right after it is defined in the target list. You can use an alias in other expressions defined after it in the same target list. The following example illustrates this.

The benefit of the lateral alias reference is you don't need to repeat the aliased expression when building more complex expressions in the same target list. When Amazon Redshift parses this type of reference, it just inlines the previously defined aliases. If there is a column with the same name defined in the FROM clause as the previously aliased expression, the column in the FROM clause takes priority.
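
To make the quoted rules concrete, here is a minimal sketch of what the described inlining would amount to for the announcement's example. The second table, raw_data_v2, and its probability column are hypothetical names introduced only to illustrate the FROM-clause priority rule; I have not verified the exact rewrite Redshift performs internally.

```sql
-- Query as written, using a lateral alias reference:
select clicks / impressions as probability,
       round(100 * probability, 1) as percentage
from raw_data;

-- What "inlining the previously defined alias" would amount to:
-- the aliased expression is substituted wherever the alias is referenced.
select clicks / impressions as probability,
       round(100 * (clicks / impressions), 1) as percentage
from raw_data;

-- Assumption: raw_data_v2 is a hypothetical table that already has a real
-- column named probability. Per the quoted rule, the reference inside round()
-- would then resolve to raw_data_v2.probability, not to the alias.
select clicks / impressions as probability,
       round(100 * probability, 1) as percentage
from raw_data_v2;
```

If the rewrite really is a plain substitution, a nondeterministic expression such as RANDOM() is written out once per reference, which matches the two different values observed above.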

Is my understanding correct that this functionality is not "safe" when we use nondeterministic or time-sensitive functions, references, or subqueries?

Solution

This syntax is not safe. In fact, merely inlining the code means that it does not even provide a performance advantage. It is only syntactic sugar.

Given that there are easy alternatives -- CTEs and subqueries -- I would just avoid this new "feature".
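
As a sketch of those alternatives, the nondeterministic expression can be written exactly once in a subquery or CTE and then referenced by name in the outer query. This assumes the optimizer does not flatten the inner query and re-evaluate the function per reference; PostgreSQL avoids that for volatile functions, but I have not verified the behavior on Redshift, so treat it as something to test.

```sql
-- Subquery form: RANDOM() appears once, in the inner query.
select s.col,
       s.col + 1 as col2
from (select random() as col) as s;

-- CTE form: same idea, with the expression defined up front.
with s as (
    select random() as col
)
select s.col,
       s.col + 1 as col2
from s;
```

Either form also keeps the query portable to databases that reject alias references within the same SELECT list.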

If there were a setting to turn this off, I would recommend using it.

Incidentally, many newcomers to SQL find it quite disconcerting that standard SQL does not let you reuse an alias within the same SELECT list. The purpose of that restriction is to avoid ambiguity. What should the following query return?

select (a + 1) as b, b 
from (select 1 as a, 0 as b) x;

The designers of SQL probably felt that the rules for resolving such situations would be more complex than simply rewriting the query with a subquery.

The one "database" that I know of that resolves this well is actually SAS proc SQL. It introduced the calculated keyword, so you can write:

select (a + 1) as b, calculated b, b
from (select 1 as a, 0 as b) x;

And this would return 2, 2, 0.
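
For comparison, a sketch of how the same result can be had in standard SQL without a calculated keyword: compute the expression in an inner query under a distinct name, then refer to each value unambiguously. The names b_new, b_old and calculated_b are hypothetical, chosen only for this illustration.

```sql
-- The inner query y gives the computed value and the original column
-- different names, so the outer SELECT has nothing ambiguous to resolve.
select y.b_new as b,
       y.b_new as calculated_b,
       y.b_old
from (
    select (a + 1) as b_new,
           b       as b_old
    from (select 1 as a, 0 as b) x
) y;
-- returns 2, 2, 0
```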

In other words, I don't think Amazon put much thought into the implementation of this "feature".
