FIRST_VALUE() 中的意外行为与 IGNORE NULLS (Vertica) [英] Unexpected behavior in FIRST_VALUE() with IGNORE NULLS (Vertica)

查看:80
本文介绍了FIRST_VALUE() 中的意外行为与 IGNORE NULLS (Vertica)的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在 Vertica 的带有 IGNORE NULLS 参数的 FIRST_VALUE() 分析函数中看到了意外行为.它似乎不应该返回 NULL.

I'm seeing unexpected behavior in Vertica's FIRST_VALUE() analytic function with the IGNORE NULLS parameter. It appears to return NULL when it shouldn't.

问题发生在这个非常小的表中:

The issue occurs in this very tiny table:

drop table if exists temp;
create table temp (time_ timestamp(6), name varchar(10));
insert into temp (time_) values ('2016-03-18 20:32:16.144');
insert into temp (time_, name) values ('2016-03-18 20:52:09.062', 'abc');

这里是表格的内容(select * from temp):

Here are the contents of the table (select * from temp):

time_                   | name
------------------------+--------
2016-03-18 20:32:16.144 | <null>
2016-03-18 20:52:09.062 | abc

这是我正在运行的查询:

Here is the query I'm running:

select time_,
  first_value(name ignore nulls) over (order by time_) first_name
from temp;

以下是此查询返回的结果:

Here are the results this query returns:

time_                   | first_name
------------------------+------------
2016-03-18 20:32:16.144 | <null>
2016-03-18 20:52:09.062 | abc

以下是我期望(和希望)从这个查询中得到的结果:

Here are the results I would expect (and desire) from this query:

time_                   | first_name
------------------------+------------
2016-03-18 20:32:16.144 | abc
2016-03-18 20:52:09.062 | abc

上面的查询是否有一个非常基本的语法错误?此问题出现在 Vertica Community Edition 7.1.1 上.

Does the above query have a very fundamental syntax mistake? This issue occurs on Vertica Community Edition 7.1.1.

推荐答案

该功能按预期工作.
over (order by time_)over (order by time_range unbounded缨) 的捷径和当前行),这意味着每一行只能看到它前面的行,包括它自己.
第一行只能看到它自己,因此其范围内没有非 NULL 值.

The function works as expected.
over (order by time_) is a shortcut for over (order by time_ range unbounded preceding) which is a shortcut for over (order by time_ range between unbounded preceding and current row), which means every row sees only the rows that preceded it, including itself.
The first row sees only itself therefore there isn't a non NULL value in its scope.

如果你想要整个范围的第一个非 NULL 值,你必须指定整个范围:

If you want the first non NULL value of the whole scope, you have to specify the whole scope:

first_value(name ignore nulls) over 
    (order by time_ range between unbounded preceding and unbounded following) first_name

<小时>

不,这绝对不是错误.


No, this is definitly not a bug.

您可能一直在使用诸如 sum(x) over (order by y) 之类的语法来运行总计,而 RANGE UNBOUNDED PRECEDING 的默认窗口对您来说似乎很自然.
由于您没有为 FIRST_VALUE 函数定义显式窗口,因此您一直在使用相同的默认窗口.

You've probably have been using syntax like sum(x) over (order by y) for running totals and the default window of RANGE UNBOUNDED PRECEDING seemed very natural to you.
Since you had not define an explicit window for the FIRST_VALUE function, you have been using the same default window.

这是另一个测试用例:

ts val
-- ----
1  NULL
2  X
3  NULL
4  Y
5  NULL

您希望从以下函数中得到什么?

What would you expect to get from the following function?

last_value (val) order (by ts)

您希望从以下函数中得到什么?

What would you expect to get from the following function?

last_value (val ignore nulls) order (by ts)

这篇关于FIRST_VALUE() 中的意外行为与 IGNORE NULLS (Vertica)的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆