Why doesn't LAST_VALUE return the last value?

Problem description

I want to find the last value of y over an ordered partition using a query like this:

SELECT
  x,
  LAST_VALUE(y) OVER (PARTITION BY x ORDER BY y ASC)
FROM table

But LAST_VALUE returns lots of values that aren't the last value (in this case, the largest value) of y for a given partition. Why?

(In this case, MAX can be used instead of LAST_VALUE to find the largest value, but why doesn't LAST_VALUE return the max value too?)

Recommended answer

TL;DR: The query you want is:

SELECT
  x,
  LAST_VALUE(y) OVER (PARTITION BY x ORDER BY y ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM table

Possibly followed by GROUP BY to collapse duplicate output rows from the analytic function.

And of course, it's simpler to just use MAX over an unordered partition if that's all you need:

SELECT
  x,
  MAX(y) OVER (PARTITION BY x)
FROM table
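
To illustrate the point about collapsing duplicate output rows, here is a minimal sketch that runs the TL;DR query against an in-memory SQLite database through Python's sqlite3 module. SQLite (3.25 or newer, for window function support), the table name t, the alias last_y, and the sample rows are all assumptions made for this illustration; they are not part of the original answer, which targets BigQuery.

```python
import sqlite3

# Hypothetical sample data: two partitions (x = 1 and x = 2).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 10), (1, 30), (1, 20), (2, 5), (2, 7)],
)

# The analytic function alone yields one output row per input row.
windowed = conn.execute("""
    SELECT x,
           LAST_VALUE(y) OVER (PARTITION BY x ORDER BY y ASC
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_y
    FROM t
    ORDER BY x
""").fetchall()

# Wrapping it in a subquery and grouping collapses the duplicates.
collapsed = conn.execute("""
    SELECT x, last_y
    FROM (
        SELECT x,
               LAST_VALUE(y) OVER (PARTITION BY x ORDER BY y ASC
                 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS last_y
        FROM t
    )
    GROUP BY x, last_y
    ORDER BY x
""").fetchall()

print(windowed)   # five rows, one per input row
print(collapsed)  # one row per value of x
```

With this sample data, windowed repeats the pair (1, 30) three times and (2, 7) twice, while collapsed contains each pair once.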


Before answering this question, here's a little background on analytic functions (a.k.a. window functions). All of the below is standard SQL and not specific to BigQuery.

First, analytic functions are not aggregation functions. Whereas aggregation functions collapse multiple input rows into a single output row, analytic functions compute exactly one output row for every input row. So you need to make sure you're thinking about what the output is for every input row.
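
The difference in row counts can be seen in a small sketch. It assumes SQLite 3.25 or newer via Python's sqlite3 module, with a hypothetical table t and made-up rows; none of this comes from the original answer.

```python
import sqlite3

# Hypothetical sample data: five input rows in two groups.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER, y INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 10), (1, 30), (1, 20), (2, 5), (2, 7)],
)

# Aggregation: rows are collapsed, one output row per group.
aggregated = conn.execute(
    "SELECT x, MAX(y) FROM t GROUP BY x ORDER BY x"
).fetchall()

# Analytic function: exactly one output row per input row.
analytic = conn.execute(
    "SELECT x, y, MAX(y) OVER (PARTITION BY x) FROM t ORDER BY x, y"
).fetchall()

print(len(aggregated), aggregated)
print(len(analytic), analytic)
```

The aggregate query returns two rows for five input rows, whereas the analytic query returns five rows, each carrying the maximum of its own partition.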

Second, analytic functions operate over a "window" of rows that is a subset of the "partition" to which the row belongs. The partition for an input row is determined by the PARTITION BY clause, or you can omit it if you want the partition to be the entire set of input rows. The window is given by a frame clause such as ROWS BETWEEN ..., but if you don't specify it (and users usually don't), it defaults to either the entire partition (when no ordering is applied) or the set of rows in the partition from the first row to the current row (when an ORDER BY is present). In standard SQL that second default is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, so rows that tie with the current row on the ordering key are included too. Note that the window can differ for each input row in a partition!
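
A rough way to see the two default windows side by side is to compute the same function with and without ORDER BY. The sketch below assumes SQLite 3.25 or newer via Python's sqlite3 module; the table t, the column aliases, and the values are hypothetical and chosen without ties so that the per-row window is unambiguous.

```python
import sqlite3

# Hypothetical sample data: four distinct values, inserted out of order.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (y INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(4,), (2,), (1,), (3,)])

rows = conn.execute("""
    SELECT y,
           SUM(y) OVER ()                  AS total,        -- no ORDER BY: window is the whole partition
           SUM(y) OVER (ORDER BY y)        AS running,      -- ORDER BY: window ends at the current row
           LAST_VALUE(y) OVER (ORDER BY y) AS last_default  -- same default window as running
    FROM t
    ORDER BY y
""").fetchall()

for row in rows:
    print(row)
```

Here total is the same on every row, running grows row by row, and last_default simply echoes the current row's y, because the current row is the last row of its own window.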

Now, back to LAST_VALUE. Although the default window described above is reasonable in many cases (e.g., computing cumulative sums), it works spectacularly poorly with LAST_VALUE. The LAST_VALUE function returns the value of the last row in the window, and by default the last row in the window is the current row.

So to fix the problem, you need to explicitly specify that the window for LAST_VALUE is the entire partition, not just the rows up to the current row. You can do so as follows:

SELECT x, LAST_VALUE(y) OVER (PARTITION BY x ORDER BY y ASC
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM table

To test this out, here's an example:

SELECT
  x,
  FIRST_VALUE(x) OVER (ORDER BY x ASC) first_asc,
  FIRST_VALUE(x) OVER (ORDER BY x DESC) first_desc,
  LAST_VALUE(x) OVER (ORDER BY x ASC) last_asc,
  LAST_VALUE(x) OVER (ORDER BY x DESC) last_desc,
FROM
  (SELECT 4 as x),
  (SELECT 2 as x),
  (SELECT 1 as x),
  (SELECT 3 as x)

x,first_asc,first_desc,last_asc,last_desc
1,1,4,1,1
2,1,4,2,2
3,1,4,3,3
4,1,4,4,4

Note that LAST_VALUE returns 1, 2, 3, 4 instead of just 4 because the window changes for each input row.

Now let's specify a window that is the entire partition:

SELECT
  x,
  FIRST_VALUE(x) OVER (ORDER BY x ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) first_asc,
  FIRST_VALUE(x) OVER (ORDER BY x DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) first_desc,
  LAST_VALUE(x) OVER (ORDER BY x ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) last_asc,
  LAST_VALUE(x) OVER (ORDER BY x DESC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) last_desc,
FROM
  (SELECT 4 as x),
  (SELECT 2 as x),
  (SELECT 1 as x),
  (SELECT 3 as x)

x,first_asc,first_desc,last_asc,last_desc
1,1,4,4,1
2,1,4,4,1
3,1,4,4,1
4,1,4,4,1

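Now LAST_VALUE returns the same value for every row, as expected: 4 when ordering ascending (last_asc) and 1 when ordering descending (last_desc).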
