Select the last value of each column, with a single query
Question
Given the following data (blank means NULL):
ID | ColA | ColB | ColC
---+------+------+------
 1 |   15 |      |   20
 2 |   11 |    4 |
 3 |      |    3 |
How can I get the last non-NULL value of each column in a single query? For the given data, the result would be:
ColA | ColB | ColC
-----+------+------
  11 |    3 |   20
I have not found much. The function that seemed closest to what I describe was COALESCE, but it does not work as expected in my case.
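A likely reason COALESCE does not help here: it returns the first non-NULL argument within a single row, so it compares columns side by side instead of looking down one column across rows. A minimal sketch, assuming Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for the real server, a table named tbl as in the answer below, and row 1 read as ColA = 15, ColC = 20; make_db and coalesce_per_row are hypothetical helper names:

```python
import sqlite3


def make_db():
    """Build the sample table from the question (None stands for NULL)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, "
        "cola INTEGER, colb INTEGER, colc INTEGER)"
    )
    # Assumed reading of the question's table: row 1 has ColA = 15, ColC = 20.
    con.executemany(
        "INSERT INTO tbl VALUES (?, ?, ?, ?)",
        [(1, 15, None, 20), (2, 11, 4, None), (3, None, 3, None)],
    )
    return con


def coalesce_per_row(con):
    # COALESCE scans the arguments of ONE row and returns the first non-NULL,
    # so it yields one value per row, not the last value of each column.
    return con.execute(
        "SELECT id, COALESCE(cola, colb, colc) FROM tbl ORDER BY id"
    ).fetchall()


print(coalesce_per_row(make_db()))  # [(1, 15), (2, 11), (3, 3)]
```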
Answer
With plain SQL, it looks like you have to run a separate query per column. For a small table with only 3 columns, @Guffa's query should be fine.
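One way to express the per-column approach in a single statement is three scalar subqueries, each finding the newest row where its own column is not NULL. This is a sketch of the idea, not necessarily @Guffa's query; it runs on SQLite through Python's sqlite3 module with the same assumed sample table, and the helper names are hypothetical:

```python
import sqlite3


def make_db():
    """Sample table from the question (assumed: row 1 has ColA = 15, ColC = 20)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, "
        "cola INTEGER, colb INTEGER, colc INTEGER)"
    )
    con.executemany(
        "INSERT INTO tbl VALUES (?, ?, ?, ?)",
        [(1, 15, None, 20), (2, 11, 4, None), (3, None, 3, None)],
    )
    return con


# Each subquery independently looks up the newest non-NULL value of one column.
PER_COLUMN_SQL = """
SELECT (SELECT cola FROM tbl WHERE cola IS NOT NULL ORDER BY id DESC LIMIT 1) AS cola,
       (SELECT colb FROM tbl WHERE colb IS NOT NULL ORDER BY id DESC LIMIT 1) AS colb,
       (SELECT colc FROM tbl WHERE colc IS NOT NULL ORDER BY id DESC LIMIT 1) AS colc
"""


def last_values_per_column(con):
    return con.execute(PER_COLUMN_SQL).fetchone()


print(last_values_per_column(make_db()))  # (11, 3, 20)
```

With an index on each column plus id, every subquery can stop at the first matching row; without one, each subquery may scan the table separately.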
You can do the same in one query with three window functions. I am not sure whether this is faster than three individual subqueries:
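-- "cola IS NULL" is false for non-NULL values, and false sorts before true,
-- so each window starts with the newest non-NULL value of its column.
SELECT first_value(cola) OVER (ORDER BY cola IS NULL, id DESC) AS cola
      ,first_value(colb) OVER (ORDER BY colb IS NULL, id DESC) AS colb
      ,first_value(colc) OVER (ORDER BY colc IS NULL, id DESC) AS colc
FROM   tbl
LIMIT  1;  -- every row carries the same three values, so one row is enough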
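The window-function query can be checked against the sample data. A sketch assuming SQLite 3.25 or later (the first release with window functions), driven from Python's sqlite3 module as a stand-in for PostgreSQL; in SQLite, cola IS NULL evaluates to 0 or 1, so non-NULL rows also sort first. Helper names are hypothetical:

```python
import sqlite3


def make_db():
    """Sample table from the question (assumed: row 1 has ColA = 15, ColC = 20)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, "
        "cola INTEGER, colb INTEGER, colc INTEGER)"
    )
    con.executemany(
        "INSERT INTO tbl VALUES (?, ?, ?, ?)",
        [(1, 15, None, 20), (2, 11, 4, None), (3, None, 3, None)],
    )
    return con


# Same query as above: one window per column, non-NULL rows first, newest first.
FIRST_VALUE_SQL = """
SELECT first_value(cola) OVER (ORDER BY cola IS NULL, id DESC) AS cola,
       first_value(colb) OVER (ORDER BY colb IS NULL, id DESC) AS colb,
       first_value(colc) OVER (ORDER BY colc IS NULL, id DESC) AS colc
FROM tbl
LIMIT 1
"""


def last_values_first_value(con):
    return con.execute(FIRST_VALUE_SQL).fetchone()


print(last_values_first_value(make_db()))  # (11, 3, 20)
```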
count() as window function
You can also exploit the fact that count() does not count NULL values.
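WITH x AS (
   -- Running count of non-NULL values, newest row first. It is 1 on the newest
   -- non-NULL row and on the NULL rows that follow it, so exactly one
   -- non-NULL value per column survives the CASE.
   SELECT CASE WHEN count(cola) OVER w = 1 THEN cola ELSE NULL END AS cola
         ,CASE WHEN count(colb) OVER w = 1 THEN colb ELSE NULL END AS colb
         ,CASE WHEN count(colc) OVER w = 1 THEN colc ELSE NULL END AS colc
   FROM   tbl
   -- WHERE id > x -- safe to ignore a certain portion from a large table?
   WINDOW w AS (ORDER BY id DESC)
   )
-- max() ignores NULLs and picks that single surviving value
SELECT max(cola) AS cola, max(colb) AS colb, max(colc) AS colc
FROM   x;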
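To see why count() works, it helps to look at the running count itself. A sketch assuming SQLite 3.25 or later via Python's sqlite3 module, with the same assumed sample data: newest row first, the running count for cola is 0 before the first non-NULL value and is exactly 1 on that value (11 here), after which it only grows. Helper names are hypothetical:

```python
import sqlite3


def make_db():
    """Sample table from the question (assumed: row 1 has ColA = 15, ColC = 20)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, "
        "cola INTEGER, colb INTEGER, colc INTEGER)"
    )
    con.executemany(
        "INSERT INTO tbl VALUES (?, ?, ?, ?)",
        [(1, 15, None, 20), (2, 11, 4, None), (3, None, 3, None)],
    )
    return con


def running_count_cola(con):
    # count() skips NULLs, so this counts non-NULL cola values seen so far.
    return con.execute(
        "SELECT id, cola, count(cola) OVER (ORDER BY id DESC) "
        "FROM tbl ORDER BY id DESC"
    ).fetchall()


COUNT_SQL = """
WITH x AS (
    SELECT CASE WHEN count(cola) OVER w = 1 THEN cola END AS cola,
           CASE WHEN count(colb) OVER w = 1 THEN colb END AS colb,
           CASE WHEN count(colc) OVER w = 1 THEN colc END AS colc
    FROM tbl
    WINDOW w AS (ORDER BY id DESC)
)
SELECT max(cola) AS cola, max(colb) AS colb, max(colc) AS colc
FROM x
"""


def last_values_count(con):
    return con.execute(COUNT_SQL).fetchone()


con = make_db()
print(running_count_cola(con))  # [(3, None, 0), (2, 11, 1), (1, 15, 2)]
print(last_values_count(con))   # (11, 3, 20)
```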
For bigger tables and more columns, a recursive CTE or a procedural function will be considerably faster:
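WITH RECURSIVE x AS (
   -- number the rows newest first
   SELECT cola, colb, colc
         ,row_number() OVER (ORDER BY id DESC) AS rn
   FROM   tbl
   )
, y AS (
   -- start with the newest row ...
   SELECT rn, cola, colb, colc
   FROM   x
   WHERE  rn = 1

   UNION ALL
   -- ... then step to the next older row, keeping values already found
   SELECT x.rn
        , COALESCE(y.cola, x.cola)
        , COALESCE(y.colb, x.colb)
        , COALESCE(y.colc, x.colc)
   FROM   y
   JOIN   x ON x.rn = y.rn + 1
   WHERE  y.cola IS NULL OR y.colb IS NULL OR y.colc IS NULL  -- stop once all are filled
   )
SELECT cola, colb, colc
FROM   y
ORDER  BY rn DESC  -- the last step holds the complete result
LIMIT  1;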
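The recursive CTE can be traced on the sample data the same way. A sketch assuming SQLite 3.25 or later via Python's sqlite3 module: the anchor row is id 3 (NULL, 3, NULL), the first recursive step fills ColA from id 2, the second fills ColC from id 1, and the WHERE condition then ends the recursion. Helper names are hypothetical:

```python
import sqlite3


def make_db():
    """Sample table from the question (assumed: row 1 has ColA = 15, ColC = 20)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, "
        "cola INTEGER, colb INTEGER, colc INTEGER)"
    )
    con.executemany(
        "INSERT INTO tbl VALUES (?, ?, ?, ?)",
        [(1, 15, None, 20), (2, 11, 4, None), (3, None, 3, None)],
    )
    return con


RECURSIVE_SQL = """
WITH RECURSIVE x AS (
    SELECT cola, colb, colc, row_number() OVER (ORDER BY id DESC) AS rn
    FROM tbl
),
y AS (
    SELECT rn, cola, colb, colc FROM x WHERE rn = 1
    UNION ALL
    SELECT x.rn,
           COALESCE(y.cola, x.cola),
           COALESCE(y.colb, x.colb),
           COALESCE(y.colc, x.colc)
    FROM y JOIN x ON x.rn = y.rn + 1
    WHERE y.cola IS NULL OR y.colb IS NULL OR y.colc IS NULL
)
SELECT cola, colb, colc FROM y ORDER BY rn DESC LIMIT 1
"""


def last_values_recursive(con):
    return con.execute(RECURSIVE_SQL).fetchone()


print(last_values_recursive(make_db()))  # (11, 3, 20)
```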
PL/pgSQL function
My money is on this one for best performance:
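CREATE OR REPLACE FUNCTION f_last_nonull(OUT cola int
                                       , OUT colb int
                                       , OUT colc int) AS
$func$
DECLARE
   r record;
BEGIN
   -- walk the table newest first; columns are table-qualified
   -- to avoid clashing with the OUT parameters of the same name
   FOR r IN
      SELECT t.cola, t.colb, t.colc
      FROM   tbl t
      ORDER  BY t.id DESC
   LOOP
      -- keep the first (= newest) non-NULL value seen for each column
      IF cola IS NULL AND r.cola IS NOT NULL THEN cola := r.cola; END IF;
      IF colb IS NULL AND r.colb IS NOT NULL THEN colb := r.colb; END IF;
      IF colc IS NULL AND r.colc IS NOT NULL THEN colc := r.colc; END IF;
      -- stop reading rows as soon as all three are filled
      EXIT WHEN NOT (cola IS NULL OR colb IS NULL OR colc IS NULL);
   END LOOP;
END  -- with OUT parameters, their current values are returned implicitly
$func$ LANGUAGE plpgsql;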
Call:
SELECT * FROM f_last_nonull();
 cola | colb | colc
------+------+------
   11 |    3 |   20
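For readers without a PostgreSQL server at hand, the same early-exit loop can be ported to Python for illustration; this is not the PL/pgSQL function itself. A sketch assuming SQLite via the sqlite3 module: iterating the cursor steps through rows newest first, like FOR r IN SELECT ... LOOP, and the loop breaks as soon as every column has a value, like EXIT WHEN. last_values_loop and make_db are hypothetical names:

```python
import sqlite3


def make_db():
    """Sample table from the question (assumed: row 1 has ColA = 15, ColC = 20)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE tbl (id INTEGER PRIMARY KEY, "
        "cola INTEGER, colb INTEGER, colc INTEGER)"
    )
    con.executemany(
        "INSERT INTO tbl VALUES (?, ?, ?, ?)",
        [(1, 15, None, 20), (2, 11, 4, None), (3, None, 3, None)],
    )
    return con


def last_values_loop(con):
    cola = colb = colc = None
    # The cursor yields rows one at a time, newest first.
    rows = con.execute("SELECT cola, colb, colc FROM tbl ORDER BY id DESC")
    for r_cola, r_colb, r_colc in rows:
        # Keep the first (= newest) non-NULL value seen for each column.
        if cola is None and r_cola is not None:
            cola = r_cola
        if colb is None and r_colb is not None:
            colb = r_colb
        if colc is None and r_colc is not None:
            colc = r_colc
        # Stop stepping through rows once every column is filled.
        if cola is not None and colb is not None and colc is not None:
            break
    return cola, colb, colc


print(last_values_loop(make_db()))  # (11, 3, 20)
```

On the sample data the loop stops after the third row. On a large table it only steps through as many rows as it takes to fill all columns, although the ORDER BY id DESC itself may still need an index on id to avoid a full sort.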
Test with EXPLAIN ANALYZE. It would be nice if you could come back with a comparison of the solutions.