窗口函数的确定性排序顺序 [英] Deterministic sort order for window functions
问题描述
我有一个状态
表,我想获取最新的详细信息。
I've a status
table and I want to fetch the latest details.
Slno | ID | Status | date
1 | 1 | Pass | 15-06-2015 11:11:00 - this is inserted first
2 | 1 | Fail | 15-06-2015 11:11:00 - this is inserted second
3 | 2 | Fail | 15-06-2015 12:11:11 - this is inserted first
4 | 2 | Pass | 15-06-2015 12:11:11 - this is inserted second
我使用带有按ID顺序按日期desc
进行分区以获取第一个值。
例外的输出:
I use a window function with partition by ID order by date desc
to fetch the first value.
Excepted Output :
2 | 1 | Fail | 15-06-2015 11:11:00 - this is inserted second
4 | 2 | Pass | 15-06-2015 12:11:11 - this is inserted second
实际输出:
1 | 1 | Pass | 15-06-2015 11:11:00 - this is inserted first
3 | 2 | Fail | 15-06-2015 12:11:11 - this is inserted first
根据[ http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_order_by_WF .html] ,向window函数添加第二个 ORDER BY
列可以解决该问题。但是我没有任何其他列来区分行!
还有另一种方法可以解决此问题吗?
According to [http://docs.aws.amazon.com/redshift/latest/dg/r_Examples_order_by_WF.html], adding a second ORDER BY
column to the window function may solve the problem. But I don't have any other column to differentiate the rows!
Is there another approach to solve the issue?
编辑:我已经为了清楚起见,在此处添加了 slno
。我在表中没有 slno
!
I've added slno
here for clarity. I don't have slno
as such in the table!
我的SQL:
with range as (
select id from status where date between 01-06-2015 and 30-06-2015
), latest as (
select status, id, row_number() OVER (PARTITION BY id ORDER BY date DESC) row_num
)
select * from latest where row_num = 1
推荐答案
如果您没有 slno
在表中,那么您就没有可靠的信息,而是首先插入哪一行。表中没有自然顺序,行的物理顺序可以随时更改(进行任何更新,或使用 VACUUM
等)
If you don't have slno
in your table, then you don't have any reliable information which row was inserted first. There is no natural order in a table, the physical order of rows can change any time (with any update, or with VACUUM
, etc.)
您可以使用 不可靠 的技巧:按内部 ctid $排序c $ c>。
select *
from (
select id, status
, row_number() OVER (PARTITION BY id
ORDER BY date, ctid) AS row_num
from status -- that's your table name??
where date >= '2015-06-01' -- assuming column is actually a date
and date < '2015-07-01'
) sub
where row_num = 1;
-
在没有其他信息的情况下,哪一行排在最前面(这是设计错误,先对其进行修复!),您可以尝试使用内部元组ID
ctid
In absence of any other information which row came first (which is a design error to begin with, fix it!), you might try to save what you can using the internal tuple ID
ctid
- In-order sequence generation
行在最初插入时将保持物理顺序,但是通过对表的任何写操作或
VACUUM
或其他事件,行可以随时更改。
这是最后的手段,它会会中断。Rows will be in physical order when inserted initially, but that can change any time with any write operation to the table or
VACUUM
or other events.
This is a measure of last resort and it will break.您的出现的查询在以下几个方面无效:第一个CTE中缺少列名,第二个CTE中缺少表名,...
Your presented query was invalid on several counts: missing column name in 1st CTE, missing table name in 2nd CTE, ...
您不需要CTE
更简单,将
DISTINCT ON
(注意事项ctid
应用相同):Simpler with
DISTINCT ON
(considerations forctid
apply the same):SELECT DISTINCT ON (id) id, status FROM status WHERE date >= '2015-06-01' AND date < '2015-07-01' ORDER BY id, date, ctid;
- 选择每个GROUP BY组中的第一行?
- Select first row in each GROUP BY group?
这篇关于窗口函数的确定性排序顺序的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!