Return only the newest rows from a BigQuery table with duplicate items
Question
I have a table with many duplicate items: many rows share the same id, perhaps differing only in the requested_at column.
I'd like to do a select * from the table, but return only one row per id: the most recently requested one.
I've looked into group by id, but then I need an aggregate for every column. That is easy for requested_at (max(requested_at) as requested_at), but the others are tough.
How do I make sure I get the value for title, etc. that corresponds to that most recently updated row?
Recommended answer
I suggest a similar form that avoids a sort in the window function:
SELECT *
FROM (
  SELECT
    *,
    MAX(<timestamp_column>) OVER (PARTITION BY <id_column>) AS max_timestamp
  FROM <table>
)
WHERE <timestamp_column> = max_timestamp
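
To make the pattern concrete, here is a minimal sketch that runs the same query shape against a small in-memory table. It rests on several assumptions: SQLite (through Python's sqlite3 module) stands in for BigQuery, the table name requests and the sample rows are made up for illustration, and timestamps are stored as ISO-8601 text so that string comparison matches chronological order. It needs an SQLite build with window-function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data: the table name, column names, and values are
# illustrative only and do not come from the original question.
rows = [
    (1, "old title", "2020-01-01 10:00:00"),
    (1, "new title", "2020-01-02 10:00:00"),
    (2, "only title", "2020-01-01 09:00:00"),
    (3, "draft", "2020-01-03 08:00:00"),
    (3, "final", "2020-01-05 08:00:00"),
]

# Same shape as the answer's query: compute the per-id maximum timestamp
# with a window function (no ORDER BY inside OVER), then keep only the
# rows whose timestamp equals that maximum.
QUERY = """
SELECT id, title, requested_at
FROM (
  SELECT
    *,
    MAX(requested_at) OVER (PARTITION BY id) AS max_requested_at
  FROM requests
)
WHERE requested_at = max_requested_at
ORDER BY id
"""


def newest_rows(data):
    """Load data into an in-memory table and return the newest row(s) per id."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE requests (id INTEGER, title TEXT, requested_at TEXT)"
        )
        conn.executemany("INSERT INTO requests VALUES (?, ?, ?)", data)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in newest_rows(rows):
        print(row)
```

One property of this form worth keeping in mind: if two rows share both the same id and the same maximum timestamp, the equality filter keeps both of them. When exactly one row per id is required, a ROW_NUMBER() OVER (PARTITION BY id ORDER BY requested_at DESC) filter is a common alternative, at the cost of the sort this answer tries to avoid.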