Return only the newest rows from a BigQuery table with duplicate items
Problem description
I have a table with many duplicate items: many rows share the same id, perhaps differing only in the requested_at column.
I'd like to do a select * from the table, but return only one row per id: the most recently requested one.
I've looked into group by id, but then I need an aggregate for each column. This is easy for requested_at (max(requested_at) as requested_at), but the others are tough.
How do I make sure I get the values for title, etc. that correspond to that most recently requested row?
I suggest a similar form that avoids a sort in the window function:
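SELECT *
FROM (
  SELECT
    *,
    -- latest timestamp among all rows sharing this id
    MAX(<timestamp_column>) OVER (PARTITION BY <id_column>) AS max_timestamp
  FROM <table>
)
-- keep only the row(s) that carry that latest timestamp
WHERE <timestamp_column> = max_timestamp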
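To see the pattern on concrete data, here is a minimal sketch that runs the same query shape locally. It makes several assumptions that are not part of the question: SQLite (via Python's built-in sqlite3 module, which needs SQLite 3.25 or newer for window functions) stands in for BigQuery, and the table name requests and its sample rows are hypothetical.

```python
import sqlite3

# Hypothetical sample data: SQLite stands in for BigQuery here, and the
# table name `requests` and its rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE requests (id INTEGER, title TEXT, requested_at TEXT)")
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [
        (1, "old title", "2014-01-01 10:00:00"),
        (1, "new title", "2014-01-02 10:00:00"),
        (2, "only title", "2014-01-01 09:00:00"),
    ],
)

# Same shape as the query above: compute the per-id maximum with a window
# function, then keep only rows whose timestamp equals that maximum.
query = """
SELECT id, title, requested_at
FROM (
  SELECT
    *,
    MAX(requested_at) OVER (PARTITION BY id) AS max_requested_at
  FROM requests
)
WHERE requested_at = max_requested_at
ORDER BY id
"""
latest_rows = conn.execute(query).fetchall()
for row in latest_rows:
    print(row)
```

Note that if two rows with the same id share the same maximum requested_at, this form returns both of them, so it yields exactly one row per id only when the latest timestamp is unique within each id.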