BigQuery: search multiple tables and aggregate with first_seen and last_seen
Problem description
I have a BigQuery database with multiple tables:
table1
id,timestamp,data
1,1428969600,AAAAA
2,1428969600,CCCCC
[..]
20,1428969600,ZZZZZ
table2
id,timestamp,data
1,1429056000,AAAAA
2,1429056000,BBBBB
3,1429056000,CCCCC
[..]
20,1429056000,ZZZZZ
table3
id,timestamp,data
1,1429142400,AAAAA
2,1429142400,BBBBB
3,1429142400,CCCCC
[..]
20,1429142400,ZZZZZ
I want to run a search over all the tables (table1, table2 and table3) to see when the value in the field "data" first and last appeared and take the associated field "timestamp".
This should be the result:
id,timestamp_first,timestamp_last,data
1,1428969600,1429142400,AAAAA
2,1429056000,1429142400,BBBBB
3,1428969600,1429142400,CCCCC
[..]
20,1428969600,1429142400,ZZZZZ
Can someone give me some tips on how to write a query like this?
Martin
Solution
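I would first union the tables (in BigQuery legacy SQL, a comma between tables in the FROM clause means UNION ALL; in standard SQL the comma is a CROSS JOIN, so write UNION ALL there instead). Then there are two approaches: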
- Use analytic functions FIRST_VALUE and LAST_VALUE.
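SELECT id, timestamp_first, timestamp_last, data
FROM (
  SELECT
    id,
    data,
    -- the frame spans the whole partition, so every row of an id gets the same first/last values
    FIRST_VALUE(timestamp) OVER (
      PARTITION BY id
      ORDER BY timestamp ASC
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS timestamp_first,
    LAST_VALUE(timestamp) OVER (
      PARTITION BY id
      ORDER BY timestamp ASC
      ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS timestamp_last
  FROM table1, table2, table3
)
-- collapse the duplicate rows (one per source table) into one row per (id, data)
GROUP BY id, timestamp_first, timestamp_last, data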
- Use MIN/MAX aggregation on timestamp to find the first/last timestamp per id, then join back to the same tables to pick up data.
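SELECT a.id AS id, timestamp_first, timestamp_last, data
FROM (
  -- distinct (id, data) pairs from all three tables
  SELECT id, data FROM table1, table2, table3 GROUP BY id, data
) a
INNER JOIN (
  -- earliest and latest timestamp per id
  SELECT id, MIN(timestamp) AS timestamp_first, MAX(timestamp) AS timestamp_last
  FROM table1, table2, table3
  GROUP BY id
) b
ON a.id = b.id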
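To check both approaches locally before running them on BigQuery, here is a minimal sketch that loads the sample rows from the question (without the elided "[..]" rows) into an in-memory SQLite database through Python's built-in sqlite3 module. SQLite is only a stand-in: the queries use the standard UNION ALL instead of the legacy-SQL comma, and the window functions need SQLite 3.25 or newer. The function names (load_sample, first_last_window, first_last_minmax, first_last_by_data) are hypothetical. Both approaches group by id, so id 2, whose data changes from CCCCC to BBBBB, comes back as two rows. The timestamps in the question's expected output (for example CCCCC with timestamp_first 1428969600, although id 3 only appears from table2 on) look like they are keyed on data instead, so the sketch also includes an assumed variant that groups by data.

```python
import sqlite3

# Sample rows copied from the question; the rows elided as "[..]" are left out.
T1, T2, T3 = 1428969600, 1429056000, 1429142400
TABLES = {
    "table1": [(1, T1, "AAAAA"), (2, T1, "CCCCC"), (20, T1, "ZZZZZ")],
    "table2": [(1, T2, "AAAAA"), (2, T2, "BBBBB"), (3, T2, "CCCCC"), (20, T2, "ZZZZZ")],
    "table3": [(1, T3, "AAAAA"), (2, T3, "BBBBB"), (3, T3, "CCCCC"), (20, T3, "ZZZZZ")],
}

# Standard-SQL spelling of the union (legacy BigQuery SQL used a comma instead).
UNIONED = """
    SELECT id, timestamp, data FROM table1
    UNION ALL SELECT id, timestamp, data FROM table2
    UNION ALL SELECT id, timestamp, data FROM table3
"""


def load_sample() -> sqlite3.Connection:
    """Create the three sample tables in an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    for name, rows in TABLES.items():
        conn.execute(f"CREATE TABLE {name} (id INTEGER, timestamp INTEGER, data TEXT)")
        conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)
    return conn


def first_last_window(conn: sqlite3.Connection) -> list:
    """Approach 1: FIRST_VALUE/LAST_VALUE per id over the full frame, then dedupe."""
    sql = f"""
        SELECT DISTINCT id, timestamp_first, timestamp_last, data
        FROM (
          SELECT id, data,
            FIRST_VALUE(timestamp) OVER (
              PARTITION BY id ORDER BY timestamp
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS timestamp_first,
            LAST_VALUE(timestamp) OVER (
              PARTITION BY id ORDER BY timestamp
              ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS timestamp_last
          FROM ({UNIONED}) AS u
        ) AS w
        ORDER BY id, data
    """
    return conn.execute(sql).fetchall()


def first_last_minmax(conn: sqlite3.Connection) -> list:
    """Approach 2: MIN/MAX per id, joined back to the distinct (id, data) pairs."""
    sql = f"""
        SELECT a.id, b.timestamp_first, b.timestamp_last, a.data
        FROM (SELECT DISTINCT id, data FROM ({UNIONED}) AS u) AS a
        INNER JOIN (
          SELECT id, MIN(timestamp) AS timestamp_first, MAX(timestamp) AS timestamp_last
          FROM ({UNIONED}) AS u
          GROUP BY id
        ) AS b ON a.id = b.id
        ORDER BY a.id, a.data
    """
    return conn.execute(sql).fetchall()


def first_last_by_data(conn: sqlite3.Connection) -> list:
    """Assumed variant keyed on data: first/last time each value was seen."""
    sql = f"""
        SELECT data, MIN(timestamp) AS timestamp_first, MAX(timestamp) AS timestamp_last
        FROM ({UNIONED}) AS u
        GROUP BY data
        ORDER BY data
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = load_sample()
    for row in first_last_window(conn):
        print(row)
    print(first_last_minmax(conn) == first_last_window(conn))
    for row in first_last_by_data(conn):
        print(row)
```

Running it prints the per-id rows of approach 1 as (id, timestamp_first, timestamp_last, data) tuples, then True (the two approaches return the same rows on this sample), then one (data, timestamp_first, timestamp_last) tuple per data value. Whether to key on id or data depends on what "first seen" should mean for your data.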