BigQuery: search multiple tables and aggregate with first_seen and last_seen

Question

I have a BigQuery database with multiple tables:

table1
    id,timestamp,data
    1,1428969600,AAAAA
    2,1428969600,CCCCC
    [..]
    20,1428969600,ZZZZZ

table2
    id,timestamp,data
    1,1429056000,AAAAA
    2,1429056000,BBBBB
    3,1429056000,CCCCC
    [..]
    20,1429056000,ZZZZZ

table3
    id,timestamp,data
    1,1429142400,AAAAA
    2,1429142400,BBBBB
    3,1429142400,CCCCC
    [..]
    20,1429142400,ZZZZZ

I want to run a search over all the tables (table1, table2 and table3) to find when each value in the "data" field first and last appeared, and return the associated "timestamp" values.

This should be the result:

id,timestamp_first,timestamp_last,data
1,1428969600,1429142400,AAAAA
2,1429056000,1429142400,BBBBB
3,1428969600,1429142400,CCCCC
[..]
20,1428969600,1429142400,ZZZZZ

Can someone give me some tips on how to write a query like this?

Martin

Solution

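I would first union the tables (in BigQuery legacy SQL, a comma between table names in the FROM clause means UNION ALL, not a join; in standard SQL you would write UNION ALL explicitly). Then there are two approaches: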

  1. Use analytic functions FIRST_VALUE and LAST_VALUE.

SELECT id, timestamp_first, timestamp_last, data FROM
(SELECT
  id,
  data,
  timestamp,
  FIRST_VALUE(timestamp) OVER(
    PARTITION BY id
    ORDER BY timestamp ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
  AS timestamp_first,
  LAST_VALUE(timestamp) OVER(
    PARTITION BY id
    ORDER BY timestamp ASC
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
  AS timestamp_last
FROM table1, table2, table3)
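The window-function approach can be tried locally without a BigQuery project. The following is only a rough sketch: it uses Python's built-in sqlite3 module as a stand-in for BigQuery (assuming the bundled SQLite is 3.25 or newer, which is when window functions were added), spells the union out as UNION ALL, and loads a few hypothetical rows shaped like the question's tables. Note one deliberate difference from the query above: it partitions by data rather than by id, on the assumption that the expected output in the question (BBBBB first seen at 1429056000) is keyed on the data value.

```python
import sqlite3

# Hypothetical sample rows shaped like table1..table3 from the question.
conn = sqlite3.connect(":memory:")
for name, ts, rows in [
    ("table1", 1428969600, [(1, "AAAAA"), (2, "CCCCC")]),
    ("table2", 1429056000, [(1, "AAAAA"), (2, "BBBBB"), (3, "CCCCC")]),
    ("table3", 1429142400, [(1, "AAAAA"), (2, "BBBBB"), (3, "CCCCC")]),
]:
    conn.execute(f"CREATE TABLE {name} (id INTEGER, timestamp INTEGER, data TEXT)")
    conn.executemany(
        f"INSERT INTO {name} VALUES (?, ?, ?)",
        [(row_id, ts, value) for row_id, value in rows],
    )

# Assumption: partition by data (not id) so each distinct value gets one
# first/last pair. DISTINCT collapses the per-row window results.
query = """
SELECT DISTINCT
  data,
  FIRST_VALUE(timestamp) OVER w AS timestamp_first,
  LAST_VALUE(timestamp) OVER w AS timestamp_last
FROM (
  SELECT id, timestamp, data FROM table1
  UNION ALL
  SELECT id, timestamp, data FROM table2
  UNION ALL
  SELECT id, timestamp, data FROM table3
)
WINDOW w AS (
  PARTITION BY data
  ORDER BY timestamp ASC
  ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY data
"""
first_last = conn.execute(query).fetchall()
for row in first_last:
    print(row)
```

The explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING frame matters for LAST_VALUE: with the default frame, the window ends at the current row, so LAST_VALUE would just return the current row's timestamp.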

  2. Use aggregation MIN/MAX on timestamp to find first/last and then join back to the same tables.

SELECT a.id id, timestamp_first, timestamp_last, data FROM
(SELECT id, data FROM table1,table2,table3) a
INNER JOIN
(SELECT 
   id, 
   MIN(timestamp) timestamp_first,
   MAX(timestamp) timestamp_last 
 FROM table1,table2,table3 GROUP BY id) b
ON a.id = b.id
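The aggregation approach can be sketched the same way. This example is an illustration under assumptions, not the answer's exact query: it runs on Python's built-in sqlite3 module instead of BigQuery, uses hypothetical sample rows, writes the union as UNION ALL, and groups by data directly (assuming the goal is one row per distinct data value), which makes the join back to the source tables unnecessary.

```python
import sqlite3

# Hypothetical sample rows shaped like the question's tables.
sample = {
    "table1": [(1, 1428969600, "AAAAA"), (2, 1428969600, "CCCCC")],
    "table2": [(1, 1429056000, "AAAAA"), (2, 1429056000, "BBBBB"),
               (3, 1429056000, "CCCCC")],
    "table3": [(1, 1429142400, "AAAAA"), (2, 1429142400, "BBBBB"),
               (3, 1429142400, "CCCCC")],
}

conn = sqlite3.connect(":memory:")
for name, rows in sample.items():
    conn.execute(f"CREATE TABLE {name} (id INTEGER, timestamp INTEGER, data TEXT)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?, ?)", rows)

# Build the UNION ALL over every table, then aggregate once per data value.
union_all = " UNION ALL ".join(
    f"SELECT id, timestamp, data FROM {name}" for name in sample
)
query = f"""
SELECT data,
       MIN(timestamp) AS timestamp_first,
       MAX(timestamp) AS timestamp_last
FROM ({union_all})
GROUP BY data
ORDER BY data
"""
seen = conn.execute(query).fetchall()
for row in seen:
    print(row)
```

Because MIN and MAX collapse each group to a single row, this variant avoids the duplicate rows that a per-row window function or a join back on id would produce.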
