BigQuery select data within a time interval


Problem description


My data looks like this:

name | From | To_City | Date of request
Andy | Paris | London | 08/21/2014 12:00
Lena | Koln | Berlin | 08/22/2014 18:00
Andy | Paris | London | 08/22/2014 06:00
Lisa | Rome | Neapel | 08/25/2014 18:00
Lena | Rome | London | 08/21/2014 20:00
Lisa | Rome | Neapel | 08/24/2014 18:00
Andy | Paris | London | 08/25/2014 12:00

I want to find how many identical drive requests a person had within +/- one day. I'd love to receive a table saying:

name | From | To_City | avg Date of request | # requests
Andy | Paris | London | 08/21/2014 21:00 | 2
Lena | Koln | Berlin | 08/22/2014 18:00 | 1
Lisa | Rome | Neapel | 08/25/2014 06:00 | 2
Lena | Rome | London | 08/21/2014 20:00 | 1
Andy | Paris | London | 08/25/2014 12:00 | 1

This would be the result of a GROUP BY clause. But is it generally feasible to write a condition that checks whether, and how many, identical requests fall within 24 hours of an initial request? So far I download the data into Excel and do it there, but there is a lot of data, so this is not efficient.

Sample data:

Let's build a sample dataset first:

select * from (select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-21 12:00' as date),
(select 'Lena' as name,'Koln' as f,'Berlin' as to, '2014-08-22 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-22 06:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-25 18:00' as date),
(select 'Lena' as name,'Rome' as f,'London' as to, '2014-08-21 20:00' as date),
(select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-24 18:00' as date),
(select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-25 12:00' as date)

Solution

One way to do it is to use a window function with a RANGE window. To do that, the dates first need to be converted to day numbers, because RANGE requires the ordering column to hold numbers. The PARTITION BY clause is similar to GROUP BY: it lists the columns that define "identical" drive requests (in your case name, from and to). Then you can simply use COUNT(*) to count the number of requests within such a window. Because the timestamps are reduced to whole-day numbers, the window works at day granularity rather than on exact 24-hour spans.

-- BigQuery legacy SQL: comma-separated subselects in FROM are combined like UNION ALL.
select name, f, to, date,
       -- requests in the same partition whose day number is within 1 of this row's
       count(*) over(partition by name, f, to
                     order by day
                     range between 1 preceding and 1 following)
from (
  -- day = day number since the Unix epoch (the timestamp is in microseconds)
  select name, f, to, date, integer(timestamp(date)/1000000/60/60/24) day from
  (select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-21 12:00' as date),
  (select 'Lena' as name,'Koln' as f,'Berlin' as to, '2014-08-22 18:00' as date),
  (select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-22 06:00' as date),
  (select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-25 18:00' as date),
  (select 'Lena' as name,'Rome' as f,'London' as to, '2014-08-21 20:00' as date),
  (select 'Lisa' as name,'Rome' as f,'Neapel' as to, '2014-08-24 18:00' as date),
  (select 'Andy' as name,'Paris' as f,'London' as to, '2014-08-25 12:00' as date))
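
To sanity-check what the RANGE window computes, the same logic can be sketched in plain Python on the sample rows. This is only an illustration, not BigQuery code: the helper names day_number and count_within_one_day are made up for this sketch, and it assumes the day column is the timestamp truncated to whole days since the Unix epoch.

```python
from collections import defaultdict
from datetime import datetime

# Sample rows copied from the question: (name, from, to, request time).
ROWS = [
    ("Andy", "Paris", "London", "2014-08-21 12:00"),
    ("Lena", "Koln", "Berlin", "2014-08-22 18:00"),
    ("Andy", "Paris", "London", "2014-08-22 06:00"),
    ("Lisa", "Rome", "Neapel", "2014-08-25 18:00"),
    ("Lena", "Rome", "London", "2014-08-21 20:00"),
    ("Lisa", "Rome", "Neapel", "2014-08-24 18:00"),
    ("Andy", "Paris", "London", "2014-08-25 12:00"),
]


def day_number(text):
    """Whole days since 1970-01-01 (assumption: the query's day column truncates)."""
    ts = datetime.strptime(text, "%Y-%m-%d %H:%M")
    return (ts - datetime(1970, 1, 1)).days


def count_within_one_day(rows):
    """For each row, count rows of the same (name, from, to) partition
    whose day number differs from this row's by at most 1."""
    partitions = defaultdict(list)
    for name, f, to, date in rows:
        partitions[(name, f, to)].append(day_number(date))

    result = []
    for name, f, to, date in rows:
        day = day_number(date)
        n = sum(1 for other in partitions[(name, f, to)] if abs(other - day) <= 1)
        result.append((name, f, to, date, n))
    return result


for row in count_within_one_day(ROWS):
    print(row)
```

Note that this produces one row per request, each carrying its own count: both of Andy's requests on 21 and 22 August get a count of 2. The collapsed table from the question, with one row per cluster and an average date, would need an additional grouping step on top of this.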
