BigQuery: How to calculate the running count of distinct visitors for each day and category?


Question

In Google BigQuery I have a table like this:

startTime:STRING, visitorId:STRING, category:STRING

Example content:

startTime            visitorId   category
-------------------  ---------   --------
2013-11-27 00:00:00     A           X         
2013-11-27 05:00:00     A           X 
2013-11-27 07:00:00     B           X 
2013-11-28 08:00:00     C           X 

I would like to have the following result:

day         category  runningCountOfDistinctVisitors  
---------   --------  ------------------------------   
2013-11-27     X                   2
2013-11-28     X                   3

I have tried the following query, but it does not seem to work (it has been running for over 3 hours on a 1.2M-row table and still hasn't finished):

SELECT left(a.startTime,10) as day, 
a.category,
count(distinct a.visitorId) as runningCountOfDistinctVisitors
FROM [MyDataset.MyTable] a 
LEFT JOIN EACH [MyDataset.MyTable] b ON a.category = b.category 
WHERE left(b.startTime,10) < left(a.startTime,10)
GROUP EACH BY a.category, day
ORDER BY a.category, day

I also tried using window functions with PARTITION BY, but COUNT(DISTINCT ...) does not seem to be supported there.

Solution

Try this. Given a table with the following schema and sample rows:

ts:timestamp, visitor:string, category:string

ts                       visitor  category
-----------------------  -------  --------
2013-11-27 00:00:00 UTC  A        X  
2013-11-27 00:00:00 UTC  A        X  
2013-11-27 00:00:00 UTC  B        X  
2013-11-28 00:00:00 UTC  C        X  
2013-11-27 00:00:00 UTC  A        Y  
2013-11-28 00:00:00 UTC  B        Y  
2013-11-29 00:00:00 UTC  C        Y

Query:

SELECT
  day,
  category,
  SUM(cd) OVER (PARTITION BY category ORDER BY day) AS running_total
FROM (
  SELECT DATE(ts) AS day, category, COUNT(DISTINCT visitor) AS cd
  FROM [test.runningtotal]
  GROUP BY day, category
)

This will produce:

day         category  running_total
----------  --------  -------------
2013-11-27  X         2  
2013-11-28  X         3  
2013-11-27  Y         1  
2013-11-28  Y         2  
2013-11-29  Y         3
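
The query above uses BigQuery legacy SQL (note the square-bracket table reference). For anyone running GoogleSQL (standard SQL) instead, a rough equivalent might look like the sketch below. It assumes the same `test.runningtotal` table and column names as in this answer, resolved against your default project. It has not been run against a real dataset.

```sql
-- Sketch only: GoogleSQL port of the legacy SQL query above.
-- Assumes the table test.runningtotal(ts TIMESTAMP, visitor STRING, category STRING).
SELECT
  day,
  category,
  -- Running sum of the per-day distinct counts within each category.
  SUM(cd) OVER (PARTITION BY category ORDER BY day) AS running_total
FROM (
  SELECT
    DATE(ts) AS day,                 -- DATE(TIMESTAMP) uses UTC by default
    category,
    COUNT(DISTINCT visitor) AS cd    -- distinct visitors seen on that day
  FROM `test.runningtotal`
  GROUP BY day, category
)
ORDER BY category, day;
```

One difference worth knowing: in GoogleSQL, `COUNT(DISTINCT ...)` is exact, whereas in legacy SQL it can fall back to an approximation when the number of distinct values is large.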

I didn't test this on a large dataset, but it might be faster than the JOIN solution.
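
One caveat: the query sums the per-day distinct counts, so a visitor who shows up on several days within the same category is counted once for each of those days. In the sample data every visitor appears on only one day per category, which is why the totals match the desired result. If repeat visitors are possible, one way around it is to count each visitor only on the first day they are seen in a category and take a running sum of those "new visitor" counts. The sketch below is written in GoogleSQL (standard SQL) and assumes the same `test.runningtotal` table. The names `first_seen`, `new_per_day`, `first_day` and `new_visitors` are made up for illustration.

```sql
-- Sketch only: running count of distinct visitors that handles repeat visits.
-- Assumes the table test.runningtotal(ts TIMESTAMP, visitor STRING, category STRING).
WITH first_seen AS (
  -- One row per (category, visitor): the first day that visitor appeared.
  SELECT
    category,
    visitor,
    MIN(DATE(ts)) AS first_day
  FROM `test.runningtotal`
  GROUP BY category, visitor
),
new_per_day AS (
  -- How many visitors were seen for the first time on each day.
  SELECT
    category,
    first_day AS day,
    COUNT(*) AS new_visitors
  FROM first_seen
  GROUP BY category, first_day
)
SELECT
  day,
  category,
  -- Cumulative number of distinct visitors up to and including this day.
  SUM(new_visitors) OVER (PARTITION BY category ORDER BY day) AS running_total
FROM new_per_day
ORDER BY category, day;
```

On the sample rows in this answer it should return the same five rows as the query above. Note that a day on which no new visitor appears produces no row at all; if every calendar day is needed, the result would have to be joined against a list of dates.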
