BigQuery: How to calculate the running count of distinct visitors for each day and category?
Problem description
In Google BigQuery I have a table like this:
startTime:STRING, visitorId:STRING, category:STRING
Example for this content:
startTime visitorId category
------------------- --------- --------
2013-11-27 00:00:00 A X
2013-11-27 05:00:00 A X
2013-11-27 07:00:00 B X
2013-11-28 08:00:00 C X
I would like to have the following result:
day category runningCountOfDistinctVisitors
--------- -------- ------------------------------
2013-11-27 X 2
2013-11-28 X 3
I have tried the following query, but it does not seem to work (it has been running for over 3 hours on a 1.2M-row table and still hasn't finished):
SELECT left(a.startTime,10) as day,
a.category,
count(distinct a.visitorId) as runningCountOfDistinctVisitors
FROM [MyDataset.MyTable] a
LEFT JOIN EACH [MyDataset.MyTable] b ON a.category = b.category
WHERE left(b.startTime,10) < left(a.startTime,10)
GROUP EACH BY a.category, day
ORDER BY a.category, day
I also tried to work with the partition function, but count distinct does not seem to be supported there.
Try this:
ts:timestamp, visitor:string, category:string
ts visitor category
----------------------- ------- --------
2013-11-27 00:00:00 UTC A X
2013-11-27 00:00:00 UTC A X
2013-11-27 00:00:00 UTC B X
2013-11-28 00:00:00 UTC C X
2013-11-27 00:00:00 UTC A Y
2013-11-28 00:00:00 UTC B Y
2013-11-29 00:00:00 UTC C Y
Query:
SELECT day, category,
       SUM(cd) OVER (PARTITION BY category ORDER BY day) AS running_total
FROM (
  SELECT DATE(ts) AS day, category, COUNT(DISTINCT visitor) AS cd
  FROM [test.runningtotal]
  GROUP BY day, category
)
This will produce:
day category running_total
---------- -------- -------------
2013-11-27 X 2
2013-11-28 X 3
2013-11-27 Y 1
2013-11-28 Y 2
2013-11-29 Y 3
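The two-step shape of the query (distinct count per day and category, then a windowed SUM) can be checked locally on the sample rows. The following is only a sketch: it uses SQLite through Python's sqlite3 module as a stand-in for BigQuery, assumes a SQLite build with window function support (3.25 or later), stores ts as text, and replaces DATE(ts) with substr(ts, 1, 10). The function name running_total is made up for illustration.

```python
import sqlite3

# Sample rows from the answer: (ts, visitor, category).
ROWS = [
    ("2013-11-27 00:00:00", "A", "X"),
    ("2013-11-27 00:00:00", "A", "X"),
    ("2013-11-27 00:00:00", "B", "X"),
    ("2013-11-28 00:00:00", "C", "X"),
    ("2013-11-27 00:00:00", "A", "Y"),
    ("2013-11-28 00:00:00", "B", "Y"),
    ("2013-11-29 00:00:00", "C", "Y"),
]

# Inner query: distinct visitors per (day, category).
# Outer query: running sum of those counts within each category.
QUERY = """
SELECT day, category,
       SUM(cd) OVER (PARTITION BY category ORDER BY day) AS running_total
FROM (
    SELECT substr(ts, 1, 10) AS day, category,
           COUNT(DISTINCT visitor) AS cd
    FROM runningtotal
    GROUP BY substr(ts, 1, 10), category
)
ORDER BY category, day
"""


def running_total(rows):
    """Load rows into an in-memory SQLite table and run QUERY (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE runningtotal (ts TEXT, visitor TEXT, category TEXT)")
    conn.executemany("INSERT INTO runningtotal VALUES (?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


for row in running_total(ROWS):
    print(row)
```

On these rows the sketch returns the same five (day, category, running_total) rows as the table above.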
I haven't tested this on a large dataset, but it may be faster than the JOIN solution.
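One caveat: summing per-day distinct counts equals a true running distinct count only when no visitor shows up on more than one day within a category, which happens to hold for the sample data. If visitors can return, a possible variant is to find each visitor's first day per category and keep a running sum of first-time visitors. The sketch below compares both on made-up rows in which visitor A returns on the second day. It again uses SQLite (3.25 or later) via Python as a stand-in for BigQuery; the table name visits and the helper run are hypothetical, and the SQL would need adapting to BigQuery's dialect.

```python
import sqlite3

# Hypothetical sample: visitor A comes back on the second day.
ROWS = [
    ("2013-11-27 00:00:00", "A", "X"),
    ("2013-11-27 07:00:00", "B", "X"),
    ("2013-11-28 08:00:00", "A", "X"),
    ("2013-11-28 09:00:00", "C", "X"),
]

# Approach from the answer: running sum of per-day distinct counts.
SUM_OF_DAILY = """
SELECT day, category,
       SUM(cd) OVER (PARTITION BY category ORDER BY day) AS running_total
FROM (
    SELECT substr(startTime, 1, 10) AS day, category,
           COUNT(DISTINCT visitorId) AS cd
    FROM visits
    GROUP BY substr(startTime, 1, 10), category
)
ORDER BY category, day
"""

# Variant: count each visitor once, on the first day they appear in a category.
FIRST_SEEN = """
SELECT first_day AS day, category,
       SUM(new_visitors) OVER (PARTITION BY category ORDER BY first_day)
           AS running_distinct
FROM (
    SELECT category, first_day, COUNT(*) AS new_visitors
    FROM (
        SELECT category, visitorId,
               MIN(substr(startTime, 1, 10)) AS first_day
        FROM visits
        GROUP BY category, visitorId
    )
    GROUP BY category, first_day
)
ORDER BY category, day
"""


def run(query, rows):
    """Load rows into an in-memory SQLite table and run the query (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE visits (startTime TEXT, visitorId TEXT, category TEXT)")
    conn.executemany("INSERT INTO visits VALUES (?, ?, ?)", rows)
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(run(SUM_OF_DAILY, ROWS))  # visitor A is counted on both days
print(run(FIRST_SEEN, ROWS))    # visitor A is counted only on the first day
```

A limitation of this variant is that a day on which no first-time visitor appears produces no output row; if every day must be listed, the result would have to be joined back to the set of distinct days.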