Google BigQuery: Rolling Count Distinct
Question
I have a table that is simply a list of dates and user IDs (not aggregated).
We define a metric called "active users" for a given date by counting the distinct IDs that appear in the previous 45 days.
I am trying to run a query in BigQuery that, for each day, returns the day plus the number of active users for that day (the count of distinct users from 45 days ago until today).
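To make the definition concrete, here is a minimal brute-force sketch in Python. The function name `active_users` and the sample rows are hypothetical. The window is assumed to be inclusive on both ends (45 days ago through the day itself), matching the BETWEEN condition in the query that follows, and only days that have at least one visit are reported.

```python
from datetime import date, timedelta


def active_users(rows, window_days=45):
    """Return {day: distinct user count over [day - window_days, day]}.

    rows is an iterable of (day, user_id) pairs, one per visit (not aggregated).
    Only days that appear in rows get an entry in the result.
    """
    rows = list(rows)
    result = {}
    for day in sorted({d for d, _ in rows}):
        start = day - timedelta(days=window_days)
        # Collect every distinct ID seen inside the inclusive window.
        result[day] = len({uid for d, uid in rows if start <= d <= day})
    return result


# Hypothetical sample data: (day, user_id)
visits = [
    (date(2015, 1, 1), "a"),
    (date(2015, 1, 1), "b"),
    (date(2015, 1, 20), "a"),
    (date(2015, 2, 15), "c"),
    (date(2015, 3, 10), "c"),
]

for day, count in active_users(visits).items():
    print(day, count)
```

This rescans every row for every day, so it is only useful as a reference for checking the output of a real query on a small sample.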
I have experimented with window functions, but can't figure out how to define a range based on the date values in a column. Instead, I believe the following query would work in a database like MySQL, but it does not work in BigQuery.
SELECT
  day,
  (SELECT
     COUNT(DISTINCT visid)
   FROM daily_users
   WHERE day BETWEEN DATE_ADD(t.day, -45, "DAY") AND t.day
  ) AS active_users
FROM daily_users AS t
GROUP BY 1
This doesn't work in BigQuery; it fails with the error "Subselect not allowed in SELECT clause."
How can I do this in BigQuery?
Recommended answer
The BigQuery documentation claims that count(distinct) works as a window function. However, that doesn't help you, because you are not looking for a traditional window frame.
One method adds a record for each date after a visit, so that every visit appears once on each of the following 45 days, and then counts the distinct IDs per date:
select theday, count(distinct visid)
from (select date_add(u.day, n.n, "day") as theday, u.visid
      from daily_users u cross join
           (select 1 as n union all select 2 union all . . .
            select 45
           ) n
     ) u
group by theday;
Note: there may be simpler ways to generate a series of 45 integers in BigQuery.
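The fan-out idea can be checked locally on toy data. The sketch below uses Python's built-in `sqlite3` module, so the SQL is SQLite syntax, not BigQuery syntax: the recursive CTE `offsets` and the `date(..., '+N day')` call are stand-ins for the hand-written series and `date_add` in the query above, and the sample rows are hypothetical. In BigQuery standard SQL, `GENERATE_ARRAY(1, 45)` combined with `UNNEST` is one way to produce such a series, though that is not shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_users (day TEXT, visid TEXT)")
conn.executemany(
    "INSERT INTO daily_users VALUES (?, ?)",
    [
        # Hypothetical sample visits, one row per (day, user ID).
        ("2015-01-01", "a"),
        ("2015-01-01", "b"),
        ("2015-01-20", "a"),
        ("2015-02-15", "c"),
    ],
)

# offsets yields the integers 1..45; the cross join copies each visit
# onto every one of the 45 days that follow it.
query = """
WITH RECURSIVE offsets(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM offsets WHERE n < 45
)
SELECT date(u.day, '+' || o.n || ' day') AS theday,
       COUNT(DISTINCT u.visid) AS active_users
FROM daily_users AS u
CROSS JOIN offsets AS o
GROUP BY theday
ORDER BY theday
"""
rolling = dict(conn.execute(query).fetchall())

for theday in ("2015-01-02", "2015-02-15", "2015-03-07"):
    print(theday, rolling[theday])
```

Two properties of this approach are worth noting. With offsets 1 through 45, a visit on a given day counts toward the 45 days after it but not toward the day itself; starting the series at 0 would include the day itself, which is closer to the BETWEEN condition in the question. The result also contains a row for every date within 45 days after any visit, including dates on which nobody visited.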