SQL query to answer: If <event 1> occurs at timepoint A, does <event 2> occur in time period B-C?

Views: 102 Published: 2020/10/18 21:29:49 Tags: sql datetime count google-bigquery window-functions

Problem description

I'm querying a large data set to figure out whether a series of campaign events (event 1, event 2, ...) at different points in time result in user activity (active or inactive) during the 3 days following each event, not counting the day of the campaign event itself.

I'm merging two tables to do this. Merged, they look like this:

| date       | user | events | day_activity  | 
| 2020-01-01 | 1    | event1 | active        | 
| 2020-01-01 | 2    | event1 | inactive      | 
| 2020-01-02 | 1    | null   | inactive      |  
| 2020-01-02 | 2    | null   | active        | 
| 2020-01-03 | 1    | null   | inactive      | 
| 2020-01-03 | 2    | null   | active        | 
| 2020-01-04 | 1    | null   | active        | 
| 2020-01-04 | 2    | null   | active        |

What I am trying to achieve is, for each user/date/event combination (= row) where an event occurred, to add another column called 3_day_activity that counts activity not on the event day (= current row) but on the following 3 days only, scoring 1 per active day. Here is how the table would look for the events on the first day (I mark with * the days counted in the added column for user 1, and with # the days counted for user 2):

| date       | user | events | day_activity   | 3_day_activity  |
| 2020-01-01 | 1    | event1 | active         | 1               |
| 2020-01-01 | 2    | event1 | inactive       | 3               |
| 2020-01-02 | 1    | null   | inactive * (0) | null (no event) |
| 2020-01-02 | 2    | null   | active # (1)   | null (no event) |
| 2020-01-03 | 1    | null   | inactive * (0) | null (no event) |
| 2020-01-03 | 2    | null   | active # (1)   | null (no event) |
| 2020-01-04 | 1    | null   | active * (1)   | null (no event) |
| 2020-01-04 | 2    | null   | active # (1)   | null (no event) |

I tried solving this with a window function. The query runs, but I think I misunderstood something important about how to design it, because the result contains a ton of repetitions:

SELECT
  cm.date, 
  cm.user,
  event,
  day_activity,
  COUNTIF(active_today = 'active') OVER 3d_later AS 3_day_activity
FROM `customer_message` cm
INNER JOIN `customer_day` ud
  ON cm.user = ud.user
  AND cm.date = ud.date
WHERE 
  cm.date > '2019-12-25'
WINDOW 3d_later AS (PARTITION BY user ORDER BY UNIX_DATE(cm.date) RANGE BETWEEN 1 FOLLOWING AND 3 FOLLOWING)

EDIT:

I was asked to supply an example of what this repetition looks like. Here is what I see if I add an "ORDER BY 3_day_activity" clause at the end of the query:

Row    date         user  day_activity  3_day_activity  
1      2020-01-01   2     active        243
2      2020-01-01   2     active        243
3      2020-01-01   2     active        243
4      2020-01-01   2     active        243
5      2020-01-01   2     active        243
6      2020-01-01   2     active        243
7      2020-01-02   2     active        243
8      2020-01-02   2     active        243

EDIT2:

This remains unsolved. I tried asking another question, as one commenter suggested, but I am blocked from doing so even though the problem is not identical (I suppose because of its similarity to this one). I have tested grouping by user and date, but the query then throws an error because the 'COUNTIF' expression is not aggregated.

Here is the attempt mentioned above: SQL: error when counting that requires aggregation, grouping, and windowing

Recommended answer

The example below is for BigQuery Standard SQL:

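#standardSQL
SELECT *,
  -- Rows without an event get 0; event rows count the active days in the window
  IF(events IS NULL, 0, COUNTIF(day_activity = 'active') OVER(three_day_activity_window)) AS three_day_activity
FROM `project.dataset.table`
WINDOW three_day_activity_window AS (
  PARTITION BY user 
  ORDER BY UNIX_DATE(date)  -- integer day number, so RANGE offsets are calendar days
  RANGE BETWEEN 1 FOLLOWING AND 3 FOLLOWING  -- the next 3 days, excluding the event day
)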
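
The query orders by UNIX_DATE(date) and uses a RANGE frame, so the offsets 1 and 3 refer to calendar days rather than to neighbouring rows. The difference only shows up when a user has days with no row at all. The following is a minimal Python sketch of that distinction, not BigQuery code; the data and the function names count_range and count_rows are made up for illustration.

```python
from datetime import date

# Hypothetical rows for one user, sorted by date, with a gap:
# there are no rows for 2020-01-02 through 2020-01-04.
days = [
    (date(2020, 1, 1), "inactive"),
    (date(2020, 1, 5), "active"),
    (date(2020, 1, 6), "active"),
    (date(2020, 1, 7), "active"),
]


def count_range(days, i):
    """RANGE BETWEEN 1 FOLLOWING AND 3 FOLLOWING on the day number:
    only rows dated 1 to 3 calendar days after row i are in the frame."""
    current = days[i][0].toordinal()
    return sum(
        1
        for d, activity in days
        if 1 <= d.toordinal() - current <= 3 and activity == "active"
    )


def count_rows(days, i):
    """ROWS BETWEEN 1 FOLLOWING AND 3 FOLLOWING:
    the next three rows are in the frame, whatever their dates."""
    return sum(1 for _, activity in days[i + 1 : i + 4] if activity == "active")


print(count_range(days, 0))  # 0: nothing happened on 2020-01-02 .. 2020-01-04
print(count_rows(days, 0))   # 3: the next three rows are all 'active'
```

With a ROWS frame, the event on 2020-01-01 would be credited with activity that happened four to six days later, which is why the answer's query uses RANGE over the day number.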

You can test and play with the above using the sample data from your question, as in the example below:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT DATE '2020-01-01' date , 1 user, 'event1' events, 'active' day_activity UNION ALL
  SELECT '2020-01-01', 2, 'event1', 'inactive' UNION ALL
  SELECT '2020-01-02', 1, NULL, 'inactive' UNION ALL
  SELECT '2020-01-02', 2, NULL, 'active' UNION ALL
  SELECT '2020-01-03', 1, NULL, 'inactive' UNION ALL
  SELECT '2020-01-03', 2, NULL, 'active' UNION ALL
  SELECT '2020-01-04', 1, NULL, 'active' UNION ALL
  SELECT '2020-01-04', 2, NULL, 'active' 
)
SELECT *, IF(events IS NULL, 0, COUNTIF(day_activity = 'active') OVER(three_day_activity_window)) AS three_day_activity
FROM `project.dataset.table`
WINDOW three_day_activity_window AS (
  PARTITION BY user 
  ORDER BY UNIX_DATE(date) 
  RANGE BETWEEN 1 FOLLOWING AND 3 FOLLOWING
)
ORDER BY date, user

with output:

Row date        user    events  day_activity    three_day_activity   
1   2020-01-01  1       event1  active          1    
2   2020-01-01  2       event1  inactive        3    
3   2020-01-02  1       null    inactive        0    
4   2020-01-02  2       null    active          0    
5   2020-01-03  1       null    inactive        0    
6   2020-01-03  2       null    active          0    
7   2020-01-04  1       null    active          0    
8   2020-01-04  2       null    active          0
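
To sanity-check these numbers outside BigQuery, the window can be emulated in a few lines of plain Python. This is only a sketch of the frame logic under the assumption of one row per user and day; the function name three_day_activity is made up for illustration.

```python
from datetime import date

# Sample rows from the question: (date, user, events, day_activity).
rows = [
    (date(2020, 1, 1), 1, "event1", "active"),
    (date(2020, 1, 1), 2, "event1", "inactive"),
    (date(2020, 1, 2), 1, None, "inactive"),
    (date(2020, 1, 2), 2, None, "active"),
    (date(2020, 1, 3), 1, None, "inactive"),
    (date(2020, 1, 3), 2, None, "active"),
    (date(2020, 1, 4), 1, None, "active"),
    (date(2020, 1, 4), 2, None, "active"),
]


def three_day_activity(rows):
    """For each row, return (date, user, count).

    Rows without an event get 0, like IF(events IS NULL, 0, ...).
    Event rows count the same user's 'active' rows dated 1 to 3 days later.
    """
    result = []
    for day, user, events, _ in rows:
        if events is None:
            count = 0
        else:
            count = sum(
                1
                for other_day, other_user, _, activity in rows
                if other_user == user
                and 1 <= (other_day - day).days <= 3
                and activity == "active"
            )
        result.append((day, user, count))
    return result


for day, user, count in three_day_activity(rows):
    print(day, user, count)
```

The first two printed rows end in 1 and 3, matching the three_day_activity column in the output above, and all remaining rows end in 0.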

Update: how to avoid registering the same user as active multiple times in one day (and tallying those up to a huge sum)?

If you want to avoid counting a user's activity more than once for the same day, use the adjusted version below (note the extra entry in the sample data, which introduces multiple activity rows for one user on the same day):

#standardSQL
WITH `project.dataset.table` AS (
  SELECT DATE '2020-01-01' date , 1 user, 'event1' events, 'active' day_activity UNION ALL
  SELECT '2020-01-01', 2, 'event1', 'inactive' UNION ALL
  SELECT '2020-01-02', 1, NULL, 'inactive' UNION ALL
  SELECT '2020-01-02', 2, NULL, 'active' UNION ALL
  SELECT '2020-01-03', 1, NULL, 'inactive' UNION ALL
  SELECT '2020-01-03', 2, NULL, 'active' UNION ALL
  SELECT '2020-01-04', 1, NULL, 'active' UNION ALL
  SELECT '2020-01-04', 1, NULL, 'active' UNION ALL
  SELECT '2020-01-04', 2, NULL, 'active' 
)
SELECT *, 
  IF(events IS NULL, 0, COUNTIF(day_activity = 'active') OVER(three_day_activity_window)) AS three_day_activity
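FROM (
  -- Collapse to one row per user and day before applying the window.
  -- MAX(events) ignores NULLs, so an event on that day is kept.
  -- MIN(day_activity) yields 'active' if any row is active ('active' sorts before 'inactive').
  SELECT date, user, MAX(events) events, MIN(day_activity) day_activity
  FROM `project.dataset.table` 
  GROUP BY date, user
)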
WINDOW three_day_activity_window AS (
  PARTITION BY user 
  ORDER BY UNIX_DATE(date) 
  RANGE BETWEEN 1 FOLLOWING AND 3 FOLLOWING
)
ORDER BY date, user
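
The effect of the extra GROUP BY step can be checked with a small Python sketch that counts the following three days once on the raw rows and once on the collapsed rows. It mirrors the query's logic rather than BigQuery's execution, and the helper names dedupe and count_active_following are made up for illustration.

```python
from datetime import date

# Sample rows from the adjusted query: user 1 has two 'active' rows on 2020-01-04.
rows = [
    (date(2020, 1, 1), 1, "event1", "active"),
    (date(2020, 1, 1), 2, "event1", "inactive"),
    (date(2020, 1, 2), 1, None, "inactive"),
    (date(2020, 1, 2), 2, None, "active"),
    (date(2020, 1, 3), 1, None, "inactive"),
    (date(2020, 1, 3), 2, None, "active"),
    (date(2020, 1, 4), 1, None, "active"),
    (date(2020, 1, 4), 1, None, "active"),
    (date(2020, 1, 4), 2, None, "active"),
]


def dedupe(rows):
    """Emulate GROUP BY date, user with MAX(events) and MIN(day_activity)."""
    grouped = {}
    for day, user, events, activity in rows:
        key = (day, user)
        if key not in grouped:
            grouped[key] = (events, activity)
            continue
        prev_events, prev_activity = grouped[key]
        # MAX ignores NULLs, so any event name wins over None.
        named = [e for e in (prev_events, events) if e is not None]
        # MIN on strings: 'active' sorts before 'inactive'.
        grouped[key] = (max(named) if named else None, min(prev_activity, activity))
    return [(day, user, ev, act) for (day, user), (ev, act) in grouped.items()]


def count_active_following(rows, day, user):
    """Count the user's 'active' rows dated 1 to 3 days after `day`."""
    return sum(
        1
        for d, u, _, activity in rows
        if u == user and 1 <= (d - day).days <= 3 and activity == "active"
    )


event_day = date(2020, 1, 1)
print(count_active_following(rows, event_day, 1))          # 2: the duplicate row is counted twice
print(count_active_following(dedupe(rows), event_day, 1))  # 1: one row per user and day
```

Without the collapsing step every duplicate row inside the frame adds to the count, which is one plausible source of inflated totals like the 243 shown in the question.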
