BigQuery和Google Analytics(分析)SQL查询 [英] BigQuery and Google Analytics SQL query

查看:58
本文介绍了BigQuery和Google Analytics(分析)SQL查询的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试从从Google Analytics(分析)数据导入BigQuery的表中构建矩阵.该表表示网站上的点击量,其中包含session_ID和一些属性(例如url,时间戳等).此外,还有一些基于用户定义操作的元数据,我们将其称为事件.下面是表格的示例.

I am trying to build up a matrix out of a table that is imported from Google Analytics data into BigQuery. The table represents hits on a website that contain session_IDs alongside some properties such as the url, timestamp etc. Also, there are some metadata based on user-defined actions that we refer to as events. Below is an example of the table.

session_id  hit_timestamp   url event_category
1           11:12:23        url134      event1
1           11:14:23        url2234     event2
1           11:16:23        url_target  null
2           03:12:11        url2344     event1
2           03:14:11        url43245    event2
3           09:10:11        url5533     event2
3           09:09:11        url_target  null
4           08:08:08        url64356    event2
4           08:09:08        url56456    event2
4           08:10:08        url_target  null

预期结果应该类似于下表.

The intended result should be something like the below table.

session_id  event1  event2  target
1           1       1       1
2           0       0       0
3           0       0       0
4           0       2       1

请注意,任何不会导致url_target的事件都应表示为零,包括目标.这意味着查询应调查时间戳记,以通过查看事件时间戳记来检查是否有url_target跟随任何事件.例如,event2后没有"url_target",这就是为什么我们将其表示为零.session_id 3中的情况相同,因为event2之后没有url_target,请注意url_target的时间戳记在event2之前,而不是在event2之后.因此表示为零.

Note that any event does not lead to url_target should be denoted as zeros including the target. This means the query should look into timestamp to check that any events are followed by url_target by looking into their timestamp. For example, event2 was not followed by "url_target", that is why we are denoting it as zeros. Same case in session_id 3, as event2 was not followed by url_target, note the timestamp of url_target which was before event2, not after it. Hence denoted as zeros.

在构建用于生成该矩阵的SQL查询方面的任何帮助,我将不胜感激.我只能按session_id分组,然后使用"count"执行计数事件,但无法找到写入SQL查询以与时间戳进行匹配并检查其他字段.

I would appreciate any help in constructing the SQL query to produce that matrix. I was only able to group by session_id and then perform counting events using "count", but was not able to find the write SQL query to match against timestamp and check other fields.

推荐答案

使用子查询来计算第一个(或最后一个)目标时间.然后使用 countif()和聚合:

Use a subquery to calculate the first (or last) target time. Then use countif() and aggregation:

select session_id,
       countif(target_hit_timestamp > hit_timestamp and category = 'event1') as event1,
       countif(target_hit_timestamp > hit_timestamp and category = 'event2') as event2,
       countif(url like '%target') as target
from (select t.*,
             min(case when url like '%target' then hit_timestamp end) over (partition by session_id) as target_hit_timestamp
      from t
     ) t
group by session_id

这篇关于BigQuery和Google Analytics(分析)SQL查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆