BigQuery和Google Analytics(分析)SQL查询 [英] BigQuery and Google Analytics SQL query
问题描述
我正在尝试从从Google Analytics(分析)数据导入BigQuery的表中构建矩阵.该表表示网站上的点击量,其中包含session_ID和一些属性(例如url,时间戳等).此外,还有一些基于用户定义操作的元数据,我们将其称为事件.下面是表格的示例.
I am trying to build up a matrix out of a table that is imported from Google Analytics data into BigQuery. The table represents hits on a website that contain session_IDs alongside some properties such as the url, timestamp etc. Also, there are some metadata based on user-defined actions that we refer to as events. Below is an example of the table.
session_id hit_timestamp url event_category
1 11:12:23 url134 event1
1 11:14:23 url2234 event2
1 11:16:23 url_target null
2 03:12:11 url2344 event1
2 03:14:11 url43245 event2
3 09:10:11 url5533 event2
3 09:09:11 url_target null
4 08:08:08 url64356 event2
4 08:09:08 url56456 event2
4 08:10:08 url_target null
预期结果应该类似于下表.
The intended result should be something like the below table.
session_id event1 event2 target
1 1 1 1
2 0 0 0
3 0 0 0
4 0 2 1
请注意,任何不会导致url_target的事件都应表示为零,包括目标.这意味着查询应调查时间戳记,以通过查看事件时间戳记来检查是否有url_target跟随任何事件.例如,event2后没有"url_target",这就是为什么我们将其表示为零.session_id 3中的情况相同,因为event2之后没有url_target,请注意url_target的时间戳记在event2之前,而不是在event2之后.因此表示为零.
Note that any event does not lead to url_target should be denoted as zeros including the target. This means the query should look into timestamp to check that any events are followed by url_target by looking into their timestamp. For example, event2 was not followed by "url_target", that is why we are denoting it as zeros. Same case in session_id 3, as event2 was not followed by url_target, note the timestamp of url_target which was before event2, not after it. Hence denoted as zeros.
在构建用于生成该矩阵的SQL查询方面的任何帮助,我将不胜感激.我只能按session_id分组,然后使用"count"执行计数事件,但无法找到写入SQL查询以与时间戳进行匹配并检查其他字段.
I would appreciate any help in constructing the SQL query to produce that matrix. I was only able to group by session_id and then perform counting events using "count", but was not able to find the write SQL query to match against timestamp and check other fields.
推荐答案
使用子查询来计算第一个(或最后一个)目标时间.然后使用 countif()
和聚合:
Use a subquery to calculate the first (or last) target time. Then use countif()
and aggregation:
select session_id,
countif(target_hit_timestamp > hit_timestamp and category = 'event1') as event1,
countif(target_hit_timestamp > hit_timestamp and category = 'event2') as event2,
countif(url like '%target') as target
from (select t.*,
min(case when url like '%target' then hit_timestamp end) over (partition by session_id) as target_hit_timestamp
from t
) t
group by session_id
这篇关于BigQuery和Google Analytics(分析)SQL查询的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!