了解自我连接和扁平化 [英] Understanding self joins and flattening
问题描述
我将从我是新手的事实开始,并且设法一起破解这个原始查询。我已经浏览了很多例子,但我只是没有把自己的头围绕自我连接并显示我想看到的数据。
我正在给BQ喂食手机应用程序数据每天,因此正在查询多个表。我试图通过日期查询IMEI发生致命事故的计数。这个查询的确给了我大部分我想要的输出,因为它返回了Date,IMEI和Count。
但是,我希望输出是Date,IMEI,Branch,Truck和伯爵。 user_dim.user_properties.key是一个嵌套字段,在我的查询中,我特别要求user_dim.user_properties.key ='imei_id',并在user_dim.user_properties.value.value.string_value中获取它的值。
我不明白我将如何执行连接以获取user_dim.user_properties.key ='truck_id'和user_dim.user_properties.key ='的值branch_id'并最终让我的输出成为:Date,IMEI,Branch,Truck和Count在一行中。
感谢您的帮助。
SELECT
event_dim .date AS Date,
user_dim.user_properties.value.value.string_value AS IMEI,
COUNT(*)AS计数
FROM
FLATTEN((
SELECT
FROM
TABLE_QUERY([smarttruck-6d137:com_usiinc_android_ANDROID],'table_id CONTAINSapp_events_')),user_dim.user_properties)
WHERE
user_dim.user_properties.key ='imei_id'
AND event_dim.name ='app_exception'
AND event_dim.params.key ='fatal'
AND event_dim.params.value.int_value = 1
AND event_dim .date ='20170807'
GROUP BY
日期,
IMEI
ORDER BY
计数DESC
这是一个适合您的查询,使用标准SQL :
# standardSQL
SEL ECT
event_dim.date AS日期,
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key ='imei_id')AS IMEI,
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key ='branch_id')AS branch_id,
(SELECT value.value.string_value
FROM UNNEST (user_dim.user_properties)
WHERE key ='truck_id')AS truck_id,
COUNT(*)AS计数
从`smarttruck-6d137.com_usiinc_android_ANDROID.app_events_ *`
CROSS JOIN UNNEST(event_dim)AS event_dim
WHERE
event_dim.name ='app_exception'AND
EXISTS(
SELECT 1 FROM UNNEST(event_dim.params)
WHERE key ='致命'AND value.int_value = 1
)AND
event_dim.date ='20170807'
GROUP BY
日期,
IMEI,
branch_id,
truck_id
ORDER BY
计数DESC;
有一些想法/建议,但:
- 要限制扫描的数据量,您可能需要过滤
_TABLE_SUFFIX ='20170807'
而不是event_dim.date ='20170807'
。这将更便宜,并且(如果我理解正确)将返回相同的结果。 - 如果IMEI,branch_id和truck_id的组合是唯一的,则计算可能没有好处您可以删除
COUNT(*)
以及GROUP BY
/ORDER BY
子句。
I'll start with the fact that I'm a newbie and I managed to hack this original query together. I've looked over many examples but I'm just not wrapping my head around self joins and displaying the data I want to see.
I'm feeding BQ with mobile app data daily and thus am querying multiple tables. I'm trying to query for a count of fatal crashes by IMEI by date. This query does give me most of the output I want as it returns Date, IMEI and Count.
However, I want the output to be Date, IMEI, Branch, Truck and Count. user_dim.user_properties.key is a nested field and in my query I'm specifically asking for user_dim.user_properties.key = 'imei_id' and getting it's value in user_dim.user_properties.value.value.string_value.
I don't understand how I would perform the join to also get back the values where user_dim.user_properties.key = 'truck_id' and user_dim.user_properties.key = 'branch_id' and ultimately getting my output to be: Date, IMEI, Branch, Truck and Count in one row.
Thanks for your help.
SELECT
event_dim.date AS Date,
user_dim.user_properties.value.value.string_value AS IMEI,
COUNT(*) AS Count
FROM
FLATTEN( (
SELECT
*
FROM
TABLE_QUERY([smarttruck-6d137:com_usiinc_android_ANDROID],'table_id CONTAINS "app_events_"')), user_dim.user_properties)
WHERE
user_dim.user_properties.key = 'imei_id'
AND event_dim.name = 'app_exception'
AND event_dim.params.key = 'fatal'
AND event_dim.params.value.int_value = 1
AND event_dim.date = '20170807'
GROUP BY
Date,
IMEI
ORDER BY
Count DESC
Here is a query that should work for you, using standard SQL:
#standardSQL
SELECT
event_dim.date AS Date,
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'imei_id') AS IMEI,
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'branch_id') AS branch_id,
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key = 'truck_id') AS truck_id,
COUNT(*) AS Count
FROM `smarttruck-6d137.com_usiinc_android_ANDROID.app_events_*`
CROSS JOIN UNNEST(event_dim) AS event_dim
WHERE
event_dim.name = 'app_exception' AND
EXISTS (
SELECT 1 FROM UNNEST(event_dim.params)
WHERE key = 'fatal' AND value.int_value = 1
) AND
event_dim.date = '20170807'
GROUP BY
Date,
IMEI,
branch_id,
truck_id
ORDER BY
Count DESC;
A couple of thoughts/suggestions, though:
- To restrict how much data you scan, you probably want to filter on
_TABLE_SUFFIX = '20170807'
instead ofevent_dim.date = '20170807'
. This will be cheaper and (if I understand correctly) will return the same results. - If the combinations of IMEI, branch_id, and truck_id are unique, there probably isn't a benefit to computing the count, so you can remove the
COUNT(*)
and also theGROUP BY
/ORDER BY
clauses.
这篇关于了解自我连接和扁平化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!