了解自我连接和扁平化 [英] Understanding self joins and flattening

查看:132
本文介绍了了解自我连接和扁平化的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我将从我是新手的事实开始,并且设法一起破解这个原始查询。我已经浏览了很多例子,但我只是没有把自己的头围绕自我连接并显示我想看到的数据。



我正在给BQ喂食手机应用程序数据每天,因此正在查询多个表。我试图通过日期查询IMEI发生致命事故的计数。这个查询的确给了我大部分我想要的输出,因为它返回了Date,IMEI和Count。



但是,我希望输出是Date,IMEI,Branch,Truck和伯爵。 user_dim.user_properties.key是一个嵌套字段,在我的查询中,我特别要求user_dim.user_properties.key ='imei_id',并在user_dim.user_properties.value.value.string_value中获取它的值。



我不明白我将如何执行连接以获取user_dim.user_properties.key ='truck_id'和user_dim.user_properties.key ='的值branch_id'并最终让我的输出成为:Date,IMEI,Branch,Truck和Count在一行中。



感谢您的帮助。

  SELECT 
event_dim .date AS Date,
user_dim.user_properties.value.value.string_value AS IMEI,
COUNT(*)AS计数
FROM
FLATTEN((
SELECT

FROM
TABLE_QUERY([smarttruck-6d137:com_usiinc_android_ANDROID],'table_id CONTAINSapp_events_')),user_dim.user_properties)
WHERE
user_dim.user_properties.key ='imei_id'
AND event_dim.name ='app_exception'
AND event_dim.params.key ='fatal'
AND event_dim.params.value.int_value = 1
AND event_dim .date ='20170807'
GROUP BY
日期,
IMEI
ORDER BY
计数DESC


解决方案

这是一个适合您的查询,使用标准SQL

 # standardSQL 
SEL ECT
event_dim.date AS日期,
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key ='imei_id')AS IMEI,
(SELECT value.value.string_value
FROM UNNEST(user_dim.user_properties)
WHERE key ='branch_id')AS branch_id,
(SELECT value.value.string_value
FROM UNNEST (user_dim.user_properties)
WHERE key ='truck_id')AS truck_id,
COUNT(*)AS计数
从`smarttruck-6d137.com_usiinc_android_ANDROID.app_events_ *`
CROSS JOIN UNNEST(event_dim)AS event_dim
WHERE
event_dim.name ='app_exception'AND
EXISTS(
SELECT 1 FROM UNNEST(event_dim.params)
WHERE key ='致命'AND value.int_value = 1
)AND
event_dim.date ='20170807'
GROUP BY
日期,
IMEI,
branch_id,
truck_id
ORDER BY
计数DESC;

有一些想法/建议,但:


  • 要限制扫描的数据量,您可能需要过滤 _TABLE_SUFFIX ='20170807'而不是 event_dim.date ='20170807'。这将更便宜,并且(如果我理解正确)将返回相同的结果。

  • 如果IMEI,branch_id和truck_id的组合是唯一的,则计算可能没有好处您可以删除 COUNT(*)以及 GROUP BY / ORDER BY 子句。


I'll start with the fact that I'm a newbie and I managed to hack this original query together. I've looked over many examples but I'm just not wrapping my head around self joins and displaying the data I want to see.

I'm feeding BQ with mobile app data daily and thus am querying multiple tables. I'm trying to query for a count of fatal crashes by IMEI by date. This query does give me most of the output I want as it returns Date, IMEI and Count.

However, I want the output to be Date, IMEI, Branch, Truck and Count. user_dim.user_properties.key is a nested field and in my query I'm specifically asking for user_dim.user_properties.key = 'imei_id' and getting it's value in user_dim.user_properties.value.value.string_value.

I don't understand how I would perform the join to also get back the values where user_dim.user_properties.key = 'truck_id' and user_dim.user_properties.key = 'branch_id' and ultimately getting my output to be: Date, IMEI, Branch, Truck and Count in one row.

Thanks for your help.

SELECT
  event_dim.date AS Date,
  user_dim.user_properties.value.value.string_value AS IMEI,
COUNT(*) AS Count
FROM
    FLATTEN( (
    SELECT
      *
    FROM
  TABLE_QUERY([smarttruck-6d137:com_usiinc_android_ANDROID],'table_id CONTAINS "app_events_"')), user_dim.user_properties)
WHERE
  user_dim.user_properties.key = 'imei_id'
  AND event_dim.name = 'app_exception'
  AND event_dim.params.key = 'fatal'
  AND event_dim.params.value.int_value = 1
  AND event_dim.date = '20170807'
GROUP BY
  Date,
  IMEI
ORDER BY
  Count DESC

解决方案

Here is a query that should work for you, using standard SQL:

#standardSQL
SELECT
  event_dim.date AS Date,
  (SELECT value.value.string_value
   FROM UNNEST(user_dim.user_properties)
   WHERE key = 'imei_id') AS IMEI,
  (SELECT value.value.string_value
   FROM UNNEST(user_dim.user_properties)
   WHERE key = 'branch_id') AS branch_id,
  (SELECT value.value.string_value
   FROM UNNEST(user_dim.user_properties)
   WHERE key = 'truck_id') AS truck_id,
  COUNT(*) AS Count
FROM `smarttruck-6d137.com_usiinc_android_ANDROID.app_events_*`
CROSS JOIN UNNEST(event_dim) AS event_dim
WHERE
  event_dim.name = 'app_exception' AND
  EXISTS (
    SELECT 1 FROM UNNEST(event_dim.params)
    WHERE key = 'fatal' AND value.int_value = 1
  ) AND
  event_dim.date = '20170807'
GROUP BY
  Date,
  IMEI,
  branch_id,
  truck_id
ORDER BY
  Count DESC;

A couple of thoughts/suggestions, though:

  • To restrict how much data you scan, you probably want to filter on _TABLE_SUFFIX = '20170807' instead of event_dim.date = '20170807'. This will be cheaper and (if I understand correctly) will return the same results.
  • If the combinations of IMEI, branch_id, and truck_id are unique, there probably isn't a benefit to computing the count, so you can remove the COUNT(*) and also the GROUP BY/ORDER BY clauses.

这篇关于了解自我连接和扁平化的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆