BigQuery连接了三个表 [英] BigQuery join of three tables

查看:91
本文介绍了BigQuery连接了三个表的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我试图在BigQuery中加入三张表格;表1具有一个事件的记录(即每行是一条记录),表2具有第二个事件的记录,表3具有类别名称



我想根据类别和设备平台生成一个表格1和表格2的最终表格。但是,每次运行时,都会收到一个错误消息,指出 joined.t3.category不是联接中的任何表的字段。



以下是我当前的代码:

  Select count(distinct joined.t1.Id)as t1_events,count(distinct t2.Id) as t2_events,joined.t1.Origin as platform,joined.t3.category as category 
$ b from



SELECT
Id ,
Origin,
CatId

FROM [testing.table_1] as t1

JOIN(SELECT类别,
类别ID

FROM [testing.table_3])as t3

on t1.CatId = t3.CategoryID

)AS加入

JOIN( SELECT b,
CategoryId

FROM [testing.table_2])as t2

ON(joined.t1.CatId = t2.CategoryId)

按平台,类别分组;

作为参考,下面是表1和表2之间的简单连接,它们完美地工作:

  Select count(distinct t1.Id)as t1_event,count(distinct t2.Id)as t2_events,t1.Origin as platform 
$来自testing.table_1的b $ b作为t1

在t1.CatId = t2.CategoryId

上JOIN testing.table_2作为t2

按平台分组;


解决方案

简单的解决方法是添加 category 字段在第一个内部 SELECT 中 - 否则对最外层 SELECT 不可见 - - 因此错误!这是问题!

另外,在BigQuery Legacy SQL中,您可以使用 EXACT_COUNT_DISTINCT ,否则您会得到统计学近似值 - 请参阅 COUNT([DISTINCT ])



因此,对于Legacy SQL,您的查询可能如下所示:

  SELECT 
EXACT_COUNT_DISTINCT(joined.t1.Id)AS t1_events,
EXACT_COUNT_DISTINCT(t2.Id)AS t2_events,
joined.t1.Origin AS平台
joined.t3.category AS类别
FROM(
SELECT
Id,Origin,CatId,类别
FROM [testing.table_1] AS t1
JOIN(SELECT类别,CategoryID FROM [testing.table_3])AS t3
ON t1.CatId = t3.CategoryID
)AS加入
JOIN(SELECT Id,CategoryId FROM [testing.table_2])AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY平台类别



此外,我觉得你可以进一步简化它(假设没有不明确的字段)

  SELECT 
EXACT_COUNT_DISTINCT(joined.t1.Id)AS t1_events,
EXACT_COUNT_DISTINCT(t2.Id)AS t2_events,
joined.t1.Origin AS平台,
joined.t3.category AS类别
FROM(
SELECT
Id,Origin,CatId,类别
FROM [testing.table_1] AS t1
JOIN [testing.table_3] AS t3
ON t1.CatId = t3.CategoryID
)AS加入
JOIN [testing.table_2] AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY平台,类别

当然,您需要做同样的事情如果您使用标准SQL版本(如Elliott所建议的那样:

  SELECT 
COUNT(DISTINCT joined。 t1.Id)AS t1_events,
COUNT(DISTINCT t2.Id)AS t2_events,
joined.t1.Origin AS平台,
joined.t3.category AS类别
FROM(
SELEC T
Id,Origin,CatId,类别
FROM`testing.table_1`作为t1
JOIN`testing.table_3`作为t3
ON t1.CatId = t3.CategoryID
)AS加入
JOIN`testing.table_2`作为t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY平台,类别


I am trying to join three tables in BigQuery; table 1 has records of one event (i.e. each row is one record), table 2 has records of a second event, and table 3 has category names.

I want to produce a final table that has counts for table 1 and table 2 by category and device platform. However, every time I run this I get an error that says joined.t3.category is not a field of either table in the join.

Here's my current code:

Select count(distinct joined.t1.Id) as t1_events, count(distinct t2.Id) as t2_events, joined.t1.Origin as platform, joined.t3.category as category

from 

(

SELECT 
        Id,
        Origin,
        CatId

    FROM [testing.table_1] as t1

JOIN (SELECT category,
            CategoryID

FROM [testing.table_3]) as t3

on t1.CatId = t3.CategoryID

) AS joined

JOIN (SELECT Id,
            CategoryId

FROM [testing.table_2]) as t2

ON (joined.t1.CatId = t2.CategoryId)    

Group by platform,category;

For reference, here's a simpler join between tables 1 and 2 that works perfectly:

Select count(distinct t1.Id) as t1_event, count(distinct t2.Id) as t2_events, t1.Origin as platform

from testing.table_1 as t1

JOIN testing.table_2 as t2

on t1.CatId = t2.CategoryId

Group by platform;

解决方案

The simple fix is to add category field in first inner SELECT - otherwise it is not visible to outermost SELECT - thus the error! That was the issue!

Also, in BigQuery Legacy SQL you can use EXACT_COUNT_DISTINCT otherwise you get statistical approximation - see more in COUNT([DISTINCT])

So, for Legacy SQL your query can look like:

SELECT
  EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
  EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
  joined.t1.Origin AS platform,
  joined.t3.category AS category
FROM (
  SELECT
    Id, Origin, CatId, category
  FROM [testing.table_1] AS t1
  JOIN (SELECT category, CategoryID FROM [testing.table_3]) AS t3
  ON t1.CatId = t3.CategoryID 
) AS joined
JOIN (SELECT Id, CategoryId FROM [testing.table_2]) AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category

Moreover, I feel like you can simplify it further (assuming there will be no ambiguous fields)

SELECT
  EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
  EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
  joined.t1.Origin AS platform,
  joined.t3.category AS category
FROM (
  SELECT
    Id, Origin, CatId, category
  FROM [testing.table_1] AS t1
  JOIN [testing.table_3] AS t3
  ON t1.CatId = t3.CategoryID 
) AS joined
JOIN [testing.table_2] AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category

Of course you will need to do same fix if you will use Standard SQL version of it (as Elliott has suggested:

SELECT
  COUNT(DISTINCT joined.t1.Id) AS t1_events,
  COUNT(DISTINCT t2.Id) AS t2_events,
  joined.t1.Origin AS platform,
  joined.t3.category AS category
FROM (
  SELECT 
    Id, Origin, CatId, category
  FROM `testing.table_1` AS t1
  JOIN `testing.table_3` AS t3
  ON t1.CatId = t3.CategoryID
) AS joined
JOIN `testing.table_2` AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category 

这篇关于BigQuery连接了三个表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆