BigQuery连接了三个表 [英] BigQuery join of three tables
问题描述
我试图在BigQuery中加入三张表格;表1具有一个事件的记录(即每行是一条记录),表2具有第二个事件的记录,表3具有类别名称
我想根据类别和设备平台生成一个表格1和表格2的最终表格。但是,每次运行时,都会收到一个错误消息,指出 joined.t3.category不是联接中的任何表的字段。
以下是我当前的代码:
Select count(distinct joined.t1.Id)as t1_events,count(distinct t2.Id) as t2_events,joined.t1.Origin as platform,joined.t3.category as category
$ b from
(
SELECT
Id ,
Origin,
CatId
FROM [testing.table_1] as t1
JOIN(SELECT类别,
类别ID
FROM [testing.table_3])as t3
on t1.CatId = t3.CategoryID
)AS加入
JOIN( SELECT b,
CategoryId
FROM [testing.table_2])as t2
ON(joined.t1.CatId = t2.CategoryId)
按平台,类别分组;
作为参考,下面是表1和表2之间的简单连接,它们完美地工作:
Select count(distinct t1.Id)as t1_event,count(distinct t2.Id)as t2_events,t1.Origin as platform
$来自testing.table_1的b $ b作为t1
在t1.CatId = t2.CategoryId
上JOIN testing.table_2作为t2
按平台分组;
简单的解决方法是添加 category
字段在第一个内部 SELECT
中 - 否则对最外层 SELECT
不可见 - - 因此错误!这是问题!
另外,在BigQuery Legacy SQL中,您可以使用 EXACT_COUNT_DISTINCT ,否则您会得到统计学近似值 - 请参阅 COUNT([DISTINCT ])
因此,对于Legacy SQL,您的查询可能如下所示:
SELECT
EXACT_COUNT_DISTINCT(joined.t1.Id)AS t1_events,
EXACT_COUNT_DISTINCT(t2.Id)AS t2_events,
joined.t1.Origin AS平台
joined.t3.category AS类别
FROM(
SELECT
Id,Origin,CatId,类别
FROM [testing.table_1] AS t1
JOIN(SELECT类别,CategoryID FROM [testing.table_3])AS t3
ON t1.CatId = t3.CategoryID
)AS加入
JOIN(SELECT Id,CategoryId FROM [testing.table_2])AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY平台类别
此外,我觉得你可以进一步简化它(假设没有不明确的字段)
SELECT
EXACT_COUNT_DISTINCT(joined.t1.Id)AS t1_events,
EXACT_COUNT_DISTINCT(t2.Id)AS t2_events,
joined.t1.Origin AS平台,
joined.t3.category AS类别
FROM(
SELECT
Id,Origin,CatId,类别
FROM [testing.table_1] AS t1
JOIN [testing.table_3] AS t3
ON t1.CatId = t3.CategoryID
)AS加入
JOIN [testing.table_2] AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY平台,类别
当然,您需要做同样的事情如果您使用标准SQL版本(如Elliott所建议的那样:
SELECT
COUNT(DISTINCT joined。 t1.Id)AS t1_events,
COUNT(DISTINCT t2.Id)AS t2_events,
joined.t1.Origin AS平台,
joined.t3.category AS类别
FROM(
SELEC T
Id,Origin,CatId,类别
FROM`testing.table_1`作为t1
JOIN`testing.table_3`作为t3
ON t1.CatId = t3.CategoryID
)AS加入
JOIN`testing.table_2`作为t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY平台,类别
I am trying to join three tables in BigQuery; table 1 has records of one event (i.e. each row is one record), table 2 has records of a second event, and table 3 has category names.
I want to produce a final table that has counts for table 1 and table 2 by category and device platform. However, every time I run this I get an error that says joined.t3.category is not a field of either table in the join.
Here's my current code:
Select count(distinct joined.t1.Id) as t1_events, count(distinct t2.Id) as t2_events, joined.t1.Origin as platform, joined.t3.category as category
from
(
SELECT
Id,
Origin,
CatId
FROM [testing.table_1] as t1
JOIN (SELECT category,
CategoryID
FROM [testing.table_3]) as t3
on t1.CatId = t3.CategoryID
) AS joined
JOIN (SELECT Id,
CategoryId
FROM [testing.table_2]) as t2
ON (joined.t1.CatId = t2.CategoryId)
Group by platform,category;
For reference, here's a simpler join between tables 1 and 2 that works perfectly:
Select count(distinct t1.Id) as t1_event, count(distinct t2.Id) as t2_events, t1.Origin as platform
from testing.table_1 as t1
JOIN testing.table_2 as t2
on t1.CatId = t2.CategoryId
Group by platform;
The simple fix is to add category
field in first inner SELECT
- otherwise it is not visible to outermost SELECT
- thus the error! That was the issue!
Also, in BigQuery Legacy SQL you can use EXACT_COUNT_DISTINCT otherwise you get statistical approximation - see more in COUNT([DISTINCT])
So, for Legacy SQL your query can look like:
SELECT
EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
joined.t1.Origin AS platform,
joined.t3.category AS category
FROM (
SELECT
Id, Origin, CatId, category
FROM [testing.table_1] AS t1
JOIN (SELECT category, CategoryID FROM [testing.table_3]) AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN (SELECT Id, CategoryId FROM [testing.table_2]) AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category
Moreover, I feel like you can simplify it further (assuming there will be no ambiguous fields)
SELECT
EXACT_COUNT_DISTINCT(joined.t1.Id) AS t1_events,
EXACT_COUNT_DISTINCT(t2.Id) AS t2_events,
joined.t1.Origin AS platform,
joined.t3.category AS category
FROM (
SELECT
Id, Origin, CatId, category
FROM [testing.table_1] AS t1
JOIN [testing.table_3] AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN [testing.table_2] AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category
Of course you will need to do same fix if you will use Standard SQL version of it (as Elliott has suggested:
SELECT
COUNT(DISTINCT joined.t1.Id) AS t1_events,
COUNT(DISTINCT t2.Id) AS t2_events,
joined.t1.Origin AS platform,
joined.t3.category AS category
FROM (
SELECT
Id, Origin, CatId, category
FROM `testing.table_1` AS t1
JOIN `testing.table_3` AS t3
ON t1.CatId = t3.CategoryID
) AS joined
JOIN `testing.table_2` AS t2
ON joined.t1.CatId = t2.CategoryId
GROUP BY platform, category
这篇关于BigQuery连接了三个表的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!