BigQuery: Union two different tables which are based on federated Google Spreadsheet

Question

I have two different Google Spreadsheets:

One with 4 columns

+------+------+------+------+
| Col1 | Col2 | Col5 | Col6 |
+------+------+------+------+
| ID1  | A    | B    | C    |
| ID2  | D    | E    | F    |
+------+------+------+------+

The other has the same 4 columns as the first file, plus 2 more columns

+------+------+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
+------+------+------+------+------+------+
| ID3  | G    | H    | J    | K    | L    |
| ID4  | M    | N    | O    | P    | Q    |
+------+------+------+------+------+------+

I configured them as federated sources in Google BigQuery; now I need to create a view that combines the data of both tables.

Both tables have a Col1 column, which contains an ID. This ID is unique across all the tables, so there is no duplicated data.

The resulting table I'm looking for is the following one:

+------+------+------+------+------+------+
| Col1 | Col2 | Col3 | Col4 | Col5 | Col6 |
+------+------+------+------+------+------+
| ID1  | A    | NULL | NULL | B    | C    |
| ID2  | D    | NULL | NULL | E    | F    |
| ID3  | G    | H    | J    | K    | L    |
| ID4  | M    | N    | O    | P    | Q    |
+------+------+------+------+------+------+

For the columns that the first file does not have, I'm expecting a NULL value.

I'm using standard SQL; here is a statement you can use to generate sample data:

#standardSQL

WITH table1 AS (
  SELECT "A" as Col1, "B" as Col2, "C" AS Col3
  UNION ALL
  SELECT "D" as Col1, "E" as Col2, "F" AS Col3
),

table2 AS (
  SELECT "G" as Col1, "H" as Col2, "J" AS Col3, "K" AS Col4, "L" AS Col5
  UNION ALL
  SELECT "M" as Col1, "N" as Col2, "O" AS Col3, "P" AS Col4, "Q" AS Col5
)

A simple UNION ALL does not work because the tables have different numbers of columns:

SELECT * FROM table1
UNION ALL
SELECT * FROM table2

Error: Queries in UNION ALL have mismatched column count; query 1 has 3 columns, query 2 has 5 columns at [17:1]

The wildcard operator is not suitable either, because federated sources do not support it:

SELECT * FROM `table*`

Error: External tables cannot be queried through prefix

Of course this is only sample data with 3-5 columns; the real tables have 20-40 columns, so a solution where I need to explicitly SELECT field by field is not a viable approach.

Is there a working way to combine these two tables?

Solution

"Is there a working way to join these two tables?"

#standardSQL
SELECT *, NULL AS Col5, NULL AS Col6 FROM table1
UNION ALL
SELECT * FROM table2  

You can check this using your example:

#standardSQL
WITH table1 AS (
  SELECT "ID1" AS Col1, "A" AS Col2, "B" AS Col3, "C" AS Col4 
  UNION ALL
  SELECT "ID2", "D", "E", "F"
),
table2 AS (
  SELECT "ID3" AS Col1, "G" AS Col2, "H" AS Col3, "J" AS Col4, "K" AS Col5, "L" AS Col6
  UNION ALL
  SELECT "ID4", "M", "N", "O", "P", "Q" 
)
SELECT *, NULL AS Col5, NULL AS Col6 FROM table1
UNION ALL
SELECT * FROM table2
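
One caveat worth checking against the real sheets: UNION ALL pairs columns by position rather than by name, and the output column names come from the first SELECT. In the spreadsheets described in the question the narrower table holds Col1, Col2, Col5 and Col6, so NULLs appended at the end would line B and C up with Col3 and Col4 of the wider table. A minimal sketch of one way around this, assuming the missing columns sit between Col2 and Col5 and that all columns are strings, is to name only the columns before the gap and let `* EXCEPT` carry the rest. The CTEs below are made-up sample data that mirror the question's tables, not the actual federated tables:

```sql
#standardSQL
WITH table1 AS (
  -- Sample rows shaped like the 4-column spreadsheet (assumed STRING columns)
  SELECT "ID1" AS Col1, "A" AS Col2, "B" AS Col5, "C" AS Col6
  UNION ALL
  SELECT "ID2", "D", "E", "F"
),
table2 AS (
  -- Sample rows shaped like the 6-column spreadsheet
  SELECT "ID3" AS Col1, "G" AS Col2, "H" AS Col3, "J" AS Col4, "K" AS Col5, "L" AS Col6
  UNION ALL
  SELECT "ID4", "M", "N", "O", "P", "Q"
)
-- List only the columns that come before the gap, add typed NULLs for the
-- missing columns, then let * EXCEPT append every remaining column in order.
SELECT
  Col1,
  Col2,
  CAST(NULL AS STRING) AS Col3,
  CAST(NULL AS STRING) AS Col4,
  * EXCEPT (Col1, Col2)
FROM table1
UNION ALL
SELECT * FROM table2
```

With 20-40 columns, only the few columns ahead of the gap need to be written out; everything after it is picked up by `* EXCEPT`. Row order is not guaranteed without an ORDER BY, and the CAST type should be adjusted if the real Col3 and Col4 are not strings.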
