Joining 2 equally sized tables in Google BigQuery


Question


I am trying to join 2 tables, each with 57,191 rows. BigQ is looking for a larger table on the inner/left and smaller on the right. When I run it with Table B on the left, it errors as 'The large table A must appear first'. When I switch the query and put Table A in the From clause, it errors as 'The large table B must appear first'. So when I do as it instructs, it does not fix it but suggests my first (incorrect) attempt, unless I am somehow botching it.

It is a bit ironic that when the 2 tables are the same size, it decides one is larger, presumably on the basis that it is not smaller than the other. I am trying to find a solution that does not involve adding a meaningless row to one of the tables and then deleting it after the join works (BigQ is not loading my single-row CSV file right now, which I am sure is due to my error).

The Google SQL syntax join rule seems to be:

"join_type: BigQuery supports INNER (the default) and LEFT OUTER joins.

table_2: This is the second table in the join, which must be small, and will be joined to the table appearing in the FROM clause. Note that this can be either a table name or another SELECT clause, in which case you must provide an alias.

join_condition_1, ..., join_condition_N, ...: The set of join conditions, which must be a collection of equality conditions, all of which must be met for a row to be included in the result. (That is, we only support connecting these conditions with AND.)"

The actual SQL I am running is

SELECT lt.activeprosperloans,[fieldsredacted], ...
FROM prosperloans1.listings2 AS lt
JOIN prosperloans1.zjoinedperfloans as ln
ON lt.key = listingkey;

and the actual error reads: Error: Large table prosperloans1.zjoinedperfloans must appear as the leftmost table in a join query

Thanks Shawn

Solution

Since this question was answered, BigQuery added JOIN EACH, which is a way to join two large tables. See Fh's answer for instructions on how to use JOIN EACH.
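As a rough sketch of what that looks like, the query from the question could be rewritten with JOIN EACH in BigQuery's legacy SQL dialect. The select list is shortened here, and qualifying listingkey with the ln alias is an assumption, since the original query leaves it unqualified.

```sql
-- Legacy SQL sketch: JOIN EACH lifts the "small right-hand table" restriction.
-- Assumption: listingkey is a column of zjoinedperfloans (alias ln).
SELECT lt.activeprosperloans
FROM prosperloans1.listings2 AS lt
JOIN EACH prosperloans1.zjoinedperfloans AS ln
ON lt.key = ln.listingkey;
```

With JOIN EACH, neither table has to be sent in full to every node, so it no longer matters which of the two tables appears first.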

The rest of this response is for historical purposes: A large table (for the purpose of join) is anything over 7 MB. In order to do a join, the entire small table is sent to every node in the cluster, so we place a pretty significant limit on it. It may be that despite both being the same number of rows, the one table is larger than 7 MB while the other is smaller.

One way to reduce the size of one of the tables is to apply filters and column filters in a query and save the result as another temporary table, then apply the JOIN to the temporary table. E.g. if you have 10 columns in a table that spans a month worth of data but you only need 3 columns for the join query and the last day's data, you can first just select the three columns and the recent data, and give the result a name. Then you can do the join against that table.
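The two steps described above might look roughly like the following. This is only a sketch: the column names loanstatus and origination_date, the date cutoff, and the destination table name perfloans_small are hypothetical, and the first query is assumed to be run separately with its result saved as a table.

```sql
-- Step 1 (run on its own, saving the result as prosperloans1.perfloans_small):
-- keep only the columns and rows the join actually needs.
-- loanstatus and origination_date are hypothetical column names.
SELECT listingkey, loanstatus
FROM prosperloans1.zjoinedperfloans
WHERE origination_date >= '2012-01-01';

-- Step 2: join against the reduced table, which now goes on the right.
SELECT lt.activeprosperloans, ln.loanstatus
FROM prosperloans1.listings2 AS lt
JOIN prosperloans1.perfloans_small AS ln
ON lt.key = ln.listingkey;
```

Whether the reduced table ends up under the size limit depends on how much the column and row filters actually remove, so the cutoff may need adjusting.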
