在BigQuery SQL中将表格分割成多个表格 [英] Split a table into multiple tables in BigQuery SQL
问题描述
我想在Google BigQuery中将具有1.2亿行的表拆分为多个大小相等的表.我找到了这个 answer 的问题,该问题询问有关获取表的行号的问题.示例解决方案是:
I would like to split a table with 120 million rows into multiple equally-sized tables in Google BigQuery. I found this answer to a question that asks about getting row numbers for a table. The example solution was:
SELECT
ROW_NUMBER() OVER() row_number,
contributor_username,
FROM (
SELECT contributor_username,
FROM [publicdata:samples.wikipedia]
GROUP BY contributor_username
)
这将允许您获取row_number,然后可以通过手动选择来创建表
which would allow you to get a row_number, and then you could create a table by manually selecting
WHERE row_number BETWEEN x AND y
这适用于示例中的表,但是对于具有117MM行的表,使用 ROW_NUMBER()
会产生资源超出错误.
This works for the table in the example, but using ROW_NUMBER()
yields a resources exceeded error for a table with 117MM rows.
然后我尝试了以下想法:
I then tried the following idea:
SELECT field1, field2
FROM (
SELECT field1, field2, ntile(100) over (order by row_id ) as ntile
FROM (
SELECT hash(some_unique_key) as row_id, field1, field2
FROM table_with_177_MM_rows
)
)
WHERE ntile = 1
然后我需要将其运行为ntile = [1,...,100].这也失败了.有什么方法可以将GBQ中具有唯一键的大表拆分成较小的表?
I would then need to run this for ntile=[1,...,100]. This also failed. Is there any way to split a large table with a unique key in GBQ into smaller tables?
推荐答案
类似于上面的第二个示例,但不使用ntile窗口函数:
Similar to my second example above, but without using the ntile window function:
SELECT field1, field2
FROM (
SELECT hash(some_unique_key) as row_id, field1, field2
FROM table_with_177_MM_rows
)
WHERE abs(row_id) % {n_tables} = {table_id}
对于 n_tables = 10
,您将运行 table_id = {0,1,2,... 9}
来获取每个表.也许有更好的方法?
For n_tables = 10
, you would run for table_id={0,1,2,...9}
to get each table. Maybe there's a better way?
这篇关于在BigQuery SQL中将表格分割成多个表格的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!