What is the max limit of group_concat/string_agg in BigQuery output?
I am using group_concat/string_agg (possibly on a VARCHAR-like STRING column) and want to ensure that BigQuery won't drop any of the concatenated data.
BigQuery will not drop data if a particular query runs out of memory; you will get an error instead. You should try to keep your row sizes below ~100MB, since beyond that you'll start getting errors. You can try creating a large string with an example like this:
#standardSQL
SELECT STRING_AGG(word) AS words FROM `bigquery-public-data.samples.shakespeare`;
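As a side note, if you want a rough sense of how large the concatenated result will be before actually building it, one option (an illustrative sketch, not part of the original answer) is to sum the lengths of the inputs and add one character per gap for the default `,` delimiter:

```sql
#standardSQL
-- Estimate the length of STRING_AGG(word) without materializing the string:
-- total characters across all words, plus one default ',' separator per gap.
SELECT SUM(LENGTH(word)) + COUNT(*) - 1 AS estimated_length
FROM `bigquery-public-data.samples.shakespeare`;
```

If you use a custom delimiter, multiply `COUNT(*) - 1` by the delimiter's length instead.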
There are 164,656 rows in this table, and the STRING_AGG query creates a string with 1,168,286 characters (around a megabyte in size). If a query requires more than on the order of a few hundred megabytes on a single execution node, though, you'll start to see an error:
#standardSQL
SELECT STRING_AGG(CONCAT(word, corpus)) AS words
FROM `bigquery-public-data.samples.shakespeare`
CROSS JOIN UNNEST(GENERATE_ARRAY(1, 1000));
This results in an error:
Resources exceeded during query execution.
If you click on the "Explanation" tab in the UI, you can see that the failure happened during stage 1 while building the results of STRING_AGG. In this case, the string would have been 3,303,599,000 characters long, or approximately 3.3 GB in size.
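If you do hit this limit, one way to keep each output row bounded (a sketch assuming BigQuery Standard SQL's LIMIT clause inside STRING_AGG; the original answer doesn't suggest this workaround) is to cap how many values are concatenated per group:

```sql
#standardSQL
-- Cap the aggregation at 1000 words per corpus so no single
-- output row grows without bound.
SELECT corpus,
       STRING_AGG(word ORDER BY word LIMIT 1000) AS words
FROM `bigquery-public-data.samples.shakespeare`
GROUP BY corpus;
```

Grouping the aggregation (rather than concatenating the whole table into one string) also spreads the work across rows, which keeps each per-row result far below the resource limit.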