Resource exceeded during query execution
Problem description
The following query runs into a "Resource exceeded during query execution" error on BigQuery. The data processed is around 700 MB, as displayed on the BigQuery console, which is not that much. We are using GROUP EACH BY because with GROUP BY we get the same error, along with the suggestion to use GROUP EACH BY. The associated job IDs are:
Job ID: fast-chess-620:job_41Fq1q3zFGB3FsACtuAiymTOCIU (group each by)
Job ID: fast-chess-620:job_VVd2jPGX-nHsdZW5GlEU6bBgpnU (group by)
select col_1, col_2, count(col_3) from
  (select col_1, col_2, col_3 from
    (select col_1, date(sec_to_timestamp(col_4)) as col_2, count(col_5) as col_3 from
      (TABLE_DATE_RANGE(table_prefix_1_,
        date_add(usec_to_timestamp(utc_usec_to_month(now())), -6, "MONTH"),
        date_add(usec_to_timestamp(utc_usec_to_month(now())), -1, "MONTH"))),
      (TABLE_DATE_RANGE([table_prefix_2_],
        usec_to_timestamp(utc_usec_to_month(now())),
        usec_to_timestamp(utc_usec_to_day(now()))))
    group each by 1,2 order by 1,2) x) x
group each by 1,2 order by 1,2
Can you please help us resolve the issue?
It looks like the issue is the ORDER BY. See the Stack Overflow response here.
Looking at the logs for your query, the GROUP BY produces more than 15 million results. In order to sort them, BigQuery must perform the sort operation on a single node.
Do you really need a sorted result? If you do, do you need all of the results? If you use an ORDER BY with a LIMIT, it should succeed, since it can just keep the top values.
You should be able to get this query to run successfully if you:
- Use a GROUP EACH BY (which you already have) for both GROUP BY operations.
- Drop the inner ORDER BY; it doesn't actually help, because the outer query reorders the results anyway.
- It will probably work with just the first two changes, but I'd also suggest either dropping the outer ORDER BY or adding a LIMIT constraint.
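
Applied to the query from the question, the suggestions above might look like the following sketch. It is the original legacy SQL with only the inner ORDER BY removed and a LIMIT added to the outer ORDER BY. The value 1000 is an arbitrary placeholder, not something taken from the question, and the query has not been run against the original tables.

```sql
-- Sketch only: same legacy SQL as in the question, with two changes marked below.
select col_1, col_2, count(col_3) from
  (select col_1, col_2, col_3 from
    (select col_1, date(sec_to_timestamp(col_4)) as col_2, count(col_5) as col_3 from
      (TABLE_DATE_RANGE(table_prefix_1_,
        date_add(usec_to_timestamp(utc_usec_to_month(now())), -6, "MONTH"),
        date_add(usec_to_timestamp(utc_usec_to_month(now())), -1, "MONTH"))),
      (TABLE_DATE_RANGE([table_prefix_2_],
        usec_to_timestamp(utc_usec_to_month(now())),
        usec_to_timestamp(utc_usec_to_day(now()))))
    -- change 1: inner "order by 1,2" dropped, since the outer query reorders anyway
    group each by 1,2) x) x
group each by 1,2
-- change 2: LIMIT added so only the top rows need to be kept while sorting
-- (1000 is a placeholder; alternatively, drop this ORDER BY entirely)
order by 1,2
limit 1000
```

If the full sorted result is not needed at all, removing the last two lines avoids the single-node sort altogether.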