Google BigQuery large table (105M records) with 'Order Each by' clause produces "Resources exceeded during query execution" error
Problem Description
I am running into a serious issue, "Resources exceeded during query execution", when querying a large Google BigQuery table (105M records) with an 'Order Each by' clause.
Here is a sample query (using the public Wikipedia sample dataset):
SELECT Id, Title, COUNT(*) FROM [publicdata:samples.wikipedia] GROUP EACH BY Id, Title ORDER BY Id, Title DESC
How can I solve this without adding a LIMIT clause?
Recommended Answer
Using ORDER BY on a very large result set is not an ordinary operation: a top-level sort has to bring all of the rows together, so at some point it exceeds the resources available to the query. You should consider sharding your query, or running the ORDER BY on your exported data instead.
As I explained to you today in your other question, adding allowLargeResults will allow you to return large responses, but you can't specify a top-level ORDER BY, TOP or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
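One option you could try here is sharding your query, for example by restricting each run to one quarter of the Ids with a filter such as:
SELECT Id, Title, COUNT(*) FROM [publicdata:samples.wikipedia] WHERE ABS(HASH(Id) % 4) = 0 GROUP EACH BY Id, Title ORDER BY Id, Title DESC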
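To run every shard rather than just the first one, one query per shard can be generated. Below is a minimal sketch in Python; `shard_queries` is a hypothetical helper (not part of any BigQuery client library) that only builds the legacy-SQL strings, so submitting them is left to whatever client you already use. Because every Id maps to exactly one value of ABS(HASH(Id) % n), the shards are disjoint and together cover the whole table.

```python
# Hypothetical helper: builds one legacy-SQL query per shard, reusing the
# ABS(HASH(Id) % n) = k filter. Each Id maps to exactly one k, so the
# shards are disjoint and, taken together, cover every row.
def shard_queries(n_shards: int) -> list[str]:
    template = (
        "SELECT Id, Title, COUNT(*) "
        "FROM [publicdata:samples.wikipedia] "
        "WHERE ABS(HASH(Id) % {n}) = {k} "
        "GROUP EACH BY Id, Title "
        "ORDER BY Id, Title DESC"
    )
    return [template.format(n=n_shards, k=k) for k in range(n_shards)]


# Print the four queries for a 4-way split; raise n_shards if a single
# shard still exceeds the query's resources.
for query in shard_queries(4):
    print(query)
```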
You can experiment with the parameters above (the number of shards, and which shard each run selects) to get smaller result sets, and then combine them.
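Because the shards are split by a hash of Id rather than by Id ranges, simply concatenating their outputs does not give a globally sorted result; the individually sorted shard outputs have to be merged. Below is a minimal sketch of that combining step in Python, assuming each shard's rows have already been fetched as (Id, Title, count) tuples ordered by Id, Title DESC. `merge_shards` is a hypothetical name and the sample rows are invented for illustration.

```python
import heapq
from typing import Iterable, Iterator, Tuple

Row = Tuple[int, str, int]  # (Id, Title, count), mirroring the SELECT list


def merge_shards(shards: Iterable[Iterable[Row]]) -> Iterator[Row]:
    """Lazily k-way merge shard outputs, each ordered by Id, Title DESC.

    Every Id hashes to exactly one shard, so all rows for a given Id come
    from the same input and are already in Title DESC order. heapq.merge
    keeps rows from the same input in their original order, so merging on
    Id alone reproduces ORDER BY Id, Title DESC.
    """
    return heapq.merge(*shards, key=lambda row: row[0])


# Invented sample output of two shards (not real Wikipedia data).
shard_0 = [(1, "Zebra", 3), (1, "Apple", 1), (4, "Moon", 2)]
shard_1 = [(2, "Sun", 5), (3, "Star", 1), (3, "Comet", 4)]

for row in merge_shards([shard_0, shard_1]):
    print(row)
```

heapq.merge is lazy: if the shards are passed in as iterators (for example, rows streamed from exported files), the merge itself only buffers one pending row per shard.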
Also read Chapter 9, "Understanding Query Execution", which explains how sharding works internally.
You should also read the Launch Checklist for BigQuery.