Google BigQuery large table (105M records) with 'Order Each by' clause produces "Resources Exceeds Query Execution" error


Problem description

I am running into a serious issue: a query against a large Google BigQuery table (105M records) with an 'Order Each by' clause fails with "Resources Exceeds Query Execution".

Here is a sample query (using the public Wikipedia data set):

SELECT Id, Title, COUNT(*) FROM [publicdata:samples.wikipedia] GROUP EACH BY Id, Title ORDER BY Id, Title DESC

How can I solve this without adding the LIMIT keyword?

Recommended answer

Using ORDER BY on big data databases is not an ordinary operation, and at some point it exceeds the resources available to a single query. You should consider sharding your query, or running the ORDER BY on your exported data.

As I explained to you today in your other question, adding allowLargeResults will allow you to return a large response, but you can't specify a top-level ORDER BY, TOP or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.

One option you can try here is sharding your query.

where ABS(HASH(Id) % 4) = 0

You can vary the parameters above (the modulus 4 and the remainder 0) to get smaller result sets, and then combine them.
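A minimal sketch of the shard-and-combine idea follows, in Python. The helper names shard_queries and merge_shards, the column alias cnt, and the choice to keep ORDER BY inside each shard query are assumptions made for illustration, not part of the original answer. The sketch only builds the query strings and merges rows that have already been fetched; how the queries are submitted and whether each shard is small enough to sort are left open. Because the filter hashes Id, each Id should land in exactly one shard, so merging on Id alone preserves the Title ordering within each Id.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# One result row: (Id, Title, count). The layout is an assumption for this sketch.
Row = Tuple[int, str, int]


def shard_queries(num_shards: int = 4) -> List[str]:
    """Build one legacy-SQL query per shard (hypothetical helper).

    Each query keeps only the rows whose hashed Id falls in shard k,
    so every Id appears in exactly one shard.
    """
    template = (
        "SELECT Id, Title, COUNT(*) AS cnt "
        "FROM [publicdata:samples.wikipedia] "
        "WHERE ABS(HASH(Id) % {n}) = {k} "
        "GROUP EACH BY Id, Title "
        "ORDER BY Id, Title DESC"
    )
    return [template.format(n=num_shards, k=k) for k in range(num_shards)]


def merge_shards(shards: Iterable[Iterable[Row]]) -> Iterator[Row]:
    """Lazily merge shard results that are each already sorted by Id.

    Ids do not overlap across shards, so comparing on Id alone keeps
    each shard's own Title ordering intact within an Id.
    """
    return heapq.merge(*shards, key=lambda row: row[0])
```

For example, shard_queries(4) returns four query strings whose filters end in = 0 through = 3, and merge_shards can then be given the four fetched result iterables to produce one globally ordered stream without sorting everything in a single query.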

Also read Chapter 9 - Understanding Query Execution; it explains how sharding works internally.

You should also read the launch checklist for BigQuery.
