Google BigQuery unable to process larger result set, getting "Response too large to return" or "Resources exceeded during query execution"


Problem description

I am currently working with a large table (~105M records) in a C# application.

  1. When I query the table with an 'Order by' or 'Order Each by' clause, I get a "Resources exceeded during query execution" error.

  2. If I remove the 'Order by' or 'Order Each by' clause, I get a "Response too large to return" error.

Here are sample queries for the two scenarios (I am using the Wikipedia public table):

  1. SELECT Id,Title,Count(*) FROM [publicdata:samples.wikipedia] Group EACH by Id, title Order by Id, Title Desc
  2. SELECT Id,Title,Count(*) FROM [publicdata:samples.wikipedia] Group EACH by Id, title

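Here are my questions:

  1. What is the maximum size of a BigQuery response?
  2. How do we select all the records in a query request, rather than through the 'Export Method'?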

Recommended answer

1. What is the maximum size of a BigQuery response?

As mentioned in the quota policy, the maximum response size for queries is 10 GB compressed (unlimited when returning large query results).

2. How do we select all the records in a query request, rather than through the 'Export Method'?

If you plan to run a query that might return larger results, you can set allowLargeResults to true in your job configuration.

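Queries that return large results will take longer to execute, even if the result set is small, and are subject to additional limitations:

  • You must specify a destination table.
  • You cannot specify a top-level ORDER BY, TOP, or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
  • Window functions can return large query results only when used in conjunction with a PARTITION BY clause.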
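The limitations above can be sketched as a job configuration. This is a minimal illustration in Python, assuming the REST-style request body for a query job; the project, dataset, and table names are hypothetical placeholders, and the `useLegacySql` and `writeDisposition` settings are my additions rather than something the answer specifies. The query is the second one from the question, with no top-level ORDER BY.

```python
import json


def build_large_result_job(sql, project_id, dataset_id, table_id):
    """Build a query job request body that allows large results."""
    return {
        "configuration": {
            "query": {
                "query": sql,
                # Assumption: the [project:dataset.table] syntax is legacy SQL.
                "useLegacySql": True,
                "allowLargeResults": True,
                # A destination table is required with allowLargeResults.
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Assumption: overwrite the destination table on each run.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }


# Hypothetical identifiers; replace with your own project and dataset.
body = build_large_result_job(
    "SELECT Id, Title, COUNT(*) FROM [publicdata:samples.wikipedia] "
    "GROUP EACH BY Id, Title",
    "my-project",
    "my_dataset",
    "wikipedia_counts",
)
print(json.dumps(body, indent=2))
```

Once the job finishes, the rows live in the destination table and can be read back page by page instead of in a single response.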

Read more about how to paginate through the results here, and see the BigQuery Analytics book, starting at page 200, which explains how Jobs::getQueryResults works together with the maxResults parameter and its blocking mode.
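As a rough sketch of that pagination loop, the Python below pages through results using a page size and a page token. The `fetch_page` callable is a hypothetical stand-in for a real call to Jobs::getQueryResults, and `fake_fetch_page` only simulates it in memory; the response keys `jobComplete`, `rows`, and `pageToken` are assumed to mirror the API response.

```python
def iter_rows(fetch_page, page_size=1000):
    """Yield rows one page at a time until no page token is returned."""
    page_token = None
    while True:
        response = fetch_page(max_results=page_size, page_token=page_token)
        if not response.get("jobComplete", True):
            # The blocking call timed out before the job finished; ask again.
            continue
        for row in response.get("rows", []):
            yield row
        page_token = response.get("pageToken")
        if not page_token:
            break


def fake_fetch_page(max_results, page_token, _data=tuple(range(7))):
    """Hypothetical in-memory stand-in for a getQueryResults call."""
    start = int(page_token or 0)
    end = start + max_results
    response = {"jobComplete": True, "rows": list(_data[start:end])}
    if end < len(_data):
        response["pageToken"] = str(end)
    return response


rows = list(iter_rows(fake_fetch_page, page_size=3))
print(rows)
```

Keeping the page size small bounds the size of each individual response, which is the point of paging rather than requesting everything at once.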

Update:

Query Result Size Limitations

When you run a normal query in BigQuery, the response size is limited to 10 GB of compressed data. Sometimes, it is hard to know what 10 GB of compressed data means. Does it get compressed 2x? 10x? The results are compressed within their respective columns, which means the compression ratio tends to be very good. For example, if you have one column that is the name of a country, there will likely be only a few different values. When you have only a few distinct values, there isn't a lot of unique information, and the column will generally compress well. If you return encrypted blobs of data, they will likely not compress well because they will be mostly random. (This is explained in the book linked above, on page 220.)
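To get a feel for the difference, here is a small Python experiment. It uses zlib purely as a stand-in, since the passage does not say which compression BigQuery applies, so the ratios are only indicative. The country list and column size are made up for illustration.

```python
import os
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Return original size divided by zlib-compressed size."""
    return len(data) / len(zlib.compress(data))


rng = random.Random(0)  # fixed seed so the sample column is reproducible
countries = ["France", "Japan", "Brazil", "Kenya", "Canada"]

# A column with only a few distinct values, like country names.
country_column = "\n".join(
    rng.choice(countries) for _ in range(100_000)
).encode()

# Random bytes of the same length, standing in for encrypted blobs.
random_column = os.urandom(len(country_column))

country_ratio = compression_ratio(country_column)
random_ratio = compression_ratio(random_column)
print(f"country column: {country_ratio:.1f}x")
print(f"random column:  {random_ratio:.2f}x")
```

The low-cardinality column should shrink many times over, while the random bytes barely shrink at all, so the same 10 GB compressed limit can correspond to very different amounts of raw data.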
