Google BigQuery unable to process larger result set: "Response too large to return" or "Resources exceeded during query execution"


Question

I am currently working with a large table (~105M records) in a C# application.

1. When I query the table with an 'Order by' or 'Order Each by' clause, I get a "Resources exceeded during query execution" error.

2. If I remove the 'Order by' or 'Order Each by' clause, I get a "Response too large to return" error.

Here are sample queries for the two scenarios (I am using the Wikipedia public table):

1. SELECT Id, Title, COUNT(*) FROM [publicdata:samples.wikipedia] GROUP EACH BY Id, Title ORDER BY Id, Title DESC

2. SELECT Id, Title, COUNT(*) FROM [publicdata:samples.wikipedia] GROUP EACH BY Id, Title

Here are my questions:

1. What is the maximum size of a BigQuery response?
2. How do we select all the records in a query request, without using the 'Export Method'?

Solution

1. What is the maximum size of a BigQuery response?

As mentioned in the Quota-policy documentation, the maximum response size for queries is 128 MB compressed (unlimited when returning large query results).

2. How do we select all the records in a query request, without using the 'Export Method'?

If you plan to run a query that might return larger results, you can set allowLargeResults to true in your job configuration.

Queries that return large results take longer to execute, even if the result set is small, and are subject to additional limitations:

• You must specify a destination table.
• You can't specify a top-level ORDER BY, TOP or LIMIT clause. Doing so negates the benefit of using allowLargeResults, because the query output can no longer be computed in parallel.
• Window functions can return large query results only if used in conjunction with a PARTITION BY clause.
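
As a rough sketch, the request body for such a query job could be assembled like this in Python. The field names (allowLargeResults, destinationTable, useLegacySql, writeDisposition) follow the BigQuery REST API job resource. The project, dataset, and table identifiers are placeholders, and the helper name build_large_results_job is made up for illustration. The snippet only builds the JSON body and does not send it.

```python
import json


def build_large_results_job(query, project_id, dataset_id, table_id):
    """Build a jobs.insert request body with allowLargeResults enabled.

    All identifiers passed in are placeholders chosen by the caller.
    """
    return {
        "configuration": {
            "query": {
                "query": query,
                # The question's queries use legacy SQL ([project:dataset.table]).
                "useLegacySql": True,
                "allowLargeResults": True,
                # A destination table is mandatory with allowLargeResults.
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
                # Assumption: overwrite the destination table on each run.
                "writeDisposition": "WRITE_TRUNCATE",
            }
        }
    }


# No top-level ORDER BY, TOP or LIMIT, as the limitations above require.
QUERY = (
    "SELECT Id, Title, COUNT(*) AS cnt "
    "FROM [publicdata:samples.wikipedia] "
    "GROUP EACH BY Id, Title"
)

job_body = build_large_results_job(QUERY, "my-project", "my_dataset", "wiki_counts")
print(json.dumps(job_body, indent=2))
```

Once the job finishes, the rows live in the destination table and can be read back page by page, so the 128 MB response limit no longer applies to the query itself.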

Read more about how to paginate through the results here, and see the BigQuery Analytics book, starting at page 200, which explains how Jobs::getQueryResults works together with the maxResults parameter and its blocking mode.
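
The paging pattern can be sketched as follows. This is a minimal illustration, not the book's code: fetch_page is a hypothetical callable standing in for one Jobs::getQueryResults request, and fake_fetch_page simulates it locally so the loop can run without credentials. The response fields used (jobComplete, rows, pageToken) are the ones the REST API returns.

```python
def iter_all_rows(fetch_page, max_results=10000):
    """Yield every row of a query result by following page tokens.

    fetch_page(max_results, page_token) is a hypothetical callable that
    wraps one Jobs::getQueryResults request and returns the parsed JSON
    response as a dict.
    """
    page_token = None
    while True:
        response = fetch_page(max_results, page_token)
        if not response.get("jobComplete", True):
            # The blocking call timed out before the job finished; ask again.
            continue
        for row in response.get("rows", []):
            yield row
        page_token = response.get("pageToken")
        if not page_token:
            break


def fake_fetch_page(max_results, page_token):
    """Stand-in for the real API call: serves 25 fake rows in pages."""
    data = [{"f": [{"v": str(i)}]} for i in range(25)]
    start = int(page_token) if page_token else 0
    end = min(start + max_results, len(data))
    response = {"jobComplete": True, "rows": data[start:end]}
    if end < len(data):
        response["pageToken"] = str(end)
    return response


rows = list(iter_all_rows(fake_fetch_page, max_results=10))
print(len(rows))
```

Keeping maxResults modest means each individual response stays well under the size limit, while the generator still hands the caller the full result set.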

Update:

Query Result Size Limitations

When you run a normal query in BigQuery, the response size is limited to 128 MB of compressed data. Sometimes, it is hard to know what 128 MB of compressed data means. Does it get compressed 2x? 10x? The results are compressed within their respective columns, which means the compression ratio tends to be very good. For example, if you have one column that is the name of a country, there will likely be only a few different values. When you have only a few distinct values, there isn't a lot of unique information, and the column will generally compress well. If you return encrypted blobs of data, they will likely not compress well because they will be mostly random. (This is explained in the book linked above, on page 220.)
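
The effect is easy to reproduce locally. The sketch below uses zlib purely as a stand-in, since the compression scheme BigQuery applies is not described here, and the country list and column size are arbitrary. It compares a column with five distinct values against random bytes of the same length.

```python
import random
import zlib


def compression_ratio(raw):
    """Return uncompressed size divided by zlib-compressed size."""
    return len(raw) / len(zlib.compress(raw))


rng = random.Random(0)  # fixed seed so runs are repeatable

# A column with only a few distinct values, like a country name.
countries = ["France", "Japan", "Brazil", "Kenya", "Canada"]
country_column = "\n".join(
    rng.choice(countries) for _ in range(100_000)
).encode("utf-8")

# Random bytes of the same length, standing in for encrypted blobs.
random_column = bytes(rng.getrandbits(8) for _ in range(len(country_column)))

low_cardinality_ratio = compression_ratio(country_column)
random_ratio = compression_ratio(random_column)

print(f"country column ratio: {low_cardinality_ratio:.1f}x")
print(f"random column ratio:  {random_ratio:.2f}x")
```

The country column shrinks by a large factor while the random bytes barely shrink at all, which is why the same 128 MB compressed limit can correspond to very different amounts of raw data.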
