BigQuery中的资源过多 [英] Resources Exceeded in BigQuery

查看:58
本文介绍了BigQuery中的资源过多的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我一直在为以下查询获取超出的资源.我尝试以批处理模式从命令行运行,但似乎没有任何效果.有人有主意吗?

I keeping getting resources exceeded for the following query. I've tried running in batch mode and from the command line and nothing seems to be working. Anyone have ideas?

SELECT num, extid, amount, note, balance FROM ( SELECT row_number() over(partition by extid order by stamp) as num , extid, stamp, ds, amount, note, balance FROM monte.ledger2_trailing_21d WHERE ds >= '2015-02-09' ORDER BY extid, stamp ) WHERE num <= 10 limit 300

SELECT num, extid, amount, note, balance FROM ( SELECT row_number() over(partition by extid order by stamp) as num , extid, stamp, ds, amount, note, balance FROM monte.ledger2_trailing_21d WHERE ds >= '2015-02-09' ORDER BY extid, stamp ) WHERE num <= 10 limit 300

推荐答案

这是一个看似昂贵的查询;在类似SQL的环境中,时间序列分析总是很困难.您编写的PARTITION BY子句要求单个 extid 的所有数据都必须存在于单台计算机的内存中,这会使它过载并导致资源超出错误.

This is a deceptively expensive query; timeseries analysis is always hard in SQL-like environments. The PARTITION BY clause you have written requires all of the data for a single extid to be present in memory on a single machine, which is overloading it and causing your resources exceeded error.

您可以通过使用ROWS子句来限制分区范围来减轻对RAM的需求.这是一个示例:

You can mitigate this RAM requirement by having a ROWS clause to limit the scope of your partition. Here is an example:

SELECT extid, stamp, ds, amount, note, balance
FROM (
  SELECT
    extid, stamp, ds, amount, note, balance,
    MAX(tenth_stamp) OVER(PARTITION BY extid) AS target_stamp
  FROM (
    SELECT extid, stamp, ds, amount, note, balance,
      MIN(stamp) OVER (
        PARTITION BY extid
        ORDER BY stamp DESC
        ROWS BETWEEN CURRENT ROW AND 9 FOLLOWING
      ) AS tenth_stamp
    FROM
      [monte.ledger2_trailing_21d]
    WHERE ds >= '2015-02-09'
  )
)
WHERE stamp >= target_stamp
ORDER BY extid, stamp
LIMIT 300

最里面的子选择提取您的数据和一个字段tenth_stamp,该字段保存所检查的10行中最少的标记.即使对于任何给定的extid少于十行,使用MIN()都可以使这项工作有效.

The inner-most sub-select extracts your data and a field tenth_stamp that holds the least stamp of the 10 rows examined. Using MIN() makes this work even when there are fewer than 10 rows for any given extid.

中间的子选择为每个extid找到最大的tenth_stamp.这是该extid的第十张总邮票.然后,外部SELECT可以将结果限制为在最近的stamp中,对于它们各自的extid,仅在stamp中具有stamp的行,从而为您提供所需的结果.

The middle sub-select finds the largest tenth_stamp for each extid. This is the tenth total stamp for that extid. The outer SELECT can then restrict the result to only rows with a stamp within the ten most recent stamp for their respective extid, giving you the desired result.

执行时,总共需要4个阶段.它不会快速运行,但是永远不会在单个位置中需要大量数据.希望有帮助!

When executed, this takes a total of 4 stages. It will not run fast, but never requires large amounts of data in a single location. Hope that helps!

这篇关于BigQuery中的资源过多的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆