BigQuery execution time inconsistencies

Problem description

We just spent well over $200 testing execution time on BigQuery, and across runs of exactly the same interactive queries the execution time ranged from 15 seconds to 2 minutes. Can anyone give me any information on why this is happening?

We need consistent execution times to test and optimize our queries. Is there any way to predict how consistent execution time will be? I would understand a ±10% difference in execution time, but the difference is well over 1000%. With this we cannot test or optimize anything, because our query setup has nothing to do with execution time, which seems completely random. We run 4 queries in parallel, all on the same data and structured the same way (just some column names are renamed to disable caching), and our execution times are 13s, 27s, 32s, 44s. Then again 20s, 13s, 24s, 45s, and so on... Then at some point we run one query (the same as above) and the execution time is 400s... WTF?

Also, the BigQuery sales team is non-existent when it comes to buying a support package (I have requested a quote a few times now, the first a month ago), so all that is left for me is to ask here for help.

Solution

Regarding the execution time inconsistency, this does seem to be a higher variance than I would expect. Can you provide a job id of a fast query and a slow query so I can look up where the time was being spent in the internal query statistics?

That said, some fairly significant variation in query times, while not quite in the range of what you're seeing, is unsurprising. Here are some of the factors:

  • Tail latency. The query is broken up into pieces and farmed out to several different workers (potentially thousands, depending on the size of your data). The data is being read from a distributed filesystem cluster that will likely have your data striped across hundreds of disks or more (depending, again, on the size of your table).

    The slowest one of these components to respond will determine your total query time. This is called tail latency, meaning that you have to wait until the long tail of stragglers is finished. We do a lot of work to try to minimize the effect, replicating data and re-dispatching work, but it can still have a big effect.

  • Load. Currently when our clusters are heavily loaded, it can slow down response times for other users. We're working on much better isolation mechanisms, but they're still a little ways out. This wouldn't account for time discrepancies of the magnitude that you're seeing, but it can be a factor.

  • Throttling. When a single customer sends multiple parallel queries at once, those queries may be slowed down in order to prevent that customer from taking up too much capacity. How much and whether this happens depends on a number of factors, including query size and the other load on a cluster.

  • Writing results. If your results are larger than 100k or so, writing the results out may be very slow, and may have absurd variation. This is a bug that we're currently investigating.
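To make the tail latency point concrete, here is a minimal simulation sketch. It is not a model of BigQuery internals: the per-worker time of roughly 1 second, the 0.1% straggler probability, and the 10x to 50x straggler slowdown are all made-up assumptions chosen only for illustration. It shows that when a query has to wait for its slowest worker, adding workers makes a slow run more likely even though each individual worker is almost always fast.

```python
import random
import statistics


def simulate_query_time(num_workers, rng, straggler_prob=0.001):
    """Return the completion time of one simulated query, in seconds.

    Assumed toy model: each worker takes about 1 s, and with a small
    probability it is a straggler that takes 10x to 50x longer. The query
    finishes only when the slowest worker does, so the query time is the
    maximum over all workers.
    """
    slowest = 0.0
    for _ in range(num_workers):
        t = rng.uniform(0.8, 1.2)
        if rng.random() < straggler_prob:
            t *= rng.uniform(10, 50)
        slowest = max(slowest, t)
    return slowest


def summarize(num_workers, runs=200, seed=0):
    """Run the same simulated query repeatedly; return (min, median, max)."""
    rng = random.Random(seed)
    times = [simulate_query_time(num_workers, rng) for _ in range(runs)]
    return min(times), statistics.median(times), max(times)


if __name__ == "__main__":
    for n in (10, 100, 1000):
        lo, med, hi = summarize(n)
        print(f"{n:>5} workers: min={lo:.1f}s median={med:.1f}s max={hi:.1f}s")
```

In this toy model a single worker is slow only 0.1% of the time, but a query spread over 1000 workers includes at least one straggler in most runs, which is why the answer describes replication and re-dispatching as ways to limit the effect.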

There are significant efforts under way to reduce the impact of all of these factors. Right now, however, we don't have a magic wand that we can wave and say "query performance is going to be consistent to within 20%", other than saying "we recognize the issue and are working on improving it".

If you provide job ids, we can look at the specific cases of your queries to figure out where the time is being spent and if there is something we can do to address the issue.
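When collecting job ids to report, it can help to record each run's wall-clock time next to its job id and then pick out the extremes. The sketch below uses the timings quoted in the question; the job id strings are hypothetical placeholders, since the real ids were not given.

```python
import statistics

# Wall-clock times in seconds, as reported in the question for the same query.
# The job ids are hypothetical placeholders, not real BigQuery job ids.
runs = [
    ("job_a", 13), ("job_b", 27), ("job_c", 32), ("job_d", 44),
    ("job_e", 20), ("job_f", 13), ("job_g", 24), ("job_h", 45),
    ("job_i", 400),
]


def fastest_and_slowest(runs):
    """Return the (job_id, seconds) pairs with the smallest and largest time."""
    fastest = min(runs, key=lambda r: r[1])
    slowest = max(runs, key=lambda r: r[1])
    return fastest, slowest


def spread(runs):
    """Return the median time and the slowest-to-fastest ratio."""
    seconds = [s for _, s in runs]
    return statistics.median(seconds), max(seconds) / min(seconds)


if __name__ == "__main__":
    fastest, slowest = fastest_and_slowest(runs)
    median, ratio = spread(runs)
    print(f"fastest: {fastest}, slowest: {slowest}")
    print(f"median: {median}s, slowest/fastest ratio: {ratio:.1f}x")
```

Reporting the median alongside the extremes keeps a single 400s outlier from distorting the picture of a typical run, while the fastest and slowest ids are the pair the answer asks for.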
