Why is BigQuery so slow on non-large data sizes?


Question

We have found BigQuery to work great on data sets larger than 100M rows, where the 'initialization time' doesn't really come into effect (or is negligible compared to the rest of the query).
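The reasoning here is that a roughly fixed startup cost matters less and less as the scanned data grows. A minimal sketch of that argument, assuming a deliberately simplified latency model (total = fixed initialization time + rows / scan throughput) that is not BigQuery's actual cost model; `overhead_share` and the parameter values in the demo are hypothetical and chosen only to show the trend:

```python
def overhead_share(init_seconds: float, rows: int, rows_per_second: float) -> float:
    """Return the fraction of end-to-end latency spent on fixed startup work.

    Simplified model (an assumption, not BigQuery's real cost model):
        total = init_seconds + rows / rows_per_second
    """
    if init_seconds < 0 or rows < 0:
        raise ValueError("init_seconds and rows must be non-negative")
    if rows_per_second <= 0:
        raise ValueError("rows_per_second must be positive")
    execution_seconds = rows / rows_per_second  # part that scales with data size
    total_seconds = init_seconds + execution_seconds
    if total_seconds == 0:
        return 0.0  # no startup cost and no rows: nothing to attribute
    return init_seconds / total_seconds


if __name__ == "__main__":
    # Hypothetical parameters chosen only to show the trend; not measurements.
    INIT_SECONDS = 1.0
    ROWS_PER_SECOND = 1e7
    for rows in (10**6, 10**8, 10**10):
        share = overhead_share(INIT_SECONDS, rows, ROWS_PER_SECOND)
        print(f"{rows:>14,} rows: {share:.1%} of latency is startup")
```

Under this model the startup share falls toward zero as the row count grows, which matches the observation that the overhead is negligible on large tables but dominates on small ones.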

However, on anything smaller than that, performance is quite poor. This makes it (1) ill-suited to an interactive BI tool and (2) inferior to other products, such as Redshift or even Elasticsearch, for data sizes under 100M rows. In fact, an engineer at our organization was evaluating technologies for querying 1M to 100M rows for an analytics product with about 1,000 users, and his feedback was that he could not believe how slow BigQuery was.

I'm not asking anyone to defend the BigQuery product; I was just wondering:

  1. Are there any plans to improve BigQuery's speed, especially its initialization time, on queries over non-massive data sets?
  2. Will BigQuery ever be able to deliver sub-second response times on "regular" queries (such as a simple GROUP BY aggregation) on datasets under a certain size?

Answer

The latency you are seeing is time spent on metadata and job initiation; the actual execution time is very small. We have work in progress that will address this, but some of the changes are complicated and will take a while.
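One rough way to check where a small query's time goes is to compare a finished job's own timestamps. A minimal sketch, assuming the job object exposes `created`, `started`, and `ended` datetimes (as jobs in the google-cloud-bigquery Python client do); `phase_seconds` is a hypothetical helper, and the demo uses a stand-in object with made-up timestamps so it runs offline:

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace


def phase_seconds(job) -> dict:
    """Split a finished job's server-side timeline into two rough phases.

    `job` is any object with `created`, `started`, and `ended` datetime
    attributes (assumption: a finished google-cloud-bigquery job has these).
    """
    if job.created is None or job.started is None or job.ended is None:
        raise ValueError("job timestamps are missing; has the job finished?")
    return {
        # From job creation until the job started running.
        "before_start": (job.started - job.created).total_seconds(),
        # From start to completion; this may still include non-execution work.
        "running": (job.ended - job.started).total_seconds(),
        "total": (job.ended - job.created).total_seconds(),
    }


if __name__ == "__main__":
    # Stand-in object with made-up timestamps (not a real measurement).
    t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
    fake_job = SimpleNamespace(
        created=t0,
        started=t0 + timedelta(milliseconds=1500),
        ended=t0 + timedelta(milliseconds=1800),
    )
    print(phase_seconds(fake_job))
```

With the real client you would pass the job returned by `bigquery.Client().query(sql)` after `job.result()` has returned. These timestamps do not isolate metadata work specifically, so treat the split as a rough indicator; comparing `total` with a client-side wall-clock timing (for example via `time.perf_counter()`) would also show how much time falls outside the job itself.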

You can imagine that in its infancy, BigQuery could have had central systems for managing jobs, metadata, and so on that performed very well for all N₀ entities using the service. Once you get to N₁ entities (many more than N₀), however, it may be necessary to re-architect some things to keep their latency as low as possible. For notifications about new features (which is also where we would announce API improvements related to start-up latency), keep an eye on our release notes, which you can also subscribe to as an RSS feed.
