Google BigQuery Underlying Architecture


Question

So I just started messing around with Google BigQuery about 10 minutes ago, and I was wondering if anyone is aware of the underlying architecture they use to store the data. For example, is this just the next generation of their own Bigtable infrastructure?

Also, is it clear what sorts of strategies they're using for indexes, index rebuilds, etc.? I'm just trying to work out whether this is mature enough at this point that you can be 100% sure of what's going on with your data end-to-end, or whether there is a bit of a black-box area where "things just work".

Recommended answer

There are no indexes; every query is a table scan. The query architecture is described here. Your data is stored in a proprietary columnar format called ColumnIO on Colossus (the successor to GFS). Colossus replicates the data within a datacenter, and your data is also replicated to other geographic regions so that it stays available even if a Google datacenter goes offline.
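To make "no indexes, every query is a table scan" over columnar storage a bit more concrete, here is a minimal Python sketch. It is a toy model only: the `ColumnStore` type, the `scan` function, and the sample table are all hypothetical names made up for illustration, and none of this reflects how ColumnIO or BigQuery is actually implemented. The point it shows is that a scan checks every row, but only reads the columns the query refers to.

```python
from typing import Callable, Dict, List

# Toy column store: each column lives in its own list.
# Assumption for illustration only; this is not the ColumnIO format.
ColumnStore = Dict[str, List]


def scan(
    table: ColumnStore,
    select: str,
    where_col: str,
    predicate: Callable[[object], bool],
) -> List:
    """Full scan with no index: test every row, but touch only two columns."""
    filter_values = table[where_col]  # column used by the filter
    output_values = table[select]     # column returned to the caller
    return [out for val, out in zip(filter_values, output_values) if predicate(val)]


table: ColumnStore = {
    "user": ["ann", "bob", "cy", "dee"],
    "country": ["US", "DE", "US", "FR"],
    "bytes": [120, 300, 50, 75],  # never read by the query below
}

us_users = scan(table, select="user", where_col="country",
                predicate=lambda c: c == "US")
print(us_users)  # ['ann', 'cy']
```

In this sketch the `bytes` column is never touched, which is the usual argument for a columnar layout when there are no indexes to narrow down the rows.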

To answer your specific questions:


  • While data may be temporarily stored in Bigtable, all data is stored long-term in Colossus (for now!).
  • New data added to BigQuery is encrypted at rest (that is, whenever it is written out to permanent storage). It is also encrypted when sent over the network.
  • As mentioned, there are no indexes, so there are no strategies for rebuilding an index. Depending on how you add data to your table, your table may be coalesced, which means rewriting the underlying files in a more efficient manner.
  • Colossus underlies a massive amount of Google data across a wide range of services, and ColumnIO is a standard throughout Google. I would call both of these technologies mature.
  • However, you should also consider it a black box. All of the details here may change as storage systems at Google mature or architectures change. It should always "just work", though (within SLA caveats, of course).
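As a rough picture of what "coalescing" means in the list above, the sketch below rewrites many small batches of rows into fewer, larger ones. Everything here is an assumption for illustration: the `coalesce` function, the `target_rows` parameter, and the list-of-dicts "files" are hypothetical, and the answer does not say when or how BigQuery actually decides to coalesce a table.

```python
from typing import Dict, List

Row = Dict[str, int]


def coalesce(files: List[List[Row]], target_rows: int) -> List[List[Row]]:
    """Rewrite many small row batches into batches of up to target_rows rows."""
    if target_rows < 1:
        raise ValueError("target_rows must be at least 1")
    merged: List[List[Row]] = []
    current: List[Row] = []
    for small_file in files:
        for row in small_file:
            current.append(row)
            if len(current) == target_rows:
                merged.append(current)  # batch is full, start a new one
                current = []
    if current:
        merged.append(current)  # keep the final, partially filled batch
    return merged


# Five rows spread over four small "files", e.g. from many small appends.
small_files = [[{"id": 1}], [{"id": 2}, {"id": 3}], [{"id": 4}], [{"id": 5}]]
compacted = coalesce(small_files, target_rows=4)
print(len(small_files), "->", len(compacted))  # 4 -> 2
```

The rows and their order are unchanged; only the number of underlying pieces shrinks, which is the sense in which the rewrite is "more efficient" for later scans.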

If you're interested in more details about how BigQuery works under the covers or how to use it effectively, here is a shameless plug for our book on the subject, which is due out in June.
