BigQuery with a large number of tables

Problem description

I know there is already a question about table number limits, but it was vague. In one dataset I want to create about 1-2 million tables. This is because I want to split my user activity table into smaller tables, one table per user, and the number will keep growing over time. As I understand it, this is not a problem from BigQuery's perspective, but I'm concerned that I will not be able to access (list) that dataset from the browser (https://bigquery.cloud.google.com/queries/appname), because the tables are not grouped by time (as tables with a time range are) and they all get listed in an endless scroll (possibly blocking the browser).

Thank you for any suggestions

Solution

… the problem is that the browser will get blocked while listing all tables in the dataset

You can use the "?minimal" parameter to limit the load operation to 30,000 tables per project, so the browser will not be blocked. For example:

https://bigquery.cloud.google.com/queries/<your_project_name>?minimal

See more about Display limits.

I can't easily explore my dataset because of this (and query them)

If you are planning to have 2+ million tables in the same dataset, then even if the Web UI could show them all without blocking, I really doubt you would be able to explore them visually in any reasonable way. That is just too many objects to "swallow".
By the way, this is not only a human problem: even querying such a long table list programmatically can be problematic. See more about Using meta-tables.
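
To make the programmatic side concrete, here is a minimal sketch that only builds a query string against the dataset's `__TABLES__` meta-table. The function name, the column selection, and the prefix filter are illustrative assumptions, not something from the original answer, and with millions of tables a query like this may still be slow or time out.

```python
import re

# This sketch only accepts letters, digits and underscores in the prefix, so it
# is safe to embed in the SQL string literal below. Project and dataset names
# are assumed to come from trusted configuration.
_SAFE_PREFIX = re.compile(r"^[A-Za-z0-9_]+$")


def build_tables_meta_query(project, dataset, prefix, limit=1000):
    """Return a standard-SQL query listing tables whose ID starts with `prefix`.

    Hypothetical helper: it narrows the listing with a prefix filter and a
    LIMIT instead of asking for every table in the dataset at once.
    """
    if not _SAFE_PREFIX.match(prefix):
        raise ValueError("prefix may only contain letters, digits and underscores")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return "\n".join([
        "SELECT table_id, row_count, size_bytes",
        f"FROM `{project}.{dataset}.__TABLES__`",
        f"WHERE STARTS_WITH(table_id, '{prefix}')",
        "ORDER BY table_id",
        f"LIMIT {limit}",
    ])


if __name__ == "__main__":
    print(build_tables_meta_query("my-project", "activity", "user_", limit=50))
```

The resulting string would then be passed to whatever BigQuery client you already use. Filtering and limiting on the server side is the point: pulling the full list of millions of table IDs into a client is exactly the problem described above.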

because the tables are not grouped by time (like in the case of tables with timerange) and they get all listed in an endless scroll (possibly blocking the browser)

That's right: in the BigQuery Web UI, tables are grouped only if they follow the table_prefixYYYYMMDD pattern. Even if you mapped your user ID namespace to YYYYMMDD values, you would still be out of luck, because that single group would still consist of those millions of tables.
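
The grouping rule can be illustrated with a small sketch. This is not the Web UI's actual code, only an approximation of the prefix-plus-eight-digits convention described above; the function name and the sample table names are made up for the example.

```python
import re
from collections import defaultdict

# A table name counts as "date-sharded" here if it ends in exactly eight digits.
_DATE_SUFFIX = re.compile(r"^(?P<prefix>.+?)(?P<date>\d{8})$")


def group_by_date_suffix(table_names):
    """Group table names by prefix when they end in YYYYMMDD, else keep them alone."""
    groups = defaultdict(list)
    for name in table_names:
        match = _DATE_SUFFIX.match(name)
        key = match.group("prefix") if match else name
        groups[key].append(name)
    return dict(groups)


if __name__ == "__main__":
    names = ["events_20160101", "events_20160102", "user_42", "user_43"]
    for prefix, members in group_by_date_suffix(names).items():
        print(prefix, len(members))
```

Per-user tables such as `user_42` each form their own entry, which is why they flood the list. Renaming them to look like dates would collapse them into one group, but that one group would still hold every table.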

Thank you for any suggestions

BigQuery supports Partitioned Tables, which allow multiple partitions in the same table. Unfortunately, as of today, only date-partitioned tables are supported, but from what I have heard the BigQuery team plans to add partitioning by an arbitrary column.
That would probably fit your desired design, unless there turns out to be a limit on column cardinality.
In the meantime, if you want, you can experiment with your design using the date-partitioned tables feature by mapping each user ID to a YYYYMMDD value (~9999*12*30 ≈ 3.6 million possible values, enough for 3+ million users).
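
A minimal sketch of that mapping, under some assumptions that are mine rather than the answer's: users are identified by a dense zero-based integer index, and the index is converted to a real calendar date (via Python's proleptic Gregorian ordinals) instead of the rough 12 x 30 layout, so that impossible dates such as February 30 never appear. The helper names are hypothetical. BigQuery also enforces a per-table partition quota, so check the current limits before relying on this.

```python
from datetime import date

# Number of distinct calendar dates from 0001-01-01 through 9999-12-31.
CAPACITY = date.max.toordinal()


def user_index_to_partition(user_index):
    """Map a zero-based user index to a YYYYMMDD partition ID."""
    if not 0 <= user_index < CAPACITY:
        raise ValueError(f"user_index must be in [0, {CAPACITY})")
    d = date.fromordinal(user_index + 1)  # ordinal 1 is 0001-01-01
    # Format by hand: strftime("%Y") does not zero-pad small years on every platform.
    return f"{d.year:04d}{d.month:02d}{d.day:02d}"


def partition_to_user_index(partition):
    """Inverse of user_index_to_partition."""
    d = date(int(partition[:4]), int(partition[4:6]), int(partition[6:8]))
    return d.toordinal() - 1


def partition_decorator(table, user_index):
    """Address one user's partition with the table$YYYYMMDD decorator syntax."""
    return f"{table}${user_index_to_partition(user_index)}"


if __name__ == "__main__":
    print(partition_decorator("user_activity", 0))
    print(partition_decorator("user_activity", 31))
```

Keeping the mapping invertible means a query result that exposes the partition date can be translated back to the user it belongs to.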

My recommendation:

  1. Experiment with partitioned tables, as suggested in the section above.
  2. Sharding (splitting) a table in BigQuery into millions of tables sounds extremely impractical to me. You should revisit your design. What is it that you are trying to address with such sharding? Focus on that and, if needed, post a specific question here on SO!
