Wildcard on day table vs time partition
Problem description
I'm trying to understand whether there is a difference in BigQuery (for example, in cost or in what I can query) between:
- Creating one table per day (e.g. my_table_2018_02_06)
- Creating a single time-partitioned table (my-table, partitioned by day).
Thanks!
Short explanation: querying multiple tables through Wildcard Tables was the proposed alternative back when BigQuery had no partitioning mechanism. The natural evolution was the Partitioned Tables feature, and there is currently an alpha release of column-based time partitioning, i.e. letting the user choose which column (of DATE or TIMESTAMP data type) is used for partitioning.
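The idea behind column-based partitioning can be sketched in plain Python: each row lands in the daily partition named after the calendar date of the chosen DATE or TIMESTAMP column. This is a hypothetical simulation, not BigQuery's implementation. The YYYYMMDD partition IDs mirror BigQuery's table$YYYYMMDD partition decorator; bucketing TIMESTAMP values by UTC day and routing NULLs to a __NULL__ partition are assumptions modeled on BigQuery's documented behavior, and the column name event_ts is made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_id(value):
    """Return the daily partition ID (YYYYMMDD) for a DATE or TIMESTAMP value.

    Assumptions: TIMESTAMP values are bucketed by their UTC calendar day,
    naive datetimes are treated as UTC, and NULL (None) goes to "__NULL__".
    """
    if value is None:
        return "__NULL__"
    # datetime is a subclass of date, so check for it first.
    if isinstance(value, datetime):
        if value.tzinfo is None:
            value = value.replace(tzinfo=timezone.utc)
        value = value.astimezone(timezone.utc).date()
    return value.strftime("%Y%m%d")


def split_by_column(rows, column):
    """Group rows (dicts) into daily partitions keyed by the user-chosen column."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[partition_id(row[column])].append(row)
    return dict(partitions)


if __name__ == "__main__":
    rows = [
        {"event_ts": datetime(2018, 2, 6, 23, 30, tzinfo=timezone.utc), "v": 1},
        {"event_ts": datetime(2018, 2, 7, 0, 15, tzinfo=timezone.utc), "v": 2},
        {"event_ts": None, "v": 3},
    ]
    # One line per partition: its ID and the values of the rows it holds.
    for pid, part in sorted(split_by_column(rows, "event_ts").items()):
        print(pid, [r["v"] for r in part])
```

The key design point is that the partition is derived from the data itself, not from the load date, so late-arriving rows still end up in the partition for the day they describe.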
BigQuery engineers are currently adding new features to table partitioning rather than to the legacy Wildcard Tables approach, so I'd suggest working with partitioned tables.
Long explanation: you are comparing two approaches that serve the same purpose but have different implications:
- Wildcard Tables: some time ago, when BigQuery did not support table partitioning, Wildcard Tables were the way to query multiple tables with concise SQL. A Wildcard Table represents the union of all the tables that match the wildcard expression specified in the SQL statement. However, Wildcard Tables have some limitations, such as:
- Do not support views.
- Do not support cached results (queries containing wildcard tables are billed every time they are run, even if the "cached results" option is checked).
- Only work with native BigQuery storage (they cannot query external tables backed by Cloud Bigtable, Cloud Storage or Google Drive).
- Only available in standard SQL.
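The union semantics of a Wildcard Table can be illustrated with a small pure-Python simulation. The helper wildcard_union is hypothetical, not a BigQuery API: a single trailing * matches every table whose name starts with the prefix, and the matched remainder is the value standard SQL exposes through the _TABLE_SUFFIX pseudo column. It also shows a common pitfall: an unrelated table such as my_table_backup is swept into the union unless the query filters on _TABLE_SUFFIX.

```python
def wildcard_union(table_names, pattern):
    """Simulate which tables a wildcard table such as 'my_table_*' covers.

    Returns {table_name: suffix}, where suffix is the part matched by '*'
    (the value _TABLE_SUFFIX would take for rows coming from that table).
    """
    # In standard SQL the '*' may only appear as the last character.
    if pattern.count("*") != 1 or not pattern.endswith("*"):
        raise ValueError("expected a single trailing '*', e.g. 'my_table_*'")
    prefix = pattern[:-1]
    return {
        name: name[len(prefix):]
        for name in sorted(table_names)
        # Assumption: '*' stands for one or more characters, so the bare
        # prefix itself is not matched.
        if name.startswith(prefix) and len(name) > len(prefix)
    }


if __name__ == "__main__":
    tables = [
        "my_table_2018_02_05",
        "my_table_2018_02_06",
        "my_table_backup",  # unrelated table that still matches the prefix
        "other_2018_02_06",
    ]
    for name, suffix in wildcard_union(tables, "my_table_*").items():
        print(f"{name:22} _TABLE_SUFFIX = {suffix!r}")
```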
- Partitioned Tables: these are single tables divided into segments (partitions) by date. There is extensive documentation on how to work with Partitioned Tables. Regarding pricing, each partition is treated as an independent entity: if a partition has not been modified in the last 90 days, its data is considered long-term storage and is billed at the discounted rate, just as a normal table would be. Finally, Partitioned Tables are here to stay, so more features are coming, such as column-based partitioning, which is currently in alpha; you can follow its status in this Public Issue Tracker post. On the other hand, there are also some current limitations to consider:
- Maximum of 2500 partitions per Partitioned Table.
- Maximum of 2000 partition updates per table per day.
- Maximum of 50 partition updates every 10 seconds.
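The per-partition pricing rule described above can be sketched as follows: every partition carries its own last-modified date and is classified on its own, so rewriting one day's partition does not take the rest of the table out of long-term storage. The function name storage_class, the partition IDs and the dates are hypothetical; the 90-day threshold comes from the answer, and counting exactly 90 days as long-term is an assumption about the boundary.

```python
from datetime import date

# Threshold stated in the answer: 90 days without modification.
LONG_TERM_DAYS = 90


def storage_class(last_modified, today):
    """Classify each partition independently as 'long-term' or 'active'.

    last_modified maps a partition ID (e.g. '20180206') to the date that
    partition was last modified. Assumption: exactly 90 days is long-term.
    """
    return {
        pid: "long-term" if (today - modified).days >= LONG_TERM_DAYS else "active"
        for pid, modified in last_modified.items()
    }


if __name__ == "__main__":
    partitions = {
        "20180206": date(2018, 2, 6),   # untouched since it was loaded
        "20180207": date(2018, 5, 20),  # old data, but rewritten recently
        "20180530": date(2018, 5, 30),  # recent data
    }
    print(storage_class(partitions, today=date(2018, 6, 1)))
```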
So in general, it is advisable to use a Partitioned Table rather than multiple daily tables queried through Wildcard Tables. However, you should always consider your use case and check which option better meets your requirements.
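To make the comparison concrete, here is a sketch of how a query over a date range might look under each layout from the question. The dataset name my_dataset is hypothetical, and the partitioned variant assumes an ingestion-time partitioned table filtered on the _PARTITIONTIME pseudo column (a column-partitioned table would filter on its own partitioning column instead). The functions only build SQL strings; they do not call BigQuery.

```python
from datetime import date

# Hypothetical dataset name; replace with your own.
DATASET = "my_dataset"


def daily_tables_query(start, end):
    """Query one-table-per-day data (my_table_YYYY_MM_DD) via a wildcard table."""
    # _TABLE_SUFFIX holds the part of the table name matched by '*'.
    # Zero-padded YYYY_MM_DD suffixes sort lexicographically in date order,
    # so a string BETWEEN selects exactly the days in the range.
    lo = start.strftime("%Y_%m_%d")
    hi = end.strftime("%Y_%m_%d")
    return (
        f"SELECT * FROM `{DATASET}.my_table_*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{lo}' AND '{hi}'"
    )


def partitioned_table_query(start, end):
    """Query a single ingestion-time partitioned table, pruning by _PARTITIONTIME."""
    return (
        f"SELECT * FROM `{DATASET}.my_table`\n"
        f"WHERE _PARTITIONTIME BETWEEN TIMESTAMP('{start.isoformat()}') "
        f"AND TIMESTAMP('{end.isoformat()}')"
    )


if __name__ == "__main__":
    first, last = date(2018, 2, 1), date(2018, 2, 6)
    print(daily_tables_query(first, last))
    print()
    print(partitioned_table_query(first, last))
```

Both queries scan only the days in the range, but the second one targets a single table, so, unlike the wildcard query, it is not excluded from cached results. In real code, query parameters are preferable to string formatting; the dates here come from date objects, which keeps the sketch simple.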