Wildcard on day table vs time partition


Question



I'm trying to understand whether there is a difference in BigQuery (in cost, or in what can be queried, for example) between:

• Creating one table per day (like my_table_2018_02_06)
• Creating a time-partitioned table (my-table, partitioned by day).

Thanks!

Answer

Short explanation: querying multiple tables using Wildcard Tables was the proposed alternative when BigQuery did not yet have a partitioning mechanism. The natural evolution was to add Partitioned Tables, and there is currently an alpha release of column-based time partitioning, i.e. letting the user define which column (of DATE or TIMESTAMP data type) is used for the partitioning.

So BigQuery engineers are currently working on adding new features to table partitioning rather than to the legacy Wildcard Tables approach, and I'd suggest that you work with Partitioned Tables.
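
To make the two approaches concrete, here is a minimal sketch of how the same date-range query might look against each layout. It only builds standard SQL strings and does not call BigQuery. The dataset name `my_dataset`, the table names and the helper function names are assumptions for illustration; `_TABLE_SUFFIX` and `_PARTITIONTIME` are the pseudo-columns BigQuery exposes for wildcard tables and ingestion-time partitioned tables respectively.

```python
from datetime import date


def wildcard_query(dataset: str, prefix: str, start: date, end: date) -> str:
    """Build a query over one-table-per-day shards such as my_table_2018_02_06.

    The wildcard `prefix*` matches every shard; _TABLE_SUFFIX holds the part
    of the table name matched by `*`, so filtering on it limits which shards
    are scanned. Zero-padded dates compare correctly as strings.
    """
    fmt = "%Y_%m_%d"
    return (
        f"SELECT * FROM `{dataset}.{prefix}*` "
        f"WHERE _TABLE_SUFFIX BETWEEN '{start.strftime(fmt)}' "
        f"AND '{end.strftime(fmt)}'"
    )


def partitioned_query(dataset: str, table: str, start: date, end: date) -> str:
    """Build a query over a single table partitioned by ingestion day.

    Filtering on the _PARTITIONTIME pseudo-column limits which daily
    partitions are scanned.
    """
    return (
        f"SELECT * FROM `{dataset}.{table}` "
        f"WHERE _PARTITIONTIME BETWEEN TIMESTAMP('{start.isoformat()}') "
        f"AND TIMESTAMP('{end.isoformat()}')"
    )


if __name__ == "__main__":
    first, last = date(2018, 2, 1), date(2018, 2, 6)
    # Hypothetical names: my_dataset, my_table_ (shard prefix), my_table.
    print(wildcard_query("my_dataset", "my_table_", first, last))
    print(partitioned_query("my_dataset", "my_table", first, last))
```

In both cases the filter is what keeps the scanned data (and therefore the query cost) limited to the requested days; the difference is whether those days live in separate tables or in partitions of one table.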


Long explanation: you are comparing two approaches that serve the same purpose but have different implications:

• Wildcard Tables: some time ago, when table partitioning was not a feature supported by BigQuery, Wildcard Tables were the way to query multiple tables using concise SQL queries. A Wildcard Table represents the union of all the tables that match the wildcard expression specified in the SQL statement. However, Wildcard Tables have some limitations, such as:
  • They do not support views.
  • They do not support cached results (queries containing wildcard tables are billed every time they are run, even if the "cached results" option is checked).
  • They only work with native BigQuery storage (they cannot be used with external tables [Bigtable, Storage or Drive]).
  • They are only available in standard SQL.
• Partitioned Tables: these are single tables that are divided into segments, split by date. There is a lot of documentation on how to work with Partitioned Tables. Regarding pricing, each partition in a Partitioned Table is considered an independent entity, so if a partition has not been updated for the last 90 days, its data is considered long-term and is billed with the corresponding discount (as would happen with a normal table). Finally, Partitioned Tables are here to stay, so more features are coming to them, such as column-based partitioning, which is currently in alpha; you can follow its status in this Public Issue Tracker post. On the other hand, there are also some current limitations to consider:
  • Maximum of 2500 partitions per Partitioned Table.
  • Maximum of 2000 partition updates per table per day.
  • Maximum of 50 partition updates every 10 seconds.
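
The pricing rule and the partition limit above come down to simple date arithmetic, sketched below. The constants are the figures quoted in this answer (90 days, 2500 partitions) and may not match current BigQuery quotas; the function names are hypothetical, and treating "not updated for the last 90 days" as "at least 90 days since the last modification" is an assumption.

```python
from datetime import date, timedelta

# Figures quoted in the answer above; check the current quotas before relying on them.
LONG_TERM_AFTER_DAYS = 90
MAX_PARTITIONS_PER_TABLE = 2500


def long_term_cutoff(today: date) -> date:
    """Latest modification date that still counts as long-term storage today."""
    return today - timedelta(days=LONG_TERM_AFTER_DAYS)


def is_long_term(last_modified: date, today: date) -> bool:
    """True if a partition (or a whole daily table) was last modified
    at least LONG_TERM_AFTER_DAYS days before `today`."""
    return last_modified <= long_term_cutoff(today)


def daily_partitions_needed(first_day: date, last_day: date) -> int:
    """Number of daily partitions needed to cover an inclusive date range."""
    return (last_day - first_day).days + 1


def fits_partition_limit(first_day: date, last_day: date) -> bool:
    """True if the inclusive date range fits in one Partitioned Table."""
    return daily_partitions_needed(first_day, last_day) <= MAX_PARTITIONS_PER_TABLE


if __name__ == "__main__":
    today = date(2018, 6, 1)
    print(is_long_term(date(2018, 2, 6), today))
    print(daily_partitions_needed(date(2018, 1, 1), date(2018, 12, 31)))
    print(fits_partition_limit(date(2012, 1, 1), date(2018, 12, 31)))
```

With daily partitions, a limit of 2500 covers a little under seven years of data in a single table, which is worth checking against the expected retention period before choosing this layout.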

So in general, it is advisable to work with Partitioned Tables rather than with multiple tables queried through Wildcard Tables. However, you should always consider your use case and see which of the two options meets your requirements better.
