Split table into multiple tables based on date using bigquery with a single query for partitioning

Views: 419 | Published: 2018/5/7 17:39:18 | Tag: google-bigquery


Question

The original "why" of what I want to do is:

Restore a table maintaining its original partitioning instead of it all going into today's partition.

What I thought I could do is bq load to a temporary table. Then run a query to split that table into one table per day YYYYMMDD in the naming convention needed by bq partition i.e. sharded_YYYYMMDD. Then run bq partition.

This page https://cloud.google.com/bigquery/docs/creating-partitioned-tables gives examples but it requires running a query per day. That could be hundreds:

bq query --use_legacy_sql=false --allow_large_results --replace \
  --noflatten_results --destination_table 'mydataset.temps$20160101' \
  'SELECT stn,temp from `bigquery-public-data.noaa_gsod.gsod2016` WHERE mo="01" AND da="01" limit 100'

So how do I make a single query that will iterate over all the days and make one table per day?

I found a similar question here, "Split a table into multiple tables in BigQuery SQL", but there is no answer about doing it with a single query.

Answer

The main problem here is having a full table scan for each and every day. The rest is less of a problem and can easily be scripted in any client of your choice.

So the question addressed below is: how to avoid a full table scan for each and every day?

Try the steps below one by one to see the approach.
It is generic enough to extend or apply to your real case. Meanwhile, I am using the same example as in your question, and I am limiting the exercise to just 10 days.

Step 1 – Create Pivot table
In this step we a) compress each row's content into a record/array and b) put each of them into its respective "daily" column

#standardSQL
SELECT
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160101' THEN r END) AS day20160101,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160102' THEN r END) AS day20160102,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160103' THEN r END) AS day20160103,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160104' THEN r END) AS day20160104,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160105' THEN r END) AS day20160105,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160106' THEN r END) AS day20160106,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160107' THEN r END) AS day20160107,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160108' THEN r END) AS day20160108,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160109' THEN r END) AS day20160109,
  ARRAY_CONCAT_AGG(CASE WHEN d = 'day20160110' THEN r END) AS day20160110
FROM (
  SELECT d, r, ROW_NUMBER() OVER(PARTITION BY d) AS line
  FROM (
    SELECT 
      stn, CONCAT('day', year, mo, da) AS d, ARRAY_AGG(t) AS r
    FROM `bigquery-public-data.noaa_gsod.gsod2016` AS t 
    GROUP BY stn, d
  ) 
)
GROUP BY line

Run the above query in the Web UI with pivot_table (you can choose whatever name you want here) as the destination table.

As you can see, this gives a table with 10 columns, one column per day, and the schema of each column is a copy of the schema of the original table.
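Writing one ARRAY_CONCAT_AGG line per day by hand gets tedious beyond a few days, so the query text can be generated instead. The following is only a minimal sketch: the function names `day_labels` and `build_pivot_query` are hypothetical, the query body simply mirrors the Step 1 query above, and the script prints SQL without running anything against BigQuery. It also refuses date ranges that would need more than 10000 columns, which is the column limit this answer quotes.

```python
from datetime import date, timedelta

MAX_COLUMNS = 10000  # column limit quoted in this answer


def day_labels(start, end):
    """Return labels such as 'day20160101' for every day from start to end inclusive."""
    n_days = (end - start).days + 1
    return ["day" + (start + timedelta(days=i)).strftime("%Y%m%d") for i in range(n_days)]


def build_pivot_query(start, end, source="bigquery-public-data.noaa_gsod.gsod2016"):
    """Build the Step 1 pivot query text with one array column per day."""
    labels = day_labels(start, end)
    if not labels:
        raise ValueError("end date must not be before start date")
    if len(labels) > MAX_COLUMNS:
        raise ValueError("too many days for one pivot table")
    columns = ",\n".join(
        f"  ARRAY_CONCAT_AGG(CASE WHEN d = '{label}' THEN r END) AS {label}"
        for label in labels
    )
    return (
        "#standardSQL\n"
        "SELECT\n"
        f"{columns}\n"
        "FROM (\n"
        "  SELECT d, r, ROW_NUMBER() OVER(PARTITION BY d) AS line\n"
        "  FROM (\n"
        "    SELECT stn, CONCAT('day', year, mo, da) AS d, ARRAY_AGG(t) AS r\n"
        f"    FROM `{source}` AS t\n"
        "    GROUP BY stn, d\n"
        "  )\n"
        ")\n"
        "GROUP BY line"
    )


# Same 10-day range as the hand-written query.
query = build_pivot_query(date(2016, 1, 1), date(2016, 1, 10))
print(query)
```

The printed text can be pasted into the Web UI, or passed to the client of your choice, in the same way as the hand-written version.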

Step 2 – Creating sharded table one-by-one ONLY scanning respective column (no full table scan)

#standardSQL
SELECT r.*
FROM pivot_table, UNNEST(day20160101) AS r

Run the above query from the Web UI with the destination table named mytable_20160101.

You can run the same for the next day:

#standardSQL
SELECT r.*
FROM pivot_table, UNNEST(day20160102) AS r

Now you should have a destination table named mytable_20160102, and so on.
You should be able to automate/script this step with any client of your choice.
Note: those final daily tables will have exactly the same schema as the original table!
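One way to script Step 2 is to generate a `bq query` command line per day. This is a rough sketch under assumptions, not a definitive implementation: the function name `shard_commands` is hypothetical, the dataset name `mydataset` is borrowed from the question, and the pivot table is assumed to live in that same dataset. It uses only `bq` flags that already appear in the question, and it only builds the command strings; running them is left to you.

```python
import shlex
from datetime import date, timedelta


def shard_commands(start, end, dataset="mydataset", pivot="pivot_table", prefix="mytable_"):
    """Return one shell-quoted `bq query` command per day from start to end inclusive."""
    commands = []
    day = start
    while day <= end:
        suffix = day.strftime("%Y%m%d")
        # Only the single column for this day is read from the pivot table.
        sql = f"SELECT r.* FROM `{dataset}.{pivot}`, UNNEST(day{suffix}) AS r"
        args = [
            "bq", "query", "--use_legacy_sql=false", "--replace",
            "--destination_table", f"{dataset}.{prefix}{suffix}",
            sql,
        ]
        # shlex.quote keeps the backticks and spaces in the SQL safe for the shell.
        commands.append(" ".join(shlex.quote(a) for a in args))
        day += timedelta(days=1)
    return commands


commands = shard_commands(date(2016, 1, 1), date(2016, 1, 3))
for command in commands:
    print(command)
```

Each printed line can be reviewed first and then run in a shell, for example by writing the lines to a script file.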

There are many variations of how you can use the above approach; it is up to your creativity.

Note: BigQuery allows up to 10000 columns in a table, so 365 columns for the respective days of one year is definitely not a problem here :o)
