如何更改BigQuery重复记录的col类型 [英] How to change the col type of a BigQuery repeated record

查看:63
本文介绍了如何更改BigQuery重复记录的col类型的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我正在尝试将重复记录的col类型从 STRING 更改为 TIMESTAMP .这里有来自BQ文档的一些建议(

然后,下面的SQL会将 update_at 列转换为 DATE ,并保存到新表(如果需要,也可以保存到同一表).

  #standardSQL创建或替换表<您的数据集> .output_table AS选择*替换(数组(按结构选择*替换(按CAST(更新日期为AS DATE)为更新日期)来自UNNEST(地点)AS位置)从< your_dataset> .sample_table; 

I'm trying to change a col type of a repeated record from STRING to TIMESTAMP. There are a few suggestions from BQ docs here (manually-changing-schemas). However, I'm running into issues with each of the recommended suggestions.

Here is an example schema:

{
  'name' => 'id',
  'type' => 'STRING',
  'mode' => 'REQUIRED'
},
{
  'name' => 'name',
  'type' => 'STRING',
  'mode' => 'REQUIRED'
},
// many more fields including nested records and repeated records
{
  'name' => 'locations',
  'type' => 'RECORD',
  'mode' => 'REPEATED',
  'fields' => [
    {
      'name' => 'city',
      'type' => 'STRING',
      'mode' => 'REQUIRED'
    },
    {
      'name' => 'updated_at',
      'type' => 'STRING',   // ** want this as TIMESTAMP **
      'mode' => 'REQUIRED'
    },
  ]
}

Issue with using Query:

I think we'd have to UNNEST the repeated record, cast the field to timestamp for each repeated record and then somehow recreate the row for insertion into the new table.

Issue with exporting table as JSON:

When exporting the table in JSON format it exports raw json representations of the data (with maps and dictionaries, as we'd expect it to).

However, we're not able to import that raw data back into BQ:

BigQuery does not support maps or dictionaries in JSON. For example, "product_categories": {"my_product": 40.0} is not valid, but "product_categories": {"column1": "my_product" , "column2": 40.0} is valid.

https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-json#limitations

Any suggestions would be much appreciated!

解决方案

Below answer is based on that:REPEATED RECORD type in BigQuery StandardSQL is represented as type ARRAY<STRUCT<f1 f1_type, f2 f2_type ... >>.

It is not my favorite since you have to specify full column list. Maybe there is a better way.

#standardSQL
-- Build sample data, try to mimic what's in question.
CREATE OR REPLACE TABLE
  <your_dataset>.sample_table AS
SELECT name, 
       array<struct<city string, update_at string>>[("SFO", "2011-1-1"), ("SEA", "2022-2-2")] 
       as locations
FROM UNNEST(['Name1', "Name2", "Name3"]) as name;

Then below SQL will convert update_at column into DATE and save to new table (or the same one if you'd like).

#standardSQL
CREATE OR REPLACE TABLE
  <your_dataset>.output_table AS
SELECT * REPLACE (
   ARRAY(SELECT AS STRUCT * REPLACE(CAST(update_at AS DATE) AS update_at)
         FROM UNNEST(locations)) 
   AS locations 
   )
FROM
  <your_dataset>.sample_table;

这篇关于如何更改BigQuery重复记录的col类型的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆