Avro schema field is a timestamp, but it shows up as an integer in BigQuery

Problem description

I have a pipeline that uploads Avro files to BigQuery. The configured schema seems to be OK, but BigQuery interprets the field as an integer value rather than a date field. What can I do in this case?

Avro schema, date field:

{
  "name": "date",
  "type": {
    "type": "long",
    "logicalType": "timestamp-millis"
  },
  "doc": "the date when the transaction happened"
}
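A timestamp-millis value is physically an Avro long that counts milliseconds since the Unix epoch (1970-01-01 00:00:00 UTC), so when the logical type is not applied, that raw count is exactly what lands in the column. The minimal sketch below uses only the Python standard library to show the conversion in both directions; the helper names and the sample value are hypothetical, chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Avro timestamp-millis counts milliseconds from this instant
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def millis_to_datetime(millis: int) -> datetime:
    """Interpret an Avro timestamp-millis long as a UTC datetime (hypothetical helper)."""
    # timedelta arithmetic keeps exact milliseconds (no float rounding)
    return EPOCH + timedelta(milliseconds=millis)

def datetime_to_millis(dt: datetime) -> int:
    """Encode a timezone-aware datetime as an Avro timestamp-millis long (hypothetical helper)."""
    # timedelta // timedelta yields an int count of whole milliseconds
    return (dt - EPOCH) // timedelta(milliseconds=1)

# Hypothetical raw value as it would appear in an INTEGER column
raw = 1546300800000
print(millis_to_datetime(raw))  # 2019-01-01 00:00:00+00:00
```

If the BigQuery column shows numbers of this magnitude, the data is intact; only the column type is wrong.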

BigQuery table: [screenshot not available]

I tried using the code below, but it simply ignores the setting. Do you know the reason?

from google.cloud import bigquery

def insert_bigquery_avro(target_uri, dataset_id, table_id):
    bigquery_client = bigquery.Client()
    # Client.dataset() is deprecated; build the table reference explicitly
    table_ref = bigquery.DatasetReference(bigquery_client.project, dataset_id).table(table_id)

    job_config = bigquery.LoadJobConfig()
    # Not needed for Avro: the schema is read from the Avro file itself
    job_config.autodetect = True
    job_config.source_format = bigquery.SourceFormat.AVRO
    # Ask BigQuery to map Avro logical types (e.g. timestamp-millis -> TIMESTAMP)
    job_config.use_avro_logical_types = True
    # Daily ingestion-time partitioning (no partitioning column given)
    job_config.time_partitioning = bigquery.TimePartitioning()
    # Alternative: partition on the "date" column itself
    # job_config.time_partitioning = bigquery.TimePartitioning(type_=bigquery.TimePartitioningType.DAY, field="date")

    load_job = bigquery_client.load_table_from_uri(
        target_uri,
        table_ref,
        job_config=job_config,
    )
    print('Starting job {}'.format(load_job.job_id))
    load_job.result()  # block until the load job completes
    print('Job finished.')

Recommended answer

This is expected behavior: by default, BigQuery ignores the logicalType attribute and uses the underlying Avro type instead. For instance, a field with the Avro timestamp-millis logical type (underlying type long) is loaded as INTEGER in BigQuery.
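The rule can be illustrated with a small lookup: without the flag, only the underlying Avro primitive type decides the BigQuery column type; with it, known logical types take precedence. This is a partial sketch, assuming the date/time mappings from the BigQuery Avro-loading documentation; the function name `bigquery_type` is hypothetical, and unions, records, `decimal` and other logical types are deliberately left out, so consult the documentation for the full table.

```python
# Partial mapping of Avro date/time logical types to BigQuery types,
# applied only when logical-type conversion is enabled.
LOGICAL_TYPE_TO_BQ = {
    "date": "DATE",
    "time-millis": "TIME",
    "time-micros": "TIME",
    "timestamp-millis": "TIMESTAMP",
    "timestamp-micros": "TIMESTAMP",
}

# Avro primitive types and the BigQuery type used when logicalType is ignored.
PRIMITIVE_TO_BQ = {
    "int": "INTEGER",
    "long": "INTEGER",
    "float": "FLOAT",
    "double": "FLOAT",
    "boolean": "BOOLEAN",
    "string": "STRING",
    "bytes": "BYTES",
}

def bigquery_type(avro_type, use_avro_logical_types=False):
    """Return the BigQuery type for a simple (non-union, non-record) Avro field type."""
    if isinstance(avro_type, str):
        # Plain primitive such as "long" or "string"
        return PRIMITIVE_TO_BQ[avro_type]
    logical = avro_type.get("logicalType")
    if use_avro_logical_types and logical in LOGICAL_TYPE_TO_BQ:
        return LOGICAL_TYPE_TO_BQ[logical]
    # Default behavior: logicalType is ignored, the underlying type decides
    return PRIMITIVE_TO_BQ[avro_type["type"]]

# The "date" field from the question
date_field_type = {"type": "long", "logicalType": "timestamp-millis"}
print(bigquery_type(date_field_type))        # INTEGER
print(bigquery_type(date_field_type, True))  # TIMESTAMP
```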

To enable the conversion, set --use_avro_logical_types to true when using the bq command-line tool, or set the useAvroLogicalTypes property in the job resource when you call the jobs.insert method to create a load job. After that, your date field will be created with the TIMESTAMP type in BigQuery.
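For the REST route, the flag lives under `configuration.load` in the job resource sent to jobs.insert; on the command line the equivalent is roughly `bq load --source_format=AVRO --use_avro_logical_types=true <dataset>.<table> <gs-uri>`, and in the Python client it is `LoadJobConfig.use_avro_logical_types`. The sketch below only builds the request body as a plain dict (no API call is made); the helper name and the project, dataset, table, and bucket names are hypothetical placeholders.

```python
import json

def avro_load_job_resource(project_id, dataset_id, table_id, source_uris):
    """Build a jobs.insert body for an Avro load with logical types enabled (hypothetical helper)."""
    return {
        "configuration": {
            "load": {
                "sourceUris": list(source_uris),
                "sourceFormat": "AVRO",
                # Map e.g. timestamp-millis to TIMESTAMP instead of INTEGER
                "useAvroLogicalTypes": True,
                "destinationTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        }
    }

# Hypothetical names; substitute your own
body = avro_load_job_resource("my-project", "my_dataset", "transactions",
                              ["gs://my-bucket/transactions/*.avro"])
print(json.dumps(body, indent=2))
```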

Take a look at the "Avro logical types" section of the BigQuery documentation on loading Avro data to see which Avro logical types are ignored by default and how each one is converted once the flag is set. It will also help you decide which Avro logical type best suits each of your fields.

Hope this helps.