how to export bigquery to bigtable using airflow? schema issue


Question

I'm using Airflow to extract BigQuery rows to Google Cloud Storage in Avro format.

from datetime import datetime

# Imports needed by this snippet (from the Airflow Google provider package)
from airflow import models
from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
from airflow.providers.google.cloud.operators.dataflow import DataflowTemplatedJobStartOperator

with models.DAG(
    "bigquery_to_bigtable",
    default_args=default_args,
    schedule_interval=None,
    start_date=datetime.now(),
    catchup=False,
    tags=["test"],
) as dag:

    # Export the BigQuery table to GCS in Avro format
    data_to_gcs = BigQueryInsertJobOperator(
        task_id="data_to_gcs",
        project_id=project_id,
        location=location,
        configuration={
            "extract": {
                "destinationUri": gcs_uri,
                "destinationFormat": "AVRO",
                "sourceTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        },
    )

    # Load the exported Avro files into Bigtable via the Dataflow template
    gcs_to_bt = DataflowTemplatedJobStartOperator(
        task_id="gcs_to_bt",
        template="gs://dataflow-templates/latest/GCS_Avro_to_Cloud_Bigtable",
        location=location,
        parameters={
            "bigtableProjectId": project_id,
            "bigtableInstanceId": bt_instance_id,
            "bigtableTableId": bt_table_id,
            "inputFilePattern": "gs://export/test.avro-*",
        },
    )

    data_to_gcs >> gcs_to_bt

The BigQuery rows contain:

row_key      | 1_cnt | 2_cnt | 3_cnt
1#2021-08-03 |   1   |   2   |   2 
2#2021-08-02 |   5   |   1   |   5 
...

I'd like to use the row_key column as the row key in Bigtable, and the remaining columns as columns in a specific column family, e.g. my_cf, in Bigtable.

However, I get error messages when using Dataflow to load the Avro files into Bigtable:

"java.io.IOException: Failed to start reading from source: gs://export/test.avro-"
Caused by: org.apache.avro.AvroTypeException: Found Root, expecting com.google.cloud.teleport.bigtable.BigtableRow, missing required field key

The documentation I read says:

The Bigtable table must exist and have the same column families as exported in the Avro files.

How do I export BigQuery to Avro with the same column families?

Answer

I think you have to transform the Avro to the proper schema. The documentation you mentioned also says:

  • Bigtable expects a specific schema from the input Avro files.

There is a link referring to the special data schema that has to be used.
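For reference, the GCS_Avro_to_Cloud_Bigtable template deserializes each record as com.google.cloud.teleport.bigtable.BigtableRow, which is why the error complains about "expecting com.google.cloud.teleport.bigtable.BigtableRow, missing required field key". The expected Avro schema looks roughly like this (paraphrased from the Dataflow templates source; verify the details against the linked schema file):

```json
{
  "name": "BigtableRow",
  "type": "record",
  "namespace": "com.google.cloud.teleport.bigtable",
  "fields": [
    {"name": "key", "type": "bytes"},
    {
      "name": "cells",
      "type": {
        "type": "array",
        "items": {
          "name": "BigtableCell",
          "type": "record",
          "fields": [
            {"name": "family", "type": "string"},
            {"name": "qualifier", "type": "bytes"},
            {"name": "timestamp", "type": "long"},
            {"name": "value", "type": "bytes"}
          ]
        }
      }
    }
  ]
}
```

A plain BigQuery extract job emits one flat record per row (hence the "Found Root" in the error), not this nested key/cells layout, so the template cannot read it directly.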

If I understand correctly, you are simply exporting the data from the table; the result, although it is Avro, will not match the required schema, so you need to transform the data into the proper schema for your Bigtable table.
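One way to sketch that reshaping in plain Python (the helper name and the `my_cf` family are taken from the question; in practice you would read and rewrite the files with an Avro library such as fastavro using the BigtableRow schema, or do the transform in a Beam/Dataflow step between the two tasks):

```python
import time

# Target column family from the question; every non-key column becomes one cell.
COLUMN_FAMILY = "my_cf"

def to_bigtable_row(record, family=COLUMN_FAMILY, timestamp_micros=None):
    """Reshape one flat BigQuery-exported record, e.g.
    {"row_key": "1#2021-08-03", "1_cnt": 1, "2_cnt": 2, "3_cnt": 2},
    into the nested BigtableRow form the Dataflow template expects."""
    if timestamp_micros is None:
        timestamp_micros = int(time.time() * 1_000_000)
    return {
        "key": record["row_key"].encode("utf-8"),
        "cells": [
            {
                "family": family,
                "qualifier": name.encode("utf-8"),
                "timestamp": timestamp_micros,
                "value": str(value).encode("utf-8"),
            }
            for name, value in record.items()
            if name != "row_key"
        ],
    }

row = {"row_key": "1#2021-08-03", "1_cnt": 1, "2_cnt": 2, "3_cnt": 2}
bt_row = to_bigtable_row(row, timestamp_micros=0)
```

Note that Bigtable values are raw bytes; `str(value).encode()` is only a placeholder choice here, and you may want a proper numeric encoding depending on how the table will be read.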

