How to export BigQuery to Bigtable using Airflow? Schema issue
Question
I'm using Airflow to extract BigQuery rows to Google Cloud Storage in Avro format.
with models.DAG(
    "bigquery_to_bigtable",
    default_args=default_args,
    schedule_interval=None,
    start_date=datetime.now(),
    catchup=False,
    tags=["test"],
) as dag:
    data_to_gcs = BigQueryInsertJobOperator(
        task_id="data_to_gcs",
        project_id=project_id,
        location=location,
        configuration={
            "extract": {
                "destinationUri": gcs_uri,
                "destinationFormat": "AVRO",
                "sourceTable": {
                    "projectId": project_id,
                    "datasetId": dataset_id,
                    "tableId": table_id,
                },
            }
        },
    )
    gcs_to_bt = DataflowTemplatedJobStartOperator(
        task_id="gcs_to_bt",
        template="gs://dataflow-templates/latest/GCS_Avro_to_Cloud_Bigtable",
        location=location,
        parameters={
            "bigtableProjectId": project_id,
            "bigtableInstanceId": bt_instance_id,
            "bigtableTableId": bt_table_id,
            "inputFilePattern": "gs://export/test.avro-*",
        },
    )

    data_to_gcs >> gcs_to_bt
The BigQuery rows look like:
row_key      | 1_cnt | 2_cnt | 3_cnt
1#2021-08-03 | 1     | 2     | 2
2#2021-08-02 | 5     | 1     | 5
...
I'd like to use the row_key column as the row key in Bigtable, and the remaining columns as columns in a specific column family, e.g. my_cf, in Bigtable.
However, I get the following error while using Dataflow to load the Avro files into Bigtable:
"java.io.IOException: Failed to start reading from source: gs://export/test.avro-"
Caused by: org.apache.avro.AvroTypeException: Found Root, expecting com.google.cloud.teleport.bigtable.BigtableRow, missing required field key
The documentation I read says:

The Bigtable table must exist and have the same column families as exported in the Avro files.
How do I export BigQuery to Avro with the same column families?
Answer
I think you have to transform the Avro data to the proper schema. The documentation you mentioned also says:
- Bigtable expects a specific schema from the input Avro files.
There is a link referring to the special data schema that has to be used.
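For reference, the schema the template expects (com.google.cloud.teleport.bigtable.BigtableRow) looks roughly like the sketch below. The field details are reconstructed from the Dataflow templates documentation, so verify them against the schema file linked there before relying on them:

```python
import json

# Rough sketch of the Avro schema the GCS_Avro_to_Cloud_Bigtable template
# expects: a record with a row key plus an array of cells. Verify the exact
# field names and types against the schema file linked in the documentation.
BIGTABLE_ROW_SCHEMA = {
    "type": "record",
    "name": "BigtableRow",
    "namespace": "com.google.cloud.teleport.bigtable",
    "fields": [
        {"name": "key", "type": "bytes"},  # the Bigtable row key
        {
            "name": "cells",
            "type": {
                "type": "array",
                "items": {
                    "type": "record",
                    "name": "BigtableCell",
                    "fields": [
                        {"name": "family", "type": "string"},    # column family, e.g. my_cf
                        {"name": "qualifier", "type": "bytes"},  # column name
                        {"name": "timestamp", "type": "long"},   # microseconds since epoch
                        {"name": "value", "type": "bytes"},      # cell value
                    ],
                },
            },
        },
    ],
}

print(json.dumps(BIGTABLE_ROW_SCHEMA, indent=2))
```

The "missing required field key" in the error message matches this: a plain BigQuery export has flat columns, not a key field plus a cells array.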
If I understand correctly, you are just exporting the table as-is; the result, although valid Avro, will not match the required schema, so you need to transform the data into the shape your Bigtable table expects before running the template.
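A minimal sketch of that reshaping step, assuming the column names (row_key, 1_cnt, ...) and the my_cf family from the question, and arbitrarily encoding every value as a UTF-8 string:

```python
import time

def to_bigtable_row(row, family="my_cf", key_column="row_key"):
    """Reshape one flat BigQuery row dict into the nested record shape the
    GCS_Avro_to_Cloud_Bigtable template expects (a key plus a cells array).
    Values are encoded as UTF-8 strings here; use whatever encoding your
    Bigtable readers expect."""
    timestamp = int(time.time() * 1_000_000)  # cell timestamp in microseconds
    return {
        "key": row[key_column].encode("utf-8"),
        "cells": [
            {
                "family": family,
                "qualifier": column.encode("utf-8"),
                "timestamp": timestamp,
                "value": str(value).encode("utf-8"),
            }
            for column, value in row.items()
            if column != key_column
        ],
    }

# Example with a row from the question's table:
record = to_bigtable_row({"row_key": "1#2021-08-03", "1_cnt": 1, "2_cnt": 2, "3_cnt": 2})
print(record["key"], [cell["qualifier"] for cell in record["cells"]])
```

The reshaped records then need to be written back to GCS with an Avro writer (fastavro, for example) using the BigtableRow schema, as an extra step between data_to_gcs and gcs_to_bt.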