Airflow: GoogleCloudStorageToBigQueryOperator Error


Problem description

Trying to load data into BigQuery from Google Cloud Storage via the Airflow GoogleCloudStorageToBigQueryOperator operator.

Getting the error below. Need suggestions on how to resolve it.

Code:

load_into_bq = GoogleCloudStorageToBigQueryOperator(
    task_id=get_task_id("load_into_bq", flow_name),
    bucket='bigquery-source-replication',
    source_objects=flow_details["gcs_csv_filename"],
    destination_project_dataset_table='project-141508.dwh_test.datalake_production_products_intermediate',
    source_format="CSV",
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_APPEND",
    autodetect=True,
    google_cloud_storage_conn_id=GCP_CONN_ID,
    bigquery_conn_id=BQ_CONN_ID,
    dag=dag
)

Log:

[2021-06-10 09:56:54,522] {taskinstance.py:902} INFO - Executing <Task(GoogleCloudStorageToBigQueryOperator): flow_name_load_into_bq> on 2021-06-10T09:55:01.281248+00:00
[2021-06-10 09:56:54,599] {standard_task_runner.py:54} INFO - Started process 13009 to run task
[2021-06-10 09:56:54,854] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', 'mysql_to_gcs_data_dag', 'flow_name_load_into_bq', '2021-06-10T09:55:01.281248+00:00', '--job_id', '19338', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/mysql_gcs_bq_poc_sourav.py', '--cfg_path', '/tmp/tmpypa_dgaw']
[2021-06-10 09:56:54,860] {standard_task_runner.py:78} INFO - Job 19338: Subtask flow_name_load_into_bq
[2021-06-10 09:56:56,025] {logging_mixin.py:112} INFO - Running <TaskInstance: mysql_to_gcs_data_dag.flow_name_load_into_bq 2021-06-10T09:55:01.281248+00:00 [running]> on host airflow-worker-567675b8f5-t58ns
[2021-06-10 09:56:56,834] {gcp_api_base_hook.py:145} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.

Answer

Your issue seems to be with the GCP connection, not with the operator itself.

There are 3 ways to authenticate with GCP:

  1. Use Application Default Credentials.
  2. Use a service account key file (JSON format) on disk - Keyfile Path.
  3. Use a service account key file (JSON format) from the connection configuration - Keyfile JSON.
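As an illustrative sketch of option 2 (the connection id, key path, and project id below are placeholders, not values from the original post), Airflow can pick up a connection from an `AIRFLOW_CONN_<CONN_ID>` environment variable holding a connection URI:

```python
import os
from urllib.parse import quote

# Hypothetical values -- substitute your own key path and project id.
key_path = "/opt/airflow/keys/service-account.json"
project_id = "my-gcp-project"

# Airflow resolves the connection id "google_cloud_default" from the
# AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT environment variable. The extras
# below point the hook at a service-account key file on disk.
os.environ["AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT"] = (
    "google-cloud-platform://?"
    f"extra__google_cloud_platform__key_path={quote(key_path, safe='')}&"
    f"extra__google_cloud_platform__project={quote(project_id, safe='')}"
)
```

The same extras can instead be set on the connection in the Airflow UI; the environment-variable form is just convenient for containerized deployments.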

You are getting this error because you didn't set any of these options.

First, GoogleCloudStorageToBigQueryOperator is deprecated. You should import GCSToBigQueryOperator instead:

from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
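In the provider version of the operator, the deprecated google_cloud_storage_conn_id/bigquery_conn_id pair is replaced by a single gcp_conn_id argument. A sketch of the task from the question rewritten accordingly (assuming Airflow 2.x with the Google provider installed; this is Airflow DAG configuration and only runs inside an Airflow environment):

```python
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

# Same task as in the question, using the renamed operator. A single
# gcp_conn_id (pointing at a properly configured GCP connection)
# replaces the two deprecated *_conn_id parameters.
load_into_bq = GCSToBigQueryOperator(
    task_id=get_task_id("load_into_bq", flow_name),
    bucket='bigquery-source-replication',
    source_objects=flow_details["gcs_csv_filename"],
    destination_project_dataset_table='project-141508.dwh_test.datalake_production_products_intermediate',
    source_format="CSV",
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_APPEND",
    autodetect=True,
    gcp_conn_id=GCP_CONN_ID,
    dag=dag,
)
```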

For Airflow >= 2.0.0, install the Google provider:

pip install apache-airflow-providers-google

Once installed, you can follow the instructions in the docs and set up the connection using any of the options listed above.

For Airflow < 2.0.0, install the Google backport provider:

pip install apache-airflow-backport-providers-google

Once installed, you can follow the instructions in the docs and set up the connection using any of the options listed above.

