Airflow: GoogleCloudStorageToBigQueryOperator Error
Problem Description
I'm trying to load data into BigQuery from Google Cloud Storage via Airflow's GoogleCloudStorageToBigQueryOperator.
I'm getting the error below and need suggestions on how to resolve it.
Code:
load_into_bq = GoogleCloudStorageToBigQueryOperator(
    task_id=get_task_id("load_into_bq", flow_name),
    bucket='bigquery-source-replication',
    source_objects=flow_details["gcs_csv_filename"],
    destination_project_dataset_table='project-141508.dwh_test.datalake_production_products_intermediate',
    source_format="CSV",
    create_disposition="CREATE_IF_NEEDED",
    write_disposition="WRITE_APPEND",
    autodetect=True,
    google_cloud_storage_conn_id=GCP_CONN_ID,
    bigquery_conn_id=BQ_CONN_ID,
    dag=dag
)
Log:
[2021-06-10 09:56:54,522] {taskinstance.py:902} INFO - Executing <Task(GoogleCloudStorageToBigQueryOperator): flow_name_load_into_bq> on 2021-06-10T09:55:01.281248+00:00
[2021-06-10 09:56:54,599] {standard_task_runner.py:54} INFO - Started process 13009 to run task
[2021-06-10 09:56:54,854] {standard_task_runner.py:77} INFO - Running: ['airflow', 'run', 'mysql_to_gcs_data_dag', 'flow_name_load_into_bq', '2021-06-10T09:55:01.281248+00:00', '--job_id', '19338', '--pool', 'default_pool', '--raw', '-sd', 'DAGS_FOLDER/mysql_gcs_bq_poc_sourav.py', '--cfg_path', '/tmp/tmpypa_dgaw']
[2021-06-10 09:56:54,860] {standard_task_runner.py:78} INFO - Job 19338: Subtask flow_name_load_into_bq
[2021-06-10 09:56:56,025] {logging_mixin.py:112} INFO - Running <TaskInstance: mysql_to_gcs_data_dag.flow_name_load_into_bq 2021-06-10T09:55:01.281248+00:00 [running]> on host airflow-worker-567675b8f5-t58ns
[2021-06-10 09:56:56,834] {gcp_api_base_hook.py:145} INFO - Getting connection using `google.auth.default()` since no key file is defined for hook.
Recommended Answer
Your issue seems to be with the GCP connection, not with the operator itself.
There are 3 ways to authenticate with GCP:
- Use Application Default Credentials.
- Use a service account key file (JSON format) on disk - Keyfile Path.
- Use a service account key file (JSON format) from connection configuration - Keyfile JSON.
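For example, the Keyfile Path option can be wired up through an environment variable that Airflow parses as a connection URI. This is only a sketch: the key path below is a placeholder, and the slashes inside it must be percent-encoded.

```shell
# Keyfile Path option: point the default GCP connection at a service
# account JSON key on disk. The path is a placeholder -- substitute
# your own key location (percent-encoded inside the URI).
export AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT='google-cloud-platform://?extra__google_cloud_platform__key_path=%2Fopt%2Fairflow%2Fkeys%2Fsa-key.json'
echo "$AIRFLOW_CONN_GOOGLE_CLOUD_DEFAULT"
```

The same connection can instead be created in the Airflow UI (Admin → Connections) or, on Airflow 2.x, with the `airflow connections add` CLI command.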
You are getting this error because you didn't set any of these options.
First, GoogleCloudStorageToBigQueryOperator is deprecated. You should import GCSToBigQueryOperator as:
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
For Airflow >= 2.0.0: install the Google provider:
pip install apache-airflow-providers-google
Once installed, you can follow the instructions in the docs and set up the connection using any of the options listed above.
For Airflow < 2.0.0: install the Google backport provider:
pip install apache-airflow-backport-providers-google
Once installed, you can follow the instructions in the docs and set up the connection using any of the options listed above.