How to query Google BigQuery in Apache Airflow and return results as a Pandas DataFrame?

Question

I'm trying to load the results of a BigQuery query into a DataFrame in a custom Airflow operator.

I've tried using airflow.contrib.hooks.bigquery_hook and the get_pandas_df method. The task gets stuck on authentication, because it wants me to manually visit a URL to authenticate.

As a result, I'm hard-coding the authentication. This works, but is definitely not ideal.

Working but not ideal (credentials are hard-coded):

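import os
from google.cloud import bigquery

def execute(self, context):
    # Hard-coded path to the service account key file (the part that is not ideal)
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = 'my-file-location.json'
    client = bigquery.Client()

    job_config = bigquery.QueryJobConfig()

    # Run the query and load the result into a pandas DataFrame
    df = client.query(
        self.query,
        location="US",
        job_config=job_config,
    ).to_dataframe()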

Not working:

from airflow.contrib.hooks.bigquery_hook import BigQueryHook

def execute(self, context):
    bq = BigQueryHook(bigquery_conn_id=self.gcp_conn_id, delegate_to=None, use_legacy_sql=True, location='US')
    df = bq.get_pandas_df(self.query)

This code gets stuck authenticating. Here is the log: [2019-06-19 12:56:05,526] {logging_mixin.py:95} INFO - Please visit this URL to authorize this application.

Recommended answer

For some reason I couldn't get BigQueryPandasConnector working. What I ended up doing was using the credentials from BigQueryHook to create a normal bigquery.client.Client with BigQuery's official Python client.

Here is an example:

from airflow.contrib.hooks.bigquery_hook import BigQueryHook
from google.cloud import bigquery

# Reuse the project and credentials stored in the Airflow connection
bq_hook = BigQueryHook(bigquery_conn_id=bigquery_conn_id, use_legacy_sql=False)
bq_client = bigquery.Client(
    project=bq_hook._get_field("project"),
    credentials=bq_hook._get_credentials(),
)
df = bq_client.query(sql).to_dataframe()
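
To reuse this from a custom operator's execute method, the same steps could be wrapped in a small helper. This is only a sketch: the function name query_to_dataframe and its client_factory and location parameters are hypothetical, not part of Airflow or the BigQuery client. It also relies on the hook's private methods _get_field and _get_credentials from the example above, which may change between Airflow versions.

```python
def query_to_dataframe(hook, sql, location=None, client_factory=None):
    """Run `sql` with credentials taken from an Airflow BigQuery hook.

    `hook` is assumed to expose `_get_field` and `_get_credentials`,
    like the BigQueryHook used in the answer (private methods).
    `client_factory` is a hypothetical parameter that exists only so a
    stand-in client can be supplied in tests.
    """
    if client_factory is None:
        # Imported lazily so the helper can be defined (and tested with a
        # stand-in client) without google-cloud-bigquery installed.
        from google.cloud import bigquery
        client_factory = bigquery.Client

    # Build an official BigQuery client from the connection's project
    # and credentials instead of going through get_pandas_df.
    client = client_factory(
        project=hook._get_field("project"),
        credentials=hook._get_credentials(),
    )
    # QueryJob.to_dataframe() needs pandas available at call time.
    return client.query(sql, location=location).to_dataframe()
```

Inside an operator this might be called as df = query_to_dataframe(BigQueryHook(bigquery_conn_id=self.gcp_conn_id, use_legacy_sql=False), self.query, location="US"), assuming the connection behind gcp_conn_id carries a project and a service account key.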
