Live data from BigQuery into a Python DataFrame


Question

I am exploring ways to bring BigQuery data into Python. Here is my code so far:

from google.cloud import bigquery
from pandas.io import gbq

client = bigquery.Client.from_service_account_json("path_to_my.json")

project_id = "my_project_name"

query_job = client.query("""
    #standardSQL
    SELECT date,
    SUM(totals.visits) AS visits
    FROM `projectname.dataset.ga_sessions_20*` AS t
    WHERE parse_date('%y%m%d', _table_suffix) between 
    DATE_sub(current_date(), interval 3 day) and
    DATE_sub(current_date(), interval 1 day)
    GROUP BY date
    """)

results = query_job.result()  # Waits for job to complete.

#for row in results:
#  print("{}: {}".format(row.date, row.visits))

results_df = gbq.read_gbq(query_job,project_id=project_id)

The commented-out lines (#for row in results: print("{}: {}".format(row.date, row.visits))) return the correct results from my query, but the results aren't usable in that form. As a next step I'd like to get them into a DataFrame, but this code raises the error TypeError: Object of type 'QueryJob' is not JSON serializable.

Can anyone tell me what in my code causes this error, or suggest a better way to bring BigQuery data into a DataFrame?

Recommended answer

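The read_gbq method expects the query as a str, not as a QueryJob object. In the question's code, the QueryJob returned by client.query() is passed where the SQL text should go.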
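The wording of the error hints that the query argument ends up being JSON-serialized somewhere inside read_gbq. That is an inference from the message, not something confirmed here. The sketch below reproduces the same kind of TypeError with the standard json module only; the QueryJob class is a stand-in and build_request_body is a hypothetical helper, neither is part of any Google or pandas library.

```python
import json


class QueryJob:
    """Stand-in for google.cloud.bigquery.QueryJob; not the real class."""


def build_request_body(query):
    # Hypothetical helper: put the query into a JSON request body.
    # json.dumps only accepts JSON-compatible values such as str.
    return json.dumps({"query": query})


# A plain SQL string serializes without trouble.
print(build_request_body("SELECT 1"))

# An arbitrary object does not, which produces the same kind of error
# as the one reported in the question.
try:
    build_request_body(QueryJob())
except TypeError as exc:
    print("TypeError:", exc)
```

Passing the SQL text itself avoids the problem, because a str is always serializable.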

Try running it like this:

query = """
    #standardSQL
    SELECT date,
    SUM(totals.visits) AS visits
    FROM `projectname.dataset.ga_sessions_20*` AS t
    WHERE parse_date('%y%m%d', _table_suffix) between 
    DATE_sub(current_date(), interval 3 day) and
    DATE_sub(current_date(), interval 1 day)
    GROUP BY date
"""

results_df = gbq.read_gbq(query, project_id=project_id, private_key='path_to_my.json')
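As an alternative to read_gbq, the google-cloud-bigquery client used in the question can build the DataFrame directly: the object returned by query_job.result() has a to_dataframe() method (pandas must be installed). The following is a minimal sketch rather than a definitive implementation. The function names query_to_dataframe and dataframe_from_key_file are made up for illustration, and "path_to_my.json" is the placeholder path from the question.

```python
def query_to_dataframe(client, sql):
    """Run sql on a BigQuery client and return the rows as a pandas DataFrame."""
    query_job = client.query(sql)  # starts the job and returns a QueryJob
    rows = query_job.result()      # blocks until the job has finished
    return rows.to_dataframe()     # converts the result rows to a DataFrame


def dataframe_from_key_file(key_path, sql):
    """Hypothetical wrapper: build a client from a service-account key file."""
    # Imported here so that query_to_dataframe works with any client-like object.
    from google.cloud import bigquery

    client = bigquery.Client.from_service_account_json(key_path)
    return query_to_dataframe(client, sql)
```

With the query string defined above, usage would look like results_df = dataframe_from_key_file("path_to_my.json", query). This keeps everything on one library and reuses the client that already returned correct rows in the question.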
