Live data from BigQuery into a Python DataFrame
Question
I am exploring ways to bring BigQuery data into Python. Here is my code so far:
from google.cloud import bigquery
from pandas.io import gbq
client = bigquery.Client.from_service_account_json("path_to_my.json")
project_id = "my_project_name"
query_job = client.query("""
#standardSQL
SELECT date,
SUM(totals.visits) AS visits
FROM `projectname.dataset.ga_sessions_20*` AS t
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 3 day) and
DATE_sub(current_date(), interval 1 day)
GROUP BY date
""")
results = query_job.result() # Waits for job to complete.
#for row in results:
# print("{}: {}".format(row.date, row.visits))
results_df = gbq.read_gbq(query_job,project_id=project_id)
The commented-out lines:
#for row in results:
#    print("{}: {}".format(row.date, row.visits))
return the correct results from my query, but they aren't usable in this form. As a next step I'd like to get them into a DataFrame, but this code returns the error TypeError: Object of type 'QueryJob' is not JSON serializable.
Can anyone tell me what in my code causes this error, or suggest a better way to bring BigQuery data into a DataFrame?
Recommended answer
The method read_gbq expects a str (the SQL query text) as input, not a QueryJob object.
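The wording of the error comes from Python's json module: the query handed to read_gbq is presumably serialized into the request sent to BigQuery, and an arbitrary object cannot be serialized the way a str can. A minimal sketch that reproduces the same kind of TypeError, assuming a hypothetical stand-in class (FakeQueryJob) instead of the real QueryJob so that it runs without any Google libraries:

```python
import json


class FakeQueryJob:
    """Hypothetical stand-in for google.cloud.bigquery's QueryJob."""


def try_serialize(query):
    """Return None if `query` serializes to JSON, else the TypeError message."""
    try:
        json.dumps({"query": query})
    except TypeError as exc:
        return str(exc)
    return None


sql_error = try_serialize("SELECT 1")       # a plain str serializes fine
job_error = try_serialize(FakeQueryJob())   # an arbitrary object does not

print(sql_error)
print(job_error)
```

The exact internals of read_gbq are not shown in the question, so this only illustrates why a non-string argument can surface as a JSON serialization error.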
Try running it like this:
query = """
#standardSQL
SELECT date,
SUM(totals.visits) AS visits
FROM `projectname.dataset.ga_sessions_20*` AS t
WHERE parse_date('%y%m%d', _table_suffix) between
DATE_sub(current_date(), interval 3 day) and
DATE_sub(current_date(), interval 1 day)
GROUP BY date
"""
results_df = gbq.read_gbq(query, project_id=project_id, private_key='path_to_my.json')
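As an alternative that stays with the google-cloud-bigquery client already created in the question, a QueryJob can be converted directly with its to_dataframe() method. A minimal sketch, assuming pandas is installed alongside google-cloud-bigquery; the helper name query_to_dataframe is hypothetical, and client is expected to be the bigquery.Client from the question:

```python
def query_to_dataframe(client, sql):
    """Run `sql` on a BigQuery client and return the rows as a DataFrame.

    `client` is assumed to be a google.cloud.bigquery.Client; this helper
    is illustrative and not part of any library.
    """
    query_job = client.query(sql)     # starts the query job
    return query_job.to_dataframe()   # waits for the job, then builds the DataFrame


# Usage with the objects from the question (requires real credentials):
# client = bigquery.Client.from_service_account_json("path_to_my.json")
# results_df = query_to_dataframe(client, query)
```

If read_gbq is preferred, note that more recent pandas-gbq releases appear to take a credentials object in place of the private_key argument, so it is worth checking the documentation for the installed version.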