将BigQuery脚本的结果返回给Python客户端 [英] Returning Results of BigQuery Script to Python Client
问题描述
从2019年秋季开始,BigQuery支持
最后一个放下表格,不返回任何行:
这就是为什么如果我运行Python文件,会得到类似< google.cloud.bigquery.table._EmptyRowIterator对象的对象,位于0x7f440aa33c88>
.
我们想要的是中间查询的输出结果:
一项快速测试是注释掉 DROP
语句,然后遍历该行以获得 sum = 6676
的结果.那么,如果我们想要中间结果呢?就像前面引用的文档一样,答案是调用 jobs.list
并将脚本作业ID作为 parentJobId
参数传递,以获取子作业ID:
用于client.list_jobs(parent_job = query_job.job_id)中的工作:print(作业ID:{},语句类型:{}".format(job.job_id,job.statement_type))
我们使用 请注意,后缀(0,1,2)指示执行顺序,但是我们可以在检索结果之前添加仔细检查以确认作业实际上是 list_jobs
方法并检查 职位ID:script_job_80e ... 296_2,语句类型:DROP_TABLE作业ID:script_job_9a0 ... 7fd_1,语句类型:SELECT作业ID:script_job_113 ... e13_0,语句类型:CREATE_TABLE_AS_SELECT
SELECT
语句:
导入bigquery客户端= bigquery.Client()QUERY ="开始创建或替换温度表t0 AS选择名称,来自`bigquery-public-data.london_bicycles.cycle_stations`中的bikes_count,其中bikes_count>10;SELECT SUM(bikes_count)AS total_bikes FROM t0;如果存在则丢弃表t0;结尾;"query_job = client.query(QUERY)query_job.result()用于client.list_jobs(parent_job = query_job.job_id)中的工作:#列出所有子工作#print(作业ID:{},语句类型:{}".format(job.job_id,job.statement_type))如果job.statement_type =="SELECT":#仅打印所需的作业输出行= job.result()对于行中的行:print("sum = {}".format(row ["total_bikes"]))
输出:
sum = 6676
As of Fall 2019, BigQuery supports scripting, which is great. What I can't figure out is whether the Python client for BigQuery is capable of utilizing this new functionality yet.
For example, running the the following Python code:
client = bigquery.Client()
QUERY = """
BEGIN
CREATE OR REPLACE TEMP TABLE t0 AS
SELECT * FROM my_dataset.my_table WHERE foo < 1;
SELECT SUM(bar) AS bar_sum FROM t0;
DROP TABLE IF EXISTS t0;
END;
"""
query_job = client.query(QUERY)
rows = query_job.result()
... returns an google.cloud.bigquery.table._EmptyRowIterator
object even though I am able to see the statements in the SQL script have successfully run from BigQuery's web UI.
How do I return the results from SELECT statement in this standard SQL script to the Python client?
It is supported but you need to take into account the following piece of documentation:
Scripts are executed in BigQuery using jobs.insert, similar to any other query, with the multi-statement script specified as the query text. When a script executes, additional jobs, known as child jobs, are created for each statement in the script. You can enumerate the child jobs of a script by calling jobs.list, passing in the script’s job ID as the parentJobId parameter.
When jobs.getQueryResults is invoked on a script, it will return the query results for the last SELECT, DML, or DDL statement to execute in the script, with no query results if none of the above statements have executed. To obtain the results of all statements in the script, enumerate the child jobs and call jobs.getQueryResults on each of them.
As an example, I modified your script to query a public table: bigquery-public-data.london_bicycles.cycle_stations
. This runs three child jobs:
where the last one drops the table and does not return any row:
That's why, if I run the Python file, I get something like <google.cloud.bigquery.table._EmptyRowIterator object at 0x7f440aa33c88>
.
What we want is the output result of the middle query:
A quick test is to comment out the DROP
statement and then iterate over the row(s) to get the result of sum=6676
. So, what if we want the intermediate results? The answer, as in the previously cited docs, is to call jobs.list
and pass the script job ID as the parentJobId
parameter to get the child job IDs:
for job in client.list_jobs(parent_job=query_job.job_id):
print("Job ID: {}, Statement Type: {}".format(job.job_id, job.statement_type))
We use the list_jobs
method and check ID and statement type:
Job ID: script_job_80e...296_2, Statement Type: DROP_TABLE
Job ID: script_job_9a0...7fd_1, Statement Type: SELECT
Job ID: script_job_113...e13_0, Statement Type: CREATE_TABLE_AS_SELECT
Note that the suffix (0, 1, 2) indicates the execution order but we can add a double check to verify that the job is actually a SELECT
statement before retrieving the results:
from google.cloud import bigquery
client = bigquery.Client()
QUERY = """
BEGIN
CREATE OR REPLACE TEMP TABLE t0 AS
SELECT name, bikes_count FROM `bigquery-public-data.london_bicycles.cycle_stations` WHERE bikes_count > 10;
SELECT SUM(bikes_count) AS total_bikes FROM t0;
DROP TABLE IF EXISTS t0;
END;
"""
query_job = client.query(QUERY)
query_job.result()
for job in client.list_jobs(parent_job=query_job.job_id): # list all child jobs
# print("Job ID: {}, Statement Type: {}".format(job.job_id, job.statement_type))
if job.statement_type == "SELECT": # print the desired job output only
rows = job.result()
for row in rows:
print("sum={}".format(row["total_bikes"]))
output:
sum=6676
这篇关于将BigQuery脚本的结果返回给Python客户端的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!