Returning Results of BigQuery Script to Python Client

Problem description

As of Fall 2019, BigQuery supports scripting, which is great. What I can't figure out is whether the Python client for BigQuery is capable of utilizing this new functionality yet.

For example, running the following Python code:

from google.cloud import bigquery

client = bigquery.Client()
QUERY = """
BEGIN
    CREATE OR REPLACE TEMP TABLE t0 AS
        SELECT * FROM my_dataset.my_table WHERE foo < 1;

    SELECT SUM(bar) AS bar_sum FROM t0;

    DROP TABLE IF EXISTS t0;
END;
"""

query_job = client.query(QUERY)
rows = query_job.result()

... returns a google.cloud.bigquery.table._EmptyRowIterator object, even though I can see in BigQuery's web UI that the statements in the SQL script ran successfully.

How do I return the results of the SELECT statement in this standard SQL script to the Python client?

Solution

It is supported, but you need to take the following piece of documentation into account:

Scripts are executed in BigQuery using jobs.insert, similar to any other query, with the multi-statement script specified as the query text. When a script executes, additional jobs, known as child jobs, are created for each statement in the script. You can enumerate the child jobs of a script by calling jobs.list, passing in the script’s job ID as the parentJobId parameter.

When jobs.getQueryResults is invoked on a script, it will return the query results for the last SELECT, DML, or DDL statement to execute in the script, with no query results if none of the above statements have executed. To obtain the results of all statements in the script, enumerate the child jobs and call jobs.getQueryResults on each of them.

As an example, I modified your script to query a public table: bigquery-public-data.london_bicycles.cycle_stations. This runs three child jobs, the last of which drops the table and does not return any rows.

That's why, if I run the Python file, I get something like <google.cloud.bigquery.table._EmptyRowIterator object at 0x7f440aa33c88>.

What we want is the output of the middle query, the SELECT.

A quick test is to comment out the DROP statement and then iterate over the row(s) to get the result of sum=6676. So, what if we want the intermediate results? The answer, as in the previously cited docs, is to call jobs.list and pass the script job ID as the parentJobId parameter to get the child job IDs:

for job in client.list_jobs(parent_job=query_job.job_id):
    print("Job ID: {}, Statement Type: {}".format(job.job_id, job.statement_type))

We use the list_jobs method and check each child job's ID and statement type:

Job ID: script_job_80e...296_2, Statement Type: DROP_TABLE
Job ID: script_job_9a0...7fd_1, Statement Type: SELECT
Job ID: script_job_113...e13_0, Statement Type: CREATE_TABLE_AS_SELECT

Note that the suffix (0, 1, 2) indicates the execution order, but we can add an extra check to verify that the job is actually a SELECT statement before retrieving the results:

from google.cloud import bigquery

client = bigquery.Client()
QUERY = """
BEGIN
    CREATE OR REPLACE TEMP TABLE t0 AS
        SELECT name, bikes_count FROM `bigquery-public-data.london_bicycles.cycle_stations` WHERE bikes_count > 10;

    SELECT SUM(bikes_count) AS total_bikes FROM t0;

    DROP TABLE IF EXISTS t0;
END;
"""

query_job = client.query(QUERY)
query_job.result()

for job in client.list_jobs(parent_job=query_job.job_id):  # list all child jobs
    # print("Job ID: {}, Statement Type: {}".format(job.job_id, job.statement_type))
    if job.statement_type == "SELECT":  # print the desired job output only
        rows = job.result()
        for row in rows:
            print("sum={}".format(row["total_bikes"]))

Output:

sum=6676
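
If a script contains more than one SELECT, the same pattern could be wrapped in a small helper. The following is only a sketch: the function name collect_select_results is hypothetical, and it assumes that sorting the child jobs by their created timestamp restores execution order (the listing above came back newest first). It relies only on list_jobs(parent_job=...), statement_type, created and result(), and accepts any objects that behave like bigquery.Client and the QueryJob returned by client.query().

```python
def collect_select_results(client, script_job):
    """Return a list of (job_id, rows) pairs, one per SELECT child job.

    `client` is expected to behave like google.cloud.bigquery.Client and
    `script_job` like the QueryJob returned by client.query(). Each row is
    converted to a plain dict so it can be used after the iterator is spent.
    """
    # Block until the whole script has finished, as in the example above.
    script_job.result()

    # Enumerate the child jobs of the script (jobs.list with parentJobId).
    children = list(client.list_jobs(parent_job=script_job.job_id))

    # Assumption: creation time reflects the order in which statements ran.
    children.sort(key=lambda job: job.created)

    results = []
    for job in children:
        # Skip DDL/DML children such as CREATE_TABLE_AS_SELECT and DROP_TABLE.
        if job.statement_type == "SELECT":
            rows = [dict(row.items()) for row in job.result()]
            results.append((job.job_id, rows))
    return results
```

With the script above, calling collect_select_results(client, client.query(QUERY)) should return a single entry whose rows hold the total_bikes value; scripts with several SELECT statements would produce one entry per statement.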
