BigQuery async query job - the fetch_results() method returns wrong number of values
Question
I am writing Python code with the BigQuery Client API and trying to use the async query code that appears in code samples everywhere, but it fails at the fetch_data() method call. Python raises this error:
ValueError: too many values to unpack
So the 3 return values (rows, total_count, page_token) seem to be the wrong number of return values. However, I cannot find any documentation about what this method is supposed to return, apart from the numerous code examples that show only these 3 return values.
Here is a snippet of code that shows what I'm doing (not including the initialization of the 'client' variable or the imported libraries, which happen earlier in my code).
#---> Set up and start the async query job
job_id = str(uuid.uuid4())
job = client.run_async_query(job_id, query)
job.destination = temp_tbl
job.write_disposition = 'WRITE_TRUNCATE'
job.begin()
print 'job started...'

#---> Monitor the job for completion
retry_count = 360
while retry_count > 0 and job.state != 'DONE':
    print 'waiting for job to complete...'
    retry_count -= 1
    time.sleep(1)
    job.reload()

if job.state == 'DONE':
    print 'job DONE.'
    page_token = None
    total_count = None
    rownum = 0
    job_results = job.results()
    while True:
        # ---- Next line of code errors out...
        rows, total_count, page_token = job_results.fetch_data( max_results=10, page_token=page_token )
        for row in rows:
            rownum += 1
            print "Row number %d" % rownum
        if page_token is None:
            print 'end of batch.'
            break
What specific return values should I expect from the job_results.fetch_data(...) method call on an async query job?
Recommended answer
Looks like you are right! The code no longer returns these 3 values.
As you can see in this commit from the public repository, fetch_data now returns an instance of the HTTPIterator class. (I hadn't noticed this before because I have a Docker image with an older version of the BigQuery client installed, where it does return the 3 values.)
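To see why that change produces this particular error, here is a minimal sketch that does not touch the BigQuery client at all. The generator fake_fetch_data is a hypothetical stand-in for an iterator-returning fetch_data(); unpacking any iterable that yields more than three items into three names fails the same way.

```python
def fake_fetch_data():
    """Hypothetical stand-in for fetch_data(): yields rows one at a time."""
    for i in range(5):
        yield ("row-%d" % i,)


def try_unpack():
    """Attempt the same 3-way unpacking as the failing line in the question."""
    try:
        rows, total_count, page_token = fake_fetch_data()
    except ValueError as exc:
        return str(exc)
    return None


# Five items cannot be unpacked into three names, so a ValueError is raised.
error_message = try_unpack()
print(error_message)
```

Note that if the iterator happened to yield exactly three rows, the unpacking would succeed silently and the three names would hold rows rather than (rows, total_count, page_token), so the error only shows up for other result sizes.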
The only way I found to get the results was to do something like this:
iterator = job_results.fetch_data()
data = []
for page in iterator._page_iter(False):
    data.extend([page.next() for i in range(page.num_items)])
Notice that we no longer have to manage pageTokens; paging has been automated for the most part.
Edit:
I just realized you can get the results by doing:
results = list(job_results.fetch_data())
Got to admit it's way easier now than it was before!
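If you still want to handle rows in fixed-size batches, as the max_results=10 loop in the question did, one option is to chunk the iterator yourself with itertools.islice. This is only a sketch under assumptions: batched and fake_rows are hypothetical names, and fake_rows is a plain generator standing in for the row iterator, not the BigQuery client.

```python
from itertools import islice


def batched(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            # The underlying iterator is exhausted.
            return
        yield batch


# Hypothetical stand-in for the row iterator: 25 fake one-column rows.
fake_rows = (("value-%d" % i,) for i in range(25))

# Consume the rows ten at a time and record how large each batch was.
batch_sizes = [len(batch) for batch in batched(fake_rows, 10)]
print(batch_sizes)
```

Because this works on any iterable, it keeps the batch handling separate from how the rows are fetched.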