bq.py Not Paging Results

Problem Description

We're working on writing a wrapper for bq.py and are having some problems with result sets larger than 100k rows. It seems like this has worked fine in the past (we had related problems with "Google BigQuery Incomplete Query Replies on Odd Attempts"). Perhaps I'm not understanding the limits explained on the doc page?

For instance:

```bash
#!/bin/bash

for i in $(seq 99999 100002); do
    bq query -q --nouse_cache --max_rows 99999999 "SELECT id, FROM [publicdata:samples.wikipedia] LIMIT $i" > "$i.txt"
    j=$(wc -l < "$i.txt")
    echo "Limit $i Returned $j Rows"
done
```

Yields (note there are 4 lines of formatting, so every LIMIT above 100000 still returns only 100,000 data rows):

```text
Limit 99999 Returned   100003 Rows
Limit 100000 Returned   100004 Rows
Limit 100001 Returned   100004 Rows
Limit 100002 Returned   100004 Rows
```
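Subtracting the four formatting lines from each count gives the number of data rows that actually came back. The small sketch below just redoes that arithmetic on the counts from the run; the names `LINE_COUNTS`, `FORMAT_LINES`, and `data_rows` are illustrative, and the assumption that the 4 extra lines are bq's table borders plus header row is ours, not the question's.

```python
# Line counts that `wc -l` reported for each LIMIT in the run above.
LINE_COUNTS = {99999: 100003, 100000: 100004, 100001: 100004, 100002: 100004}

# Non-data lines in each output file, per the question's note
# (assumed here to be bq's table borders and header row).
FORMAT_LINES = 4


def data_rows(line_count: int) -> int:
    """Number of result rows in a file, excluding the formatting lines."""
    return line_count - FORMAT_LINES


for limit, lines in sorted(LINE_COUNTS.items()):
    rows = data_rows(lines)
    shortfall = limit - rows
    print(f"LIMIT {limit}: {rows} data rows, {shortfall} missing")
```

The first two limits come back complete, while LIMIT 100001 and LIMIT 100002 are one and two rows short: the result stops at exactly 100,000 rows.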

In our wrapper, we directly access the API:

```python
while row_count < total_rows:
    data = client.apiclient.tabledata().list(maxResults=total_rows - row_count,
                                             pageToken=page_token,
                                             **table_dict).execute()

    # If there are more results than will fit on a page,
    # you will receive a token for the next page
    page_token = data.get('pageToken', None)

    # How many rows are there across all pages?
    total_rows = min(total_rows, int(data['totalRows']))  # Changed to use get(data[rows],0)
    raw_page = data.get('rows', [])
```

We would expect to get a token in this case, but none is returned.
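A minimal sketch of a complete paging loop, not the asker's wrapper or any official client: `fetch_all_rows` and the `fetch_page` callable are hypothetical names, with `fetch_page` standing in for `client.apiclient.tabledata().list(...).execute()`. It assumes each response carries the `totalRows`, `rows`, and optional `pageToken` fields read in the snippet above, and it raises when the server reports more rows than were delivered without handing back a token, which is the symptom described here.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical stand-in for tabledata().list(...).execute(): takes the page
# token (None for the first page) and a row budget, returns a response dict.
FetchPage = Callable[[Optional[str], int], Dict[str, Any]]


def fetch_all_rows(fetch_page: FetchPage, max_rows: int) -> List[Any]:
    """Collect up to max_rows rows by following pageToken until exhausted."""
    rows: List[Any] = []
    page_token: Optional[str] = None
    total_rows = max_rows
    while len(rows) < total_rows:
        data = fetch_page(page_token, total_rows - len(rows))
        # totalRows counts rows across all pages (a string in the JSON).
        total_rows = min(total_rows, int(data.get("totalRows", 0)))
        page = data.get("rows", [])
        rows.extend(page)
        page_token = data.get("pageToken")
        # Stop when there is no next page (or an empty page, to avoid spinning).
        if page_token is None or not page:
            break
    if len(rows) < total_rows:
        # Server said more rows exist but gave no token to fetch them.
        raise RuntimeError(
            f"got {len(rows)} of {total_rows} rows but no pageToken was returned"
        )
    return rows[:total_rows]
```

Raising instead of returning the partial list makes a missing token show up as an error rather than as a silently truncated result set.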

Solution

Sorry it took me a little while to get back to you.

I was able to identify a server-side bug; you would end up seeing it with the Java client as well as the Python client. We're planning to push a fix out this coming week, and your client should start behaving correctly as soon as that happens.

BTW, I'm not sure whether you already knew this, but there's a whole standalone Python client that you can also use to access the API from Python. I thought it might be a bit more convenient for you than the client distributed as part of bq.py. You'll find a link to it on this page: https://developers.google.com/bigquery/client-libraries