如何使用pageTokens和Google Client lib for Java请求分页的BigQuery查询结果? [英] How do I request paginated BigQuery query results using pageTokens with the Google Client lib for Java?

查看:114
本文介绍了如何使用pageTokens和Google Client lib for Java请求分页的BigQuery查询结果?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想运行带有数千行总结果的BigQuery查询,但我只想一次检索100个结果页(使用 maxResults pageToken 参数)。
$ b BigQuery API支持使用 pageToken 参数在 collection.list 方法上。但是,我正在运行异步查询并使用 getQueryResult 方法检索结果,并且它似乎不支持 pageToken 参数。是否有可能通过 getQueryResults

使用 pageToken s

解决方案

更新:有关于如何

我自我回答这个问题,因为开发者私下问我这个问题,我想分享Stack Overflow的答案。



当请求Tabledata.list方法的分页结果时,可以使用pageToken参数。例如,当结果数据超过10万行或10 MB的结果时,结果集会自动分页。您也可以通过显式设置maxResults参数来请求结果分页。每一页结果都会返回一个pageToken参数,然后可以用它来检索下一页结果。



每个查询都会生成一个新的BigQuery表。如果您没有明确指定该表格,则只会持续24小时。但是,即使未命名的匿名表也有一个标识符。在任何一种情况下,在插入查询作业后,检索新创建的表的名称。然后使用tabledata.list方法(和maxResults / pageToken参数的组合)以分页形式请求结果。循环并继续使用先前检索的pageToken调用tabledata.list,直到不再返回pageTokens(意味着您已到达最后一页)。

使用Google API用于Java的客户端库,用于插入查询作业的代码,用于查询完成的轮询以及在查询结果的页面之后检索页面可能如下所示:

  //创建一个新的通过OAuth 2.0协议授权的BigQuery客户端
//请参阅:https://developers.google.com/bigquery/docs/authorization#installed-applications
Bigquery bigquery = createAuthorizedClient();

//开始一个查询作业
字符串querySql =SELECT TOP(word,500),COUNT(*)FROM publicdata:samples.shakespeare;
JobReference jobId = startQuery(bigquery,PROJECT_ID,querySql);

//轮询查询结果,返回结果输出
TableReference completedJob = checkQueryResults(bigquery,PROJECT_ID,jobId);

//返回并显示t查询作业的结果
displayQueryResults(bigquery,completedJob);
$ b $ **
*为特定查询插入查询作业
* /
public static JobReference startQuery(Bigquery bigquery,String projectId,
String querySql)throws IOException {
System.out.format(\\\
Inserting Query Job:%s\\\
,querySql);

Job job = new Job();
JobConfiguration config = new JobConfiguration();
JobConfigurationQuery queryConfig = new JobConfigurationQuery();
config.setQuery(queryConfig);

job.setConfiguration(config);
queryConfig.setQuery(querySql);

插入insert = bigquery.jobs()。insert(projectId,job);
insert.setProjectId(projectId);
JobReference jobId = insert.execute()。getJobReference();

System.out.format(\\\
Job查询作业的ID是:%s\\\
,jobId.getJobId());

return jobId;

$ b $ **
*轮询BigQuery作业的状态,如果完成则返回TableReference结果
* /
private static TableReference checkQueryResults(Bigquery bigquery,String projectId,JobReference jobId)
抛出IOException,InterruptedException {
//用于跟踪总查询时间的变量
long startTime = System.currentTimeMillis();
long elapsedTime; $()
$ b while(true){
Job pollJob = bigquery.jobs()。get(projectId,jobId.getJobId())。execute();
elapsedTime = System.currentTimeMillis() - startTime;
System.out.format(Job status(%dms)%s:%s\,elapsedTime,
jobId.getJobId(),pollJob.getStatus()。getState());
if(pollJob.getStatus()。getState()。equals(DONE)){
return pollJob.getConfiguration()。getQuery()。getDestinationTable();
}
//在执行轮询作业状态之前暂停执行一秒钟,直到
//减少对BigQUery API的不必要调用并降低总体
//应用程序带宽。
Thread.sleep(1000);
}
}

/ **
*通过结果集的页面
* /
private static void displayQueryResults(Bigquery bigquery,
TableReference completedJob)throws IOException {

long maxResults = 20;
String pageToken = null;
int page = 1;

//默认为不循环
boolean moreResults = false;

$ {
TableDataList queryResult = bigquery.tabledata()。list(
completedJob.getProjectId(),
completedJob.getDatasetId(),
completedJob .getTableId())
.setMaxResults(maxResults)
.setPageToken(pageToken)
.execute();
列表< TableRow> rows = queryResult.getRows();
System.out.print(\\\
Query Results,Page#+ page +:\\\
------------ \\\
); (TableRow row:rows){
for(TableCell field:row.getF()){
System.out.printf(% - 50s,field.getV()) ;
}
System.out.println();

if(queryResult.getPageToken()!= null){
pageToken = queryResult.getPageToken();
moreResults = true;
page ++;
} else {
moreResults = false;
}
} while(moreResults);
}


I want to run BigQuery queries with thousands of rows of total results, but I only want to retrieve a page of 100 results at a time (using the maxResults and pageToken parameters).

The BigQuery API supports the use of pageToken parameters on collection.list methods. However, I am running asynchronous queries and retrieving the results using the getQueryResult method, and it doesn't seem to support the pageToken parameter. Is it possible to use pageTokens with getQueryResults?

解决方案

Update: There's new documentation about how to page through list results here.

I am self-answering this question, because a developer asked me this privately and I want to share the answer on Stack Overflow.

The pageToken parameter is available to use when requesting paginated results from the Tabledata.list method. Result sets are paginated automatically when, for example, the result data is over 100k rows or 10 MB of results. You can also request result pagination by setting the maxResults parameter explicitly. Each page of results will return a pageToken parameter, which can then be used to retrieve the next page of results.

Every query results in a new BigQuery table. If you don't name the table explicitly, it only lasts for 24 hours. However, even unnamed "anonymous" tables have an identifier. In either case, after inserting a query job, retrieve the name of newly created table. Then use the tabledata.list method (and a combination of the maxResults/pageToken parameters) to request results in paginated form. Loop and continue to call tabledata.list using the previously retrieved pageToken until the pageTokens are no longer is returned (meaning that you have reached the last page.

Using the Google API Client library for Java, the code for inserting a query job, polling for query completion, and then retrieving page after page of query results might look something like this:

// Create a new BigQuery client authorized via OAuth 2.0 protocol
// See: https://developers.google.com/bigquery/docs/authorization#installed-applications
Bigquery bigquery = createAuthorizedClient();

// Start a Query Job
String querySql = "SELECT TOP(word, 500), COUNT(*) FROM publicdata:samples.shakespeare";
JobReference jobId = startQuery(bigquery, PROJECT_ID, querySql);

// Poll for Query Results, return result output
TableReference completedJob = checkQueryResults(bigquery, PROJECT_ID, jobId);

// Return and display the results of the Query Job
displayQueryResults(bigquery, completedJob);

/**
 * Inserts a Query Job for a particular query
 */
public static JobReference startQuery(Bigquery bigquery, String projectId,
                                      String querySql) throws IOException {
  System.out.format("\nInserting Query Job: %s\n", querySql);

  Job job = new Job();
  JobConfiguration config = new JobConfiguration();
  JobConfigurationQuery queryConfig = new JobConfigurationQuery();
  config.setQuery(queryConfig);

  job.setConfiguration(config);
  queryConfig.setQuery(querySql);

  Insert insert = bigquery.jobs().insert(projectId, job);
  insert.setProjectId(projectId);
  JobReference jobId = insert.execute().getJobReference();

  System.out.format("\nJob ID of Query Job is: %s\n", jobId.getJobId());

  return jobId;
}

/**
 * Polls the status of a BigQuery job, returns TableReference to results if "DONE"
 */
private static TableReference checkQueryResults(Bigquery bigquery, String projectId, JobReference jobId)
    throws IOException, InterruptedException {
  // Variables to keep track of total query time
  long startTime = System.currentTimeMillis();
  long elapsedTime;

  while (true) {
    Job pollJob = bigquery.jobs().get(projectId, jobId.getJobId()).execute();
    elapsedTime = System.currentTimeMillis() - startTime;
    System.out.format("Job status (%dms) %s: %s\n", elapsedTime,
        jobId.getJobId(), pollJob.getStatus().getState());
    if (pollJob.getStatus().getState().equals("DONE")) {
      return pollJob.getConfiguration().getQuery().getDestinationTable();
    }
    // Pause execution for one second before polling job status again, to
    // reduce unnecessary calls to the BigQUery API and lower overall
    // application bandwidth.
    Thread.sleep(1000);
  }
}

/**
 * Page through the result set
 */
private static void displayQueryResults(Bigquery bigquery,
                                        TableReference completedJob) throws IOException {

    long maxResults = 20;
    String pageToken = null;
    int page = 1;

  // Default to not looping
    boolean moreResults = false;

    do {
    TableDataList queryResult = bigquery.tabledata().list(
            completedJob.getProjectId(),
            completedJob.getDatasetId(),
            completedJob.getTableId())
                .setMaxResults(maxResults)
                .setPageToken(pageToken)
         .execute();
    List<TableRow> rows = queryResult.getRows();
    System.out.print("\nQuery Results, Page #" + page + ":\n------------\n");
    for (TableRow row : rows) {
      for (TableCell field : row.getF()) {
      System.out.printf("%-50s", field.getV());
       }
      System.out.println();
    }
    if (queryResult.getPageToken() != null) {
      pageToken = queryResult.getPageToken();
      moreResults = true;
      page++;
    } else {
      moreResults = false;
    }
  } while (moreResults);
}

这篇关于如何使用pageTokens和Google Client lib for Java请求分页的BigQuery查询结果?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆