如何使用 pageTokens 和适用于 Java 的 Google 客户端库请求分页的 BigQuery 查询结果? [英] How do I request paginated BigQuery query results using pageTokens with the Google Client lib for Java?

查看:12
本文介绍了如何使用 pageTokens 和适用于 Java 的 Google 客户端库请求分页的 BigQuery 查询结果?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我想运行包含数千行总结果的 BigQuery 查询,但我只想一次检索 100 个结果的页面(使用 maxResultspageToken 参数).

I want to run BigQuery queries with thousands of rows of total results, but I only want to retrieve a page of 100 results at a time (using the maxResults and pageToken parameters).

BigQuery API 支持在 collection.list 方法上使用 pageToken 参数.但是,我正在运行异步查询并使用 getQueryResult 方法检索结果,它似乎不支持 pageToken 参数.是否可以将 pageTokens 与 getQueryResults 一起使用?

The BigQuery API supports the use of pageToken parameters on collection.list methods. However, I am running asynchronous queries and retrieving the results using the getQueryResult method, and it doesn't seem to support the pageToken parameter. Is it possible to use pageTokens with getQueryResults?

推荐答案

更新:有关于如何在此处翻阅列表结果.

我自己回答这个问题,因为一个开发者私下问我这个问题,我想在 Stack Overflow 上分享答案.

I am self-answering this question, because a developer asked me this privately and I want to share the answer on Stack Overflow.

当从 Tabledata.list 方法请求分页结果时可以使用 pageToken 参数.例如,当结果数据超过 100k 行或 10 MB 结果时,结果集会自动分页.您还可以通过显式设置 maxResults 参数来请求结果分页.每页结果都会返回一个pageToken参数,该参数可用于检索下一页结果.

The pageToken parameter is available to use when requesting paginated results from the Tabledata.list method. Result sets are paginated automatically when, for example, the result data is over 100k rows or 10 MB of results. You can also request result pagination by setting the maxResults parameter explicitly. Each page of results will return a pageToken parameter, which can then be used to retrieve the next page of results.

每个查询都会生成一个新的 BigQuery 表.如果您没有明确命名该表,它只会持续 24 小时.但是,即使是未命名的匿名"表也有标识符.无论哪种情况,在插入查询作业后,检索新创建的表的名称.然后使用 tabledata.list 方法(以及 maxResults/pageToken 参数的组合)以分页形式请求结果.循环并使用先前检索到的 pageToken 继续调用 tabledata.list,直到不再返回 pageTokens(意味着您已到达最后一页.

Every query results in a new BigQuery table. If you don't name the table explicitly, it only lasts for 24 hours. However, even unnamed "anonymous" tables have an identifier. In either case, after inserting a query job, retrieve the name of newly created table. Then use the tabledata.list method (and a combination of the maxResults/pageToken parameters) to request results in paginated form. Loop and continue to call tabledata.list using the previously retrieved pageToken until the pageTokens are no longer is returned (meaning that you have reached the last page.

使用适用于 Java 的 Google API 客户端库,用于插入查询作业、轮询查询完成以及检索一页又一页查询结果的代码可能如下所示:

Using the Google API Client library for Java, the code for inserting a query job, polling for query completion, and then retrieving page after page of query results might look something like this:

// Create a new BigQuery client authorized via OAuth 2.0 protocol
// See: https://developers.google.com/bigquery/docs/authorization#installed-applications
Bigquery bigquery = createAuthorizedClient();

// Start a Query Job
String querySql = "SELECT TOP(word, 500), COUNT(*) FROM publicdata:samples.shakespeare";
JobReference jobId = startQuery(bigquery, PROJECT_ID, querySql);

// Poll for Query Results, return result output
TableReference completedJob = checkQueryResults(bigquery, PROJECT_ID, jobId);

// Return and display the results of the Query Job
displayQueryResults(bigquery, completedJob);

/**
 * Inserts a Query Job for a particular query
 */
public static JobReference startQuery(Bigquery bigquery, String projectId,
                                      String querySql) throws IOException {
  System.out.format("
Inserting Query Job: %s
", querySql);

  Job job = new Job();
  JobConfiguration config = new JobConfiguration();
  JobConfigurationQuery queryConfig = new JobConfigurationQuery();
  config.setQuery(queryConfig);

  job.setConfiguration(config);
  queryConfig.setQuery(querySql);

  Insert insert = bigquery.jobs().insert(projectId, job);
  insert.setProjectId(projectId);
  JobReference jobId = insert.execute().getJobReference();

  System.out.format("
Job ID of Query Job is: %s
", jobId.getJobId());

  return jobId;
}

/**
 * Polls the status of a BigQuery job, returns TableReference to results if "DONE"
 */
private static TableReference checkQueryResults(Bigquery bigquery, String projectId, JobReference jobId)
    throws IOException, InterruptedException {
  // Variables to keep track of total query time
  long startTime = System.currentTimeMillis();
  long elapsedTime;

  while (true) {
    Job pollJob = bigquery.jobs().get(projectId, jobId.getJobId()).execute();
    elapsedTime = System.currentTimeMillis() - startTime;
    System.out.format("Job status (%dms) %s: %s
", elapsedTime,
        jobId.getJobId(), pollJob.getStatus().getState());
    if (pollJob.getStatus().getState().equals("DONE")) {
      return pollJob.getConfiguration().getQuery().getDestinationTable();
    }
    // Pause execution for one second before polling job status again, to
    // reduce unnecessary calls to the BigQUery API and lower overall
    // application bandwidth.
    Thread.sleep(1000);
  }
}

/**
 * Page through the result set
 */
private static void displayQueryResults(Bigquery bigquery,
                                        TableReference completedJob) throws IOException {

    long maxResults = 20;
    String pageToken = null;
    int page = 1;

  // Default to not looping
    boolean moreResults = false;

    do {
    TableDataList queryResult = bigquery.tabledata().list(
            completedJob.getProjectId(),
            completedJob.getDatasetId(),
            completedJob.getTableId())
                .setMaxResults(maxResults)
                .setPageToken(pageToken)
         .execute();
    List<TableRow> rows = queryResult.getRows();
    System.out.print("
Query Results, Page #" + page + ":
------------
");
    for (TableRow row : rows) {
      for (TableCell field : row.getF()) {
      System.out.printf("%-50s", field.getV());
       }
      System.out.println();
    }
    if (queryResult.getPageToken() != null) {
      pageToken = queryResult.getPageToken();
      moreResults = true;
      page++;
    } else {
      moreResults = false;
    }
  } while (moreResults);
}

这篇关于如何使用 pageTokens 和适用于 Java 的 Google 客户端库请求分页的 BigQuery 查询结果?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆