What are best practices for using BigQuery from App Engine Standard?


Question

I am using App Engine Standard Environment (autoscaled), which means I have a limit of 10 mins before a request is cancelled.

Goal is to query data from BigQuery in regular intervals and, for each record, create a task in the task queue, so that records can be processed in the background.
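
For illustration, a minimal sketch of that fan-out using the App Engine Task Queue API (the /process-record worker URL, the "id" column, and the parameter name are placeholders, not code I actually have):

import com.google.appengine.api.taskqueue.Queue;
import com.google.appengine.api.taskqueue.QueueFactory;
import com.google.appengine.api.taskqueue.TaskOptions;
import com.google.cloud.bigquery.FieldValueList;
import com.google.cloud.bigquery.TableResult;

// Enqueue one task per BigQuery row; the worker behind /process-record
// (a placeholder URL) then handles each record in the background.
void enqueueRecords(TableResult result) {
    Queue queue = QueueFactory.getDefaultQueue();
    for (FieldValueList row : result.iterateAll()) {
        queue.add(TaskOptions.Builder
            .withUrl("/process-record")
            .param("id", row.get("id").getStringValue())); // assumes an "id" column
    }
}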

Instructions at https://cloud.google.com/bigquery/create-simple-app-api state to wait for a job like this:

// Create a job ID so that we can safely retry.
JobId jobId = JobId.of(UUID.randomUUID().toString());
Job queryJob = bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

// Wait for the query to complete.
queryJob = queryJob.waitFor();

Problem is the 10-minute limit, as BigQuery queries are processed in the background and it may take some time until the result becomes available, so I may not be able to process the response in the same endpoint call.

  • Is there a way to receive a callback from BigQuery at some URL once the query is ready?
  • Is there a smarter way to handle data from BigQuery in App Engine Standard?

I know I can configure App Engine to extend the maximum time per request, but that can hardly be the solution.

Answer

The best option is to handle long-running tasks the same way BQ itself does: return a job ID and let clients poll it, answering with a 202 while the query has not finished and a 200 with the result once it is ready to be consumed by the client.

Furthermore, the 202 response can carry a body, so you can report different statuses to the client (e.g. "Queued", "Running", "Processing results", ...).
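
As a minimal sketch, assuming the job state is kept in a Datastore entity of kind "BQJob" keyed by the job ID (the kind and property names are my own choice), the status endpoint could look like this:

import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import java.io.IOException;

public class JobStatusServlet extends HttpServlet {
  private final Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
    // One Datastore entity per BQ job, keyed by job ID, with "state" and
    // "result" properties maintained by the cron job described below.
    String jobId = req.getParameter("jobId");
    Entity e = datastore.get(datastore.newKeyFactory().setKind("BQJob").newKey(jobId));
    if (e == null) {
      resp.sendError(HttpServletResponse.SC_NOT_FOUND);
    } else if ("DONE".equals(e.getString("state"))) {
      resp.setStatus(HttpServletResponse.SC_OK);        // 200: result ready
      resp.getWriter().write(e.getString("result"));
    } else {
      resp.setStatus(HttpServletResponse.SC_ACCEPTED);  // 202: still in progress
      resp.getWriter().write(e.getString("state"));     // e.g. "Queued", "Running"
    }
  }
}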

On the server side, you start the query and, as soon as BQ returns a job ID, store it in some persistent storage (I would choose Datastore, but it could be memcache, a Cloud SQL instance, or even a file in GCS).
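
A sketch of that first step with the Java client libraries, skipping waitFor() and persisting the job ID in Datastore (the "BQJob" kind, the "state" property, and the placeholder query are assumptions):

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.JobId;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.KeyFactory;
import java.util.UUID;

BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

QueryJobConfiguration queryConfig =
    QueryJobConfiguration.newBuilder("SELECT ...")  // placeholder query
        .setUseLegacySql(false)
        .build();

// Create the job with a client-generated ID so the call can be retried safely,
// but do NOT call waitFor(): this request returns as soon as the job is created.
JobId jobId = JobId.of(UUID.randomUUID().toString());
bigquery.create(JobInfo.newBuilder(queryConfig).setJobId(jobId).build());

// Persist the job ID so the cron job can poll it later.
KeyFactory keyFactory = datastore.newKeyFactory().setKind("BQJob");
datastore.put(Entity.newBuilder(keyFactory.newKey(jobId.getJob()))
    .set("state", "Running")
    .build());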

Then you just need to create a cron job that checks BQ for the status of the unfinished queries, and updates their status in your persistent storage accordingly. Once the BQ job is finished, you can retrieve the results and store them to have them ready when the client checks your service.

As an example, these are the BQ API calls you would make from your app (shown here with curl; you can later translate them to any language using the idiomatic client libraries):

  1. Create the job, retrieve the job id from the response, and store it:

PROJECT=$(gcloud config get-value project)
QUERY='SELECT * FROM `bigquery-samples.wikipedia_benchmark.Wiki1k` limit 0'
curl -H"Authorization: Bearer $(gcloud auth print-access-token)" -H'content-type:application/json' https://www.googleapis.com/bigquery/v2/projects/$PROJECT/jobs -d"
{
 \"configuration\": {
  \"query\": {
   \"query\": \"$QUERY\",
   \"useLegacySql\": false
  }
 },
 \"jobReference\": {
  \"projectId\": \"$PROJECT\"
 }
}"|jq -r .jobReference.jobId >> running_jobs

  2. Keep querying the BQ API for the job status (this could be your cron job):

    for job in $(cat running_jobs); do
      if [ $(curl -H"Authorization: Bearer $(gcloud auth print-access-token)" https://www.googleapis.com/bigquery/v2/projects/$PROJECT/jobs/$job|jq -r .status.state) = "DONE" ]; then
        # here your processing part including your callback
        # then remove the job from the list of running jobs
        sed -i "/$job/d" ./running_jobs
      fi
    done
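
Translated to the Java client libraries, the cron handler could look roughly like this (again, the "BQJob" kind, the "state"/"result" properties, and the processing step are my own assumptions):

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobStatus;
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Query;
import com.google.cloud.datastore.QueryResults;
import com.google.cloud.datastore.StructuredQuery;

BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
Datastore datastore = DatastoreOptions.getDefaultInstance().getService();

// Find the jobs we have not marked as done yet.
QueryResults<Entity> pending = datastore.run(
    Query.newEntityQueryBuilder()
        .setKind("BQJob")
        .setFilter(StructuredQuery.PropertyFilter.eq("state", "Running"))
        .build());

while (pending.hasNext()) {
  Entity e = pending.next();
  Job job = bigquery.getJob(e.getKey().getName());
  if (job != null && JobStatus.State.DONE.equals(job.getStatus().getState())) {
    // Here you would read job.getQueryResults(), enqueue one task per record,
    // store the result, and update the state so the status endpoint returns 200.
    datastore.put(Entity.newBuilder(e).set("state", "DONE").build());
  }
}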
    

