How to schedule an export from a BigQuery table to Cloud Storage?

Problem description

I have successfully scheduled my query in BigQuery, and the result is saved as a table in my dataset. I see a lot of information about scheduling data transfers into BigQuery or Cloud Storage, but I haven't yet found anything about scheduling an export from a BigQuery table to Cloud Storage.

Is it possible to schedule an export of a BigQuery table to Cloud Storage so that I can further schedule having it SFTP-ed to me via Google BigQuery Data Transfer Services?

Recommended answer

There isn't a managed service for scheduling BigQuery table exports, but one viable approach is to use Cloud Functions in conjunction with Cloud Scheduler.

The Cloud Function would contain the necessary code to export from the BigQuery table to Cloud Storage. There are multiple programming languages to choose from, such as Python, Node.js, and Go.

Cloud Scheduler would periodically send an HTTP call, on a cron schedule, to the Cloud Function, which would in turn be triggered and run the export programmatically.

As an example and more specifically, you can follow these steps:

  1. Create a Cloud Function using Python with an HTTP trigger. To interact with BigQuery from within the code, you need to use the BigQuery client library; import it with from google.cloud import bigquery. Then, you can use the following code in main.py to create an export job from BigQuery to Cloud Storage:

    # Imports the BigQuery client library
    from google.cloud import bigquery

    def hello_world(request):
        # Replace these values according to your project
        project_name = "YOUR_PROJECT_ID" 
        bucket_name = "YOUR_BUCKET" 
        dataset_name = "YOUR_DATASET" 
        table_name = "YOUR_TABLE" 
        destination_uri = "gs://{}/{}".format(bucket_name, "bq_export.csv.gz")

        bq_client = bigquery.Client(project=project_name)

        # Build a reference to the table to export (avoids the deprecated Client.dataset())
        dataset_ref = bigquery.DatasetReference(project_name, dataset_name)
        table_to_export = dataset_ref.table(table_name)

        job_config = bigquery.job.ExtractJobConfig()
        job_config.compression = bigquery.Compression.GZIP

        extract_job = bq_client.extract_table(
            table_to_export,
            destination_uri,
            # Location must match that of the source table.
            location="US",
            job_config=job_config,
        )  
        return "Job with ID {} started exporting data from {}.{} to {}".format(extract_job.job_id, dataset_name, table_name, destination_uri)

Specify the client library dependency in the requirements.txt file by adding this line:

google-cloud-bigquery

  2. Create a Cloud Scheduler job. Set the Frequency with which you wish the job to be executed. For instance, setting it to 0 1 * * 0 would run the job once a week, at 1 AM every Sunday morning. The crontab tool is pretty useful when it comes to experimenting with cron scheduling.

    Choose HTTP as the Target, set the URL to the Cloud Function's URL (it can be found by selecting the Cloud Function and navigating to the Trigger tab), and choose GET as the HTTP method.
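
    If you prefer to create the Scheduler job programmatically rather than through the console, a minimal sketch using the google-cloud-scheduler client library (pip install google-cloud-scheduler) could look like the following; the project ID, region, job name and function URL are placeholders, not values from the original answer:

    # Sketch only: create the Cloud Scheduler job with the Python client library
    from google.cloud import scheduler_v1

    client = scheduler_v1.CloudSchedulerClient()
    parent = "projects/YOUR_PROJECT_ID/locations/YOUR_REGION"

    job = scheduler_v1.Job(
        name="{}/jobs/weekly-bq-export".format(parent),
        schedule="0 1 * * 0",  # every Sunday at 1 AM
        time_zone="Etc/UTC",
        http_target=scheduler_v1.HttpTarget(
            uri="https://YOUR_REGION-YOUR_PROJECT_ID.cloudfunctions.net/hello_world",
            http_method=scheduler_v1.HttpMethod.GET,
        ),
    )

    client.create_job(parent=parent, job=job)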

    Once created, you can test how the export behaves by pressing the RUN NOW button. However, before doing so, make sure the default App Engine service account has at least the Cloud IAM roles/storage.objectCreator role, otherwise the operation might fail with a permission error. The default App Engine service account has the form YOUR_PROJECT_ID@appspot.gserviceaccount.com.

    If you wish to export different tables, datasets and buckets on each execution, while essentially employing the same Cloud Function, you can use the HTTP POST method instead and configure a Body containing those parameters as data to be passed on to the Cloud Function, although that would imply making some small changes to its code, as sketched below.
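
    As an illustration only, here is a minimal sketch of such a variant; the function name export_table and the JSON keys are arbitrary choices, not part of the original answer:

    # Sketch only: a variant of the above function that reads its parameters
    # from the POST body instead of hardcoding them
    from google.cloud import bigquery

    def export_table(request):
        # Expects a JSON body such as:
        # {"project": "...", "dataset": "...", "table": "...", "bucket": "..."}
        params = request.get_json(silent=True) or {}
        project_name = params["project"]
        dataset_name = params["dataset"]
        table_name = params["table"]
        bucket_name = params["bucket"]
        destination_uri = "gs://{}/{}_export.csv.gz".format(bucket_name, table_name)

        bq_client = bigquery.Client(project=project_name)
        dataset_ref = bigquery.DatasetReference(project_name, dataset_name)
        table_to_export = dataset_ref.table(table_name)

        job_config = bigquery.job.ExtractJobConfig()
        job_config.compression = bigquery.Compression.GZIP

        extract_job = bq_client.extract_table(
            table_to_export,
            destination_uri,
            location="US",  # must match the location of the source table
            job_config=job_config,
        )
        return "Job with ID {} started exporting data from {}.{} to {}".format(
            extract_job.job_id, dataset_name, table_name, destination_uri
        )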

    Lastly, once the job is created, you can use the job ID returned by the Cloud Function together with the bq CLI to view the status of the export job with bq show -j <job_id>.
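
    Alternatively, if you prefer to check the status from Python rather than the bq CLI, a minimal sketch (assuming the same project and location used above) could be:

    # Sketch only: look up the extract job from Python instead of the bq CLI;
    # replace the project, job ID and location with your own values
    from google.cloud import bigquery

    bq_client = bigquery.Client(project="YOUR_PROJECT_ID")
    job = bq_client.get_job("JOB_ID_RETURNED_BY_THE_FUNCTION", location="US")

    print(job.state)         # PENDING, RUNNING or DONE
    print(job.error_result)  # None unless the job failed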
