How to make requests to third-party APIs and periodically load the results into Google BigQuery? Which Google services should I use?


Problem description

I need to get the data from a third-party API and ingest it into Google BigQuery. I probably also need to automate this process through Google services so that it runs periodically.

I am trying to use Cloud Functions, but it needs a trigger. I have also read about App Engine, but I believe it is not suitable when all I need is a single function that pulls data from an API.

Another doubt: do I need to load the data into Cloud Storage first, or can I load it straight into BigQuery? Should I use Dataflow, and does it require any particular configuration?

import requests
from google.cloud import storage

def upload_blob(bucket_name, request_url, destination_blob_name):
    """Fetches the API response and uploads it to the bucket."""
    storage_client = storage.Client()
    bucket = storage_client.get_bucket(bucket_name)
    blob = bucket.blob(destination_blob_name)

    # Fetch the API payload and write it straight into the blob.
    response = requests.get(request_url['url'])
    response.raise_for_status()
    blob.upload_from_string(response.text, content_type='application/json')

    print('API response uploaded to {}/{}.'.format(
        bucket_name,
        destination_blob_name))

def func_data(request_url):
    BUCKET_NAME = 'dataprep-staging'
    BLOB_NAME = 'any_name'

    upload_blob(BUCKET_NAME, request_url, BLOB_NAME)
    return 'Success!'

I expect advice about the architecture (Google services) that I should use to create this pipeline. For example: use a Cloud Function to get the data from the API, then schedule a job with service 'X' to put the data into storage, and finally pull the data from storage.

Answer

You can use Cloud Functions. Create an HTTP-triggered function and call it periodically with Cloud Scheduler.
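
As a minimal sketch (the entry-point name and API URL below are hypothetical placeholders), the HTTP-triggered function that Cloud Scheduler calls could look like this:

import requests

API_URL = 'https://api.example.com/data'  # hypothetical third-party endpoint

def fetch_api_data(request):
    """Entry point; Cloud Scheduler hits the function's HTTPS trigger URL."""
    response = requests.get(API_URL)
    response.raise_for_status()
    # ... write the payload to /tmp and load it into BigQuery (see below) ...
    return 'OK', 200

A Cloud Scheduler job then targets the function's trigger URL with a cron schedule (for example, every hour).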

About storage, the answer is no. If the API result is not too large for the memory allowed to the function, you can write it to the /tmp directory and load the data into BigQuery from that file. You can size your function up to 2 GB of memory if needed.
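
For the load step, here is a sketch assuming the API returns newline-delimited JSON and that the destination table ('my_dataset.api_results' is a hypothetical name) already exists in your project:

import requests
from google.cloud import bigquery

def load_api_result(api_url):
    # Write the API payload to the function's /tmp filesystem.
    tmp_path = '/tmp/api_result.json'
    response = requests.get(api_url)
    response.raise_for_status()
    with open(tmp_path, 'wb') as f:
        f.write(response.content)

    # Load that file into BigQuery; autodetect lets BigQuery infer the schema.
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
    )
    with open(tmp_path, 'rb') as f:
        load_job = client.load_table_from_file(
            f, 'my_dataset.api_results', job_config=job_config)
    load_job.result()  # wait for the load job to complete

Note that /tmp in Cloud Functions is an in-memory filesystem, so the file counts against the function's memory, which is why the memory limit matters for large API responses.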
