AWS Glue job consuming data from external REST API
Question
I'm trying to create a workflow where an AWS Glue ETL job pulls JSON data from an external REST API instead of S3 or any other AWS-internal source. Is that even possible? Has anyone done it? Please help!
Answer
Yes, I do extract data from REST APIs like Twitter, FullStory, Elasticsearch, etc. Usually I use Python Shell jobs for the extraction because they are faster (relatively small cold start). When the extraction finishes, it triggers a Spark-type job that reads only the JSON items I need. I use the requests Python library.
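As a rough sketch of the extraction step with `requests` (the endpoint URL and bearer-token auth here are placeholders, not any specific API's actual interface):

```python
import requests

# Hypothetical endpoint; replace with your API's real URL and auth scheme
API_URL = "https://api.example.com/v1/tweets"

def fetch_json(url, token, timeout=30):
    """Pull JSON records from a REST endpoint, failing fast on HTTP errors."""
    response = requests.get(
        url,
        headers={"Authorization": f"Bearer {token}"},
        timeout=timeout,
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.json()
```

In a Python Shell job this runs as plain Python; there is no Spark context involved, so any pip-installable HTTP client works.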
To save the data into S3, you can do something like this:
import boto3
import json

# Initialize the S3 resource
s3 = boto3.resource('s3')

tweets = []
# ... code that extracts tweets from the API ...

tweets_json = json.dumps(tweets)
obj = s3.Object("my-tweets", "tweets.json")
obj.put(Body=tweets_json)  # upload the serialized JSON