RDS to S3 - Data Transformation AWS

Question

I have about 30 tables in my RDS instance (haven't decided yet whether it will be Oracle or Postgres). I want to fetch all the records that have been inserted or updated in the last 4 hours (configurable), create a CSV file for each table, and store the files in S3. I want the whole process to be transactional: if there is any error fetching data from one table, I don't want the data for the other 29 tables to be persisted in S3. The data isn't very large; over a 4-hour window it should be on the order of a few hundred records or fewer per table.
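A minimal PySpark sketch of the per-table extract, assuming Postgres and that every table carries a last-modified timestamp column (called `updated_at` here); the JDBC URL, credentials, bucket, and table list are all placeholders:

```python
from datetime import datetime, timedelta

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rds-to-s3-export").getOrCreate()

TABLES = ["orders", "customers"]   # ...plus the other ~28 tables
WINDOW_HOURS = 4                   # the configurable lookback window
cutoff = datetime.utcnow() - timedelta(hours=WINDOW_HOURS)

for table in TABLES:
    # Push the time filter into the JDBC subquery so only recent rows travel.
    query = (f"(SELECT * FROM {table} "
             f"WHERE updated_at >= TIMESTAMP '{cutoff:%Y-%m-%d %H:%M:%S}') AS t")
    df = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://my-rds-host:5432/mydb")  # placeholder
          .option("dbtable", query)
          .option("user", "etl_user")          # placeholder credentials
          .option("password", "change-me")
          .load())
    # One CSV per table, written to a staging prefix first so a later step
    # can promote everything at once (see the atomicity discussion below).
    (df.coalesce(1)
       .write.mode("overwrite")
       .csv(f"s3://my-bucket/staging/{table}/", header=True))
```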

I am thinking of having a Spark job on an EMR cluster fetch the data from RDS, create a CSV for each table, and post all the files to S3 at the end of the process. The EMR cluster would be destroyed once the data is posted to S3. A CloudWatch trigger would invoke a Lambda every 4 hours, which spins up a new EMR cluster to perform this job.
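For the scheduling part, here is a hedged sketch of the Lambda handler such a CloudWatch Events rule could fire; the release label, instance types, roles, and script path are illustrative placeholders, not prescriptions:

```python
import boto3

emr = boto3.client("emr")

def handler(event, context):
    """Spin up a transient EMR cluster that runs the Spark export and dies."""
    response = emr.run_job_flow(
        Name="rds-to-s3-export",
        ReleaseLabel="emr-6.15.0",                 # illustrative release
        Applications=[{"Name": "Spark"}],
        Instances={
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
                 "InstanceCount": 1},
            ],
            # With no keep-alive, the cluster terminates after the last step.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        Steps=[{
            "Name": "export",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit",
                         "s3://my-bucket/jobs/rds_to_s3.py"],  # placeholder
            },
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    return response["JobFlowId"]
```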

Are there any alternate approaches worth exploring for this transformation?

Answer

Take a look at AWS Glue, which uses EMR under the hood but spares you from managing the infrastructure and configuration: just set up a crawler and write your ETL job.

Please note that AWS Glue doesn't support predicate pushdown for JDBC connections (currently S3 only), which means it will load the entire table first and only then apply the filtering.
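To make that concrete, here is a sketch of a Glue (PySpark) job; the catalog database/table names and the `updated_at` column are assumptions. The full table is materialized by `create_dynamic_frame.from_catalog`, and the time filter only runs afterwards:

```python
from datetime import datetime, timedelta

from awsglue.context import GlueContext
from awsglue.transforms import Filter
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())
cutoff = datetime.utcnow() - timedelta(hours=4)

# This loads the WHOLE table over JDBC -- no predicate pushdown happens here.
frame = glue_context.create_dynamic_frame.from_catalog(
    database="my_rds_db", table_name="orders")     # placeholder catalog names

# The time filter is only applied after the full load has completed.
recent = Filter.apply(frame=frame,
                      f=lambda row: row["updated_at"] >= cutoff)

glue_context.write_dynamic_frame.from_options(
    frame=recent,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/staging/orders/"},
    format="csv")
```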

You should also think carefully about atomicity, since a Glue ETL job simply processes data and writes it to a sink without transactions. In case of failure it won't remove partially written records, so you have to manage that yourself. There are a few options I would consider:

  1. On each run, write the data to a temp folder first (local or S3), and then copy it to the final S3 location only once all tables have been exported successfully (a sketch of this pattern follows below).
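A minimal boto3 sketch of that staging-then-promote pattern, with bucket and prefix names as placeholders:

```python
import boto3

s3 = boto3.client("s3")
BUCKET = "my-bucket"   # placeholder

def promote(run_id, staging_prefix="staging/", final_prefix="exports/"):
    """Copy every object of one run from staging to final, then clean up."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=BUCKET,
                                   Prefix=f"{staging_prefix}{run_id}/"):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            dest = key.replace(staging_prefix, final_prefix, 1)
            s3.copy_object(Bucket=BUCKET, Key=dest,
                           CopySource={"Bucket": BUCKET, "Key": key})
            s3.delete_object(Bucket=BUCKET, Key=key)
```

If any of the 30 table exports fails, the job simply skips `promote()` and deletes the staging prefix, so consumers of the final prefix never observe a partial run.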