Simplest tool in AWS for very simple (transform in) ETL?

Question

We have numerous files in S3 totaling tens of gigabytes. We need to get them into CSV format; currently the files use delimiters that are not commas. Normally I would do this on a server using sed, but I don't want to transfer the files to a server. I want to read directly from S3, translate to CSV line by line, and write the results back to new S3 files.

Glue appears to be able to do this, but I sense that its learning curve and setup are overkill for such a simple task.

Is there not some easy way to do simple tasks like this, maybe in EMR or some other AWS tool? We use Athena, and I'm wondering whether this could be done in a SQL statement using Athena. Thanks.

Recommended answer

Yes, that should be very easy, and you don't need any external ETL tool or Glue. Suppose you have a pipe-delimited table named "cust_transaction_pipe" that is based on pipe-delimited files, and you can query that table in Athena without any issues. To convert it to comma-delimited output, just use the query below:

-- CTAS: rewrite the pipe-delimited table to S3 as comma-delimited text files
CREATE TABLE cust_transaction_csv
WITH (external_location = 's3://YOUR_S3_BUCKET_NAME/cust_tx_csv/',
      format = 'TEXTFILE',
      field_delimiter = ',')
AS
SELECT * FROM cust_transaction_pipe;

Once it completes, check the S3 location you specified. It will contain the comma-delimited output files. You can specify many other options inside the WITH () clause. For the complete set of options, please see the AWS Athena documentation.
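
The query above assumes that "cust_transaction_pipe" already exists. If it does not, a minimal sketch of defining it over the existing pipe-delimited files might look like the following. The column names and the source prefix 's3://YOUR_S3_BUCKET_NAME/cust_tx_pipe/' are hypothetical placeholders, not taken from the question, so substitute your own schema and location; the '|' delimiter is likewise an assumption based on the pipe-delimited example in this answer.

-- Sketch only: column names and LOCATION below are illustrative assumptions.
CREATE EXTERNAL TABLE IF NOT EXISTS cust_transaction_pipe (
  customer_id      string,
  transaction_date string,
  amount           string
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '|'   -- the delimiter currently used in the source files
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION 's3://YOUR_S3_BUCKET_NAME/cust_tx_pipe/';

Declaring every column as string keeps the sketch a pure delimiter conversion, since no values are parsed or cast along the way. The statement only registers metadata over the files already in S3; the data is read when the CTAS query runs.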
