Extract .gz files in S3 automatically


Problem description


I'm trying to find a solution to extract ALB log files in .gz format automatically as they're uploaded from the ALB to S3.

My bucket structure looks like this:

/log-bucket
..alb-1/AWSLogs/account-number/elasticloadbalancing/ap-northeast-1/2018/log.gz
..alb-2/AWSLogs/account-number/elasticloadbalancing/ap-northeast-1/2018/log.gz
..alb-3/AWSLogs/account-number/elasticloadbalancing/ap-northeast-1/2018/log.gz

Basically, every 5 minutes, each ALB automatically pushes logs to its corresponding S3 bucket. I'd like to extract the new .gz files in the same bucket as soon as they arrive.

Is there any way to handle this?

I noticed that we can use a Lambda function, but I'm not sure where to start. Sample code would be greatly appreciated!

Solution

Your best choice would probably be to have an AWS Lambda function subscribed to S3 events. Whenever a new object gets created, this Lambda function would be triggered. The Lambda function could then read the file from S3, extract it, write the extracted data back to S3 and delete the original one.

How that works is described in Using AWS Lambda with Amazon S3.
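A minimal sketch of such a handler in Python, assuming the function is subscribed to the bucket's `ObjectCreated` events (the function and variable names are illustrative; `boto3` is available by default in the Lambda Python runtime):

```python
import gzip
from urllib.parse import unquote_plus


def decompress_log(raw: bytes) -> bytes:
    # Pure decompression step, kept separate from the S3 plumbing
    return gzip.decompress(raw)


def lambda_handler(event, context):
    # boto3 ships with the AWS Lambda Python runtime
    import boto3
    s3 = boto3.client("s3")
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.endswith(".gz"):
            # Skip our own uncompressed output, so the event
            # triggered by put_object below doesn't loop forever
            continue
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        # Write the uncompressed data next to the original (key minus
        # the ".gz" suffix), then delete the compressed original
        s3.put_object(Bucket=bucket, Key=key[:-3], Body=decompress_log(body))
        s3.delete_object(Bucket=bucket, Key=key)
```

Note the `.endswith(".gz")` guard: since the function writes back into the same bucket that triggers it, the guard is what prevents it from re-processing its own output.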

That said, you might also want to reconsider whether you really need to store uncompressed logs in S3. Compressed files are not only cheaper, as they don't take as much storage space as uncompressed ones, but they are usually also faster to process, as the bottleneck in most cases is network bandwidth for transferring the data and not available CPU resources for decompression. Most tools also support working directly with compressed files. Take Amazon Athena (Compression Formats) or Amazon EMR (How to Process Compressed Files) for example.
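To illustrate that last point, many tools can stream gzipped logs without extracting them first. A small sketch in Python, using an in-memory buffer to stand in for a downloaded .gz object:

```python
import gzip
import io

# Stand-in for a gzipped log object fetched from S3
compressed = gzip.compress(b"line one\nline two\n")

# gzip.open behaves like a regular text file over the compressed
# data, so the log can be read line by line without decompressing
# it to disk first
with gzip.open(io.BytesIO(compressed), "rt") as f:
    lines = [line.rstrip("\n") for line in f]
```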

