AWS Lambda audio feature extraction (not enough storage in layers)


Question

We have IoT sensors that upload WAV files into an S3 bucket.

We would like to be able to process these files with AWS Lambda. For that we need:

  • Python librosa or pyAudioAnalysis package + numpy and scipy (~240 MB unzipped)
  • ffmpeg (~70 MB unzipped)

As you can see, there is no way to fit them all into the same Lambda package (250 MB uncompressed maximum). And when ffmpeg is not included in the layers, I get this error when loading the WAV file:

[ERROR] FileNotFoundError: [Errno 2] No such file or directory: 'ffprobe': 'ffprobe'

This is related to ffmpeg.

We are considering two options:

  1. Putting the ffmpeg binary in S3 and fetching it on every invocation, so that it does not have to go in the layers (if that is even possible).
  2. Chaining two Lambdas: the first processes the input file through ffmpeg and puts the output file in another bucket, which invokes the second function to extract features from the processed data (using SNS or a similar chaining mechanism, if that is even possible).

There has to be an easier way. I'd be glad to hear other opinions before diving into implementation. Thank you all!

Recommended answer

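The scenario appears to be:

  • Files come in at random times
  • Files need processing, but not in real time
  • The required libraries are too big for an AWS Lambda function

Suggested architecture: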

  • Configure an Amazon S3 Event to send a message to an Amazon SQS queue when a file arrives
  • Configure an Amazon CloudWatch Event to trigger an AWS Lambda function at regular intervals (e.g. every hour)
    • The Lambda function checks whether there are messages in the queue
    • If there are messages, it launches an Amazon EC2 instance with a User Data script that installs and starts the processing system
  • The processing system on the EC2 instance should:
    • Grab a message from the queue
    • Process the message (without the limitations of Lambda)
    • Delete the message
    • If there are no messages left in the queue, terminate the EC2 instance
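The scheduled Lambda function in this architecture could look roughly like the sketch below. It uses the boto3 calls `get_queue_attributes` and `run_instances`. The environment variable names (`QUEUE_URL`, `WORKER_AMI`, `WORKER_TYPE`, `WORKER_USER_DATA`) and the default instance type are assumptions made for illustration, not part of the original answer.

```python
import os


def queue_has_messages(sqs, queue_url):
    """Return True if SQS reports at least one visible message (the count is approximate)."""
    resp = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=["ApproximateNumberOfMessages"],
    )
    return int(resp["Attributes"]["ApproximateNumberOfMessages"]) > 0


def launch_worker(ec2, image_id, instance_type, user_data):
    """Start one EC2 instance whose User Data script installs and starts the processor."""
    resp = ec2.run_instances(
        ImageId=image_id,
        InstanceType=instance_type,
        MinCount=1,
        MaxCount=1,
        UserData=user_data,
        # An OS-level shutdown then terminates the instance instead of stopping it.
        InstanceInitiatedShutdownBehavior="terminate",
    )
    return resp["Instances"][0]["InstanceId"]


def check_and_launch(sqs, ec2, queue_url, image_id, instance_type, user_data):
    """Launch a worker only when there is something to process."""
    if not queue_has_messages(sqs, queue_url):
        return None
    return launch_worker(ec2, image_id, instance_type, user_data)


def lambda_handler(event, context):
    # boto3 is imported here so the helpers above can be exercised without AWS access.
    import boto3

    return check_and_launch(
        boto3.client("sqs"),
        boto3.client("ec2"),
        os.environ["QUEUE_URL"],                    # hypothetical variable names
        os.environ["WORKER_AMI"],
        os.environ.get("WORKER_TYPE", "t3.small"),  # assumed default
        os.environ.get("WORKER_USER_DATA", ""),
    )
```

This sketch does not check whether a worker from a previous run is still active, and a real instance would also need an IAM instance profile that allows it to read the queue and the bucket. Both are left out to keep the example short.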

This can be very cost-effective because Amazon EC2 Linux instances are charged per second. You can run several workers in parallel to process the messages (but be careful when writing the termination code, to ensure that all workers have finished processing their messages). Or, if things are not time-critical, just choose the smallest usable instance type and run it single-threaded, since larger instances cost more anyway (so they are no better from a cost-efficiency standpoint).
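A single-threaded worker for the EC2 instance might be sketched as follows. It covers the simple case only: one worker, so termination does not have to wait for other workers. The function names, the `QUEUE_URL` variable and the placeholder `process` body are hypothetical. `receive_message` and `delete_message` are the standard boto3 SQS calls.

```python
def drain_queue(sqs, queue_url, process, wait_seconds=20):
    """Grab, process and delete messages until a long poll comes back empty.

    Returns (handled, failed). A message whose processing raises is not
    deleted, so SQS makes it visible again after the visibility timeout.
    """
    handled = failed = 0
    while True:
        resp = sqs.receive_message(
            QueueUrl=queue_url,
            MaxNumberOfMessages=1,
            WaitTimeSeconds=wait_seconds,  # long polling
        )
        messages = resp.get("Messages", [])
        if not messages:
            return handled, failed
        for msg in messages:
            try:
                process(msg["Body"])
            except Exception:
                failed += 1
                continue
            sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])
            handled += 1


def main():
    # Imported here so drain_queue can be exercised without AWS access.
    import os
    import subprocess

    import boto3

    def process(body):
        # Placeholder: a real worker would parse the S3 event in `body`,
        # download the WAV file and run the feature extraction.
        print("would process:", body)

    drain_queue(boto3.client("sqs"), os.environ["QUEUE_URL"], process)
    # If the instance was launched with shutdown behavior "terminate",
    # an OS shutdown ends the instance and the billing with it.
    subprocess.run(["sudo", "shutdown", "-h", "now"], check=False)
```

The User Data script would call `main()` once the dependencies are installed. With several workers in parallel, the shutdown step would need to wait until every worker has returned from `drain_queue`.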

Make sure you put monitoring in place to ensure that messages are being processed. Implement a Dead Letter Queue (DLQ) in Amazon SQS to catch messages that fail to process, and put a CloudWatch Alarm on the DLQ to notify you if things seem to be going wrong.
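As a rough illustration, the DLQ and the alarm could be wired up with boto3 as sketched below. The helper names, the alarm name, the retry count of 5 and the 5-minute period are assumptions chosen for the example. The alarm is assumed to notify an existing SNS topic.

```python
import json


def redrive_policy(dlq_arn, max_receive_count=5):
    """Queue attributes that move a message to the DLQ after repeated failed receives."""
    return {
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
        )
    }


def dlq_alarm_params(dlq_name, sns_topic_arn):
    """Parameters for an alarm that fires as soon as the DLQ holds any message."""
    return {
        "AlarmName": dlq_name + "-not-empty",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": dlq_name}],
        "Statistic": "Maximum",
        "Period": 300,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


def configure_monitoring(sqs, cloudwatch, queue_url, dlq_arn, dlq_name, sns_topic_arn):
    """Attach the DLQ to the main queue and create the alarm on the DLQ."""
    sqs.set_queue_attributes(QueueUrl=queue_url, Attributes=redrive_policy(dlq_arn))
    cloudwatch.put_metric_alarm(**dlq_alarm_params(dlq_name, sns_topic_arn))
```

`configure_monitoring` would be called once with `boto3.client("sqs")` and `boto3.client("cloudwatch")`. The same settings can also be made in the console or in an infrastructure template.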
