AWS Lambda Performance issues


Problem description

I use AWS API Gateway integrated with AWS Lambda (Java), but I'm seeing some serious problems with this approach. The concept of removing the server and having your app scale out of the box is really nice, but here is the problem I'm facing. My Lambda does two simple things: validate the payload received from the client, then send it to a Kinesis stream for further processing by another Lambda. (You may ask why I don't send directly to the stream and use one Lambda for all of the operations. Let's just say that I want to separate the logic, have a layer of abstraction, and also be able to tell the client when it is sending invalid data.)
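For context, here is a minimal sketch of what such a handler can look like with the AWS SDK for Java v1. This is not the author's actual code; the stream name, input type and validation rule are placeholders.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class IngestHandler implements RequestHandler<String, String> {

    @Override
    public String handleRequest(String payload, Context context) {
        // Placeholder validation; the real rules are whatever the question's Lambda checks.
        if (payload == null || payload.isEmpty()) {
            throw new IllegalArgumentException("Invalid payload");
        }

        // Client built per invocation here, matching how the question measures
        // "kinesis client initialization"; it could also live in a field so that
        // warm containers reuse it.
        AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();
        kinesis.putRecord(new PutRecordRequest()
                .withStreamName("my-stream")                        // hypothetical stream name
                .withPartitionKey(context.getAwsRequestId())
                .withData(ByteBuffer.wrap(payload.getBytes(StandardCharsets.UTF_8))));
        return "OK";
    }
}
```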

In the implementation of the Lambda I integrated Spring DI. So far so good. Then I started performance testing: I simulated 50 concurrent users making 4 requests each, with 5 seconds between requests. Here is what happened: on the Lambda's cold start I initialize Spring's application context, but it seems that receiving so many simultaneous requests while the Lambda is not yet started does some strange things. Here's a screenshot of how long the context initialization took.

What we can see from the screenshot is that the context initialization times vary widely. My assumption is that when so many requests arrive and there is no "active" Lambda, a container is initialized for every one of them, and at the same time some of them (the ones with the big times of around 18 s) are "blocked" until the containers that started earlier are ready. So there may be an internal limit on how many containers can be started at the same time. The problem is that if your traffic is not evenly distributed, this will happen from time to time and some requests will time out. We don't want that to happen.
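For reference, the context initialization measured above is usually done once per container, which is why only cold starts pay for it. A minimal sketch of that wiring (assumed structure, not the author's code; the bean is a hypothetical stand-in for the real validation logic):

```java
import org.springframework.context.annotation.AnnotationConfigApplicationContext;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

public class SpringHandler implements RequestHandler<String, String> {

    // Hypothetical bean standing in for the real validation logic.
    static class PayloadValidator {
        void validate(String payload) {
            if (payload == null || payload.isEmpty()) {
                throw new IllegalArgumentException("Invalid payload");
            }
        }
    }

    @Configuration
    static class AppConfig {
        @Bean
        PayloadValidator payloadValidator() {
            return new PayloadValidator();
        }
    }

    // Built once per container, i.e. only during a cold start. Every invocation
    // routed to this container afterwards reuses the same context.
    private static final AnnotationConfigApplicationContext CONTEXT =
            new AnnotationConfigApplicationContext(AppConfig.class);

    @Override
    public String handleRequest(String payload, Context context) {
        CONTEXT.getBean(PayloadValidator.class).validate(payload);
        return "OK";
    }
}
```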

The next thing was to run some tests without the Spring container, my thought being "OK, the initialization is heavy, let's just use plain old Java object initialization". Unfortunately, the same thing happened (it maybe just shaved off the ~3 s of container initialization for some of the requests). Here is a more detailed screenshot of the test data:

So I logged the whole Lambda execution time (from construction to the end), the Kinesis client initialization and the actual sending of the data to the stream, as these are the heaviest operations in the Lambda. We still see the big times of around 18 s, but the interesting thing is that the times are roughly proportional: if the whole Lambda takes 18 s, around 7-8 s is the client initialization, 6-7 s is spent sending the data to the stream, and the remaining 4-5 s go to the other operations in the Lambda, which at the moment is only validation. On the other hand, if we take one of the small times (which means it reuses an already started Lambda), e.g. 820 ms, the Kinesis client initialization takes 100 ms, the data sending 340 ms and the validation 400 ms. This again pushes me toward the thought that internally it sleeps for some time because of some limit. The next screenshot shows what happens on the next round of requests, when the Lambda is already started:

So we no longer see those big times; yes, we still have a relatively big delta in some of the requests (which is also strange to me), but things look much better.
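One way the per-step timings described above could have been captured (a hypothetical helper, not taken from the question) is to wrap each step in a wall-clock measurement and log it through the Lambda context logger:

```java
import java.util.function.Supplier;

import com.amazonaws.services.lambda.runtime.Context;

public final class StepTimer {

    private StepTimer() {
    }

    // Runs one step, logs its wall-clock duration via the Lambda logger,
    // and returns the step's result.
    public static <T> T timed(String name, Context context, Supplier<T> step) {
        long start = System.currentTimeMillis();
        try {
            return step.get();
        } finally {
            context.getLogger().log(
                    name + " took " + (System.currentTimeMillis() - start) + " ms\n");
        }
    }
}
```

Inside the handler this could wrap each of the measured steps, for example `StepTimer.timed("kinesisClientInit", context, AmazonKinesisClientBuilder::defaultClient)`.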

So I'm looking for clarification from someone who actually knows what is happening under the hood, because this is not good behavior for a serious application that uses the cloud for its "unlimited" possibilities.

Another question is related to another Lambda limit: 200 concurrent invocations across all Lambdas within an account in a region. For me this is also a big limitation for a large application with lots of traffic. Since my business case at the moment (I don't know about the future) is more or less fire-and-forget, I'm starting to think about changing the logic so that the gateway sends the data directly to the stream and the other Lambda takes care of validation and further processing. Yes, I lose the current layer of abstraction (which I don't need at the moment), but I increase the application's availability many times over. What do you think?

Answer

You can proxy straight to the Kinesis stream via API Gateway. You would lose some control in terms of validation and transformation, but you won't have the cold-start latency that you're seeing from Lambda.

You can use an API Gateway mapping template to transform the data, and if validation is important, you could potentially do it in the processing Lambda on the other side of the stream.
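For reference, a request mapping template for an API Gateway service integration with the Kinesis PutRecord action typically looks something like the following. The stream name is a placeholder; treat this as a sketch rather than a drop-in configuration.

```
{
    "StreamName": "my-stream",
    "Data": "$util.base64Encode($input.body)",
    "PartitionKey": "$context.requestId"
}
```

With a template along these lines, API Gateway base64-encodes the request body and forwards it to the stream without any Lambda in the request path; validation then moves to the consumer Lambda, as the answer suggests.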

