AWS Lambda Performance issues


Problem description

I use AWS API Gateway integrated with AWS Lambda (Java), but I'm seeing some serious problems with this approach. The concept of removing the server and having your app scale out of the box is really nice, but here are the problems I'm facing. My Lambda does two simple things: it validates the payload received from the client, then sends it to a Kinesis stream for further processing by another Lambda. (You might ask why I don't send directly to the stream and use just one Lambda for all of the operations. Let's just say that I want to separate the logic, have a layer of abstraction, and be able to tell the client when it is sending invalid data.)
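The validate-then-forward split described above can be sketched in plain Java. The payload field names (`userId`, `event`) are illustrative assumptions, not from the question, and the actual Kinesis `putRecord` call is left out:

```java
import java.util.Map;

// Sketch of the validation layer that runs before forwarding to Kinesis.
// Field names are hypothetical; a real payload schema would differ.
public class PayloadValidator {

    /** Returns null when the payload is valid, or an error message to return to the client. */
    public static String validate(Map<String, String> payload) {
        if (payload == null || payload.isEmpty()) {
            return "empty payload";
        }
        String userId = payload.get("userId");
        if (userId == null || userId.trim().isEmpty()) {
            return "missing userId";
        }
        if (payload.get("event") == null) {
            return "missing event";
        }
        return null; // valid: safe to forward to the Kinesis stream
    }
}
```

On the valid path, the handler would then serialize the payload and call the Kinesis client; on the invalid path, it returns the error message to API Gateway so the client gets a 4xx response.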

In the Lambda implementation I integrated Spring DI. So far so good. Then I started performance testing: I simulated 50 concurrent users, each making 4 requests with 5 seconds between requests. What happened: on the Lambda's cold start I initialize the Spring application context, but it seems that receiving so many simultaneous requests while the Lambda was not yet started does something strange. Here's a screenshot of the times the context took to initialize.
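Whatever the container behavior turns out to be, one standard mitigation is to pay the initialization cost once per container rather than once per request, by building heavy objects in static initializers. A minimal sketch of the pattern, with a plain object standing in for the Spring `ApplicationContext` (a real handler would build an `AnnotationConfigApplicationContext` there instead):

```java
// Sketch of moving heavy initialization out of the request path.
// The static initializer runs once per Lambda container (at cold start);
// warm invocations reuse the already-built context object.
public class HandlerSketch {
    static int initCount = 0;                       // visible for testing only
    static final Object CONTEXT = buildContext();   // cold-start cost paid here, once

    static Object buildContext() {
        initCount++; // real code: new AnnotationConfigApplicationContext(AppConfig.class)
        return new Object();
    }

    public static Object handleRequest(String input) {
        return CONTEXT; // warm path: no re-initialization
    }
}
```

This does not remove the cold-start spike itself, but it guarantees that only the first request on each container pays it.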

What we can see from the screenshot is that the context initialization times vary widely. My assumption about what is happening is that when so many requests are received and there is no "active" Lambda, it initializes a Lambda container for every one of them, and at the same time it "blocks" some of them (the ones with the big times of ~18 s) until the others that already started are ready. So maybe there is some internal limit on how many containers it can start at the same time. The problem is that if your traffic is not evenly distributed, this will happen from time to time and some of the requests will time out. We don't want that to happen.

So the next thing was to run some tests without the Spring container, my thought being "OK, the initialization is heavy; let's just use plain old Java object initialization." Unfortunately, the same thing happened (it perhaps just shaved the ~3 s of container initialization off some of the requests). Here is a more detailed screenshot of the test data:

So I logged the whole Lambda execution time (from construction to the end), the Kinesis client initialization, and the actual sending of the data to the stream, as these are the heaviest operations in the Lambda. We still have these big times of ~18 s, but the interesting thing is that the times are somehow proportional: if the whole Lambda takes 18 s, around 7-8 s is client initialization, 6-7 s is sending the data to the stream, and the remaining 4-5 s go to the other operations in the Lambda, which for the moment is only validation. On the other hand, if we take one of the small times (which means it reused an already started Lambda), e.g. 820 ms, the Kinesis client initialization takes 100 ms, the data sending 340 ms, and the validation 400 ms. This again pushes me toward the thought that internally it sleeps because of some limits. The next screenshot shows what happens in the next round of requests, when the Lambda is already started:
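The per-phase measurements described above can be collected with a small helper around `System.nanoTime()`; here the phases (client construction, the `putRecord` call, validation) are represented by a `Runnable` placeholder rather than real AWS SDK calls:

```java
// Tiny helper for timing one phase of the handler, as in the measurements above.
public class PhaseTimer {

    /** Runs the given phase and returns its wall-clock duration in milliseconds. */
    public static long timeMillis(Runnable phase) {
        long start = System.nanoTime();
        phase.run();
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

In the handler you would wrap each phase (e.g. `timeMillis(() -> client.putRecord(request))`) and log the results to CloudWatch to get breakdowns like the ones in the screenshots.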

So we no longer have these big times. Yes, we still have a relatively big delta on some of the requests (which is also strange to me), but things look much better.

So I'm looking for clarification from someone who actually knows what is happening under the hood, because this is not good behavior for a serious application, one that is using the cloud precisely for its "unlimited" possibilities.

And another question relates to another limit of Lambda: 200 concurrent invocations across all Lambdas within an account in a region. To me, this is also a big limitation for a large application with lots of traffic. Since my business case at the moment (I don't know about the future) is more or less fire-and-forget, I'm starting to think of changing the logic so that the gateway sends the data directly to the stream and the other Lambda takes care of the validation and further processing. Yes, I'm losing the current abstraction (which I don't need at the moment), but I'm increasing the application's availability many times over. What do you think?

Answer

You can proxy straight to the Kinesis stream via API Gateway. You would lose some control in terms of validation and transformation, but you won't have the cold-start latency you're seeing from Lambda.

You can use an API Gateway mapping template to transform the data, and if validation is important, you could potentially do it in the processing Lambda on the other side of the stream.
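As a rough sketch of that direct integration: an API Gateway AWS service integration against the Kinesis `PutRecord` action can use a request mapping template along these lines, where the stream name and the partition-key path are placeholders to adapt to your setup:

```
{
    "StreamName": "my-stream",
    "Data": "$util.base64Encode($input.body)",
    "PartitionKey": "$input.json('$.userId')"
}
```

API Gateway base64-encodes the incoming body as Kinesis requires, and the validation that used to live in the front Lambda moves to the consumer Lambda reading from the stream.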
