What is the scaling algorithm for Azure Functions (never been able to get over 7 parallel instances running)


Question


I'm trying to understand how scaling works with Azure Functions. We've been testing with an app that generates 88 messages in a storage queue, which triggers our function. The function is written in C#. The function downloads a file and performs some processing on it (it will eventually post it back, but we aren't doing that yet for testing purposes). The function takes about 30 seconds to complete per request (total ~2,500 seconds of processing). For testing purposes we loop this 10 times.


Our ideal situation would be that, after some warm-up, Azure would automatically scale the function out to handle the messages in the most expedient way, using some sort of algorithm that takes spin-up time into account, or simply scaling out to the number of messages in the backlog, with some sort of cap.


Is this how it is supposed to work? We have never been able to get more than 7 'consumption units', and it generally takes about 45 minutes to process the queue of messages.
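
A rough way to read these numbers (purely a back-of-envelope estimate, assuming the 45 minutes covers all 880 executions at ~30 seconds each, which is one possible reading of the setup above):

```python
# Back-of-envelope estimate of effective parallelism implied by the
# numbers in the question. The interpretation (45 min covering all
# 880 executions) is an assumption, not stated explicitly above.

executions = 88 * 10                 # 88 queue messages, run 10 times
seconds_each = 30                    # ~30 s per execution
total_work_s = executions * seconds_each   # 26,400 s of compute

wall_clock_s = 45 * 60               # observed ~45 minutes wall clock
effective_parallelism = total_work_s / wall_clock_s

print(round(effective_parallelism, 1))   # ≈ 9.8 concurrent executions
```

Under that reading, the observed throughput corresponds to roughly 10 executions running concurrently across all instances combined, which is consistent with a small instance count each running a few messages in parallel.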


A couple of other questions re scalability: our function is a memory-intensive operation, so how is memory 'shared' across scaled instances of a function? I ask because we are seeing some out-of-memory errors that we don't normally see. We've configured the function for the maximum memory (1536 MB), and we're seeing about 2.5% of the operations fail with an out-of-memory error.


Thanks in advance; we're really looking to make this work, as it would allow us to move a lot of our work off of dedicated Windows VMs on EC2 and onto Azure Functions.

Answer


The intent is that the platform takes care of automatically scaling for you with the ultimate goal that you don't have to think or care about the number of "consumption units" (sometimes referred to as instances) that are assigned to your function app. That said, there will always be room for improvement to ensure we get this right for the majority of users. :)


But to answer your question about the internal details (as far as queue processing goes), what we have in place right now is a system which examines the queue length and the amount of time each message sits in the queue before being processed by your app. If we feel like your function app is "falling behind" in processing these messages, then more consumption units will be added until we think your app is able to keep up with the incoming load.
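
The heuristic described above (watch queue length and message age, and add consumption units while the app is falling behind) can be sketched roughly as follows. This is a purely illustrative toy, not Azure's actual scale controller: the thresholds, step size, and cap are all invented for the example.

```python
# Toy sketch of a queue-based scale heuristic like the one described
# above. All numeric thresholds here are invented for illustration;
# they are NOT Azure's real values.

def desired_instances(queue_length: int,
                      oldest_message_age_s: float,
                      current_instances: int,
                      max_instances: int = 200) -> int:
    """Return the instance count a simple scale controller might pick."""
    falling_behind = queue_length > 100 or oldest_message_age_s > 60
    idle = queue_length == 0

    if falling_behind:
        # App can't keep up: add a consumption unit, up to some cap.
        return min(current_instances + 1, max_instances)
    if idle:
        # Nothing queued: release a consumption unit.
        return max(current_instances - 1, 0)
    # Keeping up with the incoming load: hold steady.
    return current_instances

print(desired_instances(queue_length=500, oldest_message_age_s=120,
                        current_instances=7))   # scales out: 8
```

The real controller is more sophisticated (and evolves over time, as the answer notes), but the inputs it reacts to are the ones shown: backlog size and message latency.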


One thing that's very important to mention is that there is another aspect of scale besides just the number of consumption units. Each consumption unit has the ability to process many messages in parallel. Often the problem people have is not the number of allocated consumption units, but the default concurrency configuration for their workload. Take a look at the batchSize and newBatchThreshold settings, which can be tweaked in your host.json file. Depending on your workload, you may find that you get significantly better throughput when you change these values (in some cases, reducing concurrency has been shown to dramatically increase throughput). For example, you may observe this if each function execution requires a lot of memory, or if your functions depend on an external resource (like a database) which can only handle limited concurrent access. More documentation on these concurrency controls can be found here: https://github.com/Azure/azure-webjobs-sdk-script/wiki/host.json.
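
For a memory-heavy workload like the one in the question, reducing per-instance concurrency in host.json might look like this (the values below are illustrative, not a recommendation; this uses the v1 script-host schema current at the time of the answer):

```json
{
  "queues": {
    "batchSize": 4,
    "newBatchThreshold": 2
  }
}
```

With these settings, each instance fetches messages in batches of 4 and fetches a new batch once the count of messages being processed drops to 2, so fewer executions compete for the instance's memory at once.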


As I hinted at above, playing with per-consumption-unit concurrency may help with the memory pressure issues you've been encountering. Each consumption unit has its own pool of memory (e.g. its own 1.5 GB). But if you're processing too many messages in a single consumption unit, then that could be the source of the out-of-memory errors you're seeing.
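
To make the memory arithmetic concrete: assuming the documented queue-trigger defaults (batchSize of 16 and newBatchThreshold of half that, giving up to batchSize + newBatchThreshold concurrent executions per instance), each execution's share of a 1536 MB instance is small:

```python
# Per-execution memory budget on one consumption unit, assuming the
# documented queue-trigger defaults. Max concurrency per instance is
# batchSize + newBatchThreshold.

instance_memory_mb = 1536
batch_size = 16                  # documented default
new_batch_threshold = 8          # documented default (batchSize / 2)

max_concurrent = batch_size + new_batch_threshold    # 24 executions
per_execution_mb = instance_memory_mb / max_concurrent

print(max_concurrent, per_execution_mb)   # 24 64.0
```

So at the defaults, 24 memory-hungry executions share 1536 MB (64 MB each on average), which lines up with the ~2.5% out-of-memory failures reported in the question; lowering batchSize raises each execution's share.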


With all this said, we are constantly doing work to identify and optimize certain load scenarios which we think are the most common, whether it's draining a pile of messages from a queue, consuming a "stream" of blobs in a storage container, processing a flood of HTTP requests, etc. Expect things to change as we learn, mature, and get more feedback from folks like yourself. The best place to provide such feedback to the product group is in our GitHub repo's issue list, which is reviewed regularly.


Thanks for the question. I hope this information was helpful and that you're able to get the numbers you're looking for.
