What are the best parameters to integrate Azure Functions with an Event Hubs chain?


Question

We need to set up 4 Event Hubs and 3 Azure Functions. What is the best way to achieve high throughput, and which scaling parameters can we set so that the system can handle 75,000 messages/sec?

  • local.settings.json
  • host.json
  • Prefetch count
  • Max batch size

Answer

This article is definitely worth a read and is something I based some of my own work on; I needed to achieve 50k msg/sec. https://azure.microsoft.com/en-gb/blog/processing-100-000-events-per-second-on-azure-functions/

An important consideration is how many partitions you will have, as this directly impacts your total throughput. As you scale out instances of your application, the Event Processor Host (EPH) will try to take ownership of processing a particular partition, and each partition can handle 1 MB/sec ingress and 2 MB/sec egress (or 1,000 events per second).

https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-faq
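Given that per-partition ceiling, the minimum partition count for a target rate is simple arithmetic. A sketch (the constants are the documented per-partition limits; whichever limit binds first decides the count):

```python
import math

PARTITION_EVENTS_PER_SEC = 1000      # per-partition event ceiling
PARTITION_INGRESS_BYTES = 1_000_000  # 1 MB/sec ingress per partition

def min_partitions(target_msgs_per_sec, avg_msg_bytes):
    """Partitions needed to sustain a target rate, whichever limit binds first."""
    by_count = math.ceil(target_msgs_per_sec / PARTITION_EVENTS_PER_SEC)
    by_bytes = math.ceil(target_msgs_per_sec * avg_msg_bytes / PARTITION_INGRESS_BYTES)
    return max(by_count, by_bytes)

# 75k msg/sec with ~500-byte messages: the event-count limit binds
print(min_partitions(75_000, 500))  # 75 partitions
```

With larger messages the 1 MB/sec ingress limit takes over, so message size matters as much as message count, which is the next point.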

You need to consider both message size and message count. If possible, pack as many data points as possible into each Event Hub message. In my scenario, I'm processing 500 data points per Event Hub message - it's much more efficient to extract lots of data from a single message than a small amount of data from lots of messages.
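That packing step can be sketched as a simple chunker on the producer side (a minimal illustration assuming JSON payloads; the actual send would go through the Event Hubs SDK's producer client):

```python
import json

DATA_POINTS_PER_MESSAGE = 500  # the batch size used in this answer

def pack_messages(data_points):
    """Group individual data points into larger Event Hub message bodies."""
    for i in range(0, len(data_points), DATA_POINTS_PER_MESSAGE):
        yield json.dumps(data_points[i:i + DATA_POINTS_PER_MESSAGE])

points = [{"sensor": n, "value": n * 0.1} for n in range(1200)]
messages = list(pack_messages(points))
print(len(messages))  # 1200 points -> 3 messages instead of 1200
```

Each message must still fit under the Event Hubs per-event size limit, so the data-point count per message has to be tuned to your payload size.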

For your throughput requirements, this is something you need to consider. Even at 32 partitions, that's not going to give you 75k msg/sec - you can ask Microsoft to increase the partition count, as was done in the article I linked, where they have 100 partitions.

As for configuration settings, I'm running with:

{
    "version":  "2.0",
    "extensions": {
        "eventHubs": {
            "batchCheckpointFrequency": 10,
            "eventProcessorOptions": {
                "maxBatchSize": 256,
                "prefetchCount": 512,
                "enableReceiverRuntimeMetric": true
            }            
        }
    }
}

  • I receive up to 256 messages per batch
  • Each message can contain up to 500 data points
  • We checkpoint a partition after 10 batches
  • This means up to approximately 1.3 million data points could be processed again in an event that causes the functions to resume from the last known checkpoint. This is also important - are your updates idempotent, or does it not matter if they are reprocessed?
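The "approx 1.3 million" worst case follows directly from the host.json values above:

```python
MAX_BATCH_SIZE = 256             # maxBatchSize in host.json
DATA_POINTS_PER_MESSAGE = 500    # data points packed into each message
CHECKPOINT_EVERY_N_BATCHES = 10  # batchCheckpointFrequency

# Worst case: a crash just before a checkpoint replays every batch
# received since the last checkpoint on that partition.
replayed_points = MAX_BATCH_SIZE * DATA_POINTS_PER_MESSAGE * CHECKPOINT_EVERY_N_BATCHES
print(replayed_points)  # 1,280,000 -> "approx 1.3 million" data points
```

Checkpointing less often reduces storage calls but widens this replay window, which is why idempotency matters.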

You are going to need to put the data from the messages into some sort of data store, and you will be inserting into it at a high rate - can your target data store cope with inserts at this frequency? What happens to your processing pipeline if the target store has an outage? I went with an approach similar to the one described in this article, which can be summarized as: in the event of any failure when processing a batch of messages, move the entire batch onto an 'errors' hub and let another function try to process them. You can't stop processing at this volume or you will fall behind!

https://blog.pragmatists.com/retrying-consumer-architecture-the-apache-kafka-939ac4cb851a
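A minimal sketch of that dead-letter pattern, with hypothetical `process_one` and `forward_to_errors_hub` callables standing in for your store insert and the Event Hubs output binding:

```python
def handle_batch(messages, process_one, forward_to_errors_hub):
    """Process a batch; on any failure, ship the WHOLE batch to an
    'errors' hub so the main consumer never stalls at this volume."""
    try:
        for msg in messages:
            process_one(msg)
    except Exception:
        # Don't retry inline: keep draining the main hub and let a
        # separate function retry the failed batch from the errors hub.
        forward_to_errors_hub(messages)

def flaky(msg):
    if msg == "boom":
        raise ValueError(msg)

errors = []
handle_batch(["ok", "boom", "ok"], flaky, errors.extend)
print(errors)  # ['ok', 'boom', 'ok'] - the entire batch moves to the errors hub
```

Note this trades duplicate processing (the "ok" messages run twice) for throughput, so it again assumes idempotent updates.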

That's also an important point: how real-time does your processing need to be? If you start falling behind, would you need to scale out to try to catch up? How would you know if this was happening? I created a metric to track how far behind the latest event each partition is, which lets me visualize it and set up alerts - I also scale out my functions based on this number.

https://medium.com/@dylanm_asos/azure-functions-event-hub-processing-8a3f39d2cd0f
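The lag metric itself is just the gap between the partition's newest sequence number (receiver runtime metrics expose this when enableReceiverRuntimeMetric is on, as in the host.json above) and the sequence number you last processed. A sketch:

```python
def partition_lag(last_enqueued_seq, last_processed_seq):
    """Events still waiting in this partition; alert and scale out when it grows."""
    return max(0, last_enqueued_seq - last_processed_seq)

# Partition's newest event is #120_000 but we've only processed up to #118_500
print(partition_lag(120_000, 118_500))  # 1500 events behind
```

Emitting this per partition on each batch gives you a catch-up signal long before end-to-end latency alarms would fire.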

At the volumes you've mentioned, it's not just a matter of configuration that will let you achieve it - there are a number of considerations.

