Azure event hubs and multiple consumer groups

Problem description

I need help using Azure Event Hubs in the following scenario. I think consumer groups might be the right option here, but I have not been able to find a concrete example online.

Here is a rough description of the problem and my proposed solution using Event Hubs. I am not sure this is the optimal solution, and I would appreciate your feedback.

I have multiple event sources that generate a lot of event data (telemetry from sensors). This data needs to be saved to our database, and some analysis (running average, min-max) should be performed in parallel.

The sender can only send data to a single endpoint, but the event hub should make this data available to both data handlers.

I am thinking about using two consumer groups. The first will be a cluster of worker role instances that take care of saving the data to our key-value store, and the second will be an analysis engine (I am likely to go with Azure Stream Analytics).

Firstly, how do I set up the consumer groups, and is there something I need to do on the sender/receiver side so that copies of events appear in all consumer groups?

I have read many examples online, but they either use client.GetDefaultConsumerGroup(); or have all partitions processed by multiple instances of the same worker role, or both.

For my scenario, when an event is triggered, it needs to be processed by two different worker roles in parallel (one that saves the data and a second that does some analysis).

Thank You!

Solution

TL;DR: Your approach looks reasonable. Just create two consumer groups with different names using CreateConsumerGroupIfNotExists.

Consumer groups are primarily a concept, so exactly how they work depends on how your subscribers are implemented. As you know, conceptually a consumer group is a set of subscribers working together so that the group as a whole receives all the messages and, under ideal circumstances (which won't happen in practice), consumes each message exactly once. This means that each consumer group will "have all partitions processed by multiple instances of the same worker role." You want this.
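To make the "each group receives all the messages" point concrete, here is a minimal sketch of the idea. It is a toy in-memory model, not the Azure SDK: ToyEventHub, Send and Read are names I made up for illustration. The point it shows is that the sender writes once and never mentions a consumer group, while each group keeps its own read position.

```csharp
using System;
using System.Collections.Generic;

// Toy in-memory model (NOT the Azure SDK): a hub is a set of append-only
// partition logs, and each consumer group keeps its own read position.
public sealed class ToyEventHub
{
    private readonly List<string>[] partitions;

    // (consumer group name, partition index) -> index of the next event to read.
    private readonly Dictionary<(string Group, int Partition), int> cursors =
        new Dictionary<(string Group, int Partition), int>();

    public ToyEventHub(int partitionCount)
    {
        partitions = new List<string>[partitionCount];
        for (int i = 0; i < partitionCount; i++)
        {
            partitions[i] = new List<string>();
        }
    }

    // Senders write once; they never name a consumer group.
    public void Send(int partition, string payload)
    {
        partitions[partition].Add(payload);
    }

    // Reading advances only the calling group's cursor, so every other
    // group still sees every event.
    public List<string> Read(string group, int partition)
    {
        var key = (group, partition);
        cursors.TryGetValue(key, out int next); // next stays 0 for a new group
        List<string> log = partitions[partition];
        List<string> batch = log.GetRange(next, log.Count - next);
        cursors[key] = log.Count;
        return batch;
    }
}
```

If a "storage" group and an "analytics" group both call Read on the same partition, both get the full set of events, and a second Read by "storage" returns nothing new until more events are sent.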

This can be implemented in different ways. Microsoft provides two ways to consume messages from Event Hubs directly, plus the option to use things like Stream Analytics, which are probably built on top of the two direct ways. The first way is the Event Hub Receiver; the second, which is higher level, is the Event Processor Host.

I have not used the Event Hub Receiver directly, so this comment is based on the theory of how these sorts of systems work and on speculation from the documentation. While receivers are created from an EventHubConsumerGroup, this serves little purpose because the receivers do not coordinate with one another. If you use them you will need to (and can!) do all the coordination and committing of offsets yourself. That has advantages in some scenarios, such as writing the offset to a transactional DB in the same transaction as the computed aggregates. With these low-level receivers, having different logical consumer groups share the same Azure consumer group probably shouldn't be particularly problematic (this is a statement about how it ought to behave, not practical advice), but you should use different names anyway in case it does matter or you later switch to EventProcessorHost.
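As an illustration of why committing the offset together with the aggregates is attractive, here is a minimal sketch under some loud assumptions: PartitionState and ManualCheckpointing are hypothetical names, the "transactional DB row" is just an immutable object, and I use a plain list index in place of a real Event Hubs offset. The aggregate and the position it covers are always replaced as one unit, so replaying after a crash cannot double count.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for one transactional database row: the running
// aggregate and the position it covers are always stored together.
public sealed class PartitionState
{
    public int NextOffset { get; }
    public double Sum { get; }
    public long Count { get; }

    public PartitionState(int nextOffset, double sum, long count)
    {
        NextOffset = nextOffset;
        Sum = sum;
        Count = count;
    }

    public double Average => Count == 0 ? 0.0 : Sum / Count;
}

public static class ManualCheckpointing
{
    // Folds every reading at or after state.NextOffset into the aggregate
    // and returns a new state. Persisting that single object plays the role
    // of "one transaction" in this sketch.
    public static PartitionState Apply(PartitionState state, IReadOnlyList<double> partitionLog)
    {
        double sum = state.Sum;
        long count = state.Count;
        for (int i = state.NextOffset; i < partitionLog.Count; i++)
        {
            sum += partitionLog[i];
            count++;
        }
        return new PartitionState(partitionLog.Count, sum, count);
    }
}
```

Calling Apply twice on the same log leaves the state unchanged the second time, which is the property you lose if the offset and the aggregate are committed separately.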

Now on to more useful information. EventProcessorHosts are probably built on top of EventHubReceivers. They are a higher-level construct, with support for multiple machines working together as one logical consumer group. Below I've included a lightly edited snippet from my code that creates an EventProcessorHost, with a bunch of comments left in explaining some of the choices.

using Microsoft.ServiceBus;            // NamespaceManager
using Microsoft.ServiceBus.Messaging;  // EventHubDescription, EventProcessorHost

//We need an identifier for the lease. It must be unique across concurrently
//running instances of the program. There are three main options for this. The
//first is a static value from a config file. The second is the machine's NETBIOS
//name, i.e. System.Environment.MachineName. The third is a random value unique
//per run, which we have chosen here. If our VMs have very weak randomness, bad
//things may happen.

string hostName = Guid.NewGuid().ToString();

//It's not clear if we want this here long term or if we prefer that the consumer
//groups be created out of band. Nor are there necessarily good tools to discover
//existing consumer groups.
NamespaceManager namespaceManager =
    NamespaceManager.CreateFromConnectionString(eventHubConnectionString);
EventHubDescription ehd = namespaceManager.GetEventHub(eventHubPath);
namespaceManager.CreateConsumerGroupIfNotExists(ehd.Path, consumerGroupName);

host = new EventProcessorHost(hostName, eventHubPath, consumerGroupName,
    eventHubConnectionString, storageConnectionString, leaseContainerName);

//Call something like this (from an async method) when you want it to start.
//factory is your own IEventProcessorFactory implementation.
await host.RegisterEventProcessorFactoryAsync(factory);

You'll notice that I told Azure to create the consumer group if it doesn't already exist; you'll get a lovely error message if the group is missing. I honestly don't know what the purpose of registering the group is, because it doesn't include the storage connection string, which needs to be the same across instances in order for the EventProcessorHost's coordination (and presumably commits) to work properly.

Here I've provided a picture from Azure Storage Explorer of the leases (and presumably offsets) from a consumer group I was experimenting with in November. Note that while I have a testhub and a testhub-testcg container, this is because I named them manually. If they were in the same container, the blobs would be named things like "$Default/0" vs "testcg/0".

As you can see, there is one blob per partition. My assumption is that these blobs are used for two things. The first is blob leases, which distribute partitions amongst instances. The second is storing the offsets that have been committed within each partition.

Rather than the data getting pushed to the consumer groups, the consuming instances ask the storage system for data at some offset in one partition. EventProcessorHosts are a nice high-level way of having a logical consumer group where each partition is read by only one consumer at a time, and where the progress the logical consumer group has made in each partition is not forgotten.
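The "one reader per partition at a time" behaviour can be sketched with a toy lease table. This is only my mental model of the blob leases, not the EventProcessorHost implementation: ToyLeaseStore and TryAcquireOrRenew are hypothetical names, and the caller passes in the clock so the logic stays deterministic.

```csharp
using System;
using System.Collections.Generic;

// Toy model of per-partition leases (NOT the real blob lease API).
// A host may read a partition only while it holds that partition's lease.
public sealed class ToyLeaseStore
{
    private readonly Dictionary<int, (string Owner, DateTime ExpiresUtc)> leases =
        new Dictionary<int, (string Owner, DateTime ExpiresUtc)>();
    private readonly TimeSpan leaseDuration;

    public ToyLeaseStore(TimeSpan leaseDuration)
    {
        this.leaseDuration = leaseDuration;
    }

    // Returns true if the host now owns the partition: either the lease was
    // free, had expired, or was already held by this host (a renewal).
    public bool TryAcquireOrRenew(int partition, string host, DateTime nowUtc)
    {
        bool heldByOther = leases.TryGetValue(partition, out var lease)
            && lease.Owner != host
            && lease.ExpiresUtc > nowUtc;
        if (heldByOther)
        {
            return false;
        }
        leases[partition] = (host, nowUtc + leaseDuration);
        return true;
    }
}
```

If host A stops renewing (say it crashed), host B's next attempt after the expiry succeeds and B takes over the partition, resuming from whatever offset was last committed.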

Remember that throughput is metered per partition, with egress capped at roughly twice ingress, so if you're maxing out ingress you can have at most two logical consumers that all keep up. As such, you'll want to make sure you have enough partitions, and throughput units, that you can:

  1. Read all the data you send.
  2. Catch up within the 24-hour retention period if you fall behind for a few hours due to issues.
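The two conditions above can be turned into a rough back-of-the-envelope calculation. This is a sketch under simplifying assumptions: ingress is constant, the reader was completely stopped during the outage, and readMBps is whatever share of egress one logical consumer actually gets. CatchUp and HoursToCatchUp are names I made up.

```csharp
using System;

public static class CatchUp
{
    // Hours needed to clear the backlog built up during an outage, or
    // PositiveInfinity if the reader has no headroom over ingress.
    //   backlog  = ingressMBps * outageHours
    //   headroom = readMBps - ingressMBps
    public static double HoursToCatchUp(double ingressMBps, double readMBps, double outageHours)
    {
        double headroom = readMBps - ingressMBps;
        if (headroom <= 0)
        {
            return double.PositiveInfinity; // condition 1 fails: can't even keep up
        }
        return ingressMBps * outageHours / headroom;
    }
}
```

For example, with 0.5 MB/s coming in and a reader that can pull 1 MB/s, a 3 hour outage takes another 3 hours to clear. If ingress is maxed out and two consumer groups split an egress allowance of twice the ingress, each reader has zero headroom and never catches up, which is why you want spare throughput.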

In conclusion: consumer groups are what you need. The examples you read that use a specific consumer group are good. Within each logical consumer group, use the same name for the Azure consumer group, and have different logical consumer groups use different names.

I haven't yet used Azure Stream Analytics, but at least during the preview release you are limited to the default consumer group. So don't use the default consumer group for something else, and if you need two separate lots of Azure Stream Analytics you may need to do something nasty. But it's easy to configure!
