Should I put my events inside a queue after getting them from Azure Event Hub?

Problem description

I'm currently developing an application hosted on Azure that uses Azure Event Hub. Basically I'm sending messages (or should I say, events) to the Event Hub from a Web API, and I have two listeners:

  • a Stream Analytics task for real-time analysis
  • a standard worker role that computes some stuff based on the received events and then stores them into an Azure SQL Database (this is a lambda architecture).

I'm currently using the EventProcessorHost library to retrieve my events from the Event Hub inside my worker role.

I'm trying to find some best practices about how to use the Event Hub (it is a bit harder to use Event Hubs than Service Bus queues, i.e. streaming vs. message consuming), and I found some people saying I shouldn't do a lot of processing after retrieving EventData events from my Event Hub.

Specifically:

Keep in mind you want to keep whatever it is you're doing relatively fast - i.e. don't try to do many processes from here - that's what consumer groups are for.

The author of this article added a queue between the Event Hub and the worker role (it's not clear from the comments if it's really required or not).

So the question is: should I do all my processing stuff directly after the Event Hub (i.e. inside the ProcessEventsAsync method of my IEventProcessor implementation), or should I use a queue between the Event Hub and the processing stuff?

Any recommendation about how to properly consume events from an Event Hub would be appreciated, the documentation is currently a bit... missing.

Solution

This falls into the category of questions whose answers will be much more obvious once the source for EventProcessorHost is made available, which I've been told is going to happen.

The short answer is that you don't need to use a queue; however, I would keep the time it takes for ProcessEventsAsync to return a Task relatively short.

While this advice sounds a lot like that of the first article, the key distinction is that it concerns the time until ProcessEventsAsync returns a Task, not the time until that Task completes. My assumption has been that ProcessEventsAsync is called on a thread that the EventProcessorHost also uses for other purposes. In that case you need to return quickly so that the other work can continue; that work might be calling ProcessEventsAsync for another partition (but we won't know without either debugging, which I haven't found necessary, or reading the code once it is available).
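To make the "time to return vs. time to completion" distinction concrete, here is a minimal Python sketch. Python is used only to illustrate the pattern: `dispatch_round`, `process`, and `quick_handler` are hypothetical names, and the single dispatching thread is an assumption mirroring the guess above, not the actual EventProcessorHost internals. A `concurrent.futures.Future` plays the role of the returned Task, and `ThreadPoolExecutor.submit` plays the role of starting a new Task with the TPL.

```python
import threading
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical stand-in for the host: ONE thread services every partition in
# turn and waits only for each handler call to *return*.
def dispatch_round(handler: Callable[[str, List[int]], Future],
                   batches: Dict[str, List[int]],
                   log: List[str]) -> Dict[str, Future]:
    pending: Dict[str, Future] = {}
    for partition, batch in batches.items():
        log.append(f"dispatch {partition}")
        pending[partition] = handler(partition, batch)
    return pending

executor = ThreadPoolExecutor(max_workers=4)
gate = threading.Event()  # holds processing back so the demo is deterministic

def process(partition: str, batch: List[int]) -> int:
    gate.wait()            # pretend this is the slow part (computations, SQL writes, ...)
    return sum(batch)

def quick_handler(partition: str, batch: List[int]) -> Future:
    # Hand the batch to a pool thread and return immediately; the Future
    # completes later, when process() finishes.
    return executor.submit(process, partition, batch)

log: List[str] = []
pending = dispatch_round(quick_handler, {"0": [1, 2], "1": [3], "2": [4, 5, 6]}, log)
none_done_yet = not any(f.done() for f in pending.values())  # all dispatched, none finished
gate.set()                 # let the processing run
totals = {p: f.result() for p, f in pending.items()}
executor.shutdown()
```

Had `quick_handler` called `process()` directly, the loop would have stalled on partition "0" until that batch finished, and partitions "1" and "2" would have waited behind it.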

I do my processing on a separate thread per partition by passing along the entire IEnumerable from ProcessEventsAsync. This is in contrast to taking all the items out of the IEnumerable and putting them into a Queue for the processing thread to consume. The other thread completes the Task returned by ProcessEventsAsync when it has finished processing the messages. (I actually give my processing thread a single IEnumerable that hides the details of ProcessEventsAsync by chaining the chunks together and, when needed, completing the Task on a call to MoveNext.)
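The chained-IEnumerable approach in the parenthetical above can be sketched with a Python generator. This is an illustration under assumptions, not the author's code: `ChainedBatches` is a hypothetical helper, and a `threading.Event` stands in for the Task that ProcessEventsAsync returns (in .NET something like a `TaskCompletionSource` could fill that role). The handler side calls `submit()` and gets the completion signal back immediately; one long-lived processing thread iterates over the chain, and a batch is marked complete only when the consumer asks for the item after that batch's last one, which is the analogue of completing the Task inside `MoveNext`.

```python
import queue
import threading
from typing import Any, Iterable, Iterator, List

_END = object()  # sentinel: no more batches

class ChainedBatches:
    """Hypothetical helper: many batches presented as one continuous iterator."""

    def __init__(self) -> None:
        self._batches: "queue.Queue[Any]" = queue.Queue()

    def submit(self, batch: Iterable[Any]) -> threading.Event:
        """Called from the event handler; returns at once without processing."""
        done = threading.Event()
        self._batches.put((batch, done))  # the whole batch is handed over in one step
        return done

    def close(self) -> None:
        self._batches.put(_END)

    def __iter__(self) -> Iterator[Any]:
        while True:
            entry = self._batches.get()   # blocks until the handler submits a batch
            if entry is _END:
                return
            batch, done = entry
            yield from batch
            # Reached only when the consumer requests the next item after the
            # batch's last one, i.e. after it has finished processing that item.
            done.set()

chain = ChainedBatches()
processed: List[Any] = []

def processing_thread() -> None:
    for event in chain:
        processed.append(event)           # stand-in for the real per-event work

worker = threading.Thread(target=processing_thread)
worker.start()
first = chain.submit([1, 2, 3])           # what the handler would do for each batch
second = chain.submit([4, 5])
chain.close()
worker.join()
```

The processing thread never sees batch boundaries; it just iterates, and the completion signals are set as a side effect of iteration.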

So, in short: in ProcessEventsAsync, hand off the work to another thread, either one you already have lying around and know how to communicate with, or a new Task started with the TPL.

Putting all the messages into a Queue inside of ProcessEventsAsync isn't bad; it's just not the most efficient way to pass the chunk of events to another thread.

If you decide to put the events into a queue (OR have a queue downstream in your processing code) and complete the task for the batch, you should make sure you limit the number of items you have outstanding in your code/queue to avoid running out of memory in the case where the EventHub is giving you items faster than your code can process them due to a traffic spike.
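One way to bound the backlog is to count events that have been accepted but not yet processed, and make the handler wait when that count reaches a cap. The sketch below is a hypothetical Python illustration (`OutstandingLimiter`, `run_demo`, and the cap value are assumptions, not part of any Event Hubs library). Blocking inside the handler is the back-pressure mechanism here, on the assumption that the host does not deliver more batches for a partition until the handler returns; an async handler could await the capacity instead.

```python
import queue
import threading
from typing import List, Optional

class OutstandingLimiter:
    """Hypothetical cap on events accepted but not yet processed."""

    def __init__(self, max_outstanding: int) -> None:
        self._max = max_outstanding
        self._outstanding = 0
        self._cond = threading.Condition()

    def acquire(self, n: int, timeout: Optional[float] = None) -> bool:
        """Wait until n more events fit under the cap; return False on timeout."""
        with self._cond:
            def fits() -> bool:
                # An oversized batch is admitted once the backlog is empty,
                # so it cannot wait forever.
                return self._outstanding + n <= self._max or self._outstanding == 0
            if not self._cond.wait_for(fits, timeout):
                return False
            self._outstanding += n
            return True

    def release(self, n: int) -> None:
        with self._cond:
            self._outstanding -= n
            self._cond.notify_all()

    @property
    def outstanding(self) -> int:
        with self._cond:
            return self._outstanding


def run_demo(batches: List[List[int]], max_outstanding: int = 4) -> List[int]:
    limiter = OutstandingLimiter(max_outstanding)
    work: "queue.Queue[Optional[List[int]]]" = queue.Queue()
    processed: List[int] = []

    def worker() -> None:
        while True:
            batch = work.get()
            if batch is None:                # sentinel: no more batches
                return
            processed.extend(batch)          # stand-in for the real processing
            limiter.release(len(batch))      # frees room for the handler

    t = threading.Thread(target=worker)
    t.start()
    for batch in batches:                    # stands in for the host calling the handler
        limiter.acquire(len(batch))          # back-pressure: wait while the backlog is full
        work.put(batch)                      # hand off; the handler can now return
    work.put(None)
    t.join()
    return processed
```

Counting events rather than batches keeps memory bounded even when batch sizes vary during a traffic spike.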

Note for Java EventHub users (2016-10-27): It has since come to my attention that there is a description of how onEvents is called. While a slow onEvents won't be tragic, since it runs on a thread per partition, its speed appears to affect how quickly the next batch is received. Thus, depending on how much you care about latency, keeping onEvents fast could be relatively important for your scenario.
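The behaviour described in the note can be modelled with one loop per partition that pulls its next batch only after the handler returns. This is a simplified model, not the Java library's code: `partition_pump`, the handlers, and the in-memory batch lists are hypothetical stand-ins. It shows both halves of the note: a slow handler on partition A does not hold up partition B, but A cannot receive its second batch until its handler returns.

```python
import threading
from typing import Callable, List

def partition_pump(batches: List[List[str]],
                   on_events: Callable[[List[str]], None],
                   received: List[List[str]]) -> None:
    """Simplified per-partition loop: receive a batch, call the handler, repeat."""
    for batch in batches:          # stand-in for asking the service for the next batch
        received.append(batch)
        on_events(batch)           # the next iteration starts only after this returns

gate = threading.Event()           # keeps partition A's handler "slow" until released

def slow_on_events(batch: List[str]) -> None:
    gate.wait()

def fast_on_events(batch: List[str]) -> None:
    pass

received_a: List[List[str]] = []
received_b: List[List[str]] = []
thread_a = threading.Thread(target=partition_pump,
                            args=([["a1"], ["a2"]], slow_on_events, received_a))
thread_b = threading.Thread(target=partition_pump,
                            args=([["b1"], ["b2"], ["b3"]], fast_on_events, received_b))
thread_a.start()
thread_b.start()
thread_b.join()                    # B finishes even though A's handler is stuck
a_snapshot = list(received_a)      # at most A's first batch so far
gate.set()
thread_a.join()
```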
