High-throughput sends to EventHubs result in MessagingException / TimeoutException / "Server was unable to process the request" errors


Problem description

We are seeing many of these exceptions when sending events to EventHubs during peak traffic:

"Failed to send event to EventHub. Exception : Microsoft.ServiceBus.Messaging.MessagingException: The server was unable to process the request; please retry the operation. If the problem persists, please contact your Service Bus administrator and provide the tracking id." or "Failed to send event to EventHub. Exception : System.TimeoutException: The operation did not complete within the allocated time"

You can see it clearly here:

As you can see, we get many Internal Errors, Server Busy Errors, and Failed Requests when incoming messages exceed 400K events/hour (or ~270 MB/hour). This is not just a transient issue; it is clearly related to throughput.

Our EH has 32 partitions, a message retention of 7 days, and 5 throughput units assigned. OperationTimeout is set to 5 minutes, and we are using the default RetryPolicy.

Is there anything we still need to tweak here? We are really concerned about the scalability of EH.

Thanks

Recommended answer

Send throughput is tuned through an efficient partition distribution strategy; there is no single knob that does it. Below is the basic information you need to design for high-throughput scenarios.

1) Let's start with the namespace: Throughput Units (TUs) are configured at the namespace level. Bear in mind that the configured TUs apply to the aggregate of all EventHubs under that namespace. If you have 5 TUs on your namespace and 5 EventHubs under it, the 5 TUs are shared among all 5 EventHubs.

2) Now let's look at the EventHub level: if the EventHub is allocated 5 TUs and has 32 partitions, no single partition can use all 5 TUs. For example, sending 5 TUs' worth of data to 1 partition and nothing to the other 31 partitions is not possible. The maximum you should plan for per partition is 1 TU. In general, you need to ensure that data is distributed evenly across all partitions. EventHubs supports 3 types of send, which give users different levels of control over partition distribution:

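  1. EventHubClient.Send(EventDataWithoutPartitionKey) -> If you send with this API, EventHubs takes care of distributing the data evenly across all partitions. The EventHubs service gateway round-robins the data to all partitions. When a specific partition is down, the gateway detects it automatically and ensures that clients see no impact. This is the most recommended way to send to EventHubs.
  2. EventHubClient.Send(EventDataWithPartitionKey) -> If you send with this API, the partitionKey determines the distribution of your data. The PartitionKey is used to hash the EventData to the appropriate partition (the hashing algorithm is Microsoft proprietary and not shared). Users who need to correlate a group of messages typically use this variant.
  3. EventHubSender.Send(EventData) -> In this variant, the sender is already attached to a partition, so the client fully controls the distribution across partitions.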
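
To make the difference between the first two send types concrete, here is a small plain-Python model (not the Event Hubs SDK). It assumes round-robin placement for sends without a partition key, as described above, and uses SHA-256 purely as a stand-in for the service's hash, which is not public. The tenant names and the 90% skew are hypothetical.

```python
import hashlib
from collections import Counter
from itertools import cycle

PARTITIONS = 32  # partition count from the question


def round_robin(n_events, partitions=PARTITIONS):
    """Model of sends without a partition key: events rotate across partitions."""
    counts = Counter()
    ring = cycle(range(partitions))
    for _ in range(n_events):
        counts[next(ring)] += 1
    return counts


def by_partition_key(keys, partitions=PARTITIONS):
    """Model of sends with a partition key: one key always maps to one partition.

    SHA-256 is an assumption for illustration; the real hash is proprietary.
    """
    counts = Counter()
    for key in keys:
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        counts[int.from_bytes(digest[:8], "big") % partitions] += 1
    return counts


def hottest_share(counts):
    """Fraction of all events that landed on the busiest partition."""
    return max(counts.values()) / sum(counts.values())


even = round_robin(32_000)
# Hypothetical skewed workload: 90% of events carry the same key.
skewed_keys = ["tenant-big"] * 28_800 + [f"tenant-{i}" for i in range(3_200)]
skewed = by_partition_key(skewed_keys)

print(f"round robin, hottest partition share: {hottest_share(even):.1%}")
print(f"skewed keys, hottest partition share: {hottest_share(skewed):.1%}")
```

With a dominant key, one partition receives at least that key's share of the traffic no matter how many partitions exist, which is the kind of imbalance the per-partition cap penalizes.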

To measure your present distribution of data, use the EventHubClient.GetPartitionRuntimeInfo API to estimate which partition is overloaded. The difference between BeginSequenceNumber and LastEnqueuedSequenceNumber gives an estimate of that partition's load compared to the others.
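
The comparison described above could be sketched as follows. This is plain Python operating on values you have already fetched per partition; it does not call the SDK, and the sample numbers, the helper names, and the "2x the fair share" threshold are all assumptions for illustration.

```python
def partition_load_shares(runtime_info):
    """Estimate each partition's share of retained events.

    runtime_info maps partition id -> (begin_sequence_number,
    last_enqueued_sequence_number), as read from the partition runtime info.
    """
    retained = {
        pid: max(last - begin, 0) for pid, (begin, last) in runtime_info.items()
    }
    total = sum(retained.values())
    if total == 0:
        return {pid: 0.0 for pid in retained}
    return {pid: count / total for pid, count in retained.items()}


def overloaded(shares, factor=2.0):
    """Partitions holding more than `factor` times an even share (arbitrary cutoff)."""
    fair = 1.0 / len(shares)
    return sorted(pid for pid, share in shares.items() if share > factor * fair)


# Hypothetical sequence numbers for a 4-partition hub.
sample = {
    "0": (0, 1_000),
    "1": (0, 1_000),
    "2": (0, 7_000),
    "3": (0, 1_000),
}
shares = partition_load_shares(sample)
print(overloaded(shares))
```

Because retention is the same for every partition, the sequence-number span is a relative measure only; it says which partitions received more events, not how fast they arrived.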

3) Last but not least, you can tune performance (not throughput) at the send-operation level using the SendBatch API. 1 TU buys a maximum of 1000 msgs/sec or 1 MB/sec; you are throttled on whichever limit is hit first, and this cannot be changed. If your messages are small, say 100 bytes, and you can send only 1000 msgs/sec (as per the TU limit), you will hit the 1000 events/sec limit first. However, with the SendBatch API you can batch, say, 10 of the 100-byte msgs and push at the same rate: 1000 msgs/sec with just 100 API calls, which improves the end-to-end latency of the system (batching also helps the service persist messages efficiently). Remember that the only limitation here is the maximum message size that can be sent, which is 256 KB (this limit applies to your batch size if you use the SendBatch API).
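
As a rough illustration of the batching idea, the sketch below groups payloads so that each batch stays within the 256 KB limit quoted above. It is plain Python, not the SendBatch API; `make_batches` and `max_count` are hypothetical names, and it counts payload bytes only, ignoring any per-message overhead the real wire format adds.

```python
MAX_BATCH_BYTES = 256 * 1024  # size limit quoted in the answer


def make_batches(messages, max_bytes=MAX_BATCH_BYTES, max_count=None):
    """Group byte payloads into batches whose total payload fits in max_bytes.

    max_count optionally caps the number of messages per batch.
    """
    batches, current, current_size = [], [], 0
    for msg in messages:
        if len(msg) > max_bytes:
            raise ValueError("single message exceeds the batch size limit")
        too_big = current_size + len(msg) > max_bytes
        too_many = max_count is not None and len(current) >= max_count
        if current and (too_big or too_many):
            batches.append(current)
            current, current_size = [], 0
        current.append(msg)
        current_size += len(msg)
    if current:
        batches.append(current)
    return batches


# The answer's example: 1000 messages of 100 bytes, sent 10 per batch.
one_second_of_traffic = [b"x" * 100 for _ in range(1000)]
batches = make_batches(one_second_of_traffic, max_count=10)
print(len(batches), "send calls instead of", len(one_second_of_traffic))
```

In practice you would leave headroom below the limit, since the size that counts is the serialized batch rather than the sum of the payloads.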

Given that background, in your case, with 32 partitions and 5 TUs, I would really double-check the partition distribution strategy.
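
For a quick sanity check, the arithmetic below compares the peak figures from the question with the per-TU limits quoted above. It is only a sketch: the function name is hypothetical, and hourly averages hide bursts, so the real instantaneous load may be well above these values.

```python
EVENTS_PER_SEC_PER_TU = 1000  # per-TU limits as quoted in the answer
MB_PER_SEC_PER_TU = 1.0


def utilization(events_per_hour, mb_per_hour, throughput_units):
    """Average load as a fraction of the limits for the given number of TUs."""
    events_per_sec = events_per_hour / 3600
    mb_per_sec = mb_per_hour / 3600
    return {
        "events_per_sec": events_per_sec,
        "mb_per_sec": mb_per_sec,
        "event_limit_used": events_per_sec / (throughput_units * EVENTS_PER_SEC_PER_TU),
        "byte_limit_used": mb_per_sec / (throughput_units * MB_PER_SEC_PER_TU),
    }


# Figures from the question: 400K events/hour, ~270 MB/hour, 5 TUs.
namespace = utilization(400_000, 270, throughput_units=5)
# Worst case for distribution: everything lands on one partition (1 TU cap).
one_partition = utilization(400_000, 270, throughput_units=1)

for label, result in (("5 TUs", namespace), ("1 partition", one_partition)):
    print(
        f"{label}: {result['events_per_sec']:.0f} events/sec, "
        f"{result['event_limit_used']:.1%} of event limit, "
        f"{result['byte_limit_used']:.1%} of byte limit"
    )
```

Running the same calculation on per-minute or per-second counts, per partition, gives a better picture of whether bursts on individual partitions are what trigger the errors.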

Here is some more general reading on Event Hubs.
