Service Fabric Resource Balancer uses stale reported load

Question

While looking into the resource balancer and dynamic load metrics on Service Fabric, we ran into some questions (running the devbox SDK GA 2.0.135).
In the Service Fabric Explorer (both the portal and the standalone application) we can see that balancing runs very often: it happens every second and most of the time completes almost instantly. However, the Load Metric Information on the nodes and partitions does not update its values as we report load.

We send a dynamic load report based on our interaction (an HTTP request to a service), increasing the reported load of a single partition by a large amount. This spike becomes visible somewhere within 5 minutes, at which point the balancer actually starts balancing. This seems to be the interval at which the load data gets refreshed. The last reported time is updated all the time, but without the new value.

We added the metrics to the application manifest and the cluster manifest to make sure they get used in balancing. This means the resource balancer uses the same data for 5 minutes. Is this a configurable setting? Is it constrained because it is running on a devbox? We tried a lot of variables in the cluster manifest, but none seem to affect this refresh time.

If this is not adjustable, can someone explain why you would run the balancer with stale data, and why this 5-minute interval was chosen?

Recommended answer

This is indeed a configurable setting, and the default is 5 minutes. The idea behind it is that in production you have tons of replicas all reporting load all the time, so you want to batch the reports up rather than spam the Cluster Resource Manager with all of them as independent messages.
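To make the batching idea concrete, here is a toy model in Python. It is only a sketch of the general pattern (keep the newest value locally, forward one batch per interval), not Service Fabric's actual implementation; the class `BatchingLoadReporter`, its methods, and the partition and metric names are all made up for illustration.

```python
from typing import Callable, Dict, Tuple

LoadKey = Tuple[str, str]  # (partition id, metric name); hypothetical shape


class BatchingLoadReporter:
    """Toy model: remember the newest load per key, forward one batch per interval."""

    def __init__(self, send_interval_s: float,
                 send: Callable[[Dict[LoadKey, int]], None]) -> None:
        self.send_interval_s = send_interval_s
        self.send = send                    # stands in for "message to the resource manager"
        self.pending: Dict[LoadKey, int] = {}
        self.last_sent_at = 0.0

    def report_load(self, partition: str, metric: str, value: int) -> None:
        # Cheap local update; nothing leaves the node yet.
        self.pending[(partition, metric)] = value

    def tick(self, now_s: float) -> bool:
        # Forward the batch only once the interval has elapsed.
        if self.pending and now_s - self.last_sent_at >= self.send_interval_s:
            self.send(dict(self.pending))
            self.pending.clear()
            self.last_sent_at = now_s
            return True
        return False


if __name__ == "__main__":
    balancer_view: Dict[LoadKey, int] = {}
    reporter = BatchingLoadReporter(300, balancer_view.update)
    for second in range(1, 301):
        reporter.report_load("partition-1", "Requests", second * 10)
        if reporter.tick(second):
            print(f"t={second}s: balancer now sees {balancer_view}")
```

In this model the consumer's view stays unchanged between sends no matter how often `report_load` is called, which matches the behaviour described in the question: balancing runs frequently, but on values that are only refreshed once per interval.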

You're probably right that this value is way too long for local development. We'll look into changing it for local clusters, but in the meantime you can add the following to your local cluster manifest to change how long we wait by default. If the section already exists with other settings in it, just add the SendLoadReportInterval line. The value is in seconds, and you can adjust it as needed. The snippet below changes the default load reporting interval from 5 minutes (300 seconds) to 1 minute (60 seconds).

    <Section Name="ReconfigurationAgent">
        <Parameter Name="SendLoadReportInterval" Value="60" />
    </Section>

Please note that doing so does increase load on some of the system services (TANSTAAFL). As always, if you're operating on a generated or complete cluster manifest, be sure to run Test-ServiceFabricClusterManifest before deploying it. If you're working with a local development cluster, the easiest way to deploy the change is probably to modify the cluster manifest template (by default here: "C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup\NonSecure\ClusterManifestTemplate.xml") and add the line, then right-click the Service Fabric Local Cluster Manager in your system tray and select "Reset Local Cluster". This regenerates the local cluster with your changes to the template.
