How to account for clock offsets in a distributed system?


Question

I have a system consisting of several distributed services, each of which is continuously generating events and reporting these to a central service.

I need to present a unified timeline of the events, where the ordering in the timeline corresponds to the moment each event occurred. The frequency of events and the network latency are such that I cannot simply use time of arrival at the central collector to order the events.

For example, in the following scenario:

E1 needs to be rendered in the timeline above E2, despite arriving at the collector afterwards, which means the events need to come with timestamp metadata. This is where the problem arises.

Due to constraints on how the environment is set up, it is not possible to ensure that the local time services on each machine are reliably aware of the current UTC time. I can assume that each machine can accurately gauge relative time, i.e. the clock speeds are close enough that measurements of short timespans are identical, but problems like NTP misconfiguration or network partitioning make it impossible to guarantee that every machine agrees on the current UTC time.

This means that the naive approach of generating a local timestamp for each event as it occurs, and then ordering events by that timestamp, will not work: every machine has its own opinion of what universal time is.

So the question is: how can I recover an ordering for events generated in a distributed system where the clocks do not agree?

Most solutions I find online go down the path of trying to synchronize all the clocks, which is not possible for me since:


  • I do not control the machines in question

  • The reason the clocks are out of sync in the first place is network flakiness, which I cannot fix

My own idea was to query some kind of central time service every time an event is generated, then stamp that event with the retrieved time minus the network flight time. This gets hairy, because I have to add another service to the system and ensure its availability (I'm back to square one if the other services can't reach this one). I was hoping there is some clever way to do this that doesn't require me to centralize timekeeping in this way.

Recommended answer

A simple solution, somewhat inspired by your own idea at the end of the question, is to periodically ping what I'll call the time-source server. In the ping, include the reading of the service's chip clock; the time-source echoes that back and adds its own timestamp. The service can then deduce the round-trip time and guess that the time-source's clock showed that timestamp roughly round-trip-time/2 ago. You can then use this as an offset to the local chip clock to determine a globalish time.
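The offset arithmetic described above can be sketched as follows. This is a minimal illustration, not the answerer's code: the function names `estimate_offset` and `global_time_ns` are hypothetical, all values are assumed to be in nanoseconds, and the sketch assumes the outbound and return legs of the ping take about the same time.

```python
import time


def estimate_offset(send_local_ns, server_ns, recv_local_ns):
    """Estimate (time-source clock - local clock) from a single ping.

    send_local_ns: local chip clock when the ping was sent (echoed back).
    server_ns:     timestamp the time-source put into its reply.
    recv_local_ns: local chip clock when the reply arrived.
    """
    round_trip_ns = recv_local_ns - send_local_ns
    # Assumption: the reply spent half the round trip in flight, so the
    # server's timestamp corresponds to this moment on the local clock.
    local_at_server_stamp = recv_local_ns - round_trip_ns // 2
    return server_ns - local_at_server_stamp


def global_time_ns(offset_ns, local_ns=None):
    """Translate a local chip-clock reading into 'globalish' time."""
    if local_ns is None:
        local_ns = time.monotonic_ns()
    return local_ns + offset_ns
```

With a ping sent at local time 1000, answered with server timestamp 5100, and received at local time 1200, the round trip is 200, so the estimated offset is 5100 - 1100 = 4000.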

You don't have to use a different service for this; the Collector server will do. The important part is that you don't have to call the time-source server at every request, which removes it from the critical path.
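One way to picture keeping the time-source off the critical path: events are stamped with the local clock plus the most recently stored offset, and a background ping refreshes that offset from time to time. The class name `EventStamper` and the event layout below are assumptions made for illustration only.

```python
import time


class EventStamper:
    """Stamps events using the local clock plus a cached offset."""

    def __init__(self, clock=time.monotonic_ns):
        self._clock = clock      # local chip clock, injectable for testing
        self._offset_ns = 0      # last known (time-source - local) offset

    def set_offset(self, offset_ns):
        # Called by the periodic ping, not by the event path.
        self._offset_ns = offset_ns

    def stamp(self, payload):
        # No network call here: only a local clock read and an addition.
        return {"payload": payload, "ts_ns": self._clock() + self._offset_ns}


def timeline(events):
    """Order collected events by their globalish timestamp."""
    return sorted(events, key=lambda e: e["ts_ns"])
```

If the time-source becomes unreachable, the service keeps stamping with the last offset it obtained, which stays reasonable for as long as the local clock rate is stable.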

If you don't want a sawtooth function for the time, you can smooth the time difference.
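The answer does not say how to smooth. One common choice is an exponential moving average over successive offset samples, sketched here as an assumption; the class name `SmoothedOffset` and the `alpha` parameter are hypothetical.

```python
class SmoothedOffset:
    """Exponential moving average of offset samples (an assumed technique)."""

    def __init__(self, alpha=0.2):
        self.alpha = alpha   # weight given to each new sample, 0 < alpha <= 1
        self.value = None    # no estimate until the first sample arrives

    def update(self, sample_ns):
        if self.value is None:
            # First sample: adopt it directly.
            self.value = float(sample_ns)
        else:
            # Move a fraction alpha of the way toward the new sample,
            # so the offset drifts gradually instead of jumping.
            self.value += self.alpha * (sample_ns - self.value)
        return self.value
```

A smaller `alpha` gives a smoother but slower-reacting offset; a larger one tracks new pings more closely at the cost of bigger steps.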

Congratulations, you've rebuilt NTP!
