How to investigate latency spikes in OpenShift


Problem description


We see recurring latency spikes in our OpenShift cluster.

How can we (besides installing Istio, which is on the way) measure these latencies to get more information?

Is there a Helm chart out there for this purpose?

Here is a result from our Gatling test:

Solution

Measuring latency requires distributed tracing (DT), and DT requires adding some lines to your code. In fact, even with Istio you need to add some lines to your code if you want distributed tracing. That is why you will probably never find a Helm chart for this.

The way to go would be to collect the data through the OpenTracing API (now OpenTelemetry) and send it to a DT backend such as Jaeger or Zipkin.

As for modifying your code: with the API, you manually start a trace object and add spans to it, where each span is an individual unit of work you want to measure. So you would start_span and stop_span wherever you want. You might have several spans in one service, or just one. For other services to add their spans to the same trace object, you pass a context from one service to the next.
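To make the span and context idea concrete, here is a minimal sketch that uses only the Python standard library. It is a toy, not the OpenTelemetry API: the `span` helper, the `FINISHED_SPANS` list, and the `service_a` / `service_b` functions are hypothetical names invented for illustration. A real setup would use the OpenTelemetry SDK and export spans to a backend such as Jaeger or Zipkin.

```python
import time
import uuid
from contextlib import contextmanager

# Toy in-memory "backend" (assumption); a real setup would export to Jaeger or Zipkin.
FINISHED_SPANS = []


@contextmanager
def span(name, context=None):
    """Time one unit of work and record it under the trace given by `context`."""
    context = context or {}
    record = {
        # Reuse the caller's trace id if there is one, otherwise start a new trace.
        "trace_id": context.get("trace_id") or uuid.uuid4().hex,
        "span_id": uuid.uuid4().hex[:16],
        "parent_id": context.get("span_id"),
        "name": name,
    }
    start = time.perf_counter()
    try:
        # The yielded dict is the context to hand to child spans or other services.
        yield {"trace_id": record["trace_id"], "span_id": record["span_id"]}
    finally:
        record["duration_s"] = time.perf_counter() - start
        FINISHED_SPANS.append(record)


def service_b(context):
    # Service B joins the caller's trace by using the context it received.
    with span("service_b.query", context):
        time.sleep(0.01)


def service_a():
    with span("service_a.handle_request") as ctx:
        # Several spans inside one service are possible.
        with span("service_a.validate", ctx):
            time.sleep(0.005)
        # In practice the context would travel in request headers.
        service_b(ctx)


service_a()
for s in FINISHED_SPANS:
    print(f"{s['name']:<28} {s['duration_s'] * 1000:.1f} ms")
```

All three spans share one trace id, and the two inner spans point at the request span as their parent, which is what lets a tracing backend show where the time inside a slow request went.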

With Istio it is a little different. You don't start or stop spans yourself; instead, each service becomes a span. You pass some headers, created by the first proxy, from one service to the next, and Istio does the start_span and stop_span for each service. So with Istio you can't have several spans per service, only one.
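The code change needed for Istio is usually just copying the tracing headers from the incoming request onto every outgoing request. A minimal sketch follows, assuming the commonly documented B3 and W3C header names. The `forward_trace_headers` helper is a hypothetical name, and which headers actually matter depends on the tracer configured in your mesh.

```python
# Header names commonly listed for trace propagation with Istio (assumption:
# the exact set depends on the tracing backend configured in the mesh).
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "traceparent",
    "tracestate",
)


def forward_trace_headers(incoming_headers):
    """Return only the tracing headers, ready to attach to an outgoing request."""
    # HTTP header names are case-insensitive, so normalize before matching.
    lowered = {key.lower(): value for key, value in incoming_headers.items()}
    return {name: lowered[name] for name in TRACE_HEADERS if name in lowered}


# Example: headers as a web framework might expose them on the incoming request.
incoming = {
    "X-Request-Id": "abc",
    "X-B3-TraceId": "463ac35c9f6413ad",
    "Accept": "text/html",
}
outgoing = forward_trace_headers(incoming)
print(outgoing)
```

The returned dictionary would be merged into the headers of each downstream call, so the sidecar proxies can stitch the per-service spans into a single trace.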

So the OpenTracing API is much harder to implement, but you have complete control over what you are measuring. Istio is easier to implement, but comes with some limitations.

Now, you usually don't need more than one span per service. Since these are microservices, they don't do many things. The bigger limitation is that you can't measure database calls with Istio out of the box: on the database side there is no application code to handle the tracing headers, only the database itself, so you need the Envoy proxy to support tracing for that specific database.
