How to investigate latency spikes in Openshift
Problem description
We see recurring latency spikes in our Openshift cluster.
How can we measure these latencies to get more information (besides installing Istio, which is already on the way)?
Is there a Helm chart that exists for this purpose?
Here is a result from our Gatling test:
Measuring latency requires distributed tracing (DT), and DT requires some lines to be added to your code. In fact, even with Istio you need to add some lines to your code if you want distributed tracing. That is why you will probably never find a Helm chart for that.
The way to go would be to collect the data through the OpenTracing API (now OpenTelemetry) and send it to a DT backend such as Jaeger or Zipkin.
As for modifying your code: with the API, you manually start a trace object and add spans to it, where each span is an individual unit of work you want to measure. So you would start_span and stop_span wherever you want. You might have several spans in one service, or just one. For other services to add their spans to the same trace object, you pass a context from one service to another.
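To make the trace / span / context relationship concrete, here is a toy model using only the Python standard library. It is a sketch of the concept, not the OpenTracing or OpenTelemetry API: the Span class, context_of, the FINISHED_SPANS list and the service_a / service_b functions are all hypothetical names invented for illustration, and a real setup would export spans to a backend such as Jaeger or Zipkin instead of appending to a list.

```python
import time
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One timed unit of work belonging to a trace (hypothetical toy type)."""
    name: str
    trace_id: str
    span_id: str
    parent_id: Optional[str]
    start: float = field(default_factory=time.perf_counter)
    end: Optional[float] = None

    @property
    def duration_ms(self) -> Optional[float]:
        if self.end is None:
            return None
        return (self.end - self.start) * 1000.0


# Stand-in for a tracing backend; a real tracer would export these spans.
FINISHED_SPANS: list = []


def start_span(name: str, context: Optional[dict] = None) -> Span:
    """Start a span. With a context, join the caller's trace; otherwise start a new trace."""
    trace_id = context["trace_id"] if context else uuid.uuid4().hex
    parent_id = context["span_id"] if context else None
    return Span(name, trace_id, uuid.uuid4().hex[:16], parent_id)


def stop_span(span: Span) -> None:
    """Record the end time and hand the span to the (fake) backend."""
    span.end = time.perf_counter()
    FINISHED_SPANS.append(span)


def context_of(span: Span) -> dict:
    """The context is the small piece of data passed between services."""
    return {"trace_id": span.trace_id, "span_id": span.span_id}


def service_b(context: dict) -> None:
    # A second service adds its span to the same trace via the context.
    span = start_span("service-b: query", context)
    time.sleep(0.01)  # simulated work
    stop_span(span)


def service_a() -> None:
    root = start_span("service-a: handle request")
    # Several spans inside one service are possible.
    inner = start_span("service-a: validate", context_of(root))
    stop_span(inner)
    service_b(context_of(root))  # in practice the context travels in request headers
    stop_span(root)


service_a()
for s in FINISHED_SPANS:
    print(s.name, s.trace_id[:8], f"{s.duration_ms:.2f} ms")
```

All three spans share one trace ID, and the two child spans point to the root span as their parent, which is what lets a backend reassemble the request timeline and show where the time went.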
With Istio it is a little different. You don't start or stop spans yourself; instead, your spans are the services. You pass some headers, created by the first proxy, from one service to the next, and Istio does the start_span and stop_span for each service. So with Istio you can't have several spans per service, only one.
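The "few lines of code" needed with Istio amount to copying the tracing headers from each incoming request onto every outgoing request made while handling it. The sketch below shows that step with plain dictionaries. The answer does not name the headers, so the list here is an assumption based on the B3 and W3C Trace Context header names that Istio's tracing documentation has listed; check the documentation for your Istio version. The function name propagate_trace_headers and the sample values are hypothetical.

```python
from typing import Dict, Mapping

# Assumed header set (B3 and W3C Trace Context names); verify against
# the tracing documentation for the Istio version you run.
TRACE_HEADERS = (
    "x-request-id",
    "x-b3-traceid",
    "x-b3-spanid",
    "x-b3-parentspanid",
    "x-b3-sampled",
    "x-b3-flags",
    "b3",
    "traceparent",
    "tracestate",
)


def propagate_trace_headers(incoming: Mapping[str, str]) -> Dict[str, str]:
    """Return only the tracing headers that should be forwarded downstream.

    HTTP header names are case-insensitive, so they are lowercased first.
    """
    lowered = {name.lower(): value for name, value in incoming.items()}
    return {h: lowered[h] for h in TRACE_HEADERS if h in lowered}


# Hypothetical incoming request headers, as the sidecar proxy might set them.
incoming = {
    "Host": "orders.example",
    "X-Request-Id": "req-123",
    "X-B3-TraceId": "463ac35c9f6413ad48485a3953bb6124",
    "X-B3-SpanId": "a2fb4a1d1a96d312",
    "X-B3-Sampled": "1",
}

outgoing = propagate_trace_headers(incoming)
print(outgoing)
```

The returned dictionary would be merged into the headers of whatever HTTP client call the service makes next; the proxies handle span creation and reporting, so the application never touches a span directly.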
So the OpenTracing API is considerably harder to implement, but gives you complete control over what you are measuring; Istio is easier to implement, but comes with some limitations.
Now, you usually don't need more than one span in a service: since these are microservices, they don't do many things. The biggest limitation is that you can't measure database connections with Istio. On the database side there is no code of yours to handle these headers, just the database itself, so you need the Envoy proxy to support tracing for that specific database.