Stackdriver Trace with Google Cloud Run

Question

I have been diving into a Stackdriver Trace integration on Google Cloud Run. I can get it to work with the agent, but I am bothered by a few questions.

  • The Stackdriver agent aggregates traces in a small buffer and sends them periodically.
  • CPU access is restricted when a Cloud Run service is not handling a request.
  • There is no shutdown hook for Cloud Run services; you can't clear the buffer before shutdown: the container just gets a SIGKILL. This is a signal you can't catch from your application.
  • Running a background process that sends information outside of the request-response cycle seems to violate the Knative Container Runtime contract.
  • The collection of logging data is documented and does not require me to run an agent, but there is no such solution for telemetry.
  • I found one report of someone losing traces on Cloud Run with the agent-based approach.

I looked at the source code of the Cloud Endpoints ESP (its Cloud Run integration is in beta) to see whether it solves this differently, but it uses the same pattern: traces are held in a buffer (1s) that is flushed periodically.

While my tracing integration seems to work in my test setup, I am worried about incomplete and missing traces when I run this in a production environment.

  • Is this a hypothetical problem or a real issue?

It looks like the right way to approach this is to write telemetry to logs, instead of using an agent process. Is that supported with Stackdriver Trace?

Recommended answer

You're right. This is a fair concern since most tracing libraries tend to sample/upload trace spans in the background.

Since (1) your CPU is scaled nearly to zero when the container isn't handling any requests and (2) the container instance can be killed at any time due to inactivity, you cannot reliably upload the trace spans collected in your app. As you said, it may sometimes work since we don't fully stop the CPU, but it won't always work.

It appears that some of the Stackdriver (and/or OpenTelemetry f.k.a. OpenCensus) libraries let you control the lifecycle of pushing trace spans.

For example, this Go package for OpenCensus Stackdriver exporter has a Flush() method that you can call before completing your request rather than relying on the runtime to periodically upload the trace spans: https://godoc.org/contrib.go.opencensus.io/exporter/stackdriver#Exporter.Flush

I assume tracing libraries in other languages expose similar Flush() methods. If not, please let me know in the comments; that would be a valid feature request for those libraries.
