Stackdriver Trace with Google Cloud Run


Problem description

I have been diving into a Stackdriver Trace integration on Google Cloud Run. I can get it to work with the agent, but I am bothered by a few questions.

Given that

  • The Stackdriver agent aggregates traces in a small buffer and sends them periodically.
  • CPU access is restricted when a Cloud Run service is not handling a request.
  • There is no shutdown hook for Cloud Run services; you can't clear the buffer before shutdown: the container just gets a SIGKILL. This is a signal you can't catch from your application.
  • Running a background process that sends information outside of the request-response cycle seems to violate the Knative Container Runtime contract.
  • The collection of logging data is documented and does not require me to run an agent, but there is no such solution for telemetry.
  • I found one report of someone experiencing lost traces on Cloud Run using the agent-based approach.

How Google does it

I went into the source code for the Cloud Endpoints ESP (the Cloud Run integration is in beta) to see if they solve it in a different way, but the same pattern is used there: there is a buffer of traces (1s) that is cleared periodically.

Question

While my tracing integration seems to work in my test setup, I am worried about incomplete and missing traces when I run this in a production environment.

  • Is this a hypothetical problem or a real issue?

  • It looks like the right way to approach this is to write telemetry to logs, instead of using an agent process. Is that supported with Stackdriver Trace?

Solution

Cloud Run now supports sending SIGTERM. If your application handles SIGTERM, it gets a 10-second grace period before shutdown.

You can use the 10 seconds to:

  • Flush buffers that have unsent data
  • Close connections to other systems

Docs: Container runtime contract
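
As a rough illustration of the flush-on-SIGTERM idea, here is a minimal Python sketch. `SpanBuffer`, its `send` callable, and `make_sigterm_handler` are hypothetical names invented for this example; they stand in for whatever buffer and flush call your tracing library actually exposes, so treat this as a pattern under those assumptions rather than the agent's real API.

```python
import signal
import sys
from collections import deque


class SpanBuffer:
    """Hypothetical in-memory buffer standing in for a trace exporter's queue."""

    def __init__(self, send):
        self._send = send      # assumed callable that ships a list of spans
        self._spans = deque()  # deque append/popleft are atomic, so no lock is needed

    def add(self, span):
        self._spans.append(span)

    def flush(self):
        """Send everything currently buffered and return how many spans were sent."""
        pending = []
        while True:
            try:
                pending.append(self._spans.popleft())
            except IndexError:
                break
        if pending:
            self._send(pending)
        return len(pending)


def make_sigterm_handler(buffer, exit_fn=sys.exit):
    """Build a signal handler that flushes the buffer and then exits."""

    def handler(signum, frame):
        buffer.flush()  # must finish well inside the grace period
        exit_fn(0)

    return handler


def install(buffer):
    # Call once at startup, from the main thread.
    signal.signal(signal.SIGTERM, make_sigterm_handler(buffer))
```

Calling `install(buffer)` at startup registers the handler. The flush has to complete within the 10 seconds mentioned above, so any network call made by `send` should carry a shorter timeout of its own.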
