How to set up logback MDC in Apache Beam and Dataflow?

Problem Description

We are using Apache Beam and would like to set up the logback MDC. Logback MDC is a great, GREAT resource when you have a request come in and you store, let's say, a userId (in our case, it's custId, fileId, requestId): from then on, any time a developer logs, it magically stamps that information onto the developer's log. The developer no longer forgets to add it to every log statement he adds.

I am starting with an end-to-end integration-type test, with the Apache Beam direct runner embedded in our microservice for testing (in production, the microservice calls Dataflow). Currently, I see that the MDC is good up until after the expand() methods are called. Once the processElement methods are called, the context is of course gone, since I am in another thread.
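
To make that concrete: logback keeps the MDC per thread (in a ThreadLocal), so a worker thread starts out with an empty context. The JDK-only sketch below models this with a plain ThreadLocal<Map<String, String>> standing in for the MDC; CONTEXT and readFromOtherThread are illustrative names, not logback API.

```java
import java.util.HashMap;
import java.util.Map;

public class ThreadLocalContextDemo {
    // Hypothetical stand-in for logback's MDC: every thread gets its own, initially empty map.
    static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    /** Reads key from the context as seen by a brand-new thread. */
    static String readFromOtherThread(String key) {
        String[] seen = new String[1];
        Thread worker = new Thread(() -> seen[0] = CONTEXT.get().get(key));
        worker.start();
        try {
            worker.join(); // join() also makes seen[0] visible to this thread
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        CONTEXT.get().put("custId", "42");
        System.out.println("calling thread: " + CONTEXT.get().get("custId")); // 42
        System.out.println("other thread:   " + readFromOtherThread("custId")); // null
    }
}
```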

So, I am trying to fix this piece first. Where should I put this context so that I can restore it at the beginning of this thread?

As an example, if I have an Executor.execute(runnable), then I simply transfer the context using that runnable, like so:

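    import java.util.Map;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.slf4j.MDC;

    public class MDCContextRunnable implements Runnable {
        private static final Logger log = LoggerFactory.getLogger(MDCContextRunnable.class);

        private final Map<String, String> mdcSnapshot;
        private final Runnable runnable;

        public MDCContextRunnable(Runnable runnable) {
            this.runnable = runnable;
            // Capture the MDC on the submitting thread (null if it was never set).
            this.mdcSnapshot = MDC.getCopyOfContextMap();
        }

        @Override
        public void run() {
            try {
                // Restore the captured context on the executor thread.
                if (mdcSnapshot != null) {
                    MDC.setContextMap(mdcSnapshot);
                } else {
                    MDC.clear();
                }
                runnable.run();
            } catch (RuntimeException e) {
                // Must log errors before the MDC is cleared, so the log line carries the context.
                log.error("message", e);
            } finally {
                // Clear so the context cannot leak into the next task on this pooled thread.
                MDC.clear();
            }
        }
    }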

So I basically need to do the same with Apache Beam. I need:

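  1. A point to capture the MDC
  2. A point to restore the MDC
  3. A point to clear the MDC, to prevent it from leaking into another request (just in case I miss something, which seems to happen from time to time)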

Any ideas on how to do this?

Oh, bonus points if the MDC can be there when any exceptions are logged by the framework!!!! (i.e., ideally, frameworks are supposed to do this for you, but Apache Beam seems not to be doing this. Most web frameworks have this built in.)

Thanks, Dean

Recommended Answer

Based on the context and examples you gave, it sounds like you want to use MDC to automatically capture more information for your own DoFns. Depending on how long you need the context to be available, your best bet is to use either the StartBundle/FinishBundle or the Setup/Teardown methods on your DoFns to create your MDC context (see this answer for an explanation of the differences between the two). The important thing is that these methods are executed for each instance of a DoFn, which means they are called on the threads created to execute those DoFns.
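
A minimal sketch of that idea, mapped onto the three points in the question (capture, restore, clear). It assumes the snapshot is taken in the DoFn's constructor, which runs during pipeline construction while the request's MDC is still set, and that it is kept in a serializable field so it ships with the DoFn. To stay runnable with only the JDK, Mdc is a hypothetical stand-in for org.slf4j.MDC with the same method names, and MdcAwareFn is a plain class whose methods correspond to Beam's @StartBundle, @ProcessElement and @FinishBundle hooks; in a real pipeline you would extend DoFn and call MDC directly. A Setup/Teardown version would have the same shape with a longer lifetime.

```java
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical JDK-only stand-in for org.slf4j.MDC: same method names, one map per thread. */
final class Mdc {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CTX.get().put(key, value); }
    static String get(String key) { return CTX.get().get(key); }
    static Map<String, String> getCopyOfContextMap() { return new HashMap<>(CTX.get()); }
    static void setContextMap(Map<String, String> map) { CTX.set(new HashMap<>(map)); }
    static void clear() { CTX.get().clear(); }
}

/**
 * Plain-Java model of a DoFn. In a real pipeline this would extend
 * org.apache.beam.sdk.transforms.DoFn<String, String>, and the three methods
 * would carry @StartBundle, @ProcessElement and @FinishBundle.
 */
class MdcAwareFn implements Serializable {
    private static final long serialVersionUID = 1L;

    // HashMap is Serializable, so the snapshot travels with the serialized DoFn.
    private final HashMap<String, String> mdcSnapshot;

    // 1. Capture: assumed to run at pipeline-construction time (e.g. inside expand()),
    //    on the thread where the request's MDC is still set.
    MdcAwareFn() {
        Map<String, String> current = Mdc.getCopyOfContextMap();
        // The real MDC.getCopyOfContextMap() may return null, hence the guard.
        mdcSnapshot = current == null ? new HashMap<>() : new HashMap<>(current);
    }

    // 2. Restore on the executing thread before any element of the bundle is processed.
    void startBundle() {
        Mdc.setContextMap(mdcSnapshot);
    }

    // Stand-in for a log call: shows which custId a log line would be stamped with.
    String processElement(String element) {
        return "[custId=" + Mdc.get("custId") + "] processing " + element;
    }

    // 3. Clear at the end of the bundle so the context cannot leak into unrelated work.
    void finishBundle() {
        Mdc.clear();
    }
}

public class MdcLifecycleSketch {
    public static void main(String[] args) {
        Mdc.put("custId", "42");           // set by the request handler
        MdcAwareFn fn = new MdcAwareFn();  // built during expand(): snapshot taken here
        Mdc.clear();                       // pretend we are now on a fresh worker thread
        fn.startBundle();
        System.out.println(fn.processElement("row-1")); // [custId=42] processing row-1
        fn.finishBundle();
        System.out.println(Mdc.get("custId"));         // null
    }
}
```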

I should explain what's happening here and how this approach differs from your original goal. When Apache Beam runs, the pipeline program you wrote executes on your own machine and performs pipeline construction (this is where all the expand() calls happen). Once the pipeline is constructed, it is sent to a runner, which usually executes as a separate application (the Direct Runner being the exception), and the runner then either executes your user code directly or runs it in a Docker environment.

In your original approach, it makes sense that the MDC is applied to all logs until execution begins, because execution may happen not only on a different thread but potentially in a different application or on a different machine. The methods described above, however, run as part of your user code, so setting up your MDC there lets it work on whichever thread, application, or machine is executing the transforms.
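
To illustrate why context set up from user code survives the hop: anything captured as a field of the DoFn is serialized with it and is available wherever the runner deserializes it. The JDK-only sketch below imitates that hop with Java serialization and a fresh thread; MDC_MODEL, FnWithContext, ship and valueOnWorker are hypothetical names, and a real runner uses its own machinery rather than this exact round trip.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class ShipContextDemo {
    // Hypothetical stand-in for the per-thread MDC (not logback's implementation).
    static final ThreadLocal<Map<String, String>> MDC_MODEL =
            ThreadLocal.withInitial(HashMap::new);

    /** Hypothetical user-code object: the snapshot is a plain field, so it is serialized with it. */
    static class FnWithContext implements Serializable {
        private static final long serialVersionUID = 1L;
        final HashMap<String, String> snapshot = new HashMap<>(MDC_MODEL.get());
    }

    /** Imitates a runner shipping user code: serialize here, deserialize "elsewhere". */
    static FnWithContext ship(FnWithContext fn) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fn);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (FnWithContext) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** On a fresh thread: restore the shipped snapshot, read key, then clear. */
    static String valueOnWorker(FnWithContext shipped, String key) {
        String[] seen = new String[1];
        Thread worker = new Thread(() -> {
            MDC_MODEL.set(new HashMap<>(shipped.snapshot)); // restore
            try {
                seen[0] = MDC_MODEL.get().get(key);
            } finally {
                MDC_MODEL.remove();                          // clear
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        MDC_MODEL.get().put("requestId", "req-7");   // launcher thread, during construction
        FnWithContext built = new FnWithContext();   // snapshot captured here
        System.out.println(valueOnWorker(ship(built), "requestId")); // req-7
    }
}
```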

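Just keep in mind that these methods are called for every DoFn, and you will often have multiple DoFns per thread. Because the MDC is stored per thread, that is something you may need to be wary of: one DoFn setting or clearing the MDC affects every other DoFn running on the same thread.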
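
One defensive option, sketched under the assumption that DoFns sharing a thread can run nested inside each other (for example when a runner fuses consecutive steps and an output is processed by the next DoFn synchronously), is to scope the context around each call and put back the previous map afterwards instead of clearing it. MDC_MODEL, withContext and current are hypothetical names; with SLF4J the same save/restore would use MDC.getCopyOfContextMap() and MDC.setContextMap(...), remembering that the former may return null.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

public class NestedMdcScope {
    // Hypothetical stand-in for the per-thread MDC (not logback's implementation).
    static final ThreadLocal<Map<String, String>> MDC_MODEL =
            ThreadLocal.withInitial(HashMap::new);

    /** Runs body with snapshot as the context, then restores whatever context was there before. */
    static <T> T withContext(Map<String, String> snapshot, Supplier<T> body) {
        Map<String, String> previous = new HashMap<>(MDC_MODEL.get());
        MDC_MODEL.set(new HashMap<>(snapshot));
        try {
            return body.get();
        } finally {
            // Restore rather than clear, so an enclosing DoFn on this thread keeps its context.
            MDC_MODEL.set(previous);
        }
    }

    static String current(String key) {
        return MDC_MODEL.get().get(key);
    }

    public static void main(String[] args) {
        // Outer DoFn's call; its output is handled synchronously by an inner DoFn on the same thread.
        String afterInner = withContext(Map.of("step", "parse"), () -> {
            String inner = withContext(Map.of("step", "enrich"), () -> current("step"));
            System.out.println("inner sees: " + inner);        // enrich
            return current("step");
        });
        System.out.println("outer still sees: " + afterInner); // parse
        System.out.println("after both: " + current("step"));  // null
    }
}
```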
