How to set up Logback MDC in Apache Beam and Dataflow?

Question

We are using Apache Beam and would like to set up the Logback MDC. The Logback MDC is a great, GREAT resource: when a request comes in and you store, let's say, a userId (in our case it's custId, fileId, requestId), then any time a developer logs, it magically stamps that information onto the developer's log. The developer no longer forgets to add it to every log statement he writes.

I am starting with an end-to-end integration-type test, with the Apache Beam Direct Runner embedded in our microservice for testing (in production, the microservice calls Dataflow). Currently, I am seeing that the MDC is good up until after the expand() methods are called. Once the processElement methods are called, the context is of course gone, since I am in another thread.

So, I am trying to fix this piece first. Where should I put this context so that I can restore it at the beginning of this thread?

As an example, if I have an Executor.execute(runnable), then I simply transfer the context using that runnable, like so:

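    import java.util.Map;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;
    import org.slf4j.MDC;

    public class MDCContextRunnable implements Runnable {
        private static final Logger log = LoggerFactory.getLogger(MDCContextRunnable.class);

        private final Map<String, String> mdcSnapshot;
        private final Runnable runnable;

        public MDCContextRunnable(Runnable runnable) {
            this.runnable = runnable;
            // Captured on the submitting thread; may be null if the MDC is empty
            this.mdcSnapshot = MDC.getCopyOfContextMap();
        }

        @Override
        public void run() {
            try {
                // Restore the captured context on the executing thread
                if (mdcSnapshot != null) {
                    MDC.setContextMap(mdcSnapshot);
                }
                runnable.run();
            } catch (RuntimeException e) {
                // Must log errors before the MDC is cleared
                log.error("message", e); // Logs the error together with the MDC
            } finally {
                MDC.clear();
            }
        }
    }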

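So I basically need to do the same with Apache Beam. I need to:

  1. have a point to capture the MDC
  2. have a point to restore the MDC
  3. have a point to clear the MDC, to prevent it from leaking into another request (just in case I missed something, which seems to happen now and then)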

Any ideas on how to do this?

Oh, bonus points if the MDC can be there when any exceptions are logged by the framework! (That is, ideally frameworks are supposed to do this for you, but Apache Beam does not seem to. Most web frameworks have this built in.)

Thanks, Dean

Recommended answer

Based on the context and examples you gave, it sounds like you want to use MDC to automatically capture more information for your own DoFns. Depending on the lifetime you need your context to be available for, your best bet is to use either the StartBundle/FinishBundle or the Setup/Teardown methods on your DoFns to create your MDC context (see this answer for an explanation of the differences between the two). The important thing is that these methods are executed for each instance of a DoFn, meaning they will be called on the new threads created to execute these DoFns.
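As a rough illustration of that lifecycle idea, here is a minimal sketch in plain Java. It deliberately avoids the Beam and SLF4J APIs so that it runs without dependencies: `FakeMdc` is a hypothetical stand-in for `org.slf4j.MDC`, and `ContextCarryingFn` is a hypothetical class that only models a DoFn, with `startBundle` and `finishBundle` standing where methods annotated `@StartBundle` and `@FinishBundle` would go. It assumes the context is captured when the function object is constructed and travels with it through serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.slf4j.MDC: a per-thread key/value map.
final class FakeMdc {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    static void put(String key, String value) { CTX.get().put(key, value); }
    static String get(String key) { return CTX.get().get(key); }
    static Map<String, String> getCopyOfContextMap() { return new HashMap<>(CTX.get()); }
    static void setContextMap(Map<String, String> map) { CTX.set(new HashMap<>(map)); }
    static void clear() { CTX.get().clear(); }
}

// Hypothetical model of a DoFn: serializable, with bundle lifecycle hooks.
final class ContextCarryingFn implements Serializable {
    private static final long serialVersionUID = 1L;

    // 1. Capture: taken on the thread that constructs the pipeline.
    private final HashMap<String, String> mdcSnapshot;

    ContextCarryingFn() {
        this.mdcSnapshot = new HashMap<>(FakeMdc.getCopyOfContextMap());
    }

    // 2. Restore: in Beam this would be a method annotated @StartBundle.
    void startBundle() { FakeMdc.setContextMap(mdcSnapshot); }

    // In Beam this would be the method annotated @ProcessElement.
    String processElement(String element) {
        return "[custId=" + FakeMdc.get("custId") + "] " + element;
    }

    // 3. Clear: in Beam this would be a method annotated @FinishBundle.
    void finishBundle() { FakeMdc.clear(); }
}

final class MdcBundleDemo {
    // Serializes and deserializes the function, modeling how it is shipped to a worker.
    static ContextCarryingFn roundTrip(ContextCarryingFn fn) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(fn);
            }
            try (ObjectInputStream in =
                    new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (ContextCarryingFn) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Runs one bundle on a fresh thread.
    // Returns {line produced inside the bundle, custId seen after the bundle finished}.
    static String[] runOnWorkerThread(String custId, String element) {
        FakeMdc.put("custId", custId);              // request thread sets the context
        ContextCarryingFn shipped = roundTrip(new ContextCarryingFn());
        FakeMdc.clear();                            // request thread is done

        String[] result = new String[2];
        Thread worker = new Thread(() -> {
            shipped.startBundle();
            result[0] = shipped.processElement(element);
            shipped.finishBundle();
            result[1] = FakeMdc.get("custId");      // null: nothing leaks past the bundle
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return result;
    }
}
```

Calling `MdcBundleDemo.runOnWorkerThread("c-42", "file parsed")` produces the line `[custId=c-42] file parsed` inside the bundle and no `custId` once the bundle has finished. Note that the snapshot is fixed when the function object is constructed, so values that differ per element (a fileId, for instance) would presumably have to travel with the elements themselves.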

I should explain what's happening here and how this approach differs from your original goal. The way Apache Beam executes, the pipeline you wrote runs on your own machine and performs pipeline construction (which is where all the expand calls occur). However, once a pipeline is constructed, it is sent to a runner, which often executes in a separate application (unless it's the Direct Runner), and the runner then either executes your user code directly or runs it in a Docker environment.

It makes sense that your original approach successfully applies the MDC to all logs until execution begins, because execution might be occurring not only in a different thread but potentially also in a different application or on a different machine. However, the methods described above are executed as part of your user code, so setting up your MDC there will allow it to function on whatever thread, application, or machine is executing the transforms.

Just keep in mind that those methods get called for every DoFn, and you will often have multiple DoFns per thread, which is something you may need to be wary of depending on how MDC works.
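One way to be wary of that, sketched here under assumptions: if several steps can run nested on the same thread, a step that clears the MDC when it finishes would also wipe the context of the step that called it. Saving the previous context and putting it back avoids this. The code below is plain Java with no Beam or SLF4J dependency; `SimpleMdc` is a hypothetical stand-in for `org.slf4j.MDC`, and `MdcScope` is a hypothetical helper, not part of any library.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for org.slf4j.MDC: a per-thread key/value map.
final class SimpleMdc {
    private static final ThreadLocal<Map<String, String>> CTX =
            ThreadLocal.withInitial(HashMap::new);

    static String get(String key) { return CTX.get().get(key); }
    static Map<String, String> getCopyOfContextMap() { return new HashMap<>(CTX.get()); }
    static void setContextMap(Map<String, String> map) { CTX.set(new HashMap<>(map)); }
}

// Installs a context and, on close, restores whatever was on the thread before
// instead of clearing it.
final class MdcScope implements AutoCloseable {
    private final Map<String, String> previous;

    private MdcScope(Map<String, String> previous) { this.previous = previous; }

    static MdcScope open(Map<String, String> snapshot) {
        MdcScope scope = new MdcScope(SimpleMdc.getCopyOfContextMap());
        SimpleMdc.setContextMap(snapshot);
        return scope;
    }

    @Override
    public void close() { SimpleMdc.setContextMap(previous); }
}

final class NestedScopeDemo {
    // Simulates two steps on one thread, the outer one calling the inner one.
    // Records the value of "step" seen at each point.
    static String run() {
        StringBuilder seen = new StringBuilder();
        try (MdcScope outer = MdcScope.open(Map.of("step", "parse"))) {
            seen.append(SimpleMdc.get("step"));
            try (MdcScope inner = MdcScope.open(Map.of("step", "enrich"))) {
                seen.append(',').append(SimpleMdc.get("step"));
            }
            // Back in the outer step: its context survived the inner step.
            seen.append(',').append(SimpleMdc.get("step"));
        }
        // After every scope is closed the thread is back to an empty context.
        seen.append(',').append(SimpleMdc.get("step"));
        return seen.toString();
    }
}
```

`NestedScopeDemo.run()` returns `parse,enrich,parse,null`: the outer step still sees its own context after the inner step finishes, and nothing is left on the thread at the end.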
