Output Spark application id in the logs with Log4j


Problem description

I have a custom Log4j file for the Spark application. I would like to output the Spark app id along with other attributes like message and date, so that the JSON string structure would look like this:

{"name":,"time":,"date":,"level":,"thread":,"message":,"app_id":}

Currently, the structure looks like this:

{"name":,"time":,"date":,"level":,"thread":,"message":}

How can I define such a layout for the Spark driver logs?

My log4j file looks like this:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j='http://jakarta.apache.org/log4j/'>

    <appender name="Json" class="org.apache.log4j.ConsoleAppender">
        <layout class="org.apache.hadoop.log.Log4Json">
            <param name="ConversionLayout" value=""/>
        </layout>
    </appender>

    <root>
        <level value="INFO"/>
        <appender-ref ref="Json"/>
    </root>
</log4j:configuration>

Recommended answer

I doubt that org.apache.hadoop.log.Log4Json can be adjusted for this purpose. According to its javadoc and source code, it might be rather cumbersome.

Although it looks like you are using Log4j 1.x, its API is quite flexible and we can easily define our own layout by extending org.apache.log4j.Layout.

We'll need a case class that will be transformed into JSON according to the target structure:

case class LoggedMessage(name: String,
                         appId: String,
                         thread: String,
                         time: Long,
                         level: String,
                         message: String)
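
As a quick illustration (not part of the original answer; the sample values are made up), serializing an instance of this case class with json4s produces the target JSON shape:

import org.json4s.DefaultFormats
import org.json4s.native.Serialization.write

object LoggedMessageExample extends App {
  implicit val formats: DefaultFormats.type = DefaultFormats

  // hypothetical sample values, just to show the resulting JSON shape
  val sample = LoggedMessage("TODO", "local-1234567890", "main",
                             System.currentTimeMillis(), "INFO", "hello")
  println(write(sample))
  // e.g. {"name":"TODO","appId":"local-1234567890","thread":"main","time":...,"level":"INFO","message":"hello"}
}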

And Layout might be extended as follows. To access the value of "app_id", we'll use Log4j's Mapped Diagnostic Context (MDC):

import org.apache.log4j.Layout
import org.apache.log4j.spi.LoggingEvent
import org.json4s.DefaultFormats
import org.json4s.native.Serialization.write

class JsonLoggingLayout extends Layout {
  // required by the API
  override def ignoresThrowable(): Boolean = false
  // required by the API
  override def activateOptions(): Unit = { /* nothing */ }

  override def format(event: LoggingEvent): String = {
    // we are using json4s for JSON serialization
    implicit val formats = DefaultFormats

    // retrieve app_id from Mapped Diagnostic Context
    val appId = event.getMDC("app_id") match {
      case null => "[no_app]" // logged messages outside our app
      case defined: AnyRef => defined.toString
    }
    val message = LoggedMessage("TODO",
                                appId,
                                Thread.currentThread().getName,
                                event.getTimeStamp,
                                event.getLevel.toString,
                                event.getMessage.toString)
    write(message) + "\n"
  }

}
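
To sanity-check the layout outside of Spark, one could feed it a hand-built LoggingEvent. This smoke test is my own sketch, not part of the original answer:

import org.apache.log4j.{Level, Logger, MDC}
import org.apache.log4j.spi.LoggingEvent

object LayoutSmokeTest extends App {
  // simulate what the Spark driver would do
  MDC.put("app_id", "local-test")

  val logger = Logger.getLogger("smoke-test")
  val event = new LoggingEvent(classOf[Logger].getName, logger,
                               Level.INFO, "hello from the layout", null)

  // prints a single JSON line produced by the custom layout
  print(new JsonLoggingLayout().format(event))
}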

Finally, when the Spark session is created, we put the app_id value into MDC:

import org.apache.log4j.{Logger, MDC}

// create Spark session

MDC.put("app_id", session.sparkContext.applicationId)

logger.info("-------- this is info --------")
logger.warn("-------- THIS IS A WARNING --------")
logger.error("-------- !!! ERROR !!! --------")

This produces the following log output:

{"name":"TODO","appId":"local-1550247707920","thread":"main","time":1550247708149,"level":"INFO","message":"-------- this is info --------"}
{"name":"TODO","appId":"local-1550247707920","thread":"main","time":1550247708150,"level":"WARN","message":"-------- THIS IS A WARNING --------"}
{"name":"TODO","appId":"local-1550247707920","thread":"main","time":1550247708150,"level":"ERROR","message":"-------- !!! ERROR !!! --------"}

And, of course, do not forget to reference the implementation in the log4j config XML:

<appender name="Json" class="org.apache.log4j.ConsoleAppender">
  <layout class="stackoverflow.q54706582.JsonLoggingLayout" />
</appender>
