Output Spark application id in the logs with Log4j
Question
I have a custom Log4j file for the Spark application. I would like to output the Spark app id along with other attributes like message and date, so the JSON string structure would look like this:
{"name":,"time":,"date":,"level":,"thread":,"message":,"app_id":}
Currently, the structure looks like this:
{"name":,"time":,"date":,"level":,"thread":,"message":}
How can I define such a layout for the Spark driver logs?
My log4j file looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE log4j:configuration SYSTEM "log4j.dtd">
<log4j:configuration xmlns:log4j='http://jakarta.apache.org/log4j/'>
  <appender name="Json" class="org.apache.log4j.ConsoleAppender">
    <layout class="org.apache.hadoop.log.Log4Json">
      <param name="ConversionLayout" value=""/>
    </layout>
  </appender>
  <root>
    <level value="INFO"/>
    <appender-ref ref="Json"/>
  </root>
</log4j:configuration>
Answer
I doubt that org.apache.hadoop.log.Log4Json can be adjusted for this purpose. According to its javadoc and source code, it might be rather cumbersome.
Although it looks like you are using Log4j 1.x, its API is quite flexible and we can easily define our own layout by extending org.apache.log4j.Layout.
We'll need a case class that will be transformed into JSON according to the target structure:
case class LoggedMessage(name: String,
                         appId: String,
                         thread: String,
                         time: Long,
                         level: String,
                         message: String)
And Layout might be extended as follows. To access the value of "app_id", we'll use Log4j's Mapped Diagnostic Context (MDC):
import org.apache.log4j.Layout
import org.apache.log4j.spi.LoggingEvent
import org.json4s.DefaultFormats
import org.json4s.native.Serialization.write

class JsonLoggingLayout extends Layout {
  // required by the API
  override def ignoresThrowable(): Boolean = false

  // required by the API
  override def activateOptions(): Unit = { /* nothing */ }

  override def format(event: LoggingEvent): String = {
    // we are using json4s for JSON serialization
    implicit val formats = DefaultFormats

    // retrieve app_id from the Mapped Diagnostic Context
    val appId = event.getMDC("app_id") match {
      case null => "[no_app]" // logged messages outside our app
      case defined: AnyRef => defined.toString
    }

    val message = LoggedMessage("TODO",
      appId,
      Thread.currentThread().getName,
      event.getTimeStamp,
      event.getLevel.toString,
      event.getMessage.toString)

    write(message) + "\n"
  }
}
Finally, when the Spark session is created, we put the app_id value into the MDC:
import org.apache.log4j.{Logger, MDC}

val logger = Logger.getLogger(getClass)

// create Spark session, then register its application id in the MDC
// (`session` is the SparkSession created here)
MDC.put("app_id", session.sparkContext.applicationId)

logger.info("-------- this is info --------")
logger.warn("-------- THIS IS A WARNING --------")
logger.error("-------- !!! ERROR !!! --------")
This produces the following log entries:
{"name":"TODO","appId":"local-1550247707920","thread":"main","time":1550247708149,"level":"INFO","message":"-------- this is info --------"}
{"name":"TODO","appId":"local-1550247707920","thread":"main","time":1550247708150,"level":"WARN","message":"-------- THIS IS A WARNING --------"}
{"name":"TODO","appId":"local-1550247707920","thread":"main","time":1550247708150,"level":"ERROR","message":"-------- !!! ERROR !!! --------"}
And, of course, do not forget to reference the implementation in the log4j config XML:
<appender name="Json" class="org.apache.log4j.ConsoleAppender">
  <layout class="stackoverflow.q54706582.JsonLoggingLayout" />
</appender>