Separating application logs in Logback from Spark logs in log4j

Question

I have a Scala Maven project that uses Spark, and I am trying to implement logging using Logback. I am compiling my application to a jar and deploying it to an EC2 instance where the Spark distribution is installed. My pom.xml includes dependencies for Spark and Logback as follows:

        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.1.7</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>log4j-over-slf4j</artifactId>
            <version>1.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
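
A quick way to confirm what these exclusions leave behind in the build (a hypothetical check, assuming a standard Maven setup) is to inspect the resolved dependency tree:

# List any log4j / slf4j artifacts still pulled in after the exclusions;
# ideally only logback-classic and log4j-over-slf4j should remain.
mvn dependency:tree | grep -iE 'log4j|slf4j'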

When submitting my Spark application, I print out the slf4j binding on the command line. If I execute the jar's code using java, the binding is to Logback. If I use Spark (i.e. spark-submit), however, the binding is to log4j.

import org.apache.spark.SparkContext
import org.slf4j.impl.StaticLoggerBinder
import org.slf4j.{Logger, LoggerFactory}

  val logger: Logger = LoggerFactory.getLogger(this.getClass)
  val sc: SparkContext = new SparkContext()
  val rdd = sc.textFile("myFile.txt")

  // Print which SLF4J binding was actually selected at runtime
  val slb: StaticLoggerBinder = StaticLoggerBinder.getSingleton
  System.out.println("Logger Instance: " + slb.getLoggerFactory)
  System.out.println("Logger Class Type: " + slb.getLoggerFactoryClassStr)

yields:

Logger Instance: org.slf4j.impl.Log4jLoggerFactory@a64e035
Logger Class Type: org.slf4j.impl.Log4jLoggerFactory

I understand that both log4j-1.2.17.jar and slf4j-log4j12-1.7.16.jar are in /usr/local/spark/jars, and that Spark is most likely referencing these jars despite the exclusions in my pom.xml, because if I delete them I get a ClassNotFoundException at runtime from spark-submit.

My question is: is there a way to implement native logging in my application using Logback while preserving Spark's internal logging capabilities? Ideally, I'd like to write my Logback application logs to a file and allow the Spark logs to still be shown at STDOUT.

Answer

I encountered a very similar problem.

Our build was similar to yours (but we used sbt) and is described in detail here: https://stackoverflow.com/a/45479379/1549135

Running this solution locally works fine, but spark-submit would ignore all the exclusions and the new logging framework (Logback), because Spark's classpath has priority over the deployed jar. Since that classpath contains log4j 1.2.xx, Spark would simply load it and ignore our setup.
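
You can see the conflicting jars that the Spark distribution itself ships (assuming the /usr/local/spark install path mentioned in the question) with a quick listing:

# Show the log4j / slf4j jars that Spark puts on its own classpath
ls /usr/local/spark/jars | grep -iE 'log4j|slf4j'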

I have used several sources, but quoting the Spark 1.6.1 docs (this applies to the latest Spark / 2.2.0 as well):

spark.driver.extraClassPath

Extra classpath entries to prepend to the classpath of the driver. Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-class-path command line option or in your default properties file.

spark.executor.extraClassPath

Extra classpath entries to prepend to the classpath of executors. This exists primarily for backwards-compatibility with older versions of Spark. Users typically should not need to set this option.

What is not written there, though, is that extraClassPath takes precedence over Spark's default classpath!

So now the solution should be quite obvious.

1. Create a directory (e.g. libs) containing the following jars; one way to collect them is sketched after the list:

- log4j-over-slf4j-1.7.25.jar
- logback-classic-1.2.3.jar
- logback-core-1.2.3.jar
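
If the jars are already in your local Maven repository, one hypothetical way to collect them is to copy them out of ~/.m2 (the versions below match the list above; adjust paths as needed):

# Copy the logging jars from the local Maven repository into the libs directory
mkdir -p /absolute/path/to/libs
cp ~/.m2/repository/org/slf4j/log4j-over-slf4j/1.7.25/log4j-over-slf4j-1.7.25.jar \
   ~/.m2/repository/ch/qos/logback/logback-classic/1.2.3/logback-classic-1.2.3.jar \
   ~/.m2/repository/ch/qos/logback/logback-core/1.2.3/logback-core-1.2.3.jar \
   /absolute/path/to/libs/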

2. Run spark-submit:

libs="/absolute/path/to/libs/*"

spark-submit \
  ...
  --master yarn \
  --conf "spark.driver.extraClassPath=$libs" \
  --conf "spark.executor.extraClassPath=$libs" \
  ...
  /my/application/application-fat.jar \
  param1 param2

I am just not sure yet whether you can put those jars on HDFS; we keep them locally, next to the application jar.

Strangely enough, using Spark 1.6.1 I also found this option in the docs:

spark.driver.userClassPathFirst, spark.executor.userClassPathFirst

(Experimental) Whether to give user-added jars precedence over Spark's own jars when loading classes in the driver. This feature can be used to mitigate conflicts between Spark's dependencies and user dependencies. It is currently an experimental feature. This is used in cluster mode only.

But simply setting:

--conf "spark.driver.userClassPathFirst=true" \
--conf "spark.executor.userClassPathFirst=true"

did not work for me. So I am gladly using extraClassPath!

Cheers!

If you face any problems loading logback.xml in Spark, my question here might help you out: Pass system property to spark-submit and read file from classpath or custom path
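
As a minimal sketch of that approach (the logback.configurationFile system property is standard Logback; the paths here are placeholders), you can point Logback at an external configuration file when submitting:

# Tell Logback on the driver where to find its configuration file
# when logback.xml is not packaged inside the fat jar (placeholder paths).
spark-submit \
  --driver-java-options "-Dlogback.configurationFile=/absolute/path/to/logback.xml" \
  --conf "spark.driver.extraClassPath=$libs" \
  --conf "spark.executor.extraClassPath=$libs" \
  /my/application/application-fat.jar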
