Separating application logs in Logback from Spark logs in log4j


Problem Description

I have a Scala Maven project that uses Spark, and I am trying to implement logging using Logback. I am compiling my application to a jar and deploying it to an EC2 instance where the Spark distribution is installed. My pom.xml includes dependencies for Spark and Logback as follows:

        <dependency>
            <groupId>ch.qos.logback</groupId>
            <artifactId>logback-classic</artifactId>
            <version>1.1.7</version>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>log4j-over-slf4j</artifactId>
            <version>1.7.7</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.binary.version}</artifactId>
            <version>${spark.version}</version>
            <exclusions>
                <exclusion>
                    <groupId>org.slf4j</groupId>
                    <artifactId>slf4j-log4j12</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>log4j</groupId>
                    <artifactId>log4j</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

When I submit my Spark application, I print out the slf4j binding on the command line. If I execute the jar's code using java, the binding is to Logback. If I use Spark (i.e. spark-submit), however, the binding is to log4j.

  import org.apache.spark.SparkContext
  import org.slf4j.impl.StaticLoggerBinder
  import org.slf4j.{Logger, LoggerFactory}

  val logger: Logger = LoggerFactory.getLogger(this.getClass)
  val sc: SparkContext = new SparkContext()
  val rdd = sc.textFile("myFile.txt")

  // Inspect which SLF4J binding was actually chosen at runtime
  val slb: StaticLoggerBinder = StaticLoggerBinder.getSingleton
  System.out.println("Logger Instance: " + slb.getLoggerFactory)
  System.out.println("Logger Class Type: " + slb.getLoggerFactoryClassStr)

This yields:

Logger Instance: org.slf4j.impl.Log4jLoggerFactory@a64e035
Logger Class Type: org.slf4j.impl.Log4jLoggerFactory

I understand that both log4j-1.2.17.jar and slf4j-log4j12-1.7.16.jar are in /usr/local/spark/jars, and that Spark is most likely referencing these jars despite the exclusion in my pom.xml, because if I delete them I am given a ClassNotFoundException at runtime of spark-submit.

My question is: Is there a way to implement native logging in my application using Logback while preserving Spark's internal logging capabilities? Ideally, I'd like to write my Logback application logs to a file and allow Spark logs to still be shown at STDOUT.
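
For reference, a minimal logback.xml sketch of the kind of split being asked for might look like this (the appender setup, the com.mycompany package name, and the log file path are illustrative assumptions; Spark's own log4j calls only reach Logback here because of the log4j-over-slf4j bridge declared above):

<configuration>

    <appender name="FILE" class="ch.qos.logback.core.FileAppender">
        <file>/var/log/myapp/application.log</file>
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>

    <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
        <encoder>
            <pattern>%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n</pattern>
        </encoder>
    </appender>

    <!-- Spark's logging (routed via log4j-over-slf4j) keeps going to STDOUT -->
    <logger name="org.apache.spark" level="INFO" additivity="false">
        <appender-ref ref="STDOUT"/>
    </logger>

    <!-- Application logs go to the file; replace com.mycompany with your package -->
    <logger name="com.mycompany" level="DEBUG" additivity="false">
        <appender-ref ref="FILE"/>
    </logger>

    <root level="INFO">
        <appender-ref ref="STDOUT"/>
    </root>

</configuration>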

Recommended Answer

I had encountered a very similar problem.

Our build was similar to yours (but we used sbt) and is described in detail here: https://stackoverflow.com/a/45479379/1549135

Running this solution locally works fine, but then spark-submit would ignore all the exclusions and the new logging framework (Logback), because Spark's classpath has priority over the deployed jar. And since it contains log4j 1.2.xx, it would simply load it and ignore our setup.

I have used several sources. But quoting Spark 1.6.1 docs (applies to Spark latest / 2.2.0 as well):

spark.driver.extraClassPath


Extra classpath entries to prepend to the classpath of the driver. Note: In client mode, this config must not be set through the SparkConf directly in your application, because the driver JVM has already started at that point. Instead, please set this through the --driver-class-path command line option or in your default properties file.

spark.executor.extraClassPath


Extra classpath entries to prepend to the classpath of executors. This exists primarily for backwards-compatibility with older versions of Spark. Users typically should not need to set this option.

What is not written here, though, is that extraClassPath takes precedence over Spark's default classpath!

So now the solution should be quite obvious.

1. Prepare a directory containing the required jars (one way to assemble it is sketched after this list):
- log4j-over-slf4j-1.7.25.jar
- logback-classic-1.2.3.jar
- logback-core-1.2.3.jar
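
A minimal sketch of assembling such a directory, assuming the jars are already present in the local Maven repository (the target path and the exact versions are illustrative):

mkdir -p /absolute/path/to/libs
# Copy the log4j bridge and the Logback jars out of the local Maven repository;
# adjust versions and paths to whatever your build actually resolves.
cp ~/.m2/repository/org/slf4j/log4j-over-slf4j/1.7.25/log4j-over-slf4j-1.7.25.jar /absolute/path/to/libs/
cp ~/.m2/repository/ch/qos/logback/logback-classic/1.2.3/logback-classic-1.2.3.jar /absolute/path/to/libs/
cp ~/.m2/repository/ch/qos/logback/logback-core/1.2.3/logback-core-1.2.3.jar /absolute/path/to/libs/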



2. Run the spark-submit:

libs="/absolute/path/to/libs/*"

spark-submit \
  ...
  --master yarn \
  --conf "spark.driver.extraClassPath=$libs" \
  --conf "spark.executor.extraClassPath=$libs" \
  ...
  /my/application/application-fat.jar \
  param1 param2

I am just not yet sure if you can put those jars on HDFS. We have them locally next to the application jar.

Strangely enough, using Spark 1.6.1 I have also found this option in the docs:

spark.driver.userClassPathFirst
spark.executor.userClassPathFirst


(Experimental) Whether to give user-added jars precedence over Spark's own jars when loading classes in the driver. This feature can be used to mitigate conflicts between Spark's dependencies and user dependencies. It is currently an experimental feature. This is used in cluster mode only.

But simply setting:

--conf "spark.driver.userClassPathFirst=true" \
--conf "spark.executor.userClassPathFirst=true" \

did not work for me. So I am gladly using extraClassPath!

Cheers!

If you face any problems loading logback.xml to Spark, my question here might help you out: Pass system property to spark-submit and read file from classpath or custom path
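
That linked question has the details; as a rough sketch, one common pattern is to ship the file with --files and point Logback at it through the JVM options (the paths, the yarn master, and reusing the libs variable from above are assumptions, not part of the original answer):

spark-submit \
  --master yarn \
  --files /absolute/path/to/logback.xml \
  --conf "spark.driver.extraClassPath=$libs" \
  --conf "spark.executor.extraClassPath=$libs" \
  --conf "spark.driver.extraJavaOptions=-Dlogback.configurationFile=/absolute/path/to/logback.xml" \
  --conf "spark.executor.extraJavaOptions=-Dlogback.configurationFile=logback.xml" \
  /my/application/application-fat.jar

With --files the file is copied into each executor's working directory, which is why the executor side can refer to it by the bare file name, while the driver (in client mode) reads it from the local absolute path.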
