Apache Spark logging to Kafka with Log4j v1 Kafka appender


Question

I've already trawled the web for answers here, and I cannot get anything to work, so maybe somebody has a fresh perspective.

  • I'm trying to write logs to a Kafka topic from inside an Apache Spark 2.2 application.
  • Because Spark still uses Log4j v1, I have to try and get the v1 Kafka appender to work, instead of being able to use the default Kafka appender provided with Log4j v2.
  • I can do this in a little demo app running via IntelliJ, using the following library (from build.sbt):

// Old version of Kafka needed for v1 Log4j appender
libraryDependencies += "org.apache.kafka" %% "kafka" % "0.8.2.2"
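
For reference, nothing special is needed in the application code itself; a minimal sketch of how such a demo app logs through the plain Log4j v1 API (the class and message here are illustrative, not the actual app):

import org.apache.log4j.Logger

object LoggingDemo {
  private val log = Logger.getLogger(getClass)

  def main(args: Array[String]): Unit = {
    // routed to the Kafka topic by the appender configured in log4j.properties
    log.info("hello from the demo app")
  }
}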

But I cannot find a way to get this to run via e.g. spark-shell or spark-submit.

I can configure the appender in Spark's log4j.properties using the same settings as in my dummy app.
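
For reference, such an appender section in log4j.properties would look roughly like the sketch below (it assumes the old 0.8.x appender's brokerList/topic properties; the broker address and topic name are placeholders):

# sketch only: assumes the 0.8.x appender's brokerList/topic properties
log4j.rootLogger=INFO, console, KAFKA
log4j.appender.KAFKA=kafka.producer.KafkaLog4jAppender
log4j.appender.KAFKA.brokerList=localhost:9092
log4j.appender.KAFKA.topic=spark-logs
log4j.appender.KAFKA.layout=org.apache.log4j.PatternLayout
log4j.appender.KAFKA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n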

But when the Spark shell starts up, it seems it fires up the logger before it loads any extra JARs, then throws an error immediately because it can't find the Kafka appender:

log4j:ERROR Could not instantiate class [kafka.producer.KafkaLog4jAppender].
java.lang.ClassNotFoundException: kafka.producer.KafkaLog4jAppender

I have tried all kinds of options, in the Spark config files or on the CLI, to get the JARs to load up first, e.g. --jars, --files, --driver-class-path, setting spark.driver.extraClassPath and spark.executor.extraClassPath in spark-defaults.conf, and so on.
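
For example, a couple of the variants tried looked roughly like this (paths and the exact Kafka jar name are placeholders); none of them made the appender class visible at the moment the driver's logger starts up:

# sketch only: jar paths are placeholders
spark-shell --jars /path/to/kafka_2.11-0.8.2.2.jar

spark-shell --driver-class-path /path/to/kafka_2.11-0.8.2.2.jar \
    --conf "spark.executor.extraClassPath=/path/to/kafka_2.11-0.8.2.2.jar"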

Nothing seems to work, so has anybody ever got this to work, i.e. Spark 2.2 logging to Kafka via Log4j, and if so, can they suggest the right config options to allow me to do this?

By the way, there are several similar questions here on SO, but none of them has solved the problem for me, so please don't mark this as a duplicate.

Thanks for any tips you can offer!

Answer

kafka-log4j-appender with Spark

I managed to use spark-submit 2.1.1 in cluster mode with kafka-log4j-appender 2.3.0, but I believe other versions will behave similarly.

First of all, I think it is really helpful to read the logs, so you need to be able to read both the application's YARN logs and the spark-submit output. Sometimes, when the application hung in the ACCEPTED state (because of a Kafka producer misconfiguration), it was necessary to read the logs from the Hadoop YARN application overview.

So whenever I was starting my app, it was very important to grab

19/08/01 10:52:46 INFO yarn.Client: Application report for application_1564028288963_2380 (state: RUNNING)

line, and then download all the logs from YARN once the application completed:

yarn logs -applicationId application_1564028288963_2380

OK, let's give it a try!

Basically, Spark is missing kafka-log4j-appender.

Generally, you should be able to provide kafka-log4j-appender in your fat jar. I had some previous experience with a similar problem where that did not work, simply because in a cluster environment your classpath is overridden by Spark's. So if it does not work for you either, move on.
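
For context, "providing it in your fat jar" would simply mean declaring the dependencies in build.sbt along the lines of the sketch below; as described above, on a cluster this did not help because Spark's own classpath takes precedence:

// sketch: bundle the appender and clients into the assembly
libraryDependencies ++= Seq(
  "org.apache.kafka" % "kafka-log4j-appender" % "2.3.0",
  "org.apache.kafka" % "kafka-clients"        % "2.3.0"
)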

Option A. Download the jars and provide them to spark-submit yourself:

kafka-log4j-appender-2.3.0.jar
kafka-clients-2.3.0.jar

You actually need both, because the appender won't work without the clients.
Place them on the same machine you fire spark-submit from.
The benefit is that you can name them whatever you like.
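
If it helps, one way to fetch the two jars (a sketch, assuming Maven Central's standard directory layout):

curl -O https://repo1.maven.org/maven2/org/apache/kafka/kafka-log4j-appender/2.3.0/kafka-log4j-appender-2.3.0.jar
curl -O https://repo1.maven.org/maven2/org/apache/kafka/kafka-clients/2.3.0/kafka-clients-2.3.0.jar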

Now, for client mode:

JARS='/absolute/path/kafka-log4j-appender-2.3.0.jar,/absolute/path/kafka-clients-2.3.0.jar'
JARS_CLP='/absolute/path/kafka-log4j-appender-2.3.0.jar:/absolute/path/kafka-clients-2.3.0.jar'
JARS_NAMES='kafka-log4j-appender-2.3.0.jar:kafka-clients-2.3.0.jar'

spark-submit \
    --deploy-mode client \
    --jars "$JARS" \
    --conf "spark.driver.extraClassPath=$JARS_CLP" \
    --conf "spark.executor.extraClassPath=$JARS_NAMES" \

Or, for cluster mode (here the driver also runs inside a YARN container, so like the executors it sees the jars by their plain names in its working directory):

spark-submit \
    --deploy-mode cluster \
    --jars "$JARS" \
    --conf "spark.driver.extraClassPath=$JARS_NAMES" \
    --conf "spark.executor.extraClassPath=$JARS_NAMES" \

Option B. Use --packages to download jars from maven:

I think this is more convenient, but you have to get the names exactly right.

You need to look for these kinds of lines during the run:

19/11/15 19:44:08 INFO yarn.Client: Uploading resource file:/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-log4j-appender-2.3.0.jar -> hdfs:///user/atais/.sparkStaging/application_1569430771458_10776/org.apache.kafka_kafka-log4j-appender-2.3.0.jar
19/11/15 19:44:08 INFO yarn.Client: Uploading resource file:/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-clients-2.3.0.jar -> hdfs:///user/atais/.sparkStaging/application_1569430771458_10776/org.apache.kafka_kafka-clients-2.3.0.jar

and note down what the jars are called inside the application_1569430771458_10776 folder on HDFS.

Now, for client mode:

JARS_CLP='/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-log4j-appender-2.3.0.jar:/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-clients-2.3.0.jar'
KAFKA_JARS='org.apache.kafka_kafka-log4j-appender-2.3.0.jar:org.apache.kafka_kafka-clients-2.3.0.jar'

spark-submit \
    --deploy-mode client \
    --packages "org.apache.kafka:kafka-log4j-appender:2.3.0" \
    --conf "spark.driver.extraClassPath=$JARS_CLP" \
    --conf "spark.executor.extraClassPath=$KAFKA_JARS" \

Or, for cluster mode:

spark-submit \
    --deploy-mode cluster \
    --packages "org.apache.kafka:kafka-log4j-appender:2.3.0" \
    --conf "spark.driver.extraClassPath=$KAFKA_JARS" \
    --conf "spark.executor.extraClassPath=$KAFKA_JARS" \


The above should already work.

Extra steps

If you want to provide your own logging properties file, follow my tutorial on that here: https://stackoverflow.com/a/55596389/1549135
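
For what it's worth, the appender definition in such a log4j.properties, when using kafka-log4j-appender 2.3.0, would look roughly like the sketch below. The class name is the one shipped in the org.apache.kafka:kafka-log4j-appender artifact; the broker address and topic are placeholders, and the console appender is assumed to be the one already defined in Spark's default log4j.properties.

# sketch only: broker address and topic are placeholders
log4j.rootLogger=INFO, console, KAFKA
log4j.appender.KAFKA=org.apache.kafka.log4jappender.KafkaLog4jAppender
log4j.appender.KAFKA.brokerList=localhost:9092
log4j.appender.KAFKA.topic=spark-logs
log4j.appender.KAFKA.layout=org.apache.log4j.PatternLayout
log4j.appender.KAFKA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n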
