Apache Spark logging to Kafka with Log4j v1 Kafka appender


Question

I've already trawled the web for answers here, and I cannot get anything to work, so maybe somebody has a fresh perspective.

  • I'm trying to write logs to a Kafka topic from inside an Apache Spark 2.2 application.
  • Because Spark still uses Log4j v1, I have to try and get the v1 Kafka appender to work, instead of being able to use the default Kafka appender provided with Log4j v2.
  • I can do this in a little demo app running via IntelliJ, using the following library (from build.sbt):

// Old version of Kafka needed for v1 Log4j appender
libraryDependencies += "org.apache.kafka" %% "kafka" % "0.8.2.2"
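
For context, the demo app doesn't do anything Kafka-specific in code; it just writes through a plain Log4j v1 logger and relies on log4j.properties to attach the Kafka appender. A minimal sketch of such an app (the object and message names are illustrative, not from the original post):

import org.apache.log4j.Logger

object KafkaLoggingDemo {
  def main(args: Array[String]): Unit = {
    // Nothing Kafka-specific here: the Kafka appender is wired up entirely in log4j.properties
    val log = Logger.getLogger(getClass)
    log.info("Hello from the Log4j v1 Kafka appender demo")
  }
}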

But I cannot find a way to get this to run via e.g. spark-shell or spark-submit.

I can configure the appender in Spark's log4j.properties using the same settings as in my dummy app.
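
For reference, the appender part of that configuration looks roughly like the following with the old v1 appender class. The broker address and topic name are placeholders, and the property names are those of the Kafka 0.8.x appender, so they may differ in other versions:

log4j.rootLogger=INFO, console, KAFKA

log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

log4j.appender.KAFKA=kafka.producer.KafkaLog4jAppender
log4j.appender.KAFKA.brokerList=localhost:9092
log4j.appender.KAFKA.topic=spark-logs
log4j.appender.KAFKA.layout=org.apache.log4j.PatternLayout
log4j.appender.KAFKA.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1} - %m%n

# Keep the Kafka client's own loggers away from the KAFKA appender to avoid a feedback loop
log4j.logger.kafka=WARN, console
log4j.additivity.kafka=false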

But when the Spark shell starts up, it seems to fire up the logger before it loads any extra JARs, and then immediately throws an error because it can't find the Kafka appender:

log4j:ERROR Could not instantiate class [kafka.producer.KafkaLog4jAppender]. java.lang.ClassNotFoundException: kafka.producer.KafkaLog4jAppender

I have tried all kinds of options, in the Spark config files or on the CLI, to get the JARs loaded first: --jars, --files, --driver-class-path, setting spark.driver.extraClassPath and spark.executor.extraClassPath in spark-defaults.conf, and so on.

Nothing seems to work. So, has anybody ever got this working, i.e. Spark 2.2 logging to Kafka via Log4j, and if so, can they suggest the right config options to allow me to do this?

By the way, there are several similar questions here on SO, but none of them has solved the problem for me, so please don't mark this as a duplicate.

Thanks for any tips you can offer!

Answer

kafka-log4j-appender with Spark

I managed to use spark-submit 2.1.1 in cluster mode with kafka-log4j-appender 2.3.0, but I believe other versions will behave similarly.

First of all, I think it is really helpful to read the logs, so you need to be able to read both the application's YARN logs and the spark-submit output. Sometimes, when the application hung in the ACCEPTED state (because of a Kafka producer misconfiguration), it was necessary to read the logs from the Hadoop YARN application overview.

So whenever I started my app, it was very important to grab the

19/08/01 10:52:46 INFO yarn.Client: Application report for application_1564028288963_2380 (state: RUNNING)

line, and to download all of the logs from YARN once the application completed:

yarn logs -applicationId application_1564028288963_2380

Ok, let's try it!

Basically, Spark is missing kafka-log4j-appender.

Generally, you should be able to provide kafka-log4j-appender in your fat jar. I had some previous experience with a similar problem where that did not work, simply because in a cluster environment your classpath is overridden by Spark. So if it does not work for you either, move on.

Option A. Download the jars and provide them yourself:

kafka-log4j-appender-2.3.0.jar
kafka-clients-2.3.0.jar

You actually need both, because the appender won't work without the clients.
Place them on the same machine you launch spark-submit from.
The benefit is that you can name them however you like.

Now, for client mode:

# Comma-separated list for --jars (absolute paths on the submitting machine)
JARS='/absolute/path/kafka-log4j-appender-2.3.0.jar,/absolute/path/kafka-clients-2.3.0.jar'
# Colon-separated classpath for the driver, which runs on the submitting machine in client mode
JARS_CLP='/absolute/path/kafka-log4j-appender-2.3.0.jar:/absolute/path/kafka-clients-2.3.0.jar'
# Bare file names for the executors; YARN localizes the --jars files into each container's working directory
JARS_NAMES='kafka-log4j-appender-2.3.0.jar:kafka-clients-2.3.0.jar'

spark-submit \
    --deploy-mode client \
    --jars "$JARS" \
    --conf "spark.driver.extraClassPath=$JARS_CLP" \
    --conf "spark.executor.extraClassPath=$JARS_NAMES" \

or, for cluster mode:

spark-submit \
    --deploy-mode cluster \
    --jars "$JARS" \
    --conf "spark.driver.extraClassPath=$JARS_NAMES" \
    --conf "spark.executor.extraClassPath=$JARS_NAMES" \

Option B. Use --packages to download jars from maven:

I think this is more convenient, but then you have to get the names exactly right.

You need to look for these kinds of lines during the run:

19/11/15 19:44:08 INFO yarn.Client: Uploading resource file:/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-log4j-appender-2.3.0.jar -> hdfs:///user/atais/.sparkStaging/application_1569430771458_10776/org.apache.kafka_kafka-log4j-appender-2.3.0.jar
19/11/15 19:44:08 INFO yarn.Client: Uploading resource file:/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-clients-2.3.0.jar -> hdfs:///user/atais/.sparkStaging/application_1569430771458_10776/org.apache.kafka_kafka-clients-2.3.0.jar

and note down what the jars are called inside the application_1569430771458_10776 folder on HDFS.

Now, for client mode:

# Driver classpath (client mode): the jars as downloaded into the local ivy cache by --packages
JARS_CLP='/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-log4j-appender-2.3.0.jar:/srv/cortb/home/atais/.ivy2/jars/org.apache.kafka_kafka-clients-2.3.0.jar'
# Executor classpath: the names the staged jars get inside the YARN containers
KAFKA_JARS='org.apache.kafka_kafka-log4j-appender-2.3.0.jar:org.apache.kafka_kafka-clients-2.3.0.jar'

spark-submit \
    --deploy-mode client \
    --packages "org.apache.kafka:kafka-log4j-appender:2.3.0" \
    --conf "spark.driver.extraClassPath=$JARS_CLP" \
    --conf "spark.executor.extraClassPath=$KAFKA_JARS" \

or, for cluster mode:

spark-submit \
    --deploy-mode cluster \
    --packages "org.apache.kafka:kafka-log4j-appender:2.3.0" \
    --conf "spark.driver.extraClassPath=$KAFKA_JARS" \
    --conf "spark.executor.extraClassPath=$KAFKA_JARS" \


The above should work already
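
If you want to double-check that the log lines actually reach the topic, the console consumer that ships with Kafka is the quickest test (broker address and topic name are placeholders):

kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic spark-logs --from-beginning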

Extra steps

If you want to provide your own log4j.properties, follow my tutorial on that here: https://stackoverflow.com/a/55596389/1549135
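
In short, that approach ships a custom log4j.properties with the job and points the driver and executors at it. A minimal sketch, with the path as a placeholder (the linked answer has the full details):

spark-submit \
    --deploy-mode cluster \
    --files "/absolute/path/log4j.properties" \
    --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
    --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \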
