Spark Streaming application and Kafka log4j appender issue


Problem description

I am testing my Spark Streaming application, and I have multiple functions in my code: some of them operate on a DStream[RDD[XXX]], and some of them on an RDD[XXX] (after I call DStream.foreachRDD).
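
For context, the split looks roughly like the sketch below. This is only an illustration of the structure described above; the element type, function names and bodies are placeholders, not the actual application code.

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.DStream

// Hypothetical function operating on the DStream itself; the transformation
// it declares runs later, on the executors, for every micro-batch.
def enrichStream(stream: DStream[String]): DStream[String] =
  stream.map(record => record.trim.toUpperCase)

// Hypothetical function operating on a single RDD, called from inside
// DStream.foreachRDD for each micro-batch.
def handleBatch(rdd: RDD[String]): Unit =
  rdd.foreach(record => println(s"handling: $record"))

// Wiring the two together:
// enrichStream(inputStream).foreachRDD(rdd => handleBatch(rdd))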

I use the Kafka log4j appender to log business cases that occur within my functions, which operate on both the DStream[RDD] and the RDD itself.
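
The appender itself is wired up in log4j.properties. For reference, a minimal sketch of that kind of configuration, assuming the kafka-log4j-appender 0.9.x artifact; the broker address and topic name are made up, and the property names should be checked against the appender version actually used:

# send everything logged through the "kafkaLogger" logger to Kafka
log4j.logger.kafkaLogger=INFO, KAFKA
log4j.additivity.kafkaLogger=false

log4j.appender.KAFKA=org.apache.kafka.log4jappender.KafkaLog4jAppender
log4j.appender.KAFKA.brokerList=localhost:9092
log4j.appender.KAFKA.topic=business-events
log4j.appender.KAFKA.layout=org.apache.log4j.PatternLayout
log4j.appender.KAFKA.layout.ConversionPattern=%d{ISO8601} %p %c - %m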

But data gets appended to Kafka only from the functions that operate on an RDD; it doesn't work when I want to append data to Kafka from my functions that operate on a DStream.

Does anyone know the reason for this behaviour?

I am working on a single virtual machine, where I have both Spark and Kafka. I submit applications using spark-submit.

EDITED

Actually, I have figured out part of the problem. Data gets appended to Kafka only from the part of the code that is in my main function. All the code that is outside of my main doesn't write data to Kafka.

In main I declared the logger like this:

val kafkaLogger = org.apache.log4j.LogManager.getLogger("kafkaLogger")

While outside of main, I had to declare it like this:

@transient lazy val kafkaLogger = org.apache.log4j.LogManager.getLogger("kafkaLogger")

in order to avoid serialization issues.

The reason might lie behind the JVM serialization concept, or simply be that the workers don't see the log4j configuration file (although my log4j file is in my source code, in the resources folder).
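
For illustration, this is the kind of place the second declaration ends up, e.g. a helper class whose method is called on RDDs from foreachRDD. Only the @transient lazy val line is taken from my code above; the class and method names are made up:

import org.apache.spark.rdd.RDD

class BatchLogger extends Serializable {
  // The Logger itself is not serializable: @transient keeps it out of the
  // serialized closure, and lazy re-creates it in each executor JVM, which
  // then needs a log4j configuration that wires "kafkaLogger" to Kafka.
  @transient lazy val kafkaLogger = org.apache.log4j.LogManager.getLogger("kafkaLogger")

  def logBatch(rdd: RDD[String]): Unit =
    rdd.foreachPartition { records =>
      records.foreach(record => kafkaLogger.info(s"business event: $record"))
    }
}

// e.g. from main: dstream.foreachRDD(rdd => new BatchLogger().logBatch(rdd))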

EDITED 2

I have tried many ways to ship the log4j file to the executors, but none of them worked. I tried:

  • sending the log4j file with the --files option of spark-submit
  • setting --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:/home/vagrant/log4j.properties" in spark-submit
  • setting the log4j.properties file on the --driver-class-path of spark-submit...

None of these options worked.

Does anyone have a solution? I do not see any errors in my error log.

Thanks

Answer

I think you are close. First, you want to make sure all the files are exported to the WORKING DIRECTORY (not the CLASSPATH) on all nodes using the --files flag. Then you want to reference these files in the extraClassPath option of the executors and the driver. I have attached the following command; I hope it helps. The key is to understand that once the files are exported, they can all be accessed on a node using just the file name relative to the working directory (and not a URL or full path).

Note: putting the log4j file in the resources folder will not work (at least it didn't when I tried).

sudo -u hdfs spark-submit --class "SampleAppMain" --master yarn --deploy-mode cluster --verbose \
  --files file:///path/to/custom-log4j.properties,hdfs:///path/to/jar/kafka-log4j-appender-0.9.0.0.jar \
  --conf "spark.driver.extraClassPath=kafka-log4j-appender-0.9.0.0.jar" \
  --conf "spark.executor.extraClassPath=kafka-log4j-appender-0.9.0.0.jar" \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=custom-log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=custom-log4j.properties" \
  /path/to/your/jar/SampleApp-assembly-1.0.jar

