Best way to send apache-spark logging to redis/logstash on an Amazon EMR cluster


Problem description


I spark-submit jobs on an Amazon EMR cluster. I'd like all spark logging to be sent to redis/logstash. What is the proper way to configure spark under EMR to do this?


  • Keep log4j: Add a bootstrap action to modify /home/hadoop/spark/conf/log4j.properties to add an appender (a rough log4j.properties sketch follows this list)? However, this file already contains a lot of stuff and is a symlink to a hadoop conf file. I don't want to fiddle too much with that as it already contains some rootLoggers. Which appender would do best? ryantenney/log4j-redis-appender + logstash/log4j-jsonevent-layout OR pavlobaron/log4j2redis ?


  • Migrate to slf4j+logback: Exclude slf4j-log4j12 from spark-core, add log4j-over-slf4j ... and use a logback.xml with a com.cwbase.logback.RedisAppender (a rough logback.xml sketch appears further down)? Looks like this will be problematic with dependencies. Will it hide the log4j.rootLoggers already defined in log4j.properties?
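For the first option, here is a minimal sketch of what a bootstrap action could append to /home/hadoop/spark/conf/log4j.properties, assuming ryantenney/log4j-redis-appender plus logstash/log4j-jsonevent-layout. The class and property names below are my reading of those projects' READMEs and should be double-checked; the Redis host is a placeholder:

    # Sketch only: lines appended to the existing EMR log4j.properties, not a replacement for it.
    # Appender class and property names are assumptions taken from the two projects' documentation.
    log4j.appender.redis=com.ryantenney.log4j.RedisAppender
    log4j.appender.redis.host=my-redis-host.example.com
    log4j.appender.redis.port=6379
    log4j.appender.redis.key=logstash
    log4j.appender.redis.layout=net.logstash.log4j.JSONEventLayoutV1

    # The existing rootLogger line must be extended rather than redefined, e.g.
    #   log4j.rootLogger=INFO, console
    # becomes
    #   log4j.rootLogger=INFO, console, redis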

Is there anything else I'm missing?

What do you think about this?
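For reference, the logback route in the second option would come down to a logback.xml roughly like the sketch below, using the com.cwbase.logback.RedisAppender mentioned above. The host/port/key property names are what that appender's documentation uses as far as I recall, so treat them as assumptions:

    <!-- logback.xml sketch; host and key are placeholders, property names assumed -->
    <configuration>
      <appender name="REDIS" class="com.cwbase.logback.RedisAppender">
        <host>my-redis-host.example.com</host>
        <port>6379</port>
        <key>logstash</key>
        <source>spark-job</source>
        <type>spark</type>
      </appender>
      <root level="INFO">
        <appender-ref ref="REDIS" />
      </root>
    </configuration>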

UPDATE


Looks like I can't get the second option to work. Running tests is just fine, but using spark-submit (with --conf spark.driver.userClassPathFirst=true) always ends up with the dreaded "Detected both log4j-over-slf4j.jar AND slf4j-log4j12.jar on the class path, preempting StackOverflowError."
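That SLF4J error is expected whenever both bridge jars are visible at the same time: slf4j-log4j12 routes SLF4J calls into log4j, while log4j-over-slf4j routes log4j calls back into SLF4J, so having both on the class path would loop forever. The exclusion half of the second option would look roughly like this in an sbt build (assuming sbt; version numbers are placeholders), though the Spark assembly shipped on the cluster typically bundles its own slf4j-log4j12, which is probably why spark-submit still sees both jars even with userClassPathFirst:

    // build.sbt sketch (assumes an sbt build; version numbers are placeholders)
    libraryDependencies ++= Seq(
      ("org.apache.spark" %% "spark-core" % "1.3.1" % "provided")
        .exclude("org.slf4j", "slf4j-log4j12"),
      "org.slf4j"      % "log4j-over-slf4j" % "1.7.12",
      "ch.qos.logback" % "logback-classic"  % "1.1.3"
    )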

Answer


I would set up an extra daemon for that on the cluster.
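One way to read that: leave Spark's default file logging alone and run a shipper such as Logstash as a daemon on the cluster nodes, tailing the log files and pushing them into Redis. A minimal Logstash sketch of the idea, where the log path and Redis host are placeholders that depend on the EMR release and deploy mode:

    # logstash.conf sketch: tail Spark/YARN container logs and push them to Redis.
    # The path is an assumption; point it at wherever your EMR release writes Spark logs.
    input {
      file {
        path => "/var/log/hadoop-yarn/containers/*/*/*"
        type => "spark"
      }
    }
    output {
      redis {
        host      => "my-redis-host.example.com"
        data_type => "list"
        key       => "logstash"
      }
    }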
