Apache Spark: Yarn logs Analysis

Question

I have a Spark Streaming application, and I want to analyse the job's logs using Elasticsearch-Kibana. The job runs on a YARN cluster, so the logs are written to HDFS because I have set yarn.log-aggregation-enable to true. But when I try to do this:

hadoop fs -cat ${yarn.nodemanager.remote-app-log-dir}/${user.name}/logs/<application ID>

I see some encrypted/compressed data. What file format is this? How can I read the logs from this file? Can I use Logstash to read it?
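For reference, log aggregation on my cluster is driven by the usual yarn-site.xml properties; a sketch (the remote-app-log-dir value shown here is the common default, not necessarily what your cluster uses):

<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.nodemanager.remote-app-log-dir</name>
  <value>/tmp/logs</value>
</property>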

Also, if there is a better approach to analyse Spark logs, I am open to your suggestions.

Thanks.

Answer

The format is called a TFile, and it is a compressed file format.

Yarn, however, chooses to write the application logs into a TFile!! For those of you who don't know what a TFile is (and I bet a lot of you don't), you can learn more about it here, but for now this basic definition should suffice: "A TFile is a container of key-value pairs. Both keys and values are type-less bytes".

(Source: "Splunk/Hadoop Rant")
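Rather than decoding the TFile yourself, the YARN CLI will dump the aggregated logs back out as plain text, which you could then feed to Logstash (for example via its file input); a sketch, with the application ID left as a placeholder:

yarn logs -applicationId <application ID> > app-logs.txt

This reads the same aggregated files under yarn.nodemanager.remote-app-log-dir that hadoop fs -cat shows as binary, but decodes the TFile container for you.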

There may be a way to edit YARN's and Spark's log4j.properties to send messages to Logstash using a SocketAppender.
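A minimal sketch of what that log4j.properties change might look like (the host and port here are assumptions and must match whatever listener Logstash exposes):

# Route root logger events to the (hypothetical) logstash appender below
log4j.rootLogger=INFO, logstash

log4j.appender.logstash=org.apache.log4j.net.SocketAppender
log4j.appender.logstash.RemoteHost=logstash.example.com
log4j.appender.logstash.Port=4560
log4j.appender.logstash.ReconnectionDelay=10000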

However, that method is deprecated.
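For completeness, the Logstash side of that (now deprecated) setup relied on the log4j input plugin to deserialize the SocketAppender's events; a sketch, assuming the same port as above and a local Elasticsearch:

input {
  log4j {
    port => 4560
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
  }
}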
