HDP 2.4: How to collect Hadoop MapReduce logs into one file using Flume, and what is the best practice


Problem description




We are using HDP 2.4 and have many MapReduce jobs written in various ways (Java MR, Hive, etc.). The logs are collected in the Hadoop file system under the application ID. I want to collect all the logs of an application and append them into a single file (on HDFS or the OS file system of one machine) so that I can analyze my application logs in a single location without hassle. Please also advise the best way to achieve this in HDP 2.4 (stack version info: HDFS 2.7.1.2.4 / YARN 2.7.1.2.4 / MapReduce2 2.7.1.2.4 / Log Search 0.5.0 / Flume 1.5.2.2.4).

Solution

Flume cannot collect the logs after they are already on HDFS.

In order to do this, you need a Flume agent running on all NodeManagers, pointed at the configured yarn.log.dir, and some way to parse the application/container/attempt/file information out of the local OS file path.
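As a rough sketch of the path-parsing step, the snippet below pulls the application, container, attempt, and file name out of a NodeManager local log path with a regular expression. The directory layout and example path are assumptions based on the default `${yarn.nodemanager.log-dirs}` structure, not something stated in the original answer.

```python
import re

# Assumed layout of a NodeManager local log path:
#   <log-dir>/application_<ts>_<seq>/container_<ts>_<seq>_<attempt>_<cid>/<file>
LOG_PATH_RE = re.compile(
    r"/(?P<app>application_\d+_\d+)"
    r"/(?P<container>container_\d+_\d+_(?P<attempt>\d+)_\d+)"
    r"/(?P<file>[^/]+)$"
)

def parse_log_path(path):
    """Extract application/container/attempt/file info from a local log path."""
    m = LOG_PATH_RE.search(path)
    return m.groupdict() if m else None

# Example with a hypothetical log directory and application ID.
info = parse_log_path(
    "/hadoop/yarn/log/application_1466000000000_0001"
    "/container_1466000000000_0001_01_000002/syslog"
)
# info["app"]     == "application_1466000000000_0001"
# info["attempt"] == "01"
# info["file"]    == "syslog"
```

A Flume interceptor (or a wrapper script feeding a Spooling Directory source) could use fields like these to tag events before they are shipped to an HDFS sink.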

I'm not sure how well collecting into a "single file" would work, as each container generates at least 5 files of different information, but YARN log aggregation already does this. The result is just not in a readable file format in HDFS unless you are using Splunk/Hunk, as far as I know.
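For reference, the logs that YARN log aggregation already stores can be read back as one stream with the yarn CLI; the application ID and output file below are placeholders:

```
# Fetch all aggregated container logs for one application as a single stream
# and write them to one local file (application ID is a placeholder).
yarn logs -applicationId application_1466000000000_0001 > app_logs.txt
```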

Alternative solutions include indexing these files into an actual search service such as Solr or Elasticsearch, which I would recommend over HDFS for storing and searching logs.
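As a minimal sketch of the indexing route, the function below turns raw log lines into an Elasticsearch `_bulk` API body (NDJSON). The index name, field layout, and endpoint are illustrative assumptions, not part of the original answer.

```python
import json

def bulk_payload(index, log_lines):
    """Build an Elasticsearch _bulk API body (NDJSON) from raw log lines.

    Each line becomes one document; index name and field names are
    illustrative assumptions.
    """
    actions = []
    for i, line in enumerate(log_lines):
        actions.append(json.dumps({"index": {"_index": index, "_id": i}}))
        actions.append(json.dumps({"message": line}))
    # The bulk body must end with a trailing newline.
    return "\n".join(actions) + "\n"

# Example: two log lines from a hypothetical container log.
payload = bulk_payload("yarn-logs", [
    "2016-06-15 10:00:01 INFO  MapTask: Starting map task",
    "2016-06-15 10:00:02 ERROR ReduceTask: Shuffle failed",
])
# POST this payload to http://<es-host>:9200/_bulk with
# Content-Type: application/x-ndjson (endpoint is an assumption).
```

From there, Kibana or a plain query API gives the single-location search the question asks for, without stitching files together by hand.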
