hadoop streaming: where are application logs?

Problem Description

My question is similar to: hadoop streaming: how to see application logs? (The link in that answer is not currently working, so I have to post it again with an additional question.)

I can see all the Hadoop logs under my /usr/local/hadoop/logs path,

but where can I see the application-level logs? For example:

reducer.py -

import logging
....
# basicConfig with no stream argument sends records to stderr by default.
logging.basicConfig(level=logging.ERROR, format='MAP %(asctime)s %(levelname)s %(message)s')
logging.error('Test!')
...

I am not able to see any of the logs (WARNING, ERROR) in stderr.

Where can I find the log statements of my application? I am using Python with Hadoop streaming.

Additional question:

If I want to use a file to store/aggregate my application logs, like:

reducer.py -

....
import logging
import os

# Append log records to a file in the task user's home directory on
# whichever node this reducer instance runs on.
logger = logging.getLogger('test')
hdlr = logging.FileHandler(os.environ['HOME'] + '/test.log')
formatter = logging.Formatter('MAP %(asctime)s %(levelname)s %(message)s')
hdlr.setFormatter(formatter)
logger.addHandler(hdlr)
logger.setLevel(logging.ERROR)
logger.error('please work!!')
.....

(Assuming that test.log exists in the $HOME location on the master and on all slaves in my Hadoop cluster.) Can I achieve this in a distributed environment like Hadoop? If so, how can I achieve it?

I tried this and ran a sample streaming job, only to see the error below:

Error: java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:330)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:543)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:134)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:237)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:484)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:397)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:175)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:170)

Container killed by the ApplicationMaster.
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143

Please help me understand how logging can be achieved in Hadoop streaming jobs.

Thanks

Recommended Answer

Try this HDFS path: /yarn/apps/${user_name}/logs/application_${appid}/

In general:

Where to store container logs. An application's localized log directory will be found in ${yarn.nodemanager.log-dirs}/application_${appid}. Individual containers' log directories will be below this, in directories named container_{$contid}. Each container directory will contain the files stderr, stdin, and syslog generated by that container.
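
As an aside (this describes a cluster setting, not something stated in the original answer): if log aggregation is enabled via yarn.log-aggregation-enable, you can also retrieve all container logs for a finished application from the command line with yarn logs -applicationId <application id>.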

If you print to stderr, you'll find it in the files under the directory mentioned above. There should be one file per node.
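
To illustrate, here is a minimal sketch of a streaming reducer that logs to stderr; the REDUCE tag and the line-counting logic are illustrative assumptions, not code from the original post:

import logging
import sys

# Sketch only: send log records to stderr so YARN captures them in the
# container's stderr file. stdout is reserved for the job's key/value
# output, so never log there.
logging.basicConfig(stream=sys.stderr, level=logging.INFO,
                    format='REDUCE %(asctime)s %(levelname)s %(message)s')

logging.info('reducer started')

count = 0
for line in sys.stdin:
    count += 1
    sys.stdout.write(line)  # pass records through as the actual output

logging.info('reducer processed %d lines', count)

Once the job runs, these REDUCE lines should appear in each reduce container's stderr file under the directory described above.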
