Logging the output of a map reduce job to a text file

Problem description

I've been using the JobClient.monitorAndPrintJob() method to print the output of a map reduce job to the console. My usage is something like this:

job_client.monitorAndPrintJob(job_conf, job_client.getJob(j.getAssignedJobID()))

The output of which is as follows (printed on the console):

13/03/04 07:20:00 INFO mapred.JobClient: Running job: job_201302211725_10139
13/03/04 07:20:01 INFO mapred.JobClient:  map 0% reduce 0%
13/03/04 07:20:08 INFO mapred.JobClient:  map 100% reduce 0%
13/03/04 07:20:13 INFO mapred.JobClient:  map 100% reduce 100%
13/03/04 07:20:13 INFO mapred.JobClient: Job complete: job_201302211725_10139
13/03/04 07:20:13 INFO mapred.JobClient: Counters: 26
13/03/04 07:20:13 INFO mapred.JobClient:   Job Counters
13/03/04 07:20:13 INFO mapred.JobClient:     Launched reduce tasks=1
13/03/04 07:20:13 INFO mapred.JobClient:     Aggregate execution time of mappers(ms)=5539
13/03/04 07:20:13 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
13/03/04 07:20:13 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
13/03/04 07:20:13 INFO mapred.JobClient:     Launched map tasks=2
13/03/04 07:20:13 INFO mapred.JobClient:     Data-local map tasks=2
13/03/04 07:20:13 INFO mapred.JobClient:     Aggregate execution time of reducers(ms)=4337
13/03/04 07:20:13 INFO mapred.JobClient:   FileSystemCounters
13/03/04 07:20:13 INFO mapred.JobClient:     MAPRFS_BYTES_READ=583
13/03/04 07:20:13 INFO mapred.JobClient:     MAPRFS_BYTES_WRITTEN=394
13/03/04 07:20:13 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=140219
13/03/04 07:20:13 INFO mapred.JobClient:   Map-Reduce Framework
13/03/04 07:20:13 INFO mapred.JobClient:     Map input records=6
13/03/04 07:20:13 INFO mapred.JobClient:     Reduce shuffle bytes=136
13/03/04 07:20:13 INFO mapred.JobClient:     Spilled Records=22
13/03/04 07:20:13 INFO mapred.JobClient:     Map output bytes=116
13/03/04 07:20:13 INFO mapred.JobClient:     CPU_MILLISECONDS=1320
13/03/04 07:20:13 INFO mapred.JobClient:     Map input bytes=64
13/03/04 07:20:13 INFO mapred.JobClient:     Combine input records=13
13/03/04 07:20:13 INFO mapred.JobClient:     SPLIT_RAW_BYTES=180
13/03/04 07:20:13 INFO mapred.JobClient:     Reduce input records=11
13/03/04 07:20:13 INFO mapred.JobClient:     Reduce input groups=11
13/03/04 07:20:13 INFO mapred.JobClient:     Combine output records=11
13/03/04 07:20:13 INFO mapred.JobClient:     PHYSICAL_MEMORY_BYTES=734961664
13/03/04 07:20:13 INFO mapred.JobClient:     Reduce output records=11
13/03/04 07:20:13 INFO mapred.JobClient:     VIRTUAL_MEMORY_BYTES=9751805952
13/03/04 07:20:13 INFO mapred.JobClient:     Map output records=13
13/03/04 07:20:13 INFO mapred.JobClient:     GC time elapsed (ms)=0

I would like the above output/log to be written to a text file rather than the console. Any suggestions?

Solution

In your HADOOP_HOME/conf directory you should find a file named log4j.properties. I believe you can configure where and how to log in there.

To be precise, you should use a rolling file appender, so uncomment (just remove the leading #) the following lines in the log4j.properties file:

# Rolling File Appender
#

#log4j.appender.RFA=org.apache.log4j.RollingFileAppender
#log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}

# Logfile size and 30-day backups
#log4j.appender.RFA.MaxFileSize=1MB
#log4j.appender.RFA.MaxBackupIndex=30

#log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
#log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n

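And customize the other parameters to your liking. Note that the two ConversionPattern lines are alternatives: if both are uncommented, the later one takes effect, so keep only the format you want.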
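Defining the appender is only half of the job in log4j: an appender receives nothing until some logger references it. As a rough sketch of what the edited section might look like, assuming your log4j.properties routes output through log4j.rootLogger (stock Hadoop files usually do this indirectly through a hadoop.root.logger property, and MapR or other distributions may differ):

```properties
# Sketch only, not a drop-in file.
# Assumption: sending the root logger to RFA instead of the console
# appender is acceptable for every logger in this JVM.
log4j.rootLogger=INFO, RFA

log4j.appender.RFA=org.apache.log4j.RollingFileAppender
# Assumption: hadoop.log.dir and hadoop.log.file are defined earlier in
# the same file or passed as -D system properties by the launch scripts.
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}

# Roll over at 1MB and keep up to 30 old files
log4j.appender.RFA.MaxFileSize=1MB
log4j.appender.RFA.MaxBackupIndex=30

log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
```

If only the JobClient progress and counter lines should go to the file, a narrower variant would be to leave the root logger alone and attach RFA to that one logger with log4j.logger.org.apache.hadoop.mapred.JobClient=INFO, RFA. Those lines would then still reach the console as well, unless additivity is switched off for that logger.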

For more about log4j configuration, see the log4j documentation.
