Hadoop HPROF profiling no CPU SAMPLES written


Question


I want to use HPROF to profile my Hadoop job. The problem is that I get TRACES but there is no CPU SAMPLES in the profile.out file. The code that I am using inside my run method is:

    /** Get configuration */
    Configuration conf = getConf();
    conf.set("textinputformat.record.delimiter","\n\n");
    conf.setStrings("args", args);

    /** JVM PROFILING */
    conf.setBoolean("mapreduce.task.profile", true);
    conf.set("mapreduce.task.profile.params", "-agentlib:hprof=cpu=samples," +
       "heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s");
    conf.set("mapreduce.task.profile.maps", "0-2");
    conf.set("mapreduce.task.profile.reduces", "");

    /** Job configuration */
    Job job = new Job(conf, "HadoopSearch");
    job.setJarByClass(Search.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(NullWritable.class);

    /** Set Mapper and Reducer, use identity reducer*/
    job.setMapperClass(Map.class);
    job.setReducerClass(Reducer.class);

    /** Set input and output formats */
    job.setInputFormatClass(TextInputFormat.class);
    job.setOutputFormatClass(TextOutputFormat.class);

    /** Set input and output path */
    FileInputFormat.addInputPath(job, new Path("/user/niko/16M"));  
    FileOutputFormat.setOutputPath(job, new Path(cmd.getOptionValue("output")));

    job.waitForCompletion(true);

    return 0;

How do I get the CPU SAMPLES to be written in the output?
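For reference, when HPROF shuts down cleanly the end of the profile contains a CPU SAMPLES section shaped roughly like the following (an illustrative excerpt in the HPROF output format, not output from this job):

```
CPU SAMPLES BEGIN (total = 126) Fri Oct 22 12:12:14 2004
rank   self  accum   count trace method
   1 53.17% 53.17%      67 300027 java.util.zip.ZipFile.getEntry
   2 17.46% 70.63%      22 300135 java.util.zip.ZipFile.getNextEntry
CPU SAMPLES END
```

If the JVM is killed before HPROF's shutdown hook runs, this section is never written, which matches the symptom described above.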

I also have a strange error message on stderr, but I think it is not related, since it is also present when profiling is set to false or the profiling code is commented out. The error is:

 log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.impl.MetricsSystemImpl).
 log4j:WARN Please initialize the log4j system properly.
 log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
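As an aside, this warning only means that no log4j appender is configured on the task classpath; a minimal log4j.properties such as the following (a generic sketch, unrelated to the missing CPU SAMPLES) silences it:

```properties
# Minimal log4j 1.2 configuration: log INFO and above to the console
log4j.rootLogger=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{ISO8601} %-5p %c{1} - %m%n
```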

Solution

YARN (or MRv1) is killing the container just after your job finishes, so the CPU samples can't be written to the profiling file. In fact, your traces may be truncated as well.

You have to add the following options (or their equivalents for your Hadoop version):

yarn.nodemanager.sleep-delay-before-sigkill.ms = 30000
# No. of ms to wait between sending a SIGTERM and SIGKILL to a container

yarn.nodemanager.process-kill-wait.ms = 30000
# Max time to wait for a process to come up when trying to cleanup a container

mapreduce.tasktracker.tasks.sleeptimebeforesigkill = 30000
# Presumably the MRv1 equivalent

(30 seconds seems to be enough.)
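The yarn.nodemanager.* properties above are NodeManager settings, so they belong in yarn-site.xml on the cluster nodes rather than in the job's Configuration object. A sketch of the change (restart the NodeManagers after editing):

```xml
<!-- yarn-site.xml: give containers 30 s between SIGTERM and SIGKILL -->
<property>
  <name>yarn.nodemanager.sleep-delay-before-sigkill.ms</name>
  <value>30000</value>
</property>
<property>
  <name>yarn.nodemanager.process-kill-wait.ms</name>
  <value>30000</value>
</property>
```

With the grace period in place, HPROF's shutdown hook has time to flush the CPU SAMPLES section before the container is killed.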
