Hadoop - Help required to understand the processing steps

Problem description

I have a compressed file that contains 8 XML files of 5-10 KB each, which I am using for testing. I wrote a map-only program, using the MR2 API, to decompress it, and I run it on Hadoop 2.7.1 in pseudo-distributed mode. I start the cluster with the sbin/start-dfs.sh command. The decompressed output appears in the file system within a few seconds, but the job keeps running for another 5-6 minutes. I don't know why.

By this stage the MR program has already decompressed the files, and I can view and download them.

I can't understand what processing my MapReduce program is doing at this point. My code uses the MR2 API, so why does it use the MR1 API (mapred) here? The situation gets worse with 128 MB of zipped files: they are decompressed within 5-10 minutes, and for the rest of the time the job is busy with other tasks.

The performance I am getting is unacceptable, and I need to understand what processing Hadoop is doing in the second screenshot.

Please help me understand whether this is an installation issue, a problem with my program, or something else.

Recommended answer

This was a configuration issue, and I resolved it with the following change to the mapred-site.xml file:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
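
Some context on why this setting matters, offered as a hedged sketch based on the standard Hadoop 2.x single-node setup rather than on the poster's actual files: when mapreduce.framework.name is not set, Hadoop falls back to the default value local, and the job runs in-process through org.apache.hadoop.mapred.LocalJobRunner. That would plausibly explain why classes from the older mapred package show up in the log even though the program uses the MR2 API. Setting the value to yarn submits the job to YARN instead, which assumes the YARN daemons (ResourceManager and NodeManager) are running, for example started with sbin/start-yarn.sh in addition to the sbin/start-dfs.sh mentioned in the question. The pseudo-distributed setup guide also pairs the mapred-site.xml change with the shuffle service in yarn-site.xml, shown below; this companion file is an assumption and is not part of the original answer.

```xml
<!-- etc/hadoop/yarn-site.xml -->
<!-- Assumed companion setting from the Hadoop 2.x single-node setup guide; not shown in the original answer -->
<configuration>
  <property>
    <!-- Auxiliary service the NodeManager runs so MapReduce jobs on YARN can shuffle map output -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
```

A map-only job like the one in the question does not shuffle, but keeping this entry matches the documented setup and avoids problems once a reducer is added.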
