hadoop默认只启动本地作业,为什么呢? [英] hadoop only launch local job by default why?

查看:128
本文介绍了hadoop默认只启动本地作业,为什么呢?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我已经编写了自己的hadoop程序,并且可以在自己的笔记本电脑中使用伪分发模式运行,但是,当我将该程序放在可以运行hadoop示例jar的群集中时,默认情况下,它会启动本地作业指示hdfs文件路径,下面是输出,给出建议?

I have written my own hadoop program and I can run using pseudo distribute mode in my own laptop, however, when I put the program in the cluster which can run example jar of hadoop, it by default launches the local job though I indicate the hdfs file path, below is the output, give suggestions?

./hadoop -jar MyRandomForest_oob_distance.jar  hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
12/03/16 16:21:25 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
12/03/16 16:21:25 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
12/03/16 16:21:25 INFO mapred.JobClient: Running job: job_local_0001
12/03/16 16:21:25 INFO mapred.MapTask: io.sort.mb = 100
12/03/16 16:21:25 INFO mapred.MapTask: data buffer = 79691776/99614720
12/03/16 16:21:25 INFO mapred.MapTask: record buffer = 262144/327680
12/03/16 16:21:25 WARN mapred.LocalJobRunner: job_local_0001
java.io.FileNotFoundException: File /user/randomforest/input/genotype1.txt does not exist.
    at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:361)
    at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:245)
    at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:125)
    at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:283)
    at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:356)
    at Data.Data.loadData(Data.java:103)
    at MapReduce.DearMapper.loadData(DearMapper.java:261)
    at MapReduce.DearMapper.setup(DearMapper.java:332)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:142)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:621)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:177)
12/03/16 16:21:26 INFO mapred.JobClient:  map 0% reduce 0%
12/03/16 16:21:26 INFO mapred.JobClient: Job complete: job_local_0001
12/03/16 16:21:26 INFO mapred.JobClient: Counters: 0
Total Running time is: 1 secs

推荐答案

已选择LocalJobRunner作为您的配置,很有可能将mapred.job.tracker属性设置为local或根本没有设置(在这种情况下,默认设置是是本地的).要进行检查,请转到无论您在何处提取/安装hadoop"/etc/hadoop/,并查看是否存在mapred-site.xml文件(对我而言不存在,那里有一个名为mapping-site.xml.template的文件).在该文件中(如果不存在,请创建该文件),确保其具有以下属性:

LocalJobRunner has been chosen as your configuration most probably has the mapred.job.tracker property set to local or has not been set at all (in which case the default is local). To check, go to "wherever you extracted/installed hadoop"/etc/hadoop/ and see if the file mapred-site.xml exists (for me it did not, a file called mapped-site.xml.template was there). In that file (or create it if it doesn't exist) make sure it has the following property:

<configuration>
<property>  
 <name>mapreduce.framework.name</name>  
 <value>yarn</value>  
 </property>
</configuration>

  • 请参阅org.apache.hadoop.mapred.JobClient.init(JobConf)
  • 的来源

    • See the source for org.apache.hadoop.mapred.JobClient.init(JobConf)
    • 您要从中提交此信息的计算机上的hadoop配置中,此配置属性的值是什么?还要确认您正在运行的hadoop可执行文件引用了此配置(并且您没有2个以上的安装配置有所不同)-键入which hadoop并跟踪遇到的所有符号链接.

      What is the value of this configuration property in the hadoop configuration on the machine you are submitting this from? Also confirm that the hadoop executable you are running references this configuration (and that you don't have 2+ installations configured differently) - type which hadoop and trace any symlinks you come across.

      或者,如果您知道使用-jt选项的JobTracker主机和端口号,则可以在提交作业时覆盖它:

      Alternatively you can override this when you submit your job, if you know the JobTracker host and port number using the -jt option:

      hadoop jar MyRandomForest_oob_distance.jar -jt hostname:port hdfs://montana-01:8020/user/randomforest/input/genotype1.txt hdfs://montana-01:8020/user/randomforest/input/phenotype1.txt hdfs://montana-01:8020/user/randomforest/output1_distance/ hdfs://montana-01:8020/user/randomforest/input/genotype101.txt hdfs://montana-01:8020/user/randomforest/input/phenotype101.txt 33 500 1
      

      这篇关于hadoop默认只启动本地作业,为什么呢?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆