Run a MapReduce job via REST API


Problem description

I am using Hadoop 2.7.1's REST APIs to run a MapReduce job from outside the cluster. This example "http://hadoop-forum.org/forum/general-hadoop-discussion/miscellaneous/2136-how-can-i-run-mapreduce-job-by-rest-api" really helped me, but when I submit a POST request, some strange things happen:

  1. When I look at "http://master:8088/cluster/apps", a single POST request produces two jobs, as in the screenshot (caption: one request produces two jobs).

  2. After waiting a long time, the job I defined in the HTTP request body fails with a FileAlreadyExistsException, because the other job has already created the output directory: Output directory hdfs://master:9000/output/output16 already exists.

Here is my request body:

{
    "application-id": "application_1445825741228_0011",
    "application-name": "wordcount-demo",
    "am-container-spec": {
        "commands": {
            "command": "{{HADOOP_HOME}}/bin/hadoop jar /home/hadoop/hadoop-2.7.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /data/ /output/output16"
        },
        "environment": {
            "entry": [{
                "key": "CLASSPATH",
                "value": "{{CLASSPATH}}<CPS>./*<CPS>{{HADOOP_CONF_DIR}}<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/*<CPS>{{HADOOP_COMMON_HOME}}/share/hadoop/common/lib/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/*<CPS>{{HADOOP_HDFS_HOME}}/share/hadoop/hdfs/lib/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/*<CPS>{{HADOOP_YARN_HOME}}/share/hadoop/yarn/lib/*<CPS>./log4j.properties"
            }]
        }
    },
    "unmanaged-AM": false,
    "max-app-attempts": 2,
    "resource": {
        "memory": 1024,
        "vCores": 1
    },
    "application-type": "MAPREDUCE",
    "keep-containers-across-application-attempts": false
}

Here is my command:

curl -i -X POST -H 'Accept: application/json' -H 'Content-Type: application/json' http://master:8088/ws/v1/cluster/apps?user.name=hadoop -d @post-json.txt
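(For context: the application-id used in the request body above is normally obtained beforehand from the ResourceManager's Cluster New Application API. A minimal sketch of that step, assuming the same host and user as above:)

# Ask the ResourceManager for a fresh application-id (POST with no body).
# The JSON reply contains an "application-id" field, which is then copied into
# the "application-id" entry of post-json.txt before the submission call above.
curl -i -X POST "http://master:8088/ws/v1/cluster/apps/new-application?user.name=hadoop"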

Can anybody help me? Thanks a lot.

Recommended answer

When you run the MapReduce job, make sure the output folder does not exist, because the job will not run if it is already present. You can write your program so that it deletes the folder if it exists, or delete it manually before calling the REST API. This behavior is just to prevent data loss and avoid overwriting the output of another job.
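For example, the existing output directory can be removed with the HDFS shell or, staying with the REST theme, via WebHDFS. A minimal sketch, assuming WebHDFS is enabled and the NameNode uses the Hadoop 2.x default HTTP port 50070 (adjust host, port, and path for your cluster):

# Option 1: delete with the HDFS shell (-f makes it a no-op if the path is absent)
hdfs dfs -rm -r -f /output/output16

# Option 2: delete via the WebHDFS REST API
curl -i -X DELETE "http://master:50070/webhdfs/v1/output/output16?op=DELETE&recursive=true&user.name=hadoop"

Either step can be run right before the submission curl call, so each run writes to a clean output directory.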

